Write a regular expression that captures all HTML tags

Assignment Help Applied Statistics
Reference no: EM131635333

Problem Assignment -

The Enron scandal led to the bankruptcy of the Enron Corporation, the largest bankruptcy reorganization in US history at that time, and to the dissolution of Arthur Andersen, one of the five largest audit and accountancy partnerships in the world. In this exercise, you will download text on the scandal available through Wikipedia and filter it down to the sentences dealing with Kenneth Lay, one of the main figures in the scandal.

Download the source code of the following Wikipedia page: Enron scandal. Use readLines(...).
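
A minimal sketch of the download step. The helper name read_page is hypothetical, and the URL in the final comment is an assumption, since the assignment names the page but does not give its address. The demonstration reads a small temporary file so that it runs without a network connection.

```r
# Hypothetical helper: read a page source line by line.
# readLines() accepts a file path or a URL given as a character string.
read_page <- function(src) {
  readLines(src, warn = FALSE)
}

# Offline demonstration: a small temporary file stands in for the page.
tmp <- tempfile(fileext = ".html")
writeLines(c("<html>", "<p>Kenneth Lay founded Enron.</p>", "</html>"), tmp)
page <- read_page(tmp)
length(page)

# Assumed URL for the real run (not stated in the assignment):
# page <- read_page("https://en.wikipedia.org/wiki/Enron_scandal")
```

The result is a character vector with one element per line of the source.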

Go to the same webpage in your browser and look at the source code (Google Chrome: right-click and choose View page source). All lines that include text from the main body (no headers, info boxes, etc.) start with the same HTML tag, namely <p>. Use a regular expression to limit the downloaded data to lines that include text from the main body. Use grep(...).
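
One way to sketch the filtering step, using a made-up four-line vector in place of the downloaded page. The sample lines are illustrative only.

```r
# Made-up sample standing in for the downloaded page source.
page <- c(
  "<h2>Timeline</h2>",
  "<p>Enron filed for bankruptcy in 2001.</p>",
  "<table><tr><td>Founded</td></tr></table>",
  "<p>Kenneth Lay was the chairman.</p>"
)

# "^<p>" matches only lines that begin with the <p> tag.
# value = TRUE returns the matching lines instead of their indices.
body <- grep("^<p>", page, value = TRUE)
body
```

The anchor ^ ties the match to the start of the line, so a <p> that appears later in a line is ignored. If paragraphs on the real page carry attributes, a looser pattern such as "^<p[ >]" may be needed.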

Remove HTML tags using gsub(...). HTML tags always have the same format, namely a certain number of characters within angle brackets ('<' and '>'), e.g. <table>. Write a regular expression that captures all HTML tags.
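
A possible pattern for this step is "<[^>]+>": an opening angle bracket, one or more characters that are not a closing bracket, then the closing bracket. The sketch below applies it to a made-up line; it is one candidate answer, not necessarily the intended one.

```r
# Made-up sample line containing several tags, one of them with an attribute.
body <- c("<p>Kenneth Lay was the <b>chairman</b> of <a href=\"/wiki/Enron\">Enron</a>.</p>")

# Replace every tag with the empty string.
clean <- gsub("<[^>]+>", "", body)
clean
```

The negated class [^>] keeps each match inside a single tag. The simpler "<.*>" is greedy and would delete everything between the first '<' and the last '>' on the line, text included.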

We want to construct a vector where each element is a single sentence, which is currently not the case. First, collapse the current vector into one character string, using paste(..., collapse = " "). Subsequently, separate the string again at the end of individual sentences. We assume that '.' is the only sentence separator. However, '.' is also a special character in regular expressions. In order to use '.' as a full stop, and not as the metacharacter 'any character', escape it with backslashes as shown below. In addition, use the suffix [[1]] as the output is a list.

strsplit(..., "\\.")[[1]]
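
The collapse-and-split step can be sketched as follows, on a made-up two-element vector whose sentences straddle the element boundary. The trimws() call is an optional extra that the assignment does not ask for; it only removes the space left at the start of each sentence.

```r
# Made-up sample: sentence boundaries do not line up with vector elements.
clean <- c("Enron collapsed in 2001. Kenneth Lay was",
           "the chairman. He was indicted in 2004.")

# Collapse into a single string, then split at every literal full stop.
one_string <- paste(clean, collapse = " ")
sentences <- strsplit(one_string, "\\.")[[1]]

# Optional: strip the leading space left behind by the split.
sentences <- trimws(sentences)
sentences
```

Note that the split consumes the full stops, so the resulting sentences no longer end in '.'.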

Find all sentences that include the term "kenneth lay", ignoring case.
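
A short sketch of the case-insensitive search, again on made-up sentences. The variable name lay is an illustrative choice.

```r
# Made-up sample sentences with the name in different capitalizations.
sentences <- c("Enron collapsed in 2001",
               "KENNETH LAY was the chairman",
               "Shares fell sharply",
               "Investigators questioned Kenneth Lay")

# ignore.case = TRUE makes the match case-insensitive.
lay <- grep("kenneth lay", sentences, ignore.case = TRUE, value = TRUE)
lay
```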

Save the resulting vector of sentences in a text file named enron_scandal.txt. Make sure that the resulting file does not have column names, row names, or quotation marks around the individual entries.
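
The mention of column names, row names, and quotation marks suggests write.table(), although the assignment does not name the function. The sketch below assumes that choice and assumes the intended file name is enron_scandal.txt. It writes into the session's temporary directory so the example does not overwrite anything in the working directory.

```r
# Made-up result vector from the previous step.
lay <- c("KENNETH LAY was the chairman",
         "Investigators questioned Kenneth Lay")

# Write to a temporary directory for the demonstration;
# use file = "enron_scandal.txt" to write to the working directory instead.
out_file <- file.path(tempdir(), "enron_scandal.txt")
write.table(lay, file = out_file,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to check that each sentence sits on its own line.
readLines(out_file)
```

writeLines(lay, out_file) would produce the same file with less ceremony, since it never adds names or quotes.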

Reviews

len1635333

9/9/2017 7:41:45 AM

Subject: Textual analysis in R. With a few exceptions, each bullet point should not amount to more than one line of code. It should be simple. Save the resulting vector of sentences in a text file named enron scandal. Make sure that the resulting file does not have column names, row names, or quotation marks around the individual entries.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd