Write a regular expression that captures all HTML tags

Assignment Help Applied Statistics
Reference no: EM131635333

Problem Assignment -

The Enron scandal led to the bankruptcy of the Enron Corporation, the largest bankruptcy reorganization in US history at that time, and to the dissolution of Arthur Andersen, one of the five largest audit and accountancy partnerships in the world. In this exercise, you will download text on the scandal available through Wikipedia and filter it to the sentences dealing with Kenneth Lay, one of the main figures in the scandal.

Download the source code of the following Wikipedia page: Enron scandal. Use readLines(...).
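A minimal sketch of this step. The assignment names only the page, not its address, so the URL in the final comment is an assumption. The helper name read_page and the small sample file are illustrative, chosen so that the example runs without a network connection.

```r
# Hypothetical helper: read a page (URL or file path) into a character vector,
# one element per line of source code.
read_page <- function(src) {
  readLines(src, warn = FALSE, encoding = "UTF-8")
}

# Offline demonstration on a made-up sample file.
tmp <- tempfile(fileext = ".html")
writeLines(
  c("<html>", "<p>Kenneth Lay founded Enron.</p>", "</html>"),
  tmp
)
page <- read_page(tmp)
print(page)

# For the assignment itself, pass the article address instead (assumed URL):
# page <- read_page("https://en.wikipedia.org/wiki/Enron_scandal")
```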

Go to the same webpage in your browser and look at the source code (Google Chrome: right-click and choose "View page source"). All lines that include text from the main body (no headers, info boxes, etc.) start with the same HTML tag, namely <p>. Use a regular expression to limit the downloaded data to lines that include text from the main body. Use grep(...).
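One way this filter could look, shown on a made-up sample vector rather than the real page. The anchor ^ restricts the match to lines that begin with <p>, and value = TRUE makes grep() return the matching lines instead of their indices.

```r
# Made-up sample standing in for the downloaded page source.
page <- c(
  "<h1>Enron scandal</h1>",
  "<p>Kenneth Lay was the <b>chairman</b> of Enron.</p>",
  "<table><tr><td>Infobox</td></tr></table>",
  "<p>Arthur Andersen was the auditor.</p>"
)

# Keep only lines that start with the paragraph tag.
body <- grep("^<p>", page, value = TRUE)
print(body)
```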

Remove HTML tags using gsub(...). HTML tags always have the same format, namely a certain number of characters within angle brackets ('<' and '>'), e.g. <table>. Write a regular expression that captures all HTML tags.
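A possible pattern for this step, shown on made-up sample lines: an opening angle bracket, one or more characters that are not a closing bracket, then the closing bracket. This is a sketch for simple markup like the assignment describes, not a general HTML parser.

```r
# Made-up sample lines that still contain HTML tags.
body <- c(
  "<p>Kenneth Lay was the <b>chairman</b> of Enron.</p>",
  "<p>See the <a href=\"/wiki/Enron\">Enron</a> article.</p>"
)

# "<[^>]+>" matches "<", then any run of characters other than ">", then ">".
# Replacing every match with the empty string removes the tags.
cleaned <- gsub("<[^>]+>", "", body)
print(cleaned)
```

Using [^>]+ instead of .+ keeps each match inside a single tag; a greedy .+ would run from the first '<' to the last '>' on the line and delete the text in between.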

We want to construct a vector where each element is a single sentence, which is currently not the case. First, collapse the current vector into one character string using paste(..., collapse = " "). Subsequently, separate the string again at the end of individual sentences. We assume that '.' is the only sentence separator. However, '.' is also a special character in regular expressions. To use '.' as a full stop, and not as the metacharacter 'any character', escape it with backslashes as shown below. In addition, use the suffix [[1]] because the output is a list.

strsplit(..., "\\.")[[1]]
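The collapse-and-split step can be sketched as follows on made-up sample text. The trimws() call is an addition that the assignment does not ask for; it only removes the space left at the start of each sentence after splitting.

```r
# Made-up sample: one element holds two sentences, the other holds one.
cleaned <- c(
  "Kenneth Lay was the chairman of Enron. He resigned in 2002.",
  "Arthur Andersen was the auditor."
)

# Collapse everything into a single string, then split at literal full stops.
text <- paste(cleaned, collapse = " ")
sentences <- strsplit(text, "\\.")[[1]]

# Optional tidy-up (not required by the assignment): drop leading spaces.
sentences <- trimws(sentences)
print(sentences)
```

An equivalent call without a regular expression is strsplit(text, ".", fixed = TRUE)[[1]], which treats the dot as a plain character.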

Find all sentences that include the term "Kenneth Lay", ignoring case.
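A short sketch of the case-insensitive search, again on made-up sample sentences. The ignore.case argument of grep() makes the pattern match regardless of capitalization.

```r
# Made-up sample sentences with different capitalizations of the name.
sentences <- c(
  "Kenneth Lay was the chairman of Enron",
  "Arthur Andersen was the auditor",
  "In 2006, KENNETH LAY was convicted"
)

# Return the sentences themselves (value = TRUE), matching in any case.
lay_sentences <- grep("kenneth lay", sentences, ignore.case = TRUE, value = TRUE)
print(lay_sentences)
```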

Save the resulting vector of sentences in a text file named enron_scandal.txt. Make sure that the resulting file does not have column names, row names, or quotation marks around the individual entries.
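One way to write the file, assuming write.table() is the intended function (the assignment does not name it). The sketch writes to a temporary directory so that it leaves no file behind; for the assignment, the file name alone would be passed instead.

```r
# Made-up sample result from the previous step.
lay_sentences <- c(
  "Kenneth Lay was the chairman of Enron",
  "In 2006, KENNETH LAY was convicted"
)

# Temporary location for the demonstration; use "enron_scandal.txt" directly
# to write into the working directory.
out_file <- file.path(tempdir(), "enron_scandal.txt")

# Switch off quotation marks, row names, and column names.
write.table(
  lay_sentences,
  file = out_file,
  quote = FALSE,
  row.names = FALSE,
  col.names = FALSE
)

# Read the file back to check that it holds one plain sentence per line.
written <- readLines(out_file)
print(written)
```

For a plain character vector, writeLines(lay_sentences, out_file) produces the same file with less configuration.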

Subject: Textual analysis in R. With a few exceptions, each bullet point should amount to no more than one line of code. It should be simple.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd