Write small Python programs to scrape simple HTML data

Reference no: EM132242167

Scraping Data

For this assignment, you'll write several small Python programs (within a single Jupyter notebook) to scrape simple HTML data from several websites. You will use Python 3 with the following libraries:

  • Beautiful Soup 4 (makes it easier to pull data out of HTML and XML documents)
  • Requests (for handling HTTP requests from Python)

Here is a fairly simple example for finding out how many datasets can currently be searched/accessed on data.gov. You should make sure you can run this code before going on to the questions you'll be writing (the answer when I last ran this was 194,708).

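import bs4
import requests

# Fetch the data.gov home page and parse it with Python's built-in HTML parser.
response = requests.get('https://www.data.gov/')
response.raise_for_status()  # stop early on an HTTP error
soup = bs4.BeautifulSoup(response.text, "html.parser")

# The dataset count is the text of the first link inside a <small> element.
link = soup.select("small a")[0]
print(link.text)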

This is an individual programming assignment.

Tasks
Write Python programs to answer the following questions. You will need to do some reading/research regarding the Beautiful Soup interface and possibly on Python as well. Also reference the relevant material in the Grus text and my Jupyter notebook from the 2/14 class. Do not hardcode any data; everything should be dynamically scraped from the live websites. Remember to post questions on Piazza.

1. Data.gov - Accept an integer as input and find the name (the link text) of the nth "most recent" dataset on data.gov. For example, if the user enters 1, print the name of the first dataset on data.gov when ordered by "date added". You can assume that the dataset appears on the first page.

It is possible to prompt for and receive user input in a Jupyter notebook cell using the standard Python input syntax. Try this code:

num = input("Enter a number: ")
print(num)

Example (based on data when viewed on 2/18/2019):
Which dataset? 4
Summary of RHESSys Simulations of GI Sensitivity
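
The lookup in question 1 can be sketched as follows. This is only a minimal illustration: `SAMPLE_HTML`, the `h3.dataset-heading a` selector, and the helper name `nth_dataset_name` are all assumptions, not the real data.gov markup, so inspect the live page before relying on any selector.

```python
import bs4

# Hypothetical stand-in for a data.gov search-results page. The real markup
# (and therefore the CSS selector below) must be checked in the browser.
SAMPLE_HTML = """
<ul class="dataset-list">
  <li><h3 class="dataset-heading"><a href="/dataset/a">First Dataset</a></h3></li>
  <li><h3 class="dataset-heading"><a href="/dataset/b">Second Dataset</a></h3></li>
  <li><h3 class="dataset-heading"><a href="/dataset/c">Third Dataset</a></h3></li>
</ul>
"""


def nth_dataset_name(html, n, selector="h3.dataset-heading a"):
    """Return the link text of the n-th (1-based) dataset found in the page."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    links = soup.select(selector)
    if not 1 <= n <= len(links):
        raise ValueError(f"expected a number between 1 and {len(links)}")
    # Users count from 1, Python lists from 0.
    return links[n - 1].get_text(strip=True)


# input() returns a string, so convert it before indexing, for example:
#   n = int(input("Which dataset? "))
print(nth_dataset_name(SAMPLE_HTML, 2))
```

In the real program the `html` argument would come from `requests.get(...).text` for the results page sorted by date added.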

2. White House Press Briefings - Programmatically find the link for the most recent press conference (this will be the first one on the page identified by the "Remarks" subheader), follow the link, and display the time that the briefing took place. Note that the URL for the most recent press briefing should not be hardcoded. If a new press briefing is added, your program should give the time of the newly added briefing. Test your code with several press briefings to be sure it is consistently getting the correct time.

Time of most recent White House Press Briefing
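
Two small pieces of question 2 can be sketched with the standard library alone: turning a scraped `href` into a full URL, and pulling a time out of the briefing text. The time format (something like "1:34 P.M. EST"), the example.gov URL, and the helper names are assumptions; check several real briefing pages and adjust the pattern to match.

```python
import re
from urllib.parse import urljoin

# Assumed time format, e.g. "1:34 P.M. EST" or "2:05 PM". The time-zone
# abbreviation is optional and must be on the same line as the time.
TIME_PATTERN = re.compile(r"\b\d{1,2}:\d{2}\s*[AP]\.?M\.?(?:[ \t]+[A-Z]{2,4}\b)?")


def absolute_url(page_url, href):
    """Resolve a possibly relative href against the page it was scraped from."""
    return urljoin(page_url, href)


def first_time(text):
    """Return the first time-like string in the text, or None if there is none."""
    match = TIME_PATTERN.search(text)
    return match.group(0) if match else None


# Hypothetical URL and text, for illustration only.
print(absolute_url("https://www.example.gov/news/", "/briefings/latest/"))
print(first_time("Press Briefing Room\n1:34 P.M. EST\nQ: Thank you."))
```

The `href` itself would come from the first link under the "Remarks" subheader, found with Beautiful Soup on the listing page.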

3. Texas Dept of Criminal Justice - Accept two integers as input. You can assume that these values represent a valid starting and ending year within the range of the years in the table. Process the HTML and find the total number of executions in Texas between the starting year and the ending year (inclusive of the start and end years).

Example:
Enter starting year: 1990
Enter ending year: 2000
Total executions: 206
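
One way to approach question 3 is to read the table into a year-to-count dictionary and then sum over the requested range. The sketch below assumes a layout with the year in the first column and the yearly total in the second; `SAMPLE_TABLE` holds made-up numbers (not real execution counts), and the column positions and helper names are hypothetical.

```python
import bs4

# Made-up numbers in an assumed layout; the real table may have more columns.
SAMPLE_TABLE = """
<table>
  <tr><th>Year</th><th>Total</th></tr>
  <tr><td>2001</td><td>3</td></tr>
  <tr><td>2000</td><td>5</td></tr>
  <tr><td>1999</td><td>7</td></tr>
  <tr><td>Total</td><td>15</td></tr>
</table>
"""


def counts_by_year(html, year_col=0, total_col=1):
    """Map each year in the table to its count, skipping non-year rows."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    counts = {}
    for row in soup.select("table tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if len(cells) <= max(year_col, total_col):
            continue  # header row (only <th> cells) or a short row
        if not cells[year_col].isdigit():
            continue  # summary rows such as "Total"
        counts[int(cells[year_col])] = int(cells[total_col])
    return counts


def total_between(counts, start, end):
    """Sum the counts for every year from start to end, inclusive."""
    return sum(n for year, n in counts.items() if start <= year <= end)


counts = counts_by_year(SAMPLE_TABLE)
print("Total executions:", total_between(counts, 1999, 2000))
```

Skipping rows whose first cell is not a number keeps a summary row from being counted twice.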

4. Twitter API - For this problem, you'll interact with the Twitter API. Once again, consult the example code in the Grus text and the 2/14 in-class Jupyter notebook. Use:

  • tweepy (for wrapping the Twitter API)
  • csv (for interacting with the csv file)
  • json (for parsing JSON data)

Create a free Twitter account if necessary. Follow the instructions in the Grus text to enable free access to the Twitter API. Feel free to use a credentials.json file similar to my usage in the example notebook.
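
Reading the credentials file might look like the sketch below. The key names in `REQUIRED_KEYS` and the helper names are assumptions; use whatever names the example notebook's credentials.json actually contains.

```python
import json

# Assumed key names; match them to your own credentials.json.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
)


def parse_credentials(text):
    """Parse JSON text and check that every expected key is present."""
    creds = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError("credentials are missing: " + ", ".join(missing))
    return creds


def load_credentials(path="credentials.json"):
    """Read and validate the credentials file at the given path."""
    with open(path, encoding="utf-8") as handle:
        return parse_credentials(handle.read())


# Placeholder values only; never commit real keys to a notebook.
sample = json.dumps({key: "placeholder" for key in REQUIRED_KEYS})
print(sorted(parse_credentials(sample)))
```

Keeping the keys in a separate file means the notebook can be shared without exposing them.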

Use the attached data to find which currently serving senator (not representative) has the most Twitter followers and which has the fewest. Also, look up the 10 most recent tweets of each currently serving senator (once again, no representatives) and report the totals of how many people have favorited the last ten tweets and how many people retweeted the last ten tweets. Use the requests library to read the csv file in and use the csv module to process it (look at csv.DictReader). Be sure to filter out those who aren't currently in office and those who don't have a Twitter account. Write your output in a reasonable/readable format (example below):
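
The filtering step can be sketched with `csv.DictReader` as below. The column names (`name`, `type`, `in_office`, `twitter_id`) and their values are assumptions about the attached file, and `SAMPLE_CSV` is invented; check the real header row first.

```python
import csv
import io

# Invented rows with assumed column names; the attached file may differ.
SAMPLE_CSV = """name,type,in_office,twitter_id
Senator A,sen,1,SenA
Representative B,rep,1,RepB
Senator C,sen,0,SenC
Senator D,sen,1,
"""


def current_senator_handles(csv_text):
    """Map each sitting senator's name to a Twitter handle, skipping the rest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["name"]: row["twitter_id"]
        for row in reader
        if row["type"] == "sen"           # senators only, no representatives
        and row["in_office"] == "1"       # currently serving
        and row["twitter_id"].strip()     # has a Twitter account
    }


# With the requests library the text would come from something like
#   csv_text = requests.get(csv_url).text   # csv_url is hypothetical
print(current_senator_handles(SAMPLE_CSV))
```

`io.StringIO` lets `csv.DictReader` read from text already held in memory, which is what `requests` returns.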

Example output:
Most followers: xxx
Fewest followers: xxx
xxx last 10 tweets: xx favorited, xx retweeted
xxx last 10 tweets: xx favorited, xx retweeted
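
The totals and the most/fewest comparison could be computed as sketched below. It assumes each tweet object exposes `favorite_count` and `retweet_count` attributes, as tweepy's status objects do to my knowledge; the `SimpleNamespace` objects, names, and numbers are invented stand-ins so the sketch runs without API access.

```python
from types import SimpleNamespace


def engagement_totals(tweets):
    """Return (total favorites, total retweets) across the given tweets."""
    favorited = sum(tweet.favorite_count for tweet in tweets)
    retweeted = sum(tweet.retweet_count for tweet in tweets)
    return favorited, retweeted


def most_and_fewest(follower_counts):
    """Return the names with the largest and smallest follower counts."""
    most = max(follower_counts, key=follower_counts.get)
    fewest = min(follower_counts, key=follower_counts.get)
    return most, fewest


# Invented stand-ins for what the API would return.
tweets = [
    SimpleNamespace(favorite_count=10, retweet_count=2),
    SimpleNamespace(favorite_count=5, retweet_count=1),
]
followers = {"Senator A": 1200, "Senator B": 300, "Senator C": 45000}

most, fewest = most_and_fewest(followers)
favorited, retweeted = engagement_totals(tweets)
print(f"Most followers: {most}")
print(f"Fewest followers: {fewest}")
print(f"Senator A last 10 tweets: {favorited} favorited, {retweeted} retweeted")
```

In the real program the tweets would come from a timeline call limited to 10 results, and the follower counts from each senator's user profile.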

Note: Only the last two questions are needed.

Attachment:- HW Spring.rar

Verified Expert

The solution is Python code that connects to the Twitter API using the built-in library support for accessing Twitter. To connect, we first have to register as an API developer with Twitter and then use the connection code in Python. With that in place, we can access the tweets and their ratings from Python.


CSC 576 - Data Science - West Chester University


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd