Reference no: EM131197017
You must use Hadoop technologies to analyze the Yelp 2016 challenge dataset:
https://www.yelp.com/dataset_challenge.
You can use Hadoop, R, or Apache Spark.
Specifically, you must provide the answers (and code) to the following questions:
1. Summarize the number of reviews by US city, by business category.
2. Rank the cities by # of stars for each category, by city.
3. What is the average rank (stars) for businesses within 800 ft of Times Square, by type? For this problem, assume Times Square is at lat: 40° 45' 32.0256'' N, lon: 73° 59' 6.4680'' W, and 800 ft. to be a square 10 seconds in each direction.
4. Rank reviewers by number of reviews. For the top 10 reviewers, show their average number of stars, by category.
5. For the top 10 and bottom 10 food businesses in Times Square (in terms of stars), summarize rating by hour of day.
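As an illustration of the join-and-count logic behind question 1, here is a minimal local sketch in plain Python. It is not a Hadoop or Spark job; it only mirrors the map (look up each review's business) and reduce (tally per key) steps in memory. The field names `business_id`, `city`, and `categories` are assumptions about the dataset's JSON-lines layout, and the sample records are made up for demonstration.

```python
import json
from collections import Counter


def count_reviews_by_city_category(business_lines, review_lines):
    """Count reviews per (city, category) pair from JSON-lines input.

    Assumed (not confirmed by the assignment text): each business record
    has "business_id", "city", and a list-valued "categories" field, and
    each review record has "business_id".
    """
    # Lookup table: business_id -> (city, list of categories).
    businesses = {}
    for line in business_lines:
        b = json.loads(line)
        businesses[b["business_id"]] = (b["city"], b.get("categories") or [])

    # Tally one count per review for every category of the reviewed business.
    counts = Counter()
    for line in review_lines:
        r = json.loads(line)
        city, categories = businesses.get(r["business_id"], (None, []))
        for category in categories:
            counts[(city, category)] += 1
    return counts


# Made-up sample records, for illustration only.
sample_businesses = [
    '{"business_id": "b1", "city": "Pittsburgh", "categories": ["Restaurants", "Pizza"]}',
    '{"business_id": "b2", "city": "Phoenix", "categories": ["Bars"]}',
]
sample_reviews = [
    '{"business_id": "b1", "stars": 4}',
    '{"business_id": "b1", "stars": 5}',
    '{"business_id": "b2", "stars": 3}',
    '{"business_id": "b3", "stars": 1}',
]

review_counts = count_reviews_by_city_category(sample_businesses, sample_reviews)
for (city, category), n in sorted(review_counts.items()):
    print(city, category, n)
```

A review whose business is missing from the business file contributes nothing here; a real job would also need to restrict the businesses to US cities, which this sketch does not attempt.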
Changes: Since the Yelp academic dataset does not include NY, we need to amend the coordinates:
Center: Carnegie Mellon University, Pittsburgh, PA
Latitude: 40° 26' 28'' N, Longitude: 79° 56' 34'' W
Decimal Degrees: Latitude: 40.4411801, Longitude: -79.9428294
The bounding box for the midterm is ~5 miles, which we will loosely define as 5 minutes. So the bounding box is a square box, 10 minutes each side (of longitude and latitude), with CMU at the center.
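The coordinate conversion and the bounding-box test can be sketched in a few lines of Python. The center values and the 5-minute half-side come from the text above; the function names `dms_to_decimal` and `in_bounding_box` are hypothetical helpers introduced only for this example.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative.
    return -value if hemisphere in ("S", "W") else value


# Center (CMU) in decimal degrees, as given in the assignment.
CENTER_LAT = 40.4411801
CENTER_LON = -79.9428294

# 5 minutes of arc in each direction, so the box is 10 minutes per side.
HALF_SIDE_DEG = 5 / 60


def in_bounding_box(lat, lon,
                    center_lat=CENTER_LAT, center_lon=CENTER_LON,
                    half_side=HALF_SIDE_DEG):
    """Return True if (lat, lon) lies inside the square box around the center."""
    return (abs(lat - center_lat) <= half_side
            and abs(lon - center_lon) <= half_side)


# The DMS values in the text agree with the decimal values to about 1e-4 degrees.
print(dms_to_decimal(40, 26, 28, "N"))
print(dms_to_decimal(79, 56, 34, "W"))

# Box edges in decimal degrees.
print(CENTER_LAT - HALF_SIDE_DEG, CENTER_LAT + HALF_SIDE_DEG)
print(CENTER_LON - HALF_SIDE_DEG, CENTER_LON + HALF_SIDE_DEG)
```

Applied as a filter on each business's latitude and longitude, `in_bounding_box` selects the businesses that replace the "Times Square" set in questions 3 and 5.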
- Provide a suitable statistical analysis of your results in R.
- Provide visualizations of the results (distributions, graphs, maps) in R.