Analyze the Yelp 2016 challenge dataset
Course:- Basic Statistics
Reference No.:- EM131030607

I need urgent help with my Big Data assignment. It needs an additional .tgz file to pull the data, which I will share with you as soon as you contact me. The assignment is in Big Data (Hadoop). Can you get it done in Hadoop, R, or Apache Spark?

You must use Hadoop technologies to analyze the Yelp 2016 challenge dataset: https://www.yelp.com/dataset_challenge.

You can use Hadoop, R, or Apache Spark.

Specifically, you must provide the answers (and code) to the following questions:

Summarize the number of reviews by US city, by business category.
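One way to approach this count, sketched in plain Python rather than Hadoop or Spark so the logic is easy to follow. The field names (`business_id`, `city`, `state`, `categories`) and the sample records are assumptions made for illustration, and `us_states` is a hypothetical parameter holding whichever state codes you decide to treat as US.

```python
from collections import Counter

# Made-up records standing in for the business and review files.
# Field names are assumed, not taken from the assignment text.
businesses = [
    {"business_id": "b1", "city": "Pittsburgh", "state": "PA",
     "categories": ["Restaurants", "Pizza"]},
    {"business_id": "b2", "city": "Las Vegas", "state": "NV",
     "categories": ["Restaurants"]},
    {"business_id": "b3", "city": "Edinburgh", "state": "EDH",
     "categories": ["Pubs"]},
]
reviews = [
    {"business_id": "b1", "stars": 4},
    {"business_id": "b1", "stars": 5},
    {"business_id": "b2", "stars": 3},
    {"business_id": "b3", "stars": 5},
]

def review_counts(businesses, reviews, us_states):
    """Count reviews per (city, category) for businesses in us_states."""
    by_id = {b["business_id"]: b for b in businesses}
    counts = Counter()
    for r in reviews:
        b = by_id.get(r["business_id"])
        if b is None or b["state"] not in us_states:
            continue  # unknown business or outside the US filter
        # A business with several categories contributes to each of them.
        for cat in b["categories"] or []:
            counts[(b["city"], cat)] += 1
    return counts

# Illustrative subset of state codes; a real run would list all of them.
counts = review_counts(businesses, reviews, us_states={"PA", "NV"})
for (city, cat), n in sorted(counts.items()):
    print(city, cat, n)
```

The same join-then-count shape carries over to a Spark or MapReduce job: key reviews by business, attach city and categories, and emit one count per (city, category) pair.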

Rank the cities by # of stars for each category, by city.
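The wording of this question leaves room for interpretation. The sketch below assumes it means: compute the mean star rating for each (category, city) pair, then order the cities within each category from highest to lowest mean. The input rows are made up for illustration.

```python
from collections import defaultdict

# Made-up (city, category, stars) rows, one per review.
rows = [
    ("Pittsburgh", "Restaurants", 4.0),
    ("Pittsburgh", "Restaurants", 5.0),
    ("Las Vegas", "Restaurants", 3.5),
    ("Las Vegas", "Bars", 4.0),
]

def rank_cities(rows):
    """Return {category: [(city, mean_stars), ...]} sorted best first."""
    totals = defaultdict(lambda: [0.0, 0])  # (category, city) -> [sum, count]
    for city, cat, stars in rows:
        entry = totals[(cat, city)]
        entry[0] += stars
        entry[1] += 1
    per_category = defaultdict(list)
    for (cat, city), (total, n) in totals.items():
        per_category[cat].append((city, total / n))
    # Highest mean first; ties broken alphabetically by city name.
    return {
        cat: sorted(cities, key=lambda cv: (-cv[1], cv[0]))
        for cat, cities in per_category.items()
    }

ranking = rank_cities(rows)
for cat, cities in sorted(ranking.items()):
    print(cat, cities)
```

If the intended ranking is by total stars rather than mean stars, replace `total / n` with `total`.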

What is the average rank (stars) for businesses within 800 ft of Times Square, by type? For this problem, assume Times Square is at lat: 40° 45' 32.0256'' N, lon: 73° 59' 6.4680'' W, and 800 ft. to be a square 10 seconds in each direction.
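The coordinates above are given in degrees, minutes, and seconds, while datasets usually store decimal degrees. A minimal sketch of the conversion and of the 10-second box follows; the function name `dms_to_decimal` is a hypothetical helper, not part of any library.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value

lat = dms_to_decimal(40, 45, 32.0256, "N")
lon = dms_to_decimal(73, 59, 6.4680, "W")

# "10 seconds in each direction" means 10 arc-seconds either side of center.
half_side = 10 / 3600
box = {
    "lat_min": lat - half_side, "lat_max": lat + half_side,
    "lon_min": lon - half_side, "lon_max": lon + half_side,
}

print(round(lat, 6), round(lon, 6))
print(box)
```

A business falls inside the box when its latitude lies between `lat_min` and `lat_max` and its longitude between `lon_min` and `lon_max`.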

Rank reviewers by number of reviews. For the top 10 reviewers, show their average number of stars, by category.
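A rough sketch of this two-step computation in plain Python: first rank users by review count, then average stars per category for the top users only. The field names (`user_id`, `business_id`, `stars`) and the sample data are assumptions for illustration, and the example uses `n=2` instead of 10 to keep the data small.

```python
from collections import Counter, defaultdict

# Made-up reviews and a business -> categories lookup.
reviews = [
    {"user_id": "u1", "business_id": "b1", "stars": 5},
    {"user_id": "u1", "business_id": "b2", "stars": 3},
    {"user_id": "u2", "business_id": "b1", "stars": 4},
    {"user_id": "u1", "business_id": "b1", "stars": 4},
    {"user_id": "u3", "business_id": "b2", "stars": 2},
]
categories_by_business = {"b1": ["Restaurants"], "b2": ["Bars", "Restaurants"]}

def top_reviewers(reviews, n=10):
    """Return the n user ids with the most reviews (ties broken by id)."""
    counts = Counter(r["user_id"] for r in reviews)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [user for user, _ in ranked[:n]]

def mean_stars_by_category(reviews, categories_by_business, users):
    """Return {(user_id, category): mean stars} for the given users."""
    wanted = set(users)
    sums = defaultdict(lambda: [0.0, 0])  # (user, category) -> [sum, count]
    for r in reviews:
        if r["user_id"] not in wanted:
            continue
        for cat in categories_by_business.get(r["business_id"], []):
            entry = sums[(r["user_id"], cat)]
            entry[0] += r["stars"]
            entry[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

top = top_reviewers(reviews, n=2)
means = mean_stars_by_category(reviews, categories_by_business, top)
print(top)
print(means)
```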

For the top 10 and bottom 10 food businesses in Times Square (in terms of stars), summarize rating by hour of day.

Changes: Since the Yelp academic dataset does not include NY, we need to amend the coordinates:

Center: Carnegie Mellon University, Pittsburgh, PA
Latitude: 40° 26' 28'' N, Longitude: 79° 56' 34'' W

Decimal Degrees: Latitude: 40.4411801, Longitude: -79.9428294

The bounding box for the midterm is ~5 miles, which we will loosely define as 5 minutes. So the bounding box is a square box, 10 minutes each side (of longitude and latitude), with CMU at the center.
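The box described above can be expressed as a simple membership test in decimal degrees. This is a sketch under the stated definition (5 arc-minutes either side of CMU); the function name `in_box` is hypothetical, and the two test points are approximate city-center coordinates chosen only for illustration.

```python
# Center taken from the decimal-degree values given in the assignment.
CMU_LAT, CMU_LON = 40.4411801, -79.9428294

# 5 arc-minutes either side of the center gives a 10-minute square.
HALF_SIDE_DEG = 5 / 60

def in_box(lat, lon, center_lat=CMU_LAT, center_lon=CMU_LON,
           half_side=HALF_SIDE_DEG):
    """True if (lat, lon) lies inside the square box around the center."""
    return (abs(lat - center_lat) <= half_side
            and abs(lon - center_lon) <= half_side)

print(in_box(40.4406, -79.9959))   # roughly downtown Pittsburgh
print(in_box(36.1699, -115.1398))  # roughly Las Vegas
```

Because a minute of longitude covers less ground than a minute of latitude at this latitude, the box is somewhat narrower east to west than north to south, which is consistent with the assignment calling the 5-mile figure a loose definition.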

Provide suitable statistical analysis of your results with R.

Provide visualizations of the results (distributions, graphs, maps) in R.