Analyze the Yelp 2016 challenge dataset

Reference no: EM131030607

I need urgent help with my Big Data assignment, which is on Hadoop. A separate .tgz file is needed to pull the data; I will share it with you as soon as you contact me. Can you get it done in Hadoop, R, or Apache Spark?

You must use Hadoop technologies to analyze the Yelp 2016 challenge dataset: https://www.yelp.com/dataset_challenge.

You can use Hadoop, R, or Apache Spark.

Specifically, you must provide the answers (and code) to the following questions:

Summarize the number of reviews by US city, by business category.

Rank the cities by number of stars, for each category.

What is the average rating (stars) for businesses within 800 ft of Times Square, by type? For this problem, assume Times Square is at lat: 40° 45' 32.0256'' N, lon: 73° 59' 6.4680'' W, and take 800 ft to be a square extending 10 seconds in each direction.

Rank reviewers by number of reviews. For the top 10 reviewers, show their average number of stars, by category.

For the top 10 and bottom 10 food businesses in Times Square (in terms of stars), summarize rating by hour of day.
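The counting questions above (reviews by city and category, reviewers ranked by number of reviews) reduce to a join between business and review records followed by a group-and-count. The following is only a minimal sketch in plain Python, not a Hadoop or Spark solution. The sample records are made up, and the field names (`business_id`, `city`, `categories`, `user_id`, `stars`) are assumptions about the layout of the Yelp JSON files.

```python
from collections import Counter

# Made-up sample records. Field names are assumed to match the Yelp JSON
# files; check them against the actual dataset before relying on this.
businesses = [
    {"business_id": "b1", "city": "Pittsburgh", "categories": ["Food", "Coffee & Tea"]},
    {"business_id": "b2", "city": "Las Vegas", "categories": ["Restaurants"]},
]
reviews = [
    {"business_id": "b1", "user_id": "u1", "stars": 5},
    {"business_id": "b1", "user_id": "u2", "stars": 3},
    {"business_id": "b2", "user_id": "u1", "stars": 4},
]


def reviews_by_city_and_category(businesses, reviews):
    """Count reviews per (city, category) pair.

    A review of a business with several categories is counted once
    under each of those categories.
    """
    by_id = {b["business_id"]: b for b in businesses}
    counts = Counter()
    for review in reviews:
        business = by_id.get(review["business_id"])
        if business is None:
            continue  # review refers to a business we do not have
        for category in business["categories"]:
            counts[(business["city"], category)] += 1
    return counts


def top_reviewers(reviews, n=10):
    """Return the n users with the most reviews as (user_id, count) pairs."""
    return Counter(r["user_id"] for r in reviews).most_common(n)


counts = reviews_by_city_and_category(businesses, reviews)
for (city, category), total in sorted(counts.items()):
    print(city, category, total)
print(top_reviewers(reviews, n=1))
```

In Hadoop or Spark the same logic would be a join on the business identifier followed by a grouped count; the in-memory dictionary here stands in for that join.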

Changes: Since the Yelp academic dataset does not include NY, we need to amend the coordinates:

Center: Carnegie Mellon University, Pittsburgh, PA
Latitude: 40° 26' 28'' N, Longitude: 79° 56' 34'' W

Decimal Degrees: Latitude: 40.4411801, Longitude: -79.9428294
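The decimal values can be checked against the degrees, minutes, and seconds form with the usual conversion (degrees + minutes/60 + seconds/3600, negated for south and west). A short sketch follows; the function name `dms_to_decimal` is just an illustrative choice.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and west coordinates come out negative.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


lat = dms_to_decimal(40, 26, 28, "N")
lon = dms_to_decimal(79, 56, 34, "W")
print(round(lat, 4), round(lon, 4))
```

The results agree with the decimal degrees given above to about four decimal places; the small difference arises because the seconds in the degrees, minutes, and seconds form are rounded to whole numbers.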

The bounding box for the midterm is ~5 miles, which we will loosely define as 5 minutes of arc. The bounding box is therefore a square, 10 minutes on each side (of longitude and latitude), with CMU at the center.
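One way to apply this definition is a simple filter on each business's coordinates: a point is inside the box when both its latitude and its longitude lie within 5 minutes (5/60 of a degree) of the center. This is a sketch only; the function and constant names are illustrative, and it assumes business coordinates are available in decimal degrees.

```python
# Center of the box in decimal degrees (CMU, as given in the assignment).
CMU_LAT = 40.4411801
CMU_LON = -79.9428294

# Half of the 10-minute side, expressed in degrees.
HALF_SIDE_DEG = 5 / 60


def in_bounding_box(lat, lon, center_lat=CMU_LAT, center_lon=CMU_LON,
                    half_side=HALF_SIDE_DEG):
    """Return True if (lat, lon) lies inside the square box around the center."""
    return (abs(lat - center_lat) <= half_side
            and abs(lon - center_lon) <= half_side)


print(in_bounding_box(40.50, -79.95))    # about 3.5 minutes north of the center
print(in_bounding_box(36.17, -115.14))   # far outside the box
```

Because the box is defined in minutes of arc rather than miles, it is not square on the ground: at this latitude a minute of longitude covers less distance than a minute of latitude.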

Provide suitable statistical analysis of your results with R.

Provide visualizations of your results (distributions, graphs, maps) in R.




All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd