Simple Data Analysis with MapReduce and Spark

Reference no: EM131954339

Assignment: Simple Data Analysis with MapReduce and Spark

1 Introduction
This assignment tests your ability to implement simple data analytics workloads using basic features of the MapReduce and Spark frameworks. The data set you will work on is the Trending YouTube Video Statistics data from Kaggle. There are two workloads you should design and implement against this data set. You are required to implement one with MapReduce and the other with Spark. You can choose which framework to use for which workload.

2 Input Data Set Description
The dataset contains several months of records of the daily top trending YouTube videos in the following five countries: Canada, France, Germany, UK and USA. Up to 200 trending videos are listed per day.

Each country's data is saved in a separate CSV file. Each row of the CSV file represents a trending video record. If a video is listed as trending on multiple days, each trending appearance has its own record. The record includes video id, title, trending date, publish time, number of views, and so on. The video record also includes a category id field. The categories are slightly different in each country. A JSON file is provided for each country. The JSON file defines the mapping between category ID and category name.
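A minimal sketch of reading the category mapping is given here. It assumes the JSON file follows the YouTube Data API videoCategories layout (an "items" list whose entries carry an "id" and a "snippet.title"); that layout, the helper name load_category_names, and the sample string are assumptions for illustration, not something the assignment text specifies.

```python
import json


def load_category_names(json_text):
    """Return {category_id: category_name} from a category JSON document.

    Assumed layout: {"items": [{"id": "...", "snippet": {"title": "..."}}]}
    """
    doc = json.loads(json_text)
    return {item["id"]: item["snippet"]["title"] for item in doc["items"]}


# Tiny made-up sample in the assumed layout (not taken from the real files).
sample = (
    '{"items": ['
    '{"id": "10", "snippet": {"title": "Music"}},'
    '{"id": "17", "snippet": {"title": "Sports"}}'
    ']}'
)
category_names = load_category_names(sample)
print(category_names)
```

Because the categories differ slightly per country, a mapping like this would be built once per country file and looked up by the category id field of each CSV record.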

3 Analysis Workload Description

Category and Trending Correlation
Some videos are trending in multiple countries. We are interested in whether there is any correlation between category and overlapping trending. For instance, if UK and CA users have common interests in music but very different interests in sports, we might see that 3% of trending music videos in the UK also appear in the trending list of CA, while only 0.5% of trending sports videos in the UK appear in CA's trending list.
In this workload, you are asked to find out, for a given pair of countries A and B, for each category in country A, the total number of videos trending in country A and the percentage of them that are also trending in country B. For any video with multiple trending appearances in a country, it should be counted as one video in that country.

For example, if the two countries are GB and US, the result would look like:

Entertainment; total: 617; 31.6 in US
Sports; total: 163; 16.6 in US
...

This means that there are 617 videos from the Entertainment category in the UK's trending list, and 31.6% of them also appear in the US trending list. Likewise, there are 163 videos from the Sports category in the UK's trending list, and 16.6% of them also appear in the US trending list.
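The counting rule above can be sketched in plain Python with sets, which makes the "count each video once per country" requirement explicit. This is only an illustration of the logic, not a MapReduce or Spark implementation; the function name category_overlap, the tuple layout, and the sample records are hypothetical.

```python
from collections import defaultdict


def category_overlap(records, country_a, country_b):
    """records: iterable of (country, video_id, category_name) tuples.

    Returns {category: (total_in_a, percent_of_those_also_in_b)}.
    """
    records = list(records)  # allow two passes even if given a generator
    videos_in_b = {vid for country, vid, _ in records if country == country_b}

    # One set per category: repeated trending appearances collapse to one video.
    by_category = defaultdict(set)
    for country, vid, category in records:
        if country == country_a:
            by_category[category].add(vid)

    result = {}
    for category, vids in by_category.items():
        shared = len(vids & videos_in_b)
        result[category] = (len(vids), 100.0 * shared / len(vids))
    return result


# Made-up records: video v1 trends twice in GB but is counted once.
records = [
    ("GB", "v1", "Music"), ("GB", "v1", "Music"), ("GB", "v2", "Music"),
    ("GB", "v3", "Sports"),
    ("US", "v1", "Music"), ("US", "v9", "Sports"),
]
overlap = category_overlap(records, "GB", "US")
for category, (total, pct) in sorted(overlap.items()):
    print(f"{category}; total: {total}; {pct:.1f}% in US")
```

In a distributed version, the deduplication and the per-category aggregation are the steps that would naturally be keyed by video id and by category respectively.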

Impact of Trending on View Number

Listing a video as trending would help it attract more views. The view number may quickly increase after a video is listed as trending for the first time. In fact it is not unusual for the view number to double between a video's first and second trending appearance.

Below are a few records of a particular video:

videoID       Trending Date   Publish Time               Views     Country
xYtsL9znopI   18.17.02        2018-02-16T14:00:09.000Z   960453    CA
xYtsL9znopI   18.18.02        2018-02-16T14:00:09.000Z   2109193   CA
xYtsL9znopI   18.19.02        2018-02-16T14:00:09.000Z   2768767   CA
xYtsL9znopI   18.20.02        2018-02-16T14:00:09.000Z   3213410   CA

The video has four trending appearances in CA between February 17, 2018 and February 20, 2018. The view number at its first appearance (2018/02/17) is 960,453; the view number at its second appearance (2018/02/18) is 2,109,193. That is a 119.6% increase between the first and second appearances. In contrast, the increase between the second and third appearances is only about 31.3%.
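The percentage figures can be checked with a short calculation. The sketch below assumes the increase is measured relative to the earlier appearance, which is the reading consistent with the 119.6% figure; the function name percent_increase is hypothetical.

```python
def percent_increase(earlier_views, later_views):
    """Percentage growth from an earlier view count to a later one."""
    return 100.0 * (later_views - earlier_views) / earlier_views


# View counts of video xYtsL9znopI in CA, in trending-date order.
views = [960453, 2109193, 2768767, 3213410]

first_to_second = percent_increase(views[0], views[1])
second_to_third = percent_increase(views[1], views[2])

print(f"first to second: {first_to_second:.1f}%")
print(f"second to third: {second_to_third:.1f}%")
```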
In this workload, you are asked to find out, for each country, all videos that have an increase of at least 1,000% in view number between their first and second trending appearances. The result should be grouped by country and sorted in descending order of percentage increase.

The result would look like:
DE; V1zTJIfGKaA, 19501.0
DE; RIgNyiGttog, 12346.6
...
CA; _I_D_8Z4sJE, 8438.1
CA; -K9ujx8vO_A, 8298.3
...
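The sequence of steps for this workload (group by country and video, order appearances by trending date, compare the first two, filter, sort) can be sketched in plain Python as follows. It illustrates the logic only and is not a MapReduce or Spark implementation. The 1,000% threshold, the yy.dd.mm date layout inferred from the sample rows, the function name big_jumps, and the sample records are all assumptions.

```python
from collections import defaultdict
from datetime import datetime

THRESHOLD = 1000.0  # percent; assumed from the "1,000%" figure in the text


def big_jumps(records, threshold=THRESHOLD):
    """records: iterable of (country, video_id, trending_date, views).

    trending_date is assumed to be yy.dd.mm, as in the sample rows.
    Returns (country, video_id, percent_increase) tuples, grouped by country
    and sorted by descending increase within each country.
    """
    appearances = defaultdict(list)
    for country, vid, date_text, views in records:
        day = datetime.strptime(date_text, "%y.%d.%m")
        appearances[(country, vid)].append((day, views))

    hits = []
    for (country, vid), rows in appearances.items():
        if len(rows) < 2:
            continue  # no second appearance to compare against
        rows.sort()  # chronological order by trending date
        first_views, second_views = rows[0][1], rows[1][1]
        if first_views <= 0:
            continue  # avoid dividing by zero
        increase = 100.0 * (second_views - first_views) / first_views
        if increase >= threshold:
            hits.append((country, vid, increase))

    hits.sort(key=lambda h: (h[0], -h[2]))
    return hits


# Made-up records; video "b" is deliberately listed out of date order.
records = [
    ("CA", "a", "18.17.02", 100), ("CA", "a", "18.18.02", 1200),
    ("CA", "b", "18.18.02", 5000), ("CA", "b", "18.17.02", 100),
    ("CA", "c", "18.17.02", 100), ("CA", "c", "18.18.02", 150),
    ("DE", "d", "18.01.03", 10), ("DE", "d", "18.02.03", 110),
    ("DE", "e", "18.01.03", 10),
]
jumps = big_jumps(records)
for country, vid, increase in jumps:
    print(f"{country}; {vid}, {increase:.1f}")
```

The sketch orders countries alphabetically, which is an arbitrary choice; the sample output in the text lists DE before CA, so the country order itself does not appear to be prescribed.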

4 Coding and Execution Requirement
Your implementation should utilize features provided by the respective framework. In particular, you should parallelize most of the operations. The Hadoop implementation should run in a pseudo-distributed mode. The Spark implementation should run in a standalone cluster or YARN cluster on a single machine.

5 Deliverable

The report should describe the design of both workloads. In particular, you should describe the sequence of operations/actions taken to obtain the final result, and highlight the parts that can be executed in parallel. You can use diagrams to help explain the sequence.

Attachment:- Assignment.rar



There are two deliverables: source code and a brief report (up to 2 pages). Both are due on Thursday 26th at 11:59 (Week 7), extended from Wednesday 18th at 23:59 (Week 6). Please submit the source code and a soft copy of the report as a zip or tar file in Canvas. You need to demo your implementation in week 7 during tutorial time. Please also submit a hard copy of your report together with a signed cover sheet during the demo.
