Implement one executable Hadoop MapReduce job

Reference no: EM132467625

CST4070 Applied Data Analytics - Tools, Practical Big Data Handling, Cloud Distribution - Middlesex University

Assignment - Big Data

You are required to submit your work via the dedicated Unihub assignment link by the specified deadline. This link will 'time out' at the submission deadline. Your work may not be accepted as an email attachment if you miss this deadline. Therefore, you are strongly advised to allow plenty of time to upload your work prior to the deadline.

You are required to solve the tasks illustrated below. Each task should be accompanied by:

A short introduction in which you describe the problem and your high-level solution. Your step-by-step process, supported by screenshots. Each screenshot needs to be accompanied by a short explanatory text.

Finally, if necessary, conclude each task with a brief summary of what you have done.

Your submission needs to be unique

When solving your tasks, you are required to name your files using your first name (e.g., if your name is Alice, you may name your task 1 file as ) so as to make your submission unique. Your explanatory text must also be unique.

Tasks

Follow the lab instructions to install Apache Hadoop into a virtual server running on Linux Ubuntu Server. Once you have Apache Hadoop installed and running, execute the following tasks.

Task 1

Implement one executable Hadoop MapReduce job that counts the total number of words having an even number of characters and the total number of words having an odd number of characters. As an example, if the input text is

Hello world

the output should be

, because both Hello and world contain an odd number of characters. Whereas, if the input is

My name is Alice

the output should be .

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested on Ubuntu Linux before being run on Hadoop MapReduce.
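A minimal sketch of how the Task 1 mapper and reducer might look, offered as an illustration rather than a model answer. It assumes that a word is a whitespace-separated token with surrounding punctuation stripped (the brief does not define "word"), and that the keys are the strings "even" and "odd" (the expected output format is not shown above). The function names `mapper`, `reducer`, and `run_local` are hypothetical. In a Hadoop Streaming job, each function would live in its own script, read lines from `sys.stdin`, and print tab-separated key/value lines; here they operate on plain iterables so the logic can be tested locally first.

```python
import string
from itertools import groupby


def mapper(lines):
    """Emit ('even', 1) or ('odd', 1) for every word, keyed by length parity."""
    for line in lines:
        for token in line.split():
            # Assumption: surrounding punctuation is not part of the word.
            word = token.strip(string.punctuation)
            if word:
                yield ("even" if len(word) % 2 == 0 else "odd", 1)


def reducer(sorted_pairs):
    """Sum the counts per key. Input must be sorted by key, as it is
    after Hadoop's shuffle-and-sort phase."""
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce on a single machine."""
    return dict(reducer(sorted(mapper(lines))))
```

Under these assumptions, `run_local(["My name is Alice"])` returns `{'even': 3, 'odd': 1}`. A parity with no matching words is simply absent from the result; whether it should instead be reported as zero depends on the required output format.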

Task 2

Implement one executable Hadoop MapReduce job that receives as input a .csv table with the structure 'StudentId, Module, Grade' and returns as output the minimum and maximum grade of each student, along with the total number of modules the student has passed.

Therefore, if your input is:

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested on Ubuntu Linux before being run on Hadoop MapReduce.
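One possible shape for the Task 2 job is sketched below. The brief does not state the pass threshold, so `PASS_MARK = 40.0` is purely an assumption, as are the names `mapper`, `reducer`, and `run_local` and the choice to skip any row whose grade is not numeric (which also drops a header row, if present). The mapper keys each grade by student; the reducer then sees all grades of one student together and derives the minimum, maximum, and pass count.

```python
import csv
from itertools import groupby

PASS_MARK = 40.0  # Assumption: the brief does not state the pass threshold.


def mapper(lines):
    """Emit (student_id, grade) for each valid 'StudentId, Module, Grade' row."""
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 3:
            continue  # ignore blank or malformed rows
        student_id, _module, grade = (field.strip() for field in row)
        try:
            yield student_id, float(grade)
        except ValueError:
            continue  # non-numeric grade, e.g. a header row


def reducer(sorted_pairs):
    """Yield (student_id, min_grade, max_grade, modules_passed) per student.
    Input must be sorted by student_id, as after Hadoop's shuffle."""
    for student_id, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        grades = [grade for _, grade in group]
        passed = sum(1 for grade in grades if grade >= PASS_MARK)
        yield student_id, min(grades), max(grades), passed


def run_local(lines):
    """Simulate map -> sort -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines))))
```

Collecting a student's grades in a list keeps the reducer short; for very large groups, tracking a running minimum, maximum, and pass count would avoid holding every grade in memory.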

Task 3

Implement one executable Hadoop MapReduce job that receives as input two .csv tables with the structure:

User: UserId, Name, DOB
Follows: UserIdFollower, UserIdFollowing

The MapReduce job needs to perform the following SQL query:

Therefore, if the two original tables are:

The final table needs to be

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested on Ubuntu Linux before being run on Hadoop MapReduce.
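The SQL query for Task 3 is not reproduced above, so the following is only a generic sketch of a reduce-side join, a common way to combine two tables in MapReduce. It assumes, purely for illustration, an inner join of User with Follows on UserId = UserIdFollower; the actual query may select different columns or join on a different key. It also assumes that rows can be told apart by their field count (three for User, two for Follows) and that the inputs carry no header rows. The names `mapper`, `reducer`, and `run_local` and the tags "U" and "F" are hypothetical.

```python
import csv
from itertools import groupby


def mapper(lines):
    """Tag each row with its source table and key it by the join column."""
    for row in csv.reader(lines, skipinitialspace=True):
        fields = [field.strip() for field in row]
        if len(fields) == 3:      # User: UserId, Name, DOB
            user_id, name, dob = fields
            yield user_id, "U", (name, dob)
        elif len(fields) == 2:    # Follows: UserIdFollower, UserIdFollowing
            follower, following = fields
            yield follower, "F", (following,)


def reducer(sorted_records):
    """For each key, pair every User row with every Follows row (inner join).
    Input must be sorted by key, as after Hadoop's shuffle."""
    for user_id, group in groupby(sorted_records, key=lambda rec: rec[0]):
        users, follows = [], []
        for _, tag, payload in group:
            (users if tag == "U" else follows).append(payload)
        for name, dob in users:
            for (following,) in follows:
                yield user_id, name, dob, following


def run_local(lines):
    """Simulate map -> sort -> reduce on a single machine."""
    return list(reducer(sorted(mapper(lines))))
```

Tagging each record with its table of origin is what lets a single reducer rebuild the join: every row sharing a key arrives together, and the tag says which side of the join it belongs to.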

Attachment:- Applied Data Analytics.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd