Describe the problem and your high level solution

Reference no: EM132473644

CST4070 Applied Data Analytics Tools, Practical Big Data Handling, Cloud Distribution - Middlesex University

Big Data

General information - You are required to submit your work via the dedicated Unihub assignment link by the specified deadline. This link will time out at the submission deadline. Your work may not be accepted as an email attachment if you miss this deadline. You are therefore strongly advised to allow plenty of time to upload your work before the deadline.

You are required to solve the Tasks illustrated below. Each Task should be accompanied by:

a. A short introduction where you describe the problem and your high-level solution.

b. Your step-by-step process supported by screenshots. Each screenshot needs to be accompanied by a short explanatory text.

c. Finally, if necessary, conclude each task with a brief summary of what you have done.

Tasks - Follow the lab instructions to install Apache Hadoop on a virtual server running Linux Ubuntu Server. Once you have Apache Hadoop installed and running, complete the following tasks.

Task 1 - Implement one executable Hadoop MapReduce job that counts the total number of words having an even number of characters and the total number having an odd number of characters. For example, if the input text is "Hello world", the output should be even: 0, odd: 2, because both "Hello" and "world" contain an odd number of characters. If the input is "My name is Alice", the output should be even: 3, odd: 1.

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested in Linux Ubuntu before running them on Hadoop MapReduce.
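The mapper and reducer logic for Task 1 could be sketched as below. This is a minimal illustration, not the expected solution: the function names `map_lines`, `reduce_lines` and `run_local` are hypothetical, a "word" is assumed to be any whitespace-separated token (punctuation included), and a single reducer is assumed so that both totals end up in one output line. `run_local` imitates Hadoop's shuffle step with a plain sort so the logic can be checked on Ubuntu before moving to the cluster.

```python
def map_lines(lines):
    # Mapper step: emit "even<TAB>1" or "odd<TAB>1" for every word.
    # Assumption: words are whitespace-separated tokens.
    for line in lines:
        for word in line.split():
            key = "even" if len(word) % 2 == 0 else "odd"
            yield f"{key}\t1"


def reduce_lines(mapped_lines):
    # Reducer step: sum the counts emitted for each key.
    # Both keys start at 0 so a missing category is still reported.
    counts = {"even": 0, "odd": 0}
    for line in mapped_lines:
        key, value = line.rstrip("\n").split("\t")
        counts[key] += int(value)
    return f"even:{counts['even']}, odd:{counts['odd']}"


def run_local(lines):
    # Local stand-in for map -> shuffle/sort -> reduce.
    return reduce_lines(sorted(map_lines(lines)))


print(run_local(["Hello world"]))
print(run_local(["My name is Alice"]))
```

In a streaming setup, each function would typically live in its own script that iterates over `sys.stdin` and prints its results, one record per line.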

Task 2 - Implement one executable Hadoop MapReduce job that receives as input a .csv table with the structure 'StudentId, Module, Grade' and returns as output the minimum and maximum grade of each student, along with the total number of modules the student has passed.

Therefore, if your input is:

StudentId   Module             Grade
S001        Statistic          75
S002        Statistic          72
S001        Big Data           78
S003        Big Data           66
S001        Programming        70
S002        Programming        55
S001        Machine Learning   65
S002        Machine Learning   61

Your output needs to be:

StudentId   MinGrade   MaxGrade   Modules
S001        65         78         4
S002        55         72         3
S003        66         66         1

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested in Linux Ubuntu before running them on Hadoop MapReduce.
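One possible shape for the Task 2 logic is sketched below. The names `map_rows`, `reduce_rows`, `run_local` and the `pass_mark` parameter are hypothetical. The brief does not state a pass mark, so the default of 0 simply counts every module, which matches the sample output; grades are assumed to be whole numbers, and the header row is skipped because its grade field is not numeric.

```python
import csv
from itertools import groupby


def map_rows(lines):
    # Mapper step: emit "StudentId<TAB>Grade" for every data row.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        student, _module, grade = (field.strip() for field in row)
        if not grade.isdigit():
            continue  # skips the header row (assumes integer grades)
        yield f"{student}\t{grade}"


def reduce_rows(sorted_lines, pass_mark=0):
    # Reducer step: input must be sorted by key, as Hadoop delivers it.
    # pass_mark is an assumption; the brief does not define "passed".
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for student, group in groupby(pairs, key=lambda pair: pair[0]):
        grades = [int(grade) for _student, grade in group]
        passed = sum(1 for grade in grades if grade >= pass_mark)
        yield f"{student},{min(grades)},{max(grades)},{passed}"


def run_local(lines, pass_mark=0):
    # Local stand-in for map -> shuffle/sort -> reduce.
    return list(reduce_rows(sorted(map_rows(lines)), pass_mark))


sample = [
    "StudentId, Module, Grade",
    "S001, Statistic, 75",
    "S002, Statistic, 72",
    "S001, Big Data, 78",
    "S003, Big Data, 66",
    "S001, Programming, 70",
    "S002, Programming, 55",
    "S001, Machine Learning, 65",
    "S002, Machine Learning, 61",
]
for line in run_local(sample):
    print(line)
```

Because the reducer only keeps one student's grades in memory at a time, it relies on the records for each student arriving together, which is what the sort between the map and reduce phases provides.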

Task 3 - Implement one executable Hadoop MapReduce job that receives as input two .csv tables with the following structure:

User: UserId, Name, DOB

Follows: UserIdFollower, UserIdFollowing

The MapReduce job needs to perform the following SQL query:

select U.UserId, U.Name as NameFollower, F.Name as NameFollowing
from User as U
join Follows as F on U.UserId = F.UserId
where F.DOB <= '2002-03-01'

Therefore, if the two original tables are:

UserId   Name    DOB
U001     Alice   2005-01-05
U002     Tom     2001-02-07
U003     John    1998-06-02
U004     Alex    2006-02-01

UserIdFollower   UserIdFollowing
U001             U002
U001             U003
U002             U001
U002             U004
U003             U001
U004             U001

The final table needs to be:

UserId   NameFollower   NameFollowing
U001     Alice          Tom
U001     Alice          John

The job needs to be executed by a mapper and a reducer. Both the mapper and the reducer need to be written in Python and tested in Linux Ubuntu before running them on Hadoop MapReduce.
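A rough sketch of one way to approach Task 3 follows. It rests on several assumptions that are not in the brief. First, judging from the expected output, the query is read as: look up the follower's name via UserIdFollower, look up the followed user's name and DOB via UserIdFollowing, and keep rows where the followed user's DOB is on or before 2002-03-01. Second, User rows and Follows rows are told apart by their field count (3 versus 2). Third, a single reducer is used and the User table fits in memory, so this is a reduce-side join with an in-memory lookup rather than a fully distributed join. The names `map_records`, `reduce_records`, `run_local` and `cutoff` are hypothetical.

```python
import csv


def map_records(lines):
    # Mapper step: tag each row so User rows ("0") sort before Follows rows ("1").
    for row in csv.reader(lines, skipinitialspace=True):
        fields = [field.strip() for field in row]
        if not fields or fields[0] in ("UserId", "UserIdFollower"):
            continue  # blank line or header row
        if len(fields) == 3:      # assumed User row: UserId, Name, DOB
            yield "0\t" + "\t".join(fields)
        elif len(fields) == 2:    # assumed Follows row: follower, following
            yield "1\t" + "\t".join(fields)


def reduce_records(sorted_lines, cutoff="2002-03-01"):
    # Reducer step (single reducer assumed): load users, then join each edge.
    users = {}
    for line in sorted_lines:
        tag, *fields = line.rstrip("\n").split("\t")
        if tag == "0":
            user_id, name, dob = fields
            users[user_id] = (name, dob)
        else:
            follower_id, following_id = fields
            follower = users.get(follower_id)
            following = users.get(following_id)
            # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
            if follower and following and following[1] <= cutoff:
                yield f"{follower_id},{follower[0]},{following[0]}"


def run_local(lines):
    # Local stand-in for map -> shuffle/sort -> reduce.
    return list(reduce_records(sorted(map_records(lines))))


user_rows = [
    "UserId, Name, DOB",
    "U001, Alice, 2005-01-05",
    "U002, Tom, 2001-02-07",
    "U003, John, 1998-06-02",
    "U004, Alex, 2006-02-01",
]
follow_rows = [
    "UserIdFollower, UserIdFollowing",
    "U001, U002", "U001, U003", "U002, U001",
    "U002, U004", "U003, U001", "U004, U001",
]
for line in run_local(user_rows + follow_rows):
    print(line)
```

The tag prefix matters: because "0" sorts before "1", every user record reaches the reducer before the first follow record, so the lookup table is complete by the time the join starts.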

