Implement a MapReduce application to perform

Reference no: EM133265450

Big Data Analytics

Each group must prepare a 10-15 minute presentation to give to the class during the last couple of weeks of classes. Each group must pick a topic from one of the three categories below.

Topics - Categories:

Category 1: Implement a MapReduce application to perform one of the following:
a) Matrix Multiplication.
b) Relational algebra Selection and Projection (set-based 'no duplicates' version, and bag-based 'with duplicates' version)
c) Relational algebra Union, Intersection, and Difference (set- & bag-based if applicable)
d) Relational algebra Natural Join operation
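As a rough illustration of item (b), the sketch below simulates the map, shuffle, and reduce steps in memory with plain Java instead of the Hadoop API, so it runs without a cluster. It assumes comma-separated tuples and a hypothetical class name `SelectProject`; the `bag` flag switches between the set-based and bag-based versions. The only difference between the two versions is in the reducer: the set version writes each distinct key once, while the bag version writes it once per value it receives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// In-memory simulation of MapReduce selection and projection (not Hadoop code).
// Tuples are comma-separated strings such as "1,x,10".
class SelectProject {

    // Shuffle: group every emitted (key, value) pair by key, kept sorted by key.
    static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] kv : pairs) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    // Reduce: the set version writes each distinct key once, the bag version once per value.
    static List<String> reduce(Map<String, List<String>> groups, boolean bag) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            int copies = bag ? e.getValue().size() : 1;
            for (int i = 0; i < copies; i++) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Selection: the mapper emits (t, t) only when tuple t satisfies the condition.
    static List<String> select(List<String> tuples, Predicate<String[]> cond, boolean bag) {
        List<String[]> emitted = new ArrayList<>();
        for (String t : tuples) {
            if (cond.test(t.split(","))) {
                emitted.add(new String[] {t, t});
            }
        }
        return reduce(shuffle(emitted), bag);
    }

    // Projection: the mapper emits (t', t'), where t' keeps only the chosen columns.
    static List<String> project(List<String> tuples, int[] columns, boolean bag) {
        List<String[]> emitted = new ArrayList<>();
        for (String t : tuples) {
            String[] fields = t.split(",");
            String[] kept = new String[columns.length];
            for (int i = 0; i < columns.length; i++) {
                kept[i] = fields[columns[i]];
            }
            String p = String.join(",", kept);
            emitted.add(new String[] {p, p});
        }
        return reduce(shuffle(emitted), bag);
    }

    public static void main(String[] args) {
        List<String> r = List.of("1,x,10", "2,y,20", "3,x,10", "3,x,10");
        System.out.println(project(r, new int[] {1, 2}, false)); // [x,10, y,20]
        System.out.println(project(r, new int[] {1, 2}, true));  // [x,10, x,10, x,10, y,20]
        System.out.println(select(r, f -> f[1].equals("x"), false)); // [1,x,10, 3,x,10]
        System.out.println(select(r, f -> f[1].equals("x"), true));  // [1,x,10, 3,x,10, 3,x,10]
    }
}
```

In a real Hadoop job the mapper and reducer would be separate classes and the framework would perform the shuffle; the grouping logic stays the same.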

Notes:
1. Please refer to Chapter 2 of Mining of Massive Datasets, which can be freely accessed on the book's website. It outlines the processing that the mappers and reducers must perform for each of the operations above.
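Following the same map/shuffle/reduce pattern, a sketch for item (c) can tag each tuple with the name of the relation it came from and let the reducer count the tags. The class `SetOps` and the relation names `R` and `S` are hypothetical. The set rules output each qualifying tuple once; the bag rules used here (union adds the multiplicities, intersection takes the minimum, difference subtracts and floors at zero) are the standard bag semantics, assumed rather than taken from the chapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation (not Hadoop code) of union, intersection and difference.
// Mapper: for each tuple t emit (t, "R") or (t, "S") depending on its relation.
// Reducer: count the tags for each t and decide how many copies of t to output.
class SetOps {

    enum Op { UNION, INTERSECTION, DIFFERENCE }

    // Number of copies of a tuple to output, given how often it appeared in R and S.
    static int copies(Op op, boolean bag, int nR, int nS) {
        switch (op) {
            case UNION:
                return bag ? nR + nS : 1;
            case INTERSECTION:
                return bag ? Math.min(nR, nS) : (nR > 0 && nS > 0 ? 1 : 0);
            default: // DIFFERENCE, computed as R - S
                return bag ? Math.max(0, nR - nS) : (nR > 0 && nS == 0 ? 1 : 0);
        }
    }

    static List<String> run(List<String> r, List<String> s, Op op, boolean bag) {
        // Map phase: tag every tuple with the relation it came from.
        List<String[]> emitted = new ArrayList<>();
        for (String t : r) emitted.add(new String[] {t, "R"});
        for (String t : s) emitted.add(new String[] {t, "S"});

        // Shuffle: group the tags by tuple.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] kv : emitted) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }

        // Reduce phase: count the tags and emit the required number of copies.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            int nR = 0, nS = 0;
            for (String tag : e.getValue()) {
                if (tag.equals("R")) nR++; else nS++;
            }
            int n = copies(op, bag, nR, nS);
            for (int i = 0; i < n; i++) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> r = List.of("a", "a", "b");
        List<String> s = List.of("a", "c");
        for (Op op : Op.values()) {
            System.out.println(op + " set=" + run(r, s, op, false) + " bag=" + run(r, s, op, true));
        }
    }
}
```

With R = {a, a, b} and S = {a, c}, the two versions of R - S differ: the set version returns only b, while the bag version also keeps one copy of a.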

2. Don't make any assumptions about the number of input files or their filenames. The entries from both matrices could appear in any order in the file(s). Of course, this requires storing additional information in the data files, such as the matrix name and the indices of each entry, in addition to the entry values. For example:
Let A and B be the two matrices given below; we would like to compute their product C = AB. A is a 2x2 matrix, while B is a 2x1 matrix (a vector).

      Matrix A          Matrix B
          0    1               0
     0   25    9          0   44
     1   31   17          1   13
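To confirm that a sample run produces the correct output, the expected product of these sample matrices can be worked out by hand from c_ik = sum over j of a_ij * b_jk:

```latex
C = AB =
\begin{bmatrix} 25 & 9 \\ 31 & 17 \end{bmatrix}
\begin{bmatrix} 44 \\ 13 \end{bmatrix}
=
\begin{bmatrix} 25 \cdot 44 + 9 \cdot 13 \\ 31 \cdot 44 + 17 \cdot 13 \end{bmatrix}
=
\begin{bmatrix} 1100 + 117 \\ 1364 + 221 \end{bmatrix}
=
\begin{bmatrix} 1217 \\ 1585 \end{bmatrix}
```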

The entries from both matrices can be stored in one or more files, and the data files can list entries from either matrix in any order. Each line represents one entry from either matrix. For example, the input data files could contain:
A, 0, 0, 25
B, 1, 0, 13
A, 1, 1, 17
A, 0, 1, 9
A, 1, 0, 31
B, 0, 0, 44
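A minimal in-memory sketch of the one-pass matrix multiplication from Chapter 2, reading lines in exactly the format above. It is not Hadoop code: the map and reduce phases are plain Java loops, and the parameters `rowsA` and `colsB` stand in for the shapes that note 4 says should be passed through the job configuration. The class name `MatrixMultiply` and the assumption that the matrices are named `A` and `B` are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation (not Hadoop code) of one-pass MapReduce matrix multiplication C = AB.
// Input lines look like "A, 0, 0, 25": matrix name, row index, column index, value.
// rowsA and colsB are assumed to arrive via the job configuration in a real job.
class MatrixMultiply {

    static Map<String, Long> multiply(List<String> lines, int rowsA, int colsB) {
        // Map phase: key "i,k" -> values {tag (0 = A, 1 = B), j, value}.
        Map<String, List<long[]>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            String name = f[0].trim();
            int row = Integer.parseInt(f[1].trim());
            int col = Integer.parseInt(f[2].trim());
            long value = Long.parseLong(f[3].trim());
            if (name.equals("A")) {        // a_ij is needed by c_ik for every k
                for (int k = 0; k < colsB; k++) emit(groups, row + "," + k, new long[] {0, col, value});
            } else if (name.equals("B")) { // b_jk is needed by c_ik for every i
                for (int i = 0; i < rowsA; i++) emit(groups, i + "," + col, new long[] {1, row, value});
            }
        }
        // Reduce phase: for each cell (i,k), pair a_ij with b_jk on j and sum the products.
        Map<String, Long> c = new TreeMap<>();
        for (Map.Entry<String, List<long[]>> e : groups.entrySet()) {
            Map<Long, Long> aByJ = new HashMap<>();
            Map<Long, Long> bByJ = new HashMap<>();
            for (long[] v : e.getValue()) (v[0] == 0 ? aByJ : bByJ).put(v[1], v[2]);
            long sum = 0;
            for (Map.Entry<Long, Long> a : aByJ.entrySet()) {
                Long b = bByJ.get(a.getKey());
                if (b != null) sum += a.getValue() * b; // a missing entry counts as 0
            }
            c.put(e.getKey(), sum);
        }
        return c;
    }

    static void emit(Map<String, List<long[]>> groups, String key, long[] value) {
        groups.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "A, 0, 0, 25", "B, 1, 0, 13", "A, 1, 1, 17",
            "A, 0, 1, 9", "A, 1, 0, 31", "B, 0, 0, 44");
        System.out.println(multiply(input, 2, 1)); // {0,0=1217, 1,0=1585}
    }
}
```

Each entry of A is replicated to every output cell in its row (one copy per column of B), and each entry of B to every cell in its column (one copy per row of A). This is why the mapper needs the matrix shapes even though the reducer does all the arithmetic, and why the order of the input lines does not matter.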

3. The note above also holds for relations in relational algebra (tables in SQL). For example, for a natural join, the rows/tuples of the operand tables can appear in the same file or in different files, spread across one, two, or more files. Of course, this requires the table name to be stored in the data files.
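For the natural join in item (d), the table-name prefix described above is what lets the mapper tell the two relations apart. The sketch below, again a plain-Java simulation rather than Hadoop code, joins R(A, B) with S(B, C) on their single shared attribute. The line format `R, a1, b1`, the table names, the schemas, and the class name `NaturalJoin` are assumptions for illustration; a join on several shared attributes would need a composite key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation (not Hadoop code) of the natural join R(A, B) JOIN S(B, C).
// Every input line starts with its table name, e.g. "R, a1, b1" or "S, b1, c1".
// schemaR, schemaS and joinAttr stand in for the schemas passed via the job configuration.
class NaturalJoin {

    static List<String> join(List<String> lines, String[] schemaR, String[] schemaS, String joinAttr) {
        int jr = Arrays.asList(schemaR).indexOf(joinAttr); // position of the join attribute in R
        int js = Arrays.asList(schemaS).indexOf(joinAttr); // position of the join attribute in S

        // Map phase: key = value of the shared attribute, value = the tagged line.
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            for (int i = 0; i < f.length; i++) f[i] = f[i].trim();
            int keyCol = f[0].equals("R") ? jr : f[0].equals("S") ? js : -1;
            if (keyCol < 0) continue; // lines from any other table are ignored
            groups.computeIfAbsent(f[1 + keyCol], k -> new ArrayList<>()).add(f);
        }

        // Reduce phase: pair every R tuple with every S tuple that shares the key.
        List<String> out = new ArrayList<>();
        for (List<String[]> values : groups.values()) {
            for (String[] r : values) {
                if (!r[0].equals("R")) continue;
                for (String[] s : values) {
                    if (!s[0].equals("S")) continue;
                    List<String> row = new ArrayList<>(Arrays.asList(r).subList(1, r.length));
                    for (int i = 1; i < s.length; i++) {
                        if (i - 1 != js) row.add(s[i]); // drop S's copy of the join attribute
                    }
                    out.add(String.join(",", row));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("S, b1, c1", "R, a1, b1", "R, a2, b2", "S, b1, c2", "R, a3, b1");
        System.out.println(join(lines, new String[] {"A", "B"}, new String[] {"B", "C"}, "B"));
        // [a1,b1,c1, a1,b1,c2, a3,b1,c1, a3,b1,c2]
    }
}
```

Each reduce call receives all R and S tuples that share one value of B and emits their cross product; a key with tuples from only one table, such as b2 here, produces no output.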
4. The shape of the input matrices and the schema of the input relations (tables) must be passed to the mappers and reducers as additional input when the job is submitted. Please consider using the job object to pass this additional input to the mappers and reducers, using:
job.getConfiguration().set(name, value) // in the driver code, before the job is submitted
context.getConfiguration().get(name) // in the setup() method of the mapper and reducer
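The sketch below shows the hand-off in note 4 in a form that runs without Hadoop: a `Map<String, String>` stands in for Hadoop's `Configuration`, whose `set` and `get` also store plain strings. The class `ShapeConfig` and the key names such as `matrix.A.rows` are hypothetical; any names agreed between the driver and the tasks would work.

```java
import java.util.HashMap;
import java.util.Map;

// Runnable stand-in for passing matrix shapes through the job configuration.
// In Hadoop, conf.put(...) corresponds to job.getConfiguration().set(key, value)
// and conf.get(...) to context.getConfiguration().get(key) inside setup().
class ShapeConfig {

    // Driver side: record the shape of one matrix under hypothetical key names.
    static void storeShape(Map<String, String> conf, String matrix, int rows, int cols) {
        conf.put("matrix." + matrix + ".rows", Integer.toString(rows));
        conf.put("matrix." + matrix + ".cols", Integer.toString(cols));
    }

    // Mapper/reducer setup() side: read the shape back, failing loudly if it is missing.
    static int[] loadShape(Map<String, String> conf, String matrix) {
        String rows = conf.get("matrix." + matrix + ".rows");
        String cols = conf.get("matrix." + matrix + ".cols");
        if (rows == null || cols == null) {
            throw new IllegalStateException("shape of " + matrix + " was not passed to the job");
        }
        return new int[] {Integer.parseInt(rows), Integer.parseInt(cols)};
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        storeShape(conf, "A", 2, 2);
        storeShape(conf, "B", 2, 1);
        int[] b = loadShape(conf, "B");
        System.out.println(b[0] + "x" + b[1]); // 2x1
    }
}
```

In a real Hadoop driver, values should be set on `job.getConfiguration()` (or on the `Configuration` before `Job.getInstance(conf)` is called), because the `Job` works on its own copy of the configuration. `Configuration` also provides `setInt` and `getInt(name, defaultValue)`, which avoid the manual parsing.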

Category 2: Hadoop ecosystem: Kafka, Flume, HBase, Storm, etc.
Category 3: Other big data solutions: Snowflake, Elasticsearch, Amazon Redshift, etc.

Instructions:

1. Each group must pick a project from one of the three categories above: a MapReduce application, a tool from the Hadoop ecosystem, or a big data platform.

2. If you choose MapReduce:
(a) You also have to submit the source code files.
(b) In your talk, you have to cover the code and how it works, and do a sample run in front of the class. Please prepare the necessary input data files to test your code and confirm that it generates the correct output.

3. If you choose a tool/framework from the Hadoop ecosystem, you have to cover:
(a) The main components/daemons of the tool, and what exactly needs to be running to use it.
(b) What is it used for? What kind of processing does it do? Are there alternative tools that serve the same purpose?
(c) A practical simple example on how we use the tool: code, scripting language, commands, etc.
(d) Your presentation should be enough for anyone to know the basics of the tool and start using it for simple processing.

4. If you choose a big data platform:
(a) Same as above: a, b, c, and d.

5. Scoring:
- 50 points: overall quality of the slides, presentation, and talk.
- 50 points: code and demo, system components (if applicable), daemons, MapReduce code and how it works, etc.

6. At least two group members should give the presentation, using the same laptop/machine. Ideally, each talk will fit within 10 to 15 minutes, to leave time for all of the groups.

7. Expected time: each group should have the code (if any) and slides ready within 2 to 3 weeks.

8. Final note: please don't worry much about your score and focus on exploring and learning something new. It should be an exciting experience for the whole class including myself.

Course: DSA 5620 Big Data Analytics, University of Central Missouri