Reference no: EM133913642
Big Data Architecture and Application
Assessable Item:
One (1) report containing the complete flowchart of your design for preparing the data, the tools used in every step, and your answers to the given questions.
One (1) document containing all the code for your assignment.
Purpose of Assignment
This assignment tests whether students can use MapReduce to solve real-world problems and achieve a specific goal. The solution should be reasonable, practical, and manageable.
MapReduce enables relatively fast and easy processing of very large datasets on a cluster of commodity machines. In this assignment, students will become more familiar with, and gain practical experience of, the MapReduce programming model on top of the Hadoop software platform. Students must understand how to design, set up, and execute MapReduce tasks over the given dataset on a single node. Students are required to run Hadoop on the virtual machine and complete the given tasks, which involve implementing their own MapReduce job and analysing its output. Comments must be included alongside the code to improve readability. The report must explain your design and how you obtained the answer to each question.
Assignment Goal
This assignment aims to train students to analyse the problems they encounter and find the most suitable way(s) of accomplishing the given tasks in a real-world big data processing environment. Students will need to carry out some independent research and discover parts of the solution themselves, building skills and knowledge through the project.
Create a Java project named Assignment that produces a working Hadoop job, which you will use to answer the questions below. Following the template, explain how you designed your solution, the challenges you encountered, how you solved them, and how you found the answer to each question.
Your code must meet the following criteria and be usable to find the answers to the questions:
Part 1:
List all commands used in this assignment, in order, together with readable execution screenshots showing your name, your student ID, and the VM date and time, printed from the driver code in the format "Program Executed at: yyyy-MM-dd HH:mm:ss". Treat upper-case and lower-case forms of a word as different words. Answer the questions based on your retrieved results. Note: it is quite common for the VM's time to differ from real-world time; there is no need to adjust the VM system time to match reality.
Correct implementation of the Mapper class(es), with readable, well-structured code and comments.
Correct implementation of the Reducer class(es), with readable, well-structured code and comments.
Correct implementation of the Driver class(es), with readable, well-structured code and comments.
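The Part 1 criteria above can be illustrated with a minimal plain-Java simulation of the map, shuffle, and reduce phases, plus the timestamp line the driver must print. This is a sketch, not Hadoop code, and it rests on an assumption: the brief never names the Part 1 task, but the instruction to treat upper- and lower-case words independently suggests a case-sensitive word count. The class `WordCountSketch` and its methods are hypothetical names, and splitting on whitespace (which leaves punctuation attached to words) is also an assumption you may need to change for the given dataset.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the Part 1 data flow (map -> shuffle -> reduce); NOT Hadoop code.
// Assumption: the Part 1 task is a case-sensitive word count.
final class WordCountSketch {
    // Timestamp pattern taken from the brief (24-hour clock).
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private WordCountSketch() {}

    // "Driver" helper: builds the line the brief asks the driver to print.
    static String stampFor(LocalDateTime time) {
        return "Program Executed at: " + time.format(STAMP);
    }

    // "Mapper": emit (word, 1) for each whitespace-separated token.
    // No toLowerCase() call, so "Data" and "data" remain different keys.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // "Shuffle": group values by key; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // "Reducer": sum the 1s emitted for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases over all input lines and returns word -> count.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : shuffle(pairs).entrySet()) {
            result.put(g.getKey(), reduce(g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Uses the VM clock, which the brief says need not match real-world time.
        System.out.println(stampFor(LocalDateTime.now()));
        System.out.println(run(Arrays.asList("Data data DATA", "data  big")));
    }
}
```

Running `main` prints the stamp followed by `{DATA=1, Data=1, big=1, data=2}`. In an actual Hadoop job, the same logic would typically live in a `Mapper<LongWritable, Text, Text, IntWritable>` subclass that calls `context.write(...)`, a `Reducer<Text, IntWritable, Text, IntWritable>` that sums the values for each key, and a driver `main` method that configures the `Job` and prints the timestamp line.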
Part 2:
Create a Java project named Assignment1_2 that uses the output of Assignment1 as its input. Write MapReduce code to count how many times each number appears. List all commands used in this assignment, in order, together with readable execution screenshots showing your name, your student ID, and the VM date and time, printed from your code in the format "Program Executed at: yyyy-MM-dd HH:mm:ss". The corresponding code must also be included in the submission.
Explain your chain of thought in solving the given tasks, and include a flowchart in the report that shows it step by step.
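The Part 2 job, which counts how many times each number appears in the Assignment1 output, can be sketched the same way in plain Java (again, not Hadoop code). Two assumptions are not stated in the brief: that Assignment1 writes lines of the form `word<TAB>count` (the default layout of Hadoop's `TextOutputFormat`), and that "how many times a number appears" means how many words share each count. The class `CountFrequencySketch` is hypothetical, and the shuffle and reduce steps are collapsed into a single `Map.merge` call for brevity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the Part 2 job; NOT Hadoop code.
// Assumes each input line is "word<TAB>count", as written by Part 1.
final class CountFrequencySketch {
    private CountFrequencySketch() {}

    // "Mapper": take the number after the last tab as the key (value 1 is implied).
    // Returns null for malformed lines so they are skipped rather than crashing the job.
    static Integer mapLine(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            return null;
        }
        try {
            return Integer.valueOf(line.substring(tab + 1).trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // "Shuffle + Reducer": group by number and sum the implied 1s.
    // TreeMap<Integer, ...> sorts keys numerically, like IntWritable keys in Hadoop.
    static Map<Integer, Integer> run(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Integer number = mapLine(line);
            if (number != null) {
                counts.merge(number, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines shaped like the assumed Part 1 output.
        List<String> part1Output = Arrays.asList("DATA\t1", "Data\t1", "big\t1", "data\t2");
        System.out.println(run(part1Output));
    }
}
```

Running `main` prints `{1=3, 2=1}`: three words appear once and one word appears twice. One design point worth noting in the flowchart: emitting the number as an `IntWritable` key keeps the final output in numeric order, whereas emitting it as `Text` sorts it as a string, so `10` would come before `2`.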