Big Data for Software Development
Assessment - Map-Reduce Programming Challenge
Learning Outcome 1: Critically assess and implement advanced data pre-processing and analytics strategies in a software development context, focusing on tasks like data cleansing, transformation, and feature selection.
Learning Outcome 2: Design, develop, and evaluate big data solutions using programming models like Map-Reduce and technologies like Hadoop, tailored specifically to address software development needs such as DevOps integration and quality assurance.
Assessment Objective: The objective of this assessment is to assess students' knowledge and practical skills in working with large-scale datasets and leveraging Hadoop ecosystem tools and technologies for data processing and analysis.
Assessment: Map-Reduce Programming Challenge
Assignment Description
Supporting Materials
All supporting materials for this assessment can be found in the Hadoop Files folder in Moodle:
1- A virtual machine has been prepared for you on which Ubuntu and Hadoop have been installed and configured (Hadoop Virtual Machine). All files related to the virtual machine can be found in the zip file Hadoop_VM (VMWare) or Hadoop_VM (VirtualBox). Download the zip file and extract it to your computer's hard drive. Then install VMWare Player on your computer and open the virtual machine file.
2- Virtual Machine Tutorial (Part 1 and 2) is a tutorial video on how to use the virtual machine. It shows step by step how to start Hadoop and run a WordCount example.
3- Hadoop Tutorial.PDF also provides detailed instructions on how to start Hadoop and run the WordCount example.
Instructions
The following file contains user ratings for Amazon products:
Amazon Product Review
(Note: If the link doesn't work, you can download the file from Moodle. It is in the Assessment section.)
Each user has rated at least one product. The data file is in CSV format and contains four columns: User ID, Product ID, Rating, Timestamp. Ratings range from 1 to 5. Timestamps are Unix time, i.e. seconds since 1/1/1970 UTC. For example, a line with User ID A000681618A3WRMCK53V, Product ID B0002Y5WZM, Rating 2 and Timestamp 1383609600 is interpreted as follows: user A000681618A3WRMCK53V has rated product B0002Y5WZM 2/5 at time 1383609600 (Tuesday, Nov 05 2013 11:00:00, Australian Eastern Daylight Time).
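The timestamp interpretation above can be checked with a short script. This is a minimal sketch, not part of the assessment materials: it uses only Python's standard library, and it models Australian Eastern Daylight Time as a fixed UTC+11 offset, which is an assumption that ignores daylight-saving transitions.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP = 1383609600  # value quoted in the example above

# Unix timestamps count seconds since 1970-01-01 00:00:00 UTC.
utc_time = datetime.fromtimestamp(TIMESTAMP, tz=timezone.utc)

# Assumption: AEDT modelled as a fixed UTC+11 offset (no daylight-saving rules).
AEDT = timezone(timedelta(hours=11), "AEDT")
aedt_time = utc_time.astimezone(AEDT)

print(utc_time.strftime("%A, %b %d %Y %H:%M:%S"))   # Tuesday, Nov 05 2013 00:00:00
print(aedt_time.strftime("%A, %b %d %Y %H:%M:%S"))  # Tuesday, Nov 05 2013 11:00:00
```

For this timestamp the calendar date is the same in both time zones, but for other timestamps the UTC date and the local date can differ by a day, so it is worth choosing one time zone and using it consistently when grouping ratings by date.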
Your task is to use MapReduce programming and find the number of ratings for each date. Here is an example of the output:
You may choose your preferred output format, but all required information must be included. 2-3 screenshots of the output file must be submitted as part of your assessment. You may also choose your preferred date format (e.g., 05/11/2013 instead of Nov 05 2013); however, the date must be presented in a human-readable format. Formats such as Unix timestamps are not acceptable.
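The shape of the counting task can be illustrated with a small local simulation. This is only a sketch under several assumptions, not the required solution: it is written in Python rather than as a Hadoop job in the VM, the names `map_line`, `reduce_pairs` and `run_local` are hypothetical, dates are taken in UTC and formatted as dd/mm/yyyy, and the sample rows are invented for illustration (the first reuses the values from the example in the instructions; the exact formatting of the rating column in the real file is not known here).

```python
from collections import Counter
from datetime import datetime, timezone


def map_line(line):
    """Map step: turn one CSV row into a (date, 1) pair, or None if malformed."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        return None
    try:
        timestamp = int(fields[3])
    except ValueError:
        return None  # e.g. a header row or a corrupted timestamp
    # Assumption: dates are grouped by UTC calendar day.
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%d/%m/%Y")
    return day, 1


def reduce_pairs(pairs):
    """Reduce step: sum the counts emitted for each date."""
    totals = Counter()
    for day, count in pairs:
        totals[day] += count
    return dict(totals)


def run_local(lines):
    """Simulate map -> shuffle -> reduce in a single process for quick checks."""
    mapped = (map_line(line) for line in lines)
    return reduce_pairs(pair for pair in mapped if pair is not None)


# Invented sample rows: User ID, Product ID, Rating, Timestamp.
sample = [
    "A000681618A3WRMCK53V,B0002Y5WZM,2,1383609600",
    "HYPOTHETICALUSER1,HYPOTHETICALPROD1,5,1383650000",
    "HYPOTHETICALUSER2,HYPOTHETICALPROD2,4,1383696000",
    "not,a,valid,row",
]

result = run_local(sample)
for day, total in result.items():
    print(f"{day}\t{total}")
```

With these sample rows, two ratings fall on 05/11/2013 and one on 06/11/2013, and the malformed row is skipped. In a real Map-Reduce job the framework performs the grouping between the map and reduce steps, so the reducer only needs to sum the values it receives for each date key.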
Deliverable
You need to submit an MS Word or a PDF file which includes the following items:
The source code for map and reduce function (copied/pasted into the MS Word or PDF file; no separate file is needed).
The output file.
Sufficient screenshots of the steps taken to get the program running.
Screenshots of the output generated by the program. The student's name must also be part of the printed information. Annotate all screenshots with brief descriptions (one or two lines is enough).
In all screenshots, the username, date and time of the VM on eduLAB must be clearly shown (look at the sample below).
A section discussing the potential benefits of your project for Amazon. You need to explain how Amazon can make informed decisions based on the results of your project. This section must be 450 - 550 words.