Big Data for Software Development
Learning Outcome 1: Critically assess and implement advanced data pre-processing and analytics strategies in a software development context, focusing on tasks such as data cleansing, transformation, and feature selection.
Learning Outcome 2: Design, develop, and evaluate big data solutions using programming models such as Map-Reduce and technologies tailored to software development needs such as DevOps integration and quality assurance.
Assessment Objective - The objective of this assessment is to assess students' knowledge and practical skills in working with large-scale datasets and in using Hadoop ecosystem tools and technologies for data processing and analysis.
Assessment: Map-Reduce Programming Challenge
Assignment Description
Instructions
The dataset for this assignment must be downloaded from Moodle.
Each user has rated at least one product. Unzip the dataset file and use the u.data file, a text file with four columns: User ID, Item ID, Rating, Timestamp. Ratings range from 1 to 5. Timestamps are Unix time, that is, seconds since 1/1/1970 UTC. For example, the following line of the file
196 242 3 881250949
is interpreted as follows: User 196 gave item 242 a rating of 3 at time 881250949 (Fri Dec 05 1997 02:55:49 GMT+1100, Australian Eastern Daylight Time).
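The interpretation above can be checked with a short Python sketch. It is illustrative only and not part of the required deliverable; the helper name `parse_rating_line` is hypothetical, and the fixed +11:00 offset is an assumption that stands in for Australian Eastern Daylight Time.

```python
from datetime import datetime, timedelta, timezone


def parse_rating_line(line):
    """Split one whitespace-separated u.data record into typed fields."""
    user_id, item_id, rating, timestamp = line.split()
    return int(user_id), int(item_id), int(rating), int(timestamp)


# The example record quoted in the instructions.
user_id, item_id, rating, timestamp = parse_rating_line("196 242 3 881250949")

# Unix seconds are defined relative to 1970-01-01 00:00:00 UTC.
utc_time = datetime.fromtimestamp(timestamp, tz=timezone.utc)

# Assumption: a fixed UTC+11:00 offset represents AEDT for this date.
aedt = timezone(timedelta(hours=11))
local_time = utc_time.astimezone(aedt)

print(user_id, item_id, rating)
print(utc_time.isoformat())
print(local_time.strftime("%a %b %d %Y %H:%M:%S"))
```

Calling `split()` with no arguments accepts both tab- and space-separated columns, so the sketch does not depend on which delimiter the file uses.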
Your task is to use MapReduce programming to find the average rating for each item. Here is an example of the output:
Item Average Ratings
195 3.45
219 4.24
You may choose your preferred output format, but all required information must be included. Two or three screenshots of the output file must be submitted as part of your assessment.
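One way to picture the task is as a map step that emits an (item, rating) pair per record and a reduce step that averages the ratings grouped under each item. The Python sketch below simulates that flow locally, without Hadoop. It is a minimal illustration under stated assumptions, not a reference solution: the function names `map_rating`, `reduce_ratings`, and `run_job` are hypothetical, and every sample record except the first is invented for the example.

```python
from collections import defaultdict


def map_rating(line):
    """Map step: emit one (item_id, rating) pair for a u.data record."""
    _user_id, item_id, rating, _timestamp = line.split()
    yield int(item_id), int(rating)


def reduce_ratings(item_id, ratings):
    """Reduce step: average all ratings collected for one item."""
    ratings = list(ratings)
    return item_id, sum(ratings) / len(ratings)


def run_job(lines):
    """Local stand-in for the shuffle phase: group mapper output by key,
    then call the reducer once per item, in ascending item order."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for item_id, rating in map_rating(line):
            grouped[item_id].append(rating)
    return [reduce_ratings(item_id, grouped[item_id]) for item_id in sorted(grouped)]


# First record is the example from the instructions; the rest are made up.
sample = [
    "196 242 3 881250949",
    "7 242 5 881250950",
    "8 302 3 881250951",
]

print("Item\tAverage Rating")
for item_id, average in run_job(sample):
    print(f"{item_id}\t{average:.2f}")
```

If a combiner is added in a real job, it should pass partial (sum, count) pairs to the reducer rather than partial averages, because an average of averages is not generally equal to the overall average.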
Deliverable
You need to submit an MS Word or a PDF file which includes the following items:
The CIHE Assessment Cover Sheet must be completed and attached to the report.
The source code for the map and reduce functions (copied and pasted into the MS Word or PDF file; no separate file is needed).
A section for discussing the potential benefits of your project for Amazon. You need to explain how Amazon can make informed decisions based on the results of your project.
This section must be 450 - 550 words.