Write a report to explain design and implementation

Reference no: EM132370935

Data Science Practice Assignment

Assignment Task - This assignment consists of two deliverables, being:

One code implementation - The code file in Jupyter Notebook format and the relevant data set files.
A report - The report must be uploaded as a separate file.

Part I - PySpark source code

Important Note: For code reproduction, your code must be self-contained. That is, it should not require any libraries besides the PySpark environment we have used in the workshops. The data files must be packaged properly with your code file.

In this component, we need to utilise Python 3 and PySpark to complete the following data analysis tasks:

1. Exploratory data analysis

2. Recommendation engine

3. Classification

4. Clustering

You need to choose a dataset from Kaggle to complete these tasks. Remember to include the data set file in your source code submission.

Note: In your notebook, please use Heading 1 Markdown cell to separate each sub task.

Task I.1: Exploratory data analysis

This subtask requires you to explore your dataset by

reporting its number of rows and columns,
cleaning the data (missing values or duplicated records) if necessary, and
selecting 3 columns and drawing 1 plot (e.g. bar chart, histogram, boxplot, etc.) for each to summarise it.
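
The logic of the three steps above can be sketched in plain Python on a toy table. This is only an illustration: the rows and the column names (`age`, `city`, `score`) are hypothetical, and in the notebook the same steps would be carried out on a PySpark DataFrame built from the chosen Kaggle file.

```python
from collections import Counter

# Hypothetical toy rows standing in for a Kaggle data set.
rows = [
    {"age": 23, "city": "Perth", "score": 7.5},
    {"age": 31, "city": "Sydney", "score": 6.0},
    {"age": 23, "city": "Perth", "score": 7.5},     # duplicate of the first row
    {"age": None, "city": "Cairns", "score": 8.1},  # row with a missing value
    {"age": 45, "city": "Sydney", "score": 5.2},
]


def shape(table):
    """Return (number of rows, number of columns)."""
    n_cols = len(table[0]) if table else 0
    return len(table), n_cols


def clean(table):
    """Drop rows with missing values, then drop exact duplicates (keep the first)."""
    seen = set()
    cleaned = []
    for row in table:
        if any(value is None for value in row.values()):
            continue
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


def bar_counts(table, column):
    """Count occurrences of each value in a column: the data behind a bar chart."""
    return Counter(row[column] for row in table)


cleaned_rows = clean(rows)
print(shape(rows))          # (5, 3)
print(shape(cleaned_rows))  # (3, 3)
print(bar_counts(cleaned_rows, "city"))
```

Reporting the shape both before and after cleaning makes it easy to state in the report how many records were removed and why.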

Task I.2: Recommendation engine

This subtask requires you to implement a recommender system based on collaborative filtering with the Alternating Least Squares (ALS) algorithm. You need to include

Model training and predictions
Model evaluation using MSE
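
To make the evaluation step concrete: MSE is the mean of the squared differences between predicted and actual ratings over held-out (user, item) pairs. The sketch below shows only that arithmetic in plain Python. The ratings, the predictions and the helper name `mean_squared_error` are hypothetical; in the notebook the predictions would come from the trained ALS model.

```python
def mean_squared_error(actual, predicted):
    """MSE over the (user, item) pairs present in both dictionaries."""
    shared = actual.keys() & predicted.keys()
    if not shared:
        raise ValueError("no overlapping (user, item) pairs to evaluate")
    return sum((actual[k] - predicted[k]) ** 2 for k in shared) / len(shared)


# Hypothetical held-out ratings and model predictions, keyed by (user, item).
actual = {(1, 10): 4.0, (1, 11): 2.0, (2, 10): 5.0}
predicted = {(1, 10): 3.5, (1, 11): 2.5, (2, 10): 4.0}

mse = mean_squared_error(actual, predicted)
print(mse)  # 0.5
```

Only pairs that have both an actual rating and a prediction are compared, which mirrors the usual situation where a recommender cannot score users or items it never saw during training.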

Task I.3: Classification

This subtask requires you to implement a classification system based on logistic regression, using the LogisticRegressionWithLBFGS class. You need to include

Logistic Regression model training
Model evaluation
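
The brief does not say which evaluation metric to use. As one possible choice (an assumption, not a requirement of the task), the sketch below computes confusion-matrix counts and accuracy from (label, prediction) pairs in plain Python. The pairs and the helper names are hypothetical; in the notebook they would be produced by applying the trained model to the test set.

```python
def confusion_counts(pairs):
    """Count TP, FP, TN, FN from (label, prediction) pairs with 0/1 values."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, prediction in pairs:
        if prediction == 1:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["tn" if label == 0 else "fn"] += 1
    return counts


def accuracy(counts):
    """Fraction of predictions that match the true label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Hypothetical (true label, predicted label) pairs from a test set.
pairs = [(1, 1), (0, 0), (1, 0), (0, 0), (0, 1), (1, 1)]

counts = confusion_counts(pairs)
acc = accuracy(counts)
print(counts)  # {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 1}
print(round(acc, 3))  # 0.667
```

Keeping the four counts, rather than accuracy alone, also allows precision and recall to be discussed in the report if the classes are imbalanced.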

Task I.4: Clustering

This subtask requires you to implement a clustering system based on K-means. You need to include

Model training
Model evaluation
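
The brief leaves the clustering metric open. One common choice (assumed here) is the within-set sum of squared errors (WSSSE): the sum, over all points, of the squared distance to the nearest cluster centre. The plain-Python sketch below shows that calculation on hypothetical points and centres; in the notebook the centres would come from the trained K-means model.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def wssse(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Hypothetical 2-D points and two cluster centres.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers = [(1.0, 1.5), (8.5, 8.0)]

print(wssse(points, centers))  # 1.0
```

Computing this value for several values of k and looking for the point where the decrease levels off is one way to justify the chosen number of clusters.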

Part II - Report

You are required to write a report to explain your design and implementation of the machine learning parts in your code, including the following topics:

An introduction to, summary of, or explanation of the ML algorithms and concepts
The learning settings, such as how you prepared the training and testing sets, what the key parameters are, and how you set them
Comments on and evaluation of the models learnt
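
As an illustration of the second topic, a reproducible training/testing split can be described along the following lines. This is a plain-Python sketch under stated assumptions: the 80/20 ratio, the seed value and the helper name `train_test_split` are all hypothetical choices, not values prescribed by the assignment.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records with a fixed seed, then cut it in two."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records: ten row identifiers.
records = list(range(10))
train, test = train_test_split(records)

print(len(train), len(test))  # 8 2
```

Fixing the seed means the same rows land in each set on every run, so the figures quoted in the report can be reproduced from the notebook.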

Your report should use the following template:

Table of Contents

1.0 Introduction

Explain the data set you've chosen, including its source URL. Demonstrate your exploratory data analysis in this section.

2.0 Machine learning implementation

2.1 Collaborative filtering

2.3 Logistic regression

2.4 K-Means

3.0 Conclusion

References

Assignment Advice - This assignment will take several weeks to complete and will require a good understanding of machine learning and PySpark for successful completion. It is imperative that students take heed of the following points in relation to doing this assignment:

1. Ensure that you clearly understand the requirements for the assignment - what must be done and what are the deliverables.

2. If you do not understand any of the assignment requirements - Please ASK your tutor.

3. Each time you work on any aspect of the assignment, reread the assignment requirements to ensure that what is required is clearly understood.

4. We have practiced nearly all coding tasks in DataCamp before. If you have any difficulty, redoing the practices in DataCamp is recommended.

5. Prior to submitting your code, you should ensure not only that it executes as required but also that it looks professional. It is expected that you adhere to Python standards for naming and indenting. All methods should be adequately documented such that another programmer examining your code will readily know what the code is doing.

Write a report to explain design and implementation : ICT707 Data Science Practice Assignment, University of the Sunshine Coast, Australia.

The assignment will be marked out of a total of 100 marks and forms 40% of the total assessment for the course. ALL assignments will be checked for plagiarism automatically by the SafeAssign system provided by Blackboard. Late submissions will be penalised according to the policy in the course outline. Please note that Saturday and Sunday are included in the count of days late. Requests for an extension to an assignment MUST be made to the course coordinator prior to the date of submission; requests made on or after the submission date will only be considered in exceptional circumstances. Assignment submission extensions will only be granted in accordance with the official University guidelines.

Assignment Task - This assignment consists of two deliverables, being: One code implementation (50%). The code file in Jupyter Notebook format and the relevant data set files should be contained within a folder named: Task 3-Your Name- Student_Number. The folder is then to be zipped and uploaded to Blackboard. A report (50%). The report must be uploaded as a separate file.

Report Format - Your report should be about 1000 words, but no more than 1500 words. The report MUST be formatted using the following guidelines:

Title Page - Must not contain headers, footers, or page numbering. Include your name as the report's author.
Header - Report title
Footer - Your name and the page number
Paragraph text - 12 point Calibri, single line spacing
Headings - Arial in an appropriate type size
Margins - 2.5cm on all margins
Page numbering - Introduction and onwards to use conventional numerals (1, 2, 3, 4), starting at page 1 from the introduction

The report is to be created as a single Microsoft Word document (version 2007 or later). No other format is acceptable, and using one will result in the deduction of marks.

Referencing - The report is to include at least 5 appropriate references, and these references should follow the Harvard method of referencing. Note that ALL references should be from journal articles, conference papers, technical papers or a recognized expert in the field. DO NOT use Wikipedia as a reference. The use of unqualified references will result in the deduction of marks.
