Describe how to load the data and how pre-processing is performed

Reference no: EM132313891. Length: 2,000 words.

Introduction to Data Science Assignment

The purpose of this data analysis report is to demonstrate your data processing skills and your ability to analyse real-world data. It also helps you develop a deeper understanding of the importance of data and information in business.

Assignment Task - A research team planned to study Australian road transport crash fatalities from 2010 to 2018 (inclusive). As a team member, you were given the Australian Road Deaths Database (ARDD) fatalities dataset and asked to analyse the data and prepare a report on your work and findings.

The dataset can be downloaded from Blackboard or the above website. It contains basic demographic and crash details of Australian road crashes between 1989 and 2019. As the team has no specific goal for the analysis, you are free to explore the data and dig out anything you find interesting or significant. However, you must limit your research and analysis to the years 2010 to 2018.

The potential audiences include other researchers, business representatives, and government agencies. They may have limited ICT or mathematical knowledge.

To prepare the report, please include the following sections:

1. Introduction

Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, where the data come from, and what the dimensions and structure of the data are.

2. Data Setup

Describe how to load the data, and how the pre-processing is performed.

The original dataset is not ready for analysis, and its form differs from the data we worked with in previous practicals. This means some pre-processing is needed, either for the whole dataset or for the subset required by each sub-task described later.

Once you have decided on your exploratory and advanced analyses, you need to reshape the dataset to suit them. This can be done either in R, by transposing or subsetting records, or with other tools (e.g. Notepad or Excel) before reading the data into R. Please explain your solution in this section.
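As an illustration of this step, the Python sketch below reads a CSV extract, keeps only the 2010 to 2018 records, and reshapes one-row-per-fatality records into a state-by-year table of counts. It assumes the file has columns named "State" and "Year" and that each row is one fatality; check both against the data dictionary. The sample rows are invented. The report itself must use R, where read.csv(), subset() and table() play the same roles.

```python
import csv
import io
from collections import Counter

# Invented extract standing in for BITRE_ARDD_Fatalities_Mar_2019_II.csv.
# ASSUMPTION: column names "State" and "Year"; one row per fatality.
SAMPLE_CSV = """Crash ID,State,Year,Age
1,NSW,2009,34
2,NSW,2010,19
3,VIC,2010,45
4,NSW,2011,60
5,QLD,2018,27
6,VIC,2019,52
"""


def load_rows(text):
    """Read CSV text into a list of dicts, converting Year to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Year"] = int(row["Year"])
        rows.append(row)
    return rows


def filter_years(rows, first=2010, last=2018):
    """Keep only records inside the study period (inclusive)."""
    return [r for r in rows if first <= r["Year"] <= last]


def state_year_table(rows):
    """Reshape per-fatality records into {state: {year: count}}."""
    counts = Counter((r["State"], r["Year"]) for r in rows)
    states = sorted({s for s, _ in counts})
    years = sorted({y for _, y in counts})
    # Missing state/year combinations are filled with 0.
    return {s: {y: counts.get((s, y), 0) for y in years} for s in states}


rows = filter_years(load_rows(SAMPLE_CSV))
table = state_year_table(rows)
print(table)
```

Building the table once at this stage means each later analysis can simply pick a row (a state) or a column (a year).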

3. Exploratory Data Analysis

3.1 - One-variable analysis - One-variable analysis studies one variable (one row or one column) at a time. For example, we can select a particular Australian state or year to obtain a column of numbers and plot it as a histogram.

Perform 2 one-variable analyses. Plot one graph for each variable. Explain the findings from each graph.
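A histogram is essentially a count of values in fixed-width bins. The minimal Python sketch below bins a hypothetical list of fatality ages into 10-year groups and prints a text histogram; the ages and the bin width are invented for illustration. In R, hist() performs the binning and the drawing in one call.

```python
from collections import Counter

# Invented ages of road fatalities for one state (illustrative only).
ages = [17, 19, 23, 24, 25, 31, 38, 45, 47, 52, 66, 71, 84]
WIDTH = 10  # bin width in years (an assumption)


def histogram(values, width=WIDTH):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


bins = histogram(ages)
for lower, n in bins.items():
    # One '#' per fatality in the bin, e.g. "20-29  ###".
    print(f"{lower:>2}-{lower + WIDTH - 1:<3} {'#' * n}")
```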

3.2 - Two-variable analysis - Two-variable analysis studies the relationship between two variables. For example, we can select "Diseases of the nervous system" and "Year" and draw a time series (scatter) plot, or we can select "2015" and "Causes".

Perform 2 two-variable analyses. Plot one graph for each analysis. Explain the findings from each graph.
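For a two-variable analysis such as fatalities against year, a useful companion to the scatter plot is the Pearson correlation coefficient r, which summarises the direction and strength of a linear trend (r near -1 means a steady decline, near 0 no linear trend). The sketch below uses invented yearly totals, not ARDD figures; in R, plot() and cor() would do the same job.

```python
import math

# Invented national fatality totals per year (illustrative, not ARDD figures).
years = list(range(2010, 2019))
deaths = [130, 128, 125, 119, 120, 118, 121, 114, 113]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson(years, deaths)
for year, total in zip(years, deaths):
    print(year, total)  # the (x, y) pairs a scatter plot would show
print(f"r = {r:.2f}")
```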

4. Advanced Analysis

4.1 - Clustering - Briefly explain the concept of clustering and k-means.

Perform 1 clustering analysis to group years according to a selected cause.
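To make the k-means idea concrete, the sketch below runs the standard Lloyd iteration (assign each point to its nearest centre, then move each centre to the mean of its points) on one-dimensional data: hypothetical yearly counts for a single selected category, grouped into k = 2 clusters. The counts and the choice of k are assumptions. The initial centres are spread evenly between the minimum and maximum so the result is repeatable; R's kmeans() instead starts from random centres (use set.seed() and the nstart argument) and defaults to the Hartigan-Wong algorithm.

```python
# Hypothetical yearly fatality counts for one selected category (not ARDD data).
counts = {2010: 52, 2011: 55, 2012: 49, 2013: 31, 2014: 33,
          2015: 29, 2016: 34, 2017: 50, 2018: 30}


def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's k-means on 1-D data (k >= 2); returns (labels, centroids)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centres spread evenly between the min and max value.
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centre to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centre moved
        centroids = new_centroids
    return labels, centroids


years = list(counts)
labels, centroids = kmeans_1d(list(counts.values()))
for c, centre in enumerate(centroids):
    members = [y for y, lab in zip(years, labels) if lab == c]
    print(f"cluster {c} (centre {centre:.1f}): {members}")
```

With these invented counts, the sketch separates 2010, 2011, 2012 and 2017 (higher counts) from the remaining years.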

4.2 - Linear Regression - Briefly explain the concept of linear regression.

Perform 2 linear regression analyses. Plot the fitted models.
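Simple linear regression fits y = a + bx by ordinary least squares, with slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept a = ȳ - b·x̄. The sketch below applies these formulas to invented yearly fatality totals; in R, lm(deaths ~ year) fits the same model and abline() adds the fitted line to a scatter plot.

```python
# Invented yearly totals for illustration only (not ARDD figures).
years = list(range(2010, 2019))
deaths = [130, 128, 125, 119, 120, 118, 121, 114, 113]


def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a, b = fit_line(years, deaths)
print(f"fitted model: deaths = {a:.1f} + ({b:.3f}) * year")
for year, observed in zip(years, deaths):
    # Compare each observed total with the value on the fitted line.
    print(year, observed, round(a + b * year, 1))
```

A negative slope b indicates that, over this invented series, fatalities fall by roughly |b| per year on average.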

5. Conclusion

6. Reflections

In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time.

For the data analysis (Sections 3 and 4), you need to provide the R code, an explanation of the code, and the results. Please present each R code snippet in a box with some comments.

Report Format - Your report should be at least 1,200 words and ideally no longer than 2,000 words. Text in R code snippets is not counted.

The report MUST be formatted using the following guidelines:

  • Title Page - Must not contain headers, footers, or page numbering. Include your name as the report's author.
  • Header - Report title
  • Footer - Your name and the page number
  • Paragraph text - 12-point Calibri, single line spacing
  • Headings - Arial in an appropriate type size
  • Margins - 2.5 cm on all margins
  • Page numbering:
      • From the Executive Summary to the last page of the Table of Figures, use Roman numerals (i, ii, iii, iv)
      • From the Introduction onwards, use conventional numerals (1, 2, 3, 4), starting at page 1 with the Introduction.
  • The report is to be created as a single Microsoft Word document (version 2007 or later). No other format is acceptable; submitting in any other format will result in a deduction of marks.

Please follow the conventions detailed in: Summers, J. & Smith, B., 2014, Communication Skills Handbook, 4th Ed, Wiley, Australia.

Attachment: Assignment Files.rar

Course: ICT110 Introduction to Data Science, University of the Sunshine Coast, Australia.

Resources - The following links provide the data as a CSV file, together with a data dictionary that explains the data: BITRE_ARDD_Fatalities_Mar_2019_II.csv. It is important that you adhere to the assessment instructions and refer to the marking rubric for this assessment task.

The assignment will be marked out of 100 and is worth 30% of the total assessment for the course. ALL assignments will be checked automatically for plagiarism by the SafeAssign system in Blackboard. Late submissions will be penalised according to the policy in the course outline; note that Saturdays and Sundays are included in the count of days late. Requests for an extension MUST be made to the course coordinator before the submission date; requests made on or after the submission date will only be considered in exceptional circumstances. Extensions will only be granted in accordance with the official University guidelines.

Referencing - Two references are required for the explanation of clustering and two for linear regression. These references should follow the Harvard referencing style. ALL references must be journal articles, conference papers, technical papers, or works by a recognised expert in the field. DO NOT use Wikipedia as a reference. Using unqualified references will result in a deduction of marks.

Assignment Advice - This assignment will take several weeks to complete and requires a good understanding of data science theories and practices. It is imperative that you heed the following points:

  • Ensure that you clearly understand the assignment requirements: what must be done and what the deliverables are.
  • If you do not understand any of the requirements, please ASK the course coordinator or your tutor.
  • Each time you work on any aspect of the assignment, reread the requirements to make sure you clearly understand what is required.
