Discuss group-based data summarisation

Reference no: EM132320540, Length: 10

CAPSTONE PROJECT - FOUNDATIONS OF DATA SCIENCE

This assessment involves writing a report that summarises a data science investigation that you have conducted on data that you have collected yourself. The investigation must involve the main topics covered in the subject, most notably data pre-processing (representation, wrangling, tidying) and exploratory data visualisation using R/RStudio.

It is a merger of Assessments 3 (Exploratory Visualisation) and 4 (Pre-Processing - Parts A and B). However, neither the dataset nor the pre-processing/exploratory steps to be carried out will be provided; you have to make independent choices and decisions.

You will need to find your own data using good practices. Your dataset must contain at least 1000 observations of 5 variables, unless the targeted data science problem relates to spatial-temporal data, in which case fewer than 5 dimensions could be allowed.

Preferably, you should use a dataset relevant to your place of work. Do not use data from textbooks or from R packages. Do not use data from the same public sources that have been used in the subject (e.g. UCI repository). You can use public data, but the data should be appropriate for addressing a relevant data science problem.

You don't need to solve this entire data science problem in your investigation, but you need to clearly indicate what the targeted problem would be about and how your project can contribute towards addressing it.

You have to write a report with details about the problem in question, the data, the methods, results, analyses and findings. You might like to look online for research papers for examples of how to shape your report. Many of these papers will have involved extensive work to collect their data; we don't expect that of you.

We also don't expect you to win a Nobel prize with this assessment. Ideally, you will be able to demonstrate that: (a) you have grasped important concepts associated with this subject, most notably data pre-processing and exploratory visualisation; and (b) you can communicate your investigation in a formal written manner.

Regarding (a), we expect that your investigation will include at least six (60% or more) of the following topics:

1. Data representation

2. Unstructured to Structured data

3. Data cleaning

4. Type conversion

5. Missing value imputation

6. Gathering/Spreading

7. Data subset selection and/or subsampling

8. Group-based data summarisation

9. Variable selection and/or transformation

10. Exploratory visualisation using ggplot2
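As a hedged illustration of how several of these topics can fit together, the sketch below works through type conversion, missing value imputation and group-based summarisation on a small made-up data frame. The `readings` data frame, its `site` and `temp` columns and all values are hypothetical. Base R is used so the example runs without extra packages; `group_by()` and `summarise()` from dplyr would be a common alternative.

```r
# Hypothetical toy data: temperature readings stored as text, with gaps.
readings <- data.frame(
  site = c("A", "A", "A", "B", "B", "B"),
  temp = c("20.5", "21.0", NA, "15.0", "n/a", "17.0"),
  stringsAsFactors = FALSE
)

# Type conversion: non-numeric strings such as "n/a" become NA.
readings$temp <- suppressWarnings(as.numeric(readings$temp))

# Missing value imputation: replace each NA with the median of its site.
readings$temp <- ave(
  readings$temp, readings$site,
  FUN = function(x) ifelse(is.na(x), median(x, na.rm = TRUE), x)
)

# Group-based summarisation: mean temperature per site.
site_summary <- aggregate(temp ~ site, data = readings, FUN = mean)
print(site_summary)
```

Median imputation within each group is only one of several possible choices; whichever rule is used should be stated and justified in the Methods section of the report.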

Regarding (b), the main body of the report (containing title, abstract, introduction, data, methods, results and discussion, and conclusions) cannot exceed 5 (five) A4 pages in 12pt Roman style font using single line spacing.

A maximum of 5 (five) additional pages are allowed for bibliographic references and appendices with any supporting material that you may want to include (e.g. your R code). Therefore, your report cannot exceed ten (10) pages in total.

Only the main body and references will be formally assessed for grading, though the additional material can help clarify any issues that may arise during the marking process. Further details about the report structure are provided in the following section.

REPORT STRUCTURE

The report should have the following sections marked clearly:

• Title: In today's busy world, it is very important to make the most of your title. Make the title 'eye-catching', informative and an accurate representation of the contents of the report.

• Abstract: The abstract provides a short sharp overview of the contents in the report and will be around 200 - 300 words. The abstract has five parts:

i. Introductory statement: background to the study, important issue(s) the report addresses. (approximately 1 to 2 sentences)

ii. Purpose of the report: state the objectives (1-2 sentences)

iii. Methodological approach: overview the data and methods (2-3 sentences)

iv. Findings or Achievements: list one or two of the main findings or achievements from your investigation (1-2 sentences)

v. Conclusions and Implications: what conclusions can be drawn from your investigation? How can the findings/achievements in your report deliver a benefit to people, things, systems or processes? (1-2 sentences)

• Introduction: The introduction sets the scene for the investigative efforts. It provides motivation for the work and relevant background information and references that will enable the reader to put in context the key objectives and achievements in your report. Address the important issues that have motivated your investigation.

At the end of the introduction clearly state the objectives of the report. Do not put any results from your investigation in the introduction. Do not discuss details about the data and methods in this section. Do not discuss your conclusions or key findings in the introduction.

• Data: This section should provide details about how the data was obtained and what the data represent. You should include information such as:

i. What the source of the data is.

ii. How the data was originally collected (e.g. from an experiment or observational study).

iii. The sample size.

iv. The number and types of variables.

v. Any known interventions or pre-processing that precede the ones described in your report.

vi. Any other information that is relevant to the understanding and assessment of your work/report.

• Methods: This section should summarise the data science methods that were used to process and to analyse the data, as well as the software version used to generate the results.

To cite RStudio, type RStudio.Version() at the R console. The methods should be appropriate to ensure that the objectives of the paper are met. At times, it may be helpful to interleave your text with a description of key calls to R functions that generated relevant results that you may want to highlight.

E.g. "The lm command with default settings for the arguments was used to produce a simple linear regression model between y and x in RStudio". It is important to provide a sufficient level of detail so that your methodology could be repeated by an independent person, while being presented clearly and objectively so that it can be understood without the need to check your complete R code.
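A minimal sketch of how such a Methods statement might be backed by code, assuming simulated data (the data frame `df` and its columns `x` and `y` are placeholders, not part of the assessment). It fits the simple linear regression mentioned in the example sentence and collects the version string needed to cite the software. `RStudio.Version()` is only defined inside an RStudio session, so that call is guarded.

```r
# Simulated placeholder data: y depends linearly on x plus noise.
set.seed(1)
df <- data.frame(x = 1:50)
df$y <- 2 + 3 * df$x + rnorm(50, sd = 0.1)

# Simple linear regression with default settings for the arguments.
fit <- lm(y ~ x, data = df)
print(coef(fit))

# Version details to report in the Methods section.
r_version <- R.version.string
print(r_version)
if (exists("RStudio.Version")) {
  # Only runs when the script is executed inside RStudio.
  print(RStudio.Version()$version)
}
```

Reporting the exact call (`lm(y ~ x, data = df)`) alongside the R version is usually enough for an independent reader to repeat the step without reading the full script.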

• Results and Discussion: This section presents and discusses the results. The discussion centres on the outputs from the pre-processing and exploratory visualisations that you have performed. For example, what are the main outcomes? Why are they useful and what for? How are they interesting and why? Etc. In particular, how do the results align with the goals set in the introduction? What are the main achievements and their implications?

• Conclusions: Final remarks about the key achievements of the investigations and what makes them "interesting" or "useful", right now or for future work. Achievements or findings should be contrasted with the original objectives or hypotheses of the project. Make sure that you mention any limitations of your work here. Limit the conclusions to no more than two or three paragraphs.

• References: List the sources your investigation has drawn from. Note that all references should be referred to in the text.

• Appendices (optional): Add any supporting materials (possibly your detailed R code) that might be useful to help assess your work.

FORMAT

The main body of the report must be presented in 12pt Roman style font on no more than 5 (five) A4 pages, using single line spacing. Either a single column or double column format may be used. References and appendices can be listed on at most 5 (five) additional pages.

In total, the report cannot exceed 10 pages.


• The entire project must be completed using R/RStudio. Any calculations, visualisations, results, etc. produced using software other than R/RStudio (e.g. Excel, Tableau) are not accepted and will not be assessed. Exploratory visualisation must use the ggplot2 package rather than functions from base R or other R packages. The report itself can be written using a text editor of your choice (e.g. Microsoft Word or similar); R Markdown is also accepted but is not compulsory.
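For instance, an exploratory plot built with ggplot2 might look like the sketch below. The data frame `obs` and its `group` and `value` columns are simulated placeholders, and the ggplot2 package is assumed to be installed.

```r
library(ggplot2)

# Simulated placeholder data: one numeric measurement for three groups.
set.seed(42)
obs <- data.frame(
  group = rep(c("A", "B", "C"), each = 30),
  value = c(rnorm(30, mean = 10), rnorm(30, mean = 12), rnorm(30, mean = 15))
)

# Boxplots show the distribution of value within each group.
p <- ggplot(obs, aes(x = group, y = value)) +
  geom_boxplot() +
  labs(title = "Distribution of value by group", x = "Group", y = "Value")

print(p)
```

A boxplot is only one option; the choice of geom should follow from the variable types and the question the report sets out to explore.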
