FIT5145 Introduction to Data Science Assignment

Reference no: EM132633170

FIT5145 Introduction to Data Science - Monash University

The aim of this assignment is to investigate and visualise data using various data science tools. It will test your ability to:

• use R to:
o read data files and extract related data from those files,
o wrangle and process data into the required formats, and
o use various graphical and non-graphical tools to perform exploratory data analysis and visualisation;
• communicate your findings in your report.

Tasks:
• There are two tasks (A & B) in this assignment. Each task has separate data set files.
• You need to use R to complete the tasks.
• You need to use R Markdown to communicate:
o your answers,
o the code you used to complete the tasks, and
o your explanation of the steps you took and any issues that arose.

It is crucial that the R Markdown report you submit clearly identifies which questions you are answering and explains how you are processing the data and why you are processing it in that way. It is not adequate for you to just answer the questions for each task or just supply the code you used.

The data supplied for each task will also have to be wrangled in order to answer the questions. The supplied data is not guaranteed to be "clean" and without faults. This may require you to
• examine the data,
• filter the data,
• deal with missing or inconsistent values or formats,
• deal with any outliers or exceptional values,
• merge or divide the values or data sets,
• sort the data, and/or
• perform any other pre-processing steps required to analyse the data.

Your report must explain why and how you are performing this data wrangling, including identifying any issues you find with the data.
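Dealing with inconsistent formats often comes down to normalising raw values before any arithmetic. The sketch below illustrates that step in Python (the assignment itself requires R), assuming counts may arrive as text with thousands separators or placeholder markers; the `parse_count` helper and the marker set are hypothetical, not taken from the data files.

```python
def parse_count(raw):
    """Convert a raw count such as "1,234" or " 56 " to an int.

    Hypothetical helper: blanks and the assumed placeholder markers
    "NA", "NP" and "-" become None so they can be reported and excluded
    rather than silently treated as zero.
    """
    if raw is None:
        return None
    text = str(raw).strip().replace(",", "")
    if text in {"", "NA", "NP", "-"}:
        return None
    return int(float(text))


raw_values = ["1,234", " 56 ", "NP", ""]
cleaned = [parse_count(value) for value in raw_values]
# Positions of values that could not be parsed, to be explained in the report.
missing_positions = [i for i, value in enumerate(cleaned) if value is None]
print(cleaned)            # [1234, 56, None, None]
print(missing_positions)  # [2, 3]
```

Any other unexpected text raises `ValueError`, which surfaces a data problem instead of hiding it.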

Task A: Investigating the Size of the Indigenous Australian Population

In this task, you are required to visualise the relationship between the distribution and age of Indigenous Australians and gain insights into relations and trends over time. The data files used in this task were originally downloaded from the Australian Bureau of Statistics (ABS). We have extracted the data from the original files and put it into a simpler format. Please download the data from Moodle:
• IndigAusPopData_by_region (Data1): This file contains yearly data on the estimated resident population of Indigenous Australians, grouped by Indigenous region, between 2016 and 2031.
• IndigAusPopData_by_state (Data2): This file contains yearly data on the estimated resident population of Indigenous Australians, grouped by state or territory, between 2006 and 2031.

A1. Investigating the Distribution of Indigenous Australians

Indigenous Australians are part of Australian society everywhere, but some parts of the country have larger populations than others. For Data1, Australia is segmented into regions (titled "Indigenous regions") and the expected Indigenous population for each region is indicated. This data also divides each region's population into different age groups.

1. Use R to read, wrangle and analyse the data in Data1. Make sure you describe any complications you encounter and the steps you take when answering the following questions.

a. What regions have the maximum and minimum total Indigenous populations in 2016 and 2031?

b. What region/s have the maximum and minimum growth or decay rates of their total Indigenous population between 2016 and 2031?

Calculate these rates as the percentage difference between the 2016 and 2031 populations, e.g., if 2031 population = 5500 & 2016 population = 5000,
then rate = (5500 - 5000) / 5000 = 500/5000 = 0.1, so 10% growth
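Questions a and b reduce to summing each region's age groups into a total, then comparing totals and applying the rate formula above. A minimal Python sketch of that arithmetic (the assignment itself requires R), assuming the data has already been reshaped into long-format `(region, year, age_group, population)` rows; the region names, age groups and numbers are hypothetical:

```python
from collections import defaultdict

# Hypothetical long-format rows: (region, year, age_group, population).
rows = [
    ("Region A", 2016, "0-14", 2000), ("Region A", 2016, "15+", 3000),
    ("Region A", 2031, "0-14", 2100), ("Region A", 2031, "15+", 3400),
    ("Region B", 2016, "0-14", 4000), ("Region B", 2016, "15+", 8000),
    ("Region B", 2031, "0-14", 4200), ("Region B", 2031, "15+", 8400),
    ("Region C", 2016, "0-14", 300), ("Region C", 2016, "15+", 500),
    ("Region C", 2031, "0-14", 260), ("Region C", 2031, "15+", 500),
]


def totals_for_year(rows, year):
    """Sum the population over all age groups for each region in one year."""
    totals = defaultdict(int)
    for region, row_year, _age_group, population in rows:
        if row_year == year:
            totals[region] += population
    return dict(totals)


def growth_rate_percent(start, end):
    """Percentage change from start to end; a negative value means decay."""
    return (end - start) * 100 / start


totals_2016 = totals_for_year(rows, 2016)
totals_2031 = totals_for_year(rows, 2031)

# Question a: regions with the largest and smallest totals (first one wins on a tie).
largest = max(totals_2016, key=totals_2016.get)
smallest = min(totals_2016, key=totals_2016.get)
print(largest, smallest)  # Region B Region C

# Question b: growth/decay rate per region between 2016 and 2031.
rates = {r: growth_rate_percent(totals_2016[r], totals_2031[r]) for r in totals_2016}
print(rates)  # {'Region A': 10.0, 'Region B': 5.0, 'Region C': -5.0}
```

Because `max` and `min` return only the first match, tied regions can be found by selecting every region whose rate equals the maximum or minimum. A region with a 2016 total of zero would raise `ZeroDivisionError` and needs separate treatment.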

c. Plot and describe the growth or decay of the total Indigenous populations for the capitals of the 8 states/territories across all time periods.

For these calculations, you will need to work out the growth/decay rates for each time period, where the total population of the capital in time period N is compared to that in time period N+1.
e.g., if 2017 population = 5050 and 2016 population = 5000,
then rate = (5050 - 5000) / 5000 = 50/5000 = 0.01, so 1% growth for 2016-2017
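The period-by-period version applies the same formula to each consecutive pair of years. A short Python sketch (the assignment itself requires R); the capital's yearly totals below are hypothetical, with the first pair reproducing the 2016-2017 example above:

```python
def period_rates(years, populations):
    """Map "N-N+1" labels to the percentage change between consecutive periods."""
    pairs = list(zip(years, populations))
    return {
        f"{y0}-{y1}": (p1 - p0) * 100 / p0
        for (y0, p0), (y1, p1) in zip(pairs, pairs[1:])
    }


# Hypothetical totals for one capital in 2016, 2017 and 2018.
print(period_rates([2016, 2017, 2018], [5000, 5050, 4545]))
# {'2016-2017': 1.0, '2017-2018': -10.0}
```

Running this once per capital gives one series of rates per capital, which can then be plotted against the period labels.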

A2. Investigating the Ages of Indigenous Australians

On average, the lifespan of Indigenous Australians is shorter than that of the overall Australian population, due to a variety of socio-economic factors. Data1 and Data2 give separate populations for different ages or age groups, but because these are counts of living people rather than ages at death, we can't use them to calculate average lifespans. Instead, let's look at how many children are in the populations. Make sure you describe any complications you encounter and the steps you take when answering the following questions.

1. Using Data1, which region has the highest percentage of children in its total 2016 population?

Calculate this as a percentage of the region's total population. The ABS commonly considers children to be under 15 years of age.
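A minimal Python sketch of the child percentage (the assignment itself requires R), assuming the under-15 population is spread across the age-group labels "0-4", "5-9" and "10-14"; these labels and counts are hypothetical and should be matched to the actual groups in Data1 or Data2:

```python
# Hypothetical labels for the age groups under 15 (the ABS definition of children).
CHILD_GROUPS = {"0-4", "5-9", "10-14"}


def child_percentage(age_counts):
    """Percentage of the total population that falls in the under-15 groups."""
    total = sum(age_counts.values())
    children = sum(n for group, n in age_counts.items() if group in CHILD_GROUPS)
    return children * 100 / total


# Hypothetical 2016 counts for one region.
region_2016 = {"0-4": 600, "5-9": 550, "10-14": 450, "15-19": 400, "20+": 2000}
print(child_percentage(region_2016))  # 40.0
```

If an age-group mapping also contains a pre-computed total, it should be dropped first; otherwise the denominator is double-counted.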

2. Data2 includes estimated populations measured for the years 2006-2016 and projected estimates for the years 2016-2031, whereas Data1 uses only projected estimates. Using Data2 only, calculate and discuss which state or territory has the highest percentage of children in its total 2006, 2016 and 2031 populations.

3. Use R to build a Motion Chart comparing the total Indigenous Australian population of each region to the percentage of Indigenous Australian children in each state/territory. Use the region populations calculated from Data1 and the child percentage values calculated from Data2. The motion chart should show the population on the x-axis and the percentage on the y-axis, and the bubble size should depend on the population.

Hint: an example of how to construct an R motion chart can be found on Moodle. You will have to install the 'googleVis' package and may have to allow Flash to work on your browser (see https://community.rstudio.com/t/gvismotionchart-from-googlevis-is-not-working-any-suggestion/6109/9 for advice on allowing Flash for Chrome). If you cannot get the example script to work, contact your tutor.

4. Using the Motion Chart, answer the following questions, supporting your answers with relevant R code and/or Motion Charts:
a. Which region's population overtakes that of another region in the same state/territory? In which year/s does this happen?

b. Is there generally a relationship between the Indigenous Australian population size and percentage of children in the population? If so, what kind of relationship? Explain your answer.

c. Colour is commonly used in data visualisation to help understand data. Which aspect of this data would you use colour for in your plot and why?

d. Are there any other interesting things you notice in the data or any changes you would recommend for the Motion Chart?

Task B: Exploratory Analysis of Australian Immunisation Rates

In this task, you are required to do some exploratory analysis on data relating to the Australian childhood immunisation rates. This data was originally prepared and released through the Australian Government's Australian Institute of Health and Welfare. We have extracted the data from the original files and put it into a simpler format. Please download the data from Moodle:

• AusImmunisationData (Data3): This file contains yearly data on the number of 1-, 2- and 5-year-old Australian children fully or partially immunised in various Primary Health Network (PHN) areas.

Data3 columns (COLUMN: DESCRIPTION):
• State: State or territory for the PHN area
• PHN code: Identification number for PHN area relating to the data
• PHN area name: Description of PHN area
• Reporting Year: Financial period examined
• Age group: Age group of children
• Number of registered children: Number of children registered in the age group
• Number fully immunised: Number of children in the age group who were fully immunised, according to government objectives
• Number not fully immunised: Number of children in the age group who were not fully immunised, according to government objectives
• Number of registered IndigAus children: Number of Indigenous Australian children in the age group
• Number IndigAus fully immunised: Number of Indigenous Australian children in the age group who were fully immunised, according to government objectives
• Number IndigAus not fully immunised: Number of Indigenous Australian children in the age group who were not fully immunised, according to government objectives
• Interpret with caution: This area's eligible population is between 26 and 100 registered children.

Use R to read, wrangle and analyse the data from Data3. Make sure you describe any complications you encounter and the steps you take when answering the following questions.

B1. Values and Variables

1. How many PHN areas does the data cover?

2. What are the possible values for 'PHN code'?

3. For each row, calculate the percentage of Australian children that are fully immunised (this is the immunisation rate). What are the average, maximum and minimum immunisation rates? Calculate the same for the group that are Indigenous Australian children. Do all of those values seem statistically reasonable to you?
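A minimal Python sketch of these three steps (the assignment itself requires R). The column names follow the Data3 descriptions, but the PHN codes and counts are hypothetical. Rows with a zero or missing registered count are skipped rather than divided by, and any rate outside 0-100% is flagged as implausible:

```python
from statistics import mean

# Hypothetical Data3 rows; only the columns needed here are included.
rows = [
    {"PHN code": "P01", "Number of registered children": 1000, "Number fully immunised": 930,
     "Number of registered IndigAus children": 50, "Number IndigAus fully immunised": 45},
    {"PHN code": "P01", "Number of registered children": 800, "Number fully immunised": 752,
     "Number of registered IndigAus children": 0, "Number IndigAus fully immunised": 0},
    {"PHN code": "P02", "Number of registered children": 1200, "Number fully immunised": 1104,
     "Number of registered IndigAus children": 60, "Number IndigAus fully immunised": 51},
]

# Questions 1 and 2: distinct PHN codes and how many there are.
codes = sorted({row["PHN code"] for row in rows})
print(len(codes), codes)  # 2 ['P01', 'P02']


def immunisation_rates(rows, fully_col, registered_col):
    """Percentage fully immunised per row, skipping zero or missing denominators."""
    return [row[fully_col] * 100 / row[registered_col]
            for row in rows if row.get(registered_col)]


def summarise(rates):
    """Average, maximum and minimum of a list of rates."""
    return {"mean": mean(rates), "max": max(rates), "min": min(rates)}


all_rates = immunisation_rates(rows, "Number fully immunised", "Number of registered children")
indig_rates = immunisation_rates(rows, "Number IndigAus fully immunised",
                                 "Number of registered IndigAus children")
print(summarise(all_rates))    # {'mean': 93.0, 'max': 94.0, 'min': 92.0}
print(summarise(indig_rates))  # {'mean': 87.5, 'max': 90.0, 'min': 85.0}

# Question 3 sanity check: a percentage outside 0-100 indicates a data problem.
implausible = [r for r in all_rates + indig_rates if not 0 <= r <= 100]
print(implausible)  # []
```

The mean here is an unweighted average of per-row rates; dividing the summed fully immunised counts by the summed registered counts would instead weight each row by its number of children, and the report should state which one is used.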

B2. Variation in rates over Time, Age and Location
Generate boxplots (or other plots) of the immunisation rates versus year and age to answer the following questions:

1. Have the immunisation rates improved over time? Are the median immunisation rates increasing, decreasing or staying the same?

2. How do the immunisation rates vary with the age of the child?
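The centre line of each boxplot is the group median, so the trends in these two questions can be cross-checked numerically. A Python sketch (the assignment itself requires R), assuming per-row rates have already been computed; the reporting-year labels, age groups and rates are hypothetical:

```python
from collections import defaultdict
from statistics import median

# Hypothetical (reporting year, age group, immunisation rate %) records.
records = [
    ("2012-13", "1 year", 90.0), ("2012-13", "1 year", 92.0), ("2012-13", "5 years", 88.0),
    ("2013-14", "1 year", 91.0), ("2013-14", "5 years", 93.0), ("2013-14", "5 years", 95.0),
]


def median_by(records, key_index):
    """Median rate for each distinct value in the chosen column (0 = year, 1 = age group)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return {key: median(rates) for key, rates in sorted(groups.items())}


print(median_by(records, 0))  # {'2012-13': 90.0, '2013-14': 93.0}
print(median_by(records, 1))  # {'1 year': 91.0, '5 years': 93.0}
```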

Generate boxplots (or other plots) of the immunisation rates versus location to answer the following questions:

3. What is the median rate per state/territory?

4. Which states or territories seem most consistent in their immunisation rates?
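The box in a boxplot spans the interquartile range (IQR), so a narrower IQR is one reasonable measure of consistency (the standard deviation or range would be alternatives). A Python sketch (the assignment itself requires R) using `statistics.quantiles` with the inclusive method; the state labels and rates are hypothetical, and quartile conventions differ between tools, so values may not match R's boxplot hinges exactly:

```python
from statistics import median, quantiles

# Hypothetical immunisation rates (%) per state/territory.
rates_by_state = {
    "State A": [90.0, 91.0, 92.0, 93.0, 94.0],
    "State B": [80.0, 86.0, 92.0, 95.0, 97.0],
}


def median_and_iqr(rates):
    """Return (median, interquartile range) using inclusive quartiles."""
    q1, _q2, q3 = quantiles(rates, n=4, method="inclusive")
    return median(rates), q3 - q1


summary = {state: median_and_iqr(r) for state, r in rates_by_state.items()}
print(summary)  # {'State A': (92.0, 2.0), 'State B': (92.0, 9.0)}

# The smallest IQR suggests the most consistent rates.
print(min(summary, key=lambda state: summary[state][1]))  # State A
```

The two hypothetical states share a median but differ in spread, which is exactly the distinction between questions 3 and 4.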
