Use Python to plot the gender ratio

Reference no: EM132334650

Introduction to Data Science Assignment - Description

The aim of this assignment is to investigate and visualise data using various data science tools. It will test your ability to:

  • read data files in Python and extract related data from those files;
  • wrangle and process data into the required formats;
  • use various graphical and non-graphical tools to perform exploratory data analysis and visualisation;
  • communicate your findings in your report.

You will need to submit two files:

1. A report in PDF containing your answers to all the questions. Note that you can use Word or other word processing software to format your submission. Just save the final copy to a PDF before submitting. Make sure to include screenshots/images of the graphs you generate in order to justify your answers to all the questions.

2. The Python code that you have written to analyse and plot the data, as a Jupyter notebook file.

Tasks -

Task A: Investigating Population and Gender Equality in Education

In this task, you are required to visualise the relationship between the population, the income and the gender ratio (women % men, 25 to 34 years) in schools of different countries, and to gain insights from how these relationships and trends change over time. The data files used in this task were originally downloaded from Gapminder. We have extracted the data from the original files and put it into a simpler format.

Please download the data from Moodle:

  • Population.csv: This file contains yearly data regarding the estimated resident population, grouping by countries around the world, between 1800 and 2018.
  • GenderEquality.csv: This data file contains yearly data about the ratio of female to male number of years in school, among 25- to 34-year-olds, including primary, secondary and tertiary education across different countries around the world, for the period between 1970 and 2015.
  • Income.csv: This data file contains yearly data of income per person adjusted for differences in purchasing power (in international dollars) across different countries around the world, for the period between 1800 and 2018.
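A minimal sketch of loading one of these files and reshaping it for analysis. The layout of the Moodle files is not described here, so the sketch assumes a wide table with one `Country` column and one column per year. The column names, the helper `load_long` and the toy values are assumptions made for illustration, not the real data.

```python
import io

import pandas as pd

# Stand-in for GenderEquality.csv. The wide layout (one row per country,
# one column per year) and every value below are assumptions.
toy_csv = io.StringIO(
    "Country,1970,1971,1972\n"
    "Australia,90.0,90.5,91.0\n"
    "China,60.0,61.0,62.5\n"
)


def load_long(source, value_name):
    """Read a wide country-by-year CSV and return a long (tidy) table."""
    wide = pd.read_csv(source)
    long = wide.melt(id_vars="Country", var_name="Year", value_name=value_name)
    long["Year"] = long["Year"].astype(int)  # year headers are read as strings
    return long.sort_values(["Country", "Year"]).reset_index(drop=True)


gender = load_long(toy_csv, "GenderRatio")
print(gender)
```

If the real files follow the same layout, `load_long("GenderEquality.csv", "GenderRatio")` would be the equivalent call. A long table with `Country` and `Year` columns is convenient later on, because the three data sets can then be joined on those two keys.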

A1. Investigating the Gender Equality Data

Have a look at the gender equality data.

1. Use Python to plot the gender ratio (women % men) in schools for Australia, China and United States over time.

  • What are the maximum and minimum values for gender ratio in Australia over the time period?
  • How do the trends in gender ratio (women % men) in schools compare for these three countries over the time period? Which two countries have a similar growth trend?
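One possible way to approach A1 is sketched below with pandas and matplotlib. It assumes the data has already been reshaped into a long table with `Country`, `Year` and `GenderRatio` columns (these names are assumptions), and it uses made-up numbers, so the printed values say nothing about the real data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in Jupyter
import pandas as pd

# Made-up values in an assumed long layout (Country, Year, GenderRatio).
gender = pd.DataFrame({
    "Country": ["Australia"] * 3 + ["China"] * 3 + ["United States"] * 3,
    "Year": [1970, 1971, 1972] * 3,
    "GenderRatio": [90.0, 90.5, 91.0, 60.0, 61.0, 62.5, 98.0, 98.2, 98.5],
})

countries = ["Australia", "China", "United States"]
subset = gender[gender["Country"].isin(countries)]

# Pivot so that each country becomes a column, then draw one line per country.
by_country = subset.pivot(index="Year", columns="Country", values="GenderRatio")
ax = by_country.plot(marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Gender ratio in school years (women % men)")
ax.set_title("Gender ratio over time")

# Maximum and minimum for Australia, together with the years they occur in.
australia = by_country["Australia"]
print("max:", australia.max(), "in", australia.idxmax())
print("min:", australia.min(), "in", australia.idxmin())
```

Plotting all three countries on the same axes makes it easier to judge which two lines follow a similar trend than comparing separate charts.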

A2. Visualising the Relationship over Time

Have a look at the relationship between gender ratio in schools and income over time.

1. Use Python to build a Motion Chart comparing the gender ratio in schools, the income, and the population of each country over time. The motion chart should show the gender ratio in schools on the x-axis, the income on the y-axis, and the bubble size should depend on the population.

2. Run the visualisation from start to finish. (Hint: In Python, to speed up the animation, set the timer bar next to the play/pause button to the minimum value.) Then answer the following questions:

  • Which two countries generally have the lowest gender ratio (women % men) in schools?
  • Select Cape Verde and Bolivia for this question: From which year onwards does Cape Verde start to have a higher gender ratio and a higher income than Bolivia? Please support your answer with relevant Python code and the motion chart.
  • Is there generally a relationship between the amount of income and gender ratio (women % men) in schools in all countries during the whole period of time? What kind of relationship? Explain your answer.
  • Any other interesting things you notice in the data? Please support your answer with relevant python code and/or motion chart.
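A motion chart needs the three measures side by side for every country and year, so the three tables have to be combined first. The sketch below shows one way to do that, and one way to answer a "from which year onwards" question in code. It assumes long tables keyed by `Country` and `Year`; the countries "A" and "B", all numbers and the helper `first_year_ahead_onwards` are hypothetical.

```python
import pandas as pd

# Made-up numbers in an assumed long layout; "A" and "B" are placeholders.
years = [2000, 2001, 2002, 2003, 2004]
keys = {"Country": ["A"] * 5 + ["B"] * 5, "Year": years * 2}
gender = pd.DataFrame({**keys, "GenderRatio": [80, 85, 84, 92, 93,
                                               82, 83, 85, 90, 91]})
income = pd.DataFrame({**keys, "Income": [3000, 3300, 3350, 3600, 3700,
                                          2900, 3200, 3250, 3400, 3500]})
population = pd.DataFrame({**keys, "Population": [400, 410, 420, 430, 440,
                                                  8000, 8100, 8200, 8300, 8400]})

# Inner joins keep only the country-year pairs present in all three tables.
merged = (
    gender.merge(income, on=["Country", "Year"])
          .merge(population, on=["Country", "Year"])
)


def first_year_ahead_onwards(df, leader, other, columns):
    """First year from which `leader` exceeds `other` on every column,
    in that year and in all later years. Returns None if there is none."""
    a = df[df["Country"] == leader].set_index("Year")[columns]
    b = df[df["Country"] == other].set_index("Year")[columns]
    # .gt aligns on Year; years missing for either country count as False.
    ahead = a.gt(b).all(axis=1).sort_index().astype(int)
    # Reverse cumulative minimum: 1 only if this and all later years are 1.
    onwards = ahead[::-1].cummin()[::-1]
    if onwards.sum() == 0:
        return None
    return int(onwards.idxmax())


year = first_year_ahead_onwards(merged, "A", "B", ["GenderRatio", "Income"])
print(year)
```

In this toy data "A" is ahead on both measures in 2001, falls behind in 2002, and is ahead again from 2003, which is why the helper looks at all later years rather than the first year that qualifies. The merged table is also the shape that animated scatter plots usually expect. For example, `plotly.express.scatter` accepts `x`, `y`, `size`, `animation_frame` and `animation_group` arguments, although the assignment does not say which motion chart library to use.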

Task B: Exploratory Analysis on Big Data

In this part, you are required to do some exploratory analysis on the health insurance marketplace data. The file InsuranceRates.csv.zip contains data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace. This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). The data was then published on Kaggle. The file we provide is an extract from the data on Kaggle. Unzipped, the file is over 500MB and contains the following fields:

  • BusinessYear: Year for which the plan provides coverage to enrollees.
  • StateCode: Two-character state abbreviation indicating the state where the plan is offered.
  • IssuerId: Five-digit numeric code that identifies the issuer organization in the Health Insurance Oversight System (HIOS).
  • PlanId: Fourteen-character alpha-numeric code that identifies an insurance plan within HIOS.
  • Age: Categorical indicator of whether a subscriber's age is used to determine rate eligibility for the insurance plan.
  • IndividualRate: Dollar value for the monthly insurance premium cost applicable to a non-tobacco user for the insurance plan in a rating area, or to a general subscriber if there is no tobacco preference.
  • IndividualTobaccoRate: Dollar value for the monthly insurance premium cost applicable to a tobacco user for the insurance plan in a rating area.
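Because the unzipped file is over 500MB, it can help to tell pandas which columns to read and which types to use. The sketch below is one way to do that. The column names come from the field list above, while the dtypes are a guess based on the descriptions, and the three CSV rows are made up so that the example runs without the real file.

```python
import io

import pandas as pd

# Dtypes guessed from the field descriptions; adjust them if the real file differs.
DTYPES = {
    "BusinessYear": "int16",
    "StateCode": "category",   # few distinct values, so category saves memory
    "IssuerId": "int32",
    "PlanId": str,
    "Age": "category",
    "IndividualRate": "float64",
    "IndividualTobaccoRate": "float64",
}

# Made-up rows standing in for InsuranceRates.csv.
toy_csv = io.StringIO(
    "BusinessYear,StateCode,IssuerId,PlanId,Age,IndividualRate,IndividualTobaccoRate\n"
    "2014,AK,12345,12345AK0010001,0-20,29.0,\n"
    "2014,AK,12345,12345AK0010001,Family Option,36.95,\n"
    "2015,AL,54321,54321AL0010001,21,210.5,250.1\n"
)


def load_rates(source):
    """Read only the listed columns, with explicit dtypes."""
    return pd.read_csv(source, usecols=list(DTYPES), dtype=DTYPES)


rates = load_rates(toy_csv)
print(rates.dtypes)
print(rates.memory_usage(deep=True).sum(), "bytes")
```

With the real data the call would be `load_rates("InsuranceRates.csv")`. Empty `IndividualTobaccoRate` cells are read as missing values (NaN), which matters when averaging that column.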

Load the InsuranceRates.csv data in Python and answer the following questions:

B1. How many years does the data cover? (Hint: pandas provides functionality to see 'unique' values.)

1. What are the possible values for 'Age'?

2. What are the average, maximum and minimum values for the monthly insurance premium cost for an individual? Do those values seem reasonable to you?
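The questions above map onto a few pandas calls, sketched here on a small made-up table with the same column names as the field list. The values are invented, including one deliberately implausible premium, so the output only illustrates the technique.

```python
import pandas as pd

# Made-up rows; 9999.0 is a deliberately implausible premium.
rates = pd.DataFrame({
    "BusinessYear": [2014, 2014, 2015, 2016, 2016],
    "Age": ["0-20", "Family Option", "21", "21", "64"],
    "IndividualRate": [29.0, 36.95, 210.5, 230.0, 9999.0],
})

# Distinct years covered and distinct values of 'Age'.
years = sorted(rates["BusinessYear"].unique().tolist())
ages = rates["Age"].unique().tolist()

# Average, maximum and minimum monthly premium in one call.
summary = rates["IndividualRate"].agg(["mean", "max", "min"])
median = rates["IndividualRate"].median()

print(years)
print(ages)
print(summary)
print("median:", median)
```

Comparing the mean with the median is a quick check on whether the values seem reasonable: in this toy table a single extreme premium pulls the mean far above the median.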

B2. Variation in Costs over Time and with Age

Generate boxplots (or other plots) of insurance costs versus year and age to answer the following questions:

1. Are insurance policies becoming cheaper or more expensive over time? Is the median insurance cost increasing or decreasing?

2. How do insurance costs vary with the age of the person being insured? (Hint: filter out the value 'Family Option' before plotting the data.) In terms of median cost, do older people pay more or less for insurance than younger people? How much more/less do they pay?
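A sketch of the filtering, grouping and boxplot steps for B2, again on invented data. The `Age` labels and premiums below are assumptions chosen to keep the arithmetic easy to follow.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows using the column names from the field list.
rates = pd.DataFrame({
    "BusinessYear": [2014] * 3 + [2015] * 3 + [2016] * 3,
    "Age": ["21", "64", "Family Option"] * 3,
    "IndividualRate": [200.0, 600.0, 40.0, 220.0, 660.0, 42.0, 240.0, 720.0, 45.0],
})

# Follow the hint: drop 'Family Option' rows before comparing costs.
individuals = rates[rates["Age"] != "Family Option"]

# Median premium per year and per age group.
median_by_year = individuals.groupby("BusinessYear")["IndividualRate"].median()
median_by_age = individuals.groupby("Age")["IndividualRate"].median()

# Boxplot of premium against year, drawn on an explicit Axes.
fig, ax = plt.subplots()
individuals.boxplot(column="IndividualRate", by="BusinessYear", ax=ax)
ax.set_xlabel("Business year")
ax.set_ylabel("Monthly premium (USD)")
fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." title

print(median_by_year)
print(median_by_age["64"] / median_by_age["21"])
```

The ratio of group medians gives a direct answer to "how much more do they pay". If `Age` is stored as text, groups sort alphabetically rather than numerically, so the order on the x-axis may need to be set explicitly when plotting against age.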

Task C: Exploratory Analysis on Other Data

(Note: This additional task is for those students wishing to get higher grades for their assessment. It is not required to pass the assignment, but it is required to get higher credit.)

Find some publicly available data and repeat some of the analysis performed in Tasks A and B above. Good sources of data are government websites,

Please note that your analysis should at least contain visualisation, interpretation of your visualisation and a prediction task.

FIT5145 - Introduction to Data Science Assignment - Description, Monash University, Australia.
Tasks: There are three tasks (A, B & C) in this assignment. Task C is an optional task for higher credit. Students who complete only Tasks A and B can get a maximum grade of Distinction. Students who attempt Task C can achieve a higher grade by demonstrating critical analysis skills and a deeper understanding of the task. You need to use Python to complete the tasks. Please note that your analysis should at least contain visualisation, interpretation of your visualisation and a prediction task.
