Create basic scatterplot to represent life expectancy


Reference no: EM132375203

Data Applications Assignment - Problem Set

Instructions - This problem set is organized as follows:

Question 1 simulates a data visualization task. It builds on skills you have applied in previous labs and problem sets.

Question 2 asks you to execute and interpret a fairly basic regression model.

You are required to use an R Notebook for problem sets, so the file you submit should have the extension .Rmd. Please name your notebook file ps3_lastname1_lastname2_lastname3 and submit it via eLC. I will Knit your notebook and grade accordingly, adding comments to your notebook so everything is self-contained.

It is time to start using your R Notebook as if you are preparing a report for an external audience. Throughout this problem set, consider what code and results should be included in your output. For instance, a reader probably isn't interested in the importing and wrangling needed to produce plots. If you think code or results are not necessary for a reader to see, suppress them accordingly. Use your best judgment; I will not grade your choices strictly.
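One way to do this (a sketch, assuming you want code, messages, and warnings hidden by default) is to set global chunk options with knitr::opts_chunk$set() in a setup chunk at the top of the notebook. Any individual chunk can still show its code by setting echo = TRUE in its own chunk header.

```r
# Setup chunk: hide code, messages, and warnings in the knitted output by default.
# Individual chunks can override these defaults in their own chunk headers.
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
```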

Students may work in groups of at most three. If in a group, please provide one submission. I cannot enforce this, but I strongly recommend that groups actually work together in each other's presence rather than working on separate questions remotely.

Any external data or documentation required to complete a problem set will be available on eLC. Please read each question carefully and provide a thorough response.

Question 1 - United Nations life expectancy data

Life expectancy at birth can vary over time and between countries for many reasons: advances in medicine, a country's level of development, or the effects of armed conflict. Life expectancy also varies by sex: women generally live longer than men. Why? Several factors have been proposed, including biological differences and the theory that women tend to be more health-conscious.

Let's create some plots to explore inequalities in life expectancy at birth around the world. We will use a dataset from the United Nations Statistics Division that is available on eLC.

Part 1: Import

Take a look at the UNdata.csv file before importing. Since our focus is life expectancy between sexes by country and year, this file clearly has some unnecessary columns and rows.

Import the UNdata.csv dataset. Name the new object life_expectancy. Within the import command, change the variable names to the following:

country
sex
year
source
unit
lifeExp
footnote

Also, tell R to skip the first row, and set the column types to character except for the one quantitative variable, which should be stored as integer. Set the maximum number of rows imported so that the footnotes in the bottom rows of the .csv file are not imported; this maximum should equal the number of rows that contain actual data in the .csv file.
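A minimal sketch of such an import with readr::read_csv(), assuming the tidyverse is installed. Because UNdata.csv is not reproduced here, the code first writes a small stand-in file with invented values (hypothetical countries, an invented header row, four data rows, and two footnote rows). The header text and n_max = 4 fit the toy file only; with the real file, set n_max to the number of data rows you count yourself.

```r
library(readr)

# Hypothetical stand-in for UNdata.csv: one header row, four data rows, then
# footnote rows at the bottom. All values are invented for illustration.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Country or Area","Subgroup","Year","Source","Unit","Value","Value Footnotes"',
  '"Country A","Female","2000-2005","Source X","Years","80",""',
  '"Country A","Male","2000-2005","Source X","Years","74",""',
  '"Country B","Female","2000-2005","Source X","Years","72",""',
  '"Country B","Male","2000-2005","Source X","Years","66",""',
  '"footnoteSeqID","Footnote"',
  '"1","Example footnote text"'
), toy_path)

n_data_rows <- 4  # toy value: count the data rows in the real file instead

life_expectancy <- read_csv(
  toy_path,
  col_names = c("country", "sex", "year", "source", "unit", "lifeExp", "footnote"),
  skip      = 1,            # skip the file's own header row
  col_types = "cccccic",    # all character except lifeExp (integer)
  n_max     = n_data_rows   # stop before the footnote rows
)
life_expectancy
```

Supplying col_names as a character vector tells read_csv() not to treat any line as a header, which is why skip = 1 is needed to drop the original one.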

Part 2: Wrangle

Our first plot will compare male and female life expectancy with the most recent data available in life_expectancy. The dataset still requires some wrangling to facilitate making such a plot.

Generate a new object that contains a subset of life_expectancy according to the following instructions:

Keep only the following variables:

country
sex
year
lifeExp

Include only the most recent time period.

Drop the year variable since it is no longer necessary.

Change the sex variable into two columns where lifeExp serves as their value.
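Under the same assumptions (tidyverse installed, invented toy data standing in for life_expectancy), the four steps above map onto select(), filter(), select(-year), and tidyr::pivot_wider(). The object name life_exp_recent is hypothetical. Taking max(year) on the character labels picks the latest period only because labels such as "2000-2005" sort chronologically as strings; check that this holds for your data. Older code may use tidyr::spread(sex, lifeExp) for the last step, which performs the same reshaping.

```r
library(dplyr)
library(tidyr)

# Invented toy data shaped like life_expectancy after import (not real UN figures)
life_expectancy <- tibble(
  country  = rep(c("Country A", "Country B"), each = 4),
  sex      = rep(c("Female", "Male"), times = 4),
  year     = rep(rep(c("1985-1990", "2000-2005"), each = 2), times = 2),
  source   = "Source X",
  unit     = "Years",
  lifeExp  = c(76L, 70L, 80L, 74L, 65L, 61L, 72L, 66L),
  footnote = NA_character_
)

life_exp_recent <- life_expectancy %>%
  select(country, sex, year, lifeExp) %>%              # keep the four variables
  filter(year == max(year)) %>%                         # latest period (string comparison)
  select(-year) %>%                                     # year is now constant, so drop it
  pivot_wider(names_from = sex, values_from = lifeExp)  # one column per sex
life_exp_recent
```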

Part 3: Scatterplot Step-by-Step

First, create a basic scatterplot to represent life expectancy of males (x-axis) against females (y-axis).

Next, adjust this plot to make it easier to interpret. Set the limits of both the x and y axes to 35-85. Add a dashed reference line that intersects the y-axis at 0 and has a slope of 1.
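A sketch of both steps with ggplot2, assuming the wide data from Part 2 has columns named Male and Female (the object name life_exp_recent and the six toy countries are invented). Note that scale_x_continuous(limits = ...) drops any point outside the range with a warning, whereas coord_cartesian(xlim = ..., ylim = ...) would zoom without dropping data.

```r
library(ggplot2)

# Invented toy data in the wide shape from Part 2 (not real UN figures)
life_exp_recent <- data.frame(
  country = paste("Country", LETTERS[1:6]),
  Male    = c(74, 66, 50, 60, 70, 55),
  Female  = c(80, 70, 52, 68, 71, 64)
)

# Basic scatterplot: male life expectancy on x, female on y
p_basic <- ggplot(life_exp_recent, aes(x = Male, y = Female)) +
  geom_point()

# Same limits on both axes plus a dashed 45-degree reference line (y = x)
p_ref <- p_basic +
  scale_x_continuous(limits = c(35, 85)) +
  scale_y_continuous(limits = c(35, 85)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed")
p_ref
```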

Briefly explain to your readers how they should interpret a point lying either above or below the reference line. In other words, what does the reference line help a reader interpret?

Next, adjust the scatterplot according to the following directions:

Alter the points so that their outline color is "white", their fill color is "chartreuse3", shape equals 21, alpha equals 0.55, and size equals 4.
Add an appropriate title, a subtitle specifying which years the data cover, a caption reporting the source of the data, and appropriate labels for the x and y axes.
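Continuing the sketch on the same invented toy data: the point styling goes in geom_point() as fixed settings outside aes(), and the text elements go in labs(). The title and axis labels are placeholders, and the subtitle assumes 2000-2005 is the most recent period in your data.

```r
library(ggplot2)

# Invented toy data in the wide shape from Part 2 (not real UN figures)
life_exp_recent <- data.frame(
  country = paste("Country", LETTERS[1:6]),
  Male    = c(74, 66, 50, 60, 70, 55),
  Female  = c(80, 70, 52, 68, 71, 64)
)

p_styled <- ggplot(life_exp_recent, aes(x = Male, y = Female)) +
  # fixed settings outside aes(): shape 21 is a circle with separate outline and fill
  geom_point(colour = "white", fill = "chartreuse3", shape = 21,
             alpha = 0.55, size = 4) +
  scale_x_continuous(limits = c(35, 85)) +
  scale_y_continuous(limits = c(35, 85)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  labs(
    title    = "Life expectancy at birth: males vs. females",
    subtitle = "Period: 2000-2005",   # assumption: the most recent period in your data
    caption  = "Source: United Nations Statistics Division",
    x        = "Male life expectancy (years)",
    y        = "Female life expectancy (years)"
  )
p_styled
```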

We want to draw attention to some countries where the gap in life expectancy between men and women is particularly large.

To do this, generate two new objects, top_male and top_female, that contain the 3 countries with the highest difference in life expectancy for males and for females, respectively.
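One reading of these instructions (an assumption) is to compute the gap Female - Male for each country, then take the 3 countries where that gap is smallest (males fare best relative to females) as top_male and the 3 where it is largest as top_female. A dplyr sketch on invented toy data:

```r
library(dplyr)

# Invented toy data in the wide shape from Part 2 (not real UN figures)
life_exp_recent <- tibble(
  country = paste("Country", LETTERS[1:6]),
  Male    = c(74, 66, 50, 60, 70, 55),
  Female  = c(80, 70, 52, 68, 71, 64)
)

gaps <- life_exp_recent %>% mutate(gap = Female - Male)

# Smallest female advantage first, i.e. largest relative advantage for males
top_male   <- gaps %>% arrange(gap) %>% head(3)
# Largest female advantage first
top_female <- gaps %>% arrange(desc(gap)) %>% head(3)

top_male
top_female
```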

Lastly, modify your previous scatterplot code according to the following instructions:

Add a label aesthetic assigned to country
Add two text geoms. One should use top_male as its data and the other should use top_female. Set the size of the text equal to 3.
Add a theme that you think is best.
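Putting the pieces together (a sketch on the same invented toy data, with theme_bw() as just one possible theme): because label = country is set in the top-level aes(), both geom_text() layers inherit x, y, and label and differ only in their data argument.

```r
library(dplyr)
library(ggplot2)

# Invented toy data in the wide shape from Part 2 (not real UN figures)
life_exp_recent <- tibble(
  country = paste("Country", LETTERS[1:6]),
  Male    = c(74, 66, 50, 60, 70, 55),
  Female  = c(80, 70, 52, 68, 71, 64)
)

# Same hypothetical gap-based selection as in the previous sketch
gaps       <- life_exp_recent %>% mutate(gap = Female - Male)
top_male   <- gaps %>% arrange(gap) %>% head(3)
top_female <- gaps %>% arrange(desc(gap)) %>% head(3)

p_final <- ggplot(life_exp_recent, aes(x = Male, y = Female, label = country)) +
  geom_point(colour = "white", fill = "chartreuse3", shape = 21,
             alpha = 0.55, size = 4) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_text(data = top_male, size = 3) +     # labels for the top_male countries
  geom_text(data = top_female, size = 3) +   # labels for the top_female countries
  scale_x_continuous(limits = c(35, 85)) +
  scale_y_continuous(limits = c(35, 85)) +
  labs(
    title    = "Life expectancy at birth: males vs. females",
    subtitle = "Period: 2000-2005",   # assumption: the most recent period in your data
    caption  = "Source: United Nations Statistics Division",
    x        = "Male life expectancy (years)",
    y        = "Female life expectancy (years)"
  ) +
  theme_bw()
p_final
```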

Now that you have a fantastic plot, provide a brief interpretation containing what a reader should learn from it.

Part 4: Scatterplot 2

Since our data contain historical information, let's see how life expectancy for males and females has changed over time. Our second plot will show, for each country, the change in male and female life expectancy between two periods: 1985-1990 and 2000-2005.

First, we need to generate a new object that subsets life_expectancy. Ultimately, we need a dataset where country is the unit of analysis, contains life expectancy for each sex in each time period as separate variables, and contains two more variables that are the difference in life expectancy between the two time periods for each sex.

The following instructions are provided to help you generate this new subset:

Include only observations where year equals one of the two aforementioned time periods
Unite the sex and year columns into one column separated by an underscore
Change the hyphen separating years to an underscore so R does not think it is a minus operation (code provided below)
Transform the sex_year column into four columns where lifeExp serves as their value
Create two new variables that calculate the change in life expectancy over time for each sex (difference = new value - old value)
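A tidyverse sketch of these five steps on invented toy data. The hyphen replacement here uses base R gsub() and is not the code the assignment provides; the object name life_exp_change and the new columns diff_Male and diff_Female are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Invented toy data shaped like life_expectancy after import (not real UN figures)
life_expectancy <- tibble(
  country  = rep(c("Country A", "Country B"), each = 4),
  sex      = rep(c("Female", "Male"), times = 4),
  year     = rep(rep(c("1985-1990", "2000-2005"), each = 2), times = 2),
  source   = "Source X",
  unit     = "Years",
  lifeExp  = c(76L, 70L, 80L, 74L, 65L, 61L, 72L, 66L),
  footnote = NA_character_
)

life_exp_change <- life_expectancy %>%
  filter(year %in% c("1985-1990", "2000-2005")) %>%     # the two periods only
  select(country, sex, year, lifeExp) %>%
  unite(sex_year, sex, year, sep = "_") %>%              # e.g. "Female_1985-1990"
  mutate(sex_year = gsub("-", "_", sex_year)) %>%        # e.g. "Female_1985_1990"
  pivot_wider(names_from = sex_year, values_from = lifeExp) %>%
  mutate(diff_Male   = Male_2000_2005 - Male_1985_1990,      # new minus old
         diff_Female = Female_2000_2005 - Female_1985_1990)
life_exp_change
```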

Now we are ready to plot. This time we want to plot the difference in male life expectancy over time on the x-axis and the difference among females on the y-axis.

Fortunately, much of the code you used for the last plot will work for this new one. You can copy and paste the code from the last plot and make the required modifications to it to save some time.

When modifying the code you just reused, you will need to change the variables used in the new scatterplot. The axis scales also need to be adjusted to the range of the new variables (hint: use the summary function to obtain min and max values). Choose whichever scale limits you think work best, as long as they are the same for x and y. You will also need to change the data source for the text labels that highlight interesting countries.

Finally, add two new reference lines: a dashed horizontal line at 0 and a dashed vertical line at 0.
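A sketch of the modified plot, assuming an object life_exp_change with diff_Male and diff_Female columns (invented toy values below). The shared axis limits come from the combined range of both variables, i.e. the min and max that summary() reports. The highlight sets top_male_change and top_female_change are hypothetical placeholders (here, the two largest gains for each sex) to be replaced by whichever countries your analysis flags.

```r
library(dplyr)
library(ggplot2)

# Invented toy data: one row per country, change in life expectancy by sex
life_exp_change <- tibble(
  country     = paste("Country", LETTERS[1:5]),
  diff_Male   = c(4, 5, -3, 2, 7),
  diff_Female = c(4, 7, -1, 6, 3)
)

# summary() shows min and max; one shared range keeps both axes identical
summary(life_exp_change[, c("diff_Male", "diff_Female")])
rng  <- range(c(life_exp_change$diff_Male, life_exp_change$diff_Female))
lims <- c(floor(rng[1]), ceiling(rng[2]))

# Hypothetical highlight sets: the two largest gains for each sex
top_male_change   <- life_exp_change %>% arrange(desc(diff_Male)) %>% head(2)
top_female_change <- life_exp_change %>% arrange(desc(diff_Female)) %>% head(2)

p2 <- ggplot(life_exp_change, aes(x = diff_Male, y = diff_Female, label = country)) +
  geom_point(colour = "white", fill = "chartreuse3", shape = 21,
             alpha = 0.55, size = 4) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  geom_hline(yintercept = 0, linetype = "dashed") +   # no change for females
  geom_vline(xintercept = 0, linetype = "dashed") +   # no change for males
  geom_text(data = top_male_change, size = 3) +
  geom_text(data = top_female_change, size = 3) +
  scale_x_continuous(limits = lims) +
  scale_y_continuous(limits = lims) +
  labs(
    title    = "Change in life expectancy at birth by sex",
    subtitle = "From 1985-1990 to 2000-2005",
    caption  = "Source: United Nations Statistics Division",
    x        = "Change in male life expectancy (years)",
    y        = "Change in female life expectancy (years)"
  ) +
  theme_bw()
p2
```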

Some code is provided below to assist you.

As before, provide an interpretation of the graph for readers. What does a point's position relative to each reference line mean? What is the main takeaway for each group of interesting countries?

Question 2 -

For this question, we will use the Georgia school district data from lab 3 and problem set 2.

load("ga_schdist_clean.RData")

After seeing the results of your last analysis of the districts with the highest and lowest expenditures, suppose your boss now wants to know which variables are associated with district revenues.

Based on the data available to us, we may suspect that the number of students enrolled in special programs (e.g., Limited English Proficiency) and the share of total enrollment made up of black and Hispanic students help explain revenues.

Therefore, we want to create a new dataset that contains percentages of enrollment for special programs and race, as well as revenues expressed in per-pupil terms. The code below does this. Note the use of mutate_at as a shortcut to mutate multiple variables with the same function.
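As a generic illustration of mutate_at (not the assignment's own code), the sketch below converts several count columns to percentages of enrollment and revenue to per-pupil terms. The data frame and every column name in it (enroll, lep, sped, rev_total) are hypothetical, not taken from ga_schdist_clean.RData. In current dplyr, mutate_at is superseded by across(), though it still works.

```r
library(dplyr)

# Hypothetical district counts; these column names are invented for illustration
districts <- tibble(
  district  = c("D1", "D2", "D3"),
  enroll    = c(1000, 2500, 400),
  lep       = c(50, 300, 8),
  sped      = c(120, 250, 60),
  rev_total = c(11e6, 26e6, 5e6)
)

districts_pct <- districts %>%
  # one call converts every listed count to a percentage of enrollment
  mutate_at(vars(lep, sped), ~ . / enroll * 100) %>%
  # the same pattern expresses revenue per pupil
  mutate_at(vars(rev_total), ~ . / enroll)
districts_pct
```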

Part 1 - Run Regressions

With this new dataset, run two regressions according to the following model:

$$Y_i = \beta_0 + \beta_1\,\text{PctLEP}_i + \beta_2\,\text{PctSPED}_i + \beta_3\,\text{PctFRPL}_i + \beta_4\,\text{PctBlack}_i + \beta_5\,\text{PctHisp}_i + \varepsilon_i$$

where $Y_i$ is total revenue per pupil for district $i$ in the first regression and total local revenue per pupil in the second.
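A sketch of the two regressions with lm(), run on invented toy data because the real variable names in ga_schdist_clean.RData are not shown here. The predictor names PctLEP, PctSPED, PctFRPL, PctBlack, and PctHisp follow the model above; rev_total_pp and rev_local_pp are hypothetical names for the two per-pupil outcomes.

```r
set.seed(2019)
n <- 60

# Invented district data; predictors are percentages (0-100)
ga_toy <- data.frame(
  PctLEP   = runif(n, 0, 20),
  PctSPED  = runif(n, 5, 20),
  PctFRPL  = runif(n, 20, 90),
  PctBlack = runif(n, 0, 60),
  PctHisp  = runif(n, 0, 30)
)
# Invented per-pupil outcomes
ga_toy$rev_total_pp <- 9000 + 40 * ga_toy$PctSPED + 20 * ga_toy$PctFRPL + rnorm(n, 0, 500)
ga_toy$rev_local_pp <- 4000 - 10 * ga_toy$PctFRPL + 5 * ga_toy$PctHisp + rnorm(n, 0, 300)

# Regression 1: total revenue per pupil
model_total <- lm(rev_total_pp ~ PctLEP + PctSPED + PctFRPL + PctBlack + PctHisp,
                  data = ga_toy)
# Regression 2: local revenue per pupil
model_local <- lm(rev_local_pp ~ PctLEP + PctSPED + PctFRPL + PctBlack + PctHisp,
                  data = ga_toy)

summary(model_total)
summary(model_local)
```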

Part 2 - Regression Tables

Provide your reader a table of results for each regression. Be sure to provide a line of text that tells your reader which table belongs to which regression model.
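One simple way to produce such a table (a sketch; model_total is a hypothetical object fit to invented toy data) is to pass the coefficient matrix from summary() to knitr::kable() with a caption naming the model. Packages such as stargazer or modelsummary can also place both models side by side in one table.

```r
set.seed(2019)
n <- 60

# Invented toy data standing in for the district dataset
toy <- data.frame(
  PctLEP = runif(n, 0, 20), PctSPED = runif(n, 5, 20), PctFRPL = runif(n, 20, 90),
  PctBlack = runif(n, 0, 60), PctHisp = runif(n, 0, 30)
)
toy$rev_total_pp <- 9000 + 40 * toy$PctSPED + 20 * toy$PctFRPL + rnorm(n, 0, 500)
model_total <- lm(rev_total_pp ~ PctLEP + PctSPED + PctFRPL + PctBlack + PctHisp,
                  data = toy)

# Estimate, std. error, t value, and p-value for each term, rounded for display
tab_total <- knitr::kable(
  round(coef(summary(model_total)), 3),
  caption = "Regression 1: total revenue per pupil"
)
tab_total
```

Repeating the last step with the second model and a caption such as "Regression 2: local revenue per pupil" gives the reader an unambiguous label for each table.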

Part 3 - Interpret Coefficients

Provide an interpretation for the following coefficients:

The percent of students enrolled in special education in regression 1.

The percent of black and Hispanic students in regression 2.
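As a general guide (a sketch, assuming $Y_i$ is in dollars per pupil and each Pct variable is measured in percentage points), each slope in a linear model is the predicted change in the outcome for a one-unit change in that regressor with the other regressors held fixed:

$$\frac{\partial\, \mathrm{E}[Y_i \mid \text{PctLEP}_i, \text{PctSPED}_i, \text{PctFRPL}_i, \text{PctBlack}_i, \text{PctHisp}_i]}{\partial\, \text{PctSPED}_i} = \beta_2$$

In regression 1, then, a district whose special education share is one percentage point higher, with the other four shares unchanged, is predicted to have $\beta_2$ more dollars of total revenue per pupil (fewer, if $\beta_2$ is negative). In regression 2, $\beta_4$ and $\beta_5$ are read the same way for local revenue per pupil. With observational district data, these are associations, not causal effects.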

PADP 7120 Data Applications Assignment - Problem Set, University of Georgia (UGA), United States.

Reviews

len2375203

9/23/2019 10:42:31 PM

Instructions: Homework assignment for a working group of students. Need help with R coding and the two problems. Would like a draft completion by the date above. Needs to be done in an RStudio Notebook file. The main thing is that the R code works and the questions are answered.
