STAT 430 Topics in Applied Statistics - Assignment Problem

Reference no: EM132394236

Continent Total Populations

We continue our work with the gapminder dataset. Recall that you can get the full dataset as a data.table object for local use with the following command:

gm <- data.table::fread.

When grading, we will provide this dataset to your function as a data.table object called gm.

The gm data.table object contains several columns: country, continent, year, lifeExp (for life expectancy), pop (for population) and gdpPercap.

Write a function named gmTotalPop that does the following:

• Given a supplied input year, yr, select the subset corresponding to that year. As a simplification you can assume that only valid arguments for yr will be used.

• Grouped per continent, create a summary variable totpop which sums population for the given year and continent.

• Return a data.table object with two columns, continent and totpop, in that order, sorted by totpop in descending order (highest first).

Your function must be called 'gmTotalPop' and must be written in R. Feel free to use RStudio or another R interface to write and debug your function. Only submit the function for grading. Any additional code outside of your function will not be used or graded.
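The three steps above can be sketched with data.table's i/j/by syntax. This is only a minimal sketch under stated assumptions: the function signature is not shown in the text, so taking a single argument yr and reading gm from the enclosing environment is a guess, and the toy gm table below uses made-up numbers in place of the real gapminder data.

```r
library(data.table)

# Toy stand-in for gm (hypothetical values, for illustration only)
gm <- data.table(
  country   = c("A", "B", "C", "A", "B", "C"),
  continent = c("Asia", "Asia", "Europe", "Asia", "Asia", "Europe"),
  year      = c(1952, 1952, 1952, 1957, 1957, 1957),
  pop       = c(100, 200, 50, 110, 220, 55)
)

# ASSUMPTION: signature gmTotalPop(yr), with gm found in the enclosing environment
gmTotalPop <- function(yr) {
  # i: keep rows for the requested year
  # j: sum population (as.numeric guards against integer overflow on large sums)
  # by: one row per continent; then order by totpop, highest first
  gm[year == yr, .(totpop = sum(as.numeric(pop))), by = continent][order(-totpop)]
}

res <- gmTotalPop(1952)
print(res)
```

Because the grouping column comes first in a grouped data.table result, the output columns are already in the requested order: continent, then totpop.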

For additional discussion about this question please ask in this GitHub issue.

student.R

# Enter your code below
#
# Do not alter the function signature: ensure it remains named 'gmTotalPop'

• adds columns lifeExpGrowth, popGrowth, gdppcGrowth for, respectively, (arithmetic) growth in life expectancy, population and per-capita GDP as a percent;

• omits missing values (in growth rates) by passing the object into na.omit() (a base R function for which you can consult the documentation).
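A percentage growth computation of this kind might look like the sketch below, which uses dplyr's lag() within each country. Several things here are assumptions, because the start of this problem statement is not shown: the function name gm_growth is hypothetical, prevPeriod is assumed to be the previous observation's year, and rows are assumed to be ordered by country and then year. The toy tibble uses made-up values.

```r
library(dplyr)

# Toy stand-in for the gapminder data (hypothetical values)
gm <- tibble(
  country   = rep(c("A", "B"), each = 3),
  continent = rep(c("Asia", "Europe"), each = 3),
  year      = rep(c(1952, 1957, 1962), times = 2),
  lifeExp   = c(40, 44, 55, 60, 63, 66),
  pop       = c(100, 110, 121, 200, 210, 231),
  gdpPercap = c(1000, 1100, 990, 5000, 5500, 6050)
)

# ASSUMPTION: hypothetical name; the required function name is not shown in the text
gm_growth <- function(gm) {
  gm %>%
    arrange(country, year) %>%          # ASSUMPTION: row order is country, then year
    group_by(country) %>%               # growth is computed within each country
    mutate(
      prevPeriod    = lag(year),        # ASSUMPTION: year of the previous observation
      lifeExpGrowth = 100 * (lifeExp - lag(lifeExp)) / lag(lifeExp),
      popGrowth     = 100 * (pop - lag(pop)) / lag(pop),
      gdppcGrowth   = 100 * (gdpPercap - lag(gdpPercap)) / lag(gdpPercap)
    ) %>%
    ungroup() %>%
    na.omit()                           # drops each country's first row (no previous period)
}

res <- gm_growth(gm)
print(res)
```

Each country's first observation has no previous period, so its growth rates are NA and na.omit() removes that row.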

Second, write a function which, using your first function,

• takes in a character variable with a country name, and

• returns data for just that country

• with the same columns as before and row order as before.

Third, use the initial function to write a third function which

• returns the five worst (lowest) observations (globally) in terms of gdppcGrowth.

• all columns should still be present in the returned tibble (although there should only be five rows).

Fourth, use the initial function to write a fourth function which

• returns the five best (highest) popGrowth observations (globally).

• all columns should still be present in the returned tibble (although there should only be five rows).

Your functions should return tibble (or tbl) objects with columns (in this order): country, continent, year, lifeExp, pop, gdpPercap, prevPeriod, lifeExpGrowth, popGrowth and gdppcGrowth.
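The second, third and fourth functions reduce to a filter and two sort-then-truncate steps. The sketch below is a simplification under stated assumptions: the function names are hypothetical, and each function takes an already-computed growth tibble as its argument rather than calling the first function internally, so that the example stays self-contained. The toy data is made up and carries only the columns needed here.

```r
library(dplyr)

# Toy growth table (hypothetical values; real output would have all ten columns)
growth <- tibble(
  country     = c("A", "A", "B", "B", "C", "C", "D"),
  year        = c(1957, 1962, 1957, 1962, 1957, 1962, 1957),
  popGrowth   = c(10, 12, 5, 7, 1, 20, 3),
  gdppcGrowth = c(2, -10, 4, 8, -3, 6, 0)
)

# ASSUMPTION: hypothetical names; the required names are not shown in the text
country_rows <- function(growth, country_name) {
  filter(growth, country == country_name)        # keeps columns and row order
}

worst_gdppc <- function(growth) {
  growth %>% arrange(gdppcGrowth) %>% slice(1:5) # five lowest gdppcGrowth
}

best_pop <- function(growth) {
  growth %>% arrange(desc(popGrowth)) %>% slice(1:5) # five highest popGrowth
}

one   <- country_rows(growth, "B")
worst <- worst_gdppc(growth)
best  <- best_pop(growth)
print(worst)
print(best)
```

Using arrange() followed by slice() keeps every column and relies only on dplyr verbs, which matches the package restriction stated in the assignment.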

You may wish to create your functions in RStudio, in which case you can download the gapminder dataset with this command: gm <- data.table::fread. This file can be loaded into an R session any way you like.

Your function(s) must retain the provided names shown below, must be written in R, and should not utilize additional packages beyond dplyr (which you can assume to be loaded). Feel free to use RStudio or another R interface to write and debug your function. Only submit the functions for grading. Any additional code outside of your functions will not be used or graded. Ensure that you return the requested type. Also ensure you use only functions from the requested package: dplyr.

Dplyr Flights Analysis

Recall the flights data on flights from New York airports. Using the dplyr library, write two functions that each take a tibble (or tbl) object called flights (which we will supply to your function) with columns for carrier, distance, arr_delay, and all the columns you have used before.

Your first task is to

• group the flights by carrier,

• then calculate the average distance of flights by carrier as a new column called avg_dist,

• finally, sort the output in descending order by average flight distance (so largest average distances first).

Your function should return a tibble (or tbl) with two columns (in this order): carrier and avg_dist. There should be exactly one row for each carrier.
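The first task maps directly onto a group_by/summarise/arrange pipeline. The sketch below is one possible shape, not the graded solution: the function name avg_dist_by_carrier is hypothetical (the required name is not shown in the text), and the toy flights tibble uses made-up values.

```r
library(dplyr)

# Toy stand-in for the flights data (hypothetical values)
flights <- tibble(
  carrier  = c("AA", "AA", "UA", "UA", "DL"),
  distance = c(1000, 2000, 500, 700, 3000)
)

# ASSUMPTION: hypothetical function name
avg_dist_by_carrier <- function(flights) {
  flights %>%
    group_by(carrier) %>%                    # one group per carrier
    summarise(avg_dist = mean(distance)) %>% # one row per carrier
    arrange(desc(avg_dist))                  # largest average distance first
}

res <- avg_dist_by_carrier(flights)
print(res)
```

summarise() drops the single grouping level, so the result is an ungrouped tibble with exactly the two columns carrier and avg_dist.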

Your second task is to

• group the flights by dest and carrier,

• then calculate the median of arr_delay by destination and carrier as a new column called median_arr_delay,

• finally, sort the output in ascending order by median_arr_delay (lowest delays first).

Your function should return a tibble (or tbl) with three columns (in this order): dest, carrier and median_arr_delay. There should be exactly one row for each dest and carrier pair.
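The second task follows the same pattern with two grouping columns. In the sketch below the function name is hypothetical, the toy data is made up, and using na.rm = TRUE is an assumption: the text does not say how missing arrival delays should be handled.

```r
library(dplyr)

# Toy stand-in for the flights data (hypothetical values, one missing delay)
flights <- tibble(
  dest      = c("LAX", "LAX", "LAX", "ORD", "ORD"),
  carrier   = c("AA", "AA", "UA", "UA", "UA"),
  arr_delay = c(10, 30, -5, NA, 4)
)

# ASSUMPTION: hypothetical function name
median_delay_by_dest_carrier <- function(flights) {
  flights %>%
    group_by(dest, carrier) %>%
    # ASSUMPTION: missing delays are ignored when taking the median
    summarise(median_arr_delay = median(arr_delay, na.rm = TRUE)) %>%
    ungroup() %>%                  # drop the remaining grouping by dest
    arrange(median_arr_delay)      # lowest median delay first
}

res <- median_delay_by_dest_carrier(flights)
print(res)
```

With two grouping columns, summarise() leaves the result grouped by dest, so the explicit ungroup() returns a plain tibble before sorting.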

You may wish to create your functions in RStudio, in which case you can download the flights dataset here: flights.csv. This file can be loaded into an R session any way you like. Hint: you may want to make sure it is a tbl by using as.tbl() on the data.frame or data.table object you get from fread() or read.csv().

Your function(s) must retain the provided names shown below, must be written in R, and should not utilize additional packages beyond dplyr (which you can assume to be loaded). Feel free to use RStudio or another R interface to write and debug your function. Only submit the functions for grading. Any additional code outside of your functions will not be used or graded. Ensure that you return the requested type.

Also ensure you use only functions from the requested package: dplyr.

For additional discussion about this question please ask in this GitHub issue.

Parallel To-Upper

R contains two helpful vectors letters and LETTERS with, respectively, all lowercase and uppercase letters. Let us assume LETTERS did not exist.

Starting from the letters vector, write a function that uses mclapply() to turn each lower-case letter in the vector letters into an upper-case letter in a new vector, in parallel. Let us also assume that we do not know that toupper() (which is vectorized) exists. Instead use a function oneup(), which might be defined as

oneup <- function(x) toupper(x[1])

and which works on one character object at a time. Write a function which takes in two arguments:

• letters, a character vector (which could be the same as the R variable, or a different vector),

• cores, the number of (cpu) cores to use.

Then, in parallel, pass the elements of letters into oneup() to re-create LETTERS. The function should return a character vector equal to LETTERS.

Your solution should use mclapply(). You may assume that cores will be a positive integer. You may also assume that the following packages are loaded and available for your use:

• parallel

Please be extra careful when testing locally for this problem, as there are differences across operating systems, Windows in particular. If you're running into issues, use RStudio Cloud to work on this problem.

Your function must be called par_toupper and must be written in R. Feel free to use RStudio or another R interface to write and debug your function.
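A minimal sketch of par_toupper, using the oneup() definition given above, might look as follows. It assumes a Unix-like system: mclapply() relies on forking, and on Windows it only accepts mc.cores = 1.

```r
library(parallel)

# Helper from the problem statement: upper-cases a single character object
oneup <- function(x) toupper(x[1])

par_toupper <- function(letters, cores) {
  # mclapply() applies oneup() to each element across `cores` forked workers
  # and returns a list; unlist() flattens it back into a character vector
  unlist(mclapply(letters, oneup, mc.cores = cores))
}

# Usage on a Unix-like system; on Windows pass cores = 1
res <- par_toupper(letters, 2)
print(identical(res, LETTERS))
```

mclapply() returns its results in the same order as the input, so the flattened vector lines up element by element with the original letters.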
