Reference no: EM132394236
Continent Total Populations
Continuing our work with the gapminder dataset. Recall that you can get the full dataset as a data.table object for local use with the following command:
gm <- data.table::fread.
When grading your function, we will provide this dataset as a data.table object called gm.
The gm data.table object contains several columns: country, continent, year, lifeExp (for life expectancy), pop (for population) and gdpPercap.
Write a function named gmTotalPop that does the following:
• Given a supplied input year, yr, select the subset corresponding to that year. As a simplification you can assume that only valid arguments for yr will be used.
• Grouped per continent, create a summary variable totpop which sums population for the given year and continent.
• Return a data.table object with two columns, continent and totpop, in that order, sorted by totpop in descending order (i.e. highest first).
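The three steps above map directly onto a single data.table call. The sketch below is one possible approach, not an official solution. It assumes, as stated above, that `gm` is a data.table already present in the calling environment rather than a function argument. It converts `pop` to double before summing because continent totals can exceed R's integer limit (about 2.1 billion) even when every individual country's population fits.

```r
library(data.table)

# Sketch: `gm` is assumed to be the gapminder data.table supplied by the
# grader in the calling environment; it is not passed as an argument.
gmTotalPop <- function(yr) {
  # Keep only rows for the requested year, then sum population per continent.
  # as.numeric() avoids integer overflow: a continent's total can exceed
  # .Machine$integer.max even when each country's pop fits in an integer.
  res <- gm[year == yr, .(totpop = sum(as.numeric(pop))), by = continent]
  # Sort by total population, largest first (setorder() works by reference).
  setorder(res, -totpop)
  res
}
```

For example, `gmTotalPop(2007)` would return one row per continent, with the most populous continent first.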
Your function must be called 'gmTotalPop' and must be written in R. Feel free to use RStudio or another R interface to write and debug your function. Only submit the function for grading. Any additional code outside of your function will not be used or graded.
For additional discussion about this question please ask in this GitHub issue.
student.R:
```r
# Enter your code below
#
# Do not alter the function signature: ensure it remains named 'gmTotalPop'
```
• adds columns lifeExpGrowth, popGrowth, gdppcGrowth for, respectively, (arithmetic) growth in life expectancy, population and per-capita GDP as a percent;
• omits missing values (in growth rates) by passing the object to na.omit() (a base R function whose documentation you can consult).
Second, write a function which, using your first function,
• takes in a character variable with a country name, and
• returns data for just that country
• with the same columns and the same row order as before.
Third, use the initial function to write a third function which
• returns the five worst (lowest) observations (globally) in terms of gdppcGrowth.
• all columns should still be present in the returned tibble (although there should only be five rows).
Fourth, use the initial function to write a fourth function which
• returns the five best (highest) popGrowth observations (globally).
• all columns should still be present in the returned tibble (although there should only be five rows).
Your functions should return tibble (or tbl) objects with columns (in this order): country, continent, year, lifeExp, pop, gdpPercap, prevPeriod, lifeExpGrowth, popGrowth and gdppcGrowth.
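The four functions described above can be sketched with dplyr as follows. This copy of the assignment does not preserve the required function names or the opening of the first task, so several things here are assumptions: the names `gm_growth`, `gm_country`, `gm_worst_gdppc` and `gm_best_pop` are hypothetical placeholders, the data is passed in as an argument `gm`, `prevPeriod` is taken to be the year of the country's previous observation, and "arithmetic growth as a percent" is read as `100 * (x - previous) / previous`. The later functions all reuse the first one, as the task requires.

```r
library(dplyr)

# Hypothetical names: the required function names were not preserved here.
# Assumed: prevPeriod = year of the previous observation for that country;
# growth = arithmetic percent change from that previous observation.
gm_growth <- function(gm) {
  gm %>%
    arrange(country, year) %>%   # gapminder is normally already in this order
    group_by(country) %>%        # lag() below then works within each country
    mutate(
      prevPeriod    = lag(year),
      lifeExpGrowth = 100 * (lifeExp - lag(lifeExp)) / lag(lifeExp),
      popGrowth     = 100 * (pop - lag(pop)) / lag(pop),
      gdppcGrowth   = 100 * (gdpPercap - lag(gdpPercap)) / lag(gdpPercap)
    ) %>%
    ungroup() %>%
    select(country, continent, year, lifeExp, pop, gdpPercap,
           prevPeriod, lifeExpGrowth, popGrowth, gdppcGrowth) %>%
    na.omit()  # drops each country's first row, which has no previous period
}

# Second function: rows for one country, same columns and row order.
gm_country <- function(gm, cname) {
  gm_growth(gm) %>% filter(country == cname)
}

# Third function: the five lowest gdppcGrowth observations overall.
gm_worst_gdppc <- function(gm) {
  gm_growth(gm) %>% arrange(gdppcGrowth) %>% slice(1:5)
}

# Fourth function: the five highest popGrowth observations overall.
gm_best_pop <- function(gm) {
  gm_growth(gm) %>% arrange(desc(popGrowth)) %>% slice(1:5)
}
```

Using `arrange()` followed by `slice(1:5)` guarantees at most five rows; `slice_min()`/`slice_max()` would also work in recent dplyr, but by default they keep ties and may return more than five rows.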
You may wish to create your functions in RStudio, in which case you can download the gapminder dataset with this command: gm <- data.table::fread. This file can be loaded into an R session any way you like.
Your function(s) must retain the provided names shown below, must be written in R, and should not utilize additional packages beyond dplyr (which you can assume to be loaded). Feel free to use RStudio or another R interface to write and debug your function. Only submit the functions for grading. Any additional code outside of your functions will not be used or graded. Ensure that you return the requested type. Also ensure you use only functions from the requested package: dplyr.
Dplyr Flights Analysis
Recall the flights data on flights from New York airports. Using the dplyr library, write two functions that take a tibble (or tbl) object called flights (which we will supply to your function) with columns for carrier, distance, arr_delay, and all the columns you have used before.
Your first task is to
• group the flights by carrier,
• then calculate the average distance of flights by carrier as a new column called avg_dist,
• finally, sort the output in descending order by average flight distance (so largest average distances first).
Your function should return a tibble (or tbl) with two columns (in this order): carrier and avg_dist. There should be exactly one row for each carrier.
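The first task is a standard group, summarise, arrange pipeline. A minimal sketch follows; the function name `avg_dist_by_carrier` is a hypothetical placeholder, since the required names are not preserved in this copy of the assignment.

```r
library(dplyr)

# Hypothetical name: replace with the function name the assignment requires.
avg_dist_by_carrier <- function(flights) {
  flights %>%
    group_by(carrier) %>%                    # one group per carrier
    summarise(avg_dist = mean(distance)) %>% # one row per carrier
    arrange(desc(avg_dist))                  # largest average distance first
}
```

Because `summarise()` here drops the single grouping variable, the result is a plain tibble with exactly the two columns carrier and avg_dist.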
Your second task is to
• group the flights by dest and carrier,
• then calculate the median of arr_delay by destination and carrier as a new column called median_arr_delay,
• finally, sort the output in ascending order by median_arr_delay (lowest delays first).
Your function should return a tibble (or tbl) with three columns (in this order): dest, carrier and median_arr_delay. There should be exactly one row for each dest and carrier pair.
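The second task follows the same pattern with two grouping variables. In the sketch below, the function name `median_arr_delay_by_dest_carrier` is a hypothetical placeholder, and two choices are assumptions rather than requirements stated above: missing arrival delays (for example from cancelled flights) are dropped with `na.rm = TRUE`, and `arr_delay` is converted to double so that every group's median has the same type.

```r
library(dplyr)

# Hypothetical name: replace with the function name the assignment requires.
median_arr_delay_by_dest_carrier <- function(flights) {
  flights %>%
    group_by(dest, carrier) %>%
    # Assumption: NA delays are ignored; as.numeric() keeps the result type
    # consistent, since median() of an integer vector can be integer or double.
    summarise(median_arr_delay = median(as.numeric(arr_delay), na.rm = TRUE)) %>%
    ungroup() %>%                 # summarise() leaves the data grouped by dest
    arrange(median_arr_delay)     # lowest median delay first
}
```

The explicit `ungroup()` returns an ungrouped tibble with one row per dest and carrier pair, which works in both older and newer dplyr versions.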
You may wish to create your functions in RStudio, in which case you can download the flights dataset here: flights.csv. This file can be loaded into an R session any way you like. Hint: you may want to make sure it is a tbl by using as.tbl() on the data.frame or data.table object you get from fread() or read.csv().
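One way to load a local copy is sketched below. The helper name `load_flights` and the default path are assumptions for illustration. It uses `as_tibble()`, which dplyr re-exports from the tibble package; in recent dplyr versions `as.tbl()` is deprecated in favour of `as_tibble()`, although both produce a tibble.

```r
library(dplyr)

# Hypothetical helper: reads a local flights.csv and returns a tibble.
load_flights <- function(path = "flights.csv") {
  # read.csv() gives a data.frame; as_tibble() converts it to a tibble (tbl_df).
  as_tibble(read.csv(path, stringsAsFactors = FALSE))
}
```

After `flights <- load_flights()`, the object can be passed straight to the two functions above for local testing.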
Your function(s) must retain the provided names shown below, must be written in R, and should not utilize additional packages beyond dplyr (which you can assume to be loaded). Feel free to use RStudio or another R interface to write and debug your function. Only submit the functions for grading. Any additional code outside of your functions will not be used or graded. Ensure that you return the requested type.
Also ensure you use only functions from the requested package: dplyr.
For additional discussion about this question please ask in this GitHub issue.
Parallel To-Upper
R contains two helpful vectors letters and LETTERS with, respectively, all lowercase and uppercase letters. Let us assume LETTERS did not exist.
Starting from the letters vector, write a function that uses mclapply() to turn each lowercase letter in the vector letters into an uppercase letter in a new vector, in parallel. Let us also assume that we do not know that toupper() (which is vectorized) exists. Instead, use a function oneup(), which might be defined as
oneup <- function(x) toupper(x[1])
and which works on one character object at a time. Write a function which takes in two arguments:
• letters, a character vector (which could be the same as the R variable, or a different vector),
• cores, the number of (cpu) cores to use.
Then, in parallel, pass the elements of letters into oneup() to re-create LETTERS. The function should return a character vector equal to LETTERS.
Your solution should use mclapply(). You may assume that cores will be a positive integer. You may also assume that the following packages are loaded and available for your use:
• parallel
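Please be extra careful when testing locally for this problem, as behavior differs across operating systems, particularly on Windows: mclapply() relies on forking processes, which Windows does not support, so there it cannot use more than one core. If you run into issues, use RStudio Cloud to work on this problem.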
Your function must be called par_toupper and must be written in R. Feel free to use RStudio or another R interface to write and debug your function.
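A minimal sketch of par_toupper is shown below, under the assumption that oneup() may not be supplied by the grader; it is therefore defined inside the function so the submission is self-contained. mclapply() returns a list in the same order as its input, so flattening that list with unlist() recreates LETTERS when the input is letters.

```r
library(parallel)

par_toupper <- function(letters, cores) {
  # oneup() as suggested in the assignment: upper-cases one element at a time.
  # Defined locally so the function does not depend on outside code.
  oneup <- function(x) toupper(x[1])
  # Apply oneup() to each element, spreading the work over `cores` forked
  # processes; the returned list keeps the input order.
  res <- mclapply(letters, oneup, mc.cores = cores)
  # Flatten the list of length-one character vectors into a character vector.
  unlist(res)
}
```

On Linux or macOS, `identical(par_toupper(letters, 2), LETTERS)` should be `TRUE`; on Windows only `cores = 1` will work, as noted above.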