Reference no: EM132391547
Assignment
Overview
It is time to start getting familiar with R! Remember the 17 years of Ontario library data from Assignment 1? Well, now it's time to start analyzing it! Data scientists very rarely work with data that arrives in a clean, analysis-ready structure, which makes data preparation one of the most underrated steps in the analytics process. In this assignment you will get to roll up your sleeves and start to see the power of R!
Purpose
This assessment is designed to get you thinking about data structures and coding in R. You will get an introduction to some of the most common tasks data scientists have to perform before any analysis can be done, such as gathering data, joining data sets, consolidating columns, and creating new data. In this assessment you will achieve the following course learning outcomes:
• Apply sound principles and practices of data manipulation, validation, and transformation leveraging R programming
• Program at a basic level using R programming language
• Explore data analysis techniques such as correlation, scatter plots, crosstab analysis, distribution of data, and outlier analysis
Instructions
You will be required to submit your R code so that the instructor can reproduce all your output, starting from the data import step (assume that I have already downloaded the library data sets to my own machine).
1. Import all the data sets into R and save them in separate objects.
2. Create an object that merges all the files into one object. This is a tip on how to be efficient with your data: instead of keeping 17 separate objects, why not store them in one object?
3. Write a sequence of code that creates a single data set that can be used to output a table listing the number of libraries in each city for each of the last 17 years.
|       | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | ... |
| City1 |      |      |      |      |      |      |     |
| City2 |      |      |      |      |      |      |     |
| City3 |      |      |      |      |      |      |     |
| City4 |      |      |      |      |      |      |     |
| ...   |      |      |      |      |      |      |     |
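The city-by-year layout above can be sketched in base R as a cross-tabulation. This is only a minimal illustration on toy data, not the expected solution: the data frame `libraries` and its columns `year`, `city`, and `library` are hypothetical stand-ins, and the real files will use their own column names.

```r
# Toy stand-in for the combined data set; all names here are hypothetical.
libraries <- data.frame(
  year    = c(1999, 1999, 1999, 2000, 2000, 2000, 2000),
  city    = c("Ajax", "Ajax", "Barrie", "Ajax", "Barrie", "Barrie", "Barrie"),
  library = c("Ajax Main", "Ajax East", "Barrie Central",
              "Ajax Main", "Barrie Central", "Barrie South", "Barrie South"),
  stringsAsFactors = FALSE
)

# Keep one row per library per year so repeated rows are not counted twice.
distinct_rows <- unique(libraries[, c("year", "city", "library")])

# Cross-tabulate: one row per city, one column per year.
libraries_per_city <- table(city = distinct_rows$city,
                            year = distinct_rows$year)
print(libraries_per_city)
```

Whether duplicates need to be removed first depends on how the real data records each library, so that step is an assumption worth checking against the actual files.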
4. Write a sequence of code that shows the total number of active cardholders for each library in each of the last 17 years.
|          | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | ... |
| Library1 |      |      |      |      |      |      |     |
| Library2 |      |      |      |      |      |      |     |
| Library3 |      |      |      |      |      |      |     |
| Library4 |      |      |      |      |      |      |     |
| ...      |      |      |      |      |      |      |     |
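A library-by-year table of totals like the one above can be sketched with `tapply()`, which sums a value within each combination of two grouping columns. This is a hedged example on toy data: `cardholder_data` and its columns `year`, `library`, and `active_cardholders` are hypothetical names, not the ones in the real files.

```r
# Toy stand-in for the combined data set; all names here are hypothetical.
cardholder_data <- data.frame(
  year    = c(1999, 2000, 1999, 2000, 2000, 2000),
  library = c("Ajax Main", "Ajax Main", "Barrie Central",
              "Barrie Central", "Barrie Central", "Cobourg"),
  active_cardholders = c(1200, 1300, 900, 500, 450, 300),
  stringsAsFactors = FALSE
)

# Sum cardholders within each library/year pair.
# Rows are libraries, columns are years; pairs with no data come back as NA.
cardholders <- tapply(
  cardholder_data$active_cardholders,
  list(library = cardholder_data$library, year = cardholder_data$year),
  sum
)
print(cardholders)
```

Leaving missing library/year pairs as `NA` rather than 0 keeps "no data reported" distinct from "zero cardholders"; `xtabs()` is a base R alternative that fills those cells with 0 instead.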
5. Write a sequence of code that lists the top 5 libraries with the highest average Total Operating Revenues from 2012 to 2017.
6. Submit your full R code, ensuring the professor is able to reproduce all your output. Include comment lines throughout your code describing what each block of code is for.
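As a rough illustration of how steps 1, 2, and 5 fit together, the sketch below imports several yearly files, stacks them into one data frame, and ranks libraries by average revenue. Everything specific in it is an assumption: the file names (`libraries_<year>.csv`), the columns `library` and `total_operating_revenue`, and the toy files it writes to a temporary folder so that it runs on its own. It also holds the yearly data frames in a named list, which is one possible reading of "separate objects".

```r
# Write three tiny yearly files to a temporary folder so the sketch is
# self-contained; the real files, names, and columns will differ.
data_dir <- file.path(tempdir(), "library_demo")
dir.create(data_dir, showWarnings = FALSE)
for (yr in 2011:2013) {
  toy <- data.frame(
    library = paste("Library", LETTERS[1:6]),
    total_operating_revenue = (1:6) * 1000 + (yr - 2011) * 100
  )
  write.csv(toy, file.path(data_dir, paste0("libraries_", yr, ".csv")),
            row.names = FALSE)
}

# Step 1: import every file, one data frame per year, held in a named list.
files <- list.files(data_dir, pattern = "^libraries_[0-9]{4}\\.csv$",
                    full.names = TRUE)
yearly <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(yearly) <- sub("^libraries_([0-9]{4})\\.csv$", "\\1", basename(files))

# Step 2: tag each data frame with its year, then stack them into one object.
# rbind() needs the same column names in every file.
yearly <- Map(function(df, yr) {
  df$year <- as.integer(yr)
  df
}, yearly, names(yearly))
all_years <- do.call(rbind, yearly)
rownames(all_years) <- NULL

# Step 5: average revenue per library over 2012-2017, then keep the top 5.
recent <- all_years[all_years$year >= 2012 & all_years$year <= 2017, ]
avg_revenue <- aggregate(total_operating_revenue ~ library,
                         data = recent, FUN = mean)
top5 <- head(avg_revenue[order(-avg_revenue$total_operating_revenue), ], 5)
print(top5)
```

If the real files do not share identical columns, `rbind()` will stop with an error, so the columns would need to be aligned or renamed before stacking.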