Obtain a Pearson correlation matrix relating variables count

Reference no: EM132374126

MATH 4044 - Statistics for Data Science Assignment - Capital BikeShare

Bike sharing systems are a new generation of bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, with some of the best and largest found in Hangzhou (China), Paris (France), London (England), New York City (US) and Montreal (Canada). There is great interest in these systems because of their role in addressing traffic congestion, environmental impact and population health issues in big cities.

The data for this assignment comes from one such program, called Capital Bikeshare, operating in Washington in the US. It has over 3000 bicycles that can be rented from over 350 stations across Washington, D.C., Arlington and Alexandria, VA and Montgomery County, MD. Their website encourages users to check out bikes for a trip to work, to run errands, go shopping, or visit friends and family. Users can join Capital Bikeshare for one to three days (casual membership), or for a month or a year (registered membership). Access to the Capital Bikeshare fleet of bikes is available 24 hours a day, 365 days a year. The first 30 minutes of each trip are free.

You will use data derived from Capital Bikeshare trip records to build a statistical model for the purposes of predicting the number of rentals per day.

References and Data Sources:

1. Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.

2. Fanaee-T, Hadi, and Gama, Joao, 'Event labeling combining ensemble detectors and background knowledge', Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg.

Data files for this assignment:

The main data file for this assignment is called daily.sas7bdat and contains daily counts of bike rentals for 2011 and 2012, derived from Capital Bikeshare trip history data, with additional weather and seasonal information. The data was downloaded from the UCI Machine Learning Repository. Variables in that file are as follows:

Variable      Description
instant       Record index
dteday        Date
season        Winter, spring, summer or fall (northern hemisphere)
yr            0 = 2011, 1 = 2012
month         Month (January to December)
weekday       Day of the week (Monday to Sunday)
workingday    Working day = 1, weekend or public holiday = 0
temp          Normalised temperature in degrees Celsius; observed temperature divided by 41 (max)
atemp         Normalised 'feels like' temperature in degrees Celsius; values divided by 50 (max)
hum           Normalised humidity; observed values divided by 100 (max)
windspeed     Normalised wind speed; values divided by 67 (max)
casual        Count of casual users
registered    Count of registered users
count         Total count of bike rentals (casual plus registered)
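Because the weather variables are stored as normalised values, it can help to convert them back to their raw scales when interpreting results. The following is a minimal Python sketch of that arithmetic, using only the divisors quoted in the variable descriptions. The assignment itself is to be completed in SAS; the function name `denormalise` and the example record are hypothetical and are not taken from the data file.

```python
# Divisors quoted in the variable descriptions for daily.sas7bdat.
SCALE = {"temp": 41.0, "atemp": 50.0, "hum": 100.0, "windspeed": 67.0}


def denormalise(record):
    """Return a copy of the record with weather values back on their raw scales."""
    restored = dict(record)
    for name, divisor in SCALE.items():
        if name in restored:
            # normalised value = raw value / divisor, so multiply to undo it
            restored[name] = restored[name] * divisor
    return restored


# Hypothetical record for illustration only; not taken from the data file.
example = {"temp": 0.5, "atemp": 0.5, "hum": 0.8, "windspeed": 0.25, "count": 4000}
print(denormalise(example))
```

Variables that are not in `SCALE`, such as `count`, pass through unchanged.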

The second file for this assignment is called random_sample.xlsx and it can be downloaded from the Data Files folder on the course website. The file contains a stratified sample of bike rentals taken from the Capital Bikeshare trip history data for the second quarter of 2012. Variables in that file are as follows:

Variable         Description
Duration         Trip duration, in seconds
Start_date       Date and time stamp for the beginning of the trip
Start_station    Address for the location from which the bike was rented
End_date         Date and time stamp for the end of the trip
End_station      Address for the location to which the bike was returned
Bike_number      Bike identification number
User_type        Type of user (casual or registered)

Assignment tasks:

Question 1 -

(a) Use SAS to study the distribution of the total daily number of rentals. Obtain measures of location, dispersion, skewness and kurtosis. Obtain a boxplot, histogram and a quantile-quantile plot. Also carry out Normal goodness-of-fit tests. What are the key features of this distribution?
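To make the requested measures concrete, here is a minimal Python sketch of how location, dispersion, skewness and kurtosis are defined. It is only an illustration: the assignment must be carried out in SAS, the function name `describe` is hypothetical, and the counts are made up rather than drawn from daily.sas7bdat. The skewness and kurtosis here use the simple moment formulas; SAS applies small-sample adjustments by default, so its figures will differ slightly.

```python
import statistics


def describe(values):
    """Basic measures of location, dispersion and shape for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments with an n denominator (simple moment estimators).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample SD, n - 1 denominator
        "skewness": m3 / m2 ** 1.5,           # 0 for a symmetric distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a Normal distribution
    }


# Made-up daily rental counts, for illustration only.
counts = [800, 1500, 3200, 4500, 4600, 5100, 6200, 7400]
for name, value in describe(counts).items():
    print(f"{name}: {value:.3f}")
```

Negative skewness indicates a longer left tail, and negative excess kurtosis indicates lighter tails than a Normal distribution.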

(b) Now use SAS to obtain boxplots of the total daily number of rentals according to season and by type of day (working day vs weekend or public holiday). What do the boxplots suggest about the pattern, if any, of bike rentals?

(c) In 2012, the east coast of the United States was struck by Hurricane Sandy. Is this severe weather event evident in your results? Provide a relevant graph to support your answer.

Question 2 -

(a) Obtain a Pearson correlation matrix relating variables count, atemp, temp, hum and windspeed. Also obtain a scatterplot matrix of the same variables. Discuss the relationships.
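For reference, a Pearson correlation matrix simply collects the pairwise correlation coefficient for every pair of variables. The Python sketch below shows the calculation under stated assumptions: it is not a substitute for the SAS analysis the assignment asks for, the function names are hypothetical, and the six rows of data are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented values for illustration only; not taken from daily.sas7bdat.
data = {
    "count": [1500, 2600, 4200, 5100, 5600, 5400],
    "atemp": [0.20, 0.30, 0.45, 0.55, 0.65, 0.70],
    "temp": [0.18, 0.29, 0.46, 0.57, 0.68, 0.74],
    "hum": [0.80, 0.62, 0.55, 0.60, 0.48, 0.70],
    "windspeed": [0.30, 0.22, 0.18, 0.15, 0.20, 0.12],
}
matrix = correlation_matrix(data)
for row_name, row in matrix.items():
    print(row_name.ljust(10), " ".join(f"{r:6.2f}" for r in row.values()))
```

The matrix is symmetric with ones on the diagonal, so only the entries above (or below) the diagonal need to be discussed.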

(b) Fit a simple regression model relating count to atemp, with count as the dependent variable, and determine the residuals from this regression. Discuss the fitted relationship and the goodness of fit. Examine residual plots and influence diagnostics and comment on the residual behaviour.
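As a reminder of what a simple regression produces, the sketch below fits a least-squares line in plain Python and returns the slope, intercept, residuals and R-square. It is an illustration only: the assignment requires SAS, the function name `fit_simple_regression` is hypothetical, and the data values are invented.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus fitted value.
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, residuals, r_squared


# Invented values for illustration only; not taken from daily.sas7bdat.
atemp = [0.20, 0.30, 0.45, 0.55, 0.65, 0.70]
count = [1500, 2600, 4200, 5100, 5600, 5400]
slope, intercept, residuals, r_squared = fit_simple_regression(atemp, count)
print(f"slope={slope:.1f} intercept={intercept:.1f} R-square={r_squared:.3f}")
print("residuals:", [round(r, 1) for r in residuals])
```

Because atemp is normalised, the slope is the change in predicted count for a one-unit change on the 0 to 1 scale, which corresponds to 50 degrees on the raw scale. With an intercept in the model, the residuals always sum to zero, so it is their pattern against the fitted values and other variables that is informative.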

(c) Obtain a correlation matrix relating the residuals from part (b) to variables temp, hum and windspeed. Comment on these correlations. What do they tell you about the importance of these variables for predicting the daily count of bike rentals?

(d) Using the correlations in part (c) identify a set of potential explanatory variables. Regress count on your selection of variables. Discuss the fitted relationship and the goodness of fit. Also examine and discuss residual patterns.

(e) Extend your multiple regression model from part (d) to include categorical predictors. You can use stepwise selection to help you find the most parsimonious (simplest) model with the highest R-square. Report and interpret in detail only your final model, but do indicate how it was obtained and why it was considered the 'best'.

In building your model consider as many potential explanatory variables as possible (you may need to define additional dummy variables). Be sure to check, and if necessary correct, for collinearity.
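One common collinearity check is the variance inflation factor (VIF). In the special case of exactly two predictors it reduces to 1 / (1 - r^2), where r is their Pearson correlation. The Python sketch below illustrates that special case only; with more predictors, each VIF comes from regressing one predictor on all the others, which SAS can report directly. The function names and data values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them.

    Undefined (division by zero) if the predictors are perfectly correlated.
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Invented values for illustration only; not taken from daily.sas7bdat.
temp = [0.18, 0.29, 0.46, 0.57, 0.68, 0.74]
atemp = [0.20, 0.30, 0.45, 0.55, 0.65, 0.70]
print(f"VIF for temp and atemp: {vif_two_predictors(temp, atemp):.1f}")
```

A VIF of 1 means the predictors are uncorrelated; a frequently used rule of thumb treats values above about 10 as a sign of problematic collinearity.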

Question 3 -

(a) Upload the data file random_sample.xlsx into a folder of your choice in your home directory on the SAS server and then use the import procedure to convert the data file into a SAS table. The code snippet shown below assumes that the Excel data file was uploaded directly to the home directory in SAS Studio, and proc print is used to check that the data was converted correctly into SAS format:

(b) Is there a statistically significant difference in duration of bike trips by casual versus registered users? If so, which trips are typically longer? Check the necessary conditions and perform an appropriate hypothesis test. Should it be a two-sample or a paired t-test? You may need to use a transformation (e.g. log) in order to justify performing a t-test on this data. Justify your choices and discuss your results.
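The sketch below illustrates one possible form of the test statistic, assuming the two user groups are treated as independent samples and the durations are log-transformed first. It computes Welch's unequal-variance t statistic and its approximate degrees of freedom in plain Python. It is an illustration rather than the required SAS analysis: the function name `welch_t` is hypothetical, the durations are invented, and the p-value is left to statistical software because the standard library has no t distribution.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(sample_b)
    se_sq = var_a / n_a + var_b / n_b      # squared standard error of the difference
    diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    t_stat = diff / math.sqrt(se_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t_stat, df


# Invented trip durations in seconds, for illustration only.
casual = [1800, 2400, 3600, 5400, 900]
registered = [420, 600, 780, 540, 960]

# Log transform to reduce the right skew typical of duration data.
log_casual = [math.log(d) for d in casual]
log_registered = [math.log(d) for d in registered]

t_stat, df = welch_t(log_casual, log_registered)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A positive t statistic here means the first group (casual) has the larger mean log duration. On the log scale, a difference in means corresponds to a ratio of geometric means on the original scale.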

Question 4 -

Write a summary of your findings from Questions 1 to 3. Keep the technical details of the analyses that led you to these conclusions to the absolute minimum. Rather, focus on practical significance and present your findings in non-specialist terms. A few paragraphs (up to a page) will be sufficient.

Note - Please include screenshots of SAS graphs where needed, each followed by text explaining it, as the questions require. There is no need to explain a graph if the question does not ask for it.


Instructions:

This assignment is worth 25% of your final grade. It is due no later than 11pm on Sunday 22 September, at the end of Week 8. You will need to submit your assignment via Learnonline. There is no need to include a cover sheet, as one is generated automatically by the Learnonline system.

The submitted assignment needs to be a single file, in either Microsoft Word (doc or docx) or pdf format. The assignment is out of 120 marks. To achieve maximum marks for each question, you should aim to:

Complete the requested statistical analysis in SAS using appropriate tasks or procedures. (40%)
Provide and interpret only the output most relevant to the question. Do not include every piece of output produced by SAS! (40%)
Discuss the results in the context of the question. (20%)

Assignments submitted late, without an extension being granted, will attract a penalty of 10 marks for each day, or part of a day, beyond the due date and time.
