Statistical programming for data science

Reference no: EM131711609

Does Public Transit Encourage Physical Activity?

In recent years there have been a number of health interventions conducted world-wide, with the goal of encouraging adults to become more physically active on a daily basis. A motivation for such interventions is that physically active adults have been reported to have lower rates of chronic diseases as well as a reduced risk of obesity, diabetes and heart disease. However, with demanding careers and busy personal lives, many adults do not make time for regular exercise.

Active public transport has long been the focus of health initiatives, as it involves walking to bus or tram stops or train stations, as well as cycling using public bike lanes and roadways. In some countries active public transport has become public policy - for example, in Denmark, walking and cycling are popular modes of public transport and now form part of the government outdoor recreation strategy.

In 2010-2011 the NYC Department of Health and Mental Hygiene conducted the Physical Activity and Transit Survey (PAT). The PAT Survey consisted of three parts:

1. A telephone survey of physical activity and health
2. A weeklong accelerometer device component for a sub-sample of participants; and
3. A weeklong GPS device component.

The data can be downloaded from the COMP 5070 website.

Your analysis and conclusions in your report should focus on whether the use of public transport encourages physical activity. You have been provided with a number of data sets that you are free to use as you see fit, provided the requested components (below) are included. You can focus on one file or merge data files - wherever your analyst's brain takes you.

For the report you are free to analyse whichever aspects of the dataset(s) interest you, with the following caveats:

1. Movement achieved via public transport must be part of the analysis.

2. MVPA (Moderate to Vigorous Physical Activity) must form part of your analysis.

3. The report must focus on answering 2 questions of interest, posed by you and answered by your data analysis.

Programming language: R is probably easier to use; however, you are free to use just R, just Python, or R and Python - whatever you need to get the job done.

The report should contain the following:

1. Report Aim: Write a short explanation of the analysis to be performed and an explanation of the question(s) you are investigating within the data (1-2 paragraphs).

2. Data Summary: Write a short explanation of the data you will analyse - e.g. demographics of interest and/or other variables you will investigate (1-2 paragraphs).

3. Data Cleaning: Write an explanation of any data cleaning performed (including merging) to enable the analysis to be performed.

4. Descriptive Statistics: Include visualisations that support your analysis, including the 2 questions you have posed as part of your analysis.

5. Statistical Comparisons: Besides focusing on MVPA, there should be a comparative aspect of interest in your analysis, e.g. MVPA by activity type, MVPA by demographics or MVPA by day of the week. These are just suggestions - it's really up to you.

As a minimum you need to produce one comparison for each question of interest you wish to answer.
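As one possible shape for such a comparison, the sketch below computes Welch's t statistic for mean daily MVPA minutes in two groups (for example, public transport users and non-users). It is only an illustration under stated assumptions: the brief does not prescribe a test, the two lists hold made-up numbers rather than PAT survey values, and the names `welch_t`, `mvpa_transit_users` and `mvpa_non_users` are hypothetical. Python's standard library is used here because the brief permits Python; the same comparison is available in R.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean, using sample variances (n - 1).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Made-up daily MVPA minutes, for illustration only (not PAT survey data).
mvpa_transit_users = [32.0, 41.0, 28.0, 36.0, 45.0, 30.0]
mvpa_non_users = [22.0, 35.0, 19.0, 27.0, 31.0, 24.0]

t_stat, df = welch_t(mvpa_transit_users, mvpa_non_users)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Welch's version is chosen in this sketch because it does not assume equal variances in the two groups; whether a t-test is appropriate at all depends on the distribution of MVPA in the actual data.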

6. Exploratory Factor Analysis: conduct an exploratory factor analysis for the data in PublicTransportSurvey.csv - a codebook has been provided in the file PublicTransportSurvey.txt. Partial code has been provided with this take-home exam in the file TakeHomeExam_EFA.R. Please use this code as your starting point and follow the prompts given to you inside the file. In particular look for lines starting:
### !!! or ### R
as there is a question in the code you need to answer and code you need to write, respectively.

For the Exploratory Factor Analysis you need to include the following in your written report here:

- The Cronbach's alpha output and a short discussion (2-3 sentences) as to whether the data is trustworthy and why you conducted the Cronbach's alpha test.

- Correlation output of your choosing (graphical and/or numerical) with an accompanying discussion (3-4 sentences). If numerical, round the correlations to 2 digits. There is a hint below regarding rounding.

- A single paragraph explaining the outcome of the determinant test, Bartlett's test of sphericity and the KMO statistic for the data. Do not include R output.

- Your decision regarding the number of factors to estimate (scree plot may be shown, do not show the R console output).

- The FINAL factor solution you have chosen - you may wish to present this as a table. You do not need to discuss results of any other solutions tried, however you should justify your final factor solution.

- The final factor solution should include names of the factors in each analysis and an explanation as to how you came up with these names.

- Are there any survey questions you think could be dropped? Explain.

- The plot of your final factor solution. In the workshop example we plotted four in one overall graph; in this case you only need to produce one plot and it should match your chosen solution.
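To make the first two items in the list above concrete, the following sketch computes Cronbach's alpha from its standard definition and a Pearson correlation matrix rounded to 2 digits. It is a hedged illustration, not the code in TakeHomeExam_EFA.R: the responses are made up, the item names `q1` to `q3` and the helper functions are hypothetical, and Python's standard library stands in for whatever R functions the provided file uses.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items):
    """items: one list of respondent scores per survey item (equal lengths)."""
    k = len(items)
    # Total score per respondent, summed across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Made-up Likert responses: 5 respondents, 3 hypothetical survey items.
q1 = [4, 5, 3, 2, 4]
q2 = [4, 4, 3, 2, 5]
q3 = [5, 5, 2, 3, 4]
items = [q1, q2, q3]

alpha = cronbach_alpha(items)
# Correlation matrix with every entry rounded to 2 digits.
corr_matrix = [[round(pearson(a, b), 2) for b in items] for a in items]

print(f"Cronbach's alpha: {alpha:.2f}")
for row in corr_matrix:
    print(row)
```

Alpha rises when the items vary together, which is why it is read as a measure of internal consistency; the threshold for "trustworthy" is a convention to justify in the report rather than something this sketch decides.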

7. Conclusion: Draw conclusions about:
i. The use of public transport by visitors to Adelaide (1 paragraph).
ii. Your analysis overall (1 paragraph).

Data Files

https://www.dropbox.com/s/ehjdso1unzrkpjg/Exam%20and%20data%20files%20-%2011-8-2017.zip?dl=0

Verified Expert

According to this paper, respondents with a higher BMI tend to use private cars and have a stronger urge to consume carcinogens. These respondents are also more prone to chronic diseases and various other ailments. Several tests were conducted to establish these results, using data collected from the target group and interpreted with the methods mentioned above. R was used to analyse the findings.


Reviews

inf1711609

11/14/2017 4:04:19 AM

It is open. Go to the data file I sent with the take-home exam file and open it (READ FIRST, READ SECOND), then use the mouse to select any part and click. This is also just for reference: more information about the study can be found online at: https://www1.nyc.gov/site/doh/data/data-sets/physical-activity-and-transit-survey-methodology.page. (You can use the READ FIRST and READ SECOND data files with the mouse.) The data from COMP 5070 is in the attachment. My work with you is not complete, so please use the stated programming language (R (RStudio) is probably easier to use, however you are free to use just R, just Python, or R and Python, whatever you need to get the job done) to run the attached file and answer the questions inside it. Please review the whole assignment and send it back to me. I need the answer code to run in R.

len1711609

11/8/2017 2:31:24 AM

The real challenge in this exam is to conduct an analysis of your choosing, based on the specifications given to you in the exam file. Provided you follow the guidelines, you are free in many respects to drive your own analysis. The due date is in 4-5 days.

len1711609

11/8/2017 2:30:48 AM

Write a report summarising your analysis (50%). Length indicators are given within the question itself.
• Plagiarism is a specific form of academic misconduct. Although the University encourages discussing work with others and the Social Forum will support this, ultimately this submission is to represent your individual work. If plagiarism is found, all parties will be penalised. You should retain copies of all assignment computer files used during development. These files must remain unchanged after submission, for the purpose of checking if required.
Explanations are given to help you understand the requested analyses. You do not need to write a lot of specialised code - you should be able to find nearly all the code you need from the R files provided throughout the course, via case studies and other examples and your second assignment. I have also given you partial code for the last part of the analysis.

len1711609

11/8/2017 2:30:33 AM

• The take-home exam is worth 30% of your overall grade. The exam is out of 100 marks.
• The exam is to be submitted as a compressed file (e.g. .zip, .tar.gz, .gz) using Gradebook. This file should include ALL code needed to run your analysis.
• You do NOT need to include any data files provided to you, as it will be assumed I too have them. If you have created new data files, then please include those.
• To obtain the maximum available marks you should aim to:
1. Code all requested components (40%).
2. Use a clear style of code presentation (10%). Code clarity is an important part of your submission. Thus you should choose meaningful variable names and adopt the use of comments - you don't need to comment every single line, as this will affect readability - however you should aim to comment at least each section of code.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd