Assignment
Questions
In this assignment, we will examine a random subset of a dataset that contains information on athletes who competed in an event at the "Olympics" (both winter and summer versions) over the 120-year history of the modern games.
1. Consider the variable Weight in the "Olympics" dataset, which records the weight of the athlete.
(a) Use appropriate graphical displays and measures of centrality and dispersion to summarise the Weight variable. Provide a reasonable explanation for why the Weight data might have the distribution you observe.
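As a rough illustration of the kind of summary part (a) asks for, the following R sketch computes two candidate measures of centrality and two of dispersion, and draws a histogram and a boxplot. The data are simulated, the vector name `weight` is hypothetical, and kilograms are assumed as the unit; none of this comes from the "Olympics" dataset itself.

```r
# Simulated stand-in for the Weight column (hypothetical data, not the assignment's).
set.seed(1)
weight <- c(rnorm(450, mean = 70, sd = 12), rnorm(50, mean = 105, sd = 15))

# Candidate measures of centrality.
weight_mean   <- mean(weight)
weight_median <- median(weight)

# Candidate measures of dispersion.
weight_sd  <- sd(weight)
weight_iqr <- IQR(weight)

print(c(mean = weight_mean, median = weight_median, sd = weight_sd, IQR = weight_iqr))

# Graphical displays: a histogram for overall shape, a boxplot for outliers.
hist(weight, breaks = 30, main = "Simulated athlete weight", xlab = "Weight (kg, assumed unit)")
boxplot(weight, horizontal = TRUE, xlab = "Weight (kg, assumed unit)")
```

Which pair of measures is most appropriate depends on the shape seen in the real data; the sketch only shows how the quantities can be obtained.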
(b) For the most appropriate measure of centrality and measure of dispersion you have selected for Weight, produce a table of the form shown below that presents:
• the particular measures (i.e., statistics) you have chosen,
• those measures (i.e., statistics) as calculated for the variable Weight,
• the jackknife and bootstrap estimators for those statistics,
• the jackknife and bootstrap standard errors for those statistics, and
• the jackknife and bootstrap estimates of bias for those statistics.
Do these measures of centrality and dispersion appear to be biased or unbiased estimators?
Measure of Centrality: <Name of measure of centrality>
Value: <Value of measure of centrality when applied to original data>

|                | Jackknife | Bootstrap |
| Estimator      |           |           |
| Standard error |           |           |
| Bias           |           |           |

Measure of Dispersion: <Name of measure of dispersion>
Value: <Value of measure of dispersion when applied to original data>

|                | Jackknife | Bootstrap |
| Estimator      |           |           |
| Standard error |           |           |
| Bias           |           |           |

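One possible way to fill in a table of this form is sketched below in base R. It is a minimal sketch under stated assumptions: the data are simulated, the names `weight` and `statistic` are hypothetical, the median is used purely as an example statistic, and "estimator" is taken to mean the bias-corrected jackknife estimate and the mean of the bootstrap replicates. Course notes may define these slightly differently, so the definitions should be checked against them.

```r
# Hypothetical data standing in for Weight.
set.seed(2)
weight <- rgamma(200, shape = 20, rate = 0.28)

statistic <- median            # example only; swap in mean, sd, IQR, mad, ...
n <- length(weight)
theta_hat <- statistic(weight) # statistic on the original data

# Jackknife: recompute the statistic with each observation left out in turn.
leave_one_out <- sapply(seq_len(n), function(i) statistic(weight[-i]))
jack_mean <- mean(leave_one_out)
jack_bias <- (n - 1) * (jack_mean - theta_hat)
jack_estimator <- theta_hat - jack_bias   # bias-corrected estimate (assumed definition)
jack_se <- sqrt((n - 1) / n * sum((leave_one_out - jack_mean)^2))

# Bootstrap: recompute the statistic on B resamples drawn with replacement.
B <- 2000
boot_reps <- replicate(B, statistic(sample(weight, size = n, replace = TRUE)))
boot_estimator <- mean(boot_reps)         # mean of replicates (assumed definition)
boot_bias <- boot_estimator - theta_hat
boot_se <- sd(boot_reps)

results <- data.frame(
  row.names = c("Estimator", "Standard error", "Bias"),
  Jackknife = c(jack_estimator, jack_se, jack_bias),
  Bootstrap = c(boot_estimator, boot_se, boot_bias)
)
print(round(results, 3))
```

Because `statistic` is an ordinary function, the same code can be rerun for the chosen measure of dispersion by changing that one assignment.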
(c) Produce graphical displays of the sampling distributions of the measure of centrality and measure of dispersion you have selected for Weight. Comment on the shapes of these distributions.
Additionally, produce a 95% bootstrap percentile confidence interval for both your measure of centrality and measure of dispersion and interpret them. If there is anything unusual about the 95% bootstrap percentile confidence intervals, comment on that.
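A bootstrap percentile interval takes the 2.5th and 97.5th percentiles of the bootstrap replicates as its endpoints. The sketch below shows this in base R for the median, again on simulated data with the hypothetical name `weight`; the number of resamples `B` is an arbitrary choice.

```r
# Hypothetical data standing in for Weight.
set.seed(3)
weight <- rgamma(200, shape = 20, rate = 0.28)
n <- length(weight)

# Bootstrap replicates of the median.
B <- 5000
boot_medians <- replicate(B, median(sample(weight, size = n, replace = TRUE)))

# Approximate sampling distribution of the statistic.
hist(boot_medians, breaks = 40,
     main = "Bootstrap distribution of the median", xlab = "Median weight")

# 95% percentile interval: the 2.5th and 97.5th percentiles of the replicates.
ci_95 <- quantile(boot_medians, probs = c(0.025, 0.975))
print(ci_95)
```

For statistics such as the median, the replicates can only take values near the middle order statistics of the sample, so the histogram may look lumpy and the interval endpoints may coincide with observed data values.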
2. Now consider the relationship between athlete weight (Weight) and the type of medal won (Medal). Consider carefully the categories of Medal present in the dataset.
(a) Clearly and accurately state the
• linearity,
• independence,
• normality, and
• equal variances (i.e., homoscedasticity)
assumptions of linear regression as they pertain to these data, and assess them for a linear model of Weight on Medal. This assessment should include reference to appropriate graphical displays.
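The graphical displays commonly used for this kind of assessment can be produced from a fitted `lm` object, as in the sketch below. The data frame `olympics_sim` is simulated, and the level label "None" is an assumption about how athletes without a medal might be coded; the real dataset may record them differently.

```r
set.seed(4)
# Hypothetical data; assumes athletes without a medal form their own level "None".
medal_levels <- c("None", "Bronze", "Silver", "Gold")
olympics_sim <- data.frame(
  Medal = factor(sample(medal_levels, 300, replace = TRUE,
                        prob = c(0.85, 0.05, 0.05, 0.05)),
                 levels = medal_levels)
)
olympics_sim$Weight <- rlnorm(300, meanlog = log(70), sdlog = 0.2)

# Linear model of Weight on the categorical predictor Medal.
weight_fit <- lm(Weight ~ Medal, data = olympics_sim)

par(mfrow = c(2, 2))
plot(weight_fit, which = 1)   # residuals vs fitted: mean structure, equal variances
plot(weight_fit, which = 2)   # normal Q-Q plot of residuals: normality
plot(weight_fit, which = 3)   # scale-location: equal variances
boxplot(Weight ~ Medal, data = olympics_sim,
        main = "Weight by medal group")   # spread within each group
par(mfrow = c(1, 1))
```

Independence is usually argued from how the data were collected rather than read from a plot, so it needs a written justification alongside these displays.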
(b) Consider common transformations of the data and present the form of the linear model which you believe would be best when attempting to assess the relationship between Weight and Medal. Present and discuss relevant diagnostic plots for assessing the assumptions of linear regression for this model, clearly noting any violations of assumptions that may still exist.
(c) Assuming that the model presented in Part (b) is wholly appropriate (i.e., there are no violations of the assumptions of linear regression), provide a table of relevant R output for that model and comment on whether there is a significant "effect" of winning a medal on the weight of an athlete. If so, how would you interpret this "effect"?
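For illustration only, the sketch below fits a model with a log-transformed response and extracts the coefficient table from `summary()`. The log transformation is just one common choice and is not claimed to be the best model for the real data; `olympics_sim` and the "None" level are hypothetical, as before.

```r
set.seed(5)
# Hypothetical data; "None" is an assumed label for athletes without a medal.
medal_levels <- c("None", "Bronze", "Silver", "Gold")
olympics_sim <- data.frame(
  Medal = factor(sample(medal_levels, 300, replace = TRUE), levels = medal_levels)
)
olympics_sim$Weight <- rlnorm(300, meanlog = log(70), sdlog = 0.2)

# Example transformed model: log(Weight) on Medal.
log_fit <- lm(log(Weight) ~ Medal, data = olympics_sim)

# Coefficient table: estimate, standard error, t value and p-value per term.
coef_table <- summary(log_fit)$coefficients
print(round(coef_table, 4))

# On the log scale, exp(coefficient) is a multiplicative ratio relative to
# the baseline level ("None"), e.g. 1.05 would mean about 5% heavier.
ratio_vs_none <- exp(coef(log_fit)[-1])
print(round(ratio_vs_none, 3))

# Overall F-test for any difference between medal groups.
print(anova(log_fit))
```

With a different transformation, the back-transformation used for interpretation changes accordingly.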
3. To assess the level of preservatives used in mass-produced breads, researchers randomly sampled four loaves of bread of different brands and varieties that are stocked by a large supermarket chain. They let the loaves sit in a controlled environment at 27°C until mould appeared. The number of days until mould appeared for each of the four loaves of bread is as follows:
2 3 8 6
By hand (i.e., no computer allowed), calculate the jackknife estimator and standard error of the median time until mould appears, showing all working. Is the estimator unbiased?
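For reference, the standard jackknife quantities for a statistic computed on n observations (here n = 4 and the statistic is the median) can be written as below. The notation is an assumption and may differ from the course notes; in particular, some texts call the average of the leave-one-out values the jackknife estimator, while others use the bias-corrected version.

```latex
% Leave-one-out statistic: the median with observation i removed
\hat{\theta}_{(i)} = \operatorname{median}\bigl(x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n\bigr),
\qquad
\hat{\theta}_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(i)}

% Jackknife estimate of bias and the bias-corrected estimator
\widehat{\mathrm{bias}}_{\mathrm{jack}} = (n - 1)\bigl(\hat{\theta}_{(\cdot)} - \hat{\theta}\bigr),
\qquad
\hat{\theta}_{\mathrm{jack}} = \hat{\theta} - \widehat{\mathrm{bias}}_{\mathrm{jack}}
  = n\,\hat{\theta} - (n - 1)\,\hat{\theta}_{(\cdot)}

% Jackknife standard error
\widehat{\mathrm{se}}_{\mathrm{jack}} =
  \sqrt{\frac{n - 1}{n} \sum_{i=1}^{n} \bigl(\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)}\bigr)^{2}}
```

Here the symbol without a subscript, \hat{\theta}, denotes the median of all n observations.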
4. Presentation marks:
These marks are allocated based on:
• structure, clarity, and tidiness of presented solutions/answers,
• correctness in spelling and grammar, and
• readability of R code (which includes usage of informative variable names and commenting).
Attachment: olympics.rar