Reference no: EM132265739
Assignment:
Consider the SAS data file on n = 50 children from a hospital. The data set has been edited and is limited to the following variables:
- The response variable (y) is "base", a clinical measurement taken on each child.
- The main regressor variable (x) is "age", the age of each child in months.
- The first variable in the data set is "obs", the ID of each child. Do not use obs in the regression model. It is included only to help physicians identify each child.
Problem:
1. Run SAS with the "influence" option to obtain, for each observation, the diagnostics R-Student, hat diagonal, and covariance ratio. Calculate the cut-off value for each of these three influence diagnostics. Decide whether any point(s) should be removed, and justify your decisions. If you remove any observations, use only the remaining data in all subsequent analyses.
2. Run a SAS program for a linear regression of base (y) on the age of the child (x). Do not transform the data at this stage.
(a) Comment on the SAS output with respect to information on the slope and the intercept.
(b) Comment on the residual plots and any other information that could help you assess how well the assumptions for the regression are met.
(c) What proportion of the total variability in base is accounted for by the linear regression of base on age?
(d) Test for lack of fit with SAS. What can you conclude regarding the model and lack of fit?
3. Test for normality of the response variable base. Use the Shapiro-Wilk test in proc univariate, and also use a QQ plot in SAS with the associated correlation test. Do not transform the data at this stage. Comment on your findings. What are your overall conclusions on the assumption of normality of the variable base?
4. Run a SAS program for a Box-Cox transformation of base to obtain a "power" that creates a transformed variable "base" that is approximately normal. Then check the normality of the transformed variable base with the Shapiro-Wilk test in proc univariate, and also use a QQ plot in SAS.
5. Using the transformed variable base (based on your results in Problem 4) with the regressor variable age, test for lack of fit with SAS. What can you conclude regarding the model and lack of fit?
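The problems above could be approached along the following lines. This is only a minimal sketch, not a worked solution: the data set name "children", the derived names "cutoffs", "ranked", "children_t", "base_nscore" and "base_t", the significance level 0.05, and the macro value for lambda are all assumptions made for illustration. The cut-off rules shown (a t quantile for R-Student, 2p/n for the hat diagonal, 1 ± 3p/n for the covariance ratio) are commonly used conventions, so check them against the ones given in your course.

```sas
/* Assumed: data set WORK.CHILDREN with variables obs, base, age.        */
/* Replace these hypothetical names with those in the supplied file.     */
ods graphics on;

/* Problem 1: R-Student, hat diagonal, and CovRatio for each child.      */
proc reg data=children;
  id obs;                          /* obs labels rows; it is not a regressor */
  model base = age / influence;
run;
quit;

/* Common cut-offs with p = 2 parameters (intercept and slope).          */
data cutoffs;
  n = 50;                          /* update n if observations are removed */
  p = 2;
  rstudent_cut = tinv(1 - 0.05/2, n - p - 1); /* alpha = 0.05 is assumed */
  hat_cut      = 2*p/n;            /* flag hat diagonal above this value   */
  covratio_lo  = 1 - 3*p/n;        /* flag CovRatio below this value ...   */
  covratio_hi  = 1 + 3*p/n;        /* ... or above this value              */
run;

proc print data=cutoffs;
run;

/* Problem 2: untransformed regression with the lack-of-fit test.        */
/* With ODS Graphics on, PROC REG also produces residual plots.          */
proc reg data=children;
  model base = age / lackfit;
run;
quit;

/* Problem 3: Shapiro-Wilk test and normal QQ plot for base.             */
proc univariate data=children normal;
  var base;
  qqplot base / normal(mu=est sigma=est);
run;

/* QQ-plot correlation: correlate base with its normal scores.           */
proc rank data=children normal=blom out=ranked;
  var base;
  ranks base_nscore;
run;

proc corr data=ranked;
  var base base_nscore;
run;

/* Problem 4: Box-Cox search over a grid of candidate powers.            */
proc transreg data=children;
  model boxcox(base / lambda=-2 to 2 by 0.1) = identity(age);
run;

/* Placeholder only: substitute the power reported by PROC TRANSREG.     */
%let lambda = 0.5;

data children_t;
  set children;
  /* Box-Cox form; assumes base > 0 for every child.                     */
  if &lambda = 0 then base_t = log(base);
  else base_t = (base**&lambda - 1) / &lambda;
run;

proc univariate data=children_t normal;
  var base_t;
  qqplot base_t / normal(mu=est sigma=est);
run;

/* Problem 5: lack-of-fit test for the transformed response.             */
proc reg data=children_t;
  model base_t = age / lackfit;
run;
quit;
```

Two points are worth keeping in mind when adapting this. The lack-of-fit test relies on repeated values of age to estimate pure error, so it is only informative if several children share the same age. The correlation from proc corr is not a test by itself; it has to be compared with a table of critical values for the normal probability plot correlation at the chosen sample size.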
Attachment: Assignment File.rar