Reference no: EM132369780
Assignment
Read the instructions carefully. You will use the dataset below, BRFSS 2013 dataset HW1, and the data dictionary. This is a real public-use dataset. Notice that it is large and messy! You will complete your assignment in a Word document. Please be sure your homework document is neat and easy to read. Paste only the SPSS output requested.
Instructions: Use the BRFSS 2013 dataset located under Homework assignment #1. Pay close attention to variable names. Remember that you can switch the view within an SPSS command box to show variable names instead of labels by right-clicking and selecting the option.
You can also sort alphabetically by right-clicking and selecting the option. Answer all questions in complete sentences and in terms of the problem. Reference the BRFSS 2013 data dictionary for the coding of each variable. BMI3 is a variable I calculated to restrict the BMI values included in the data to the range 12 to 49.99.
1. The outcome variable will be BMI3. Explore the following variables together in one command (only run Explore once): BMI3, ADDEPEV2, ASTHMA3, @_RFHYPE5, CHCCOPD1, CHCKIDNY, CHCOCNCR, CHCSCNCR, CVDCRHD4, CVDINFR4, CVDSTRK3, DIABETE3, @_DRDXAR1, TOLDHI2, and SEX. This will take your computer a while, so sit back and relax.
a. Fill out the table below. Use two decimal places.
| Variable | Mean | Median | Skewness | Kurtosis |
|---|---|---|---|---|
| BMI3 |  |  |  |  |
| ADDEPEV2 |  |  |  |  |
| ASTHMA3 |  |  |  |  |
| @_RFHYPE5 |  |  |  |  |
| CHCCOPD1 |  |  |  |  |
| CHCKIDNY |  |  |  |  |
| CHCOCNCR |  |  |  |  |
| CHCSCNCR |  |  |  |  |
| CVDCRHD4 |  |  |  |  |
| CVDINFR4 |  |  |  |  |
| CVDSTRK3 |  |  |  |  |
| DIABETE3 |  |  |  |  |
| @_DRDXAR1 |  |  |  |  |
| TOLDHI2 |  |  |  |  |
| SEX |  |  |  |  |
b. How many valid observations and how many missing observations are shown in the output?
c. Since BMI3 is the outcome variable, we need to check whether it is normally distributed and appropriate to use for linear regression. Produce a histogram of BMI3 using the Chart Builder. Paste it below. Describe its shape and range, and reference its skewness and kurtosis. Using only these tools, is BMI3 approximately normally distributed?
d. With regard to BMI3, what does the Kolmogorov-Smirnov test for normality tell us? Paste that table below. Is this particular test reliable for this dataset? Why or why not?
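For intuition about the skewness and kurtosis values referenced in part c, here is a minimal Python sketch of the usual adjusted sample formulas (the kurtosis is "excess" kurtosis, so a normal distribution scores about 0). It is an illustration only: the function names and the toy values are hypothetical, the numbers are not taken from the BRFSS file, and whether your SPSS output matches these formulas exactly should be treated as an assumption to verify.

```python
import math


def _z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]


def sample_skewness(values):
    """Adjusted sample skewness; needs at least 3 values."""
    n = len(values)
    z3 = sum(z ** 3 for z in _z_scores(values))
    return n / ((n - 1) * (n - 2)) * z3


def sample_excess_kurtosis(values):
    """Adjusted sample excess kurtosis; needs at least 4 values."""
    n = len(values)
    z4 = sum(z ** 4 for z in _z_scores(values))
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * z4 - correction


# Hypothetical BMI-like values, for illustration only.
toy_bmi = [19.5, 22.1, 24.8, 26.3, 27.0, 29.4, 31.2, 35.6, 41.9]
print(round(sample_skewness(toy_bmi), 2), round(sample_excess_kurtosis(toy_bmi), 2))
```

A positive skewness points to a longer right tail, and a positive excess kurtosis points to heavier tails than a normal curve would have.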
2. Run a linear regression predicting BMI3 from ADDEPEV2, ASTHMA3, @_RFHYPE5, CHCCOPD1, CHCKIDNY, CHCOCNCR, CHCSCNCR, CVDCRHD4, CVDINFR4, CVDSTRK3, DIABETE3, @_DRDXAR1, TOLDHI2, and SEX. Under Statistics, select Estimates, Confidence intervals (95%), Model fit, Collinearity diagnostics, and Durbin-Watson. Under Plots, select Histogram and Normal probability plot, and produce all partial plots. Under Save, select Unstandardized predicted values, Standardized residuals, and Mahalanobis distance. Under Options, be sure "Exclude cases listwise" is selected under Missing Values. Use the Enter method.
a. What is the R² for the fitted model?
b. Interpret the R² for the model using a complete sentence.
c. What is the F-value for the ANOVA for the model?
d. What is the p-value in the ANOVA table, and is it significant at the alpha = 0.001 level?
e. Fill in the table below from the coefficients output.
| Variable name | Beta estimate | p-value | VIF |
|---|---|---|---|
| Constant |  |  |  |
| ADDEPEV2 |  |  |  |
| ASTHMA3 |  |  |  |
| @_RFHYPE5 |  |  |  |
| CHCCOPD1 |  |  |  |
| CHCKIDNY |  |  |  |
| CHCOCNCR |  |  |  |
| CHCSCNCR |  |  |  |
| CVDCRHD4 |  |  |  |
| CVDINFR4 |  |  |  |
| CVDSTRK3 |  |  |  |
| DIABETE3 |  |  |  |
| @_DRDXAR1 |  |  |  |
| TOLDHI2 |  |  |  |
| SEX |  |  |  |
f. Which variables are NOT significant?
g. Interpret every significant predictor of BMI3. Use a separate sentence for each significant predictor. Be sure to write your interpretations using the language and units related to the variable (i.e. do not use the SPSS variable name and do not use the word units).
h. Find the Mahalanobis distance (M-distance). What is the minimum, maximum, mean, and standard deviation of the M-distance?
i. What is the chi-square critical value to test for outliers using the M-distance?
j. How many observations have an M-distance greater than the critical value? (You may want to complete the instructions in question 3 first to make answering this question easier.)
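The logic behind parts i and j can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the common convention of testing M-distance against a chi-square distribution with degrees of freedom equal to the number of predictors (14 here) at alpha = 0.001, neither of which is specified above, so confirm both against your course notes. The function names and the toy distances are hypothetical, and the closed-form survival function used here only works for even degrees of freedom.

```python
import math


def chi2_survival_even_df(x, df):
    """P(X > x) for a chi-square variable with a positive even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = 1.0   # the k = 0 term of the series
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! step by step
        total += term
    return math.exp(-half) * total


def chi2_critical_even_df(alpha, df, upper=1000.0):
    """Bisect for the value whose upper-tail probability equals alpha."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_survival_even_df(mid, df) > alpha:
            lo = mid   # tail still too large, so the cutoff lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0


# Assumed convention: df = number of predictors, alpha = 0.001.
critical = chi2_critical_even_df(alpha=0.001, df=14)

# Hypothetical M-distances, for illustration only.
toy_distances = [3.2, 11.7, 14.9, 36.5, 52.0, 8.4]
n_outliers = sum(d > critical for d in toy_distances)
print(round(critical, 3), n_outliers)
```

In SPSS itself, the same comparison is made against the saved Mahalanobis distance variable; the sketch just makes the cutoff-and-count step explicit.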
3. Create a new dataset with only those observations that have an M-distance less than the chi-square critical value. You can do this a couple of ways. One way is to create a new variable that defines whether an observation should be included, and then use "Select Cases" under the Data menu. Choose "If condition is satisfied" and keep only the cases where your newly defined variable marks the observation for inclusion. Under Output, copy selected cases to a new dataset, and name your new dataset BRFSS2013new. Be sure to save this new dataset.
a. Run another linear regression following the instructions for question 2, but unselect everything under the Save option menu. How many observations were used in the analysis?
b. How much did the R-square change?
c. Fill in the table below with the new coefficients from the output.
| Variable name | Beta estimate | p-value | VIF |
|---|---|---|---|
| Constant |  |  |  |
| ADDEPEV2 |  |  |  |
| ASTHMA3 |  |  |  |
| @_RFHYPE5 |  |  |  |
| CHCCOPD1 |  |  |  |
| CHCKIDNY |  |  |  |
| CHCOCNCR |  |  |  |
| CHCSCNCR |  |  |  |
| CVDCRHD4 |  |  |  |
| CVDINFR4 |  |  |  |
| CVDSTRK3 |  |  |  |
| DIABETE3 |  |  |  |
| @_DRDXAR1 |  |  |  |
| TOLDHI2 |  |  |  |
| SEX |  |  |  |
d. Did any of the beta estimates change by more than 10%? Fill in the table below in order to identify which ones and by what percent increase or decrease their beta estimate changed.
| Variable name | old | new | new/old | % change | % |
|---|---|---|---|---|---|
| Constant |  |  |  |  |  |
| ADDEPEV2 |  |  |  |  |  |
| ASTHMA3 |  |  |  |  |  |
| @_RFHYPE5 |  |  |  |  |  |
| CHCCOPD1 |  |  |  |  |  |
| CHCKIDNY |  |  |  |  |  |
| CHCOCNCR |  |  |  |  |  |
| CHCSCNCR |  |  |  |  |  |
| CVDCRHD4 |  |  |  |  |  |
| CVDINFR4 |  |  |  |  |  |
| CVDSTRK3 |  |  |  |  |  |
| DIABETE3 |  |  |  |  |  |
| @_DRDXAR1 |  |  |  |  |  |
| TOLDHI2 |  |  |  |  |  |
| SEX |  |  |  |  |  |
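The arithmetic for part d can be sketched as follows. This is a minimal illustration, not a required method: the function name and the two example estimates are hypothetical and are not values from the BRFSS output. It treats the percent change as (new/old - 1) × 100, which is one common reading of the table columns above.

```python
def percent_change(old, new):
    """Return the new/old ratio and the signed percent change from old to new."""
    if old == 0:
        raise ValueError("percent change is undefined when the old estimate is 0")
    ratio = new / old
    return ratio, (ratio - 1.0) * 100.0


# Hypothetical beta estimates (old, new), for illustration only.
toy_estimates = {
    "predictor_a": (-1.20, -1.38),
    "predictor_b": (0.80, 0.82),
}

for name, (old, new) in toy_estimates.items():
    ratio, pct = percent_change(old, new)
    flag = "more than 10%" if abs(pct) > 10 else "within 10%"
    print(f"{name}: new/old = {ratio:.3f}, change = {pct:+.1f}% ({flag})")
```

Note that when the old estimate is negative, a ratio above 1 means the estimate moved further from zero, so read the sign of the percent change together with the sign of the coefficient.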
e. Check the coding in the data dictionary to identify which value corresponds to having each diagnosis. You can search the data dictionary by pressing Ctrl+F and typing the variable name, or change the target column in the Variable View. Show your calculations by hand (you can insert a picture of handwritten work or use the "Insert > Equation" option in Microsoft Word).
Predict the BMI for:
i. A male who has depressive disorder, asthma, high blood pressure, COPD, other types of cancer, skin cancer, angina, heart attack, stroke, diabetes, arthritis, high cholesterol, and no kidney disease.
ii. A female who does not have any of the characteristics in part i (no kidney disease as well).
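The hand calculation in part e follows the regression equation: predicted BMI = constant + the sum of each beta estimate times that person's coded value. The Python sketch below shows only the structure of that calculation. Everything numeric in it is hypothetical: the intercept, the coefficients, and the codes are placeholders for three of the predictors, not values from the output or from the data dictionary, so substitute your own estimates and the codes you looked up.

```python
def predict_bmi(intercept, coefficients, coded_values):
    """Intercept plus the sum of coefficient * coded value for every predictor."""
    missing = set(coefficients) - set(coded_values)
    if missing:
        raise KeyError(f"no coded value supplied for: {sorted(missing)}")
    return intercept + sum(b * coded_values[name] for name, b in coefficients.items())


# Placeholder numbers only; replace with your coefficients table and the
# codes from the data dictionary. A full answer would include all 14 predictors.
toy_intercept = 30.0
toy_coefficients = {"SEX": -0.5, "ASTHMA3": -1.0, "CHCKIDNY": -0.25}
toy_person = {"SEX": 1, "ASTHMA3": 1, "CHCKIDNY": 2}

print(round(predict_bmi(toy_intercept, toy_coefficients, toy_person), 2))
```

Because each predictor enters as its numeric code, using the wrong code for "yes" or "no" changes the prediction, which is why the coding check comes first.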