Create a new variable that categorizes blood HDL

Reference no: EM131031673

Project: Multiple Linear Regression

Use the dataset m2project2016.dta for this project. The dataset is a sample from the 1999-2002 National Health and Nutrition Examination Survey (NHANES). A race-stratified random sample was selected from a complete dataset for the variables included, and then modified to remove missing data. The random sampling based on race was set up to result in a race distribution similar to that of the US population. Also, variable names are modified (so you won't find the variable names on the NHANES website), and in some cases continuous variables are categorized, or categories are combined for categorical variables. The data are restricted to participants with no reported history of heart disease and no reported use of prescription drugs for hyperlipidemia or inflammation. The dietary intakes are based on a 24-hour diet recall interview.

The objective is to examine the nature of the relationship between fish intake and blood HDL-cholesterol levels (if any), and evaluate the confounding influence of other variables. Submit the following items for this project using the Dropbox:

1. A project report organized according to the steps listed below. The insertion of program output into the report should be limited and clearly tied to the discussion of results. Figures (graphs) can be inserted into the report document or posted as separate files. All tables should be incorporated into the report document.

2. A program file with all the program commands used to complete the project (do-file in Stata). The commands should be ordered according to the list of tasks below and comments included describing the data analysis step the commands are used for.

Data Analysis Steps:

1. Create a new variable that categorizes blood HDL into quartiles: <=40 mg/dL, 41 to 48 mg/dL, 49 to 59 mg/dL, and >59 mg/dL. To make sure you get the proper HDL categories, you can use the following Stata code:
gen hdlcat = 1 if hdl>0 & hdl<=40
replace hdlcat = 2 if hdl>40 & hdl<=48
replace hdlcat = 3 if hdl>48 & hdl<=59
replace hdlcat = 4 if hdl>59 & hdl<.

2. Conduct descriptive analyses and bivariate association tests between HDL quartiles and other variables so as to complete Tables 1 and 2. Note that the categorized variable for blood HDL needs to be used for Tables 1 and 2, but the continuous HDL variable is used for the multiple regression analyses. Conduct the appropriate tests to evaluate bivariate associations of characteristics with HDL quartile (as a categorical variable). When the characteristic is numerical (e.g., BMI), test for linear and quadratic trends across the HDL quartiles. See the completed analyses for fish consumption and race in Table 1 below as examples.
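
One possible way to run the tests described in step 2 is sketched below. It assumes the hdlcat variable from step 1 already exists; tfish is the fish variable named later in the assignment, while race is a hypothetical variable name that should be replaced with the actual name in the dataset.

```stata
* Sketch only: assumes hdlcat was created in step 1.
* "race" is a hypothetical variable name; substitute the name used in the dataset.

* Numerical characteristic: mean, SD, and n within each HDL category
tabstat tfish, by(hdlcat) statistics(mean sd n)

* Overall ANOVA F-test across the four HDL categories
anova tfish hdlcat

* Orthogonal polynomial contrasts: linear and quadratic (and cubic) trend tests
contrast p.hdlcat, noeffects

* Categorical characteristic: column percentages and chi-square test
tabulate race hdlcat, column chi2
```

The same pattern can be repeated for each numerical and categorical characteristic in Tables 1 and 2.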

3. Create new variables to mean-center the following continuous variables: age, bmi, tkcal, tprot, tcarb, tchol, tfat, tfibe, tvc, tsele, tg, and ldl. Keep the original variables.
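
A minimal sketch of the mean-centering in step 3 follows. The variable list comes from the step itself; the c_ prefix for the new variables is only an assumed naming convention.

```stata
* Sketch only: the c_ prefix is an assumed naming convention, not a requirement.
foreach v of varlist age bmi tkcal tprot tcarb tchol tfat tfibe tvc tsele tg ldl {
    * meanonly stores the sample mean in r(mean) without printing output
    quietly summarize `v', meanonly
    * New centered variable; the original variable is left untouched
    generate c_`v' = `v' - r(mean)
}

* Each centered variable should now have a mean of (approximately) zero
summarize c_*
```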

4. Use a saturated model with no interactions (a saturated model includes all predictors) and perform an initial evaluation of collinearity using the variance inflation factor.

Saturated Model
Dependent Variable: HDL

Predictors: gender, centered-age, race, education, centered bmi, centered dietary energy, centered dietary protein, centered dietary carbohydrate, centered dietary cholesterol, centered dietary fat, centered dietary fiber, centered dietary vitamin C, centered dietary selenium, alcohol, physical activity, dietary fish, centered blood triglyceride, centered blood LDL, smoking status

Identify the predictors that, when removed from the model, eliminate the collinearity. First remove predictors other than tfish, gender, age, bmi, pactive, and smoke to resolve collinearity (these predictors are in the first group for the Allen-Cady procedure - see below). Establish a model containing as many of the original variables as possible that does not exhibit collinearity. This will still be called model 1.
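
The collinearity check in step 4 might look roughly like the sketch below. It assumes centered variables named with a c_ prefix already exist, and race, educ, and alcohol are hypothetical variable names standing in for whatever the dataset actually uses.

```stata
* Sketch only: c_* variables are assumed to exist already;
* race, educ, and alcohol are hypothetical names for the categorical predictors.
regress hdl i.gender c_age i.race i.educ c_bmi c_tkcal c_tprot c_tcarb ///
    c_tchol c_tfat c_tfibe c_tvc c_tsele i.alcohol i.pactive tfish     ///
    c_tg c_ldl i.smoke

* Variance inflation factor for every term in the model just fitted
estat vif
```

A common rule of thumb treats a VIF above about 10 as a sign of collinearity, but the cutoff to use for this project should follow the course materials. After dropping a candidate predictor, the model is refitted and estat vif is run again.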

5. Use model 1 from step 4 above and evaluate the linearity assumption for BMI, age, fish consumption, and blood triglyceride. There should be evidence for nonlinearity for two of these four predictors. Create quadratic terms for the two variables and include them in model 1. Then re-evaluate linearity for the two variables (just the linear terms). Include in the report a description of how linearity was assessed and any graphs and relevant statistical output used both before and after inclusion of quadratic terms in the model. The model from this step is called model 2.
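
One way to assess linearity in step 5 is with component-plus-residual plots, sketched below. The predictor list is a shortened placeholder for model 1, the c_ variables are assumed to exist, gender and smoke are assumed to be numeric indicators, and c_bmi is used purely as an illustration; the sketch does not indicate which two predictors are actually nonlinear.

```stata
* Sketch only: the predictor list is a placeholder for the actual model 1,
* and c_bmi is chosen for illustration, not because it is known to be nonlinear.
regress hdl tfish gender c_age c_bmi c_tg smoke

* Component-plus-residual plot with a lowess smoother; curvature in the
* smoother relative to the straight line suggests nonlinearity
cprplot c_bmi, lowess

* Add a quadratic term for a predictor that shows curvature and refit
generate c_bmi2 = c_bmi^2
regress hdl tfish gender c_age c_bmi c_bmi2 c_tg smoke

* Re-check the linear term and test the quadratic term
cprplot c_bmi, lowess
test c_bmi2
```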

6. Use model 2 from step 5 above and evaluate the normality and homogeneity of variance assumptions for the dependent variable. A problem with normality and homogeneity of variance should be found that is mostly fixed by transforming the dependent variable (use the natural log). Include in the report a description of how the assumptions are evaluated and include any graphs and relevant statistical output used. The model from this step is called model 3.

Normally, the linearity for numerical variables would be rechecked (BMI, age, fish consumption, and blood triglyceride). This step will be skipped for the project. The two variables whose nonlinearity was fixed by inclusion of quadratic terms remain fixed when the transformed HDL is used as the dependent variable instead of HDL.
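
The residual checks in step 6 could be sketched as follows. The predictor list is a shortened placeholder for model 2, the c_ variables are assumed to exist, and rstu, rstu_ln, and lnhdl are assumed names for new variables.

```stata
* Sketch only: placeholder predictor list; rstu, rstu_ln, lnhdl are assumed names.
regress hdl tfish gender c_age c_bmi c_tg smoke

* Homogeneity of variance: residual-versus-fitted plot and Breusch-Pagan test
rvfplot, yline(0)
estat hettest

* Normality: normal quantile plot of the studentized residuals
predict rstu, rstudent
qnorm rstu

* Natural-log transform of the dependent variable, then repeat the checks
generate lnhdl = ln(hdl)
regress lnhdl tfish gender c_age c_bmi c_tg smoke
rvfplot, yline(0)
estat hettest
predict rstu_ln, rstudent
qnorm rstu_ln
```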

7. Use the DFBETA statistic with model 3 (log HDL) to evaluate and document the presence of influential data, but do not delete any data. Make a list of the influential observations (ID numbers and associated influence statistics; graphs are also useful).
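
A possible sketch for step 7 is shown below. It assumes lnhdl (log HDL) and the c_ variables already exist, uses a placeholder predictor list for model 3, and treats id as a hypothetical name for the participant ID variable. The 2/sqrt(n) cutoff is a commonly used convention, not something specified by the assignment.

```stata
* Sketch only: lnhdl and c_* are assumed to exist; "id" is a hypothetical
* name for the ID variable; the 2/sqrt(n) cutoff is a common convention.
regress lnhdl tfish gender c_age c_bmi c_tg smoke

* DFBETA for the fish coefficient (repeat for other predictors as needed)
predict dfb_tfish, dfbeta(tfish)

* Size-adjusted cutoff based on the estimation sample size
local cut = 2/sqrt(e(N))

* List the observations whose DFBETA exceeds the cutoff in absolute value
list id dfb_tfish if abs(dfb_tfish) > `cut' & !missing(dfb_tfish)

* Plot DFBETA against ID with the cutoffs marked
scatter dfb_tfish id, yline(`cut' -`cut')
```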

8. Use the transformed form of HDL and fish consumption (extra terms for a nonlinearity or not) and examine the confounding influence of covariates (extra terms for a nonlinearity or not) on the association of fish consumption and blood HDL. Do this using regression models for HDL containing only fish consumption and the one potential confounder. Complete Table 3 and describe the findings.
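
The percent change in the fish coefficient for one potential confounder can be computed roughly as sketched below. It assumes lnhdl and c_age already exist and shows only the linear fish term; if a quadratic fish term is part of the model, it would need to be added to both regressions.

```stata
* Sketch only: lnhdl and c_age are assumed to exist; only the linear
* fish term is shown here.

* Crude model: fish consumption alone
quietly regress lnhdl tfish
scalar b_crude = _b[tfish]

* Model with fish consumption plus one potential confounder (age here)
regress lnhdl tfish c_age

* Percent change in the fish coefficient relative to the crude model
display "b = " %6.3f _b[tfish] "   % change = " %5.1f 100*(_b[tfish] - b_crude)/b_crude
```

For a multilevel categorical confounder, the second regression would use factor-variable notation (for example, i.pactive) in place of c_age.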

9. Use the Allen-Cady Modified Backwards Selection procedure with Model 3 (log HDL) to reduce the number of predictors in the regression model. For the first group of predictors that are always in models, use the following predictors: total fish consumed, gender, centered-age, centered-BMI, physical activity, and smoker status (exclude any of these variables that induce collinearity, except the predictor of interest, total fish consumption). These predictors were chosen for the first group based on one of two criteria: 1) a predictor of interest (total fish consumption), or 2) some documentation in the literature for an association with blood HDL. There may be documentation for some of the other predictors having an association with blood HDL, but for this project the predictors previously listed will be used to simplify the possible final models. The ranking of covariates that is required for the second group is left to each student to perform independently. Then carry out the backward selection using p=0.1 as the retention criterion. Use Table 4 to present the stepwise results and use Table 5 to summarize the model resulting from the backward selection (include all predictors from both groups and add rows to the table or modify as needed). Write a summary that includes interpretations of the regression coefficients in terms of the association between predictor and blood HDL-cholesterol (see page 129 of VGSM for interpreting coefficients when the dependent variable is log transformed). Evaluate any ordinal predictors in the model for trends (linear and quadratic when justified), and if there is no trend, adjust p-values for multiple comparisons for multilevel categorical variables.
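
The selection in step 9 can be carried out by refitting the model by hand at each step. As a rough alternative, Stata's stepwise prefix with the hierarchical and lockterm1 options behaves similarly: it keeps the first (parenthesized) term in every model and considers dropping the remaining terms from last to first until one meets the retention criterion. The sketch below assumes lnhdl and the c_ variables exist, that gender and smoke are numeric indicators, and that the second-group order shown (most to least important) is purely illustrative. Because the stepwise prefix does not accept factor-variable notation, multilevel categorical predictors such as pactive are left out here and would need to be entered as indicator variables.

```stata
* Sketch only: the ranking of the second group is illustrative, not a
* recommendation; multilevel categorical predictors are omitted because
* stepwise does not accept factor-variable notation.
stepwise, pr(0.1) hierarchical lockterm1:                 ///
    regress lnhdl (tfish gender c_age c_bmi smoke)        ///
    c_tg c_ldl c_tfibe c_tvc c_tsele

* With a log-transformed outcome, 100*(exp(b) - 1) is the percent change in
* HDL per one-unit increase in the predictor
display "% change in HDL per unit of tfish = " %5.2f 100*(exp(_b[tfish]) - 1)
```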

10. Using the regression model selected by the Allen-Cady Modified Backwards Selection procedure, evaluate the interaction between fish consumption and gender. Use Table 6 to summarize the final model with the interaction added (add rows to the table or modify as needed), and write a summary that includes interpretations of the regression coefficients for the interaction in terms of the association between predictor and blood HDL-cholesterol. Make a graph that illustrates the interaction (or absence of interaction). Indicate whether the inclusion of the interaction modified any association of HDL-cholesterol with the other predictors in the model.
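
A sketch of the interaction analysis in step 10 follows. It assumes lnhdl and the c_ variables exist, uses a placeholder set of adjustment variables rather than the actual selected model, and picks an arbitrary grid of fish values (0 to 30 in steps of 5) for the graph.

```stata
* Sketch only: adjustment variables are placeholders for the selected model;
* the grid of fish values used for plotting is an arbitrary choice.
regress lnhdl c.tfish##i.gender c_age c_bmi i.smoke

* Test of the fish-by-gender interaction term
testparm c.tfish#i.gender

* Predicted log HDL by gender across a range of fish consumption
margins gender, at(tfish = (0(5)30))

* Roughly parallel lines suggest no interaction; diverging lines suggest one
marginsplot, xtitle("Fish consumption (meals/30 days)") ytitle("Predicted log HDL")
```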

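Table 1. Characteristics of the study sample by Blood HDL Quartiles

| Characteristic | HDL <= 40 mg/dL (n=439): Mean or % | SD | HDL 41 to 48 mg/dL (n=458): Mean or % | SD | HDL 49 to 59 mg/dL (n=433): Mean or % | SD | HDL > 59 mg/dL (n=436): Mean or % | SD | p-value |
|---|---|---|---|---|---|---|---|---|---|
| Fish Consumption (meals/30 days) | 1.9 | 3.4 | 1.5 | 3.1 | 2.1 | 3.0 | 3.0 | 4.6 | < 0.001 (a); 0.002 (c) |
| Age (years) |  |  |  |  |  |  |  |  |  |
| BMI (kg/m2) |  |  |  |  |  |  |  |  |  |
| Gender (% female) |  |  |  |  |  |  |  |  |  |
| Smoker (% yes) |  |  |  |  |  |  |  |  |  |
| Race/Ethnicity (%) |  |  |  |  |  |  |  |  | 0.005 (d) |
| White | 63.8 |  | 58.3 |  | 61.2 |  | 65.1 |  |  |
| Black | 8.9 |  | 13.1 |  | 15.2 |  | 14.9 |  |  |
| Hispanic | 27.3 |  | 28.6 |  | 23.6 |  | 20.0 |  |  |
| Physical Activity (%) |  |  |  |  |  |  |  |  |  |
| Low |  |  |  |  |  |  |  |  |  |
| Low-Moderate |  |  |  |  |  |  |  |  |  |
| High-Moderate |  |  |  |  |  |  |  |  |  |
| High |  |  |  |  |  |  |  |  |  |
| Education Level (%) |  |  |  |  |  |  |  |  |  |
| Less than HS |  |  |  |  |  |  |  |  |  |
| HS/GED |  |  |  |  |  |  |  |  |  |
| Some college |  |  |  |  |  |  |  |  |  |
| College or more |  |  |  |  |  |  |  |  |  |

a. ANOVA F-test.

b. Test for linear trend after ANOVA.

c. Test for quadratic trend after ANOVA.

d. Chi-square test.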

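Table 2. 24-Hour diet intake profile of the study sample by Blood HDL Quartile

| Dietary Factor | HDL <= 40 mg/dL (n=439): Mean or % | SE | HDL 41 to 48 mg/dL (n=458): Mean or % | SE | HDL 49 to 59 mg/dL (n=433): Mean or % | SE | HDL > 59 mg/dL (n=436): Mean or % | SE | p-value |
|---|---|---|---|---|---|---|---|---|---|
| Energy (kcal) |  |  |  |  |  |  |  |  |  |
| Protein (gm) |  |  |  |  |  |  |  |  |  |
| Carbohydrate (gm) |  |  |  |  |  |  |  |  |  |
| Fat (gm) |  |  |  |  |  |  |  |  |  |
| Cholesterol (gm) |  |  |  |  |  |  |  |  |  |
| Fiber (gm) |  |  |  |  |  |  |  |  |  |
| Vitamin C (mg) |  |  |  |  |  |  |  |  |  |
| Selenium (mcg) |  |  |  |  |  |  |  |  |  |
| Alcohol (% yes) |  |  |  |  |  |  |  |  |  |

a. ANOVA F-test.

b. Test for linear trend after ANOVA.

c. Test for quadratic trend after ANOVA.

d. Chi-square test.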

Table 3. Confounding Influence of Covariates on Fish Consumption Regression Coefficient

| Potential Confounder | b | % Change in b (a) | p-value (b) |
|---|---|---|---|
| None | 0.551 |  | < 0.001 |
| Age | 0.554 | + 0.5 | < 0.001 |
| BMI |  |  |  |
| Gender |  |  |  |
| Smoker |  |  |  |
| Race/Ethnicity |  |  |  |
| Physical Activity |  |  |  |
| Education Level |  |  |  |
| Dietary Energy |  |  |  |
| Dietary Protein |  |  |  |
| Dietary Carbohydrate |  |  |  |
| Dietary Fat |  |  |  |
| Dietary Cholesterol |  |  |  |
| Dietary Fiber |  |  |  |
| Dietary Vitamin C |  |  |  |
| Dietary Selenium |  |  |  |
| Alcohol |  |  |  |

a. (b(confounder) - b(fish))/b(fish) as %.

b. P-value for the fish beta coefficient in a model also containing the potential confounder.

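Table 4. Allen-Cady Procedure Results

| Predictor in Rank Order (a) | Step 1: Coefficient Estimate, P-Value (b) | Step 2: Coefficient Estimate, P-Value (b) | Step 3: Coefficient Estimate, P-Value (b) | Step 4: Coefficient Estimate, P-Value (b) |
|---|---|---|---|---|
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |
|  |  |  |  |  |

a. Most important to least important.

b. P-value for the beta coefficient of each predictor at each step in the backward selection, with p=0.1 as the retention criterion.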

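Table 5. Regression model for the association of blood HDL-cholesterol with fish consumption adjusting for confounding by demographic characteristics and dietary factors.

| Predictor | b | 95% CI | p-value |
|---|---|---|---|
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |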

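Table 6. Regression model for the association of blood HDL-cholesterol with fish consumption and the interaction with gender, with adjustment for confounding by demographic characteristics and dietary factors.

| Predictor | b | 95% CI | p-value |
|---|---|---|---|
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |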
