Create a new variable that categorizes blood HDL

Reference no: EM131031673

Project: Multiple Linear Regression

Use the dataset m2project2016.dta for this project. The dataset is a sample from the 1999-2002 National Health and Nutrition Examination Survey (NHANES). A race-stratified random sample was selected from a complete dataset for the variables included, and then modified to remove missing data. The random sampling based on race was set up to produce a race distribution similar to that of the US population. Also, variable names are modified (so you won't find the variable names on the NHANES website), and in some cases continuous variables are categorized, or categories are combined for categorical variables. The data are restricted to participants with no reported history of heart disease and no reported use of prescription drugs for hyperlipidemia or inflammation. The dietary intakes are based on a 24-hour diet recall interview.

The objective is to examine the nature of the relationship between fish intake and blood HDL-cholesterol levels (if any), and evaluate the confounding influence of other variables. Submit the following items for this project using the Dropbox:

1. A project report organized according to the steps listed below. The insertion of program output into the report should be limited and clearly tied to the discussion of results. Figures (graphs) can be inserted into the report document or posted as separate files. All tables should be incorporated into the report document.

2. A program file with all the program commands used to complete the project (a do-file in Stata). The commands should be ordered according to the list of tasks below, with comments describing the data analysis step each set of commands is used for.

Data Analysis Steps:

1. Create a new variable that categorizes blood HDL into quartiles: <=40 mg/dL, 41 to 48 mg/dL, 49 to 59 mg/dL, and >59 mg/dL. To make sure you get the proper HDL categories, you can use the following Stata code:
* hdl<. excludes missing values, which Stata treats as larger than any number
gen hdlcat = 1 if hdl>0 & hdl<=40
replace hdlcat = 2 if hdl>40 & hdl<=48
replace hdlcat = 3 if hdl>48 & hdl<=59
replace hdlcat = 4 if hdl>59 & hdl<.

2. Conduct descriptive analyses and bivariate association tests between HDL quartiles and other variables so as to complete Tables 1 and 2. Note that the categorized variable for blood HDL needs to be used for Tables 1 and 2, but the continuous HDL variable is used for the multiple regression analyses. Conduct the appropriate tests to evaluate bivariate associations of characteristics with HDL quartile (as a categorical variable). When the characteristic is numerical (e.g., BMI), test for linear and quadratic trends across the HDL quartiles. See the completed analyses for fish consumption and race in Table 1 below as examples.
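The tests named in this step could be run along the following lines. This is only a sketch: it assumes hdlcat has already been created as in step 1, and `race` is a hypothetical variable name that should be replaced with the name used in the dataset.

```stata
* Sketch for Tables 1 and 2 (assumes hdlcat exists; `race` is a hypothetical name)

* Table 1 style: mean and SD of a numerical characteristic by HDL quartile
tabstat tfish, by(hdlcat) statistics(n mean sd)

* Table 2 style: mean and standard error by HDL quartile
mean tkcal, over(hdlcat)

* Overall ANOVA F-test across the four quartiles
oneway tfish hdlcat

* Linear and quadratic trend: orthogonal polynomial contrasts after ANOVA
anova tfish hdlcat
contrast p.hdlcat, noeffects

* Categorical characteristic: column percentages and chi-square test
tabulate race hdlcat, column chi2
```

The same commands can be repeated for each remaining row of the tables by swapping in the relevant variable.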

3. Create new variables to mean-center the following continuous variables: age, bmi, tkcal, tprot, tcarb, tchol, tfat, tfibe, tvc, tsele, tg, and ldl. Keep the original variables.
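One way to do the centering is a loop over the listed variables. In this sketch the `c_` prefix for the new variables is an assumption, not a naming rule from the assignment.

```stata
* Mean-center each continuous variable; originals are left untouched.
* The c_ prefix is a hypothetical naming choice.
foreach v of varlist age bmi tkcal tprot tcarb tchol tfat tfibe tvc tsele tg ldl {
    quietly summarize `v', meanonly
    generate double c_`v' = `v' - r(mean)
    label variable c_`v' "`v' (mean-centered)"
}

* Check: every centered variable should have a mean of (approximately) zero
summarize c_*
```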

4. Use a saturated model with no interactions (a saturated model includes all predictors) and perform an initial evaluation of collinearity using the variance inflation factor.

Saturated Model
Dependent Variable: HDL

Predictors: gender, centered age, race, education, centered BMI, centered dietary energy, centered dietary protein, centered dietary carbohydrate, centered dietary cholesterol, centered dietary fat, centered dietary fiber, centered dietary vitamin C, centered dietary selenium, alcohol, physical activity, dietary fish, centered blood triglyceride, centered blood LDL, smoking status

Identify the predictors that, when removed from the model, eliminate the collinearity. To resolve collinearity, first remove predictors other than tfish, gender, age, bmi, pactive, and smoke (these predictors are in the first group for the Allen-Cady procedure; see below). Establish a model that contains as many of the original variables as possible and does not exhibit collinearity. This will be called model 1.
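A possible starting point for the collinearity check is sketched below. It assumes the centered variables carry a hypothetical `c_` prefix, that the categorical predictors are stored as non-negative integer codes (so the `i.` prefix applies), and that `race`, `educ`, and `alcohol` are placeholder names for variables whose actual names are not given in the assignment.

```stata
* Saturated model, no interactions.
* race, educ, alcohol are hypothetical names; c_* are the centered variables.
regress hdl i.gender c_age i.race i.educ c_bmi c_tkcal c_tprot c_tcarb ///
    c_tchol c_tfat c_tfibe c_tvc c_tsele i.alcohol i.pactive tfish    ///
    c_tg c_ldl i.smoke

* Variance inflation factors for every term in the model
estat vif

* Pairwise correlations among the dietary variables help locate the source
pwcorr c_tkcal c_tprot c_tcarb c_tfat c_tchol c_tfibe c_tsele
```

After dropping a candidate predictor, refit the model and rerun `estat vif` to confirm that the remaining inflation factors are acceptable.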

5. Use model 1 from step 4 above and evaluate the linearity assumption for BMI, age, fish consumption, and blood triglyceride. There should be evidence of nonlinearity for two of these four predictors. Create quadratic terms for the two variables and include them in model 1. Then re-evaluate linearity for the two variables (just the linear terms). Include in the report a description of how linearity was assessed and any graphs and relevant statistical output used, both before and after inclusion of quadratic terms in the model. The model from this step is called model 2.
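One common way to assess linearity is to smooth the model residuals against each numerical predictor. The sketch below assumes a hypothetical predictor list for model 1 and uses centered BMI purely as an illustration; which two predictors actually need quadratic terms has to be determined from the data.

```stata
* Placeholder for whatever predictor list survives step 4 (hypothetical)
local model1 "i.gender c_age c_bmi i.pactive i.smoke tfish c_tg"

regress hdl `model1'
predict double res1, residuals

* A flat lowess curve suggests linearity; repeat for c_age, tfish, and c_tg
lowess res1 c_bmi

* Add a quadratic term (c_bmi is used here for illustration only)
generate double c_bmi2 = c_bmi^2
regress hdl `model1' c_bmi2
test c_bmi2

* Re-check the residual pattern after including the quadratic term
predict double res2, residuals
lowess res2 c_bmi
```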

6. Use model 2 from step 5 above and evaluate the normality and homogeneity of variance assumptions for the dependent variable. A problem with normality and homogeneity of variance should be found that is mostly fixed by transforming the dependent variable (use the natural log). Include in the report a description of how the assumptions were evaluated, along with any graphs and relevant statistical output used. The model from this step is called model 3.

Normally, the linearity for numerical variables would be rechecked (BMI, age, fish consumption, and blood triglyceride). This step will be skipped for the project. The two variables whose nonlinearity was fixed by inclusion of quadratic terms remain fixed when the transformed HDL is used as the dependent variable instead of HDL.
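The normality and equal-variance checks in step 6 might look like the following. This is a sketch under assumptions: the predictor list for model 2 is a placeholder, and `lnhdl` is a hypothetical name for the log-transformed outcome.

```stata
* Placeholder predictor list for model 2 (hypothetical; add the quadratic terms)
local model2 "i.gender c_age c_bmi i.pactive i.smoke tfish c_tg"

* Diagnostics on the untransformed outcome
regress hdl `model2'
predict double r_raw, residuals
qnorm r_raw              // normal quantile plot of residuals
swilk r_raw              // Shapiro-Wilk test of normality
rvfplot, yline(0)        // residuals versus fitted values
estat hettest            // Breusch-Pagan / Cook-Weisberg test

* Same diagnostics after a natural-log transform of HDL
generate double lnhdl = ln(hdl)
regress lnhdl `model2'
predict double r_log, residuals
qnorm r_log
swilk r_log
rvfplot, yline(0)
estat hettest
```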

7. Use the DFBETA statistic with model 3 (log HDL) to evaluate and document the presence of influential data, but do not delete any data. Make a list of the influential observations (ID numbers and associated influence statistics; graphs are also useful).
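A sketch of the DFBETA check for the fish coefficient follows. The cutoff of 2/sqrt(n) is a widely used rule of thumb rather than a requirement stated in the assignment, and `id`, `lnhdl`, and the predictor list are hypothetical names.

```stata
* Placeholder predictor list for model 3 (hypothetical)
local model3 "i.gender c_age c_bmi i.pactive i.smoke tfish c_tg"

regress lnhdl `model3'

* DFBETA for the predictor of interest
predict double dfb_fish, dfbeta(tfish)

* Rule-of-thumb cutoff: |DFBETA| > 2/sqrt(n)
local cut = 2/sqrt(e(N))

* List flagged observations (id is a hypothetical identifier variable)
list id dfb_fish if abs(dfb_fish) > `cut' & !missing(dfb_fish)

* Plot DFBETA against ID with the cutoff lines drawn in
scatter dfb_fish id, yline(`cut' -`cut')
```

The `predict ..., dfbeta()` call can be repeated for other predictors if their influence statistics are also to be reported.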

8. Use the transformed form of HDL and fish consumption (with or without extra terms for nonlinearity) and examine the confounding influence of covariates (with or without extra terms for nonlinearity) on the association between fish consumption and blood HDL. Do this using regression models for HDL that contain only fish consumption and the one potential confounder. Complete Table 3 and describe the findings.
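The one-confounder-at-a-time comparison can be scripted as a loop. This sketch assumes a linear fish term only, a hypothetical outcome name `lnhdl`, and an abbreviated confounder list; the percent change follows footnote a of Table 3.

```stata
* Fish-only model: reference coefficient
quietly regress lnhdl tfish
scalar b_ref = _b[tfish]
display "None" _col(15) %8.4f b_ref

* Add one potential confounder at a time (list shortened for illustration)
foreach x in c_age c_bmi i.gender i.smoke {
    quietly regress lnhdl tfish `x'
    display "`x'" _col(15) %8.4f _b[tfish]                        ///
        _col(28) %6.1f 100*(_b[tfish] - b_ref)/b_ref              ///
        _col(40) %6.4f 2*ttail(e(df_r), abs(_b[tfish]/_se[tfish]))
}
```

Each displayed row gives the fish coefficient, its percent change from the fish-only model, and its two-sided p-value.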

9. Use the Allen-Cady Modified Backwards Selection procedure with model 3 (log HDL) to reduce the number of predictors in the regression model. For the first group of predictors, which are always in the model, use the following: total fish consumed, gender, centered age, centered BMI, physical activity, and smoker status (exclude any of these variables that induce collinearity, except the predictor of interest, total fish consumption). These predictors were chosen for the first group based on one of two criteria: 1) a predictor of interest (total fish consumption), or 2) some documentation in the literature for an association with blood HDL. There may be documentation for some of the other predictors having an association with blood HDL, but for this project the predictors previously listed will be used to simplify the possible final models. The ranking of covariates that is required for the second group is left to each student to perform independently. Then carry out the backward selection using p=0.1 as the retention criterion. Use Table 4 to present the stepwise results and use Table 5 to summarize the model resulting from the backward selection (include all predictors from both groups and add rows to the table or modify as needed). Write a summary that includes interpretations of the regression coefficients in terms of the association between predictor and blood HDL-cholesterol (see page 129 of VGSM for interpreting coefficients when the dependent variable is log transformed). Evaluate any ordinal predictors in the model for trends (linear and quadratic when justified), and if there is no trend, adjust p-values for multiple comparisons for multilevel categorical variables.
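The selection described in step 9 can be sketched as a loop that tests the lowest-ranked second-group predictor, drops it if p > 0.1, and stops at the first predictor that is retained. The ranking shown is hypothetical, `alcohol` and `lnhdl` are placeholder names, and the quadratic terms from step 5 are omitted for brevity.

```stata
* Group 1: always retained (i. coding of categorical variables is assumed)
local forced "tfish i.gender c_age c_bmi i.pactive i.smoke"

* Group 2: hypothetical ranking, most important first
local ranked "c_ldl c_tg i.alcohol c_tfibe c_tfat c_tkcal"

local stop 0
while `stop' == 0 & "`ranked'" != "" {
    quietly regress lnhdl `forced' `ranked'
    * Test only the least important predictor still in the model
    local k : word count `ranked'
    local last : word `k' of `ranked'
    quietly testparm `last'
    display "`last'" _col(20) "p = " %6.4f r(p)
    if r(p) > 0.10 {
        local ranked : list ranked - last
    }
    else {
        local stop 1
    }
}

* Final model, with the fish coefficient expressed as a percent change in HDL
regress lnhdl `forced' `ranked'
display "Percent change in HDL per fish meal: " %6.2f 100*(exp(_b[tfish]) - 1)
```

Because the outcome is the natural log of HDL, 100*(exp(b) - 1) gives the percent change in HDL per one-unit increase in a predictor.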

10. Using the regression model selected by the Allen-Cady Modified Backwards Selection procedure, evaluate the interaction between fish consumption and gender. Use Table 6 to summarize the final model with the interaction added (add rows to the table or modify as needed), and write a summary that includes interpretations of the regression coefficients for the interaction in terms of the association between predictor and blood HDL-cholesterol. Make a graph that illustrates the interaction (or absence of interaction). Indicate whether the inclusion of the interaction modified any association of HDL-cholesterol with the other predictors in the model.
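The interaction in step 10 could be fit and graphed as follows. This is a sketch under assumptions: `lnhdl` and the covariate list are hypothetical, gender is assumed to be stored as a non-negative integer code, and the fish range in `at()` is chosen only for illustration.

```stata
* Placeholder for the other predictors kept by the selection (hypothetical)
local others "c_age c_bmi i.pactive i.smoke"

* Main effects plus the fish-by-gender interaction
regress lnhdl c.tfish##i.gender `others'

* Test of the interaction term
testparm c.tfish#i.gender

* Gender-specific slopes for fish consumption
margins gender, dydx(tfish)

* Predicted ln(HDL) by gender across a range of fish intake, then plot
margins gender, at(tfish = (0(5)30))
marginsplot, xtitle("Fish meals per 30 days") ytitle("Predicted ln(HDL)")
```

Roughly parallel lines in the plot indicate little or no interaction, while diverging lines indicate that the fish association differs by gender.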

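Table 1. Characteristics of the study sample by Blood HDL Quartiles

Column groups are HDL (mg/dL) categories.

| Characteristic | <= 40 (n=439): Mean or % | SD | 41 to 48 (n=458): Mean or % | SD | 49 to 59 (n=433): Mean or % | SD | > 59 (n=436): Mean or % | SD | p-value |
|---|---|---|---|---|---|---|---|---|---|
| Fish Consumption (meals/30 days) | 1.9 | 3.4 | 1.5 | 3.1 | 2.1 | 3.0 | 3.0 | 4.6 | < 0.001 (a); 0.002 (c) |
| Age (years) |  |  |  |  |  |  |  |  |  |
| BMI (kg/m2) |  |  |  |  |  |  |  |  |  |
| Gender (% female) |  |  |  |  |  |  |  |  |  |
| Smoker (% yes) |  |  |  |  |  |  |  |  |  |
| Race/Ethnicity (%) |  |  |  |  |  |  |  |  | 0.005 (d) |
| White | 63.8 |  | 58.3 |  | 61.2 |  | 65.1 |  |  |
| Black | 8.9 |  | 13.1 |  | 15.2 |  | 14.9 |  |  |
| Hispanic | 27.3 |  | 28.6 |  | 23.6 |  | 20.0 |  |  |
| Physical Activity (%) |  |  |  |  |  |  |  |  |  |
| Low |  |  |  |  |  |  |  |  |  |
| Low-Moderate |  |  |  |  |  |  |  |  |  |
| High-Moderate |  |  |  |  |  |  |  |  |  |
| High |  |  |  |  |  |  |  |  |  |
| Education Level (%) |  |  |  |  |  |  |  |  |  |
| Less than HS |  |  |  |  |  |  |  |  |  |
| HS/GED |  |  |  |  |  |  |  |  |  |
| Some college |  |  |  |  |  |  |  |  |  |
| College or more |  |  |  |  |  |  |  |  |  |

a. ANOVA F-test.
b. Test for linear trend after ANOVA.
c. Test for quadratic trend after ANOVA.
d. Chi-square test.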

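Table 2. 24-Hour diet intake profile of the study sample by Blood HDL Quartile

Column groups are HDL (mg/dL) categories.

| Dietary Factor | <= 40 (n=439): Mean or % | SE | 41 to 48 (n=458): Mean or % | SE | 49 to 59 (n=433): Mean or % | SE | > 59 (n=436): Mean or % | SE | p-value |
|---|---|---|---|---|---|---|---|---|---|
| Energy (kcal) |  |  |  |  |  |  |  |  |  |
| Protein (gm) |  |  |  |  |  |  |  |  |  |
| Carbohydrate (gm) |  |  |  |  |  |  |  |  |  |
| Fat (gm) |  |  |  |  |  |  |  |  |  |
| Cholesterol (gm) |  |  |  |  |  |  |  |  |  |
| Fiber (gm) |  |  |  |  |  |  |  |  |  |
| Vitamin C (mg) |  |  |  |  |  |  |  |  |  |
| Selenium (mcg) |  |  |  |  |  |  |  |  |  |
| Alcohol (% yes) |  |  |  |  |  |  |  |  |  |

a. ANOVA F-test.
b. Test for linear trend after ANOVA.
c. Test for quadratic trend after ANOVA.
d. Chi-square test.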

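Table 3. Confounding Influence of Covariates on Fish Consumption Regression Coefficient

| Potential Confounder | b | % Change in b (a) | p-value (b) |
|---|---|---|---|
| None | 0.551 |  | < 0.001 |
| Age | 0.554 | + 0.5 | < 0.001 |
| BMI |  |  |  |
| Gender |  |  |  |
| Smoker |  |  |  |
| Race/Ethnicity |  |  |  |
| Physical Activity |  |  |  |
| Education Level |  |  |  |
| Dietary Energy |  |  |  |
| Dietary Protein |  |  |  |
| Dietary Carbohydrate |  |  |  |
| Dietary Fat |  |  |  |
| Dietary Cholesterol |  |  |  |
| Dietary Fiber |  |  |  |
| Dietary Vitamin C |  |  |  |
| Dietary Selenium |  |  |  |
| Alcohol |  |  |  |

a. (b(confounder) - b(fish))/b(fish) as %, where b(fish) is the fish coefficient from the model with no confounder and b(confounder) is the fish coefficient from the model that also contains the potential confounder.
b. P-value for the fish beta coefficient in a model also containing the potential confounder.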

Table 4. Allen-Cady Procedure Results

Predictor in Rank Ordera

Coefficient Estimate P-Valueb

Step 1

Step 2

Step 3

Step 4

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

a. Most important to least important.

b. P-value for beta coefficient for predictors at each step in the backward selection with p=0.1 as the retention criteria.

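Table 5. Regression model for the association of blood HDL-cholesterol with fish consumption adjusting for confounding by demographic characteristics and dietary factors.

| Predictor | b | 95% CI | p-value |
|---|---|---|---|
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |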

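Table 6. Regression model for the association of blood HDL-cholesterol with fish consumption and the interaction with gender, with adjustment for confounding by demographic characteristics and dietary factors.

| Predictor | b | 95% CI | p-value |
|---|---|---|---|
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |
|  |  |  |  |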

Source: Module 2 Project: Multiple Linear Regression, HSC-731, spring 2016.