Is this adjusted association statistically significant

Reference no: EM131082557

Part I. Interpreting regression results

All of the analyses in this section use a data set containing measurements on 40 married couples and the first child born to each couple (you have seen analyses of these data previously in lecture). The variables include the height of the mother and the height of the father at the time of the birth of their first child, the gender of the child, and the eventual height of the child at age 18. These variables are named mother, father, male, and child, respectively. The relevant output follows (all heights in inches).

For each of the linear/logistic regression analyses conducted with STATA shown below, answer the questions that appear immediately below the output, citing the evidence from the output that supports your answers. This point is very important: none of the questions below can be answered without citing information from the STATA output.

HELPFUL HINTS FOR PART I - For any slope measuring the association between a categorical EV with 2 categories and the RV in linear regression, the value of the slope is equal to the value of the RV mean in one EV category minus the value of the RV mean in the second EV category. Whenever STATA computes these slopes, it always displays the slope as the value of the RV mean in the EV category with the highest numerical code minus the RV mean in the EV category with the lowest numerical code. So if the 2 categories of an EV are coded with the values 1 and 0, the slope will be equal to the value of the RV mean in EV category 1 minus the value of the RV mean in EV category 0.
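The hint above can be checked numerically. The following Python sketch uses six made-up heights (not the assignment's data) and a small least-squares helper, `ols_simple`, which is a hypothetical name introduced only for this illustration. It shows that the slope on a 0/1 variable equals the difference between the two group means, and that the intercept equals the mean of the group coded 0.

```python
# Made-up heights in inches, purely illustrative; NOT the assignment's data.
heights = [64.0, 66.0, 68.0, 69.0, 70.0, 71.0]
male = [0, 0, 0, 1, 1, 1]  # 0 = female, 1 = male


def mean(values):
    return sum(values) / len(values)


def ols_simple(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = ols_simple(male, heights)
mean_female = mean([h for h, m in zip(heights, male) if m == 0])
mean_male = mean([h for h, m in zip(heights, male) if m == 1])

print("slope:", slope, "difference in means:", mean_male - mean_female)
print("intercept:", intercept, "mean of group coded 0:", mean_female)
```

The same identity is what lets the slope in a regression on a single 0/1 EV be read as a difference in group means.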

Simple linear regression of child height predicted by child gender:

reg child male

Y=child (child height)

X1=male (child gender 0=female 1=male)

  Source |       SS       df       MS                  Number of obs =      40
---------+------------------------------               F(  1,    38) =   26.59
   Model |  114.022619     1  114.022619               Prob > F      =  0.0000
Residual |  162.952381    38  4.28822055               R-squared     =  0.4117
---------+------------------------------
   Total |     276.975    39  7.10192308
------------------------------------------------------------------------------
   child |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |   3.380952   .6556652      5.157   0.000       2.053628    4.708277
   _cons |         66   .4750745    138.926   0.000       65.03826    66.96174
------------------------------------------------------------------------------

1) Are the male children taller than the female children in this population? If so, by how much?

2) What is the range of values within which the true value of the difference in height between 18-year-old boys and girls is likely to reside?
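As a sketch of how the t statistic and the 95% confidence interval in the output are built from the slope and its standard error, the Python snippet below recomputes them for the male row. The critical value `T_CRIT` is taken from a t table for 38 degrees of freedom; it is an assumption of this sketch and does not appear in the STATA output.

```python
# Values copied from the `reg child male` output above.
coef = 3.380952   # slope for male
se = 0.6556652    # standard error of the slope

# Two-sided 95% critical value of t with 38 df, read from a t table.
# Assumption of this sketch: it is not printed in the STATA output.
T_CRIT = 2.0244

t_stat = coef / se
ci_low = coef - T_CRIT * se
ci_high = coef + T_CRIT * se

print(f"t = {t_stat:.3f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The recomputed values should agree with the t and [95% Conf. Interval] columns up to rounding.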

Simple linear regression of mother's height predicted by father's height:

reg mother father

  Source |       SS       df       MS                  Number of obs =      40
---------+------------------------------               F(  1,    38) =   41.53
   Model |  86.6284483     1  86.6284483               Prob > F      =  0.0000
Residual |  79.2715517    38  2.08609347               R-squared     =  0.5222
---------+------------------------------
   Total |      165.90    39  4.25384615
------------------------------------------------------------------------------
  mother |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  father |   .6831897   .1060176      6.444   0.000       .4685683     .897811
   _cons |   18.84159   7.329374      2.571   0.014       4.004054    33.67914

3) Do taller men tend to marry taller women in this population?

Multiple linear regression of child's height predicted by child's gender and mother's height:

. reg child male mother

Y=child (child height)

X1=male (child gender)

X2=mother (mother's height)

  Source |       SS       df       MS                  Number of obs =      40
---------+------------------------------               F(  2,    37) =   91.70
   Model |  230.477536     2  115.238768               Prob > F      =  0.0000
Residual |  46.4974635    37   1.2566882               R-squared     =  0.8321
---------+------------------------------
   Total |     276.975    39  7.10192308
------------------------------------------------------------------------------
   child |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |   2.788491   .3602383      7.741   0.000       2.058579    3.518403
  mother |   .8503315    .088333      9.626   0.000       .6713517    1.029311
   _cons |   10.14665   5.807782      1.747   0.089      -1.621034    21.91433
------------------------------------------------------------------------------

4) For every one-inch increase in mother's height, by how much would you expect child's height to increase, after controlling for possible confounding due to child gender? Is this adjusted association statistically significant?

Multiple linear regression of child's height predicted by child's gender, mother's height and father's height:

. reg child male mother father

Y=Child (child height)

X1=male (child gender)

X2=mother (mother's height)

X3=father (father's height)

  Source |       SS       df       MS                  Number of obs =      40
---------+------------------------------               F(  3,    36) =   65.33
   Model |  233.994448     3  77.9981494               Prob > F      =  0.0000
Residual |  42.9805517    36  1.19390421               R-squared     =  0.8448
---------+------------------------------
   Total |     276.975    39  7.10192308
------------------------------------------------------------------------------
   child |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |   2.901731   .3572694      8.122   0.000       2.177155    3.626307
  mother |   .6907186   .1267338      5.450   0.000       .4336906    .9477467
  father |   .2026242    .118058      1.716   0.095      -.0368085    .4420569
   _cons |   6.628294   6.020587      1.101   0.278      -5.582023    18.83861
------------------------------------------------------------------------------
5) For every one-inch increase in mother's height, by how much would you expect child's height to increase, controlling for child gender and father's height?

6) What is the predicted height for the daughter of a 70-inch mother and a 74-inch father?
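A predicted value comes from substituting EV values into the fitted equation. The sketch below does this in Python with the coefficients from the `reg child male mother father` output; the function name `predict_child_height` is hypothetical and introduced only for this illustration.

```python
# Coefficients copied from the `reg child male mother father` output above.
b_cons, b_male, b_mother, b_father = 6.628294, 2.901731, 0.6907186, 0.2026242


def predict_child_height(male, mother, father):
    """Predicted age-18 height in inches from the fitted equation."""
    return b_cons + b_male * male + b_mother * mother + b_father * father


# A daughter is coded male=0, so the male slope drops out of the sum.
daughter = predict_child_height(male=0, mother=70, father=74)
print(f"predicted height: {daughter:.2f} inches")
```

Changing `male=0` to `male=1` adds the male slope to the prediction, which is the adjusted gender difference for the same parents.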

Multiple logistic regression of child's gender predicted by mother's height and father's height:

logistic male mother father

Y=male (child gender - 0=female 1=male)

X1=mother (mother's height)

X2=father (father's height)

Logit estimates                                   Number of obs   =         40
                                                  LR chi2(2)      =       2.60
                                                  Prob > chi2     =     0.2731
Log likelihood = -26.377775                       Pseudo R2       =     0.0469
------------------------------------------------------------------------------
    male | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  mother |   1.469965   .3722595      1.521   0.128       .8948401    2.414729
  father |   .7659505   .1771988     -1.153   0.249       .4867203    1.205374

7) Can giving birth to a boy (RV=1) be predicted at a greater-than-chance level in the population from the parents' heights combined?

8) Is either mother's or father's height individually associated with the probability that one's first child will be a boy?
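The overall test of the two EVs combined is the LR chi2 statistic with its Prob > chi2 value. As a rough cross-check in Python, the upper-tail probability of a chi-square with exactly 2 degrees of freedom has the closed form exp(-x/2); this shortcut applies only to 2 df, and the input below is the rounded statistic printed in the output, so the result matches Prob > chi2 only approximately.

```python
import math

lr_chi2 = 2.60  # LR chi2(2), copied (already rounded) from the logistic output

# Upper-tail probability for a chi-square with 2 df: exp(-x / 2).
# Valid for 2 df only; other df need a chi-square table or software.
p_value = math.exp(-lr_chi2 / 2)
print(f"p = {p_value:.4f}")
```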

Part II - ANOVA through regression

Previously, you addressed the vital issue of whether different cookie types lead to more or less milk consumption among children enrolled in public elementary schools in the Los Angeles Unified School District (LAUSD), in order to generate knowledge that would help prevent LAUSD from going bankrupt due to excessive spending on milk (imagine a tax increase for this reason...).

You conducted a test of the hypothesis that cookie type (a 3-category EV) is associated with milk consumption (measured in ounces) in the population of LAUSD elementary school students using a sample of students obtained from LAUSD, in which each student was randomly assigned to eat 1 of 3 cookie types (chips ahoy, oreos, fig newtons). You recorded how much milk each child consumed while eating the cookies. The hypothesis-testing method you used was analysis of variance (ANOVA) with follow-up tests of differences in milk consumption between all possible pairs of cookie categories using Tukey's multiple comparison procedure.

You have learned that you can test the association between a 3-category EV and a quantitative RV using multiple linear regression. This requires creating g-1 dummy variables (where g is the number of groups, or categories, of the EV) and including the dummy variables as EVs in a multiple regression. This "ANOVA through regression" is preferred to traditional ANOVA with follow-up tests if 2 conditions are met:

- you can test your hypothesis while limiting the number of comparisons of pairs of group means among the set of 3 group means to g-1, or 2 comparisons

- each of the comparisons you test has the same reference group, or comparison group

So you are going to repeat the analysis you completed testing for differences in milk consumption between/among cookie types. This time, you will assume that school district administrators hypothesized that children who ate fig newtons would consume significantly less milk than children who ate chips ahoy and children who ate oreos. So these are the 2 comparisons that you will test statistically, and you will do so by creating 2 dummy variables and including them as EVs in a multiple linear regression analysis.

Here are the sample means for each cookie type (not all of these will appear in your regression output), and the null and alternative hypothesis statements:

EV - cookie type (3 categories)

RV - milk consumption (in ounces)

  1. Chocolate chip (n=18): mean = 12.0 ounces
  2. Oreo (n=18): mean = 10.1 ounces
  3. Fig Newton (n=18): mean = 4.2 ounces

H0: µ1 (chips ahoy) = µ2 (oreo) = µ3 (fig newton)

Ha: at least one µ ≠ another µ

Use α = .05

Using the data from the original problem #8 on the course website, create two dummy variables from the 3-category COOKIETYPE variable. Be very careful when deciding which cookie categories should receive values of 1 on each of the two dummy variables, and which cookie categories should receive values of 0 on the two dummy variables. Also think carefully about which cookie represents the reference category. This is the cookie that should receive codes of 0 on both dummy variables.
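The coding logic can be sketched outside STATA. The Python snippet below assumes the category codes 1, 2 and 3 listed above and treats Fig Newton as the reference category, since both hypothesized comparisons involve it; the names `make_dummies`, `chip` and `oreo` are hypothetical and not taken from the course data file. Because a regression on g-1 dummy variables reproduces the group means, the intercept and slopes it should return can be anticipated from the sample means given earlier.

```python
# Category codes assumed from the list above:
# 1 = chocolate chip, 2 = Oreo, 3 = Fig Newton.
REFERENCE = 3  # Fig Newton receives 0 on both dummy variables


def make_dummies(cookietype):
    """Return hypothetical (chip, oreo) 0/1 indicators for one cookie code."""
    chip = 1 if cookietype == 1 else 0
    oreo = 1 if cookietype == 2 else 0
    return chip, oreo


# Sample means (ounces of milk) from the table of group means.
group_means = {1: 12.0, 2: 10.1, 3: 4.2}

# With this coding, the intercept is the reference-group mean and each
# slope is that group's mean minus the reference-group mean.
intercept = group_means[REFERENCE]
slope_chip = group_means[1] - intercept
slope_oreo = group_means[2] - intercept

for code in (1, 2, 3):
    print(code, make_dummies(code))
print(intercept, round(slope_chip, 1), round(slope_oreo, 1))
```

The t statistics and p-values still have to come from the STATA regression itself; this sketch only anticipates the coefficient values.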

It should be noted that you did not complete a lab exercise in which you created dummy variables and included them as EVs in a multiple linear regression. But you can create these dummy variables using either the GENERATE (GEN) or RECODE functions in STATA. You have instructions from previous lab sections that will help you to do this. After creating the 2 dummy variables:

9) Run the multiple linear regression analysis with milk consumption as the RV and the 2 dummy variables included as EVs using STATA. Paste your output below.

10) Interpret the value of the y-intercept in your output

11) Interpret the value of the slope measuring the association between your FIRST dummy variable and milk consumption. Does the t-test for this slope indicate a significant difference in milk consumption between the fig newton group and another group? Report the t-statistic and p-value in support of your answer.

12) Interpret the value of the slope measuring the association between your SECOND dummy variable and milk consumption. Does the t-test for this slope indicate a significant difference in milk consumption between the fig newton group and another group? Report the t-statistic and p-value in support of your answer.

Part III. Choosing Statistical Tests, conducting those tests and interpreting the results of those tests

We noted on the midterm that an important skill necessary for conducting applied social science research is to be able to translate a research question into the correct choice of a statistical test. It is just as important to be able to subsequently conduct the statistical test and interpret the results of that test.

For each of the following research questions below, you will not only choose the appropriate statistical hypothesis testing procedure, but you will also run the test using data that we provide and interpret the results. Consider each of the research questions below and answer the questions that follow:

13) Do the annual incomes of men and women differ after holding years of education, hours worked per week and years employed at current job constant?

There is a data file on the CCLE course website named FINALPARTIII13 containing GSS data that you will use to conduct the statistical test that you choose. The following variables are in the file:

CONRINC - annual income of the respondent
SEX - gender of the respondent 1=male 2=female
EDUC - number of years of education completed by the respondent
HRS1 - number of hours worked in the last week (we are using this as a more general measure of hours worked per week)
YEARSJOB - years employed at current job

a) identify the EV

b) identify the RV

c) identify the CV(s)

d) state a hypothesis about the direction of the expected association between the EV and the RV that you identified in a and b above. You are not being asked to state null and alternative hypotheses here. Just state the hypothesis that you would propose as the researcher conducting this study

e) identify the statistical test that will appropriately test whether or not the EV and RV are associated in the population after controlling for the CV(s), and explain why you chose this particular test

f) state the null and alternative hypotheses for the expected association between the EV and RV you identified in a and b above, after holding constant all CVs that you identified in part c above

g) conduct the statistical test you chose in part e above in STATA using the data provided on the CCLE course website. Paste the output below

h) write a short summary of the results in which you state your decision to reject or retain the null hypothesis you stated in f above. In your summary, identify the value of the association between all EVs/CVs combined and the RV and its test statistic/p-value. Also identify the values of the individual adjusted EV/RV association, the adjusted CV/RV associations, and their test statistics and p-values

i) what is your main conclusion about the presence or absence of an EV/RV association in the population after holding the CV(s) constant?

14) Does the probability that someone reports being happy (HAPPY) differ for whites and non-whites (WHITE) after holding constant other indicators of quality of life like health status (GOODHEALTH), feelings of safety (SAFE), and poverty level (NOPOVERTY)?

There is also a data file on the CCLE course website named FINALPARTIII14 containing GSS data that you will use to conduct the statistical test that you choose. The following variables are in the file:

HAPPY - responses to question asking how happy a respondent is with his/her life. 1=respondent reports being "very happy" or "happy", 0 =respondent reports being "not so happy"
WHITE - race/ethnicity 1=non-Hispanic White, 0=all others
GOODHEALTH - responses to question asking the respondent to rate his/her overall health. 1=respondent reports "very good" or "good" health, 0=respondent reports "fair" or "poor" health
SAFE - responses to question asking whether or not respondent feels afraid when walking in his/her neighborhood at night. 1=no 0=yes
NOPOVERTY - 1=respondent does not live in poverty, 0=respondent lives in poverty

On the variables GOODHEALTH, SAFE and NOPOVERTY, values of 1 indicate higher quality of life and values of 0 indicate lower quality of life.

a) identify the EV

b) identify the RV

c) identify the CV(s)

d) state a hypothesis about the direction of the expected association between the EV and the RV that you identified in a and b above. Again, you are not being asked to state null and alternative hypotheses. Just state the hypothesis that you would propose as the researcher conducting this study

e) identify the statistical test that will appropriately test whether or not the EV and RV are associated in the population after controlling for the CV(s), and explain why you chose this particular test

f) state the null and alternative hypotheses for the expected association between the EV and RV you identified in a and b above, after holding constant all CVs that you identified in part c above

g) conduct the statistical test you chose in part e above in STATA using the data provided on the CCLE course website. Paste the output below

h) write a short summary of the results in which you state your decision to reject or retain the null hypothesis you stated in f above. In your summary, identify the values of the individual adjusted EV/RV association and CV/RV associations, and their test statistics and p-values

i) what is your main conclusion about the presence or absence of an EV/RV association in the population after holding the CV(s) constant?

Part IV. Multiple logistic regression: Main effects and interaction models

The Western Collaborative Group Study (WCGS) is a 25-year longitudinal study of Coronary Heart Disease (CHD) among men that began in 1960. Participants were age 39-59 at the beginning and were also determined not to have heart disease at that time.

A distinctive feature of the WCGS was that it investigated behavioral and personality risk factors for heart disease, with a focus on the Type A personality as a risk factor for heart disease. And the question of whether the Type A personality was a risk factor for CHD was central to the study.

The Type A personality reflects a lifestyle characterized by taking on more than one can handle, multitasking, and high stress levels in all areas of life. The Type B personality reflects a much more relaxed lifestyle. Type A/B classification is determined by responses to questions in a questionnaire called the Jenkins Activity Survey. The original sample was contacted annually for follow-up interviews and measurements of whether or not each participant acquired CHD.

The following STATA output is from a logistic regression analysis predicting whether a participant had acquired CHD by 1969. The explanatory variables include Type A/Type B classification and other risk factors for CHD that could be confounding variables of any observed association between Type A personality and CHD. These possible confounding variables included age, blood pressure, cholesterol, smoking status, and BMI. A summary of all variables appears below, followed by the results of the main effects logistic regression analysis:

Variable            Label

chd69               RV: Any CHD event (heart attack or angina) (0=no CHD, 1=CHD)
typea               EV: Type A/B classification (0=Type B, 1=Type A)
age                 CV: Participant age in years
chol240             CV: Cholesterol above/below 240 (0= <240 "good", 1= >=240 "bad")
sbp140              CV: Systolic BP above/below 140 (0= <140 "good", 1= >=140 "bad")
overweightorobese   CV: BMI above/below 25 (0= <25 normal, 1= >=25 overweight/obese)
smoke               CV: Smoking status (0=non-smoker, 1=smoker)

Main effects model for CHD at 10 years (1969)             

Logistic regression                               Number of obs   =       3142
                                                  LR chi2(6)      =     152.29
                                                  Prob > chi2     =     0.0000
Log likelihood = -813.45397                       Pseudo R2       =     0.0856
-------------------------------------------------------------------------------
        chd69 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        typea |   .7440875   .1433427     5.19   0.000     .4631409    1.025034
          age |   .0595851   .0118483     5.03   0.000     .0363629    .0828074
      chol240 |   .7380767   .1352749     5.46   0.000     .4729427    1.003211
       sbp140 |   .5555681   .1451587     3.83   0.000     .2710623    .8400738
overweighto~e |   .2184401   .1378186     1.58   0.113    -.0516795    .4885597
        smoke |   .5939591   .1386625     4.28   0.000     .3221856    .8657325
        _cons |  -6.579158   .5891206   -11.17   0.000    -7.733814   -5.424503
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
        chd69 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        typea |    2.10452   .3016676     5.19   0.000     1.589057     2.78719
          age |   1.061396   .0125757     5.03   0.000     1.037032    1.086333
      chol240 |   2.091908   .2829827     5.46   0.000     1.604709    2.727023
       sbp140 |   1.742931   .2530015     3.83   0.000     1.311357    2.316538
overweighto~e |   1.244134   .1714649     1.58   0.113     .9496332    1.629967
        smoke |   1.811145   .2511378     4.28   0.000     1.380141    2.376746
        _cons |    .001389   .0008183   -11.17   0.000     .0004378    .0044073
-------------------------------------------------------------------------------

15) Refer to the main effects model results above and interpret the adjusted odds ratios measuring the association between the main EV (Type A/B personality) and the RV (CHD after 10 years), and between each CV and the RV. You don't have to interpret the values of the LR coefficients in the output above.
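The two tables above carry the same information on different scales: each odds ratio is the exponential of the corresponding logistic regression coefficient, and the same holds for the confidence limits. A minimal Python check for the typea row, using only numbers copied from the coefficient table:

```python
import math

# typea row copied from the coefficient table of the main effects model.
coef, ci_low, ci_high = 0.7440875, 0.4631409, 1.025034

# Exponentiating moves from the log-odds scale to the odds-ratio scale.
odds_ratio = math.exp(coef)
or_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"OR = {odds_ratio:.4f}, 95% CI = ({or_ci[0]:.4f}, {or_ci[1]:.4f})")
```

The result should match the typea row of the Odds Ratio table up to rounding. A confidence interval for an odds ratio that excludes 1 corresponds to a coefficient interval that excludes 0.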

Interaction model for CHD at 10 years (1969)

The interactive effect of Type A/B classification (EV1) and overweight/normal weight classification (EV2) on CHD 10 years later (RV) was tested because you were interested in determining if the association between Type A personality and CHD 10 years later differed between overweight and normal weight individuals. So Type A personality is the main EV.

Logistic regression                               Number of obs   =       3142
                                                  LR chi2(7)      =     152.70
                                                  Prob > chi2     =     0.0000
Log likelihood = -813.25227                       Pseudo R2       =     0.0858
-------------------------------------------------------------------------------
        chd69 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        typea |    .552525   .1925011     2.87   0.004     .1752254     .929825
          age |    .059724   .0118595     5.04   0.000     .0364799    .0829681
      chol240 |   .7373236   .1353085     5.45   0.000     .4721237    1.002523
       sbp140 |   .5543145   .1452926     3.82   0.000     .2695463    .8390826
overweighto~e |    .095618   .2381595     0.40   0.688    -.3711662    .5624021
        smoke |    .594825   .1387015     4.29   0.000     .3229751    .8666748
typea_by_over |   .4056084   .1988276     2.04   0.041     .0159063    .7953105
        _cons |  -6.531349   .5936486   -11.00   0.000    -7.694879   -5.367819
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
        chd69 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        typea |   1.737551   .3729809     2.87   0.004     1.191515    2.534066
          age |   1.061544   .0125893     5.04   0.000     1.037153    1.086507
      chol240 |   2.090333     .28284     5.45   0.000     1.603396     2.72515
       sbp140 |   1.740747   .2529176     3.82   0.000      1.30937    2.314243
overweighto~e |   1.100339   .2620561     0.40   0.688     .6899293    1.754883
        smoke |   1.812714    .251426     4.29   0.000     1.381231    2.378987
typea_by_over |   1.500215     .34537     2.04   0.041     1.016033     2.21513
        _cons |    .001457    .000865   -11.00   0.000     .0004552    .0046643

16) Refer to the interaction model and interpret the test of Type A personality-by-BMI category interaction. If it is significant, it will be necessary for you to thoroughly describe how the size of the effect of Type A personality on CHD within 10 years is modified by BMI. In your interpretation:

- report and interpret two ORs. The first OR should measure the association between Type A/B personality and getting CHD for the normal weight group. And the second OR should measure the association between Type A/B personality and getting CHD for the overweight group (recall that one of these is already in the table, and the other has to be calculated from the relevant logistic regression coefficients in the output above)

- cite evidence from the STATA output indicating whether these 2 ORs differ significantly or do not differ significantly
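The calculation described in the first bullet can be sketched in Python. The sketch assumes that typea_by_over is the product of typea and overweightorobese, which the output implies but does not state. Under that assumption, the Type A coefficient applies to the normal weight group, and the sum of the Type A and interaction coefficients applies to the overweight group.

```python
import math

# Coefficients copied from the interaction model's coefficient table.
b_typea = 0.552525         # Type A effect when overweightorobese = 0
b_interaction = 0.4056084  # typea_by_over (assumed: typea * overweightorobese)

# Type A vs. Type B odds ratio within each BMI category.
or_normal = math.exp(b_typea)
or_overweight = math.exp(b_typea + b_interaction)

# The ratio of the two ORs is the exponentiated interaction coefficient.
ratio = or_overweight / or_normal

print(f"OR, normal weight: {or_normal:.3f}")
print(f"OR, overweight/obese: {or_overweight:.3f}")
print(f"ratio of ORs: {ratio:.3f}")
```

The ratio of the two ORs corresponds to the typea_by_over row of the Odds Ratio table, so the z test and p-value on that row are the evidence on whether the two ORs differ.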

Here is the coding of the relevant variables shown at the beginning of Part IV for your reference:

CHD: 0=absent, 1=present
Type A/B classification: 0=Type B, 1=Type A
BMI category: 0=normal weight, 1=overweight or obese
