What can you conclude about the research hypothesis

Reference no: EM131045548

Scenario:

A researcher randomly recruited a group of 63 children from an elementary school in southwest Western Australia and followed them for a period of 12 years. At the beginning of the study, each child was given an identification number, and their gender, the area where they lived, their daily energy intake, their fibre intake and the time they spent on physical activities were recorded by a research assistant, who compiled all the information into the dataset '2016'. After 12 years, the amount of time these teenagers (the then children) spent playing sports was measured and recorded in the same dataset. The body mass index of the children and of the teenagers was also measured. All of the data were measured independently of the children. The variables of the dataset are listed in Table 1.

Table 1: Variables and their descriptions as collected in the study

| Variable | Labels |
|---|---|
| ID | Identification number of the children |
| GENDER | Gender of the children (1 = Girl; 2 = Boy) |
| AREA | Area where the children live (1 = Country; 2 = City) |
| ENERGY | Daily energy intake of the children (in kJ) |
| FIBRE | Daily fibre intake of the children (in g) |
| TIME1 | Time the children spent playing sports per day (in mins) |
| TIME2 | Time the teenagers spent playing sports per day (in mins) |
| BMIC | Body mass index of the children (in kg/m2) |
| BMIT | Body mass index of the teenagers (in kg/m2) |

Open the '2016' dataset. Use the following questions to guide you as you run the descriptive and inferential statistics and prepare your interpretations or conclusions for the researcher.

It is recommended that you first assign the variable 'labels' and 'values' according to the Table above. This will enable you to read the outputs easily.

1. Which of the following would be appropriate to describe the frequency distribution for gender (boys and girls)?       

a. Frequency and percentage

b. Mean and standard deviation

c. Median and interquartile range

d. Variance and standard error

2. The appropriate statistics to describe the body mass index (BMI) of the children would be: _______

a. The percentage of BMI of the children is 20.37 kg/m2.

b. The range of BMI of the children population is 17.26 kg/m2.

c. The BMI of the children is much lower than the BMI of the teenagers in the population.

d. In this sample, the average BMI of the children is 20.27 kg/m2, with a standard deviation of 3.67 kg/m2.

3. The researcher is interested to know if variable ENERGY has a Normal distribution. Use the following table as a guide.                                                                               

| Measures | Criteria/Cut off points |
|---|---|
| Histogram | Symmetrical, bell-shaped curve |
| Boxplot | Median in the centre of the box with whiskers at equal length at both ends of the box and no outliers |
| Normal Q-Q plot | Most observations appear on the straight line |
| Skewness coefficient | Between -1 and 1 |
| Kurtosis coefficient | Between -1 and 1 |

[Stata users should subtract 3 from the reported kurtosis coefficient.]
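The skewness and kurtosis cut-offs in the table can also be checked numerically. The sketch below is a minimal illustration in plain Python, assuming simple moment-based formulas; statistical packages may apply small-sample corrections, so their figures can differ slightly. The `energy_kj` values are hypothetical and are not taken from the '2016' dataset.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so that a Normal distribution scores 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


def within_cutoffs(values, limit=1.0):
    """True when both coefficients fall strictly between -limit and +limit."""
    return abs(skewness(values)) < limit and abs(excess_kurtosis(values)) < limit


# Hypothetical daily energy intakes in kJ (illustration only).
energy_kj = [4300, 4550, 4700, 4800, 4900, 4950, 5000, 5100, 5250, 5500]
print("skewness:", round(skewness(energy_kj), 3))
print("excess kurtosis:", round(excess_kurtosis(energy_kj), 3))
print("within the -1 to 1 cut-offs:", within_cutoffs(energy_kj))
```

Subtracting 3 inside `excess_kurtosis` mirrors the note for Stata users: a package that reports kurtosis on the scale where a Normal distribution equals 3 needs that adjustment before the -1 to 1 criterion is applied.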

a. Do you think transformation is required for variable ENERGY?                               

i. Yes, natural logarithm of the variable should be done and assessment needs to be carried out in full to assess the Normality of the newly transformed variable.

ii. Yes, variable ENERGY has a Normal distribution and should be transformed to ensure the distribution remains Normal.

iii. No, variable ENERGY already has a Normal distribution.

iv. No, skewed variables (including variable ENERGY) should never be transformed.

b. Now that you know the distribution of variable ENERGY, what would be the most appropriate measures of centrality and variability to report for it? *Hint: Different measures of centrality and variability need to be reported for data that display a Normal or a skewed distribution.*

i. Mean and standard deviation. The reason is that variable ENERGY has a normal (symmetric) distribution.

ii. Median and interquartile range. The reason is that variable ENERGY does not have a normal distribution but a skewed distribution.

c. A practical interpretation of the measure of variability of variable ENERGY within this sample of southwest Western Australian children for the dietician, as referred to by the 68-95-99.7% rule, would be: _______

i. Approximately 4573.04 to 5291.15 kJ is consumed by approximately 68% of the children in this population.

ii. Approximately 95% of the children in this sample consumed 4213.99 kJ to 5650.20 kJ per day.

iii. In this population, approximately 99% of the children consumed 3854.94 kJ to 6009.25 kJ a day.

iv. The average daily energy intake of the children should be between 4841.67 kJ and 5022.52 kJ in this population, as estimated with 95% confidence.
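The arithmetic behind the 68-95-99.7% rule can be sketched as follows. The mean and standard deviation are not stated in the text; the values below are inferred as the midpoint and half-width of the interval quoted in option i, so treat them as assumptions rather than as output from the dataset. Only the sample size of 63 comes from the scenario.

```python
import math

MEAN_KJ = 4932.095  # assumed: midpoint of the interval in option i
SD_KJ = 359.055     # assumed: half-width of the interval in option i
N = 63              # sample size given in the scenario


def empirical_rule_interval(mean, sd, k):
    """Interval of mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd


# Spread of individual children: 1, 2 and 3 standard deviations from the mean.
for k, share in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = empirical_rule_interval(MEAN_KJ, SD_KJ, k)
    print(f"about {share} of children: {low:.2f} to {high:.2f} kJ")

# Precision of the sample mean: the standard error is much smaller than the SD.
standard_error = SD_KJ / math.sqrt(N)
print(f"standard error of the mean: {standard_error:.2f} kJ")
```

The standard deviation describes how much individual children in the sample vary, whereas the standard error describes how precisely the mean is estimated. A confidence interval for the mean is built from the standard error, which is why it is far narrower than the intervals produced by the rule.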

4. The researcher now wants to investigate the levels of energy intake of the boys and girls who spent varying amounts of time playing sports per day. You need to first recode the variables ENERGY and TIME1 as follows: *Hint: Give the recoded variables a new name and assign value labels to the new recoded variables.*

| ENERGY | Values for ENERGY to be recoded into the following levels | Code |
|---|---|---|
|  | Less than 4500 kJ (< 4500 kJ) | 1 |
|  | Greater than or equal to 4500 kJ but less than 5000 kJ (4500 - 5000 kJ) | 2 |
|  | Greater than 5000 kJ (> 5000 kJ) | 3 |

| TIME1 | Values for TIME1 to be recoded into the following levels | Code |
|---|---|---|
|  | Less than 45 minutes (< 45 min) | 1 |
|  | Equal to or more than 45 minutes (>= 45 min) | 2 |
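The recoding rules above can be sketched as two small functions. This is only an illustration in Python rather than SPSS or Stata syntax; the function and label names are hypothetical. Note that the table leaves a value of exactly 5000 kJ unassigned, so the sketch assumes it belongs to level 3.

```python
def recode_energy(kj):
    """Recode daily energy intake (kJ) into levels 1 to 3.

    Assumption: exactly 5000 kJ is placed in level 3, because the
    table does not say where that boundary value belongs.
    """
    if kj < 4500:
        return 1
    if kj < 5000:
        return 2
    return 3


def recode_time1(minutes):
    """Recode daily minutes of sport into level 1 (< 45) or level 2 (>= 45)."""
    return 1 if minutes < 45 else 2


# Value labels for the new recoded variables, as the hint recommends.
ENERGY_LABELS = {1: "< 4500 kJ", 2: "4500 - 5000 kJ", 3: "> 5000 kJ"}
TIME1_LABELS = {1: "< 45 min", 2: ">= 45 min"}

# Hypothetical children, as (ENERGY, TIME1) pairs.
for energy, time1 in [(4200, 30), (4500, 45), (4999, 44), (5300, 60)]:
    print(energy, time1, "->",
          ENERGY_LABELS[recode_energy(energy)], "|",
          TIME1_LABELS[recode_time1(time1)])
```

Keeping the original ENERGY and TIME1 variables and storing the codes under new names preserves the continuous values for the later questions.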

a. Obtain a cross-tabulation consisting of the appropriate statistics for energy intake and the time the children played sports. *Hint: cross-tabulation is for categorical variables.* Which of the following statement(s) is/are appropriate to describe the levels of energy intake between the children who spent less than 45 minutes and those who spent 45 minutes or more playing sports?

i. Most of the girls (53% of them) spent less than 45 minutes playing sports, while fewer of the boys (45% of them) did.

ii. There are 32% of the children who consumed between 4500 and 5000 kJ per day spent less than 45 minutes playing sports.

iii. There is not much difference between the percentage of children who spent less than 45 minutes playing sports (54%) and the percentage who spent more than 45 minutes playing sports (46%).

iv. Of the children who consumed more than 5000 kJ, more of them also tend to spend more than 45 minutes playing sports (60%).

b. Assuming the assumptions are met, how would you test if there is any association between the levels of energy intake of children and the levels of time they spent playing sports?                                      

i. Use Pearson Correlation Coefficient, with significance level set at 5% level.

ii. Use Chi-square test, with significance level set at 5% level.

iii. Use an independent samples t-test, with significance level set at 5% level.

iv. None of the above is suitable for this research hypothesis.

c. What can you conclude about the relationship between levels of energy intake and levels of time the children played sports?

i. The chi-square statistic is 12.12 with a p-value of less than 0.05. Assuming the assumptions are met, it can be concluded that there is an association between levels of energy intake and levels of time the children played sports.

ii. The p-value from the test is 0.02. Assuming the assumptions are met, it can be concluded that there is no association between levels of energy intake and levels of time the children played sports when the significance level is set at 5%.

iii. The p-value of the t-test was found to be 0.88. Assuming the assumptions (including the Levene's test) are met, it can be concluded that the energy intake between those who played less than 45 minutes of sports is not significantly different from those who played more than 45 minutes of sports.

iv. None of the above.
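For two recoded categorical variables, the mechanics of a chi-square test of independence can be sketched as below. The counts are hypothetical (they merely sum to 63) and are not the cross-tabulation from the dataset. A 3 x 2 table has (3 - 1) x (2 - 1) = 2 degrees of freedom, and for exactly 2 degrees of freedom the upper-tail p-value has the closed form exp(-chi2 / 2), which is what `p_value_df2` relies on.

```python
import math


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            e = row_totals[i] * col_totals[j] / total
            expected_row.append(e)
            statistic += (observed - e) ** 2 / e
        expected.append(expected_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


def p_value_df2(statistic):
    """Upper-tail p-value of a chi-square statistic; valid only for df = 2."""
    return math.exp(-statistic / 2)


# Hypothetical counts: rows are energy levels 1 to 3,
# columns are < 45 min and >= 45 min of sport.
counts = [[9, 6], [11, 8], [14, 15]]
chi2, df, expected_counts = chi_square_independence(counts)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value_df2(chi2):.3f}")
print("smallest expected count:", round(min(min(r) for r in expected_counts), 2))
```

The smallest expected count is printed because the usual assumption check for this test looks at expected, not observed, cell counts.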

5. Confidence intervals (CI) are used to estimate the population parameters as it is impossible to reach everyone in the population.

a. Which of the following statements is correct about the estimation of the average time the population of teenagers spent playing sports?

i. The average time the population of teenagers spent playing sports is estimated to be between 19.7 and 24.8 minutes.

ii. We are 95% confident that the mean time the children spent playing sports lies between 19.7 and 24.8 minutes in this population.

iii. The higher the confidence levels (e.g. from 90% to 95% to 99%), the more confident we are about capturing the actual population parameter and therefore the corresponding lengths of the CIs tend to be shorter.

iv. None of the above is correct.

b. If the sample size of this study increased from 63 to 630, we will expect: ______

i. The range of values that are captured within the 90% CI, 95% CI and 99% CI to become shorter as we can be more confident about our estimation now with larger sample size.

ii. The length of the 95% CI to remain the same but the 95% CI is now a more reliable estimation than the 99% CI as the larger sample size warrants a higher level of precision.

iii. The length of the 99% CI to be shorter and be more precise than when the sample size was 63.

iv. Statements 'i' and 'iii' are both correct.
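How the length of a confidence interval responds to the confidence level and to the sample size can be explored with a short sketch. It assumes a hypothetical standard deviation of 10 minutes and uses the Normal (z) approximation from Python's standard library instead of the t distribution that a statistics package would use, so the numbers are illustrative only.

```python
import math
from statistics import NormalDist


def ci_half_width(sd, n, confidence):
    """Half-width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / math.sqrt(n)


SD_MINUTES = 10.0  # hypothetical standard deviation, not from the dataset

for n in (63, 630):
    for confidence in (0.90, 0.95, 0.99):
        half = ci_half_width(SD_MINUTES, n, confidence)
        print(f"n = {n:3d}, {confidence:.0%} CI: mean +/- {half:.2f} minutes")
```

Comparing the printed rows shows the two effects separately: the confidence level changes the multiplier `z`, while the sample size changes the standard error `sd / sqrt(n)`.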

6. The researcher wants to test a research hypothesis that the mean body mass index (BMI) of the teenagers in this population is 22 kg/m2.  

a. The correct hypotheses statement(s) for this research objective would be: ___

i. H0: μ = 22 years old.

ii. H0: μ = 22 kg/m2; H1 (or Ha): μ ≠ 22 kg/m2

iii. Null hypothesis: the mean BMI of the teenagers is 22 kg/m2; Alternative hypothesis: the mean BMI of the teenagers is not 22 kg/m2.

iv. Null hypothesis: the population mean BMI of the teenagers is 22 kg/m2; Alternative hypothesis: the population mean BMI of the teenagers is not 22 kg/m2.

v. Statements 'ii' and 'iii' are both correct.

vi. Statements 'ii' and 'iv' are both correct.

b. The appropriate statistical test to test this hypothesis and the results would be: ___

i. One sample t-test with 5% level of significance; t-value = 3.84, p-value < 0.001.

ii. Two samples (independent samples) t-test with 5% level of significance; t-value = 3.84, p-value < 0.001.

iii. Paired-sample t-test with the 'alpha' set at 5%; t-value = 0.64, p-value = 0.527.

iv. Pearson correlation coefficient with 5% level of significance; r = 0.083, p-value = 0.520.

v. One-way ANOVA with 5% level of significance; t-value = -5.57, p-value < 0.001.

vi. Chi-square test with 5% level of significance; χ2= 3906, p-value = 0.239.

c. An appropriate conclusion about the research hypothesis would therefore be: ____

i. There is no significant mean difference between the population BMI of the teenagers and the test value, 22 kg/m2, as the p-value is close to zero.

ii. In this population, it is estimated that the mean BMI of the teenagers is 2.21571 kg/m2 less than the hypothesized 22 kg/m2, and therefore the null hypothesis has to be rejected (p<0.05).

iii. The p-value is not much different from the set level of significance. In addition, the 95% confidence interval of the difference does not include the hypothetical value '22', therefore supporting the decision to accept the null hypothesis and conclude that the population mean BMI of the teenagers is 22 kg/m2.

iv. In this population, it is estimated that the mean BMI of the teenagers is significantly 2.21571 kg/m2 higher than the hypothesized 22 kg/m2. In addition, the estimated 95% confidence interval does not include the hypothetical value '22' and therefore the null hypothesis has to be rejected (p<0.05).
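Comparing a sample mean with a single hypothesized value, as in this question, comes down to one ratio: the difference between the sample mean and the test value, divided by the standard error. A minimal sketch follows; the five BMI values are hypothetical and only the test value of 22 kg/m2 comes from the question.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom, mean difference from mu0)."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    mean_difference = mean(sample) - mu0
    return mean_difference / standard_error, n - 1, mean_difference


# Hypothetical teenage BMI values in kg/m2 (illustration only).
bmi_teen = [23.0, 25.0, 22.0, 26.0, 24.0]
t_stat, df, mean_diff = one_sample_t(bmi_teen, 22.0)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {df}")
```

The sign of the mean difference shows whether the sample mean lies above or below the test value, which matters when wording the conclusion. Turning the t statistic into a p-value requires the t distribution with `df` degrees of freedom, which a statistics package supplies.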

7. The researcher now wishes to test the hypothesis that the population mean fibre intake is the same for the boys and girls.  You will test the hypothesis by following the steps of hypothesis testing.

a. State the hypotheses.                                                                                                             

b. State which statistical test you plan to use, and the level of significance (α) you are using.       

c. In addition to 'random sampling' and 'independent observations', state the other two assumptions for the statistical test you decided to use, and test if these two assumptions are met.                               

| Assumptions | Evidence of assumptions being met | Biostatistical remedy if assumptions are not met (where applicable) |
|---|---|---|
|  |  |  |
|  |  |  |

d. After you run the statistical analyses, what can you conclude about the research hypothesis?                               

i. The test statistic is 2.10, the p-value is 0.04, the 95% CI of the difference is (0.02, 0.93) and does not include '0', suggesting that we have to reject the null hypothesis and conclude that the population mean fibre intake is different between the boys and the girls.

ii. The test statistic is 2.54, the p-value is 0.01, the 95% CI of the difference is (0.90, 7.58) and does not include '0', suggesting that we have to reject the null hypothesis and conclude that the population mean fibre intake is different between the boys and the girls.

iii. The test statistic is 2.44, the p-value is 0.02, the 95% CI of the difference is (0.77, 7.71) and does not include '0', suggesting that we have to reject the null hypothesis and conclude that the population mean fibre intake is different between the boys and the girls.

iv. The test statistic is 2.10, the p-value is 0.98, the 95% CI of the difference is (0.02, 0.93) and does not include '0', suggesting that we have to accept the null hypothesis and conclude that the population mean fibre intake is the same between the boys and the girls.

v. None of the above is correct.
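When two independent groups are compared on a continuous outcome, the test statistic depends on whether the group variances are treated as equal. The sketch below shows both the pooled (equal-variance) form and the Welch (unequal-variance) form; which one applies to the fibre data is for the analyst to decide from the assumption checks. The fibre values are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


def welch_t(a, b):
    """Unequal-variance (Welch) t statistic and approximate degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical daily fibre intakes in g (illustration only).
fibre_boys = [18.5, 22.0, 25.5, 20.0, 27.0, 23.5]
fibre_girls = [16.0, 19.5, 21.0, 17.5, 22.5, 18.0]

print("pooled: t = %.3f, df = %d" % pooled_t(fibre_boys, fibre_girls))
print("Welch:  t = %.3f, df = %.2f" % welch_t(fibre_boys, fibre_girls))
```

When the two sample variances are similar, the two forms give nearly the same statistic; they diverge as the variances and group sizes become more unequal.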

8. The researcher wants to know if the average time the children played sports (in minutes) is the same as the average time they spent playing sports (in minutes) when they became teenagers.

a. State the hypotheses.                                                                                                             

b. State which statistical test you plan to use, and the level of significance (α) you are using.                       

c. Assuming the assumptions for the statistical test you chose to do are met, what can you conclude about the research hypothesis after you run the statistical analyses?             

i. It is found that the r-value is -0.27, the p-value is 0.03, suggesting that the time the children spent playing sports is only mildly related to the time the teenagers spent playing sports.

ii. The mean difference between the time the children played sports and the time the teenagers played sports is 22.16 minutes. The t-value is 14.07, p-value is <0.001, 95% CI of the difference is (19.01, 25.31) minutes and does not include '0', suggesting that, in this population, there is a significant difference between the time the children spent playing sports and the time they played sports when they became teenagers on average.

iii. The mean difference between the time the teenagers played sports and the time they played sports while they were children is -22.16 minutes. The t-value is -14.07, p-value is <0.001, 95% CI of the difference is (-25.31, -19.01) minutes and does not include '0', suggesting that, in this population, there is a significant difference between the time the children spent playing sports and the time they played sports when they became teenagers.

iv. Only statement 'i' is incorrect.
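Because the same individuals are measured twice, the analysis works on the within-person differences. The sketch below, using hypothetical minutes for four people, also shows that reversing the order of subtraction flips the sign of the mean difference and of the t statistic without changing their size.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (mean difference, t statistic, df) for first minus second."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return d_bar, d_bar / standard_error, n - 1


# Hypothetical minutes of sport per day for the same four people.
time_child = [50, 45, 60, 55]
time_teen = [30, 20, 40, 30]

print("child - teen:", paired_t(time_child, time_teen))
print("teen - child:", paired_t(time_teen, time_child))
```

The two printed results describe the same finding from opposite directions, so a written conclusion should state which measurement was subtracted from which.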

9. The researcher wishes to know if population mean energy intake amongst the children is related to the BMI of the teenagers.

a. Assuming all the assumptions are met, the appropriate statistical analysis would be____________

i. One sample t-test with 5% level of significance.

ii. Two samples (independent samples) t-test with 5% level of significance.

iii. Paired-sample t-test with the 'alpha' set at 5%.

iv. Pearson's correlation coefficient with 5% level of significance.

v. One-way ANOVA with 5% level of significance.

vi. Chi-square test with 5% level of significance.

b. Based on the analyses you conducted, is there any relationship between energy intake and BMI of the teenagers?

i. Yes, the p-value is larger than 0.05 from the one sample t-test so we can conclude that there is a relationship between energy intake and BMI of the teenagers.

ii. The p-value of the independent samples t-test is in agreement with the 95% CI of population mean difference ('0' is included in the 95% CI), suggesting that there is no relationship between energy intake and BMI of the teenagers.

iii. Yes, the p-value from the paired-sample t-test is p<0.001, suggesting that there is a significant relationship between energy intake and the BMI of the children and the teenagers.

iv. The correlation coefficient (-0.09) indicates that there is a weak negative linear relationship between the children's energy intake and BMI of the teenagers, suggesting that there is no significant linear relationship between energy intake and the BMI of the teenagers in this population (p = 0.506).

v. The p-value of the one-way ANOVA test is 0.94, suggesting that there is no significant relationship between children's energy intake and BMI of the teenagers in this population.

vi. The chi-square statistic is 126 with a p-value of 0.43, suggesting that there is no relationship between energy intake and BMI of the teenagers in this population.
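For two continuous variables measured on the same individuals, the strength and direction of a linear relationship can be summarised with Pearson's r, and its significance is usually assessed through the statistic t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom. A minimal sketch with hypothetical values:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing that the population correlation is zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical energy intakes (kJ) and teenage BMI values (kg/m2).
energy_kj = [4300, 4700, 4900, 5100, 5500, 4600, 5200]
bmi_teen = [24.1, 23.0, 25.2, 22.8, 24.5, 26.0, 23.3]

r = pearson_r(energy_kj, bmi_teen)
print(f"r = {r:.3f}, t = {t_for_r(r, len(energy_kj)):.3f}, df = {len(energy_kj) - 2}")
```

The coefficient describes only the linear part of a relationship, so a value near zero does not rule out a curved pattern; a scatter plot is the usual companion check.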

10. The researcher wishes to test if the BMI of the children varies across the three levels of daily energy they consumed.

a. The appropriate test to use would be ___________ 

i. One sample t-test with 5% level of significance.

ii. Two samples (independent samples) t-test with 5% level of significance.

iii. Paired-sample t-test with the 'alpha' set at 5%.

iv. Pearson correlation coefficient with 5% level of significance.

v. One-way ANOVA with 5% level of significance.

vi. Chi-square test with 5% level of significance.

b. What can you conclude about this research hypothesis?                         

i. The t-statistic is 109.03, p-value is <0.001, the 95% CI is (4841.67, 5022.52) kJ and does not include '0', suggesting that in this population, there is a significant difference between the BMI of the children and their daily energy intake.  

ii. The p-value of the multiple comparison groups is larger than 0.05, suggesting that the null hypothesis should be accepted.

iii. The t-statistic is 108.64, p-value is <0.001, the 95% CI is (4821.45, 5002.20) kJ and does not include '0', suggesting that in this population, there is a significant difference between the BMI of the children and their daily energy intake.

iv. The r-value is 0.06, p-value is 0.640, suggesting that there is no strong variation between the BMI of the children in this population and their daily energy intake.

v. The F test-statistic is 1.08, p-value is 0.347, suggesting that there are no significant differences between the population mean BMI of the children who had different levels of energy intake in this population.

vi. The χ2 is 124, p-value = 0.433, suggesting that there are no significant differences between the levels of energy intake and the BMI of the children in this population.
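When a continuous outcome is compared across three groups, the F statistic is the ratio of the variation between the group means to the variation within the groups. The sketch below computes it from first principles; the BMI values for the three energy levels are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical child BMI values (kg/m2) for energy levels 1, 2 and 3.
bmi_by_energy_level = [
    [19.5, 21.0, 18.2, 20.4],
    [20.1, 22.3, 19.8, 21.5],
    [21.0, 19.2, 22.8, 20.6],
]
f_stat, df_b, df_w = one_way_anova_f(bmi_by_energy_level)
print(f"F = {f_stat:.3f} on ({df_b}, {df_w}) degrees of freedom")
```

An F value close to 1 means the group means differ by about as much as chance variation within the groups would produce.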

11. The researcher wants to know if the time the children spent playing sports is related to the area where they lived as children. Note that there are two different statistical tests that you can use to test this hypothesis. You therefore need to decide whether to use the outcome variable as a continuous variable or as the recoded categorical variable. State the variables clearly in the hypothesis, name the one statistical test you plan to carry out to test your stated hypothesis, give the significance level, justify your choice of statistical test and significance level, state the conclusion and provide the output (0.5 mark). You can assume that the assumptions for the test you chose to run are met.

12. Lastly, the researcher wants to test if the BMI (kg/m2) of the teenagers can be predicted by the time these teenagers spent playing sports (in minutes), the time they spent playing sports as children (in minutes), and their BMI (kg/m2) when they were children. *Hint: you only need to consider one of the independent variables that is significant.* You will need to (i) state the statistical approach you plan to carry out to answer the stated research question, including the significance level, (ii) justify your choice (1 mark) of statistical approach and significance level, (iii) state the regression equation that the researcher can use to predict mean values of BMI of the teenagers, (iv) interpret the estimated regression coefficient, (v) comment on the goodness of fit of the regression model and (vi) provide the output. You can assume that the assumptions for the statistical approach you chose are met.
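With a single predictor retained, as the hint suggests, the regression equation has the form predicted BMIT = intercept + slope x predictor. The sketch below fits such a line by least squares and reports R squared as a measure of fit. It assumes, purely for illustration, that childhood BMI is the retained predictor; the data are hypothetical and the choice of predictor should come from the actual output.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical BMI values (kg/m2): the same people as children and as teenagers.
bmi_child = [17.5, 19.0, 20.2, 21.8, 23.1, 18.4, 22.0]
bmi_teen = [21.9, 23.5, 24.0, 26.1, 27.4, 22.8, 25.7]

intercept, slope, r_squared = fit_line(bmi_child, bmi_teen)
print(f"predicted BMIT = {intercept:.2f} + {slope:.2f} * BMIC")
print(f"R squared = {r_squared:.3f}")
```

The slope is read as the change in mean teenage BMI for each 1 kg/m2 increase in the predictor, and R squared as the proportion of the variation in teenage BMI that the line accounts for.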


