Disease Y Study: A brief description of the Disease Y study follows. A case-control study was conducted of 75 cases with a rare genetic disorder, "Disease Y", and 134 controls without the disease. The cases were patients recruited through a hospital clinic that provides treatment for the disease, and the controls were randomly selected from patients without Disease Y who attended the same clinic for treatment of other diseases. All study participants provided a blood sample and completed a lifestyle and demographic questionnaire.
The objective was to find lifestyle factors that may be related to the risk of Disease Y; our analysis will focus mainly on regular smoking. Two measurements, both well known to be related to the risk of developing the disease, were made on each blood sample.
The variables in the dataset are as follows:
ID Unique identifier for each study participant
AGE Age in years
GENDER 0=Male, 1=Female
M1 Measurement 1
M2 Measurement 2
GROUP 0=Case, 1=Control
SMOKER 0=No, 1=Yes
In this assignment you will investigate whether smoking is a risk factor for Disease Y. You will also investigate whether the data support the previous findings that measurement 1 (M1) and measurement 2 (M2) are related to the disease. Further, you will evaluate whether age and gender are risk factors for the disease.
Questions:
Question 1: Identify whether each of the variables collected in the study is discrete, continuous, nominal or ordinal.
Question 2: Using an appropriate graph, compare the patients' ages by disease status (case vs. control). Do you find any effect of age on Disease Y?
Question 3: Report appropriate summary statistics (in the following table) for GENDER, M1, M2 and SMOKER and discuss the results (treat the following table as a sample and create your own table).
Question 4:
a) Consider the Disease Y Study population. Discuss how you would construct the sampling distribution of the mean of M1 for the cases in the population.
b) Consider M1 in a random sample of 5 cases: 104, 96, 157, 80, 150. Calculate the sample mean and standard deviation. Quantify the uncertainty in your sample mean. How could you reduce the uncertainty in sample results?
c) Discuss the difference between the standard deviation and the standard error in the context of the M1 data in the sample of 5 patients given above.
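The arithmetic behind parts b) and c) can be sketched in Python as below. This is only an illustration, assuming the usual sample standard deviation (n - 1 denominator) and the standard error defined as SD / sqrt(n); the variable names are made up for this example and do not come from the dataset.

```python
from math import sqrt
from statistics import mean, stdev

# M1 values for the 5 sampled cases (from Question 4b)
m1_sample = [104, 96, 157, 80, 150]

n = len(m1_sample)
sample_mean = mean(m1_sample)      # centre of the sample
sample_sd = stdev(m1_sample)       # spread of individual values (n - 1 denominator)
standard_error = sample_sd / sqrt(n)  # uncertainty in the sample mean

print(f"mean = {sample_mean:.1f}")
print(f"SD   = {sample_sd:.2f}")
print(f"SE   = {standard_error:.2f}")
```

The standard deviation describes how much individual M1 values vary, while the standard error describes how much the sample mean would vary from sample to sample. Because the standard error divides by sqrt(n), it shrinks as the sample size grows.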
Question 5:
a) Suppose that measurement 2 (M2) in a random sample of 1000 controls from the Disease Y Study population follows a normal distribution. The sample mean of M2 is 90 and 95% of the sample observations fall between 70 and 110. Calculate the sample standard deviation and the standard error of the sample mean, and interpret them.
b) If you randomly select a case from the Disease Y Study population, what is the probability that the patient will have M2 above 115?
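One possible way to approach part a) is sketched below. It assumes that the interval 70 to 110 is the central 95% of a normal distribution, so that it spans the mean plus or minus about 1.96 standard deviations; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mean_m2 = 90.0
lower, upper = 70.0, 110.0   # central 95% of the observations
n_controls = 1000

# z-value that leaves 2.5% in each tail of the standard normal (about 1.96)
z_975 = NormalDist().inv_cdf(0.975)

# The 95% range spans 2 * z * SD, so solve for SD
sd_m2 = (upper - lower) / (2 * z_975)

# Standard error of the sample mean
se_m2 = sd_m2 / sqrt(n_controls)

print(f"SD = {sd_m2:.2f}, SE = {se_m2:.3f}")
```

The large sample size is why the standard error comes out so much smaller than the standard deviation.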
Question 6: The 30-day mortality rate for patients undergoing cardiac surgery is 1.7% (30-day mortality: patients who die within 30 days of cardiac surgery). In a random sample of 60 patients, what is the probability that fewer than 3 patients will die within 30 days of their cardiac surgery?
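A minimal sketch of this calculation follows, assuming the number of deaths can be modelled as binomial (each of the 60 patients dies independently with probability 0.017). The helper `binomial_pmf` is a hypothetical name written for this example.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_patients = 60
p_death = 0.017

# "Fewer than 3" means 0, 1 or 2 deaths
prob_fewer_than_3 = sum(binomial_pmf(k, n_patients, p_death) for k in range(3))

print(f"P(X < 3) = {prob_fewer_than_3:.4f}")
```

Note that "fewer than 3" excludes 3 itself, so the sum runs over k = 0, 1, 2 only.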
Question 7: Suppose that the BMI of women aged 17-40 years in a population follows a normal distribution with mean 25 kg/m² and standard deviation 5 kg/m². In a random sample of 10 women from this age group, what is the probability that the sample mean BMI will be greater than 30 kg/m²?
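The reasoning here can be sketched as follows, assuming that the mean of a random sample of size n from a normal population is itself normally distributed with the same mean and a standard deviation of sigma / sqrt(n). Variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

pop_mean = 25.0    # kg/m^2
pop_sd = 5.0       # kg/m^2
n_women = 10
threshold = 30.0   # kg/m^2

# Standard error: spread of the sample mean across repeated samples
se_mean = pop_sd / sqrt(n_women)

# Sampling distribution of the mean BMI for samples of 10 women
sampling_dist = NormalDist(mu=pop_mean, sigma=se_mean)

# Upper-tail probability that the sample mean exceeds the threshold
prob_above = 1 - sampling_dist.cdf(threshold)

print(f"SE = {se_mean:.3f}, P(mean > {threshold}) = {prob_above:.5f}")
```

The key design choice is to use the standard error rather than the population standard deviation, since the question concerns the sample mean and not an individual woman's BMI.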