Reference no: EM132296264
Assignment -
Read all questions carefully and submit your solutions in a Word file (no PDF) together with your R code. All work is to be done in R.
Q1. State whether the following statements are true or false.
a) The total sum of probabilities is 1.
b) Accepting the null hypothesis when it is false is a type I error.
c) Normal distribution is symmetric.
d) It is possible to have more than one mode in the data.
e) The standard normal variable, Z, has no unit.
f) Large p-values indicate the rejection of H0.
g) A smaller coefficient of variation (CV) indicates less spread in the data.
h) The Central Limit Theorem assures the normality of the distribution of the sample mean.
Q2. The following data represent the scores obtained by students in one of the exams: 10, 33, 14, 15, 16, 21, 16, 17, 19, 20, 21, 21, 22, 23, 24, 25. Calculate the following:
(a) Mean, Median, and Mode
(b) First quartile, third quartile and IQR.
(c) Calculate CV.
(d) Construct a box-and-whisker plot.
(e) Comment on the shape of the distribution.
(f) Identify any possible outliers.
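The quantities in Q2 can be sketched in base R as follows. This is a minimal illustration, not a model answer: the variable names are made up for the example, the mode is taken as the most frequent value because base R has no built-in statistical mode, and the outlier check assumes the common 1.5 × IQR rule with R's default quartile definition (`type = 7`).

```r
# Exam scores as given in Q2
scores <- c(10, 33, 14, 15, 16, 21, 16, 17, 19, 20, 21, 21, 22, 23, 24, 25)

mean_score   <- mean(scores)
median_score <- median(scores)

# Base R has no statistical-mode function; take the most frequent value(s)
freq       <- table(scores)
mode_score <- as.numeric(names(freq)[freq == max(freq)])

# quantile() defaults to type = 7; other types can give slightly different quartiles
q1        <- unname(quantile(scores, 0.25))
q3        <- unname(quantile(scores, 0.75))
iqr_score <- q3 - q1

# Coefficient of variation in percent, using the sample standard deviation
cv_percent <- sd(scores) / mean_score * 100

# Assumption: flag points more than 1.5 * IQR beyond the quartiles as possible outliers
lower_fence <- q1 - 1.5 * iqr_score
upper_fence <- q3 + 1.5 * iqr_score
outliers    <- scores[scores < lower_fence | scores > upper_fence]

# Horizontal box-and-whisker plot of the scores
boxplot(scores, horizontal = TRUE, main = "Exam scores")
```

Comparing `mean_score` with `median_score`, and looking at which side of the box has the longer whisker, is one way to approach the comment on shape in part (e).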
Q3. A group of 50 students at a school takes a history test. The exam scores are normally distributed with a mean of 25 and a standard deviation of 4.
(a) What percentage of students scored more than 28?
(b) How many students scored between 28 and 31?
(c) Everyone who scores in the top 30% of the distribution gets a certificate. What is the lowest score someone can get and still earn a certificate?
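Normal-distribution questions of this kind are usually handled with `pnorm()` for probabilities and `qnorm()` for percentiles. The sketch below assumes the stated mean of 25 and standard deviation of 4; the variable names are illustrative only.

```r
mu         <- 25   # mean of the exam scores
sigma      <- 4    # standard deviation of the exam scores
n_students <- 50

# (a) P(X > 28), expressed as a percentage
pct_above_28 <- pnorm(28, mean = mu, sd = sigma, lower.tail = FALSE) * 100

# (b) P(28 < X < 31), scaled to the expected number of students
p_28_to_31 <- pnorm(31, mean = mu, sd = sigma) - pnorm(28, mean = mu, sd = sigma)
n_28_to_31 <- n_students * p_28_to_31

# (c) The top 30% starts at the 70th percentile of the distribution
cutoff_top30 <- qnorm(0.70, mean = mu, sd = sigma)
```

Note that `n_28_to_31` is an expected count and will generally not be a whole number, so it needs rounding when reported as a number of students.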
Q4. (a) Create an R data frame, rcr, by importing RCR.csv. (If you are unable to import this data and need code, let me know. If I provide the code, you will not receive the 2 points for this part of the problem.)
Here is the description of the RCR.csv data.
These data are part of a research study whose purpose is to investigate the factors that may influence outcomes and patient satisfaction following rotator cuff (RC) repair. The data include patients who underwent arthroscopic rotator cuff repair (RCR) over a 3-year period (2012 - 2014) and had a minimum of 2 years of follow-up. Multiple outcome measures and variables were assessed. The variables in this data set are listed below.
Variables | Description
record_i | Patient ID (Identifier)
outcome | Patient Satisfaction Score
fullreco | Full Recovery Status, 1- Yes, 0- No
mumford | Mumford Procedure, 1- Yes, 0- No
asascore | Derived ASA Score, 1- normal health, 2- some health problems, 3- poor health
agegrp | Age group, 1- <55 yrs. old, 2- ≥55 yrs. old
pain_cat | Pain Category, 0- No pain, 1- some pain, 2- severe pain
a) Create a descriptive summary table for <55 yrs. and ≥55 yrs. age groups.
Here is the template for the descriptive summary. Note that for continuous variables you need to report the mean and standard deviation, and for qualitative variables you need to report the count and percentage. Round values to 2 decimal places if not exact. The following is just a template, not the exact table.
Variables | Age Group <55 yrs.: Mean OR count(%) | Age Group <55 yrs.: sd | Age Group ≥55 yrs.: Mean OR count(%) | Age Group ≥55 yrs.: sd
n (sample size) | ? | | ? |
quantitative_var | 123.45 | 12.34 | 134 | 29.3
Activity | | | |
Low | 120 (60%) | | 240 (80%) |
Medium | 70 (35%) | | 45 (15%) |
High | 10 (5%) | | 55 (5%) |
AND SO ON | | | |
b) Create pie charts of derived ASA score by age group.
c) Create box plots of patient satisfaction score by age group.
d) Carry out a hypothesis test (two-sample t-test) to see if the patient satisfaction score differs by age group. Show all five steps:
a. Define H0 and H1.
b. Determine the level of significance.
c. Test statistic.
d. P-value.
e. Conclusion.
e) What is the confidence interval (CI) for the difference in average satisfaction scores between the two age groups? Interpret this CI in the context of the given problem.
f) Carry out a hypothesis test to see if there is any association between derived ASA score and age group. Show all five steps as in part (d).
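The mechanics of parts (d) to (f) can be sketched on simulated stand-in data. Everything below is an assumption for illustration: `rcr_demo` is randomly generated and is not the real RCR.csv, so its numbers mean nothing. Only the column names (`outcome`, `agegrp`, `asascore`) follow the variable list above. The t-test shown is R's default Welch version, and a 5% significance level is assumed.

```r
set.seed(1)  # simulated stand-in data, NOT the real RCR.csv

rcr_demo <- data.frame(
  outcome  = round(rnorm(200, mean = 80, sd = 10), 1),
  agegrp   = factor(sample(1:2, 200, replace = TRUE),
                    levels = 1:2, labels = c("<55", ">=55")),
  asascore = factor(sample(1:3, 200, replace = TRUE))
)

alpha <- 0.05  # assumed level of significance

# Two-sample t-test of satisfaction score by age group
# (t.test() defaults to the Welch test; var.equal = TRUE gives the pooled version)
tt <- t.test(outcome ~ agegrp, data = rcr_demo, conf.level = 1 - alpha)
tt$statistic  # test statistic
tt$p.value    # p-value
tt$conf.int   # CI for the difference in group means

# Chi-square test of association between derived ASA score and age group
asa_by_age <- table(rcr_demo$asascore, rcr_demo$agegrp)
chi <- chisq.test(asa_by_age)
chi$statistic
chi$p.value
```

With the real data, the same calls would take `rcr` in place of `rcr_demo`, provided `agegrp` and `asascore` are treated as categorical.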
Q5. Use the same data (rcr) from Q4 and answer the following:
a) Is this a cohort study or a case-control study?
b) Calculate the odds ratio (OR) of full recovery by age group.
(Hint: table( ) function in R is useful to see the frequency of each group or use your descriptive table to see the frequency).
c) Is there a significant difference in full recovery rates (%) between <55 and ≥55 yrs. old patients? Just report the p-value and write the conclusion. (Hint: Use an appropriate test to see if the percentage of full recovery differs between the two age groups.)
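As a rough sketch of how an odds ratio and a comparison of two proportions can be computed from a 2 × 2 table, the example below uses hypothetical counts. They are invented for illustration and are not taken from RCR.csv; with the real data the table would come from something like `table(rcr$agegrp, rcr$fullreco)`. `prop.test()` is one possible test for part (c), not necessarily the intended one.

```r
# Hypothetical counts for illustration only (NOT from RCR.csv)
recovery <- matrix(c(30, 70,
                     45, 55),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(agegrp   = c("<55", ">=55"),
                                   fullreco = c("No", "Yes")))

# Odds of full recovery within each age group
odds_lt55 <- recovery["<55", "Yes"] / recovery["<55", "No"]
odds_ge55 <- recovery[">=55", "Yes"] / recovery[">=55", "No"]

# Odds ratio of full recovery, <55 relative to >=55
odds_ratio <- odds_lt55 / odds_ge55

# Two-sample test of equal proportions of full recovery
pt <- prop.test(x = recovery[, "Yes"], n = rowSums(recovery))
pt$p.value
```

Which age group is treated as the reference changes whether the OR or its reciprocal is reported, so the direction should be stated alongside the value.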
Q6. Early diagnosis of breast cancer is very important for successfully treating this fatal disease. The following table shows the diagnosis and actual status of breast cancer in a certain facility. Out of 165 individuals who actually had breast cancer, 140 were diagnosed early, and out of 170 individuals who didn't have breast cancer, 10 were misdiagnosed as having breast cancer in early screening. Complete the following table and calculate the following.
Diagnosis | Actual Status (Breast Cancer): No | Actual Status (Breast Cancer): Yes
Negative | ? | ?
Positive | ? | ?
I. Calculate and interpret the sensitivity and specificity of this test.
II. What are the positive and negative predictive values? How do you interpret them?
III. Calculate the false discovery rate and the false omission rate, and interpret them.
IV. What is the level of significance (false positive rate) of this test?
V. What is the power of this test? What is type II error rate?
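The diagnostic measures in Q6 all follow from the four cells of the 2 × 2 table. The sketch below derives the cell counts from the numbers in the problem statement and applies the standard definitions. The variable names are illustrative, and treating "has breast cancer" as the positive class is an assumption of the example.

```r
# Cell counts derived from the problem statement
tp <- 140        # had cancer, diagnosed positive
fn <- 165 - tp   # had cancer, diagnosed negative
fp <- 10         # no cancer, diagnosed positive
tn <- 170 - fp   # no cancer, diagnosed negative

sensitivity <- tp / (tp + fn)   # true positive rate
specificity <- tn / (tn + fp)   # true negative rate

ppv <- tp / (tp + fp)           # positive predictive value
npv <- tn / (tn + fn)           # negative predictive value

fdr        <- fp / (tp + fp)    # false discovery rate = 1 - PPV
false_omit <- fn / (tn + fn)    # false omission rate  = 1 - NPV

false_positive_rate <- fp / (fp + tn)  # 1 - specificity
false_negative_rate <- fn / (tp + fn)  # 1 - sensitivity

round(c(sensitivity = sensitivity, specificity = specificity,
        ppv = ppv, npv = npv, fdr = fdr, false_omission = false_omit,
        fpr = false_positive_rate, fnr = false_negative_rate), 4)
```

In the hypothesis-testing analogy used in parts IV and V, the false positive rate plays the role of the significance level, sensitivity plays the role of power, and the false negative rate plays the role of the type II error rate.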
Attachment:- Assignment Files.rar