Calculate the likelihood equations


Reference no: EM132324315

Principles of Statistical Inference (PSI) Assignment

Question 1 - Neurofibromatosis Type 1 (NF1) is a human genetic disorder. As well as physical symptoms, affected children often suffer from impaired cognition and learning. A learning task that involves recognising and remembering the location of patterns on a screen is administered. If the child makes an error, the task is presented again, and the number of attempts is recorded. We are interested in estimating the population mean number of unsuccessful attempts before solving the task correctly in children with NF1 and in healthy controls.

Although the Poisson distribution is often used for statistical models of count data, data which exhibit greater than expected variability ("overdispersion") may be modelled by the negative binomial distribution, which has probability function

f_X(x) = P(X = x) = [Γ(k + x) / (Γ(x + 1) Γ(k))] (μ/(k + μ))^x (k/(k + μ))^k

where

x = 0, 1, 2,...

μ > 0 is the mean

k > 0 is known as the dispersion parameter
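
As a quick numerical check of this parameterisation, the following Python sketch evaluates the probability function and confirms that it sums to 1 with mean μ and variance μ + μ²/k. The function names are hypothetical, and μ = 3 is an arbitrary illustrative value; k = 10 is the value the assignment specifies later.

```python
import math

def nb_log_pmf(x, mu, k):
    """Log of the negative binomial probability function in the (mu, k) form."""
    return (math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
            + x * math.log(mu / (k + mu)) + k * math.log(k / (k + mu)))

def nb_pmf(x, mu, k):
    """Negative binomial probability P(X = x)."""
    return math.exp(nb_log_pmf(x, mu, k))

# Illustrative values: mu is an assumption, k = 10 as stated in the assignment.
mu, k = 3.0, 10.0
support = range(0, 400)  # truncation point; the tail beyond it is negligible here

total = sum(nb_pmf(x, mu, k) for x in support)
mean = sum(x * nb_pmf(x, mu, k) for x in support)
var = sum((x - mean) ** 2 * nb_pmf(x, mu, k) for x in support)

print(f"sum of probabilities: {total:.6f}")
print(f"mean: {mean:.6f} (mu = {mu})")
print(f"variance: {var:.6f} (mu + mu^2/k = {mu + mu**2 / k})")
```

The variance exceeds the mean by μ²/k, which is the overdispersion relative to the Poisson distribution.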

Assume that n₁ typically developing "control" children are each given this task, and the number of unsuccessful attempts taken by child i is described by the random variable X_i, which has a negative binomial distribution with mean μ₁. A further n₂ children with NF1 are given the task, and the number of unsuccessful attempts taken by child i in this group is described by the random variable Y_i, which has a negative binomial distribution with mean μ₂.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "NF1" are the observed values from attempting this task in two groups of children:

x_i for n₁ = 42 control children without NF1
y_i for n₂ = 107 children with NF1.

When evaluating with the above data, assume that the dispersion parameter k = 10 and is the same in both populations. However, derive all results in general terms for any value of k before evaluating numerical results for this known value of k.

We wish to test whether the population means are equal, i.e. to test the null hypothesis

H₀: μ₁ = μ₂ versus H₁: μ₁ ≠ μ₂

Carry out a likelihood ratio test of H₀: μ₁ = μ₂

a) Write down the log-likelihood for the full model, calculate the likelihood equations and find the general form of the MLEs for μ₁ and μ₂. Compute the MLEs for the observed data above. Obtain the maximum value achieved by the log-likelihood in the full model [the constant term should be omitted in this calculation].

b) Write down the likelihood function in the reduced model, i.e. under the assumption that a common parameter μ = μ₁ = μ₂ can be used to describe the number of unsuccessful attempts in both populations. Derive the MLE for μ, first in general terms and then for the data above. Obtain the maximum value achieved by the log-likelihood under the reduced model [as above, the constant must be omitted].

c) Using your results from parts (a) and (b) write down the likelihood ratio test statistic for testing the null hypothesis H₀: μ₁ = μ₂, evaluate the test statistic and compute its p-value. What do you conclude about the mean number of unsuccessful attempts in the two populations?
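
The calculation in parts (a) to (c) can be sketched in Python as follows. This is a rough illustration, not a model answer: it assumes that solving the likelihood equation with k known gives the sample mean as the MLE (worth verifying by hand), it drops only the gamma-function terms as "the constant", and the data lists are made up rather than taken from the spreadsheet. All function names are hypothetical.

```python
import math

def nb_loglik(data, mu, k):
    """Negative binomial log-likelihood in mu with k known.
    The gamma-function terms, which do not involve mu, are omitted."""
    n, s = len(data), sum(data)
    return s * math.log(mu / (k + mu)) + n * k * math.log(k / (k + mu))

def lrt_equal_means(x, y, k):
    """Likelihood ratio test of H0: mu1 = mu2 for two independent samples."""
    mu1 = sum(x) / len(x)                        # assumed MLE in group 1
    mu2 = sum(y) / len(y)                        # assumed MLE in group 2
    mu0 = (sum(x) + sum(y)) / (len(x) + len(y))  # assumed pooled MLE under H0
    full = nb_loglik(x, mu1, k) + nb_loglik(y, mu2, k)
    reduced = nb_loglik(x, mu0, k) + nb_loglik(y, mu0, k)
    stat = 2.0 * (full - reduced)
    # Upper tail of chi-square with 1 df: P(chi2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return {"mu1": mu1, "mu2": mu2, "mu0": mu0, "full": full,
            "reduced": reduced, "stat": stat, "p_value": p_value}

# Made-up counts for illustration only (not the NF1 data).
x_demo = [0, 1, 2, 1, 0, 3, 2, 1]
y_demo = [2, 4, 3, 5, 1, 6, 2, 3, 4]
result = lrt_equal_means(x_demo, y_demo, k=10.0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The choice of which constant to drop changes the reported maximum log-likelihood but not the test statistic, because the same terms cancel in the difference between the full and reduced models.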

Carry out a Wald test of H₀: μ₁ = μ₂.

d) Compute the expected information and the asymptotic variance-covariance matrix of the MLEs in the full model. (Do this in general terms, i.e. do not substitute in data at this stage.)

Consider the new parameter δ = μ₁ - μ₂ in the full model; then testing H₀ is equivalent to testing δ = 0.

e) Using your answers for the previous questions only (do not rewrite the log-likelihood and derive these parameters), give the MLE for δ and its standard error in general terms. Justify your answers.

f) Calculate the MLE and its standard error for the data above and calculate a 95% confidence interval for δ.

g) Give the formula for an approximate Wald test of δ = 0. Calculate the value of the test statistic and its associated p-value, using the data. What can you conclude about the mean number of unsuccessful attempts in the two populations using only this information?

h) Are your conclusions from the two hypothesis tests consistent with each other?
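
A hedged Python sketch of the Wald calculation in parts (d) to (g) follows. It assumes the two groups are independent, so the variance-covariance matrix is diagonal, and it assumes the inverse expected information for each mean works out to μ(k + μ)/(nk); both should be checked against your own derivation. The data are made up and the function names are hypothetical.

```python
import math

def nb_var_of_mle(mu, n, k):
    """Assumed asymptotic variance of the MLE of mu (inverse expected
    information) for n negative binomial observations with k known."""
    return mu * (k + mu) / (n * k)

def wald_test_delta(x, y, k, z_crit=1.959964):
    """Wald test of delta = mu1 - mu2 = 0 with an approximate 95% interval."""
    mu1 = sum(x) / len(x)
    mu2 = sum(y) / len(y)
    delta = mu1 - mu2
    # Independent samples: the variance of the difference is the sum of variances,
    # evaluated at the MLEs.
    se = math.sqrt(nb_var_of_mle(mu1, len(x), k) + nb_var_of_mle(mu2, len(y), k))
    z = delta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    ci = (delta - z_crit * se, delta + z_crit * se)
    return {"delta": delta, "se": se, "z": z, "p_value": p_value, "ci": ci}

# Made-up counts for illustration only (not the NF1 data).
x_demo = [0, 1, 2, 1, 0, 3, 2, 1]
y_demo = [2, 4, 3, 5, 1, 6, 2, 3, 4]
print(wald_test_delta(x_demo, y_demo, k=10.0))
```

Comparing the square of z with the likelihood ratio statistic on the same data is one way to address part (h), since both are referred to a chi-square distribution with one degree of freedom.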

Question 2 - In Phase I clinical trials the maximum tolerated dose (MTD) of a drug treatment is often chosen by first finding the dose at which no more than one patient in a cohort of six experiences any dose limiting toxicities (DLT). The recommended dose for further development is often then set to a dose below the MTD.

Assuming that the number (X) of patients experiencing a DLT in a group of six patients follows a binomial distribution with parameter p

X ∼ Bin(6, p)

a) Write down the probability that at most one patient experiences a DLT.

b) Tabulate this probability for values of p = 0.05, 0.10,..., 0.95.

How high does the parameter p need to be before the probability of at most one patient with DLT being observed becomes < 0.1? Use the above values only - you do not need to calculate it exactly.

c) What conclusion could you make about the underlying rate of DLT if we observe at most one patient with DLT in a cohort of 6 patients?
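
The tabulation in part (b) can be sketched in Python using the binomial formula P(X ≤ 1) = (1 − p)⁶ + 6p(1 − p)⁵. The function and variable names are hypothetical.

```python
def prob_at_most_one(p, n=6):
    """P(X <= 1) for X ~ Bin(n, p): no events, or exactly one event."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)

# Grid of p values requested in part (b): 0.05, 0.10, ..., 0.95.
grid = [round(0.05 * i, 2) for i in range(1, 20)]
table = [(p, prob_at_most_one(p)) for p in grid]

print("   p   P(X <= 1)")
for p, prob in table:
    print(f"{p:.2f}   {prob:.4f}")

# Smallest grid value at which the probability drops below 0.1.
first_below = next(p for p, prob in table if prob < 0.1)
print("first grid value with probability < 0.1:", first_below)
```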

For a particularly severe outcome (toxic death), we wish to be confident that the population rate (p_t) is at most 10%. Such an event is highly unlikely to be observed in a cohort of 6 patients if p_t is as low as 10%, so the investigators propose a Bayesian monitoring rule for the next study. This is designed to trigger stopping of the trial if the posterior probability that p_t > 0.1 exceeds 75%.

We assume a prior Beta(1, 3) distribution for p_t and assume that the number of toxic deaths follows a Bin(n, p_t) distribution.

d) Suggest why a Beta prior distribution has been chosen.

e) What is the prior probability that p_t > 0.1?

f) Give the posterior distribution for p_t if x toxic deaths are observed in the first n patients.

g) If no toxic deaths are observed in the first 10 patients, what is the posterior probability that p_t > 0.1? Suppose that the next two patients (i.e. patients 11 and 12) both experience toxic death. Would you consider stopping the study?

h) Tabulate the posterior probabilities that p_t > 0.1 for 2/20, 4/40, 6/60, 8/80 and 10/100 observed toxic deaths.

What do you notice about these probabilities? Discuss this in terms of the behaviour of the posterior distribution as data accumulate.

i) Plot the prior distribution and the posterior distribution after 10/100 observed toxic deaths on a single graph.

j) If the study had continued to observe this number of events (10/100) what would you conclude about the prevalence of toxic death in the study?
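
The posterior probabilities in parts (e), (g) and (h) can be sketched in Python with the standard library only. The sketch assumes the usual conjugate update, in which a Beta(a, b) prior and d deaths in n patients give a Beta(a + d, b + n − d) posterior, and it uses the identity P(θ ≤ t) = P(Bin(a + b − 1, t) ≥ a), which holds when a and b are positive integers. Function names are hypothetical.

```python
import math

def beta_tail_prob(a, b, t):
    """P(theta > t) for theta ~ Beta(a, b), with a and b positive integers.
    Equivalent to P(Bin(a + b - 1, t) <= a - 1)."""
    n = a + b - 1
    return sum(math.comb(n, j) * t ** j * (1 - t) ** (n - j) for j in range(a))

def posterior_tail_prob(deaths, n, t=0.1, prior_a=1, prior_b=3):
    """Posterior P(p_t > t) after `deaths` toxic deaths in n patients,
    starting from the Beta(1, 3) prior given in the question."""
    return beta_tail_prob(prior_a + deaths, prior_b + n - deaths, t)

print(f"prior P(p_t > 0.1):       {beta_tail_prob(1, 3, 0.1):.4f}")
print(f"after 0 deaths in 10:     {posterior_tail_prob(0, 10):.4f}")
print(f"after 2 deaths in 12:     {posterior_tail_prob(2, 12):.4f}")

# Part (h): the same observed proportion, 10%, with increasing sample size.
for deaths, n in [(2, 20), (4, 40), (6, 60), (8, 80), (10, 100)]:
    print(f"{deaths:>2}/{n:<3}  P(p_t > 0.1) = {posterior_tail_prob(deaths, n):.4f}")
```

For non-integer Beta parameters this identity does not apply, and a regularised incomplete beta function from a numerical library would be needed instead.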

Question 3 - The six minute walk test is used as a measure of exercise tolerance in a number of medical conditions. It involves walking as far as possible during a six minute time period on a flat straight track of length 30 metres. An exercise physiologist wishes to perform this test on a group of 48 children with Perthes disease.

It is not known whether there is any suitable parametric model for the walk distances so we will investigate non-parametric methods.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "SixMWT" are the observed distances walked by the 48 children.

a) Calculate appropriate summary statistics and thus give the parameters for a normal distribution that may be applicable to these data.

b) Using the observed data, calculate the empirical distribution function. Plot the empirical distribution function and the CDF of the normal distribution described in (a) on a single graph.

c) Do you think the normal distribution is an appropriate model for the data? Justify your answer.
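
A minimal Python sketch of parts (a) and (b) follows, using only the standard library and leaving the plot aside. The distances are made up for illustration and are not the values in the "SixMWT" sheet; the function names are hypothetical.

```python
import math
import statistics
from bisect import bisect_right

def make_ecdf(data):
    """Return the empirical distribution function F(t) = #{x_i <= t} / n."""
    xs = sorted(data)
    n = len(xs)
    def ecdf(t):
        return bisect_right(xs, t) / n
    return ecdf

def normal_cdf(t, mean, sd):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2.0))))

# Made-up walk distances in metres (illustration only).
distances = [412, 455, 389, 501, 478, 430, 466, 520, 398, 447]

mean = statistics.mean(distances)
sd = statistics.stdev(distances)  # sample standard deviation (n - 1 divisor)
ecdf = make_ecdf(distances)

print(f"mean = {mean:.1f}, sd = {sd:.1f}")
print("distance   ECDF   normal CDF")
for d in sorted(distances):
    print(f"{d:>8}   {ecdf(d):.2f}   {normal_cdf(d, mean, sd):.3f}")
```

Plotting the two columns against distance as a step function and a smooth curve gives the single graph asked for in part (b).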

The mean six-minute walk distance in healthy control children is 503 metres.

Carry out the Wilcoxon signed-rank test on these data to test the null hypothesis that the mean walk distance for children with Perthes disease is the same as for healthy controls.

d) Calculate the value of the test statistic and give the approximate normal distribution of the test statistic under the null hypothesis.

e) Calculate the p-value for the test assuming a two-sided alternative hypothesis. Interpret the p-value.

f) What do you conclude about the mean walk distance for children with Perthes disease compared to healthy controls?

g) Describe in a few sentences how you would calculate a 95% confidence interval for the mean distance without assuming any particular parametric model for the data. You do not need to calculate the confidence interval.
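
The signed-rank calculation in parts (d) and (e) can be sketched in Python as below. This version drops zero differences, gives tied absolute differences their average rank, and uses the plain normal approximation with mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24, with no tie or continuity correction; those are simplifying assumptions. The data are made up and the function name is hypothetical.

```python
import math

def signed_rank_test(data, m0):
    """Wilcoxon signed-rank test of location m0 using the normal approximation."""
    diffs = [v - m0 for v in data if v != m0]  # discard zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1, ..., j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return {"n": n, "w_plus": w_plus, "mean": mean, "var": var,
            "z": z, "p_value": p_value}

# Made-up walk distances in metres (illustration only), tested against 503 m.
distances = [412, 455, 389, 501, 478, 430, 466, 520, 398, 447]
print(signed_rank_test(distances, m0=503))
```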

