Calculate the likelihood equations

Reference no: EM132324315

Principles of Statistical Inference (PSI) Assignment

Question 1 - Neurofibromatosis Type 1 (NF1) is a human genetic disorder. As well as physical symptoms, affected children often suffer from impaired cognition and learning. A learning task that involves recognising and remembering the location of patterns on a screen is administered. If the child makes an error, the task is presented again, and the number of attempts is recorded. We are interested in estimating the population mean number of unsuccessful attempts before solving the task correctly in children with NF1 and in healthy controls.

Although the Poisson distribution is often used for statistical models of count data, data which exhibit greater than expected variability ("overdispersion") may be modelled by the negative binomial distribution, which has probability function

$$f_X(x) = P(X = x) = \frac{\Gamma(k+x)}{\Gamma(x+1)\,\Gamma(k)} \left(\frac{\mu}{k+\mu}\right)^{x} \left(\frac{k}{k+\mu}\right)^{k}$$

where

x = 0, 1, 2,...

μ > 0 is the mean

k > 0 is known as the dispersion parameter
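
The probability function above can be checked numerically. The following is a minimal Python sketch, not part of the assignment: the function names `nb_log_pmf` and `nb_pmf` and the value μ = 3 are hypothetical, chosen only for illustration. It evaluates the function on the log scale and confirms that the probabilities sum to 1, that the mean is μ, and that the variance is μ + μ²/k, which exceeds the Poisson variance μ.

```python
import math

def nb_log_pmf(x, mu, k):
    """Log of P(X = x) for the negative binomial with mean mu and dispersion k."""
    return (math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
            + x * math.log(mu / (k + mu))
            + k * math.log(k / (k + mu)))

def nb_pmf(x, mu, k):
    """P(X = x), computed through the log scale for numerical stability."""
    return math.exp(nb_log_pmf(x, mu, k))

# Illustrative values only: mu = 3 is hypothetical, k = 10 matches the text.
mu, k = 3.0, 10.0
probs = [nb_pmf(x, mu, k) for x in range(200)]  # the tail beyond 200 is negligible here

total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
```

Working with `math.lgamma` rather than the gamma function itself avoids overflow for large counts.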

Assume that n1 typically developing "control" children are each given this task, and the number of unsuccessful attempts taken by child i is described by the random variable Xi, which has a negative binomial distribution with mean μ1. A further n2 children with NF1 are given the task, and the number of unsuccessful attempts taken by child i in this group is described by the random variable Yi, which has a negative binomial distribution with mean μ2.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "NF1" are the observed values from attempting this task in two groups of children:

  • xi for n1 = 42 control children without NF1
  • yi for n2 = 107 children with NF1.

When evaluating with the above data, assume that the dispersion parameter k = 10 and is the same in both populations. However, derive all results in general terms for any value of k before evaluating numerical results for this known value of k.

We wish to test whether the population means are equal, i.e. to test the null hypothesis

H0: μ1 = μ2 versus H1: μ1 ≠ μ2

Carry out a likelihood ratio test of H0: μ1 = μ2

a) Write down the log-likelihood for the full model, calculate the likelihood equations and find the general form of the MLEs for μ1 and μ2. Compute the MLEs for the observed data above. Obtain the maximum value achieved by the log-likelihood in the full model [the constant term should be omitted in this calculation].

b) Write down the likelihood function in the reduced model, i.e. under the assumption that a common parameter μ = μ1 = μ2 can be used to describe the number of unsuccessful attempts in both populations. Derive the MLE for μ, first in general terms and then for the data above. Obtain the maximum value achieved by the log-likelihood under the reduced model [as above, the constant must be omitted].

c) Using your results from parts (a) and (b) write down the likelihood ratio test statistic for testing the null hypothesis H0: μ1 = μ2, evaluate the test statistic and compute its p-value. What do you conclude about the mean number of unsuccessful attempts in the two populations?
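
The mechanics of parts (a) to (c) might be sketched in Python as follows. This is an illustration under stated assumptions, not a worked solution: the counts `x` and `y` are hypothetical and are NOT the assignment data, the helper names are invented here, and the code assumes that the MLE of each mean with k known is the corresponding sample mean, which should be verified from the likelihood equations. The gamma-function terms are omitted because they do not involve μ and cancel in the test statistic.

```python
import math

def nb_loglik_kernel(data, mu, k):
    """Negative binomial log-likelihood in mu, gamma-function terms omitted."""
    n = len(data)
    return sum(data) * math.log(mu / (k + mu)) + n * k * math.log(k / (k + mu))

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts for illustration only (not the NF1 data).
x = [0, 1, 2, 1, 0, 3]
y = [2, 4, 1, 3, 5, 2, 4]
k = 10.0

# Assumed MLEs: sample means in the full model, pooled mean in the reduced model.
mu1_hat = sum(x) / len(x)
mu2_hat = sum(y) / len(y)
mu_hat = (sum(x) + sum(y)) / (len(x) + len(y))

loglik_full = nb_loglik_kernel(x, mu1_hat, k) + nb_loglik_kernel(y, mu2_hat, k)
loglik_reduced = nb_loglik_kernel(x + y, mu_hat, k)

# The full model has one more free parameter, so the reference distribution
# is chi-square with 1 degree of freedom.
lrt_stat = 2.0 * (loglik_full - loglik_reduced)
p_value = chi2_sf_1df(lrt_stat)

print(f"LRT statistic = {lrt_stat:.4f}, p-value = {p_value:.4f}")
```

The one-degree-of-freedom tail probability is written with `math.erfc` so that the sketch needs only the standard library.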

Carry out a Wald test of H0: μ1 = μ2.

d) Compute the expected information and the asymptotic variance-covariance matrix of the MLEs in the full model. (In general terms, i.e. don't substitute in data at this stage).

Consider the new parameter δ = μ1 - μ2 in the full model; then testing H0 is equivalent to testing δ = 0.

e) Using your answers for the previous questions only (do not rewrite the log-likelihood and derive these parameters), give the MLE for δ and its standard error in general terms. Justify your answers.

f) Calculate the MLE and its standard error for the data above and calculate a 95% confidence interval for δ.

g) Give the formula for an approximate Wald test of δ = 0. Calculate the value of the test statistic and its associated p-value, using the data. What can you conclude about the mean number of unsuccessful attempts in the two populations using only this information?

h) Are your conclusions from the two hypothesis tests consistent with each other?

Question 2 - In Phase I clinical trials the maximum tolerated dose (MTD) of a drug treatment is often chosen by first finding the dose at which no more than one patient in a cohort of six experiences any dose limiting toxicities (DLT). The recommended dose for further development is often then set to a dose below the MTD.

Assuming that the number (X) of patients experiencing a DLT in a group of six patients follows a binomial distribution with parameter p

X ∼ Bin(6, p)

a) Write down the probability that at most one patient experiences a DLT.

b) Tabulate this probability for values of p = 0.05, 0.10,..., 0.95.

How high does the parameter p need to be before the probability of at most one patient with DLT being observed becomes < 0.1? Use the above values only - you do not need to calculate it exactly.
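
The tabulation in part (b) can be produced with a few lines of Python. This is a sketch only; the names `prob_at_most_one`, `grid`, `table` and `first_below` are hypothetical.

```python
from math import comb

def prob_at_most_one(p, n=6):
    """P(X <= 1) for X ~ Bin(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in (0, 1))

# Grid p = 0.05, 0.10, ..., 0.95, rounded to avoid floating-point drift in the keys.
grid = [round(0.05 * i, 2) for i in range(1, 20)]
table = {p: prob_at_most_one(p) for p in grid}

# Smallest grid value at which the probability drops below 0.1.
first_below = next(p for p in grid if table[p] < 0.1)

for p in grid:
    print(f"{p:.2f}  {table[p]:.4f}")
print("first grid value with probability < 0.1:", first_below)
```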

c) What conclusion could you make about the underlying rate of DLT if we observe at most one patient with DLT in a cohort of 6 patients?

For a particularly severe outcome (toxic death), we wish to be confident that the population rate (pt) is at most 10%. Such an event is highly unlikely to be observed in a cohort of 6 patients if pt is as low as 10%, so the investigators propose a Bayesian monitoring rule for the next study. This is designed to trigger stopping of the trial if the posterior probability that pt > 0.1 exceeds 75%.

We assume a prior Beta(1, 3) distribution for pt and assume that the number of toxic deaths follows a Bin(n, pt) distribution.
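
The monitoring rule can be explored numerically with the sketch below, written in Python under two stated assumptions. First, it uses the standard conjugate result that a Beta(a, b) prior combined with x events in n binomial trials gives a Beta(a + x, b + n - x) posterior. Second, because all parameters here are integers, the Beta tail probability is computed from the identity P(Beta(a, b) ≤ c) = P(Bin(a + b - 1, c) ≥ a), which avoids any external library. The function names are hypothetical.

```python
from math import comb

def posterior_params(x, n, a0=1, b0=3):
    """Beta posterior parameters after x events in n patients, Beta(a0, b0) prior."""
    return a0 + x, b0 + n - x

def beta_sf_int(c, a, b):
    """P(P > c) for P ~ Beta(a, b), valid for positive integer a and b only."""
    m = a + b - 1
    return sum(comb(m, j) * c ** j * (1 - c) ** (m - j) for j in range(a))

def should_stop(x, n, threshold=0.1, trigger=0.75):
    """Stopping rule: posterior probability that pt > threshold exceeds trigger."""
    a, b = posterior_params(x, n)
    return beta_sf_int(threshold, a, b) > trigger

prior_prob = beta_sf_int(0.1, 1, 3)                      # before any data
post_prob = beta_sf_int(0.1, *posterior_params(0, 10))   # 0 events in 10 patients

print(f"prior P(pt > 0.1) = {prior_prob:.4f}")
print(f"posterior P(pt > 0.1) after 0/10 = {post_prob:.4f}")
```

The same functions can be called with other event counts, for example `posterior_params(2, 20)`, to build the table requested later in the question.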

d) Suggest why a Beta prior distribution has been chosen.

e) What is the prior probability that pt > 0.1?

f) Give the posterior distribution for pt if x toxic deaths are observed in the first n patients.

g) If no toxic deaths are observed in the first 10 patients, what is the posterior probability that pt > 0.1? Suppose that the next two patients (i.e. patients 11 and 12) both experience toxic death. Would you consider stopping the study?

h) Tabulate the posterior probabilities that pt > 0.1 for 2/20, 4/40, 6/60, 8/80 and 10/100 observed toxic deaths.

What do you notice about these probabilities? Discuss this in terms of the behaviour of the posterior distribution as data accumulate.

i) Plot the prior distribution and the posterior distribution after 10/100 observed toxic deaths on a single graph.

j) If the study had continued to observe this number of events (10/100) what would you conclude about the prevalence of toxic death in the study?

Question 3 - The six minute walk test is used as a measure of exercise tolerance in a number of medical conditions. It involves walking as far as possible during a six minute time period on a flat straight track of length 30 metres. An exercise physiologist wishes to perform this test on a group of 48 children with Perthes disease.

It is not known whether there is any suitable parametric model for the walk distances so we will investigate non-parametric methods.

The data in the file "PSI Ass 2 Semester 2 2019 data.xlsx" in sheet "SixMWT" are the observed distances walked by the 48 children.

a) Calculate appropriate summary statistics and thus give the parameters for a normal distribution that may be applicable to these data.

b) Using the observed data, calculate the empirical distribution function. Plot the empirical distribution function and the CDF of the normal distribution described in (a) on a single graph.

c) Do you think the normal distribution is an appropriate model for the data? Justify your answer.
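
A minimal Python sketch of the comparison in parts (a) to (c) follows. The `distances` list is hypothetical and is NOT the SixMWT data, and the helper names are invented for illustration. The sketch fits a normal distribution by the sample mean and standard deviation, then compares the empirical distribution function with the fitted normal CDF at each observation; plotting is left out so that only the standard library is needed.

```python
import math
from statistics import mean, stdev

def ecdf(data):
    """Return the empirical distribution function F(t) = #{x_i <= t} / n."""
    xs = sorted(data)
    n = len(xs)
    def F(t):
        return sum(1 for v in xs if v <= t) / n
    return F

def normal_cdf(t, mu, sigma):
    """CDF of the normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical walk distances in metres, for illustration only.
distances = [412.0, 455.0, 478.0, 490.0, 503.0, 521.0, 540.0, 566.0]

mu_hat = mean(distances)
sigma_hat = stdev(distances)  # sample standard deviation (n - 1 denominator)
F = ecdf(distances)

# Largest gap between the two curves at the observed points.
max_gap = max(abs(F(t) - normal_cdf(t, mu_hat, sigma_hat)) for t in distances)

for t in sorted(distances):
    print(f"{t:6.1f}  ECDF = {F(t):.3f}  normal CDF = {normal_cdf(t, mu_hat, sigma_hat):.3f}")
```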

The mean six-minute walk distance in healthy control children is 503 metres.

Carry out the Wilcoxon signed-rank test on these data to test the null hypothesis that the mean walk distance for children with Perthes disease is the same as for healthy controls.

d) Calculate the value of the test statistic and give the approximate normal distribution of the test statistic under the null hypothesis.

e) Calculate the p-value for the test assuming a two-sided alternative hypothesis. Interpret the p-value.

f) What do you conclude about the mean walk distance for children with Perthes disease compared to healthy controls?

g) Describe in a few sentences how you would calculate a 95% confidence interval for the mean distance without assuming any particular parametric model for the data. You do not need to calculate the confidence interval.
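
The signed-rank calculation in parts (d) and (e) can be sketched in Python as below. The four distances are hypothetical and the function name is invented for illustration. The sketch drops observations equal to the hypothesised value, gives tied absolute differences their average rank, and uses the usual large-sample approximation in which the sum of positive ranks has mean n(n+1)/4 and variance n(n+1)(2n+1)/24 under the null hypothesis. It applies neither a continuity correction nor a tie correction to the variance, both of which are simplifying assumptions.

```python
import math

def signed_rank_test(data, m0):
    """Wilcoxon signed-rank statistic W+, normal-approximation z, two-sided p-value."""
    diffs = [v - m0 for v in data if v != m0]  # discard zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return w_plus, z, p_value

# Hypothetical distances in metres; 503 is the control value quoted in the text.
w_plus, z, p_value = signed_rank_test([504.0, 501.0, 506.0, 507.0], 503.0)
print(f"W+ = {w_plus}, z = {z:.4f}, p = {p_value:.4f}")
```

With only four observations the normal approximation is poor; the example is sized for checking the ranking logic by hand, not for inference.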
