Explain the shape of all the three power functions

Assignment Help Advanced Statistics
Reference no: EM131747086

1. Let X1,. .., Xn be iid N(θ, θ2) where θ > 0 is an unknown parameter. Consider the following estimators of θ:
• The sample mean T1(X1,. .., Xn) = X¯
• The modified sample standard deviation

T2(X1,. .., Xn) = kn S

where

S2 = 1/n-1 ∑ni=1 (X - X¯ )2 and kn = (√(n - 1) Γ(n-1/2))/(√2Γ(n/2)

and Γ is the Gamma function.
• The combined estimator

T3 (X1,. .., Xn) = (T1(X1,. .., Xn) + T2(X1,. .., Xn))/2

• The maximum likelihood estimator

T4(X1,. .., Xn) = √(X¯2 + 4/n∑ni=1 Xi2 - X¯ )/2

• The sample mean absolute deviation

T5 (X1,. .., Xn) = 1/n∑ni=1 |Xi - X¯|

• The sample median

T6(X1,. .., Xn) = inf{x ∈ R : F^(x) ≥ 0.5}
where F^(x) is the empirical cumulative distribution function (see Definitions 12 and 13 of the Lecture notes). In R you can compute this estimator by using quantile(x,probs=0.5,type=1)

1.1 Show that θ is scale parameter and then show that

Zi = Ti

is a pivot for all i ∈ {1, 2, 3, 4, 5, 6}.

1.2 Use the pivots Z1,. .., Z6 to construct 95% confidence intervals for θ and show the expressions for the endpoints of the six intervals as a function of Ti and the quantiles of the sampling distributions of Zi when n = 30. To estimate those quantiles, you can use the Monte Carlo method with r = 10000.

1.3 Using n = 30 again, plot (in a single figure) the expected length of each of the confidence intervals as a function of θ in the interval (0, 5). Which interval is shortest on average for all values of θ ∈ (0, 5)? How does the combined estimator T3 compared to T1 and T2 in terms of expected length? To estimate the expected lengths, you can use the Monte Carlo method with r = 10000.

1.4 For n = 30, plot (in a single figure) the mean square error (as a function of θ and in the interval (0, 5)) for all the six estimators.

2. Let X1,. .., Xn be iid continuous random variables with density

f(x|µ, ∈) = (1 - ∈) Φ(x - µ) + ∈Φ (x - (µ + 10))
where Φ is the standard normal density. This density has the interpretation of being the mixture of two densities: one (say f1(x|µ) = Φ(x - µ)) corresponding to a normal distribution with mean µ and variance one (with weight (1 - ∈)) and another density (say f2(x µ) = Φ(x µ 10)) corresponding to a normal with mean µ+10 and variance equal to one (with weight ∈).

This family of densities is commonly used to model observations in the presence of outliers or aberrant observations. The idea is that we cannot observe directly form the first density f1 which has mean µ and where µ is the actual parameter of interest. We can only observe samples which contain observations from f1 but also some outlying observations we are not really interested in. These outlying observations follow a density with a notoriously diαerent mean which in this case is µ + 10. The weight ∈ is the average proportion of outlying observations in the sample. The main problem is that we cannot distinguish between an unusally large observation from f1 from an outlying observation from f2. They are all mixed-up and we need to decide what to do to minimize the influence of such outlying observations.

The following R function rand.f return random samples from the density f(x µ, ∈). To illustrate the shape of the densities, we have generated 10000 observations from f(x µ, ∈) with µ = 0 and ∈ = 0.1 and then plotted the corrresponding histogram. We can see that indeed there is a small proportion of the observations (10% on average) that have a mean 10 units larger than the mean of zero of the observations from f1. We need to modify our current known estimates of µ in order to deal with outliers and avoid bias. In Statistics, we say that we need a robust procedure.

rand.f<-function(n=10,mu=0,epsilon=0.1){
if (epsilon==0){
x<-rnorm(n,mean=mu,sd=1)
}else{
x<-rep(0,n)
for (i in 1:n){
u<-rbinom(1,size=1,prob=epsilon)
if (u==0){
x[i]<-rnorm(1,mean=mu,sd=1)
}else{
x[i]<-rnorm(1,mean=(mu+10),sd=sqrt(1))
}
}
}
return(x)
}
xx1<-rand.f(n=10000,mu=0,eps=0.1)
hist(xx1,50,freq=F,xlim=c(-3,12),xlab="x",main="f(x|0,0.1)")

f(x|0,0.1)

347_figure.jpg

We consider the following three estimators of the parameter of interest µ.

The sample mean X¯ . This estimator of µ is biased upwards since the outlying observations are pulling the arithmetic mean above the target µ.
The sample median as defined in the previous question. This estimator is more robust than the mean to the presence of outliers since it does not take into account the magnitude of the outlying observations.

• The p% upper trimmed mean defined as:

µ˜p =  (X(1) + X(2) + .. . + X(N))/N

where X(i) are the order statistics X(1) < X(2) < and N = ceiling((1-p) n). This estimator of µ removes the p% largest observations from the sample and then takes the arithmetic average of the remaining.

We want to test the hypotheses:

H0 : µ = 0 vs H1 : µ> 0

and we consider the following critical regions:

C1 = {(x1,. .., xn) : x¯ > k1} (1)
C2 = {(x1,. .., xn) : median(x1,. .., xn) > k2} (2)
C3 = {(x1,. .., xn) : µ˜p > k3} (3)

2.1 Using the Monte Carlo method with r = 10000, obtain the values of k1, k2 and k3 to perform the tests at a significance level of α = 0.05 when n = 30, ∈ = 0.1. For C3 you should use p = 0.3.

2.2 Plot the power functions (in a single figure) as a function of µ in the interval (0, 2). Explain the shape of all the three power functions. What happens when we decrease the value of p to p = 0.15 and if we increase to p = 0.5?

3. Let X be a continuous and positive random variable with probability density function g(x|θ) where θ > 0 is an unknown parameter. We do not know the functional form of g and the only thing we know about g is how to simulate random samples from g for given values of θ. We will use the R function rg for this purpose where the function can be downloaded as follows

The function rg has two arguments: n and theta which specify the sample size and the parameter value θ, respectively, and the function returns a generated random sample of size n. To simulate a random sample of size n = 5 from g(x|θ = 1) we can simply use

x.sample<-rg(n=5,theta=1)

x.sample
## [,1]
## [1,] 0.5718220
## [2,] 1.3856786
## [3,] 0.4198774
## [4,] 1.0478892
## [5,] 0.6905807

Suppose we are interested in the simple null hypothesis

H0 : θ = θ0

for some known θ0 > 0 and that we are given a test statistic S in the form of an R function

The function S has two arguments: x and theta0 where x is a vector of observations and theta0 is the value of θ under the null hypothesis. For example, if we are interested in the simple null hypothesis

H0 : θ = 1

and we have available the observed random sample x.sample generated above, then the value of the test statistic is given by

S(x=x.sample, theta0=1)

## [1] 0.2236653

Let X1,. .., Xn be a random sample from the density g(x|θ) where the parameter θ > 0 is unknown. We want to test the hypotheses

H0 : θ = 1 vs. H1 : θ ≠ 1

3.1 How would you use the given test statistic S in order to construct a sensible critical region with a significance level α = 0.05 when n = 30. You must provide the specific form of the critical region as a subset of Rn, in terms of the statistic S and not in terms of unspecified constants. You must use Monte Carlo with r = 10000.

3.2 Using n = 30 and the Monte Carlo method with r = 10000, plot the corresponding power function as function of θ in the interval θ ∈ (0, 2). Interpret the plot by answering questions like: Why is the power function large or small in diαerent regions of θ.

3.3 Plot histograms of samples of size n = 10000 for θ = 1, 5, 10, 15, 20. Also (in a diαerent figure) plot both the expected value and the variance of X under the density g as a function of θ in the interval (0, 6). How does θ changes the density of the observations?

3.4 Propose a new statistic T and show the corresponding critical region to test the hypotheses

H0 : θ = 1 vs. H1 : θ ≠ 1

with a significance level α = 0.05 when n = 30. You must provide the specific form of the critical region as a subset of R30, in terms of the statistic T and not in terms of unspecified constants. You must use Monte Carlo with r = 10000. Plot the power of your proposed test in the interval θ ∈ (0, 2). How does the power of the test based on T compare to the one defined by S?

3.5 Given the data

x.sample<-c(1.373,2.29,1.434,0.747,1.45,1.912,0.405,0.607,0.515,0.548)

use the critical region based on your proposed statistic T to test the hypotheses

H0 : θ = 1 vs. H1 : θ ≠ 1
with a significance level of α = 0.1. What can you conclude?

Reference no: EM131747086

Questions Cloud

Taxes and forestland usage : Taxes and forestland usage. A study was designed to assess the impact of taxes on forestland usage in part of the Upper Wabash River Watershed in Indiana.
Plot the means of the treatments that used humor : The effect of humor. In advertising, humor is often used to overcome sales resistance and stimulate customer purchase behavior. A recent experiment looked.
Explain how the definition of food is a social construct : Explain how the definition of food is a social construct? Why would most cultures eat Cheetos and not a cockroach?
Ringtones for cell phones : You decide to change the ringtones for your cell phone by choosing 2 from a list of the 10 most popular ringtones.17 Here is the list:
Explain the shape of all the three power functions : MA20226 - Explain the shape of all the three power functions - How does the power of the test based on T compare to the one defined by S
Describe the distribution of the response variable : Predicting movie revenue: simple linear regressions. Using numerical and graphical summaries, describe the distribution of the response variable, USRevenue
Is the zara model sustainable : Is the Zara model sustainable? What would you do to preserve their edge over the next 5-10 years,
Select the three songs for your ipod : Listen to three songs. The walk to your statistics class takes about 10 minutes, about the amount of time needed to listen to three songs on your iPod.
Run the multiple regression model : Predicting movie revenue: multiple linear regression. Run the multiple regression model and specify the fitted regression equation

Reviews

Write a Review

Advanced Statistics Questions & Answers

  Relationship between speed, flow and geometry

Write a project proposal on relationship between speed, flow and geometry on single carriageway roads.

  Logistic regression model

Compute the log-odds ratio for each group in Logistic regression model.

  Logistic regression

Foundations of Logistic Regression

  Probability and statistics

The tubes produced by a machine are defective. If six tubes are inspected at random , determine the probability that.

  Solve the linear model

o This is a linear model. If your model needs a different engine, then you need to rethink your approach to the model. Remember, there are no IF, Max, or MIN statements in linear models.

  Plan the analysis

Plan the analysis

  Quantitative analysis

State the hypotheses that you are going to test.

  Modelise as a markov chain

modelise as a markov chain

  Correlation and regression

What are the degrees of freedom for regression

  Construct a frequency distribution for payment method

Construct a frequency distribution for Payment method

  Perform simple linear regression

Perform simple linear regression

  Quality control analysis

Determining the root causes

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd