##### Reference no: EM131444857

**Question:** In this simulation exercise we consider an example of the use of the bootstrap in constructing an interval estimate of the median. If the median is taken as a measure of location of a distribution with density $f$, it can be estimated by the sample median. For a random sample of size $n$, the sample median has, for large $n$, a standard deviation of approximately $(2 f(m)\sqrt{n})^{-1}$, where $m$ is the median of the density $f$. When the density $f$ is unknown, this expression cannot be used to construct an interval estimate.

a. Show that for random samples from the normal distribution the standard deviation of the sample median is $\sigma\sqrt{\pi}/\sqrt{2n}$, whereas the standard deviation of the sample mean is $\sigma/\sqrt{n}$. Comment on these results.
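As a starting point for part a, the substitution into the general formula can be sketched as below. This is only an outline under the assumption that the large-sample expression $(2 f(m)\sqrt{n})^{-1}$ is taken as given; the symbols $\mu$ and $\sigma$ denote the mean and standard deviation of the normal distribution.

$$
f(m) = f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}
\quad\Longrightarrow\quad
\frac{1}{2 f(m)\sqrt{n}} = \frac{\sigma\sqrt{2\pi}}{2\sqrt{n}} = \frac{\sigma\sqrt{\pi}}{\sqrt{2n}} = \sqrt{\frac{\pi}{2}}\cdot\frac{\sigma}{\sqrt{n}} \approx 1.2533\,\frac{\sigma}{\sqrt{n}}
$$

Under these assumptions the sample median has a standard deviation roughly 25% larger than that of the sample mean, which corresponds to a relative efficiency of $2/\pi \approx 0.64$.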

b. Show that for random samples from the Cauchy distribution (that is, the $t(1)$ distribution) the standard deviation of the sample mean does not exist, but that the standard deviation of the sample median is finite and equal to $\pi/(2\sqrt{n})$.
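A sketch of the two ingredients for part b, assuming the standard Cauchy density (location 0, scale 1) and again taking the large-sample formula $(2 f(m)\sqrt{n})^{-1}$ as given:

$$
f(y) = \frac{1}{\pi(1+y^2)}, \qquad m = 0, \qquad f(0) = \frac{1}{\pi}
\quad\Longrightarrow\quad
\frac{1}{2 f(m)\sqrt{n}} = \frac{\pi}{2\sqrt{n}}
$$

$$
E|y| = \frac{2}{\pi}\int_0^{\infty} \frac{y}{1+y^2}\,dy = \frac{1}{\pi}\Big[\ln(1+y^2)\Big]_0^{\infty} = \infty
$$

Because the first absolute moment diverges, neither the mean nor the variance of $y$ exists, so the sample mean has no standard deviation.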

c. Simulate a data set of $n = 1000$ observations $y_1, \ldots, y_{1000}$ by independent drawings from the $t(1)$ distribution.
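One possible way to carry out the simulation in c is sketched below in Python with NumPy. The choice of language, the seed value, and the variable names are assumptions made for illustration; the exercise does not prescribe any of them.

```python
import numpy as np

# Assumption: the seed 0 is arbitrary and only makes the run reproducible.
rng = np.random.default_rng(0)

n = 1000
# The t(1) distribution coincides with the standard Cauchy distribution.
y = rng.standard_cauchy(n)

sample_median = np.median(y)
sample_mean = np.mean(y)
print(f"sample median = {sample_median:.4f}, sample mean = {sample_mean:.4f}")
```

The sample mean is printed only for contrast: for Cauchy data it can land far from zero, while the sample median should stay close to the true median of 0.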

d. Use the bootstrap method (based on the data of c) to construct a 95% interval estimate of the median, as follows. Generate a new set of 1000 observations by IID drawings from the bootstrap distribution (that is, by drawing with replacement from the observed data $y_1, \ldots, y_{1000}$) and compute the median. Repeat this 10,000 times. The 95% interval estimate of the median can be obtained by ordering the 10,000 computed sample medians. The lower bound is then the 251st value and the upper bound is the 9750th value in this ordered sequence of sample medians (this interval contains 9500 of the 10,000 medians, that is, 95%).
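The procedure in d can be sketched as follows. This is a minimal illustration, not a prescribed solution: the function names `interval_ranks` and `bootstrap_median_interval`, the seeds, and the use of NumPy are all assumptions. The rank rule reproduces the 251st and 9750th ordered values when 10,000 resamples are used.

```python
import numpy as np

def interval_ranks(n_boot):
    """1-based ranks of the 95% interval bounds, e.g. (251, 9750) for 10,000."""
    return n_boot * 25 // 1000 + 1, n_boot * 975 // 1000

def bootstrap_median_interval(y, n_boot=10_000, seed=1):
    """Return (lower, upper, sorted bootstrap medians) for the data y."""
    rng = np.random.default_rng(seed)  # assumption: arbitrary seed
    n = len(y)
    medians = np.empty(n_boot)
    for b in range(n_boot):
        # Draw n observations with replacement from the observed data.
        resample = rng.choice(y, size=n, replace=True)
        medians[b] = np.median(resample)
    medians.sort()
    lower_rank, upper_rank = interval_ranks(n_boot)
    # Convert 1-based ranks to 0-based array indices.
    return medians[lower_rank - 1], medians[upper_rank - 1], medians

if __name__ == "__main__":
    y = np.random.default_rng(0).standard_cauchy(1000)
    lower, upper, _ = bootstrap_median_interval(y)
    print(f"95% bootstrap interval for the median: [{lower:.4f}, {upper:.4f}]")
```

The loop keeps memory use small; a fully vectorized version would need to hold all 10,000 resamples of 1000 observations at once.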

e. Compute the standard deviation of the median over the 10,000 simulations in d, and compare this with the theoretical standard deviation in b.
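The comparison in e amounts to setting the empirical standard deviation of the simulated medians against $\pi/(2\sqrt{n})$ from b. A small sketch, assuming the medians are already available as a NumPy array; the helper name `sd_versus_theory` is hypothetical:

```python
import numpy as np

def sd_versus_theory(medians, n):
    """Compare the empirical sd of simulated medians with pi / (2 sqrt(n)).

    Returns (empirical sd, theoretical sd, ratio of the two).
    """
    empirical = np.std(medians, ddof=1)          # sample standard deviation
    theoretical = np.pi / (2.0 * np.sqrt(n))     # Cauchy result from part b
    return empirical, theoretical, empirical / theoretical
```

For $n = 1000$ the theoretical value is about 0.0497, so a ratio near 1 indicates that the simulated medians are about as dispersed as the large-sample formula suggests.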

f. Repeat c 10,000 times, computing the sample median of each simulated data set. Construct a corresponding 95% interval estimate of the median and compare this with the result in d. Also compute the standard deviation of the median over these 10,000 simulations and compare this with the result in b.
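In contrast to the bootstrap, the experiment in f draws fresh data from the true $t(1)$ distribution in every replication. A hedged sketch follows; the function names, the seed, and the vectorized layout are assumptions, and the full-size run holds a 10,000 by 1000 array (roughly 80 MB) in memory.

```python
import numpy as np

def monte_carlo_medians(n=1000, n_rep=10_000, seed=2):
    """Sorted sample medians of n_rep independent t(1) samples of size n."""
    rng = np.random.default_rng(seed)  # assumption: arbitrary seed
    samples = rng.standard_cauchy(size=(n_rep, n))
    return np.sort(np.median(samples, axis=1))

def summarize(medians, n):
    """Return (lower, upper, empirical sd, theoretical sd) for sorted medians."""
    n_rep = len(medians)
    lower = medians[n_rep * 25 // 1000]        # 251st value when n_rep = 10,000
    upper = medians[n_rep * 975 // 1000 - 1]   # 9750th value when n_rep = 10,000
    sd = np.std(medians, ddof=1)
    return lower, upper, sd, np.pi / (2.0 * np.sqrt(n))

if __name__ == "__main__":
    lower, upper, sd, theory = summarize(monte_carlo_medians(), 1000)
    print(f"interval: [{lower:.4f}, {upper:.4f}], sd = {sd:.4f}, theory = {theory:.4f}")
```

This interval describes the sampling distribution of the median around the true value 0, whereas the bootstrap interval in d is centred near the median of the single observed sample.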

g. Comment on the differences between the methods in d and f and their usefulness in practice if we do not know the true data generating process.