Reference no: EM132398826
STAT 1301/2300: Statistical Packages, University of Pittsburgh, USA
Problem 1. Probability distributions and graphical displays
Part 1.a. Generate 1000 observations from a normal distribution with mean 100 and standard deviation 15. Call the random number vector norm_vec.
Part 1.b. Create a histogram to show the sample data. Overlay a red theoretical normal density curve on the histogram.
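One possible way to approach Parts 1.a and 1.b is sketched below. The seed value, the number of breaks, and the axis labels are assumptions chosen for illustration; the assignment does not specify them. The histogram is drawn on the density scale (freq = FALSE) so that the theoretical curve and the bars share the same vertical units.

```r
set.seed(1301)  # assumed seed, used only to make the run reproducible

# Part 1.a: 1000 draws from a normal distribution with mean 100 and sd 15
norm_vec <- rnorm(1000, mean = 100, sd = 15)

# Part 1.b: histogram on the density scale (assumed 30 breaks)
hist(norm_vec, freq = FALSE, breaks = 30,
     main = "Histogram of norm_vec", xlab = "norm_vec")

# Overlay the theoretical N(100, 15) density in red
curve(dnorm(x, mean = 100, sd = 15), add = TRUE, col = "red", lwd = 2)
```

Without freq = FALSE the bars would show counts, and the density curve would appear as a nearly flat line along the bottom of the plot.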
Part 1.c. Calculate the following
(i) The 90th percentile of an F distribution with 5 and 10 degrees of freedom.
(ii) Calculate the probability $P(t_{19} > 3)$, where $t_{19}$ follows a t distribution with 19 degrees of freedom.
(iii) Calculate the probability P(−1.98 < Z < 2.98) where Z ∼ N(0, 1).
(iv) Find the 99th percentile of the $\chi^2_{14}$ distribution (chi-square with 14 degrees of freedom).
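The four quantities in Part 1.c map onto base R's quantile (q*) and cumulative probability (p*) functions. The sketch below shows one way to compute them; the variable names are hypothetical.

```r
# (i) 90th percentile of F with 5 and 10 degrees of freedom
f_90 <- qf(0.90, df1 = 5, df2 = 10)

# (ii) upper-tail probability P(t_19 > 3)
p_t <- pt(3, df = 19, lower.tail = FALSE)

# (iii) P(-1.98 < Z < 2.98) for standard normal Z, as a difference of CDF values
p_z <- pnorm(2.98) - pnorm(-1.98)

# (iv) 99th percentile of chi-square with 14 degrees of freedom
chi_99 <- qchisq(0.99, df = 14)

print(c(f_90 = f_90, p_t = p_t, p_z = p_z, chi_99 = chi_99))
```

Using lower.tail = FALSE in (ii) is equivalent to 1 - pt(3, df = 19) but avoids the subtraction.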
Problem 2. Sampling distributions
Part 2.a. Generate 10000 observations from a $\chi^2_5$ distribution (chi-square with 5 degrees of freedom). Save the random numbers in a vector called chisq_vec. Create a histogram for it. Observe its shape, especially its skewness.
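A minimal sketch for Part 2.a follows. The seed, the number of breaks, and the plot labels are assumptions made for illustration. Comparing the mean with the median is one quick numerical check of skewness to go alongside the visual one.

```r
set.seed(2300)  # assumed seed, used only to make the run reproducible

# 10000 draws from a chi-square distribution with 5 degrees of freedom
chisq_vec <- rchisq(10000, df = 5)

# Histogram of the raw draws (assumed 50 breaks)
hist(chisq_vec, breaks = 50,
     main = "Histogram of chisq_vec", xlab = "chisq_vec")

# For a right-skewed distribution the mean sits above the median
print(c(mean = mean(chisq_vec), median = median(chisq_vec)))
```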
Part 2.b. Sample 50 observations from chisq_vec without replacement. Call it chisq_samp. Calculate the mean of the sample data. Compare it with the mean of chisq_vec.
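Part 2.b can be sketched with sample(). The block regenerates chisq_vec so that it runs on its own; the seed is an assumption.

```r
set.seed(2300)  # assumed seed, used only to make the run reproducible
chisq_vec <- rchisq(10000, df = 5)

# Draw 50 values from chisq_vec without replacement
chisq_samp <- sample(chisq_vec, size = 50, replace = FALSE)

# Compare the sample mean with the mean of the full vector
print(c(sample_mean = mean(chisq_samp), full_mean = mean(chisq_vec)))
```

replace = FALSE is the default for sample(), so stating it explicitly only documents the intent.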
Part 2.c. Put chisq_vec in a matrix with 50 rows and 200 columns. Call it chisq_mat.
Part 2.d. Consider each column of chisq_mat as a random sample of size 50 from a $\chi^2_5$ distribution.
Now we have 200 samples (columns)! Calculate the mean of each column and save the means in a vector called mean_vec. You may use the apply() function. Type ?apply in R console for details about apply.
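Parts 2.c and 2.d could be approached as follows. The block regenerates chisq_vec with an assumed seed so that it is self-contained. matrix() fills column by column by default, so each column receives 50 consecutive draws.

```r
set.seed(2300)  # assumed seed, used only to make the run reproducible
chisq_vec <- rchisq(10000, df = 5)

# Part 2.c: reshape the 10000 draws into 50 rows and 200 columns
chisq_mat <- matrix(chisq_vec, nrow = 50, ncol = 200)

# Part 2.d: mean of each column (MARGIN = 2 means "apply over columns")
mean_vec <- apply(chisq_mat, 2, mean)

print(dim(chisq_mat))
print(length(mean_vec))
```

colMeans(chisq_mat) gives the same result and is usually faster, but apply() is the function the assignment points to.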
Part 2.e. mean_vec actually contains 200 sample means, one per column! Now we are able to verify the properties of the sampling distribution of the sample mean.
(i) Calculate the mean and standard deviation of mean_vec.
Compare them with their theoretical values. Note that for a $\chi^2_k$ distribution, the mean is $k$ and the variance is $2k$. And, based on the property of the sampling distribution of the sample mean, we have the following:
$\mu_{\bar{x}} = \mu_x$
$\sigma^2_{\bar{x}} = \sigma^2_x / n$
where n is the sample size.
(ii) Create a histogram for mean_vec. What shape do you observe? Is it roughly symmetric? Compare it with the one of chisq_vec.
(iii) Overlay an empirical normal curve on the histogram of mean_vec. To do so, create a sequence of evenly spaced numbers using seq(). Check the range of mean_vec to choose the from and to values of seq(). Use the dnorm() function to find the density values. The mean and standard deviation passed to dnorm() should be the same as the mean and standard deviation of mean_vec.
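The three steps of Part 2.e can be sketched together as below. With k = 5 and n = 50, the theoretical mean of the sample mean is 5 and its theoretical standard deviation is sqrt(2 * 5 / 50), about 0.447. The seed, the number of breaks, the grid length of 200, and all variable names other than chisq_vec, chisq_mat, and mean_vec are assumptions made for illustration.

```r
set.seed(2300)  # assumed seed, used only to make the run reproducible
chisq_vec <- rchisq(10000, df = 5)
chisq_mat <- matrix(chisq_vec, nrow = 50, ncol = 200)
mean_vec  <- apply(chisq_mat, 2, mean)

# (i) empirical versus theoretical mean and standard deviation
k <- 5    # degrees of freedom
n <- 50   # size of each sample (rows per column)
theo_mean <- k
theo_sd   <- sqrt(2 * k / n)
emp_mean  <- mean(mean_vec)
emp_sd    <- sd(mean_vec)
print(c(emp_mean = emp_mean, theo_mean = theo_mean,
        emp_sd = emp_sd, theo_sd = theo_sd))

# (ii) histogram of the sample means on the density scale
hist(mean_vec, freq = FALSE, breaks = 20,
     main = "Histogram of mean_vec", xlab = "sample mean")

# (iii) normal curve using the empirical mean and sd of mean_vec
x_seq <- seq(from = min(mean_vec), to = max(mean_vec), length.out = 200)
lines(x_seq, dnorm(x_seq, mean = emp_mean, sd = emp_sd),
      col = "red", lwd = 2)
```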
Problem 3. Plot grouped data
The mtcars data set is a built-in data set in base R. It comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). Type ?mtcars in R console for more details about it.
We want to examine the relationship between type of transmission (am) and fuel efficiency (mpg). Use a side-by-side boxplot to compare mpg between the two groups.
Requirements:
1. It should be horizontal.
2. The title should be “Fuel Efficiency vs. Transmission”
3. The x label should be “miles per gallon”
4. Give meaningful names for the transmission levels and show them on the plot. Instead of changing the labels directly, you may consider transforming am to a factor vector and giving character labels to the levels.
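One way to meet the four requirements is sketched below. According to ?mtcars, am is coded 0 for automatic and 1 for manual. The copy named cars, the column name am_f, the labels "Automatic" and "Manual", and the las = 1 setting are illustrative choices, not something the assignment prescribes.

```r
cars <- mtcars  # work on a copy so the built-in data set stays unchanged

# Requirement 4: turn am into a factor with readable level labels
cars$am_f <- factor(cars$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))

# Requirements 1-3: horizontal boxplot with the requested title and x label
boxplot(mpg ~ am_f, data = cars, horizontal = TRUE,
        main = "Fuel Efficiency vs. Transmission",
        xlab = "miles per gallon", ylab = "",
        las = 1)  # las = 1 draws the group labels horizontally

print(table(cars$am_f))
```

Because the plot is horizontal, xlab labels the mpg axis along the bottom, and the factor labels appear along the left side.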