Reference no: EM132253656
Short Answer Questions - 200 words each and one reference in APA style.
Q1. Suppose you had a fair six-sided die where each number (1, 2, 3, 4, 5, and 6) has the same probability of showing up (1/6). If the die is rolled an infinite number of times and each result is recorded, what will be the average value that shows up? Is the average value one of the actual possibilities (1, 2, 3, 4, 5, or 6)? Why or why not?
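As a starting point for Q1, the long-run average of a fair die can be computed as a probability-weighted sum of its faces. The sketch below is only an illustration; the names `faces`, `prob`, and `expected_value` are hypothetical and do not come from the assignment.

```python
from fractions import Fraction

# Faces of the die and the probability of each face, as stated in Q1.
faces = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)

# Expected value: sum of each face weighted by its probability.
expected_value = sum(face * prob for face in faces)

print(expected_value)           # 7/2
print(float(expected_value))    # 3.5
print(expected_value in faces)  # False
```

The exact fraction is kept with `Fraction` so that the comparison against the list of faces is not affected by floating-point rounding.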
Q2. Suppose you wanted to understand the relationship between a customer's yearly income (X) and the number of movies (Y) the customer watched in a year. You then gather data on incomes and the number of movies watched in a year. The range of incomes in your data set is $5K to $150K. After fitting a simple linear model and performing all the appropriate diagnostics, the model showed that, on average, for every $10K in income, a customer watched 1.5 movies in the year. So, for example, a customer who earned $60K in a year would be expected to watch nine movies during the year. Now you want to apply this model to your very wealthy friend, who will earn $1 million next year. Is this an appropriate application of your model? Why or why not? Provide specific examples to justify your opinion.
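The arithmetic in Q2 can be sketched as follows. This is a minimal illustration, assuming a zero intercept (which the $60K example in the question implies); the function name `predict_movies` and both constants are hypothetical labels, not part of the assignment.

```python
# Slope reported in Q2: 1.5 movies per $10K of income.
MOVIES_PER_10K = 1.5
# Range of incomes covered by the data set in Q2.
INCOME_RANGE = (5_000, 150_000)

def predict_movies(income):
    """Return (predicted movies, whether income lies inside the observed range).

    Assumes a zero intercept, consistent with the $60K -> 9 movies example.
    """
    predicted = MOVIES_PER_10K * income / 10_000
    inside_range = INCOME_RANGE[0] <= income <= INCOME_RANGE[1]
    return predicted, inside_range

print(predict_movies(60_000))     # (9.0, True)
print(predict_movies(1_000_000))  # (150.0, False)
```

The second value in each result flags whether the prediction is an interpolation within the observed incomes or an extrapolation beyond them.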
Q3. If you regress daily high temperature (Y) on the amount of ice cream sales (X), you will notice that there is a strong positive correlation between the two. In other words, as daily ice cream sales increase, the daily high temperature tends to be higher. This implies that if we knew the amount of ice cream sales on a particular day, we could estimate, with a high level of accuracy, the high temperature on that day. Does this mean that if we wanted to increase the daily temperature, we would need to sell more ice cream? Explain why or why not.
Q4. Suppose you were asked to investigate which predictors explain the number of minutes that 10- to 18-year-old students spend on Twitter. To do so, you build a linear regression model with Twitter usage (Y) measured as the number of minutes per week. The four predictors you include in the model are Height, Weight, Grade Level, and Age of each student. You build four simple linear regression models with Y regressed separately on each predictor, and each predictor is statistically significant. Then you build a multiple linear regression model with Y regressed on all four predictors, but only one predictor, Age, is statistically significant, and the others are not. What is likely going on among the four predictors? If you include more than one of these predictors in the model, what are some problems that can result?
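One way to explore Q4 is to check how strongly the four predictors are related to one another. The sketch below uses entirely synthetic data: the formulas that generate grade level, height, and weight from age are invented assumptions for illustration only, and `pearson` is a hypothetical helper, not a library function.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(0)  # fixed seed so the synthetic sample is reproducible

# Synthetic students aged 10 to 18; every relationship below is assumed.
ages = [random.randint(10, 18) for _ in range(500)]
grades = [a - 5 + random.choice([-1, 0, 1]) for a in ages]       # grade tracks age
heights = [100 + 4 * a + random.gauss(0, 6) for a in ages]       # cm, grows with age
weights = [0.9 * h - 95 + random.gauss(0, 6) for h in heights]   # kg, grows with height

# Pairwise correlations between Age and each of the other predictors.
for name, values in [("Grade Level", grades), ("Height", heights), ("Weight", weights)]:
    print(f"corr(Age, {name}) = {pearson(ages, values):.2f}")
```

With real data, the same pairwise check (or a variance inflation factor from a statistics package) would indicate whether the predictors carry largely overlapping information.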
Q5. After building a regression model and performing residual diagnostics, you notice that the residuals show severe departures from normality and appear to have non-constant variance. What steps would you take to address these problems? If the problems are not corrected after all steps are taken, what does that imply about the modeling approach you are taking? Explain in detail.