LS linear regression to make predictions

Reference no: EM13925861

1. In a simple linear regression, the least squares regression line is (a) the line which makes the sample correlation as close to +1 or -1 as possible. (b) the line which best splits the data in half, with half of the data points lying above the regression line and half of the data points lying below the regression line. (c) the line which minimizes the sum of squared residuals. (d) the line which minimizes the number of points that do not pass through the line.

2. A least squares regression line is determined from a sample of values for variables x and y, where x is the size of a listed home (in square feet), and y is the selling price of the home. Which of the following statements is true concerning the fitted line ŷ = b0 + b1x? (a) If there is a positive correlation r between x and y, then the slope b1 must also be positive (b) The units on the intercept b0 and the slope b1 will be the same as the units on the variable y (c) If r² = 0.85, then it is appropriate to conclude that a change in x will cause a change in y (d) None of the above is true

3. The residual plot below consists of 104 observations. Based on the plot one can conclude that RMSE is around (a) 0 (b) 25 (c) 40 (d) 80

An insurance agent has selected a sample of drivers that she insures whose ages are in the range from 16 to 42 years. For each driver, she records the age of the driver (x) and the dollar amount of claims (y) that the driver filed in the previous 12 months. A scatterplot showing the dollar amount of claims as the response and the age as the predictor shows a linear trend. The least squares regression line is determined to be: ŷ = 3715 - 75.4x. A plot of the residuals versus age of the drivers showed no pattern, and the following were reported: r² = 0.822, standard deviation of the residuals Se = 312.1.

4. Which of the following is correct? (a) If the age of a driver increases from 20 to 21, the dollar amount of claims is predicted to decrease by $75.4 (b) If the age of a driver increases by one year, the dollar amount of claims is predicted to increase by $3715 (c) One can use the least squares regression line to obtain a reliable prediction of the dollar amount of claims for a driver whose age is 55 years (d) The dollar amount of claims for a driver of 10 years old is expected to be $2961.
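The options in question 4 all turn on plugging ages into the fitted line and on whether the age lies inside the sampled range. A minimal sketch of that arithmetic follows; the function names `predict_claims` and `in_sample_range` are illustrative and not part of the assignment.

```python
# Fitted line from the passage: claims = 3715 - 75.4 * age
B0, B1 = 3715.0, -75.4
AGE_MIN, AGE_MAX = 16, 42  # sampled age range stated in the passage


def predict_claims(age):
    """Predicted dollar amount of claims for a driver of the given age."""
    return B0 + B1 * age


def in_sample_range(age):
    """True when the age lies inside the range the line was fitted on."""
    return AGE_MIN <= age <= AGE_MAX


# Change in the prediction when age goes from 20 to 21 (equals the slope).
change_20_to_21 = predict_claims(21) - predict_claims(20)
print("change from 20 to 21:", round(change_20_to_21, 1))

# Predictions outside 16-42 are extrapolations and should not be trusted.
for age in (10, 25, 55):
    print(age, round(predict_claims(age), 1), in_sample_range(age))
```

The line can be evaluated at any age, but only ages between 16 and 42 are supported by the data described in the passage.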

5. Which of the following is false? (a) 82.2% of the variation in the dollar amounts of claims is explained by the age of the driver. (b) The correlation r between the response and the predictor is 0.907 (c) If the histogram of the residuals is symmetric around zero and bell-shaped, then about 68% of the dollar amounts of claims are within 312.1 dollars of the regression line. (d) A driver in the data set whose age is 25 years had a residual of -$150 using the fitted line above; this means his dollar amount of claims is $1680.
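Two of the options in question 5 can be checked numerically: the correlation implied by r² (which takes the sign of the slope) and the observed value implied by a residual. The sketch below uses only the numbers quoted in the passage; the variable names are illustrative.

```python
import math

r_squared = 0.822
slope = -75.4

# In simple regression, r has the same sign as the fitted slope.
r = math.copysign(math.sqrt(r_squared), slope)

# residual = observed - predicted, so observed = predicted + residual.
predicted_25 = 3715 - 75.4 * 25
residual_25 = -150.0
observed_25 = predicted_25 + residual_25

print("r:", round(r, 3))
print("predicted at age 25:", round(predicted_25, 1))
print("observed at age 25:", round(observed_25, 1))
```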

6. Which of the following is correct? (a) A linear model is okay because the association between the two variables is fairly strong. (b) The linear model is not good because the correlation between the response and the predictor is near 0. (c) The linear model is not good because some residuals are large. (d) The linear model is not good because of the curve in the residuals.

7. If one uses the LS linear regression to make predictions, which of the following statements is true? (a) The predictions tend to be too high for large x's. (b) The predictions tend to be too high for intermediate x's. (c) The predictions tend to be too high for small x's. (d) None of the above is correct

8. In a study of the association between car mileage (miles per gallon, mpg) and car weight, it is found that the association is curved. To make the association linear, one changes the response to 100 times the reciprocal of the mileage (that is, gallons per 100 miles). The scatterplot of the new response vs the car weight (in thousands of pounds) is shown below.

An LS linear regression is fitted to the transformed variables and yields the following equation:

Estimated new response = 0.95 + 1.25 * Weight (000 lbs)

Based on the equation, what is the predicted mileage (measured in mpg) for a car of weight 5,000 pounds? (a) 6251 (b) 0.016 (c) 7.2 (d) 13.89
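Because the fitted response is 100 divided by the mileage, a prediction in mpg requires undoing the transformation after evaluating the line. A minimal sketch, with `predicted_mpg` as an illustrative name:

```python
def predicted_mpg(weight_thousand_lbs):
    """Back-transform the fitted line to miles per gallon.

    The fitted response is 100 / mpg, so mpg = 100 / fitted response.
    """
    new_response = 0.95 + 1.25 * weight_thousand_lbs
    return 100.0 / new_response


# A 5,000 lb car has weight 5.0 in the equation's units (thousands of pounds).
mpg_5000 = predicted_mpg(5.0)
print(round(mpg_5000, 2))
```

Note that the weight must be entered in thousands of pounds; entering 5000 instead of 5 gives a value far off scale.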

Each worker at an assembly plant that produces clock radios is responsible for the entire assembly of each unit they work on. The plant manager has collected data from a sample of workers: the number of years (YRS) of experience at the plant, and the number of hours per unit (TIME) required for assembly. The scatterplot of TIME versus YRS is shown below

9. Which of the following is an appropriate reason why a regression line should not be used to make predictions based on this data? (a) The magnitude of the slope of the line is too large (b) The intercept of the fitted line has no practical interpretation in this context (c) The linear condition for simple regression does not appear to be met (d) The association between TIME and YRS is negative

10. The manager has decided to transform the response variable from TIME (hours/unit) to 1/TIME (units/hour). The scatterplot of 1/TIME versus YRS is shown below.

Which of the following is an appropriate interpretation of these results? (a) The unit on Se is hours per unit (b) More experienced workers are predicted to produce more units per hour on average than less experienced workers (c) Because the transformed model has a higher r², it is better. (d) The slope b1 measures the elasticity between 1/TIME and YRS

The scatterplot of sales in thousands of cartons (y) of half-gallon orange juice versus the price (x) is given below. We apply a log transformation to both y and x to fit the nonlinear pattern. Assume the transformed x and y satisfy the conditions of the simple regression model (SRM).

Transformed Fit Log to Log

Log(Sales) = 4.811646 - 1.7523832*Log(Price)

Summary of Fit
  RSquare                   0.755335
  Root Mean Square Error    0.385788
  Mean of Response          3.136468

Parameter Estimates
  Term          Estimate    Std Error   t Ratio   Prob>|t|
  Intercept     4.811646    0.148033     32.50    <.0001*
  Log(Price)   -1.752383    0.143954    -12.17    <.0001*

11. Which of the following interpretations of the fitted equation is true? (a) As the price increases by 1%, the sales decrease by 1.75% on average (b) As the price increases by $1, the sales decrease by 1.75 units on average. (c) As the price increases by 1%, the sales decrease by 1.75 units (d) As the price increases by $1, the sales decrease by 1.75% on average

12. Based on the fitted equation, what's the predicted sales (in thousands of cartons) for a price of $2.3? (a) 3.35 (b) 28.56 (c) 4.18 (d) 65.22
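A prediction from the log-log fit is made on the log scale and then exponentiated. The sketch below assumes that Log in the output means the natural logarithm, which the passage does not state explicitly; `predicted_sales` is an illustrative name.

```python
import math

# Coefficients from the fitted equation in the passage.
INTERCEPT = 4.811646
SLOPE = -1.7523832


def predicted_sales(price):
    """Predicted sales (thousands of cartons), assuming natural logs."""
    log_sales = INTERCEPT + SLOPE * math.log(price)
    return math.exp(log_sales)


sales_at_2_30 = predicted_sales(2.3)
print(round(sales_at_2_30, 2))
```

If the output had used base-10 logs instead, `math.log10` and `10 ** log_sales` would replace the two calls above and the answer would differ.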

13. Suppose the cost of a half-gallon juice is $1.5, then the optimal price is about (a) $1.9 (b) $3.0 (c) $3.5 (d) $4.1
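Question 13 relies on the textbook result that, under a constant-elasticity demand curve with elasticity e < -1, profit (price - cost) × sales is maximized at price = cost × e / (1 + e). That formula is an assumption brought in here; the passage itself does not state it. A minimal sketch using the estimated slope as the elasticity:

```python
def optimal_price(cost, elasticity):
    """Profit-maximizing price under constant-elasticity demand.

    Assumes sales are proportional to price ** elasticity, with
    elasticity < -1; otherwise no finite optimum exists.
    """
    if elasticity >= -1:
        raise ValueError("elasticity must be below -1 for a finite optimum")
    return cost * elasticity / (1.0 + elasticity)


# Estimated elasticity is the slope of the log-log fit.
best_price = optimal_price(cost=1.5, elasticity=-1.7523832)
print(round(best_price, 2))
```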

14. The statistics of the slope show that (a) The elasticity is positive with at least 95% confidence (b) The elasticity is bigger than -1 with at least 95% confidence (c) The elasticity is smaller than -1 with at least 95% confidence (d) None of the above is correct.
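Question 14 asks what the slope's estimate and standard error say about the elasticity. A rough check uses the estimate plus or minus two standard errors as an approximate 95% confidence interval; the exact t multiplier depends on the sample size, which the output shown does not report.

```python
# Slope estimate and standard error from the Parameter Estimates table.
estimate = -1.752383
std_error = 0.143954

# Approximate 95% interval: estimate +/- 2 standard errors.
lower = estimate - 2 * std_error
upper = estimate + 2 * std_error

print("approx. 95% CI:", (round(lower, 3), round(upper, 3)))
print("entire interval below -1:", upper < -1)
```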

15. About the estimated intercept 4.811646, which of the following is the appropriate interpretation? (a) It estimates the sales in thousands of cartons when the price equals $0. (b) It estimates the sales in thousands of cartons when the price equals $1. (c) It estimates the logarithm of the sales in thousands of cartons when the price equals $1. (d) None of the above is correct.

16. The normal quantile plot of residuals from a regression equation in the plot below suggests that

(a) The fitted equation is linear. (b) The R-squared statistic is about 0.9 or more. (c) The errors are normally distributed. (d) The data in the sample are dependent.

An LS linear regression is fitted to the 2011 daily returns on HSBC (HSBC Rtn) vs those on the Hang Seng index (HS Rtn). The following are some plots and summaries one gets in the fitting procedure.

17. Based on the plots above, which of the following assumptions about the SRM seems to be violated? (a) Linear association (b) Normality of errors (c) Equal variance of errors (d) Independence of errors

18. If the return on Hang Seng index increases by 1%, at 95% confidence level, which of the following statements about the return on HSBC is true? (a) It will increase by at least 0.93%. (b) It will increase by less than 1.08%, on average. (c) It will be at least 0.93%, on average. (d) It will increase by 1.002947%.

19. Which of the following statements is false? (a) We do not reject the hypothesis that β0 = 0 at the 5% significance level. (b) At the 5% significance level, we do not reject the hypothesis that returns on HSBC move on average by the same amount as returns on the Hang Seng index. (c) We do not reject the hypothesis that β1 = 0 at the 5% significance level. (d) Returns on HSBC are correlated with the returns on the market

A large national bank charges local companies for using their services. A bank official reported the results of a regression analysis designed to predict the bank's charges (Y) - measured in dollars per month - for services rendered to local companies. One explanatory variable used to predict service charge to a company is the company's sales revenue (X) - measured in millions of dollars. Data for 21 companies who use the bank's services were used to fit the model. The results of the simple linear regression are provided below. Assume the conditions of the SRM are satisfied. ŷ = -2,700 + 20x, RMSE = 65, p-value for testing β1 = 0 is 0.034.

20. Interpret the estimate of β0, the intercept of the line. (a) All companies will be charged at least $2,700 by the bank. (b) There is no practical interpretation since a sales revenue of $0 is a nonsensical value. (c) About 95% of the observed service charges fall within $2,700 of the least squares line. (d) For every $1 million increase in sales revenue, we expect a service charge to decrease $2,700.

21. Interpret the estimate of σε, the standard deviation of the error term in the model. (a) About 95% of the observed service charges fall within $65 of the least squares line. (b) About 95% of the observed service charges equal their corresponding predicted values. (c) About 95% of the observed service charges fall within $130 of the least squares line. (d) For every $1 million increase in sales revenue, we expect a service charge to increase $65.

22. Interpret the p-value for testing the hypothesis that β1 = 0. (a) There is sufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful linear predictor of service charge (Y). (b) There is insufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful linear predictor of service charge (Y). (c) Sales revenue (X) is a poor predictor of service charge (Y). (d) For every $1 million increase in sales revenue, we expect a service charge to increase $0.034.

23. A 95% confidence interval for β1 is [15, 30]. Interpret the interval. (a) We are 95% confident that the mean service charge will fall between $15 and $30 per month. (b) We are 95% confident that the sales revenue (X) will increase between $15 and $30 million for every $1 increase in service charge (Y). (c) We are 95% confident that on average the service charge (Y) will increase between $15 and $30 for every $1 million increase in sales revenue (X). (d) At the α = 0.05 level, there is not enough evidence of a linear relationship between service charge (Y) and sales revenue (X).

24. To obtain a narrower confidence interval for the estimated slope in this model, we should advise the bank official to (a) concentrate on companies which spent less on using the bank's services. (b) concentrate on companies which spent more on using the bank's services. (c) concentrate on companies whose sales revenues are either relatively low or relatively high. (d) obtain additional data for companies of widely varying sales revenues.

It is believed that the average number of hours spent studying per day (HOURS) during undergraduate education should have a positive linear relationship with the starting salary (SALARY, measured in thousands of dollars per month) after graduation. Given below is the output from regressing SALARY on HOURS for a sample of 51 students.

R Square         0.7845
Standard Error   1.3704
Observations     51

             Coefficients   Standard Error   t Stat    P-value
Intercept    -1.8940        0.4018           -4.7134   2.051E-05
Hours         0.9795        0.0733           13.3561   5.944E-18

25. What's the value of the t-test statistic to test whether HOURS is a useful linear predictor of SALARY? (a) -4.7134 (b) -1.8940 (c) 0.9795 (d) 13.3561

26. The 90% confidence interval for the average change in SALARY (in thousands of dollars) associated with one extra hour of studying per day is (a) wider than [-2.70, -1.09] (b) narrower than [-2.70, -1.09] (c) wider than [0.83, 1.13] (d) narrower than [0.83, 1.13]
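Questions 25 and 26 both come down to the Hours row of the output: the t statistic is the coefficient divided by its standard error, and a confidence interval is the coefficient plus or minus a multiplier times the standard error. The sketch below uses normal quantiles as an approximation to the t quantiles with 49 degrees of freedom, which is an assumption made to keep the example within the standard library.

```python
from statistics import NormalDist

# Hours row of the regression output.
coef = 0.9795
std_error = 0.0733

# t statistic for H0: slope = 0 (differs slightly from the table's value
# because the printed coefficient and standard error are rounded).
t_stat = coef / std_error


def approx_ci(level):
    """Approximate CI for the slope using a normal quantile as multiplier."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return coef - z * std_error, coef + z * std_error


ci_90 = approx_ci(0.90)
ci_95 = approx_ci(0.95)
width_90 = ci_90[1] - ci_90[0]
width_95 = ci_95[1] - ci_95[0]

print("t statistic:", round(t_stat, 2))
print("approx. 90% CI:", tuple(round(v, 2) for v in ci_90))
print("approx. 95% CI:", tuple(round(v, 2) for v in ci_95))
print("90% interval is narrower:", width_90 < width_95)
```

A lower confidence level always uses a smaller multiplier, so the 90% interval sits inside the 95% interval regardless of whether normal or t quantiles are used.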

A construction contractor is involved in a wide variety of construction projects. The operations manager wants to investigate how the Total Hours of labor (design, engineering, modeling, simulation, construction, software support, etc.) required for a project is related to the Total Cost of completing the project. Data collected over many projects were used to fit a prediction equation for the simple regression model: Total Cost = F + M × Total Hours + ε, where F and M are the fixed and marginal costs respectively. After fitting the equation, a scatterplot of residuals vs. Total Hours was produced, as given below:

27. Which of the following statements is an appropriate interpretation of these results? (a) The similar variances condition for simple regression does not appear to be satisfied by the data (b) Prediction intervals for small values of the Total Hours would tend to be too narrow (c) Confidence intervals for the slope of the line should still be considered reliable (d) None of the above

In an attempt to improve the model, the manager decides to use 1/Total Hours as the explanatory variable, and Cost/Hour ($/Hour) as the response. The model becomes: Cost/Hour = M + F × (1/Hours) + ε′. The regression output and the scatterplot of the data are given below:

28. Which of the following statements is correct? (a) The total cost of a project is predicted to decrease as the number of hours required increases. (b) The total cost of a project is expected to increase by $118.41 per additional hour of labor required for the project. (c) The fixed cost of a project is predicted to be approximately $118.41. (d) None of the above.

29. Using the revised model, what is the average cost per hour for a project that will require 300 total hours of labor to complete? (a) -466,401 (b) 113.2 (c) 0.0088 (d) 118.4

30. Using the revised model, what is the approximate 95% prediction interval for the total cost of a project that will require 600 total hours of labor to complete? (a) (61, 170) (b) (53,200, 85,800) (c) (36,900, 102,100) (d) (69,400, 69,500)

31. The information given in the parameter estimates table about the intercept implies that (a) Fixed costs are significantly different from zero. (b) Marginal costs are significantly different from zero. (c) Marginal costs decrease as the total hours increase. (d) Marginal costs cannot be estimated from the model.

A simple regression model is fitted to a data set, with the scatterplot and the least squares line shown below. It is clear that the observation represented by a solid circle in the upper-right corner is an outlier

32. If the outlier is removed, how will the intercept and the slope of the least squares line change? (a) The intercept will be smaller, and the slope will be smaller too. (b) The intercept will be smaller, and the slope will be bigger. (c) The intercept will be bigger, and the slope will be bigger too. (d) The intercept will be bigger, and the slope will be smaller

33. If the outlier is removed, how will the standard deviation of the residuals change? (a) increase (b) decrease (c) stay the same (d) cannot tell

Weekly commodity prices for heating oils (in cents) were obtained and regressed against time. The residual plot is shown below

34. Which assumption of the SRM appears to be violated? (a) Linear association (b) Normality of errors (c) Equal variance of errors (d) Independence of errors

35. If one uses the obtained regression equation to make a prediction about the commodity price for heating oil in the next week, then compared with the actual price, the prediction is likely to be ______. (a) higher (b) lower (c) on target (d) cannot tell based on the information given


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd