Non-normality, multicollinearity, heteroscadasticity

Assignment Help Mathematics
Reference no: EM13925003

STAT 501 - Final Exam

State which of the following statements are true and which are false. For the statements that are false, explain why they are false.

  • In a logistic regression analysis where Y=1 represents survival and Y=0 represents death, the logit of the survival probability is the negative of the logit of death probability.
  • In regression analysis, the method of ordinary least squares can be used in the presence of non-normal errors.
  • In multiple linear regression analysis, the width of a prediction interval for a future response of Y based on a single predictor X increases with the value of X.
  • The error terms in an AR(1) model have zero mean.
  • In model selection, the MSE (or S) criterion minimizes confidence/prediction interval widths, while the PRESS criterion evaluates model unbiasedness.

(7x2 = 14 points) Fill in the blanks with terms from the list: non-normality, multicollinearity, heteroscadasticity, confidence intervals, prediction intervals. [Note: there are 7 blanks but only 5 terms, so you'll have to use some terms more than once and there may be some terms you don't use at all.]

A small p-value for the Ryan-Joiner test indicates ________.
A residual vs. fits plot with a non-random pattern around a horizontal line at zero indicates ________.
A large sample size ensures the validity of a confidence interval for a mean response even when errors exhibit ________.
________ among predictors will lead to unreliable ________ for regression coefficients.
Weighted least squares estimation can be used in the presence of error ________ and ________.

The following ANOVA table is abstracted from a regression fit to the
model: Y = β0 + β1 X1 + β2 X2 + β3 X3 + ... + β10 X10 + ε.

Source              DF    SS

Regression                 110.53

Residual Error    28

Total                         150
 
Source            DF          Seq SS
X1                  1             0.10
X2                  1              40
X3                  1              1.0
X4                  1.0             1.0
X5                   1              2.5
X6                   1             0.08
X7                   1              6.5
X8                   1               4.0
X9                   1               0.4
X10                 1                0.95

  • Calculate the three missing values in the upper table.
  • For the 10-predictor model, perform a hypothesis test at significance level 0.05 to determine whether predictors X7, X8, X9, and X10 are significantly linearly related to Y upon controlling for the remaining predictors X1-X6 using a general linear F test. Write the null and alternative hypotheses, the value of the test statistic, the decision rule, and the conclusion. [Note: an F-distribution table is provided on the last page of the exam.]
  • Later it was decided to consider a regression of Y on the first 4 predictors ONLY. Use information from both tables above to calculate adjusted R2 for the model with only the first 4 predictors.
  • Given the information in both tables above, is it possible to test whether X1 and X3 can be dropped from the 4-predictor model? Give a brief argument supporting your answer. [You do not have to do a test, even if one is possible.]

(4+9+4+4 = 21 points) Data from a local supermarket revealed that the deli usage of customers depends on their grocery bill and also on the time of shopping. To understand the link between these variables, a logistic regression model was fitted based on data from 890 sales records, which yielded the following.

bill 110.82410.824110.820.001

Odds Ratio95% CI

Odds Ratio for lunch=1 relative to lunch=0
Odds Ratio 95% CI

Here is the estimated probability of deli usage, bill is the amount of the grocery bill and lunch is a binary variable that equals 1 for a store visit at lunchtime and 0 for a store visit at other times.
Is their any statistical evidence that shopping time is related to the odds of deli usage, and if so, does lunchtime have a higher odds of deli usage than a visit at other times?
Write the regression equation to estimate the logit of:
the probability of deli usage in terms of both predictors bill and lunch.
the probability of deli usage for lunchtime shoppers and others separately.
the probability of NO deli usage in terms of both predictors bill and lunch.
Use your answer in (b)(ii) above to find the value of bill at which the probability of using the deli is 0.80 for a lunchtime shopper.
Write a sentence that interprets the coefficient estimate for the predictor variable bill.

(7+7+8 = 22 points) Minitab outputs shown below are the results of a statistical analysis performed on a set of data consisting of 22 crop values (crop), along with fertilizer amounts (fert) and temperature (temp).
Coefficients

Term Coef SE Coef T-Value P-Value VIF
Constant 104 104 1.00 0.329
fert 7.57 3.88 1.95 0.067 88.20
temp 3.591 0.855 4.20 0.001 1.61
fert*fert -0.0821 0.0354 -2.32 0.032 90.92

Fitted Regression Equation

crop = 104 + 7.57 fert + 3.591 temp - 0.0821 fert*fert

fert temp Fit 95% CI 95% PI
50 37.64 412.701 (388.693, 436.709) (345.607, 479.795)
20 40 366.490 (285.192, 447.788) (263.852, 469.128)

Descriptive Statistics: fert, crop, temp

Variable Mean SE Mean StDev Minimum Maximum
crop 376.4 13.4 62.9 270.0 460.0
fert 59.50 3.35 15.73 32.00 79.00
temp 37.64 1.34 6.29 27.00 46.00

Comment on the validity of the fitted regression equation above in terms of the statistical significance of each predictor (use α =0.05) and regression pitfalls such as multicollinearity, outlier presence, error non-normality, and heteroscedasticity.
Propose an alternative population regression model that may remedy some of these pitfalls.
Comment on the validity of the 95% confidence intervals and prediction intervals computed for fert and temp settings:
50 and 37.64
20 and 40

  • The figure below gives the sample PACF for a time series data of monthly sales (in thousands of dollars). Use the PACF plot to propose an appropriate time series model for yt = sales during the tth month.
  • (5x3 = 15 points) In an experiment, a researcher compares three different metal alloys (say, A, B, and C) used to weld pipes. For each alloy, 10 welds are made. The response variable is strength of the weld (Y). In addition to the type of alloy (A, B, or C), a quantitative predictor, diameter of weld (X), will be used in a regression model for predicting Y.
  • Write a population regression model (with no interaction) for predicting Y using X and alloy (A, B, or C) as predictors. Clearly define any necessary indicator variables.
  • Explain precisely what each regression coefficient measures in the model that you wrote for part (a).
  • What null hypothesis would be tested to determine whether there are differences among the alloys? Write the hypothesis in terms of the regression coefficients in part (a).
  • Explain how you would carry out the hypothesis test described in part (c). Do not forget to write down the degrees of freedom for the test you propose.
  • What term(s) should be added to the model to create a model with interactions? Write down the model

Reference no: EM13925003

Questions Cloud

Compute a par spread in basis points for a 5yr cds : Modify the data on the CDS pricing worksheet in the work book data.xlsx to compute a par spread in basis points for a 5yr CDS with notional principalN=10million assuming that the expected recovery rateR=25%, the 3-month hazard rate is a flat1%, and..
Flexible budgets and standard cost system : This discussion forum is to expand with other resources for learning. You will perform an internet search on one of two terms. When you perform your internet search on the term of your choice, you will find several hits. Select and read a case study ..
Hiv and aids from the perspective of hipaa confidentiality : Write a 1,500- to 1,750-word essay in which you discuss implications of both forms of the patient's diseases, HIV and AIDS, from the perspective of HIPAA confidentiality.
Real-life international incidents from the past five : " This assignment covers the manner in which this shift occurred and the consequences the United States faces as a result of its status as "policemen of the world." Using the Internet, research two (2) real-life international incidents from the pa..
Non-normality, multicollinearity, heteroscadasticity : In a logistic regression analysis where Y=1 represents survival and Y=0 represents death, the logit of the survival probability is the negative of the logit of death probability.
Portfolio of stocks consisting : You hold a portfolio of stocks consisting of the following: Stock Beta Current Value Caterpillar 0.6 $20,000 CitiCorp 0.8 $21,000 Wendy’s 1.0 $22,000 Boeing… 1.2 $27,000 Total: $90,000 a. What is the beta of the portfolio?
Accountant makes an adjusting entry for accrued expenses : When an accountant makes an adjusting entry for accrued expenses, which statement best reflects what the accounts look like before the adjustment?
How many dollars do you need to invest in the cash account : Consider a 1-period binomial model with R=1.02, S0=100, u=1/d=1.05. 1.Compute the value of a European call option on the stock with strike K=102. The stock does not pay dividends. 2.When you construct the replicating portfolio for the option in the p..
Identify the major features of the working capital cycle : Take a walk around the major business neighbourhood in your home town or city and identify the major features of the working capital cycle of a few businesses

Reviews

Write a Review

Mathematics Questions & Answers

  Graphs for three equations and find common point

Using first two equations determine 3 possible points that the 2 conic sections that you get from equations meet. Add a third equation figure to out the one point that all three conic sections meet.

  How far south has the triathlete run

A triathlete runs 12 km on a bearing of 210 degrees from the start (North) to the finish line which is found in a South West direction.

  Rotate the coordinate axes to change the equation

Rotate the coordinate axes to change the equation 3x2 + 2√3xy + y2 - 8x + 8√3y = 0 into an equation that has no cross product (xy) term. Then identify the equation.

  Different age categories have different mean blood pressure

Is there sufficient evidence to support the claim that women in the different age categories have different mean blood pressure levels?

  Cnsider quadrilateral abcd with a -34 b 26 c 2-1 and d

question consider quadrilateral abcd with a -34 b 26 c 2-1 and d -3-3. find the area and perimeter of abcd. please

  How large must a random sample with replacement be in order

A pollster wishes to know the percentage p of people in a population who intend to vote for a particular candidate. How large must a random sample with replacement be in order to be at least 95 % sure that the sample percentage is within one perce..

  How else could the manager justify using an order size

What ordering cost would enable the manager to justify ordering every other day? (Round your intermediate calculations and final answer to 2 decimal places. Omit the "tiny_mce_markerquot; sign in your response.)

  Determine the angle of elevation

The cross section of an irrigation canal are isoscles triangles of which three sides are 8 feet long. Determine the angle of elevation.

  Customers experiencing technical difficulty with their

customers experiencing technical difficulty with their internet cable hookup may call an 800 number for technical

  At what rate is the base of the triangle changing

The altitude of a triangle is increasing at a rate of 2500 centimeters/minute while the area of the triangle is increasing at a rate of 1000 square centimeters/minute. At what rate is the base of the triangle changing when the altitude is 7500 cen..

  What is the sample variance

A sample of n=25 scores has a mean of M=65 and an estimated standard error of points. What is the sample variance?

  What are the intercepts of the equation

What are the intercepts of the equation from Part a. of this problem? What are the intercepts from Part b. of this problem? Where would the lines intersect if you solved the system by graphing?

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd