Best subsets regression, Advanced Statistics

The time series plot and the scatter plots showed many clearly visible outliers. I removed them to check whether they were influential or had high leverage, and to see whether the multiple regression model assumptions are met.

Below are the row numbers of the outliers that I removed from the 1519 observations:

77, 674, 448, 757, 317, 549, 1187, 1198, 26, 456, 405, 307, 1205, 1348, 611, 368, 309
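
As a rough illustration of the leverage and influence check described above, here is a minimal Python sketch. It assumes the data sit in NumPy arrays and that the row numbers listed above are 1-based (as in a Minitab worksheet). The function names `leverage_and_cooks` and `drop_rows` are hypothetical helpers, not part of the original analysis.

```python
import numpy as np

# Row numbers quoted in the post; assumed here to be 1-based worksheet rows.
OUTLIER_ROWS = [77, 674, 448, 757, 317, 549, 1187, 1198, 26, 456,
                405, 307, 1205, 1348, 611, 368, 309]


def leverage_and_cooks(X, y):
    """Return leverage (hat-matrix diagonal) and Cook's distance for an OLS fit.

    X is an (n, k) array of predictors without an intercept column,
    y is an (n,) response vector.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    p = A.shape[1]                              # number of fitted parameters
    Q, _ = np.linalg.qr(A)                      # thin QR, so H = Q Q^T
    h = np.sum(Q ** 2, axis=1)                  # leverage of each observation
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    mse = resid @ resid / (n - p)
    cooks = resid ** 2 / (p * mse) * h / (1.0 - h) ** 2
    return h, cooks


def drop_rows(X, y, rows_1_based):
    """Return copies of X and y with the given 1-based rows removed."""
    keep = np.ones(len(y), dtype=bool)
    keep[np.asarray(rows_1_based) - 1] = False
    return X[keep], y[keep]
```

A common rule of thumb flags observations whose leverage exceeds 2p/n, where p counts the intercept plus the predictors. Comparing the fit before and after `drop_rows(X, y, OUTLIER_ROWS)` shows how much the removed points were driving the estimates.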

Best Subsets Regression: wfood versus totexp, income, age, nk

Response is wfood

Vars  R-Sq  R-Sq(adj)  Mallows Cp         S  totexp  income  age  nk
   1  22.9       22.9        67.4  0.092326     X
   1   5.5        5.4       424.9  0.10222              X
   2  24.8       24.7        31.3  0.091236     X                  X
   2  24.2       24.1        42.7  0.091572     X             X
   3  26.1       26.0         6.1  0.090461     X             X    X
   3  24.8       24.7        32.3  0.091239     X       X          X
   4  26.3       26.1         5.0  0.090397     X       X     X    X

Best subsets regression is a way of identifying which of the independent variables (totexp, income, age and nk) are best suited to the regression model. In the results above, the model containing income alone has the highest Mallows Cp (424.9) and the lowest R-squared (5.5%). The three-variable model without income (totexp, age, nk) has an R-Sq(adj) of 26.0%, almost the same as the 26.1% of the full model. Income will therefore be dropped to see whether the reduced model still fits the data.
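
To make the quantities in the table concrete, the following is a minimal Python sketch of a best subsets search. It is not Minitab's routine: it fits every subset by ordinary least squares with NumPy and reports R-Sq, R-Sq(adj), Mallows Cp and S. The function name `best_subsets` is hypothetical, and the data must be supplied by the reader. Cp is computed as SSE_p / MSE_full - n + 2p, where p counts the intercept plus the predictors in the subset.

```python
from itertools import combinations

import numpy as np


def _sse(A, y):
    """Residual sum of squares of the least-squares fit of y on A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets(X, y, names):
    """Fit every non-empty subset of predictors and summarise each model.

    X is (n, k) without an intercept column; names labels its columns.
    Returns a list of dicts ordered by subset size, then by decreasing R-Sq.
    """
    n, k = X.shape
    ones = np.ones((n, 1))
    sst = float(np.sum((y - y.mean()) ** 2))
    mse_full = _sse(np.hstack([ones, X]), y) / (n - k - 1)

    results = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            p = size + 1                               # parameters incl. intercept
            sse = _sse(np.hstack([ones, X[:, list(cols)]]), y)
            results.append({
                "vars": tuple(names[c] for c in cols),
                "r2": 1.0 - sse / sst,
                "r2_adj": 1.0 - (sse / (n - p)) / (sst / (n - 1)),
                "cp": sse / mse_full - n + 2 * p,      # Mallows Cp
                "s": (sse / (n - p)) ** 0.5,           # residual standard error
            })
    results.sort(key=lambda r: (len(r["vars"]), -r["r2"]))
    return results
```

Calling `best_subsets(X, y, ["totexp", "income", "age", "nk"])` on the cleaned data would list all 15 subsets rather than only the best two of each size that Minitab prints. The full model always has Cp equal to its number of parameters (5.0 here), so smaller models are judged by how close their Cp comes to their own p.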

Posted Date: 3/4/2013 6:44:10 AM | Location : United States






