Best subsets regression, Advanced Statistics

The time series plot and scatter graphs showed many clearly visible outliers. These were removed to check whether they were influential or had high leverage, and to see whether the assumptions of the multiple regression model are met without them.

Below are the 17 rows that I removed as outliers from the 1519 observations:

77, 674, 448, 757, 317, 549, 1187, 1198, 26, 456, 405, 307, 1205, 1348, 611, 368, 309
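Whether a removed row had high leverage or was influential can be checked numerically. The following is a minimal sketch and not the procedure used in the post: the function name `leverage_and_cooks`, the synthetic data and the cut-offs (leverage above 2p/n, Cook's distance above 4/n) are assumptions, the last two being common rules of thumb rather than fixed standards.

```python
import numpy as np

def leverage_and_cooks(X, y):
    """Return hat values (leverage) and Cook's distances for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    A = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept
    n, p = A.shape
    Q, _ = np.linalg.qr(A)                      # reduced QR: Q is n x p
    h = (Q ** 2).sum(axis=1)                    # diagonal of the hat matrix
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    mse = resid @ resid / (n - p)
    cooks = resid ** 2 / (p * mse) * h / (1.0 - h) ** 2
    return h, cooks

# Synthetic illustration only (NOT the wfood data): one planted unusual point at row 0.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
x[0] = 10.0                                     # far from the other x values
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
y[0] -= 15.0                                    # and far from the fitted line

h, d = leverage_and_cooks(x, y)
n, p = len(y), 2
flagged = np.where((h > 2 * p / n) | (d > 4 / n))[0]
print("rows flagged by the rule-of-thumb cut-offs:", flagged)
```

A row with high leverage has unusual predictor values; it only counts as influential if it also moves the fitted coefficients, which is what Cook's distance measures.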

Best Subsets Regression: wfood versus totexp, income, age, nk

Response is wfood

Vars  R-Sq  R-Sq(adj)  Mallows Cp         S  totexp  income  age  nk
   1  22.9       22.9        67.4  0.092326  X
   1   5.5        5.4       424.9   0.10222          X
   2  24.8       24.7        31.3  0.091236  X                    X
   2  24.2       24.1        42.7  0.091572  X               X
   3  26.1       26.0         6.1  0.090461  X               X    X
   3  24.8       24.7        32.3  0.091239  X       X            X
   4  26.3       26.1         5.0  0.090397  X       X       X    X

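Best subsets regression is a way of identifying which of the independent variables (totexp, income, age and nk) are best suited to the regression model, by comparing candidate models on R-squared, adjusted R-squared, Mallows' Cp and S. In the results above, the model containing income alone has the highest Cp (424.9) and the lowest R-squared (5.5%), while the three-variable model without income (totexp, age and nk) reaches an R-squared of 26.1% against 26.3% for the full model. Income will therefore be the variable that is dropped, to see whether the reduced model still fits the data.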
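The statistics in the table can be reproduced for any data set with a short script. This is a sketch under assumptions, not Minitab's implementation: the function names `sse_ols` and `best_subsets` are hypothetical, the data are synthetic stand-ins for the wfood data, and Mallows' Cp is computed as SSE_p / MSE_full - (n - 2p), where p counts the intercept.

```python
import itertools
import numpy as np

def sse_ols(X, y):
    """Residual sum of squares for OLS of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def best_subsets(X, y, names):
    """Fit every non-empty subset of predictors and return its summary statistics."""
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    mse_full = sse_ols(X, y) / (n - k - 1)      # error variance from the full model
    rows = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            sse = sse_ols(X[:, list(cols)], y)
            p = size + 1                        # parameters including the intercept
            rows.append({
                "vars": [names[c] for c in cols],
                "r2": 100 * (1 - sse / sst),
                "r2_adj": 100 * (1 - (sse / (n - p)) / (sst / (n - 1))),
                "cp": sse / mse_full - (n - 2 * p),
                "s": float(np.sqrt(sse / (n - p))),
            })
    return rows

# Synthetic illustration only: the column names mirror the post, the values do not.
rng = np.random.default_rng(1)
names = ["totexp", "income", "age", "nk"]
X = rng.normal(size=(200, 4))
y = 0.5 * X[:, 0] + 0.2 * X[:, 2] + 0.2 * X[:, 3] + rng.normal(size=200)

rows = best_subsets(X, y, names)
for r in sorted(rows, key=lambda r: (len(r["vars"]), -r["r2"])):
    print(len(r["vars"]), round(r["r2"], 1), round(r["r2_adj"], 1),
          round(r["cp"], 1), round(r["s"], 4), r["vars"])
```

With this definition the full model always has Cp equal to its number of parameters, which matches the value of 5.0 in the last row of the table (four predictors plus the intercept). For smaller models, a Cp close to the number of parameters suggests little bias from the omitted variables.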

Posted Date: 3/4/2013 6:44:10 AM | Location : United States






