Best subsets regression, Advanced Statistics

In the time series plot and scatter graphs there were many outliers that were clearly visible. These have been removed to identify if they were influential or had high leverage and in order to see if the multiple regression model assumptions have been met.

Below are the rows of the outliers that I removed out of the 1519 observations:

77, 674, 448, 757, 317, 549, 1187, 1198, 26, 456, 405, 307, 1205, 1348, 611, 368, 309

Best Subsets Regression: wfood versus totexp, income, age, nk

Response is wfood

                                                                   t i

                                                                   o n

                                                                    t c

                                                                    e o a

                               Mallows                         x m g n

Vars  R-Sq  R-Sq(adj)       Cp         S             p e e k

   1  22.9       22.9     67.4            0.092326  X

   1   5.5        5.4      424.9           0.10222    X

   2  24.8       24.7     31.3            0.091236  X     X

   2  24.2       24.1     42.7           0.091572  X   X

   3  26.1       26.0      6.1            0.090461  X   X X

   3  24.8       24.7     32.3           0.091239  X X   X

   4  26.3       26.1      5.0            0.090397  X X X X

The best subset is a way of identifying which independent variable such as the totexp, income, age and nk are best suited to the regression model.  According to the results above income is the variable that has the highest Cp and the lowest R-squared value therefore it will be the variable that will be dropped to see if the data fits the model.

Posted Date: 3/4/2013 6:44:10 AM | Location : United States







Related Discussions:- Best subsets regression, Assignment Help, Ask Question on Best subsets regression, Get Answer, Expert's Help, Best subsets regression Discussions

Write discussion on Best subsets regression
Your posts are moderated
Related Questions
Unequal probability sampling is the sampling design in which the different sampling units in the population have different probabilities of being included in sample. The differing

Invariant transformations to combine marginal probability functions to form multivariate distributions motivated by the need to enlarge the class of multivariate distributions beyo

the problem that demonstrates inference from two dependent samples uses hypothetical data from TB vaccinations and the number of new cases before and after vaccinations for cases o

Weathervane plot is the graphical display of the multivariate data based on bubble plot. The latter is enhanced by the addiction of the lines whose lengths and directions code the

regression line drawn as Y=C+1075x, when x was 2, and y was 239, given that y intercept was 11. calculate the residual

Bimodal distribution : The probability distribution, or we can simply say the frequency distribution, with two modes. Figure 15 shows the example of each of them

Normality - Reasons for Screening Data Prior to analyzing multivariate normality, one should consider univariate normality Histogram, Normal Q-Qplot (values on x axis

Missing Data - Reasons for screening data In case of any missing data, the researcher needs to conduct tests to ascertain that the pattern of these missing cases is random.

It is the multivariate normal random vector which satisfies certain conditional independence suppositions. This can be viewed as a model framework which contains a wide range of st

Hypothesis testing is a  general term for procedure of assessing whether the sample data is consistent or otherwise with statements made about the population. It basically tells u