Best subsets regression, Advanced Statistics

In the time series plot and scatter graphs there were many outliers that were clearly visible. These have been removed to identify if they were influential or had high leverage and in order to see if the multiple regression model assumptions have been met.

Below are the rows of the outliers that I removed out of the 1519 observations:

77, 674, 448, 757, 317, 549, 1187, 1198, 26, 456, 405, 307, 1205, 1348, 611, 368, 309

Best Subsets Regression: wfood versus totexp, income, age, nk

Response is wfood

                                                                   t i

                                                                   o n

                                                                    t c

                                                                    e o a

                               Mallows                         x m g n

Vars  R-Sq  R-Sq(adj)       Cp         S             p e e k

   1  22.9       22.9     67.4            0.092326  X

   1   5.5        5.4      424.9           0.10222    X

   2  24.8       24.7     31.3            0.091236  X     X

   2  24.2       24.1     42.7           0.091572  X   X

   3  26.1       26.0      6.1            0.090461  X   X X

   3  24.8       24.7     32.3           0.091239  X X   X

   4  26.3       26.1      5.0            0.090397  X X X X

The best subset is a way of identifying which independent variable such as the totexp, income, age and nk are best suited to the regression model.  According to the results above income is the variable that has the highest Cp and the lowest R-squared value therefore it will be the variable that will be dropped to see if the data fits the model.

Posted Date: 3/4/2013 6:44:10 AM | Location : United States







Related Discussions:- Best subsets regression, Assignment Help, Ask Question on Best subsets regression, Get Answer, Expert's Help, Best subsets regression Discussions

Write discussion on Best subsets regression
Your posts are moderated
Related Questions
Literature controls : The patients with the disease of interest who have received, in the past, one of two treatments under the investigation, and for whom the results have been pu

I need you to help me for Business Statistics class with homework quizzes. Can you help to do it?

The linear component ηi, de?ned just in the traditional way: η i = x' 1 A monotone differentiable link function g that describes how E(Yi) = µi is related to the linear compon

Jonckheere Terpstra test  is the test for detecting particular types of departures from the independence in a contingency table in which both the row and column categories contain

Tree is the term from the branch of the mathematics which known as the graph theory, used to describe any set of the straight-line segments joining the pairs of points in some pro

Martingale: In the gambling context the term at first referred to a system for recouping losses by doubling the stake after each loss has occured. The modern mathematical concept

Cohort study : An investigation in which the group of individuals (or the cohort) is identi?ed and followed prospectively, possibly for many years, and their subsequent medical his

Baddeley'smetric : A manner of measuring the 'error' in the image processing technique or method. The metric is derived using the fundamental theory from the stochastic geometry an

Non-identified response is a term used to signify censored observations in survival data, which are not independent of the endpoint of the interest. Such observations can happen f

Prevented fraction is a measure which can be used to attribute the protection against the disease directly to an intervention. The measure can given by the proportion of disease w