+1-415-670-9189
info@expertsmind.com

# Get Solution

Regression modeling
Course:- Microeconomics
Reference No.:- EM13214

 Tweet

Expertsmind Rated 4.9 / 5 based on 47215 reviews.
Review Site
Assignment Help >> Microeconomics

Here is a procedure you can use when developing a regression model.

Regression Methodology

1) Collect data.  This will be the most time consuming part of your job.  Be sure to consider dummy variables and time.  Be selective (Remember garbage in = garbage out).

Sources:

Interviews, focus groups

Market Studies

Market Experiments (can be very valuable)

Gvt. Data

2) Check for problems with covariance - different explanatory variables may be correlated and fighting each other in the model.  For example - GDP vs. housing starts vs. Income.  Pick the best of all variables that give you the same information.  You can check covariance on Excel.

3) Regress your data first as a linear model

4) Check t-ratios and p-values of your coefficients.  Make sure t-ratios are at least 1.75-2.0 (choice is up to you, decide on your desired confidence level).  P-values tell you exactly what confidence level is.

5) Check R2 .  This tells you percent of variability you have explained with your model.  If it is very low - try adding more explanatory variables and/or a transformation.  For multi-variable can also use corrected R2

6) Check p-value on f for regression.  This tells you confidence that your model is statistically valid.  Note the standard error - the smaller this is the less error you will have in your forecast. SE =standard deviation of the sample mean.

7) Weed out bad variables and develop a model.

8) Try a transformation and see if your diagnostics improve. (A detailed residual analysis is necessary to completely analyze your model - refer to a stats book for this.  Diagnostic Plots like standardized residuals vs fitted values, nscores.  Check for autocorrelation for time series.  Check for heteroscedasticity (variable variance of residuals) - can often reduce with proper transformation.  Check normal probability plots.  Check outliers.  We will not go into this level of detail in this course)

CAVEATS:

1)  R2 will increase as you add more explanatory variables - this is not necessarily a better model.  You are not trying to maximize R2.  You are trying to develop a simple accurate model that predicts demand.

3) avoid collecting too many explanatory variables - they might mask the effect of the important ones

Sample Regression Model

Here is a detailed example of the development of a regression model

You open a jewelry store in Naperville.  You make most of your money from diamond sales.  You want to better understand and forecast your diamond demand:

1. Data collection:  What variables do you think might influence your demand for diamonds?  You develop the following list:
 income price wealth GDP stock market price of diamonds at WalMart demographics -age wedding stats time advertising season (high in holiday time)

You opened your store on Jan 1, 2002 and have gathered the following data.

 carats/mo \$/carat \$/carat \$/mo Naperville \$/yr \$B Demand Price Pwalmart adv \$ Ave Age Income GDP Dow Jones month 47 1500 1200 200 50 35000 1155 11000 Jan 51 1500 1000 200 49 35350 1260 12000 Feb 40 1500 800 250 48.02 35703.5 1207.5 11500 Mar 121 900 1500 225 47.0596 36060.54 1102.5 10500 Apr 127 900 1600 180 46.11841 36421.14 1080.45 10290 May 121 900 1400 120 45.19604 36785.35 1058.841 10084.2 Jun 117 900 1000 500 44.29212 37153.21 1037.664 9882.516 Jul 73 1200 650 126 43.40628 37524.74 1016.911 9684.866 Aug 89 1200 890 600 42.53815 37899.98 996.5727 9491.168 Sep 77 1200 658 140 41.68739 38278.98 976.6412 9301.345 Oct 92 1200 986 220 40.85364 38661.77 957.1084 9115.318 Nov 92 1200 659 120 40.03657 39048.39 937.9662 8933.012 Dec 62 1500 888 50 39.23584 39438.88 919.2069 8754.352 Jan 77 1500 1020 125 38.45112 39833.26 900.8228 8579.264 Feb 55 1500 325 562 37.6821 40231.6 882.8063 8407.679 Mar 88 1500 1500 300 36.92846 40633.91 865.1502 8239.526 Apr 131 1100 1500 251 36.18989 41040.25 847.8472 8074.735 May 112 1100 800 401 35.46609 41450.66 830.8902 7913.24 Jun 115 1100 800 200 34.75677 41865.16 814.2724 7754.976 Jul 131 1100 1350 150 34.06163 42283.81 797.987 7599.876 Aug 140 1100 1350 687 33.3804 42706.65 782.0272 7447.879 Sep 119 1100 800 200 32.71279 43133.72 766.3867 7298.921 Oct 128 1100 825 500 32.05853 43565.06 751.059 7152.943 Nov 148 950 789 140 31.41736 44000.71 736.0378 7009.884 Dec 156 950 1200 263 30.78902 44440.71 721.317 6869.686 Jan 166 950 1200 241 30.17324 44885.12 706.8907 6732.292 Feb 175 950 1600 523 29.56977 45333.97 692.7529 6597.646 Mar 146 950 600 526 28.97838 45787.31 678.8978 6465.694 Apr 126 1200 600 641 28.39881 46245.18 665.3199 6336.38 May 94 1500 600 320 27.83083 46707.64 652.0135 6209.652 Jun 118 1300 600 265 27.27422 47174.71 638.9732 6085.459 Jul 122 1250 600 145 26.72873 47646.46 626.1937 5963.75 Aug 131 1250 700 285 26.19416 48122.92 613.6699 5844.475 Sep 138 1200 700 521 25.67027 48604.15 601.3965 5727.585 Oct

You select the following variables:

 Price Pwm adv \$ Dec Feb mos in business GDP Dow Jones Ave Age Income

 SUMMARY OUTPUT Regression Statistics Multiple R 0.999662 R Square 0.999324 Adjusted R Square 0.99903 Standard Error 1.080827 Observations 34 ANOVA df SS MS F Regression 10 39723.51 3972.351 3400.44 Residual 23 26.86831 1.168188 Total 33 39750.38 Coefficients Standard Error t Stat P-value Upper 95.0% Intercept 218.594 279.748 0.781396 0.442539 797.2961 Price -0.09904 0.001088 -91.0638 6.28E-31 -0.09679 Pwm 0.030112 0.000706 42.67671 2.1E-23 0.031571 adv \$ 0.01156 0.001222 9.457305 2.17E-09 0.014089 Dec 9.288621 0.887907 10.46125 3.24E-10 11.12539 Feb 8.619078 0.763213 11.29314 7.35E-11 10.1979 mos in business 5.951601 8.343354 0.713334 0.482815 23.21112 GDP -0.35999 12383.11 -2.9E-05 0.999977 25616.01 Dow Jones 0.03772 1300.226 2.9E-05 0.999977 2689.757 Ave Age 2.211563 3.854305 0.57379 0.571679 10.18479 Income -0.00506 0.013574 -0.37298 0.712582 0.023018

What do you think??

GDP, Dow Jones and Income look insignificant, but is that reasonable?

The problem is that they are fighting each other.  They each have similar information.  You only need one of these variables.  You need to think about which makes the most sense or test and see which one works best alone in model.

Can check covariance (variables that are correlated with each other).  This is undesirable.  Each explanatory variable should give you new information.

In excel, it's also under tools, data analysis, covariance.  Highlight data you want to check.

Covariance table:

 Price Pwm adv \$ Dec Dow Jones Ave Age Income Price 43194 Pwm -26060 117538 adv \$ -4557 -6724 30249 Dec -6 -14 -10 0 Feb 12 10 -10 0 mos in business -263 -1076 541 0 GDP 6630 17638 -9701 -1 305052 -707947 Dow Jones 63141 167984 -92392 -14 2905255 Ave Age 212 769 -397 0 12203 52 Income -102177 -449231 222270 5 -6742349 -28900 16335056

The higher the number, the greater the covariance.  Income and Dow Jones have strong correlation.  As do Income and GDP and GDP and Dow Jones.  We just need one of these variables.

Let's keep only income as this is logical.  Re-run the regression:

 SUMMARY OUTPUT Regression Statistics Multiple R 0.999662 R Square 0.999324 Adjusted R Square 0.999108 Standard Error 1.036802 Observations 34 ANOVA df SS MS F Regression 8 39723.51 4965.439 4619.188 Residual 25 26.87398 1.074959 Total 33 39750.38 Coefficients Standard Error t Stat P-value Upper 95.0% Intercept 214.2703 261.6642 0.818875 0.420595 753.1774 Price -0.09905 0.001025 -96.6054 1.08E-33 -0.09694 Pwm 0.030119 0.000669 45.02481 1.87E-25 0.031497 adv \$ 0.011552 0.001167 9.900353 3.92E-10 0.013955 Dec 9.285746 0.85082 10.91388 5.34E-11 11.03804 Feb 8.598333 0.674088 12.75551 1.92E-12 9.986641 mos in business 5.805573 7.746753 0.74942 0.460598 21.7603 Ave Age 2.122573 3.488305 0.608483 0.548357 9.306867 Income -0.00483 0.012633 -0.38262 0.705234 0.021185

Income still looks bad - check covariance again.

 Price Pwm adv \$ Dec Income Price 43194.2 Pwm -26060 117537.7 adv \$ -4556.53 -6723.71 30249.34 Dec -6.40138 -13.9689 -9.96021 0.055363 Feb 11.72145 9.870242 -9.76384 -0.00519 mos in business -262.5 -1075.62 541.0441 0.029412 Ave Age 212.2758 769.0929 -397.053 -0.04747 -28899.6 Income -102177 -449231 222270 4.870607 16335056

Income is varying along with advertising and the negative of price and age (should investigate this).

Remove it:

 Remove income SUMMARY OUTPUT Regression Statistics Multiple R 0.99966 R Square 0.99932 Adjusted R Square 0.999137 Standard Error 1.019641 0.01945 Observations 34 ANOVA df SS MS F Regression 7 39723.35 5674.764 5458.249 Residual 26 27.03136 1.039668 Total 33 39750.38 Coefficients Standard Error t Stat P-value Upper 95.0% Intercept 114.302 14.08797 8.113449 1.36E-08 143.2603 Price -0.09905 0.001008 -98.2357 5.91E-35 -0.09697 Pwm 0.030188 0.000634 47.64864 7.83E-27 0.031491 adv \$ 0.011618 0.001135 10.23668 1.3E-10 0.013951 Dec 9.328693 0.829424 11.2472 1.74E-11 11.0336 Feb 8.649533 0.649737 13.31237 4.06E-13 9.985087 mos in business 2.842606 0.208486 13.63453 2.35E-13 3.271155 Ave Age 0.792431 0.283334 2.796804 0.009581 1.374834

Looks good.  You might question age - as it doesn't make sense.  Might investigate replacing age with income.  This is the art.  We decide to keep the model as is.  This is our demand equation:

The above model looks good, but perhaps a transformation would be even better.  Let's try log-linear.  I take logs of all my original data and place in a new table.

 ln(Demand) ln(Price) ln(Pwm) ln(mos) ln(age) ln(Inc) ln(adv) Dec Feb 3.85 7.31 7.09 0.00 3.91 10.46 5.30 0 0 3.93 7.31 6.91 0.69 3.89 10.47 5.30 0 1 3.69 7.31 6.68 1.10 3.87 10.48 5.52 0 0 4.80 6.80 7.31 1.39 3.85 10.49 5.42 0 0 4.84 6.80 7.38 1.61 3.83 10.50 5.19 0 0 4.80 6.80 7.24 1.79 3.81 10.51 4.79 0 0 4.76 6.80 6.91 1.95 3.79 10.52 6.21 0 0 4.29 7.09 6.48 2.08 3.77 10.53 4.84 0 0 4.49 7.09 6.79 2.20 3.75 10.54 6.40 0 0 4.34 7.09 6.49 2.30 3.73 10.55 4.94 0 0 4.52 7.09 6.89 2.40 3.71 10.56 5.39 0 0 4.52 7.09 6.49 2.48 3.69 10.57 4.79 1 0 4.13 7.31 6.79 2.56 3.67 10.58 3.91 0 0 4.34 7.31 6.93 2.64 3.65 10.59 4.83 0 1 4.01 7.31 5.78 2.71 3.63 10.60 6.33 0 0 4.48 7.31 7.31 2.77 3.61 10.61 5.70 0 0 4.88 7.00 7.31 2.83 3.59 10.62 5.53 0 0 4.72 7.00 6.68 2.89 3.57 10.63 5.99 0 0 4.74 7.00 6.68 2.94 3.55 10.64 5.30 0 0 4.88 7.00 7.21 3.00 3.53 10.65 5.01 0 0 4.94 7.00 7.21 3.04 3.51 10.66 6.53 0 0 4.78 7.00 6.68 3.09 3.49 10.67 5.30 0 0 4.85 7.00 6.72 3.14 3.47 10.68 6.21 0 0 5.00 6.86 6.67 3.18 3.45 10.69 4.94 1 0 5.05 6.86 7.09 3.22 3.43 10.70 5.57 0 0 5.11 6.86 7.09 3.26 3.41 10.71 5.48 0 1 5.16 6.86 7.38 3.30 3.39 10.72 6.26 0 0 4.98 6.86 6.40 3.33 3.37 10.73 6.27 0 0 4.84 7.09 6.40 3.37 3.35 10.74 6.46 0 0 4.54 7.31 6.40 3.40 3.33 10.75 5.77 0 0 4.77 7.17 6.40 3.43 3.31 10.76 5.58 0 0 4.80 7.13 6.40 3.47 3.29 10.77 4.98 0 0 4.88 7.13 6.55 3.50 3.27 10.78 5.65 0 0 4.93 7.09 6.55 3.53 3.25 10.79 6.26 0 0

 SUMMARY OUTPUT Regression Statistics Multiple R 0.986941 R Square 0.974052 Adjusted R Square 0.965749 Standard Error 0.069079 Observations 34 ANOVA df SS MS Regression 8 4.478337 0.559792 Residual 25 0.119299 0.004772 Total 33 4.597636 Coefficients Standard Error t Stat Lower 95.0% Upper 95.0% Intercept 27.29298 0 65535 27.29298 27.29298 ln(Price) -1.21469 0.085059 -14.2805 -1.38987 -1.0395 ln(Pwm) 0.258325 0.040836 6.325837 0.17422 0.342429 ln(adv) 0.023881 0.022677 1.053086 -0.02282 0.070585 Dec 0.060605 0.055786 1.086398 -0.05429 0.175498 Feb 0.01679 0.044114 0.380596 -0.07406 0.107644 mos in business 0.115075 0.036956 3.113827 0.038963 0.191188 Ave Age -1.30596 0 65535 -1.30596 -1.30596 Income -1.09152 0 65535 -1.09152 -1.09152 Remove income SUMMARY OUTPUT Regression Statistics Multiple R 0.986941 R Square 0.974052 Adjusted R Square 0.967066 Standard Error 0.067738 Observations 34 ANOVA df SS MS Regression 7 4.478337 0.639762 Residual 26 0.119299 0.004588 Total 33 4.597636 Coefficients Standard Error t Stat Lower 95.0% Upper 95.0% Intercept 13.76921 1.097963 12.54069 11.51231 16.0261 ln(Price) -1.21469 0.083407 -14.5633 -1.38613 -1.04324 ln(Pwm) 0.258325 0.040043 6.451113 0.176014 0.340635 ln(adv) 0.023881 0.022237 1.073942 -0.02183 0.069589 Dec 0.060605 0.054702 1.107913 -0.05184 0.173047 Feb 0.01679 0.043257 0.388134 -0.07213 0.105707 mos in business 0.115075 0.036239 3.175493 0.040586 0.189565 Ave Age -0.76836 0.155205 -4.95063 -1.08739 -0.44933 remove Dec, Feb and Adv SUMMARY OUTPUT Regression Statistics Multiple R 0.971709 R Square 0.944219 Adjusted R Square 0.938641 Standard Error 0.092459 Observations 34 ANOVA df SS MS Regression 3 4.341174 1.447058 Residual 30 0.256462 0.008549 Total 33 4.597636 Coefficients Standard Error t Stat Lower 95.0% Upper 95.0% Intercept 10.67027 0.984992 10.83285 8.65865 12.68189 ln(Price) -1.18737 0.105218 -11.2848 -1.40226 -0.97249 ln(Pwm) 0.237193 0.051394 4.615168 0.132232 0.342154 mos in business 0.282874 0.020884 13.54498 0.240223 0.325524

No improvement - in fact I had to remove variables that seem intuitive.  I will stick with my linear model (I reality I might also investigate squaring some of the variables).

I will stick with my first linear demand model.

Now we can forecast our sales.  These are the types of questions I can answer with my model.

1. What type of demand loss should I expect if I raise my prices?
2. What level of demand increase should I anticipate if I spend an additional \$100 of advertising?
3. How will my demand changes as incomes rise?  As my customers age?
4. If I want to open a similar store in Westmont and forecast the following:

Price at Walmart = \$1200/carat

Age = 45

What should expect to sell in the first 3 months of the year (Jan-Mar) if I open the store in Jan?

I use the demand equation for forecast sales in each month.  Check this yourself on excel and see if you can replicate these results.

 Jan Feb Mar Total Ave Pwm = 1200 95.6766 107.1666 101.3566 304.1998 101.3999 Age= 45 Adv = 500 Price 1000

Minimize