The data used for this project are contained in the EViews files. Before you start working, copy the files to a local drive and use only the copies.
You are expected to complete this PC project up to and including question 3a before our PC session. You may complete the remaining exercises during the session.
Please hand in a printout of your results at the beginning of the PC session or upload your solutions to the respective StudyNet folder before 12:15 (name the file as follows: PC1_'surname1'_'surname2').
Wages, human capital, and ability
Data: 'pc_I_HS12.wf1' contains data on hourly wages, IQ score and several other variables for 920 men in the US in the year 1980.
hwage - hourly wages
lhwage - ln(hwage)
educ - years of education
exper - years of work experience
tenure - years with current employer
married - =1 if married
black - =1 if African-American
south - =1 if living in South
urban - =1 if living in SMSA
iq - IQ score
kww - knowledge of the world of work score
dropouts - =1 if high-school dropout
hsgrad - =1 if high-school graduate
somecoll - =1 if some college
collgrad - =1 if college graduate
SMSA stands for Standard Metropolitan Statistical Area, an integrated economic and social unit with a large population nucleus. Living in an SMSA is therefore a good indicator of living in a city.
KWW is a general test of work-related abilities.
Note: Each individual belongs to exactly one education category: high-school dropout (dropouts), high-school graduate (hsgrad), some college (somecoll) or college graduate (collgrad). The four education dummies therefore sum to one for every individual.
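Because the four dummies sum to one, they are perfectly collinear with the regression constant. This is why one category must be left out as the base group whenever the dummies enter a regression with an intercept. The following minimal sketch illustrates the point on a hypothetical toy sample (not the course data) by comparing matrix ranks.

```python
import numpy as np

# Hypothetical toy sample: each row is one person, exactly one education dummy equals 1.
# Column order: dropouts, hsgrad, somecoll, collgrad
dummies = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
])
const = np.ones((dummies.shape[0], 1))

# The four dummies add up to the constant column for every person.
assert np.all(dummies.sum(axis=1) == 1)

X_all = np.hstack([const, dummies])                  # constant + all four dummies
X_drop = np.hstack([const, dummies[:, [0, 2, 3]]])   # hsgrad omitted as base category

print(np.linalg.matrix_rank(X_all), X_all.shape[1])    # rank < number of columns
print(np.linalg.matrix_rank(X_drop), X_drop.shape[1])  # full column rank
```

With all four dummies and a constant, the design matrix has 5 columns but rank 4, so OLS cannot separate the coefficients. Dropping one category restores full rank.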
Original source: M. Blackburn and D. Neumark (1992), "Unobserved ability, efficiency wages, and interindustry wage differentials", Quarterly Journal of Economics, 107, 1421-1436.
The goal of your empirical study is to determine the causal effect of education on hourly wages for the population from which this sample has been drawn.
1. Conceptual questions
a. Using the Roy-Rubin causal framework, write down the average treatment effect on the treated (ATET) of higher education (D=1 if education is high, D=0 if education is low) on hourly wages. Use this example to explain the difference between causation and correlation. Formulate and comment on the counterfactual and the selection bias.
b. Using the above example, explain the difference between ATET, ATENT and ATE. Under which circumstances are they equal?
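As a reminder of the notation, here is a sketch in standard potential-outcomes form. It assumes that Y_i(1) and Y_i(0) denote the hourly wage of person i with high and with low education; these symbols are not defined in this handout. The observed difference in means splits into the ATET plus a selection-bias term that contains the unobserved counterfactual E[Y_i(0) | D_i = 1].

```latex
\begin{aligned}
\text{ATET} &= E[Y_i(1) - Y_i(0) \mid D_i = 1] \\
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  &= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{ATET}}
   + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
\end{aligned}
```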
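In the same assumed notation, the three parameters are related by the law of iterated expectations. The ATE is a weighted average of the ATET and the ATENT, with weights equal to the shares of treated and non-treated individuals.

```latex
\begin{aligned}
\text{ATE}   &= E[Y_i(1) - Y_i(0)] \\
\text{ATENT} &= E[Y_i(1) - Y_i(0) \mid D_i = 0] \\
\text{ATE}   &= P(D_i = 1)\,\text{ATET} + P(D_i = 0)\,\text{ATENT}
\end{aligned}
```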
c. Can random assignment of education solve the selection bias problem? Explain.
d. How could covariates help in this situation?
e. In trying to establish a causal effect of education on hourly wages, does knowledge of the following conditional expectation
E[hwage | educ, iq, kww, exper, tenure, married, south]
help us decide whether education has an effect on wages? Why and how? If not, do you expect that we will over- or underestimate the causal effect?
f. If you were interested in estimating the returns to ability, which problem could arise from using iq or kww as a measure of ability?
2. Descriptive analysis (use the file 'pc_i_hs12.wf1')
a. Show the distributions of the outcome (hwage) and of the treatment variable (educ). Obtain the distribution of hwage separately for high and low education and comment briefly on the difference between the two means of hwage.
b. Investigate the variables exper and tenure. Do the data on these two characteristics appear credible? If not, state the problem and provide evidence.
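The handout does not say how many years of schooling count as "high" education. The minimal sketch below therefore takes the cut-off as a hypothetical `threshold` parameter and returns the two group means of hwage together with their difference. The toy numbers are for illustration only and do not come from the course data.

```python
import numpy as np

def mean_wage_by_education(hwage, educ, threshold):
    """Return (mean wage if educ > threshold, mean wage otherwise, difference).

    `threshold` is a hypothetical cut-off chosen by the user.
    """
    hwage = np.asarray(hwage, dtype=float)
    educ = np.asarray(educ, dtype=float)
    high = educ > threshold          # D = 1 for high education
    mean_high = hwage[high].mean()
    mean_low = hwage[~high].mean()
    return mean_high, mean_low, mean_high - mean_low

# Toy numbers for illustration only, not taken from pc_i_hs12.wf1
result = mean_wage_by_education([10, 12, 20, 22], [10, 11, 16, 17], threshold=12)
print(result)
```

As question 1 stresses, the raw difference in means mixes any causal effect with selection bias.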
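One possible consistency check follows from the variable definitions. Years with the current employer cannot exceed total years of work experience, so observations with tenure > exper are suspicious. The sketch below assumes both variables are available as numeric arrays. It shows only this one check and is not necessarily the whole answer.

```python
import numpy as np

def flag_implausible(exper, tenure):
    """Boolean mask of observations where tenure exceeds total experience."""
    exper = np.asarray(exper, dtype=float)
    tenure = np.asarray(tenure, dtype=float)
    return tenure > exper

# Toy values for illustration only
mask = flag_implausible([5, 10, 3], [2, 12, 3])
print(mask.sum(), "suspicious observations")
```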
c. Are educ, exper, tenure, south, urban, and iq correlated with hwage?
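The helper below is a minimal sketch that computes the Pearson correlation of each named variable with the outcome. For 0/1 dummies such as south or urban, this is the point-biserial correlation. The toy arrays are hypothetical. A non-zero correlation says nothing about causation on its own.

```python
import numpy as np

def corr_with_outcome(outcome, **regressors):
    """Pearson correlation of each named regressor with the outcome."""
    y = np.asarray(outcome, dtype=float)
    return {name: float(np.corrcoef(np.asarray(x, dtype=float), y)[0, 1])
            for name, x in regressors.items()}

# Toy values for illustration only
res = corr_with_outcome([10, 12, 15, 20], educ=[10, 12, 14, 16], south=[1, 1, 0, 0])
print(res)
```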
3. Estimation (use the file 'pc_i_hs12est.wf1')
a. Regress hwage on educ, tenure and married. Then regress lhwage on the same covariates. How does the interpretation of the coefficient on education differ between the two specifications?
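In the level model, the coefficient on educ is a change in dollars per hour. In the log model, 100 times the coefficient is only an approximate percentage change; the exact value is 100(exp(beta) - 1). The small sketch below computes both. The coefficient 0.06 is a hypothetical value, not an estimate from the data.

```python
import math

def pct_effect_log_model(beta):
    """Exact % change in hwage for a one-unit change in a regressor
    when the dependent variable is ln(hwage)."""
    return 100.0 * (math.exp(beta) - 1.0)

def pct_effect_level_model(beta, wage_level):
    """% change implied by a level model, evaluated at a chosen wage level."""
    return 100.0 * beta / wage_level

# Hypothetical coefficient of 0.06 on educ in the log model
print(round(pct_effect_log_model(0.06), 2))   # 6.18
```

Note that the level-model percentage depends on the wage level at which it is evaluated, whereas the log-model percentage does not.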
From now on, use lhwage as the dependent (outcome) variable.
b. Estimate the effect of education on log hourly wages by OLS without any additional control variables, i.e. E[lhwage | educ]. Then use a specification with exper as an additional regressor, i.e. E[lhwage | educ, exper].
- Are the coefficients significant?
- How do you explain the difference in the coefficients for education in the two models? (Hint: What is the role of exper?)
- What might be the problem for the causal interpretation of the wage-education relationship after conditioning on experience?
c. Regress lhwage on educ, exper, tenure, married, black, south and urban. Are the returns to experience and tenure significantly different? Write down a modified version of the model that allows you to answer this question directly, estimate it and comment on the result.
d. Add iq and kww as proxies for ability to the specification and re-estimate it. What happens to the effect of education on wages compared to the results obtained in 3c? Can you explain these changes intuitively?
e. Now allow the return to education to differ for African-Americans. How does this change the results?
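One common way to test whether two coefficients are equal is to reparameterize the model. Because b1*exper + b2*tenure = (b1 - b2)*exper + b2*(exper + tenure), regressing on exper and (exper + tenure) makes the coefficient on exper equal to theta = b1 - b2. Its ordinary t-statistic then tests theta = 0. The sketch below checks this identity on simulated data with hypothetical coefficients. It is not the course data, and the remaining controls are left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Simulated data, NOT the course data: hypothetical coefficients 0.02 and 0.01
exper = rng.uniform(0, 20, n)
tenure = rng.uniform(0, 10, n)
y = 1.0 + 0.02 * exper + 0.01 * tenure + rng.normal(0, 0.3, n)

def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)
# Original model: y = b0 + b1*exper + b2*tenure
b = ols(np.column_stack([const, exper, tenure]), y)
# Reparameterized model: y = b0 + theta*exper + b2*(exper + tenure), theta = b1 - b2
g = ols(np.column_stack([const, exper, exper + tenure]), y)

print(b[1] - b[2], g[1])  # identical: the coefficient on exper is now theta
```

The two regressions span the same column space, so fitted values and residuals coincide. Only the labelling of the coefficients changes.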
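The change in the education coefficient can be read through the omitted-variable-bias formula. In OLS, the coefficient from the short regression (ability omitted) equals the long-regression coefficient plus the ability coefficient times the slope from regressing ability on educ. The sketch below verifies this exact in-sample identity on simulated data. All parameters are hypothetical, and "ability" here is a simulated variable, not iq or kww.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
# Simulated data with hypothetical parameters: ability raises both education and wages
ability = rng.normal(0, 1, n)
educ = 12 + 2 * ability + rng.normal(0, 2, n)
lhwage = 1.0 + 0.05 * educ + 0.10 * ability + rng.normal(0, 0.3, n)

def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)
b_short = ols(np.column_stack([const, educ]), lhwage)          # ability omitted
b_long = ols(np.column_stack([const, educ, ability]), lhwage)  # ability controlled for
delta = ols(np.column_stack([const, educ]), ability)           # ability regressed on educ

# Omitted-variable-bias identity: short = long + gamma * delta
print(b_short[1], b_long[1] + b_long[2] * delta[1])
```

When ability raises wages and is positively related to education, the bias term is positive, and the short regression overstates the return to education.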
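A standard way to let the return differ by group is to add the interaction educ*black. The return is then the educ coefficient for black = 0, and the educ coefficient plus the interaction coefficient for black = 1. The sketch below uses simulated, noise-free data with hypothetical returns of 0.06 and 0.04, so OLS recovers them exactly. The other controls are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
# Simulated, noise-free data with hypothetical returns: 0.06 (black = 0), 0.04 (black = 1)
educ = rng.integers(9, 19, n).astype(float)
black = (rng.random(n) < 0.15).astype(float)
lhwage = 1.0 + 0.06 * educ - 0.1 * black - 0.02 * educ * black

# Design matrix: constant, educ, black, interaction educ*black
X = np.column_stack([np.ones(n), educ, black, educ * black])
coef, *_ = np.linalg.lstsq(X, lhwage, rcond=None)

return_nonblack = coef[1]            # return to educ when black = 0
return_black = coef[1] + coef[3]     # return to educ when black = 1
print(return_nonblack, return_black)
```

The t-statistic on the interaction coefficient tests whether the two returns differ.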
f. We now investigate the interpretation of the coefficients in the linear regression model E[lhwage | educ, exper, tenure, married, black, south, urban, iq, kww] a bit further:
What is the effect of one more year of education on wages for individuals with 11 years of education (high-school dropouts) compared to those with 17 years of education? Is this relationship reasonable?
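Because educ enters the model linearly, the implied change in predicted log wage from one more year of schooling is the same coefficient at every starting level. The sketch below makes this explicit with a hypothetical coefficient of 0.05 and all other regressors absorbed into a constant term.

```python
beta_educ = 0.05  # hypothetical coefficient on educ, not an estimate

def predicted_lhwage(educ, other_terms=0.0):
    """Linear index of the model; all other regressors are absorbed in other_terms."""
    return other_terms + beta_educ * educ

for e in (11, 17):
    change = predicted_lhwage(e + 1) - predicted_lhwage(e)
    print(e, "->", e + 1, ":", round(100 * change, 2), "% (approx.)")
```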
g. Consider a model that allows for more flexibility in the return to education. One way would be to use dummy variables for education (dropouts, hsgrad, somecoll, collgrad).
- Regress lhwage on dropouts, somecoll, collgrad, using exper, tenure, married, black, south, urban, iq and kww as additional controls. What is the wage differential between high-school dropouts and high-school graduates? Is it significant?
- What is the wage differential between "some college" and college graduates? Estimate a model that allows you to easily assess whether this difference is significant.
- Compare the different returns to education (high-school dropouts vs. high-school graduates and "some college" vs. college graduates) with the one you obtained in 3f. Keep in mind that there is a 3-year difference in education between high-school dropouts and high-school graduates (as well as between "some college" and college graduates).
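The collgrad vs. somecoll gap can be tested directly by changing the base category. With hsgrad omitted, the gap is a difference of two coefficients. With somecoll omitted instead, the same gap is a single coefficient whose t-statistic answers the question. The sketch below uses simulated categories and hypothetical effects, not the course data. It leaves out the other controls for brevity; the identity also holds when the same controls appear in both regressions. The last line divides the gap by 3 as a rough per-year log-wage figure, to compare with the single educ coefficient from 3f.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 600
# Simulated categories: 0=dropouts, 1=hsgrad, 2=somecoll, 3=collgrad
cat = rng.integers(0, 4, n)
D = np.eye(4)[cat]                              # one dummy column per category
effects = np.array([-0.15, 0.0, 0.10, 0.30])    # hypothetical log-wage gaps vs hsgrad
y = 2.0 + D @ effects + rng.normal(0, 0.3, n)

def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

const = np.ones(n)
# Base category hsgrad: regress on dropouts, somecoll, collgrad
b_hs = ols(np.column_stack([const, D[:, 0], D[:, 2], D[:, 3]]), y)
# Base category somecoll: regress on dropouts, hsgrad, collgrad
b_sc = ols(np.column_stack([const, D[:, 0], D[:, 1], D[:, 3]]), y)

# collgrad - somecoll: two coefficients in the first model, one in the second
print(b_hs[3] - b_hs[2], b_sc[3])
print("approx. per-year log gap:", b_sc[3] / 3)
```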
4. Additional home exercises (these exercises are voluntary and will not be discussed in the PC lab; however, they might help you prepare for the exam)
a. The value of conditioning on X: Consider the following table (similar to the one in the lecture), which depicts different earnings levels for high and low education. We are interested in the effect of high education on earnings. Both earnings and the level of education depend on ability (X1) and the mother's level of education (X2).