The dataset also contains a variable y, which is the dependent variable of interest, as well as
x1, x2, x3, x4, x5, and x6, all explanatory variables that are potentially related to y and to each other. There are also variables z1 and z2, which will be explained further in the questions below.
You will notice that your data is organized by id and period. Each id is observed over some number of periods, so this is panel data. Each of you has roughly 5,000 observations in total.
For the purposes of this final, you may find the following STATA command useful: xtivreg runs instrumental variable regressions on panel data. Its syntax is similar to that of ivreg and xtreg.
To regress y on x, instrumenting x with z, for a panel where unit denotes the panel variable, you would use: xtivreg y (x = z), fe i(unit) first. This uses fixed effects to control for the common effect for each unit (the a_i in our terminology) and reports the first-stage results. xtivreg can also do random effects estimation (using the re option) or first-differencing (using the fd option).
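The syntax just described can be sketched as follows. Here x and z are the placeholder names from the text and must be replaced with variables from your dataset, while id and period are the panel and time variables in your data. The sketch assumes a STATA release in which the panel structure is declared with xtset instead of the i() option; check which form your installed version expects.

```stata
* Declare the panel (id) and time (period) variables once.
* Assumes a release that supports xtset; older ones may need the i() option instead.
xtset id period

* Fixed effects, instrumenting x with z; "first" prints the first-stage results.
* x and z are placeholders, not variables in the dataset.
xtivreg y (x = z), fe first

* Random-effects and first-difference versions of the same model.
xtivreg y (x = z), re
xtivreg y (x = z), fd
```

The first-difference version relies on the time variable declared in xtset, which is why period is included there.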
One additional point: when using ivreg or xtivreg, if you want STATA to create dummies (using i.period, for example), you need to prefix the command with xi:, as in xi: ivreg y (x1 = z) i.period.
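A minimal sketch of the xi: prefix with xtivreg follows. The pairing of x1 with z1 is purely illustrative and is not a suggested specification. The _Iperiod_* names are the dummies that xi: generates from i.period.

```stata
* xi: expands i.period into dummy variables named _Iperiod_* before estimation.
* Instrumenting x1 with z1 is only an illustration of the syntax.
xi: xtivreg y (x1 = z1) i.period, fe i(id) first

* Joint test that the coefficients on all period dummies are zero.
testparm _Iperiod_*
```

A joint test of this kind is one way to support including or excluding a group of dummies.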
1. I am interested in your best estimates of the effect of x1, x2, and x3 on y. For the purposes of this question, ignore the z1 and z2 variables.
When writing up your answer, comment on the following points:
- What are your preferred estimated effects of x1, x2, and x3? Are those effects statistically significant?
- What is the specification you use to estimate the effects? This can be as simple as telling me the STATA command you used.
- Why did you use the specification you did? What assumptions are you making when you use your specification? Why is it better than alternative specifications?
- What possible biases may exist in your estimates?
- For any other variables (besides x1, x2, or x3) you include, explain why they were included, and offer statistical tests (either joint or individual tests) that justify their inclusion. For any variables you exclude, explain why they were excluded.
- How were your standard errors calculated? What can you infer about how the noise in the data is distributed across the x-values?
2. I suspect that one of the three variables - x1, x2, or x3 - is endogenous. I suggest that two instruments - z1 and z2 - might be used to control for that endogeneity.
Provide me with new estimates of the effects of x1, x2, and x3 on y, using instrumental variables for the endogenous variable if possible. You should comment on the following in your answer:
- Which of the three x variables - x1, x2, or x3 - is the one that is endogenous? How do you justify your answer?
- What are the appropriate instruments for the endogenous variable? What tests and/or other information did you use to determine this?
- What is the specification you used to produce your new estimates of the effects of x1, x2, and x3 on y? Providing the STATA command is sufficient.
- Why did you use the specification you did?
- How do the estimated effects of x1, x2, and x3 differ in problem (2) from problem (1)?
- What can you infer about how the endogenous variable was related to the error term in (1)?
- Are your standard errors larger or smaller than in (1)? Explain why.
- Is your specification over-identified? If not, explain why not. If so, provide a Sargan test, explaining how you obtained the information necessary to complete the test, the null hypothesis of the test, and your ultimate decision to reject or fail to reject that null. What does the test imply about your instruments?