ps8_fall2015_shorter+version


Department of Economics W3412

Columbia University Fall 2015

Problem Set 8

Introduction to Econometrics

Profs. Seyhan Erden and Miikka Rokkanen

for all sections.

Part I:

True, False, Uncertain with Explanation:

1. You want to estimate a supply equation for snow boots:

𝑄𝑖 = 𝛾 + 𝛿𝑃𝑖 + 𝜃𝐶𝑖 + 𝑢𝑖

where 𝑄𝑖 is the quantity sold in city i, 𝑃𝑖 is price charged in city i, and 𝐶𝑖 is cost of

transportation to city i from a central production facility. The instrument you had in mind

for 𝑃𝑖 in this equation, annual snowfall, is correlated with 𝑄𝑖. This correlation is

evidence that your instrument is not valid.

2. One can use the same instrument for two different endogenous explanatory variables in

the same regression.

3. Instrument relevance means that some of the variation in the regressor is related to

variation in the instrument.

Part II:

1. Do you think attendance at lectures affects performance on the final exam? A model to explain the

standardized outcome on a final exam (stndfnl) in terms of percentage of classes attended, prior

college grade point average, and ACT score is

stndfnl = β0 + β1 atndrte +β2 priGPA + β3 ACT + u

where variables are given in the following table:

Variable          Definition
stndfnl           Standardized final exam score
atndrte           Percentage of classes attended
priGPA            Prior college grade point average
ACT               Achievement Test score
priGPAxatndrte    Prior GPA times attendance rate

(a) Let dist be the distance from the students’ living quarters to the lecture hall. Do you think

dist is uncorrelated with u?

(b) Assuming that dist and u are uncorrelated, what other assumption must dist satisfy in order

to be a valid IV for atndrte?

(c) Suppose we add the interaction term priGPAxatndrte:

stndfnl = β0 + β1 atndrte +β2 priGPA + β3 ACT + β4 priGPAxatndrte + u

If atndrte is correlated with u, in general, so is priGPAxatndrte. What might be a good IV for

priGPAxatndrte? [Hint: if E(u│priGPA,ACT,dist)=0, as happens when priGPA, ACT and dist

are all exogenous, then any function of priGPA and dist is uncorrelated with u]

2. The purpose of this question is to compare the estimates and standard errors obtained by correctly

using 2SLS with those obtained using inappropriate procedures. Use the data file WAGE2.dta;

the variables are:

Variable    Definition
wage        Monthly earnings
educ        Years of education
exper       Years of work experience
tenure      Years with current employer
black       =1 if the person is African-American, 0 otherwise
sibs        Number of siblings

(a) Use a 2SLS routine to estimate the equation

log(wage) = β0 + β1 educ + β2 exper + β3 tenure + β4 black + u

using sibs as the IV for educ. Report your results.

(b) Now carry out 2SLS manually. That is, first regress educ on sibs, exper, tenure and

black and obtain the fitted values educ_hat; then run the second-stage regression of log(wage) on

educ_hat, exper, tenure and black. Verify that the estimated coefficients are identical

to those obtained from part (a) but that the standard errors are somewhat different. The

standard errors obtained from the second-stage regression when carrying out 2SLS manually

are generally inappropriate.

(c) Now, use the following two-step procedure, which generally yields inconsistent

parameter estimates of βj and not just inconsistent standard errors. In step one, regress

educ on sibs only and obtain the fitted values, say educ_tilde (Note that this is an

incorrect first stage regression). Then, in the second step, run the regression of log(wage)

on educ_tilde, exper, tenure and black. How does the estimate from this incorrect, two-

step procedure compare with the correct 2SLS estimate of the return to education?
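
The three procedures in question 2 can be compared on simulated data before turning to WAGE2.dta. The following is only a sketch under stated assumptions: it does not load the real data set, the variables `x`, `z` and `ctrl` are hypothetical stand-ins for educ, sibs and a single exogenous control, and all coefficient values are invented for illustration.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(0)
n = 20_000

# Simulated stand-ins (assumptions, NOT the WAGE2 data).
ctrl = rng.normal(size=n)                 # exogenous control
z = 0.5 * ctrl + rng.normal(size=n)       # instrument, correlated with the control
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)          # structural error, correlated with x via v
x = 1.0 * z + 0.5 * ctrl + v              # endogenous regressor
y = 1.0 + 0.5 * x + 1.0 * ctrl + u        # true coefficient on x is 0.5

const = np.ones(n)
X = np.column_stack([const, x, ctrl])     # regressors of the structural equation
Z = np.column_stack([const, z, ctrl])     # instrument plus exogenous regressors

# (a) 2SLS in the just-identified case: beta = (Z'X)^{-1} Z'y.
b_2sls = np.linalg.solve(Z.T @ X, Z.T @ y)
resid = y - X @ b_2sls                    # residuals use the actual x, not fitted x
sigma2 = resid @ resid / (n - X.shape[1])
ZX_inv = np.linalg.inv(Z.T @ X)
se_2sls = np.sqrt(np.diag(sigma2 * ZX_inv @ (Z.T @ Z) @ ZX_inv.T))

# (b) Manual 2SLS: first stage includes the instrument AND the exogenous control.
g, _ = ols(x, Z)
x_hat = Z @ g
b_manual, se_manual = ols(y, np.column_stack([const, x_hat, ctrl]))

# (c) Incorrect two-step procedure: first stage uses the instrument only.
Z_bad = np.column_stack([const, z])
g_bad, _ = ols(x, Z_bad)
x_tilde = Z_bad @ g_bad
b_bad, _ = ols(y, np.column_stack([const, x_tilde, ctrl]))

print("2SLS:      ", b_2sls[1], se_2sls[1])
print("manual:    ", b_manual[1], se_manual[1])
print("incorrect: ", b_bad[1])
```

In this design the manual second-stage standard error differs from the 2SLS one because its residuals are computed with the fitted values rather than the actual regressor, and the incorrect procedure changes the point estimate because the instrument is correlated with the omitted first-stage control.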

3. Suppose you are investigating how wage inflation affects price inflation. You hypothesize that

wage costs force prices up. You model this by the following linear relationship,

p = β0 + β1w + up

where p = annual rate of growth of prices, w = annual rate of growth in wages, and up is an error

term. You are then interested in estimating the parameter β1. However, you suspect that, at the

same time, workers protect their wages by demanding higher wages as prices rise, but their ability

to do so becomes weaker, the higher the rate of unemployment.

You try to model this through the following second equation,

w = α0 + α1 p + α2 U + uw

Here, U is the unemployment rate which we assume is an exogenous variable, Cov(U, up) =0 and

Cov(U, uw) = 0.

(a) From the above economic arguments, what do you expect the signs of the slope

parameters in the two equations to be?

(b) Solve the two equations w.r.t. the endogenous variables, p and w. That is, express these

variables as functions of U, up and uw only.

(c) Show that Cov(w, up) ≠ 0 and use this to argue that the OLS estimator of β1 will be biased

and inconsistent.

(d) Under what additional assumption on U will 2SLS estimation of β1 with U as an

instrument be consistent? How can you test this assumption?
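
The simultaneity problem in question 3 can also be explored numerically. The sketch below is an illustration under assumed values, not a solution: every parameter value is hypothetical, the wage-equation coefficients are named `a0`, `a1`, `a2` purely for the code, and the two equations are solved jointly per observation with a linear solver rather than by hand.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical parameter values, chosen only for illustration.
b0, b1 = 1.0, 0.6              # price equation: p = b0 + b1*w + u_p
a0, a1, a2 = 2.0, 0.5, -1.0    # wage equation:  w = a0 + a1*p + a2*U + u_w

U = rng.normal(loc=6.0, scale=1.0, size=n)   # exogenous unemployment rate
u_p = rng.normal(size=n)
u_w = rng.normal(size=n)

# Solve the system for (p, w) observation by observation:
#   [  1   -b1 ] [p]   [ b0 + u_p        ]
#   [ -a1    1 ] [w] = [ a0 + a2*U + u_w ]
A = np.array([[1.0, -b1], [-a1, 1.0]])
rhs = np.vstack([b0 + u_p, a0 + a2 * U + u_w])
p, w = np.linalg.solve(A, rhs)

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]

b1_ols = cov(p, w) / np.var(w, ddof=1)   # OLS slope of p on w
b1_iv = cov(p, U) / cov(w, U)            # IV slope using U as the instrument

print("Cov(w, u_p):", cov(w, u_p))
print("OLS slope:  ", b1_ols)
print("IV slope:   ", b1_iv)
```

With these assumed values w is correlated with the price-equation error, so the OLS slope settles away from `b1`, while the IV slope stays close to it as long as U actually moves w.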

4. Consider the following regression model:

Y = β0 +β1X + u

where Cov(X, u) ≠ 0. You want to use the variables Z1 and Z2 as instruments for X. You check

for instrument exogeneity using the overidentifying restrictions test and the value of the

heteroskedasticity-robust J-statistic you obtain is J = 15.7. Assume that the instruments are not

weak and that your sample is large.

(a) What does the value of the J-statistic mean? Do we reject the null hypothesis of instrument

exogeneity?

(b) Does the value of the J-statistic suggest that Cov(u,Z1) ≠ 0, or Cov(u,Z2) ≠ 0, or both?

Explain.
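
For a setup like question 4, with two instruments and one endogenous regressor, the J-statistic is compared with a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions. A minimal sketch of that comparison follows; the function names are hypothetical, the p-value helper covers only the one-degree-of-freedom case, and the J values in the loop are made up for illustration.

```python
import math

def overid_degrees_of_freedom(n_instruments: int, n_endogenous: int) -> int:
    """Number of overidentifying restrictions (instruments minus endogenous regressors)."""
    return n_instruments - n_endogenous

def j_test_pvalue_df1(j_stat: float) -> float:
    """Upper-tail p-value of a chi-squared(1) variable at j_stat.

    Uses P(chi2_1 > x) = erfc(sqrt(x / 2)); valid for one degree of freedom only.
    """
    if j_stat < 0:
        raise ValueError("J-statistic must be non-negative")
    return math.erfc(math.sqrt(j_stat / 2.0))

df = overid_degrees_of_freedom(n_instruments=2, n_endogenous=1)
for j in (1.2, 7.0):  # hypothetical J values, not the one in the question
    print(f"J = {j}: df = {df}, p-value = {j_test_pvalue_df1(j):.4f}")
```

A small p-value rejects the joint null that all instruments are exogenous; the test by itself does not say which instrument is the problem.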

5. By studying the probability limit (plim) of the IV estimator we can see that when Z and u

are possibly correlated, we can write

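$$\operatorname{plim}\hat{\beta}_{1,IV} = \beta_1 + \frac{\operatorname{Corr}(Z,u)}{\operatorname{Corr}(Z,X)} \cdot \frac{\sigma_u}{\sigma_x} \qquad (1)$$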

where $\sigma_u$ and $\sigma_x$ are the standard deviations of u and X in the population, respectively. The

interesting part of this equation involves the correlation terms. It shows that, even if

Corr(Z,u) is small, the inconsistency in the IV estimator can be very large if Corr(Z,X) is

also small. Thus, even if we focus only on consistency, it is not necessarily better to use

IV than OLS if the correlation between Z and u is smaller than that between X and u.

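Using the fact that $\operatorname{Corr}(X,u) = \operatorname{Cov}(X,u)/(\sigma_u \sigma_x)$ along with the fact that

$\operatorname{plim}\hat{\beta}_1 = \beta_1 + \operatorname{Cov}(X,u)/\operatorname{Var}(X)$, which equals $\beta_1$ when $\operatorname{Cov}(X,u) = 0$, we can write the plim of the OLS estimator

(call it $\operatorname{plim}\hat{\beta}_{1,OLS}$) as

$$\operatorname{plim}\hat{\beta}_{1,OLS} = \beta_1 + \operatorname{Corr}(X,u) \cdot \frac{\sigma_u}{\sigma_x} \qquad (2)$$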

Assume that $\sigma_u = \sigma_x$, so that the population variance in the error term is the same as it is

in X. Suppose the instrumental variable, Z, is slightly correlated with u: $\operatorname{Cov}(Z,u) = 0.1$.

Suppose also that Z and X have somewhat stronger correlation: $\operatorname{Cov}(Z,X) = 0.2$.

(i) What is the asymptotic bias in the IV estimator?

(ii) How much correlation would have to exist between X and u before OLS has more

asymptotic bias than TSLS?
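
The second terms of equations (1) and (2) in question 5 are simple enough to evaluate directly. The helper functions below are a hedged sketch: their names are hypothetical, and the numbers in the usage lines are illustrative values that differ from those in the question.

```python
def iv_asymptotic_bias(corr_zu: float, corr_zx: float,
                       sigma_u: float, sigma_x: float) -> float:
    """Second term of equation (1): [Corr(Z,u) / Corr(Z,X)] * (sigma_u / sigma_x)."""
    if corr_zx == 0:
        raise ValueError("Corr(Z, X) must be nonzero for the IV estimator to be defined")
    return (corr_zu / corr_zx) * (sigma_u / sigma_x)

def ols_asymptotic_bias(corr_xu: float, sigma_u: float, sigma_x: float) -> float:
    """Second term of equation (2): Corr(X,u) * (sigma_u / sigma_x)."""
    return corr_xu * (sigma_u / sigma_x)

# Illustrative values only (assumptions, not the numbers from the question).
iv_bias = iv_asymptotic_bias(corr_zu=0.02, corr_zx=0.4, sigma_u=1.0, sigma_x=1.0)
ols_bias = ols_asymptotic_bias(corr_xu=0.3, sigma_u=2.0, sigma_x=1.0)
print("IV asymptotic bias: ", iv_bias)
print("OLS asymptotic bias:", ols_bias)
```

Dividing by Corr(Z,X) is what makes a weak instrument dangerous: as that correlation shrinks, even a small Corr(Z,u) is scaled up.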

6. Define the differences-in-differences estimator in terms of observable differences in the treatment and control

group, before and after the treatment. Explain why this presentation is the equivalent of

calculating the coefficient in a regression framework.
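
For a two-group, two-period design like the one in question 6, the link between group means and a regression coefficient can be checked on simulated data. This is a sketch under assumptions: the data are invented, the true interaction effect of 2.0 is arbitrary, and the variable names `treat` and `post` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8_000

# Simulated data (assumption): group indicator, period indicator, outcome.
treat = rng.integers(0, 2, size=n)   # 1 = treatment group, 0 = control group
post = rng.integers(0, 2, size=n)    # 1 = after the treatment, 0 = before
y = 1.0 + 0.5 * treat + 0.3 * post + 2.0 * treat * post + rng.normal(size=n)

def cell_mean(group: int, period: int) -> float:
    """Mean outcome for one group in one period."""
    return y[(treat == group) & (post == period)].mean()

# Change in the treatment group minus change in the control group.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Regression of y on a constant, both indicators, and their interaction.
X = np.column_stack([np.ones(n), treat, post, treat * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did_reg = beta[3]                    # coefficient on the interaction term

print("difference of mean changes:", did_means)
print("interaction coefficient:   ", did_reg)
```

Because the regression is saturated in the two indicators, its fitted values are the four cell means, so the interaction coefficient reproduces the difference of mean changes up to floating-point error.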

The following questions will not be graded; they are for your practice and will be discussed at

recitation:

1. SW Exercise 12.2

2. SW Exercise 12.5

3. SW Exercise 12.7

4. SW Exercise 12.8

5. SW Exercise 12.10

6. SW Empirical Exercise 12.1

7. SW Empirical Exercise 12.2

8. SW Exercise 13.4

9. SW Exercise 13.5