
Lecture 4 – More on instrumental variables - heterogeneity and the Roy model

Economics 2123, George Washington University

Instructor: Prof. Ben Williams

heterogeneity

• Outline
  • a model of heterogeneous effects
  • the Roy model (selection model/control function approach)
  • IV in models with heterogeneity

heterogeneity

Suppose Di denotes a binary “treatment” and (Y0i, Y1i) the potential outcomes.

• The observed outcome is

Yi = Y1iDi + Y0i(1 − Di)

• If Ydi = µd + Udi then

Yi = µ0 + (µ1 − µ0)Di + U0i + (U1i − U0i)Di

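• If Zi is independent of (U0i, U1i), is β = µ1 − µ0 (the coefficient on Di) identified?
• Typically, no: the composite error U0i + (U1i − U0i)Di involves Di, so it can be correlated with Zi even though (U0i, U1i) is not.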

• The Roy model (and its extensions) helps us think about this more carefully.
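A small simulation can make the "typically, no" concrete. The sketch below is only an illustration under assumed values: the parameter values, the binary instrument `z`, and the cost rule are hypothetical choices, not taken from the lecture. Individuals select into treatment on their gains, and the instrument shifts the cost of treatment.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000

# Hypothetical parameter values, chosen only for illustration.
mu0, mu1 = 1.0, 1.5              # so mu1 - mu0 = 0.5
u0 = rng.normal(size=n)
u1 = rng.normal(size=n)

# Binary instrument, independent of (U0, U1); it lowers the cost of treatment.
z = rng.integers(0, 2, size=n)
cost = 2.0 - 1.0 * z             # assumed cost rule: 2 if z = 0, 1 if z = 1

# Selection on gains: take treatment when Y1 - Y0 is at least the cost.
d = ((mu1 + u1) - (mu0 + u0) >= cost).astype(float)

# Observed outcome, Y = mu0 + (mu1 - mu0) D + U0 + (U1 - U0) D.
y = mu0 + (mu1 - mu0) * d + u0 + (u1 - u0) * d

# IV (Wald) ratio for a binary instrument.
iv = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
ate = mu1 - mu0

print(f"mu1 - mu0 = {ate:.3f}, IV ratio = {iv:.3f}")
```

In this design the IV ratio reflects the average gain of the individuals whose choice is changed by the instrument (their gains lie between 1 and 2 by construction), not µ1 − µ0 = 0.5.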


Roy Model

• Two-sector economy (e.g., think college graduates and high school graduates).
• The wage differential between sectors in general does not give the (counterfactual) wage effect from switching sectors.
• Sector choice causes wage gains (or losses), but wage differentials also induce sector choice.

Economics 379 George Washington University



Roy Model

Suppose Yd = µd + Ud for d = 0, 1 and that D = 1(Y1 − Y0 ≥ c).

Then E(Y0 | D = 1) ≠ E(Y0 | D = 0) (selection bias) and E(Y1 − Y0 | D = 1) ≠ E(Y1 − Y0) (sorting gains).




Roy Model

Sector 1 observed earnings:

E(Y1 | D = 1) = µ1 + E(U1 | U1 − U0 > −(µ1 − µ0 − c))

This is not equal to µ1 because U1 and V = U1 − U0 are correlated.
• Suppose (U1, U0) are jointly normal with mean 0, covariance denoted by σ10, and variances σd² = Var(Ud).
• Due to normality, U1 = ρ(σ1/σV)V + ε, where
  • ρ is the correlation between U1 and V,
  • σV² is the variance of V,
  • ε and V are independent.




Roy Model

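As a result, observed sector 1 earnings are

$$E(Y_1 \mid D = 1) = \mu_1 + \frac{\mathrm{Var}(U_1) - \mathrm{Cov}(U_1, U_0)}{\sqrt{\mathrm{Var}(U_1 - U_0)}}\, \lambda\!\left(-\frac{\mu_1 - \mu_0 - c}{\sqrt{\mathrm{Var}(U_1 - U_0)}}\right)$$

λ(·) is the inverse Mills ratio: λ(z) = E(Z | Z > z) = φ(z)/(1 − Φ(z)) for Z ∼ N(0,1).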
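The closed form for observed sector 1 earnings can be checked against a simulated Roy economy. This is a minimal sketch: the numerical values of µ0, µ1, c, and the covariance matrix are assumptions made for the example, and the helper names (`norm_pdf`, `norm_cdf`, `inv_mills`) are introduced here only for illustration.

```python
import math
import numpy as np

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inv_mills(x):
    # lambda(x) = phi(x) / (1 - Phi(x)) = E(Z | Z > x) for Z ~ N(0, 1)
    return norm_pdf(x) / (1.0 - norm_cdf(x))

# Hypothetical parameter values.
mu0, mu1, c = 1.0, 1.5, 0.2
var1, var0, cov10 = 1.0, 0.5, 0.3

rng = np.random.default_rng(1)
cov = np.array([[var1, cov10], [cov10, var0]])
u = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)
u1, u0 = u[:, 0], u[:, 1]

y1 = mu1 + u1
y0 = mu0 + u0
d = (y1 - y0 >= c)               # Roy selection rule

sigma_v = math.sqrt(var1 + var0 - 2.0 * cov10)   # sd of V = U1 - U0
closed_form = mu1 + (var1 - cov10) / sigma_v * inv_mills(-(mu1 - mu0 - c) / sigma_v)
simulated = y1[d].mean()

print(f"closed form: {closed_form:.4f}, simulated: {simulated:.4f}")
```

With these assumed values Var(U1) > Cov(U1, U0), so the selection term is positive and mean observed sector 1 earnings exceed µ1.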




Roy Model

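Some empirical implications:
• positive selection in both sectors if the two distributions are uncorrelated
• in general, cannot have negative selection in both sectors
• cost increase:

$$\frac{\partial E(Y_1 \mid D = 1)}{\partial c} = \frac{\mathrm{Var}(U_1) - \mathrm{Cov}(U_1, U_0)}{\mathrm{Var}(U_1 - U_0)}\, \lambda'(x^*)$$

where λ′ > 0 and x∗ = −(µ1 − µ0 − c)/√Var(U1 − U0) is the argument of λ in the expression for E(Y1 | D = 1).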




Roy Model

More results:
• Also, ∂Var(Y1 | D = 1)/∂c has the opposite sign of ∂E(Y1 | D = 1)/∂c.
• negative selection in sector 0 requires Var(U0) < Cov(U0, U1) < Var(U1)
• selection reduces inequality
• if the distributions are not normal these things don’t have to be true




Roy Model

Identification:
• If U1, U0 are jointly normal then the unknown parameters are µ1, µ0, Var(U1), Var(U0), Cov(U1, U0).
• Three different observation schemes:
  1. Observe (Yi, Di) for iid sample i = 1, …, n
  2. Observe Yi for iid sample i = 1, …, n
  3. Observe (Yi, Di) for iid sample i = 1, …, n, where Yi is missing if Di = 0.




Roy Model

Identification:
• If U1, U0 are jointly normal then all parameters are identified in the first sampling scheme.
• Partial identification in the other two sampling schemes.
• Suppose there are regressors, µd = β′dX for d = 0, 1. Then

E(Y1 | D = 1,X = x) = β′1x + E(U1 | U1 − U0 > −z,X = x)

E(Y0 | D = 0,X = x) = β′0x + E(U0 | U1 − U0 < −z,X = x)

Pr(D = 1 | X = x) = Pr(U1 − U0 > −z | X = x)

where z = (β1 − β0)′x − c




Roy Model

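Now suppose U1, U0 are jointly normal, conditional on X.

Then

$$U_1 = \frac{\sigma_1^2 - \sigma_{10}}{\sigma_V^2}\,(U_1 - U_0) + \varepsilon, \qquad \varepsilon \perp\!\!\!\perp U_1 - U_0 \mid X$$

Therefore,

$$E(Y_1 \mid D = 1, X = x) = \beta_1' x + \frac{\sigma_1^2 - \sigma_{10}}{\sigma_V}\, \lambda(-z)$$

where z = (β′1x − β′0x − c)/σV, with σV² = Var(U1 − U0), and the conditional probability of being in sector 1 is

Pr(D = 1 | X = x) = Φ(z) (propensity score)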
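The step from the linear projection to the inverse Mills ratio term can be spelled out as follows. This is a sketch of the standard truncated-normal argument; the symbol $\tilde z = \beta_1' x - \beta_0' x - c$ (the index before dividing by $\sigma_V$) and the shorthand $V = U_1 - U_0$ are used only for this derivation, and conditioning on $X = x$ is left implicit on the right-hand sides.

```latex
\begin{align*}
E(U_1 \mid V > -\tilde z, X = x)
  &= \frac{\sigma_1^2 - \sigma_{10}}{\sigma_V^2}\, E(V \mid V > -\tilde z)
     && \text{since } E(\varepsilon \mid V) = 0 \\
  &= \frac{\sigma_1^2 - \sigma_{10}}{\sigma_V^2}\, \sigma_V\,
     E\!\left(\frac{V}{\sigma_V} \,\Big|\, \frac{V}{\sigma_V} > -\frac{\tilde z}{\sigma_V}\right)
     && \text{with } V/\sigma_V \sim N(0,1) \\
  &= \frac{\sigma_1^2 - \sigma_{10}}{\sigma_V}\, \lambda(-z),
     && z = \tilde z / \sigma_V .
\end{align*}
```

The same standardization gives Pr(D = 1 | X = x) = Pr(V/σV > −z) = Φ(z).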




Roy Model

• Based on these two conditional moments, β1 is identified as well as some combinations of other parameters.
• The direction of selection is identified.
• β1,k − β0,k is identified up to scale.




Roy Model

If we have “complete” data, then

$$E(Y_0 \mid D = 0, X = x) = \beta_0' x + \frac{\sigma_0^2 - \sigma_{10}}{\sigma_V}\, \lambda(z)$$

where z = (β′1x − β′0x − c)/σV, and everything is identified.




Roy Model

Counterfactuals:
• the distribution of potential wage gains, Y1 − Y0
• the proportion of the population who benefit, Pr(Y1 > Y0)
• the effect of a policy of subsidizing cost for those with Y0 below a cutoff value, y0




Generalized Roy Model

Generalized Roy Model:
• Let Yd = µd(X) + Ud and D = 1(µD(X, Z) ≥ V), where (U1, U0, V) ⊥⊥ (X, Z)
• special case:
  µD(X, Z) = µ1(X) − µ0(X) − µC(X, Z)
  V = −(U1 − U0 − UC)
  so that D = 1(Y1 − Y0 ≥ µC(X, Z) + UC)
• what is identified in this case?





Generalized Roy Model: Assumptions
• HV1: (U1, U0, V) ⊥⊥ (X, Z)
• HVN1: (U1, U0, V) is normally distributed

Then

$$E(Y_1 \mid D = 1, X = x, Z = z) = \beta_1' x - \frac{\mathrm{Cov}(U_1, V)}{\sigma_V}\, \lambda\!\left(-\frac{\mu_D(x, z)}{\sigma_V}\right)$$





Identification
• β1, β0 are identified with data on (Y, D, X, Z)
• suppose µD(x, z) = β′1x − β′0x + γ′x x + γ′z z
• if there is a component of x with γx set to 0 then σV, γx, γz are identified
• regardless, the sorting gains and the selection bias can be identified





Generalized Roy Model without normality
• Heckman and Honoré (1990) study the Roy model with and without regressors under nonnormality.
• Let’s consider the generalized Roy model under nonnormality.
• In this case,

E(Y1 | D = 1, X = x, Z = z) = β′1x + E(U1 | µD(x, z) ≥ V)





Generalized Roy Model without normality
• The propensity score: P(x, z) := Pr(D = 1 | X = x, Z = z) = FV(µD(x, z))
• Index sufficiency: E(U1 | µD(x, z) ≥ V) = K1(P(x, z))
• K1 is called a control function





Generalized Roy Model without normality
• β1 is identified if lim z→∞ P(x, z) = 1 (or lim z→−∞ P(x, z) = 1)
• because lim P(x,z)→1 K1(P(x, z)) = 0 (as P → 1 the conditioning event has probability one, so K1 tends to E(U1) = 0)
• “identification at infinity”
• HV1 is needed for this argument but HVN1 is replaced by the support condition




MTE

Marginal Treatment Effect
• Define UD = FV(V)
• Then D = 1(P(X, Z) ≥ UD)
• what is the distribution of UD?
• MTE(x, u) = E(Y1 − Y0 | X = x, UD = u) (concept originates with Björklund and Moffitt (1987))
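The question about the distribution of UD can be explored numerically. When V is continuously distributed, UD = FV(V) is uniform on [0, 1] by the probability integral transform. The sketch below assumes, purely for illustration, that V is normal with a hypothetical standard deviation `sigma_v` and uses a random stand-in `mu_d` for the index µD(X, Z); it checks the deciles of UD and that the two representations of D agree.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
sigma_v = 1.7                                  # hypothetical sd of V

erf = np.vectorize(math.erf)

def f_v(t):
    # CDF of V ~ N(0, sigma_v^2), an assumption made for this example
    return 0.5 * (1.0 + erf(t / (sigma_v * math.sqrt(2.0))))

v = rng.normal(scale=sigma_v, size=n)
u_d = f_v(v)                                   # U_D = F_V(V)

# Deciles of a Uniform(0, 1) variable are 0.1, 0.2, ..., 0.9.
expected = np.linspace(0.1, 0.9, 9)
deciles = np.quantile(u_d, expected)

# D = 1(mu_D >= V) should coincide with D = 1(P >= U_D), where P = F_V(mu_D).
mu_d = rng.normal(size=n)                      # stand-in for the index mu_D(X, Z)
p = f_v(mu_d)
agreement = np.mean((mu_d >= v) == (p >= u_d))

print(np.round(deciles, 3), agreement)
```

Because FV is increasing, applying it to both sides of µD ≥ V leaves the treatment indicator unchanged, which is why the propensity score and UD are sufficient to describe selection.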




MTE

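Identification:

$$\begin{aligned}
\frac{\partial E(Y \mid X = x, P(X,Z) = p)}{\partial p}
&= \frac{\partial E(Y_0 \mid X = x, P(X,Z) = p)}{\partial p} + \frac{\partial E(D(Y_1 - Y_0) \mid X = x, P(X,Z) = p)}{\partial p} \\
&= 0 + \frac{\partial}{\partial p} \int_0^p MTE(x,u)\,du \\
&= MTE(x,p)
\end{aligned}$$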
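The two facts used in the middle step can be written out as below. This is a sketch that relies on HV1 and on UD being uniform on [0, 1] (which holds when V is continuously distributed); it uses Y = Y0 + D(Y1 − Y0) and D = 1(UD ≤ p) on the event P(X, Z) = p.

```latex
\begin{align*}
E(Y_0 \mid X = x, P(X,Z) = p)
  &= \mu_0(x) + E(U_0)
     && \text{(does not depend on } p\text{)} \\
E(D(Y_1 - Y_0) \mid X = x, P(X,Z) = p)
  &= E\big(1(U_D \le p)\,(Y_1 - Y_0) \mid X = x\big) \\
  &= \int_0^1 1(u \le p)\, E(Y_1 - Y_0 \mid X = x, U_D = u)\, du \\
  &= \int_0^p MTE(x,u)\, du .
\end{align*}
```

Differentiating the last line with respect to p gives MTE(x, p), so the MTE is identified at the values of p in the support of P(X, Z) given X = x.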




MTE

Marginal Treatment Effect
• many parameters of interest can be written as

$$\int_0^1 MTE(x,u)\,\omega(x,u)\,du, \qquad \int_0^1 \omega(x,u)\,du = 1$$

• for example,

$$ATE(x) := E(Y_1 - Y_0 \mid X = x), \qquad \omega_{ATE}(x,u) = 1 \text{ for } u \in [0,1]$$

$$TT(x) := E(Y_1 - Y_0 \mid D = 1, X = x), \qquad \omega_{TT}(x,u) \propto \Pr(P(x,Z) > u \mid X = x)$$
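The weighting representation can be illustrated numerically at a fixed x. The sketch below assumes a hypothetical linear, declining MTE and a hypothetical Uniform(0.2, 0.9) distribution for the propensity score; neither comes from the lecture. It computes ATE and TT as weighted integrals of the MTE and compares them with direct simulation averages.

```python
import numpy as np

def mte(u):
    # Hypothetical MTE, declining in u (illustration only).
    return 0.8 - 1.0 * u

rng = np.random.default_rng(3)
n = 400_000
p = rng.uniform(0.2, 0.9, size=n)        # assumed distribution of P(x, Z) at fixed x
u_d = rng.uniform(size=n)                # U_D ~ Uniform(0, 1), independent of Z
d = p >= u_d                             # D = 1(P >= U_D)
gain = mte(u_d) + rng.normal(scale=0.5, size=n)   # Y1 - Y0, E(gain | U_D = u) = MTE(u)

# Midpoint grid on [0, 1] for numerical integration.
m = 2000
du = 1.0 / m
grid = (np.arange(m) + 0.5) * du

# ATE weights: identically 1 on [0, 1].
w_ate = np.ones(m)

# TT weights: proportional to Pr(P > u), normalized to integrate to 1.
surv = 1.0 - np.searchsorted(np.sort(p), grid, side="right") / n
w_tt = surv / (surv.sum() * du)

ate_from_mte = np.sum(mte(grid) * w_ate) * du
tt_from_mte = np.sum(mte(grid) * w_tt) * du

ate_sim = gain.mean()
tt_sim = gain[d].mean()

print(f"ATE: {ate_from_mte:.3f} vs {ate_sim:.3f}; TT: {tt_from_mte:.3f} vs {tt_sim:.3f}")
```

Since the TT weights put more mass on low values of u and the assumed MTE is declining, TT exceeds ATE in this example.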




MTE

Weights for IV
• the IV estimand is

$$\Delta_{IV}(x) = \frac{\mathrm{Cov}(J(Z), Y \mid X = x)}{\mathrm{Cov}(J(Z), D \mid X = x)}$$

where J = J(Z) is some function of the instruments that may depend implicitly on X as well.
• then

$$\Delta_{IV}(x) = \int_0^1 MTE(x,u)\,\omega_{IV}(x,u)\,du$$

where

$$\omega_{IV}(x,u) = \frac{E(J - E(J) \mid X = x, P \ge u)\,\Pr(P \ge u \mid X = x)}{\mathrm{Cov}(J, P \mid X = x)}$$




MTE

Weights for IV
• what if J = P?
• if Z is scalar and binary then

$$\Delta_{IV}(x) = \int_{p_0}^{p_1} MTE(x,u)\,\frac{1}{p_1 - p_0}\,du$$

where ps = Pr(D = 1 | Z = s, X = x) for s = 0, 1
• in general, weights are not always positive!
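The binary-instrument case can be checked by simulation at a fixed x. This is a sketch under assumed values: the linear MTE, the propensity scores p0 = 0.2 and p1 = 0.5, and the outcome equations are hypothetical. Y0 is allowed to depend on UD to show that selection on levels does not disturb the result.

```python
import numpy as np

def mte(u):
    # Hypothetical MTE, declining in u (illustration only).
    return 0.8 - 1.0 * u

rng = np.random.default_rng(4)
n = 400_000
p0, p1 = 0.2, 0.5                         # assumed Pr(D = 1 | Z = s) for s = 0, 1

z = rng.integers(0, 2, size=n)            # binary instrument
u_d = rng.uniform(size=n)                 # U_D ~ Uniform(0, 1), independent of Z
d = (np.where(z == 1, p1, p0) >= u_d).astype(float)

# Potential outcomes; Y0 is correlated with U_D (selection on levels).
y0 = 1.0 + 0.5 * (u_d - 0.5) + rng.normal(scale=0.3, size=n)
y1 = y0 + mte(u_d) + rng.normal(scale=0.3, size=n)
y = y0 + d * (y1 - y0)

# IV (Wald) ratio with a binary instrument.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

# Average of the MTE over [p0, p1] with weight 1 / (p1 - p0), by the midpoint rule.
m = 1000
grid = p0 + (np.arange(m) + 0.5) * (p1 - p0) / m
avg_mte = mte(grid).mean()

print(f"Wald ratio: {wald:.3f}, average MTE on [p0, p1]: {avg_mte:.3f}")
```

The interval [p0, p1] is the set of UD values whose treatment status changes with Z, so a different instrument with different propensity scores would average the MTE over a different interval.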




MTE

LATE
• Let D(z) denote the value D takes when Z takes the value z.
• Imbens and Angrist showed that the IV estimand in the binary case takes the form E(Y1 − Y0 | D(z) − D(z′) = 1).
• This is called the local average treatment effect.
• It represents the average effect of treatment for individuals induced to receive treatment when Z changes from 0 to 1.

$$MTE(P(z)) = \lim_{z' \to z} LATE(z, z')$$




MTE

Carneiro, Heckman, and Vytlacil (2011)
• data from NLSY
• Y is log wage in 1991 (individuals are between 28 and 34 years old), D represents college attendance, X contains usual controls
• instruments: (i) distance to college, (ii) local wage, (iii) local unemployment, (iv) average local public tuition




MTE

Carneiro et al.: Estimating Marginal Returns to Education (Vol. 101, No. 6, p. 2771)

mean values in the sample. As above, we annualize the MTE. Our estimates show that, in agreement with the normal model, E(U1 − U0 | US = uS) is declining in uS, i.e., students with high values of uS have lower returns than those with low values of uS.

Even though the semiparametric estimate of the MTE has larger standard errors than the estimate based on the normal model, we still reject the hypothesis that its slope is zero. We have already discussed the rejection of the hypothesis that MTE is constant in uS, based on the test results reported in Table 4, panel A. But we can also directly test whether the semiparametric MTE is constant in uS or not. We evaluate the MTE at 26 points, equally spaced between 0 and 1 (with intervals of 0.04). We construct pairs of nonoverlapping adjacent intervals (0–0.04, 0.08–0.12, 0.16–0.20, 0.24–0.28, …), and we take the mean of the MTE for each pair. These are LATEs defined over different sections of the MTE. We compare adjacent LATEs. Table 4, panel B, reports the outcome of these comparisons. For example, the first column reports that

$$E(Y_1 - Y_0 \mid X = \bar{x},\ 0 \le U_S \le 0.04) - E(Y_1 - Y_0 \mid X = \bar{x},\ 0.08 \le U_S \le 0.12) = 0.0689.$$

[Figure 4. E(Y1 − Y0 | X, US) with 90 Percent Confidence Interval: Locally Quadratic Regression Estimates. Horizontal axis: US, from 0 to 1; vertical axis: MTE, from −0.6 to 0.8.]

Notes: To estimate the function plotted here, we first use a partially linear regression of log wages on polynomials in X, interactions of polynomials in X and P, and K(P), a locally quadratic function of P (where P is the predicted probability of attending college), with a bandwidth of 0.32; X includes experience, current average earnings in the county of residence, current average unemployment in the state of residence, AFQT, mother’s education, number of siblings, urban residence at 14, permanent local earnings in the county of residence at 17, permanent unemployment in the state of residence at 17, and cohort dummies. The figure is generated by evaluating by the derivative of (9) at the average value of X. Ninety percent standard error bands are obtained using the bootstrap (250 replications).




MTE

MTE assumptions:
• HV1
• the distribution of µD(x, Z) conditional on X = x is not degenerate (exclusion restriction)
• 0 < Pr(D = 1 | X = x) < 1 for each x
• X is invariant to counterfactual manipulations (X1 = X0)

Vytlacil shows that these are equivalent to the conditions of Imbens and Angrist (1994).
• uniformity/monotonicity: Pr(D(z) ≥ D(z′)) is equal to 1 or 0 for each pair z, z′; this is implied by separability in the HV model.



heterogeneity

Summarizing,
• In models with essential heterogeneity, IV does not estimate an economically interesting parameter.
• Instead, IV estimates a LATE, or a weighted average of the MTE.
• Under the more restrictive assumptions of Imbens and Angrist (1994), this is the “treatment effect for those induced to switch by an increase in Z”.
• More generally, the weights may be negative.
• Different instruments identify different parameters.
• The MTE itself can be identified so we can do more.
