Evaluation of Public Policies - Université Laval · Guy Lacroix (Université Laval) Program...

Evaluation of Public Policies
Parametric and non-parametric approaches

Guy Lacroix

CIRPEE, Université Laval

AGRODEP, Dakar, 28-30 October 2015

http://www.ecn.ulaval.ca/guy_lacroix/Cours/Dakar/

Guy Lacroix (Université Laval) Program Evaluation 25–28 October 2015


The fundamental problem of evaluation


What is a counterfactual?
A counterfactual is a hypothetical representation of the outcome a given individual would have experienced had he been in a state other than the one actually observed.

Problem
- How do we measure the impact of a program on a given individual?
- We would need to observe the individual in two states, with and without the program.
- Selectivity issue: program participants are not randomly chosen!



Sample selection problem

Definition (Sample selection bias)
Sample selection bias (or self-selection bias) arises when participants are not representative of the population of interest ⇒ one cannot draw inferences about that population from such a sample.

Source of the problem:
- Self-selection: stems from the participants themselves.
- Selection by the program manager (targets need to be met).

For the analyst (econometrician), selection may be:
- linked to observable characteristics;
- linked to unobservable characteristics;
- linked to both.



Statistical fixes

- By type of data: cross-sectional data; panel data.
- Based on: observable heterogeneity; unobserved heterogeneity; both.


Evaluation problem: formal presentation

Formal presentation

Causal model of Rubin (1974)

Treatment variable: $T \in \{0, 1\}$

Potential outcomes: $Y_0$ or $Y_1$

Observed outcome: $Y = T \cdot Y_1 + (1 - T) \cdot Y_0$



Parameter of interest

Causal impact of treatment $T$: $\Delta = Y_1 - Y_0$

- $\Delta$ is unobservable.
- $\Delta$ varies across individuals: there exists a distribution of treatment effects.
- Since $\Delta$ is unobservable, so is its distribution.








Parameter of interest (cont.)

Parameter of interest from policy maker’s point of view:

Mean causal effect in the population (ATE):

$\Delta^{ATE} = E(Y_1 - Y_0)$ → hard to identify

Mean causal effect on the treated population (ATT):

$\Delta^{ATT} = E(Y_1 - Y_0 \mid T = 1)$ → the most common



Selection bias

Conditional Independence Assumption (CIA) (identification)

If $(Y_0, Y_1) \perp T$, then $\Delta^{ATE} = \Delta^{ATT}$

Mean causal effect on the population:

$\Delta^{ATE} = E(Y_1 - Y_0) = E(Y_1 \mid T = 1) - E(Y_0 \mid T = 0) = E(Y \mid T = 1) - E(Y \mid T = 0)$

Mean causal effect on the treated population:

$\Delta^{ATT} = E(Y_1 \mid T = 1) - E(Y_0 \mid T = 1) = E(Y_1 \mid T = 1) - E(Y_0 \mid T = 0) = E(Y \mid T = 1) - E(Y \mid T = 0)$



Sample selection (cont.)

For $\Delta^{ATE}$ we must have

$E(Y_1) = E(Y_1 \mid T = 1) = E(Y \mid T = 1)$ and $E(Y_0) = E(Y_0 \mid T = 0) = E(Y \mid T = 0)$

For $\Delta^{ATT}$ we must have

$E(Y_0 \mid T = 1) = E(Y_0 \mid T = 0) = E(Y \mid T = 0)$



Sample selection (cont.)

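If $(Y_0, Y_1)$ is correlated with $T$, then $\Delta^{ATE} \neq \Delta^{ATT}$. In addition, the simple difference in observed means no longer identifies either parameter:

$E(Y \mid T = 1) - E(Y \mid T = 0) = E(Y_1 \mid T = 1) - E(Y_0 \mid T = 0)$

$= E(Y_1 \mid T = 1) - E(Y_0 \mid T = 1) + [E(Y_0 \mid T = 1) - E(Y_0 \mid T = 0)]$

$= \Delta^{ATT} + B^{ATT}$

where $B^{ATT}$ is the selection bias: the difference in no-program outcomes between participants and non-participants.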
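The decomposition can be checked in a small simulation, sketched here in Python with NumPy. Only a simulation can generate both potential outcomes; the data-generating process (an unobserved "ability" term that raises $Y_0$ and also drives participation, plus a constant effect of 1) is a hypothetical assumption chosen for illustration. In the sample, the naive difference equals the ATT plus the selection-bias term by construction of the means.

```python
import numpy as np

# Hypothetical data-generating process (illustration only).
rng = np.random.default_rng(0)
n = 100_000
ability = rng.normal(size=n)                        # unobserved trait
y0 = 10 + 2 * ability + rng.normal(size=n)          # outcome without the program
y1 = y0 + 1.0                                       # constant program effect of 1
t = (ability + rng.normal(size=n) > 0).astype(int)  # self-selection on ability

y = t * y1 + (1 - t) * y0                           # observed outcome Y = T*Y1 + (1-T)*Y0

naive = y[t == 1].mean() - y[t == 0].mean()         # E(Y|T=1) - E(Y|T=0)
att = (y1 - y0)[t == 1].mean()                      # E(Y1 - Y0 | T=1)
bias = y0[t == 1].mean() - y0[t == 0].mean()        # E(Y0|T=1) - E(Y0|T=0)

print(f"naive = {naive:.3f}, ATT = {att:.3f}, selection bias = {bias:.3f}")
```

Because high-ability individuals both participate more and have higher $Y_0$, the bias term is positive and the naive comparison overstates the effect.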



Selection bias: Formal presentation

Let $Y_i = \mu_i + \alpha T_i + u_i = X_i\beta + \alpha T_i + u_i$

Let $T^*_i = W_i\gamma + \nu_i$, with

$T_i = 1$ if $T^*_i > 0$, $T_i = 0$ if $T^*_i \le 0$ ⇒ $T_i = 1$ if $\nu_i > -W_i\gamma$



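If $\mathrm{Corr}(u_i, \nu_i) \neq 0$, then

$E(u_i \mid T_i, X_i, W_i) \neq 0$

It can then be shown that:

$E(Y_i \mid T_i, X_i, W_i) = X_i\beta + \alpha T_i + \sigma \dfrac{\phi(-W_i\gamma)}{\Phi(-W_i\gamma)} \neq X_i\beta + \alpha T_i \qquad (1)$

And thus: ⇒ $E(\hat{\alpha}) \neq \alpha$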



Example: negative correlation: β = 0.5, α = 1.0, ρ = -0.9, γ = 0.01

Data:

W    X     u_i     ν_i    X_iβ   W_iγ   T_i   Y_i
1    2   -1.52    1.71     1     0.01    1    0.48
2    4   -0.07   -0.16     2     0.02    0    1.92
3    6   -0.44    0.95     3     0.03    1    3.55
4    8    0.17   -0.10     4     0.04    0    4.17
5   10   -1.71    1.42     5     0.05    1    4.28
1    2    0.37   -1.01     1     0.01    0    1.37
2    4    1.05   -1.47     2     0.02    0    3.05
3    6   -1.48    0.79     3     0.03    1    2.51
4    8   -0.28    0.22     4     0.04    1    4.71
5   10    1.22   -0.20     5     0.05    0    6.22

Regression:

             Estimate   Std. err.      t
Constant        0.28       0.49       0.56
β               0.54       0.07       7.55
α              -0.68       0.41      -1.66

Difference in means:

$\bar{Y}_1 - \bar{Y}_0 = 3.11 - 3.35 = -0.24$
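Both figures in this example can be checked in Python with NumPy, using the rounded values transcribed from the data table above. With these rounded data the OLS estimate of α comes out at about -0.68, as in the regression table; the other coefficients may differ in the second decimal because of rounding.

```python
import numpy as np

# Values transcribed (rounded) from the negative-correlation example.
x = np.array([2, 4, 6, 8, 10, 2, 4, 6, 8, 10], dtype=float)
t = np.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 0], dtype=float)
y = np.array([0.48, 1.92, 3.55, 4.17, 4.28, 1.37, 3.05, 2.51, 4.71, 6.22])

# Difference in means between participants and non-participants.
diff = y[t == 1].mean() - y[t == 0].mean()
print(f"difference in means = {diff:.2f}")      # → -0.24

# OLS of Y on a constant, X and T.
Z = np.column_stack([np.ones_like(x), x, t])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
const, beta_hat, alpha_hat = coef
print(f"constant = {const:.2f}, beta = {beta_hat:.2f}, alpha = {alpha_hat:.2f}")
```

The true α is 1, yet both the difference in means and OLS return a negative effect: participants were selected on high ν, hence low u.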



Example: positive correlation: β = 0.5, α = 1.0, ρ = 0.9, γ = 0.01

Data:

X      u_i     ν_i    X_iβ   T_i   Y_i
2    -1.52   -1.02     1     0    0.48
4    -0.07   -0.31     2     0    2.92
6    -0.44    0.15     3     1    4.55
8     0.17    0.22     4     1    6.17
10   -1.71   -1.67     5     0    4.28
2     0.37   -0.34     1     0    2.37
4     1.05    0.43     2     1    5.05
6    -1.48   -1.89     3     0    2.51
8    -0.28   -0.29     4     0    4.71
10    1.22    2.01     5     1    8.22

Regression:

             Estimate   Std. err.      t
Constant        0.56       0.66       0.84
β               0.43       0.10       4.21
α               2.40       0.59       4.00

Difference in means:

$\bar{Y}_1 - \bar{Y}_0 = 5.13 - 4.47 = 1.66$
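A larger simulation of the same design separates the pattern from small-sample noise. This hypothetical Python sketch assumes X takes the values 2, ..., 10 with W = X/2 as in the tables, γ = 0.01, and (u, ν) standard normal with correlation ρ (the unit variance of u is an assumption). Because Wγ is close to zero, participation is nearly a coin flip driven by ν, and the OLS estimate of α should be roughly 1 + ρ × 1.6, that is, below 0 for ρ = -0.9 and above 2 for ρ = 0.9.

```python
import numpy as np

def ols_alpha(rho, n=200_000, alpha=1.0, beta=0.5, gamma=0.01, seed=0):
    """Simulate Y = X*beta + alpha*T + u with T = 1{W*gamma + nu > 0},
    Corr(u, nu) = rho, and return the OLS estimate of alpha."""
    rng = np.random.default_rng(seed)
    w = rng.integers(1, 6, size=n).astype(float)       # W in {1, ..., 5}
    x = 2.0 * w                                         # X in {2, ..., 10}
    nu = rng.normal(size=n)
    u = rho * nu + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
    t = (w * gamma + nu > 0).astype(float)
    y = x * beta + alpha * t + u
    Z = np.column_stack([np.ones(n), x, t])
    return np.linalg.lstsq(Z, y, rcond=None)[0][2]

for rho in (-0.9, 0.0, 0.9):
    print(f"rho = {rho:+.1f}: OLS estimate of alpha = {ols_alpha(rho):.2f}")
```

The factor 1.6 is E(ν | ν > 0) - E(ν | ν ≤ 0) = 2φ(0)/0.5 for a standard normal ν; only with ρ = 0 does OLS recover α.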



Attrition bias

- An attrition bias arises when individuals who leave the sample have observable or unobservable characteristics that differ from those of individuals who remain in the sample.
- The attrition bias shares many features with the selection bias, but is distinct from it.
- Statistical remedies for the attrition bias are similar to, albeit more complex than, those used to correct for selection bias.






Formal presentation

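The latent model is the following:

$Y_i = X_i\beta + \alpha T_i + u_i$

$T^*_i = W_i\gamma + \nu_i$ and $A^*_i = Z_i\delta + \varepsilon_i$

The observable model is:

$T_i = 1$ if $T^*_i > 0$, $T_i = 0$ if $T^*_i \le 0$ ⇒ $T_i = 1$ if $\nu_i > -W_i\gamma$

$A_i = 1$ if $A^*_i > 0$, $A_i = 0$ if $A^*_i \le 0$ ⇒ $A_i = 1$ if $\varepsilon_i > -Z_i\delta$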



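The attrition bias is due to the correlation between $\varepsilon_i$ and $u_i$. Indeed, since $Y_i$ is observed only for individuals who remain in the sample ($A_i = 1$),

$E(Y_i \mid X_i, T_i, W_i, Z_i) = X_i\beta + \alpha T_i + E(u_i \mid \nu_i > -W_i\gamma, \varepsilon_i > -Z_i\delta) \neq X_i\beta + \alpha T_i$

Examples:
- Earnings of immigrants
- Impact of training programmes
- Welfare spells of immigrants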



Example: β = 0.5, α = 1, γ = 0.01, δ = 0.01, ρ = 0.9

W   X     u_i     ν_i     ε_i    X_iβ   W_iγ   Z_iδ   T_i  A_i    Y_i
1   2   1.326   1.107   1.305     1   0.010  0.015    1    1    3.326
1   4   0.232  -0.950  -0.244     2   0.010  0.025    0    0    2.232
2   6   0.003   0.249   0.362     3   0.020  0.040    1    1    4.003
2   8  -1.065   1.996  -0.535     4   0.020  0.050    1    0    3.935
3  10  -0.843  -0.289  -1.877     5   0.030  0.065    0    0    4.157
3  12   0.390   0.278   0.872     6   0.030  0.075    1    1    7.390
4  14   2.236  -0.075   2.872     7   0.040  0.090    0    1    9.236
4  16  -0.174   0.251   0.115     8   0.040  0.100    1    1    8.826
5  18  -0.270   0.872  -0.378     9   0.050  0.115    1    0    9.730
5  20  -0.317  -1.774  -0.677    10   0.050  0.125    0    0    9.683
1   2   0.278   0.309   0.233     1   0.010  0.015    1    1    2.278
1   4  -2.675   0.050  -2.233     2   0.010  0.025    1    0    0.325
2   6   0.661   0.068   0.282     3   0.020  0.040    1    1    4.661
2   8  -1.126  -0.360  -1.022     4   0.020  0.050    0    0    2.874
3  10   0.137  -1.648  -0.135     5   0.030  0.065    0    0    5.137
3  12   0.857  -1.433   1.318     6   0.030  0.075    0    1    6.857
4  14   0.576  -1.092   0.252     7   0.040  0.090    0    1    7.576
4  16   1.001  -2.339   1.497     8   0.040  0.100    0    1    9.001
5  18   0.956   1.803   0.519     9   0.050  0.115    1    1   10.956
5  20   0.025  -0.065  -0.769    10   0.050  0.125    0    0   10.025

             Estimate   Std. err.       t
Constant      -0.981      0.606     -1.618
β              1.111      0.086     12.875
α              0.089      0.552      0.162
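The mechanism can be isolated in a hypothetical Python simulation. To separate attrition from selection, the sketch assumes T is randomly assigned and that staying in the sample depends on treatment through $A^*_i = 0.2 + T_i + \varepsilon_i$, with $\mathrm{Corr}(\varepsilon_i, u_i) = \rho$. These numbers are illustrative assumptions, not the ones behind the table. OLS on the stayers ($A_i = 1$) recovers α = 1 when ρ = 0, but is biased downwards when ρ = 0.9 (to roughly 0.6 under these assumptions).

```python
import numpy as np

def alpha_among_stayers(rho, n=200_000, alpha=1.0, seed=1):
    """OLS estimate of alpha using only individuals who remain in the sample (A = 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    t = rng.integers(0, 2, size=n).astype(float)                  # randomly assigned treatment
    eps = rng.normal(size=n)
    u = rho * eps + np.sqrt(1 - rho ** 2) * rng.normal(size=n)    # Corr(eps, u) = rho
    stays = 0.2 + 1.0 * t + eps > 0                               # A_i = 1 if Z_i*delta + eps_i > 0
    y = 0.5 * x + alpha * t + u
    Z = np.column_stack([np.ones(n), x, t])
    coef, *_ = np.linalg.lstsq(Z[stays], y[stays], rcond=None)
    return coef[2]

for rho in (0.0, 0.9):
    print(f"rho = {rho}: alpha estimated on stayers = {alpha_among_stayers(rho):.2f}")
```

Participants are more likely to stay regardless of ε, so non-participants who stay are selected more heavily on high ε (hence high u), which lowers the estimated gap.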



Experimentation as a solution to the selection problem

Random assignment eliminates the correlation between u and ν.

X_i     u_i     X_iβ   T_i     Y_i
1    -1.911     0.5     1    -0.411
2    -1.447     1.0     1     0.553
3    -0.037     1.5     1     2.463
4     0.703     2.0     1     3.703
5    -0.407     2.5     1     3.093
6    -0.473     3.0     1     3.527
7     1.832     3.5     1     6.332
8     1.467     4.0     1     6.467
9    -1.766     4.5     1     3.734
10    0.384     5.0     1     6.384
1     0.728     0.5     0     1.228
2    -0.977     1.0     0     0.023
3    -1.107     1.5     0     0.393
4     0.531     2.0     0     2.531
5    -0.834     2.5     0     1.666
6    -1.643     3.0     0     1.357
7     1.104     3.5     0     4.604
8     0.011     4.0     0     4.011
9     1.633     4.5     0     6.133
10   -0.698     5.0     0     4.302

E(Y|T = 1) = 3.584, E(Y|T = 0) = 2.625, Difference = 0.960
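These figures can be reproduced from the Y column of the table with a short Python check. Each row satisfies $Y_i = X_i\beta + T_i + u_i$, so the true effect is 1; under random assignment the simple difference in means (about 0.96) is close to it. The standard error with unequal variances is added here for illustration only.

```python
import numpy as np

# Y values transcribed from the experiment table: first ten treated (T=1), last ten controls (T=0).
y_treated = np.array([-0.411, 0.553, 2.463, 3.703, 3.093, 3.527, 6.332, 6.467, 3.734, 6.384])
y_control = np.array([1.228, 0.023, 0.393, 2.531, 1.666, 1.357, 4.604, 4.011, 6.133, 4.302])

m1, m0 = y_treated.mean(), y_control.mean()
diff = m1 - m0
# Standard error of the difference in means (unequal variances).
se = np.sqrt(y_treated.var(ddof=1) / len(y_treated) + y_control.var(ddof=1) / len(y_control))
print(f"E(Y|T=1) = {m1:.4f}, E(Y|T=0) = {m0:.4f}, difference = {diff:.4f} (s.e. {se:.3f})")
```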









Social experimentation: Limits and caveats

Problems:
- Attrition bias
- Randomization bias
- Contagion bias
- Logistics bias

Limits:
- General equilibrium effects
- No inference on structural parameters (e.g., SSP in Canada)
- External validity



Solutions

- Before-after estimators
- Difference-in-differences estimator
- Cross-sectional estimators:
  - sample selection bias correction (“Heckit”)
  - “treatment effect” method
  - matching estimators
- Difference-in-differences matching estimator



Before-After estimator

Let $Y_{1t}, Y_{0t'}$, with $t > k > t'$, where $k$ is the date of participation.

- This estimator assumes that $E(Y_{0t} - Y_{0t'} \mid T = 1) = 0$.
- If it holds, $\alpha = E(Y_{1t} - Y_{0t'} \mid T = 1)$ (unbiased).
- Why? $Y_{1t} - Y_{0t} = (Y_{1t} - Y_{0t'}) + (Y_{0t'} - Y_{0t})$.
- The second term is an approximation error, assumed to vanish: $(Y_{0t'} - Y_{0t}) \to 0$.
- Does not require panel data: repeated cross-sections can be used.



The approximation error may not be zero:
- Changes in the economic environment between $t'$ and $t$.
- If the sample windows are too wide (one year or more), many changes may occur in addition to the program implementation.
- “Ashenfelter dip”. Empirical regularity: the earnings of program participants decline in the months prior to participation ($t'$).



Ashenfelter Dip: Example # 1

FIGURE 1: Mean self-reported monthly earnings, National JTPA Study controls, eligible non-participants (ENPs) and SIPP eligibles, male adults. Nominal dollars, by month relative to random assignment (controls) or eligibility (ENPs and SIPP eligibles), from k-20 to k+15. Source: Heckman and Smith (1998b).




FIGURE 2: Mean annual earnings prior to, during, and subsequent to training for 1964 MDTA classroom trainees and a comparison group, white males, 1959-1969. Source: Ashenfelter (1978).




FIGURE 3: Mean annual earnings for 1976 CETA trainees and a comparison group, males, 1970-1978, in 1967 dollars. Source: Ashenfelter and Card (1985).




FIGURE 4: National Supported Work (NSW) average annual earnings of treatments, controls, and a matched CPS comparison group, AFDC recipients, 1972-1979 (enrollment period marked). Source: Fraker and Maynard (1987).




FIGURE 5: Earnings of participants in Swedish UI training in 1991 and two comparison groups, adult males aged 26-54, 1986-1992. Mean annual earnings in thousands of 1995 SEK. Source: Regner (1997).




FIGURE 6: Earnings of 1991 participants in a Norwegian labor market training programme and a randomly assigned control group, all participants, 1989-1994. Mean annual earnings in 1994 NOK. Source: Raaum and Torp (1997).



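Let $Y_{it}$ be the outcome variable of participant $i$ at time $t$, where $t > k$. One can write:

$Y_{it} = \delta + X_{it}\beta + \alpha T_{it} + u_{it}$

$Y_{it'} = \delta + X_{it'}\beta + u_{it'}$, where

$T_{it} = 1$ if $t > k$, $0$ otherwise.

The estimate of $\alpha$ is valid only under the following assumption on participation:

$T^*_{it} = W_{it}\gamma + \theta_i + \varepsilon_{it}$, with participation if $W_{it}\gamma + \theta_i + \varepsilon_{it} > 0$ and no participation otherwise,

and where $\mathrm{Corr}(\varepsilon_{it}, u_{it}) = 0$. It can then be shown that:

$E(\theta_i \mid T = 1) = \delta$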



Difference in differences estimator

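Let $Y_{1t}, Y_{0t'}$, with $t > k > t'$. This estimator assumes that:

$E(Y_{0t} - Y_{0t'} \mid T = 1) = E(Y_{0t} - Y_{0t'} \mid T = 0)$,

i.e., in the absence of the program, participants and non-participants would have experienced the same average change in outcomes. If the assumption is true:

$\alpha = E(Y_{1t} - Y_{0t'} \mid T = 1) - E(Y_{0t} - Y_{0t'} \mid T = 0)$

$\hat{\alpha} = (\bar{Y}_{1t} - \bar{Y}_{0t'})_{T=1} - (\bar{Y}_{0t} - \bar{Y}_{0t'})_{T=0}$

The assumption is violated if the data depict an Ashenfelter dip: participants' earnings fall temporarily before the program, so their outcomes would have rebounded even without it.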
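A hypothetical two-period Python simulation makes the contrast with the before-after estimator concrete. The assumptions are illustrative: participants start 2 units higher, everyone's outcome rises by a common 0.5 between $t'$ and $t$, and the true effect is α = 1. The before-after estimator picks up α plus the common shock, whereas the DD estimator removes both the permanent gap and the shock.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
treated = rng.integers(0, 2, size=n).astype(bool)
gap = np.where(treated, 2.0, 0.0)                           # permanent group difference
y_before = gap + rng.normal(size=n)                         # period t' < k: nobody treated
y_after = gap + 0.5 + 1.0 * treated + rng.normal(size=n)    # period t > k: shock 0.5, effect 1

change_treated = (y_after - y_before)[treated].mean()
change_control = (y_after - y_before)[~treated].mean()
before_after = change_treated                               # alpha + common shock
dd = change_treated - change_control                        # alpha
print(f"before-after = {before_after:.2f}, difference in differences = {dd:.2f}")
```

An Ashenfelter dip would show up here as a participant-specific drop in `y_before`, which the control group's change cannot remove.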



Difference in differences estimator(regression)

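Let $Y_{it}$ be the outcome variable at time $t$ of participant $i$, where $t > k > t'$. We can write

$Y_{it} = \delta + X_{it}\beta + \theta T_{it} + \phi P_{it} + \alpha P_{it}T_{it} + u_{it}$, where

$T_{it} = 1$ if participant, $0$ otherwise;

$P_{it} = 1$ if after participation, $0$ otherwise.

This estimator allows for a temporal effect that is common to both participants and non-participants ($\phi$), as well as for a permanent difference between the two groups ($\theta$).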
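The regression version can be sketched with OLS on a stacked two-period panel. In this hypothetical Python example the design matrix contains a participant dummy (coefficient θ, absorbing permanent differences between the groups), the after-period dummy P (coefficient φ) and the interaction P×T (coefficient α); all parameter values are assumptions. In applications, standard errors should account for the two observations per individual (for example by clustering), which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
d = rng.integers(0, 2, size=n).astype(float)    # T_it: participant dummy (time-invariant)
x = rng.normal(size=n)
blocks = []
for period in (0.0, 1.0):                        # P_it: 0 before, 1 after participation
    y = 1.0 + 0.5 * x + 2.0 * d + 0.7 * period + 1.0 * period * d + rng.normal(size=n)
    blocks.append(np.column_stack([np.ones(n), x, d, np.full(n, period), period * d, y]))
data = np.vstack(blocks)                         # long format: 2n rows

Z, y = data[:, :5], data[:, 5]
delta, beta, theta, phi, alpha = np.linalg.lstsq(Z, y, rcond=None)[0]
print(f"phi = {phi:.2f} (common time effect), alpha = {alpha:.2f} (DD effect)")
```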



Graphical representation of the DD estimator

[Figure: graphical representation of the DD estimator, with participation at date k.]



Cross-section estimator

To better understand the nature of the selection bias problem, assume the decision to participate in a given program can be parametrized as follows:

$T^*_i = W_i\gamma + \nu_i \qquad (2)$

where $\nu_i$ is an error term, $\gamma$ is a vector of unknown parameters, and where

$T_i = 1$ if $T^*_i > 0$, $T_i = 0$ otherwise. $\qquad (3)$



Likewise, let:

$Y_i = X_i\beta + \alpha T_i + u_i \qquad (4)$

To the extent that the participation decision depends on $\nu_i$ and $\nu_i$ is correlated with $u_i$, it can be shown that the estimate of $\alpha$ in (4) will be biased because

$E(u_i \mid T_i, X_i) \neq 0. \qquad (5)$



Heckman-type correction (“Heckit”)

Deriving an unbiased estimator of $\alpha$ requires that the error terms $u_i$ and $\nu_i$ have mean zero and follow a joint normal distribution. In other words:

$\begin{pmatrix} u_i \\ \nu_i \end{pmatrix} \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho \\ \rho & 1 \end{pmatrix}\right] \qquad (6)$

Under this assumption we can estimate $\alpha$ in two stages or by maximum likelihood (Stata: “treatreg”).



1 - Two-stage least squares: Y continuous

A. Probit on participation, $T^*_i = W_i\gamma + \nu_i$:

$\Pr(T_i = 1) = \Pr(T^*_i > 0) = \Pr(W_i\gamma + \nu_i > 0) = \Pr(\nu_i > -W_i\gamma) = \int_{-W_i\gamma}^{\infty} f(\xi)\,d\xi = 1 - \Phi(-W_i\gamma) = \Phi(W_i\gamma)$

Based on $\hat{\gamma}$, we compute $\hat{T}_i = \Phi(W_i\hat{\gamma})$.



B. Second stage: $Y_i = X_i\beta + \hat{T}_i\alpha + u_i$

Three problems with this approach:
- identification of $\alpha$;
- correction of standard errors;
- only valid if $Y_i$ is continuous and not truncated.

Alternatives when $Y_i$ is not continuous:
1. Probit on $Y_i$
2. Tobit on $Y_i$
3. Many recent developments (non-parametric methods)
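The two stages can be sketched in Python with NumPy only: a probit fitted by Fisher scoring, then OLS of Y on X and the fitted probability $\hat{T}_i = \Phi(W_i\hat{\gamma})$. The simulated design is hypothetical: W contains a variable w that is excluded from X (the sketch relies on this exclusion restriction to identify α, the first problem listed above), and $\mathrm{Corr}(u_i, \nu_i) = 0.7$. Standard errors are not computed, since the second-stage OLS ones would need the correction mentioned above.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def probit(W, t, iters=100, tol=1e-10):
    """Probit maximum likelihood by Fisher scoring; returns gamma_hat."""
    g = np.zeros(W.shape[1])
    for _ in range(iters):
        z = W @ g
        p = np.clip(norm_cdf(z), 1e-12, 1 - 1e-12)
        f = norm_pdf(z)
        score = W.T @ (f * (t - p) / (p * (1 - p)))
        info = (W * (f ** 2 / (p * (1 - p)))[:, None]).T @ W   # expected information
        step = np.linalg.solve(info, score)
        g = g + step
        if np.max(np.abs(step)) < tol:
            break
    return g

def ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# Hypothetical data: x enters both equations, w only the participation equation.
rng = np.random.default_rng(8)
n = 40_000
x, w = rng.normal(size=n), rng.normal(size=n)
nu = rng.normal(size=n)
u = 0.7 * nu + math.sqrt(1 - 0.7 ** 2) * rng.normal(size=n)   # Corr(u, nu) = 0.7
t = (0.2 + 0.5 * x + 1.0 * w + nu > 0).astype(float)
y = 1.0 + 0.5 * x + 1.0 * t + u                                # true alpha = 1

W = np.column_stack([np.ones(n), x, w])
gamma_hat = probit(W, t)                                       # stage A
t_hat = norm_cdf(W @ gamma_hat)                                # T_hat_i = Phi(W_i gamma_hat)
alpha_two_stage = ols(np.column_stack([np.ones(n), x, t_hat]), y)[2]   # stage B
alpha_naive = ols(np.column_stack([np.ones(n), x, t]), y)[2]
print(f"naive OLS alpha = {alpha_naive:.2f}, two-stage alpha = {alpha_two_stage:.2f}")
```

The second stage is consistent here because $E(Y_i \mid X_i, W_i) = X_i\beta + \alpha\Phi(W_i\gamma)$ when $E(u_i \mid X_i, W_i) = 0$.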



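2 - Two-stage least squares: Y truncated

A. Probit on participation, exactly as in the continuous case ($T^*_i = W_i\gamma + \nu_i$):

$\Pr(T_i = 1) = \Pr(T^*_i > 0) = \Pr(W_i\gamma + \nu_i > 0) = \Pr(\nu_i > -W_i\gamma) = \int_{-W_i\gamma}^{\infty} f(\xi)\,d\xi = 1 - \Phi(-W_i\gamma) = \Phi(W_i\gamma)$

$\hat{T}_i = \Phi(W_i\hat{\gamma})$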



B. We select only observations for which $Y > 0$. We then estimate (see equation (1)):

$E(Y_i \mid T_i, X_i, W_i) = X_i\beta + \alpha T_i + \sigma \dfrac{\phi(-W_i\gamma)}{\Phi(-W_i\gamma)} \neq X_i\beta + \alpha T_i \qquad (7)$

Three problems with this approach:
- identification of $\alpha$ is a real issue;
- correction of standard errors;
- valid only if $Y_i$ is continuous.



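3 - Maximum likelihood

Given

$Y_i = X_i\beta + T_i\alpha + u_i$

$T^*_i = W_i\gamma + \nu_i$

the likelihood function for observation $i$ is:

$L_i = T_i \times \int_{-W_i\gamma}^{\infty} f(u_i, \xi)\,d\xi + (1 - T_i) \times \int_{-\infty}^{-W_i\gamma} f(u_i, \xi)\,d\xi.$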
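Under joint normality the two integrals have a closed form. Writing ρ here for the correlation between $u_i$ and $\nu_i$ (so their covariance is ρσ, a normalization slightly different from the matrix in (6)), and using the fact that $\nu_i \mid u_i$ is normal with mean $\rho u_i/\sigma$ and variance $1 - \rho^2$, a standard derivation gives:

```latex
\[
L_i = \frac{1}{\sigma}\,\phi\!\left(\frac{u_i}{\sigma}\right)
      \Phi(m_i)^{T_i}\,\bigl[1 - \Phi(m_i)\bigr]^{1 - T_i},
\qquad
m_i = \frac{W_i\gamma + \rho\, u_i/\sigma}{\sqrt{1 - \rho^2}},
\qquad
u_i = Y_i - X_i\beta - \alpha T_i .
\]
```

The log-likelihood $\sum_i \log L_i$ is then maximized jointly over $(\beta, \alpha, \gamma, \sigma, \rho)$.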


Propensity score matching


PSM aims at making non-experimental data on participation mimic a random experiment.

Advantages of PSM:
1. Avoids making assumptions about the distribution of the error terms;
2. Avoids assuming that the error terms enter additively.

The approach rests upon a restrictive set of assumptions:
1. Conditional independence (on observables) assumption (CIA);
2. Existence of a comparison group.




CIA: The vector of observables, $X$, is such that $Y^0$ is independent of participation, conditionally on $X$. Formally,

$Y^0 \perp T \mid X \qquad (8)$

Equation (8) states that, for a given $X$, the mean of $Y$ for non-participants corresponds to the mean that would have been observed for participants, had they not participated.



Existence of a comparison group: this assumption ensures that for each participant there exists at least one non-participant with similar observable characteristics:

$0 < P(T = 1 \mid X) < 1 \qquad (9)$



- CIA requires numerous conditioning variables ($= \dim(X)$) ⇒ each individual must be matched on the basis of many observable characteristics.
- This raises an important dimensionality problem, which is avoided thanks to a theorem by Rosenbaum and Rubin (1983): CIA ⇒ CIS (independence conditional on a score).
- The score is a scalar mapping (dimension 1) of $X$.
- In most cases, the score is simply the probability of participating: $P = P(T = 1 \mid X)$.



This assumption is written as:

$Y^0 \perp T \mid X \Rightarrow Y^0 \perp T \mid P(X)$



The construction of a comparison group for each participant is based on the proximity between his score and those of the non-participants.

Because the score is a continuous variable, many distance metrics can be used.

The most common is the one proposed by Heckman et al. (1998). They suggest the use of a kernel estimator such that:

$E(Y^0 \mid P(X_i)) = \sum_{j \in I_0} \left( \dfrac{K_h[P(X_j) - P(X_i)]}{\sum_{k \in I_0} K_h[P(X_k) - P(X_i)]} \right) Y_j,$

where $I_0$ is the set of non-participants and

$K_h[P(X_j) - P(X_i)] = K\left[\dfrac{P(X_j) - P(X_i)}{h}\right],$

with $K$ the kernel and $h$ a window or “bandwidth”.



Each non-participant contributes to the construction of the counterfactual for participant $i$. Individual weights vary according to the distance between the scores. The estimator is:

$\hat{\alpha} = \dfrac{1}{N_1} \sum_{i \in I_1} \left[ Y_i - \sum_{j \in I_0} \left( \dfrac{K_h[P(X_j) - P(X_i)]}{\sum_{k \in I_0} K_h[P(X_k) - P(X_i)]} \right) Y_j \right],$

where $I_1$ is the set of participants and $N_1$ is the number of participants.
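A minimal Python sketch of this kernel matching estimator, assuming a Gaussian kernel and a bandwidth h = 0.02 (both are illustrative choices, not prescriptions from the text). To keep it short, the true score P(X) of the hypothetical data is used; in practice it would be estimated, for example by a probit. The outcome depends on X, which also drives participation, so the naive difference in means is biased upwards while the true effect is 1.

```python
import numpy as np

def kernel_att(y, t, p, h=0.02):
    """alpha_hat = (1/N1) * sum_{i in I1} [Y_i - sum_{j in I0} w_ij Y_j],
    with Gaussian-kernel weights w_ij proportional to K((p_j - p_i) / h)."""
    p1, y1 = p[t == 1], y[t == 1]
    p0, y0 = p[t == 0], y[t == 0]
    k = np.exp(-0.5 * ((p0[None, :] - p1[:, None]) / h) ** 2)   # N1 x N0 kernel values
    w = k / k.sum(axis=1, keepdims=True)                         # each row sums to 1
    return np.mean(y1 - w @ y0)

# Hypothetical data.
rng = np.random.default_rng(4)
n = 4000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-x))                 # true score P(X)
t = (rng.uniform(size=n) < p).astype(int)
y = x + 1.0 * t + rng.normal(size=n)         # constant effect of 1

naive = y[t == 1].mean() - y[t == 0].mean()
att = kernel_att(y, t, p)
print(f"naive = {naive:.2f}, kernel matching = {att:.2f}")
```

If a participant's score lies far from every non-participant's, its kernel weights underflow to zero and the row cannot be normalized; restricting the sample to the common support avoids this.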



Propensity score matching and DD

- CIA is a strong assumption that is easily violated in many circumstances.
- It will certainly be violated if there are unobservable individual fixed effects, $\lambda$, that affect both the outcome and the probability of participating.
- These fixed effects are not problematic if they can somehow be removed from the estimators.
- It can be shown that CIA can be generalized to models with fixed effects. Indeed, we can write (8) as:

$Y^0 \perp T \mid X, \lambda$ and $g(Y^0, X) \perp \lambda \mid X \Rightarrow g(Y^0, X) \perp T \mid X,$

where $g(\cdot)$ is a linear transformation of $Y^0$.



CIA will be valid if the effects of $\lambda$ can be removed from $Y^0$. To remove $\lambda$ from the equations, it is necessary to have information on $Y^1_{it}$ and $Y^0_{it}$ before and after participation.

A simple first difference will eliminate individual fixed effects. The estimator generalizes the DD estimator within a PSM framework:



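$\hat{\alpha} = \dfrac{1}{N_1} \sum_{i \in I_1} \left[ \Delta Y_i - \sum_{j \in I_0} \left( \dfrac{K_h[P(X_j) - P(X_i)]}{\sum_{k \in I_0} K_h[P(X_k) - P(X_i)]} \right) \Delta Y_j \right]$

where $\Delta Y_i = Y_{it} - Y_{it'}$, and where $t' < k$. This estimator is very common in the literature on program evaluation.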
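The same kernel weights can be applied to first differences. In the hypothetical Python sketch below, an unobserved fixed effect λ raises both the outcome and the probability of participating, so matching on the score in levels is badly biased, while matching on $\Delta Y_i$ removes λ. For brevity the sketch uses the true score $P(X) = \Phi(X/\sqrt{2})$, implied by the assumed participation rule $T = 1\{X + \lambda + e > 0\}$ with λ and e standard normal; the Gaussian kernel and h = 0.02 are again illustrative choices.

```python
import math
import numpy as np

def kernel_att(dy, t, p, h=0.02):
    """Kernel matching on the score; dy can be levels or first differences."""
    k = np.exp(-0.5 * ((p[t == 0][None, :] - p[t == 1][:, None]) / h) ** 2)
    w = k / k.sum(axis=1, keepdims=True)
    return np.mean(dy[t == 1] - w @ dy[t == 0])

rng = np.random.default_rng(5)
n = 4000
x = rng.normal(size=n)
lam = rng.normal(size=n)                                   # unobserved individual fixed effect
t = (x + lam + rng.normal(size=n) > 0).astype(int)
p = 0.5 * (1.0 + np.vectorize(math.erf)(x / 2.0))          # P(T=1|X) = Phi(X / sqrt(2))

y_before = x + 3.0 * lam + rng.normal(size=n)                  # t' < k
y_after = x + 3.0 * lam + 0.5 + 1.0 * t + rng.normal(size=n)   # t > k, true effect 1

att_levels = kernel_att(y_after, t, p)
att_dd = kernel_att(y_after - y_before, t, p)
print(f"matching in levels = {att_levels:.2f}, matching on differences = {att_dd:.2f}")
```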



Practical considerations

- Tests on the distribution of $X$ (“balancing score property”): “PSCORE” in Stata.
- Observations with the same score $P(X)$ should have the same distribution of observables, $X$, independently of $T$ ⇒ the observational data then have the same properties as a random experiment.



Procedure

1. Estimate the model $\Pr(T_i = 1 \mid X_i) = \Phi\{h(X_i)\}$.
2. Divide the sample into $k$ sub-samples on the basis of equidistant intervals of the scores $P(X)$.
3. For each interval, test for the equality of the mean scores of the two groups (treated and untreated).
4. If the test is rejected, sub-divide the interval anew and redo the test.
5. Continue until the test passes for each interval.
6. Verify that the $X$ have the same mean for both groups within each interval.
7. If rejected, change the model specification (usually a more parsimonious one).
8. If not rejected, estimate the impact of the program.
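Steps 2 to 6 can be sketched in Python as a recursive splitting of score intervals. This is not the PSCORE implementation; it assumes a two-sample t-test with unequal variances and a 1.96 critical value, k = 5 initial intervals, and a minimum interval width below which splitting stops. The scores come from a hypothetical logit rather than an estimated probit.

```python
import numpy as np

def welch_t(a, b):
    """Two-sample t statistic with unequal variances."""
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

def balanced_strata(p, t, k=5, crit=1.96, min_width=1 / 64):
    """Start from k equal-width intervals on [0, 1) and halve any interval in which
    the mean scores of treated and untreated units differ significantly."""
    edges = np.linspace(0.0, 1.0, k + 1)
    todo, done = list(zip(edges[:-1], edges[1:])), []
    while todo:
        a, b = todo.pop()
        inside = (p >= a) & (p < b)
        p1, p0 = p[inside & (t == 1)], p[inside & (t == 0)]
        if len(p1) < 2 or len(p0) < 2:
            done.append((a, b, None))                  # too few units to test
            continue
        stat = welch_t(p1, p0)
        if abs(stat) < crit or (b - a) < 2 * min_width:
            done.append((a, b, stat))                  # balanced, or too narrow to split
        else:
            mid = (a + b) / 2
            todo += [(a, mid), (mid, b)]               # step 4: split and re-test
    return sorted(done)

# Hypothetical data with scores strictly inside (0, 1).
rng = np.random.default_rng(9)
n = 5000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-1.5 * x))
t = (rng.uniform(size=n) < p).astype(int)

strata = balanced_strata(p, t)
for a, b, stat in strata:                              # step 6: balance of X within strata
    inside = (p >= a) & (p < b)
    x1, x0 = x[inside & (t == 1)], x[inside & (t == 0)]
    x_t = welch_t(x1, x0) if len(x1) > 1 and len(x0) > 1 else float("nan")
    s_txt = "n/a" if stat is None else f"{stat:.2f}"
    print(f"[{a:.3f}, {b:.3f}): t(score) = {s_txt}, t(X) = {x_t:.2f}")
```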



Common support

Common Support Restriction

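- For each participant we must construct a counterfactual using $s$ non-participants ($s \ge 1$) ⇒ we can only construct such a counterfactual for individuals whose scores lie in the range common to the scores of both groups.
- The estimator is thus “local” in a certain sense, i.e. $E(\alpha \mid P(X) \in S_\cap, T = 1)$, where $S_\cap = S_T \cap S_{NT}$.
- Consequently, the score model (probit) must not be “too” good: a model that predicts participation almost perfectly pushes participants' scores towards 1 and non-participants' scores towards 0, leaving little common support.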
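One common way to implement the restriction is the "min-max" rule: keep only units whose score lies between the larger of the two group minima and the smaller of the two group maxima. Other rules, such as trimming regions of low score density, also exist. A small Python sketch with made-up scores:

```python
import numpy as np

def common_support(p, t):
    """Min-max rule: overlap [lo, hi] of the participants' and non-participants' score
    ranges, and a mask of the units whose score lies inside it."""
    lo = max(p[t == 1].min(), p[t == 0].min())
    hi = min(p[t == 1].max(), p[t == 0].max())
    return lo, hi, (p >= lo) & (p <= hi)

# Made-up scores: five participants, then four non-participants.
p = np.array([0.05, 0.20, 0.35, 0.60, 0.90, 0.10, 0.30, 0.55, 0.70])
t = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0])
lo, hi, keep = common_support(p, t)
print(lo, hi)                      # → 0.1 0.7
print(keep[t == 1])                # participants at 0.05 and 0.90 fall outside
```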



Other types of matching estimators

1. Nearest-neighbour matching
Let $C_i = \min_j \|p_i - p_j\|$. Each participant is associated with a single non-participant: the one whose score is closest to that of participant $i$.

2. Radius matching
Let $C_i = \{p_j \mid \|p_i - p_j\| < r\}$. Each participant is associated with all the non-participants whose scores are within a given distance $r$ of his own score.



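Let $N^C_i$ be the number of non-participants matched with participant $i$ (i.e., in $C_i$) and

$\omega_{ij} = 1/N^C_i$ if $j \in C_i$, $\omega_{ij} = 0$ otherwise.

The matching estimator in both cases is given by:

$\hat{\alpha} = \dfrac{1}{N^T} \sum_{i \in T} \left[ Y^T_i - \sum_{j \in C_i} \omega_{ij} Y^C_j \right]$

$= \dfrac{1}{N^T} \left[ \sum_{i \in T} Y^T_i - \sum_{i \in T} \sum_{j \in C_i} \omega_{ij} Y^C_j \right]$

$= \dfrac{1}{N^T} \sum_{i \in T} Y^T_i - \dfrac{1}{N^T} \sum_{j \in C} \omega_j Y^C_j,$

where $N^T$ is the number of participants and $\omega_j = \sum_i \omega_{ij}$.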
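A small Python sketch of both weighting schemes with made-up scores and outcomes. It builds the weight matrix $\omega_{ij}$ (ties in nearest-neighbour matching share the weight; a participant with an empty $C_i$ would get zero weights and should be dropped, which does not happen in this example) and checks that the first and last expressions of the estimator coincide.

```python
import numpy as np

def match_weights(p_t, p_c, radius=None):
    """omega_ij = 1/N_i^C if j is in C_i, else 0 (rows: participants, columns: non-participants).
    Nearest-neighbour matching if radius is None, radius matching otherwise."""
    d = np.abs(p_t[:, None] - p_c[None, :])
    in_c = d == d.min(axis=1, keepdims=True) if radius is None else d < radius
    n_c = in_c.sum(axis=1, keepdims=True)
    return np.where(in_c, 1.0 / np.maximum(n_c, 1), 0.0)

# Made-up example: three participants, four non-participants.
p_t, y_t = np.array([0.2, 0.5, 0.8]), np.array([3.0, 5.0, 7.0])
p_c, y_c = np.array([0.1, 0.25, 0.45, 0.9]), np.array([1.0, 2.0, 4.0, 6.0])

results = {}
for name, r in (("nearest neighbour", None), ("radius 0.15", 0.15)):
    w = match_weights(p_t, p_c, r)
    first = np.mean(y_t - w @ y_c)                  # (1/N^T) sum_i [Y_i^T - sum_j w_ij Y_j^C]
    omega_j = w.sum(axis=0)                         # omega_j = sum_i omega_ij
    last = y_t.mean() - omega_j @ y_c / len(y_t)    # last expression of the estimator
    results[name] = (first, last)
    print(f"{name}: {first:.4f} = {last:.4f}")      # → 1.0000, then 1.1667
```

With radius 0.15 the first participant is matched with two non-participants (weights 1/2 each), so the control at 0.1 contributes to the counterfactual even though it is not the nearest neighbour.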



Heterogeneous Treatment Effect Analysis

- A basic paradigm of the literature based on the potential outcomes model is that there can be individual heterogeneity in treatment effects, which stands in contrast to traditional regression modelling assuming constant parameters.
- The view that treatment effects can be heterogeneous led to new methods for causal inference and also to new uses and interpretations of existing methods (e.g. LATE interpretation of IV estimators, revival of matching and regression discontinuity designs).
- Surprisingly, however, not much attention is usually paid to the explicit analysis of the heterogeneity of treatment effects in applied studies.



Heterogeneous Treatment Effect Analysis (Cont.)

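Recall that:

$Y_i = \mu_i + \alpha T_i + u_i$

$= \alpha + X_i\beta + \delta T_i + u_i$ (homogeneous treatment)

or $= \alpha + X_i\beta + \delta_i T_i + u_i$ (heterogeneous treatment)

Equivalently:

$Y_{i,1} = \alpha_1 + X_i\beta_1 + u_{i,1}$ if $T_i = 1$,

$Y_{i,0} = \alpha_0 + X_i\beta_0 + u_{i,0}$ if $T_i = 0$,

$T_i = 1$ if $T^*_i \ge 0$, $T_i = 0$ if $T^*_i < 0$, where $T^*_i = \gamma Z_i - V_i$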




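Observed outcome:

$Y_i = T_i \times Y_{i,1} + (1 - T_i) \times Y_{i,0}$

$= T_i(\alpha_1 + X_i\beta_1 + u_{i,1}) + (1 - T_i)(\alpha_0 + X_i\beta_0 + u_{i,0})$

$= \alpha_0 + X_i\beta_0 + [(\alpha_1 - \alpha_0) + (\beta_1 - \beta_0)X_i + (u_{i,1} - u_{i,0})]T_i + u_{i,0}$

Therefore:

$\delta_i = \underbrace{(\alpha_1 - \alpha_0)}_{ATE_\alpha} + \underbrace{(\beta_1 - \beta_0)X_i}_{ATE_x} + \underbrace{(u_{i,1} - u_{i,0})}_{\text{het. on unobs.}}$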



Unbiasedness of OLS: if $\delta_i$ and $u_{i,0}$ are uncorrelated with $T_i$, the population average treatment effect may be estimated by OLS.

Homogeneous treatment effect: $ATE = ATE_\alpha$

$Y_i = \alpha + X_i\beta + \delta T_i + u_i$

Heterogeneous treatment effect: $ATE = ATE_\alpha + ATE_x$

$Y_i = \alpha + X_i\beta_0 + (\beta_1 - \beta_0)X_i \times T_i + \delta T_i + u_i$
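In the interacted specification the coefficient δ on $T_i$ measures $\alpha_1 - \alpha_0$, the effect at $X_i = 0$, so the ATE is recovered as $\hat{\delta} + (\hat{\beta}_1 - \hat{\beta}_0)\bar{X}$. A hypothetical Python sketch with randomly assigned T (so that $\delta_i$ and $u_{i,0}$ are uncorrelated with $T_i$) and illustrative parameter values, for which the true ATE is 0.5 + 0.3 × 2 = 1.1:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000
x = rng.normal(loc=2.0, size=n)                      # E(X) = 2
t = rng.integers(0, 2, size=n).astype(float)         # random assignment
a0, a1, b0, b1 = 1.0, 1.5, 0.5, 0.8                   # hypothetical alpha_0, alpha_1, beta_0, beta_1
y0 = a0 + b0 * x + rng.normal(size=n)                # u_{i,0}
y1 = a1 + b1 * x + rng.normal(size=n)                # u_{i,1}: heterogeneity on unobservables
y = t * y1 + (1 - t) * y0

Z = np.column_stack([np.ones(n), x, x * t, t])       # constant, X, X*T, T
const, beta0_hat, dbeta_hat, delta_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
ate_hat = delta_hat + dbeta_hat * x.mean()           # ATE_alpha + ATE_x
print(f"delta = {delta_hat:.2f} (ATE_alpha), ATE = {ate_hat:.2f} (true 1.10)")
```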



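When is OLS wrong?

OLS is biased if $\delta_i$ and $u_{i,0}$ are correlated with $T_i$:

$\delta(x_i) = E(Y_i \mid X_i = x_i, T_i = 1) - E(Y_i \mid X_i = x_i, T_i = 0)$

$= E(\alpha_1 + X_i\beta_1 + u_{i,1} \mid T_i = 1) - E(\alpha_0 + X_i\beta_0 + u_{i,0} \mid T_i = 0)$

$= (\alpha_1 - \alpha_0) + (\beta_1 - \beta_0)x_i + E(u_{i,1} \mid T_i = 1) - E(u_{i,0} \mid T_i = 0)$

$= E(\delta_i \mid X_i = x_i) + E(u_{i,1} \mid T_i = 1) - E(u_{i,0} \mid T_i = 0)$

$= ATE(x_i) + E(u_{i,1} \mid T_i = 1) - E(u_{i,0} \mid T_i = 1) + E(u_{i,0} \mid T_i = 1) - E(u_{i,0} \mid T_i = 0)$

$= ATE(x_i) + \underbrace{E(u_{i,1} - u_{i,0} \mid T_i = 1)}_{\text{sorting on gains}} + \underbrace{E(u_{i,0} \mid T_i = 1) - E(u_{i,0} \mid T_i = 0)}_{\text{selection bias}}$

$= ATE(x_i) + SGE_1 + SB_{1 \to 0}$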



Biases, reverse order

$\delta(x_i) = ATE(x_i) + \underbrace{E(u_{i,1} - u_{i,0} \mid T_i = 0)}_{\text{sorting on gains}} + \underbrace{E(u_{i,1} \mid T_i = 1) - E(u_{i,1} \mid T_i = 0)}_{\text{selection bias}}$

$= ATE(x_i) + SGE_0 + SB_{0 \to 1}$

Different estimators

$TT(x_i) = E(Y_{i,1} - Y_{i,0} \mid X_i = x_i, T_i = 1) = ATE(x_i) + SGE_1$

$TUT(x_i) = E(Y_{i,1} - Y_{i,0} \mid X_i = x_i, T_i = 0) = ATE(x_i) + SGE_0$



Population estimates

$ATE = \int_X ATE(x)\, dF_X(x)$

$TT = \int_{X \mid T=1} TT(x)\, dF_{X \mid T=1}(x)$

$TUT = \int_{X \mid T=0} TUT(x)\, dF_{X \mid T=0}(x)$

$\delta = ATE + SGE_1 + SB_{1 \to 0}$



Heterogeneous Treatment Effect Analysis in Practice

- For example, in the literature on economic returns to higher education, various theories have been proposed that imply heterogeneous effects depending on the probability of going to college.
- Human-capital theory in economics predicts positive selection into treatment, because people choose to go to college based on the expected economic returns. This is a widely accepted view.
- More sociologically oriented literature suggests that college attendance is strongly influenced by social origin, which leads to negative selection into treatment under certain conditions.
- To evaluate these theories, it is therefore crucial to analyze how treatment effects vary with the treatment probability.



Heterogeneous Treatment Effect Analysis in Practice

Ben Jann, Jennie E. Brand and Yu Xie have developed a useful Stata module to perform this analysis (HTE).

Brand, J. E., Y. Xie (2010). Who Benefits Most From College? Evidence for Negative Selection in Heterogeneous Economic Returns to Higher Education. American Sociological Review 75:273–302.

HTE comes in three flavours: hte, hte2, hte3.




- The approach of hte is to assume conditional unconfoundedness given a set of covariates and to use propensity score stratification to estimate treatment effects at various points over the range of the propensity score.
- In hte, the strata-specific effects are then analyzed to determine whether there is a trend in treatment effects.
- hte2 and hte3 perform a non-parametric analysis of the treatment effect in relation to the individual scores.



The hte algorithm consists of four basic steps:
1. Estimation of the propensity score (i.e. the conditional probability to receive treatment).
2. Construction of balanced propensity score strata (using PSCORE or PSMATCH2).
3. Estimation of strata-specific average treatment effects.
4. Estimation of the trend of treatment effects across propensity score strata.
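The four steps can be sketched in Python. This is not the hte code: it assumes a known logit score (step 1), uses score quintiles as strata instead of balance-tested ones (step 2), computes differences in means within strata (step 3), and fits the trend by variance-weighted least squares on the stratum rank (step 4, one natural choice). The hypothetical data have an effect that rises with the score, so the slope should be positive.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-x))                 # step 1: score (known logit here)
t = (rng.uniform(size=n) < p).astype(int)
y = x + 2.0 * p * t + rng.normal(size=n)     # hypothetical effect 2p, rising with the score

# Step 2: five strata at the score quintiles.
edges = np.quantile(p, np.linspace(0.0, 1.0, 6))
stratum = np.clip(np.searchsorted(edges, p, side="right") - 1, 0, 4)

# Step 3: difference in means and its standard error within each stratum.
te, se = np.zeros(5), np.zeros(5)
for s in range(5):
    y1 = y[(stratum == s) & (t == 1)]
    y0 = y[(stratum == s) & (t == 0)]
    te[s] = y1.mean() - y0.mean()
    se[s] = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))

# Step 4: variance-weighted least-squares trend of the effects on the stratum rank.
Z = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])
WZ = Z / se[:, None] ** 2
cov = np.linalg.inv(Z.T @ WZ)
intercept, slope = cov @ (WZ.T @ te)
print("effects by stratum:", np.round(te, 2))
print(f"slope of linear trend (s.e.) = {slope:.3f} ({np.sqrt(cov[1, 1]):.3f})")
```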




to receive treatment).2 Construction of balanced propensity score strata (using PSCORE

or PSMATCH2)

3 Estimation of strata-specific average treatment effects4 Estimation of the trend of treatment effects across propensity

score strata.





or PSMATCH2)3 Estimation of strata-specific average treatment effects

4 Estimation of the trend of treatment effects across propensityscore strata.





or PSMATCH2)3 Estimation of strata-specific average treatment effects4 Estimation of the trend of treatment effects across propensity

score strata.



HTE Examples

[Figure, four panels. (1) Difference in earnings: treatment effect within propensity score strata 1 to 7, with 95% CI and linear trend; slope of linear trend (s.e.) = 785.728 (311.864). (2) Difference in earnings: treatment effect against the propensity score (0.2 to 1), local polynomial (lpoly) fit with 95% CI. (3) Immigrants, difference in earnings: slope of linear trend (s.e.) = 414.933 (640.637). (4) Natives, difference in earnings: slope of linear trend (s.e.) = 1001.051 (640.637).]

