Evaluation of Public Policies
Parametric and non-parametric approaches
Guy Lacroix
CIRPEE, Université Laval
AGRODEP, Dakar, 28-30 October 2015
http://www.ecn.ulaval.ca/guy_lacroix/Cours/Dakar/
Guy Lacroix (Université Laval) Program Evaluation 25–28 October 2015 1 / 72
The fundamental problem of evaluation

What is a counterfactual?
A counterfactual is a virtual representation of the outcome of a given individual were he in a state other than the observed one.

Problem
- How do we measure the impact of a program on a given individual?
- Need to observe the individual in two states, with and without the program.
- Selectivity issue: program participants are not randomly chosen!
Sample selection problem

Definition (Sample selection bias)
Sample selection bias, or self-selection bias, arises when participants are not representative of the population of interest ⇒ inference cannot be based on such a sample.

Source of the problem
- Self-selection: derives from participants.
- Program manager (goals need to be met).
- For the analyst (econometrician), selection may be:
  - linked to observable characteristics
  - linked to unobservable characteristics
  - linked to both
Statistical fixes

- Cross-sectional data
- Panel data
- Based on
  - Observable heterogeneity
  - Unobserved heterogeneity
  - Both
Evaluation problem: formal presentation
Formal presentation
Causal model of Rubin (1974)
Treatment variable → $T \in \{0, 1\}$
Outcome variable → $Y_0$ or $Y_1$
Observed outcome variable → $Y = T \cdot Y_1 + (1 - T) \cdot Y_0$
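
The observed-outcome equation can be illustrated with a minimal sketch. The potential outcomes below are made-up values for four hypothetical individuals (they are not course data), and the variable names are illustrative. The point is that the analyst only ever sees one of $Y_0$, $Y_1$ per person.

```python
# Hypothetical potential outcomes for four individuals (illustrative values).
y0 = [10.0, 12.0, 9.0, 11.0]   # outcome without the program
y1 = [13.0, 12.5, 11.0, 15.0]  # outcome with the program
t = [1, 0, 1, 0]               # treatment indicator, T in {0, 1}

# Observed outcome: Y = T * Y1 + (1 - T) * Y0
y = [ti * y1i + (1 - ti) * y0i for ti, y1i, y0i in zip(t, y1, y0)]

# The counterfactual is the potential outcome that is NOT observed.
counterfactual = [y0i if ti == 1 else y1i for ti, y1i, y0i in zip(t, y1, y0)]

print(y)               # [13.0, 12.0, 11.0, 11.0]
print(counterfactual)  # [10.0, 12.5, 9.0, 15.0]
```

In real data only `y` and `t` are available; `counterfactual` is never recorded, which is the fundamental problem of evaluation.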
Parameter of interest

Causal impact of treatment $T$: $\Delta = Y_1 - Y_0$
- Unobservable
- Varies across individuals: there exists a distribution of treatment effects
- Since $\Delta$ is unobservable, so is the distribution
Parameter of interest (cont.)

Parameter of interest from policy maker's point of view:

Mean causal effect in the population (ATE)
$\Delta_{ATE} = E(Y_1 - Y_0)$ → Hard to identify

Mean causal effect on the treated population (ATT)
$\Delta_{ATT} = E(Y_1 - Y_0 \mid T = 1)$ → (The most common)
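Selection bias

Conditional Independence Assumption (CIA) (identification)
If $(Y_0, Y_1) \perp T$, then $\Delta_{ATE} = \Delta_{ATT}$.

Mean causal effect on the population
$$\begin{aligned}
\Delta_{ATE} = E(Y_1 - Y_0) &= E(Y_1 \mid T = 1) - E(Y_0 \mid T = 0) \\
&= E(Y \mid T = 1) - E(Y \mid T = 0)
\end{aligned}$$

Mean causal effect on the treated population
$$\begin{aligned}
\Delta_{ATT} &= E(Y_1 \mid T = 1) - E(Y_0 \mid T = 1) \\
&= E(Y_1 \mid T = 1) - E(Y_0 \mid T = 0) \\
&= E(Y \mid T = 1) - E(Y \mid T = 0)
\end{aligned}$$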
Sample selection (cont.)

For $\Delta_{ATE}$ we must have
$$E(Y_1) = E(Y_1 \mid T = 1) = E(Y \mid T = 1)$$
and
$$E(Y_0) = E(Y_0 \mid T = 0) = E(Y \mid T = 0)$$

For $\Delta_{ATT}$ we must have
$$E(Y_0 \mid T = 1) = E(Y_0 \mid T = 0) = E(Y \mid T = 0)$$
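Sample selection (cont.)

If $(Y_0, Y_1)$ is correlated with $T$, then $\Delta_{ATE} \neq \Delta_{ATT}$. In addition, the difference in observed means no longer identifies either parameter:
$$\begin{aligned}
E(Y \mid T = 1) - E(Y \mid T = 0) &= E(Y_1 \mid T = 1) - E(Y_0 \mid T = 0) \\
&= \left[E(Y_1 \mid T = 1) - E(Y_0 \mid T = 1)\right] + \left[E(Y_0 \mid T = 1) - E(Y_0 \mid T = 0)\right] \\
&= \Delta_{ATT} + B_{ATT}
\end{aligned}$$
where $B_{ATT} = E(Y_0 \mid T = 1) - E(Y_0 \mid T = 0)$ is the selection bias.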
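
The decomposition of the difference in observed means into a treatment effect on the treated plus a selection-bias term can be checked on a toy data set. This is only a sketch: the potential outcomes are invented so that individuals with high $Y_0$ and high gains select into treatment, and all variable names are illustrative.

```python
from statistics import mean

# Hypothetical potential outcomes (illustrative values, not course data).
y0 = [4.0, 5.0, 6.0, 2.0, 3.0, 1.0]
y1 = [7.0, 8.0, 8.0, 3.0, 4.0, 2.0]
t = [1, 1, 1, 0, 0, 0]   # high-Y0, high-gain individuals select in

treated = [i for i, ti in enumerate(t) if ti == 1]
control = [i for i, ti in enumerate(t) if ti == 0]

ate = mean(a - b for a, b in zip(y1, y0))                  # E(Y1 - Y0)
att = mean(y1[i] - y0[i] for i in treated)                 # E(Y1 - Y0 | T = 1)
bias = mean(y0[i] for i in treated) - mean(y0[i] for i in control)
naive = mean(y1[i] for i in treated) - mean(y0[i] for i in control)

print(round(ate, 3), round(att, 3), bias, round(naive, 3))
```

Here the difference in observed means (`naive`) equals `att + bias` exactly, and neither the ATE nor the ATT is recovered because `bias` is not zero.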
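Selection bias: Formal presentation

Let
$$Y_i = \mu_i + \alpha T_i + u_i = X_i\beta + \alpha T_i + u_i$$

Let $T_i^* = W_i\gamma + \nu_i$, with
$$T_i = \begin{cases} 1 & \text{if } T_i^* > 0 \\ 0 & \text{if } T_i^* \le 0 \end{cases}
\quad\Rightarrow\quad T_i = 1 \text{ if } \nu_i > -W_i\gamma$$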
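If $\mathrm{Corr}(u_i, \nu_i) \neq 0$ then
$$E(u_i \mid T_i, X_i, W_i) \neq 0$$

It can then be shown that:
$$E(Y_i \mid T_i, X_i, W_i) = X_i\beta + \alpha T_i + \sigma\,\frac{\phi(-W_i\gamma)}{\Phi(-W_i\gamma)} \neq X_i\beta + \alpha T_i \tag{1}$$

And thus: $E(\hat{\alpha}) \neq \alpha$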
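
The extra term in (1) is built from the mean of a truncated normal variable. The sketch below is a numerical check of that ingredient only, assuming a standard normal $\nu$ and a hypothetical threshold `c` standing in for $-W_i\gamma$. The seed, sample size, and threshold are arbitrary choices.

```python
import random
from statistics import NormalDist, mean

random.seed(0)
std_normal = NormalDist()   # pdf plays the role of phi, cdf the role of Phi
c = -0.3                    # hypothetical value standing in for -W*gamma

draws = [random.gauss(0.0, 1.0) for _ in range(200_000)]
above = [v for v in draws if v > c]    # "participants": nu > -W*gamma
below = [v for v in draws if v <= c]   # "non-participants": nu <= -W*gamma

# Closed-form means of a standard normal truncated at c
mean_above = std_normal.pdf(c) / (1.0 - std_normal.cdf(c))
mean_below = -std_normal.pdf(c) / std_normal.cdf(c)

print(round(mean(above), 3), round(mean_above, 3))
print(round(mean(below), 3), round(mean_below, 3))
```

Neither truncated mean is zero. When $u_i$ is correlated with $\nu_i$, part of this non-zero mean carries over to $E(u_i \mid T_i)$, which is why OLS on the observed sample is biased.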
Example: Negative correlation: $\beta = 0.5$, $\alpha = 1.0$, $\rho = -0.9$, $\gamma = 0.01$

Data
 W    X     u_i     ν_i   X_iβ   W_iγ   T_i    Y_i
 1    2   -1.52    1.71      1   0.01     1   0.48
 2    4   -0.07   -0.16      2   0.02     0   1.92
 3    6   -0.44    0.95      3   0.03     1   3.55
 4    8    0.17   -0.10      4   0.04     0   4.17
 5   10   -1.71    1.42      5   0.05     1   4.28
 1    2    0.37   -1.01      1   0.01     0   1.37
 2    4    1.05   -1.47      2   0.02     0   3.05
 3    6   -1.48    0.79      3   0.03     1   2.51
 4    8   -0.28    0.22      4   0.04     1   4.71
 5   10    1.22   -0.20      5   0.05     0   6.22

Regression
Parameter   Estimate   Std. Err.       t
Constant        0.28        0.49    0.56
β               0.54        0.07    7.55
α              -0.68        0.41   -1.66

Difference in means:
$$\bar{Y}_1 - \bar{Y}_0 = 3.11 - 3.35 = -0.24$$
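
The numbers in this example can be recomputed directly from the ten data rows. The sketch below uses only the standard library. The small `ols` helper (least squares through the normal equations) is an illustrative implementation, not the software used for the slides.

```python
from statistics import mean

# X, T and Y columns of the negative-correlation example.
x = [2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
t = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]
y = [0.48, 1.92, 3.55, 4.17, 4.28, 1.37, 3.05, 2.51, 4.71, 6.22]

def ols(columns, y):
    """Least squares via the normal equations (X'X) b = X'y, Gauss-Jordan solve."""
    k = len(columns)
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))]
         for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [v - f * w for v, w in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]

const, beta_hat, alpha_hat = ols([[1.0] * len(y), x, t], y)
diff_means = (mean(yi for yi, ti in zip(y, t) if ti == 1)
              - mean(yi for yi, ti in zip(y, t) if ti == 0))

print(round(const, 2), round(beta_hat, 2), round(alpha_hat, 2), round(diff_means, 2))
```

The estimates agree with the regression table up to rounding in the second decimal. Both the regression coefficient on $T$ and the difference in means are negative even though the data were generated with $\alpha = 1$.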
Example: Positive correlation: $\beta = 0.5$, $\alpha = 1.0$, $\rho = 0.9$, $\gamma = 0.01$

Data:
  X     u_i     ν_i   X_iβ   T_i    Y_i
  2   -1.52   -1.02      1     0   0.48
  4   -0.07   -0.31      2     0   2.92
  6   -0.44    0.15      3     1   4.55
  8    0.17    0.22      4     1   6.17
 10   -1.71   -1.67      5     0   4.28
  2    0.37   -0.34      1     0   2.37
  4    1.05    0.43      2     1   5.05
  6   -1.48   -1.89      3     0   2.51
  8   -0.28   -0.29      4     0   4.71
 10    1.22    2.01      5     1   8.22

Regression:
Parameter   Estimate   Std. Err.       t
Constant        0.56        0.66    0.84
β               0.43        0.10    4.21
α               2.40        0.59    4.00

Difference in means:
$$\bar{Y}_1 - \bar{Y}_0 = 5.13 - 4.47 = 1.66$$
Attrition bias

- Attrition bias arises when individuals who leave the sample have observable or unobservable characteristics that differ from those of individuals who remain in the sample.
- Attrition bias shares many features with selection bias, but is distinct from it.
- Statistical remedies for attrition bias are similar to, albeit more complex than, those used to correct for selection bias.
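Formal presentation

The latent model is the following:
$$Y_i = X_i\beta + \alpha T_i + u_i$$
$$T_i^* = W_i\gamma + \nu_i \quad\text{and}\quad A_i^* = Z_i\delta + \varepsilon_i$$

The observable model is
$$T_i = \begin{cases} 1 & \text{if } T_i^* > 0 \\ 0 & \text{if } T_i^* \le 0 \end{cases}
\quad\Rightarrow\quad T_i = 1 \text{ if } \nu_i > -W_i\gamma$$
$$A_i = \begin{cases} 1 & \text{if } A_i^* > 0 \\ 0 & \text{if } A_i^* \le 0 \end{cases}
\quad\Rightarrow\quad A_i = 1 \text{ if } \varepsilon_i > -Z_i\delta$$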
The attrition bias is due to the correlation between $\varepsilon_i$ and $u_i$. Indeed,
$$E(Y_i \mid X_i, T_i, W_i, Z_i) = X_i\beta + \alpha T_i + E(u_i \mid \nu_i > -W_i\gamma,\ \varepsilon_i > -Z_i\delta) \neq X_i\beta + \alpha T_i$$

Examples
- Earnings of immigrants
- Impact of training programmes
- Welfare spells of immigrants
Example: $\beta = 0.5$, $\alpha = 1$, $\gamma = 0.01$, $\delta = 0.01$, $\rho = 0.9$

 W    X      u_i      ν_i      ε_i   X_iβ    W_iγ    Z_iδ   T_i   A_i      Y_i
 1    2    1.326    1.107    1.305      1   0.010   0.015     1     1    3.326
 1    4    0.232   -0.950   -0.244      2   0.010   0.025     0     0    2.232
 2    6    0.003    0.249    0.362      3   0.020   0.040     1     1    4.003
 2    8   -1.065    1.996   -0.535      4   0.020   0.050     1     0    3.935
 3   10   -0.843   -0.289   -1.877      5   0.030   0.065     0     0    4.157
 3   12    0.390    0.278    0.872      6   0.030   0.075     1     1    7.390
 4   14    2.236   -0.075    2.872      7   0.040   0.090     0     1    9.236
 4   16   -0.174    0.251    0.115      8   0.040   0.100     1     1    8.826
 5   18   -0.270    0.872   -0.378      9   0.050   0.115     1     0    9.730
 5   20   -0.317   -1.774   -0.677     10   0.050   0.125     0     0    9.683
 1    2    0.278    0.309    0.233      1   0.010   0.015     1     1    2.278
 1    4   -2.675    0.050   -2.233      2   0.010   0.025     1     0    0.325
 2    6    0.661    0.068    0.282      3   0.020   0.040     1     1    4.661
 2    8   -1.126   -0.360   -1.022      4   0.020   0.050     0     0    2.874
 3   10    0.137   -1.648   -0.135      5   0.030   0.065     0     0    5.137
 3   12    0.857   -1.433    1.318      6   0.030   0.075     0     1    6.857
 4   14    0.576   -1.092    0.252      7   0.040   0.090     0     1    7.576
 4   16    1.001   -2.339    1.497      8   0.040   0.100     0     1    9.001
 5   18    0.956    1.803    0.519      9   0.050   0.115     1     1   10.956
 5   20    0.025   -0.065   -0.769     10   0.050   0.125     0     0   10.025

Parameter   Estimate   Std. Err.        t
Constant      -0.981       0.606   -1.618
β              1.111       0.086   12.875
α              0.089       0.552    0.162
Experimentation as a solution to the selection problem

Random assignment eliminates the correlation between $u$ and $\nu$.

 X_i      u_i   X_iβ   T_i      Y_i
   1   -1.911    0.5     1   -0.411
   2   -1.447    1.0     1    0.553
   3   -0.037    1.5     1    2.463
   4    0.703    2.0     1    3.703
   5   -0.407    2.5     1    3.093
   6   -0.473    3.0     1    3.527
   7    1.832    3.5     1    6.332
   8    1.467    4.0     1    6.467
   9   -1.766    4.5     1    3.734
  10    0.384    5.0     1    6.384
   1    0.728    0.5     0    1.228
   2   -0.977    1.0     0    0.023
   3   -1.107    1.5     0    0.393
   4    0.531    2.0     0    2.531
   5   -0.834    2.5     0    1.666
   6   -1.643    3.0     0    1.357
   7    1.104    3.5     0    4.604
   8    0.011    4.0     0    4.011
   9    1.633    4.5     0    6.133
  10   -0.698    5.0     0    4.302

Parameter   Estimate   Std. Err.       t
Constant       -0.81        0.62   -1.31
β               0.62        0.09    6.87
α               0.95        0.52    1.83

E(Y | T = 1) = 3.584
E(Y | T = 0) = 2.625
Difference   = 0.960
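
The group means in this example can be recomputed from the twenty data rows. The sketch below also checks one feature of this particular design: $X$ takes the same values in both groups, so its sample covariance with $T$ is zero. Under that condition, the OLS coefficient on $T$ in a regression on a constant, $X$ and $T$ coincides with the simple difference in means.

```python
from statistics import mean

# First ten rows are treated (T = 1), last ten are controls (T = 0).
x = list(range(1, 11)) * 2
t = [1] * 10 + [0] * 10
y = [-0.411, 0.553, 2.463, 3.703, 3.093, 3.527, 6.332, 6.467, 3.734, 6.384,
     1.228, 0.023, 0.393, 2.531, 1.666, 1.357, 4.604, 4.011, 6.133, 4.302]

y_treated = [yi for yi, ti in zip(y, t) if ti == 1]
y_control = [yi for yi, ti in zip(y, t) if ti == 0]
diff_means = mean(y_treated) - mean(y_control)

# Sum of cross-products of demeaned X and T (zero in this balanced design).
x_bar, t_bar = mean(x), mean(t)
cov_xt = sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t))

print(round(diff_means, 2), cov_xt)
```

The difference in means is close to the value $\alpha = 1$ used to generate the data, in contrast with the self-selected examples.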
Social experimentation: Limits and caveats

Problems
- Attrition bias
- Randomization bias
- Contagion bias
- Logistics bias

Limits
- General equilibrium effects
- No inference on structural parameters (Ex: SSP in Canada)
- External validity
Solutions

- Before-After estimator
- Difference in differences estimator
- Cross-sectional estimators
  - Sample selection bias ("Heckit")
  - "Treatment effect" method
  - Matching estimators
- Difference in differences matching estimator
Before-After estimator

Let $Y_{1t}$, $Y_{0t'}$ with $t > k > t'$, where $k$ is the date of program participation.
- This estimator assumes that $E(Y_{0t} - Y_{0t'} \mid T = 1) = 0$.
- If it holds: $\hat{\alpha} = E(Y_{1t} - Y_{0t'} \mid T = 1)$ (unbiased).
- Why? $Y_{1t} - Y_{0t} = (Y_{1t} - Y_{0t'}) + (Y_{0t'} - Y_{0t})$
- The second term is an approximation error ($(Y_{0t'} - Y_{0t}) \to 0$).
- Does not require panel data. Repeated cross-sections can be used.
The approximation error may not be zero:
- Changes in the economic environment between $t$ and $t'$. If the sample windows are too wide (1 year or more), many changes may occur in addition to the program implementation.
- "Ashenfelter Dip". Empirical regularity: the earnings of program participants decline in the months prior to participation ($t'$).
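
A small numerical sketch of how a pre-program dip distorts the before-after estimator. All figures are hypothetical: three participants, an assumed temporary earnings drop of 150 just before participation, and an assumed true program effect of 100.

```python
from statistics import mean

# Hypothetical monthly earnings for three participants (illustrative values).
normal_earnings = [1000.0, 1200.0, 900.0]   # Y0 in ordinary times
dip = 150.0                                 # assumed temporary pre-program drop
true_effect = 100.0                         # assumed program impact alpha

y_pre = [e - dip for e in normal_earnings]              # Y0 at t', during the dip
y_post_no_program = list(normal_earnings)               # Y0 at t: earnings recover
y_post = [e + true_effect for e in y_post_no_program]   # Y1 at t

before_after = mean(a - b for a, b in zip(y_post, y_pre))
counterfactual_change = mean(a - b for a, b in zip(y_post_no_program, y_pre))

print(before_after, counterfactual_change)  # 250.0 150.0
```

The before-after estimate (250) overstates the assumed effect (100) by exactly the change that would have occurred without the program, $E(Y_{0t} - Y_{0t'} \mid T = 1) = 150$, which the estimator assumes to be zero.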
Ashenfelter Dip: Example # 1

[Figure 1: Mean self-reported monthly earnings, male adults. National JTPA Study controls and eligible non-participants (ENPs) and SIPP eligibles. Horizontal axis: month relative to random assignment (controls) or eligibility (ENPs and SIPP eligibles), from k-20 to k+15. Vertical axis: nominal dollars, 0 to 1300. Series: SIPP eligibles, JTPA ENPs, JTPA controls. Source: Heckman and Smith (1998b).]
Ashenfelter Dip: Example # 2

[Figure 2: Mean annual earnings prior, during, and subsequent to training for 1964 MDTA classroom trainees and a comparison group, white males. Horizontal axis: year, 1959 to 1969. Vertical axis: earnings, $0 to $6,000. Series: trainees, comparison group. Source: Ashenfelter (1978).]
Ashenfelter Dip: Example # 3

[Figure 3: Mean annual earnings for 1976 CETA trainees and a comparison group, males. Horizontal axis: year, 1970 to 1978. Vertical axis: earnings in 1967 dollars, $0 to $6,000. Series: trainees, comparison group. Source: Ashenfelter and Card (1985).]
Ashenfelter Dip: Example # 4

[Figure 4: National Supported Work (NSW) average annual earnings: treatments, controls, and matched CPS comparison group, AFDC recipients. Horizontal axis: year, 1972 to 1979, with the enrollment period marked. Vertical axis: earnings, 0 to 2500. Series: NSW experimentals, NSW controls, matched CPS comparison group. Source: Fraker and Maynard (1987).]
Ashenfelter Dip: Example # 5

[Figure 5: Earnings of participants in Swedish UI training in 1991 and two comparison groups, adult males, ages 26-54. Horizontal axis: year, 1986 to 1992. Vertical axis: mean annual earnings in thousands of 1995 SEK, 0 to 160. Series: trainees, comparison group 1, comparison group 2. Source: Regner (1997).]
Ashenfelter Dip: Example # 6

[Figure 6: Earnings of 1991 participants in Norwegian Labor Market Training Programme and a randomly assigned control group, all participants. Horizontal axis: year, 1989 to 1994. Vertical axis: mean annual earnings in 1994 NOK, 0 to 100000. Series: treatment group, control group. Source: Raaum and Torp (1997).]
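Let $Y_{it}$ be the outcome variable of participant $i$ at time $t$, where $t > k$.

One can write:
$$Y_{it} = \delta + X_{it}\beta + \alpha T_{it} + u_{it}$$
$$Y_{it'} = \delta + X_{it'}\beta + u_{it'}$$
where
$$T_{it} = \begin{cases} 1 & \text{if } t > k \\ 0 & \text{otherwise} \end{cases}$$

The parameter estimate $\hat{\alpha}$ is valid only under the following assumption: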
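$$T_{it}^* = W_{it}\gamma + \theta_i + \varepsilon_{it}$$
with
$$T_{it} = \begin{cases} 1 & \text{if } W_{it}\gamma + \theta_i + \varepsilon_{it} > 0 \\ 0 & \text{otherwise} \end{cases}$$
and where $\mathrm{Corr}(\varepsilon_{it}, u_{it}) = 0$. It can then be shown that:
$$E(\theta_i \mid T = 1) = \delta$$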
Difference in differences estimator

Let $Y_{1t}$, $Y_{0t'}$, with $t > k > t'$. This estimator assumes that:
$$E(Y_{0t} - Y_{0t'} \mid T = 1) = E(Y_{0t} - Y_{0t'} \mid T = 0).$$

If the assumption is true:
$$\begin{aligned}
\hat{\alpha} &= E(Y_{1t} - Y_{0t'} \mid T = 1) - E(Y_{0t} - Y_{0t'} \mid T = 0) \\
&= (\bar{Y}_{1t} - \bar{Y}_{0t'} \mid T = 1) - (\bar{Y}_{0t} - \bar{Y}_{0t'} \mid T = 0)
\end{aligned}$$

The assumption is violated if the data depict an Ashenfelter Dip.
The estimator is valid since, by decomposition:
$$(Y_{1t} - Y_{0t'} \mid T = 1) = (Y_{1t} - Y_{0t} \mid T = 1) + (Y_{0t} - Y_{0t'} \mid T = 1)$$

Thus, under the assumption that
$$E(Y_{0t} - Y_{0t'} \mid T = 1) = E(Y_{0t} - Y_{0t'} \mid T = 0)$$
it follows that
$$E(Y_{1t} - Y_{0t} \mid T = 1) = \underbrace{E(Y_{1t} - Y_{0t'} \mid T = 1)}_{\text{observable}} - \underbrace{E(Y_{0t} - Y_{0t'} \mid T = 0)}_{\text{observable}}$$
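
The last equality can be turned into a calculation with four group means. The values below are hypothetical and chosen only to show how the common change over time is netted out.

```python
# Hypothetical mean outcomes by group and period (illustrative values).
mean_y = {
    ("treated", "before"): 20.0,
    ("treated", "after"): 27.0,
    ("control", "before"): 18.0,
    ("control", "after"): 21.0,
}

# Observable change for participants: E(Y_1t - Y_0t' | T = 1)
change_treated = mean_y[("treated", "after")] - mean_y[("treated", "before")]
# Observable change for non-participants: E(Y_0t - Y_0t' | T = 0)
change_control = mean_y[("control", "after")] - mean_y[("control", "before")]

before_after = change_treated             # ignores the common change over time
did = change_treated - change_control     # difference in differences

print(before_after, did)  # 7.0 4.0
```

With these numbers the before-after estimator attributes the whole change of 7 to the program, while the difference in differences estimator removes the change of 3 also experienced by non-participants.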
Difference in differences estimator (regression)

Let $Y_{it}$ be the outcome variable at time $t$ of participant $i$, where $t > k > t'$. We can write
$$Y_{it} = \delta + X_{it}\beta + \phi P_{it} + \alpha P_{it} T_{it} + u_{it}$$
where
$$T_{it} = \begin{cases} 1 & \text{if participant} \\ 0 & \text{otherwise} \end{cases}
\qquad
P_{it} = \begin{cases} 1 & \text{if after participation} \\ 0 & \text{otherwise} \end{cases}$$

This estimator allows for a temporal effect that is common to both participants and non-participants ($\phi$).
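
A simulation sketch of this regression, under simplifying assumptions that are not in the slides: no covariates $X_{it}$, standard normal errors, two periods per individual, and arbitrary "true" values for $\delta$, $\phi$ and $\alpha$. The data are generated exactly from the equation above and the coefficients are recovered by ordinary least squares. The `ols` helper is an illustrative standard-library implementation.

```python
import random

random.seed(1)
delta, phi, alpha = 2.0, 0.5, 1.0   # assumed "true" values for this simulation

const_col, p_col, pt_col, y = [], [], [], []
for i in range(2000):
    participant = i % 2                  # T_it: half of the sample participates
    for after in (0, 1):                 # P_it: 0 before, 1 after participation
        const_col.append(1.0)
        p_col.append(float(after))
        pt_col.append(float(after * participant))
        y.append(delta + phi * after + alpha * after * participant
                 + random.gauss(0.0, 1.0))

def ols(columns, y):
    """Least squares via the normal equations (X'X) b = X'y, Gauss-Jordan solve."""
    k = len(columns)
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))]
         for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [v - f * w for v, w in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]

delta_hat, phi_hat, alpha_hat = ols([const_col, p_col, pt_col], y)
print(round(delta_hat, 2), round(phi_hat, 2), round(alpha_hat, 2))
```

The coefficient on $P_{it}$ picks up the common temporal effect, and the coefficient on the interaction $P_{it} T_{it}$ estimates the program impact.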
Graphical representation of the DD estimator

[Figure: outcome Y plotted against X, with the program date k marked on the horizontal axis.]
Cross-section estimator

In order to better understand the nature of the selection bias problem, assume the decision to participate in a given program can be parametrized as follows:
$$T_i^* = W_i\gamma + \nu_i \tag{2}$$
where $\nu_i$ is an error term, $\gamma$ is a vector of unknown parameters, and where
$$T_i = \begin{cases} 1, & \text{if } T_i^* > 0 \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
Likewise, let:
$$Y_i = X_i\beta + \alpha T_i + u_i. \qquad (4)$$
To the extent that the participation decision depends on νi, and νi is correlated with ui, it can be shown that the estimate of α from (4) will be biased because
$$E(u_i \mid T_i, X_i) \neq 0. \qquad (5)$$
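A short simulation can make the bias in (5) concrete. The sketch below uses made-up parameter values and a single covariate that plays the role of both Xi and Wi (all assumptions of this example). OLS on (4) recovers α when ui and νi are uncorrelated, and overstates it when they are positively correlated.

```python
import numpy as np

def ols_alpha(rho, n=20000, alpha_true=1.0, seed=1):
    """OLS estimate of alpha in (4) when cov(u, nu) = rho (simulated data)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    nu = rng.normal(size=n)
    # u has unit variance and covariance rho with nu.
    u = rho * nu + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    T = (0.5 * x + nu > 0).astype(float)      # participation rule (2)-(3)
    y = 1.0 + 0.5 * x + alpha_true * T + u    # outcome equation (4)
    Z = np.column_stack([np.ones(n), x, T])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[2]

alpha_no_selection = ols_alpha(rho=0.0)   # selection on observables only
alpha_selection = ols_alpha(rho=0.6)      # selection on unobservables

print(alpha_no_selection, alpha_selection)
```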
Heckman-type correction (“Heckit”)
To derive an unbiased estimator of α, the main assumption required is that the error terms ui and νi have mean zero and follow a joint normal distribution. In other words:
$$\begin{pmatrix} u_i \\ \nu_i \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{bmatrix} \sigma^2 & \rho \\ \rho & 1 \end{bmatrix} \right) \qquad (6)$$
Under this assumption we can estimate α in two stages or by maximum likelihood (Stata: “treatreg”).
1 - Two-stage least squares: Y continuous
A. Probit on participation:
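$$T_i^* = W_i\gamma + \nu_i$$
$$\begin{aligned}
\text{Prob}(T_i = 1) &= \text{Prob}(T_i^* > 0) \\
&= \text{Prob}(W_i\gamma + \nu_i > 0) \\
&= \text{Prob}(\nu_i > -W_i\gamma) \\
&= \int_{-W_i\gamma}^{\infty} f(\xi)\,d\xi \\
&= 1 - \Phi(-W_i\gamma) \\
&= \Phi(W_i\gamma)
\end{aligned}$$
Based on the estimate $\hat{\gamma}$, we compute $\hat{T}_i = \Phi(W_i\hat{\gamma})$.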
B. Second stage:
$$Y_i = X_i\beta + \hat{T}_i\alpha + u_i$$
Three problems with this approach:
Identification of α;
Correction of standard errors;
Only valid if Yi is continuous and not truncated.
Alternatives when Yi is not of this type:
1 Probit on Yi
2 Tobit on Yi
3 Many recent developments (non-parametric methods)
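Stages A and B can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the Stata routine: the probit is fitted by a hand-written Newton-Raphson loop, the data and parameter values are made up, and Wi contains one variable (z) that is excluded from Xi so that α is identified. The standard errors of the second stage would still need correcting and are not computed here.

```python
import math
import numpy as np

def norm_pdf(z):
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def probit_fit(W, T, iters=25):
    """Probit maximum likelihood by Newton-Raphson; returns gamma_hat."""
    gamma = np.zeros(W.shape[1])
    q = 2.0 * T - 1.0                       # +1 for participants, -1 otherwise
    for _ in range(iters):
        idx = W @ gamma
        lam = q * norm_pdf(idx) / np.clip(norm_cdf(q * idx), 1e-12, None)
        grad = W.T @ lam                    # score vector
        hess = -(W * (lam * (lam + idx))[:, None]).T @ W
        gamma = gamma - np.linalg.solve(hess, grad)
    return gamma

def ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

# Simulated data (all values are assumptions of this sketch).
rng = np.random.default_rng(2)
n, alpha_true, rho = 5000, 1.0, 0.6
x = rng.normal(size=n)
z = rng.normal(size=n)                      # affects participation only
nu = rng.normal(size=n)
u = rho * nu + math.sqrt(1.0 - rho**2) * rng.normal(size=n)
W = np.column_stack([np.ones(n), x, z])
T = (W @ np.array([0.0, 0.5, 1.0]) + nu > 0).astype(float)
y = 1.0 + 0.5 * x + alpha_true * T + u
X = np.column_stack([np.ones(n), x])

# Stage A: probit on participation, then fitted probabilities.
gamma_hat = probit_fit(W, T)
T_hat = norm_cdf(W @ gamma_hat)

# Stage B: replace T by its fitted value in the outcome equation.
alpha_naive = ols(np.column_stack([X, T]), y)[-1]
alpha_two_stage = ols(np.column_stack([X, T_hat]), y)[-1]

print(alpha_naive, alpha_two_stage)
```

In this design the naive OLS estimate is well above the assumed value of 1, while the two-stage estimate is close to it.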
2-Two-stage least squares: Y truncated
A. Probit on participation:
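$$T_i^* = W_i\gamma + \nu_i$$
$$\begin{aligned}
\text{Prob}(T_i = 1) &= \text{Prob}(T_i^* > 0) \\
&= \text{Prob}(W_i\gamma + \nu_i > 0) \\
&= \text{Prob}(\nu_i > -W_i\gamma) \\
&= \int_{-W_i\gamma}^{\infty} f(\xi)\,d\xi \\
&= 1 - \Phi(-W_i\gamma) \\
&= \Phi(W_i\gamma)
\end{aligned}$$
$$\hat{T}_i = \Phi(W_i\hat{\gamma})$$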
B. We select only observations for which Y > 0. We then estimate (see equation (1))
$$E(Y_i \mid T_i, X_i, W_i) = X_i\beta + \alpha T_i + \sigma\frac{\phi(-W_i\gamma)}{\Phi(-W_i\gamma)} \qquad (7)$$
$$\neq X_i\beta + \alpha T_i$$
Three problems with this approach:
Identification of α is a real issue;
Correction of standard errors;
Valid only if Yi is continuous.
3 - Maximum likelihood
The likelihood function for observation i :
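$$Y_i = X_i\beta + T_i\alpha + u_i$$
$$T_i^* = W_i\gamma + \nu_i$$
$$L_i = T_i \times \int_{-W_i\gamma}^{\infty} f(u_i, \xi)\,d\xi + (1 - T_i) \times \int_{-\infty}^{-W_i\gamma} f(u_i, \xi)\,d\xi.$$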
Propensity score matching
PSM aims at making participation similar to a random experiment.
Advantages of PSM:
1 Avoids making assumptions about the distribution of error terms;
2 Avoids assuming additivity in the error terms.
The approach rests upon a restrictive set of assumptions:
1 Conditional independence (on observables) assumption (CIA);
2 Existence of a comparison group.
CIA
The vector of observables, X, is such that $Y^0$ is independent of participation, conditionally on X. Formally,
$$Y^0 \perp T \mid X \qquad (8)$$
Equation (8) states that, for a given X, the mean of Y for non-participants corresponds to the mean that would have been observed for participants, had they not participated.
Existence of a comparison group
This assumption ensures that for each participant there exists at least one non-participant with similar observable characteristics:
0 < P(T = 1|X ) < 1 (9)
CIA requires numerous conditioning variables (= Dim(X)).
⇒ Each individual must be matched on the basis of many observable characteristics.
This raises an important dimensionality problem, which is avoided thanks to a theorem by Rosenbaum and Rubin (1983):
CIA ⇒ CIS (independence conditional on a score).
The score is a scalar mapping (dimension 1) of X.
The score, in most cases, is simply the probability of participating: P = P(T = 1|X)
This assumption is written as:
Y 0 ⊥ T |X ⇒ Y 0 ⊥ T |P(X )
The construction of a comparison group for each participant is based on the proximity between his score and those of the non-participants.
Because the score is a continuous variable, there exist many distance metrics that can be used.
The most common is the one proposed by Heckman et al. (1998). They suggest the use of a kernel estimator such that:
$$E(Y^0 \mid P(X_i)) = \sum_{j \in I_0} \left( \frac{K_h[P(X_j) - P(X_i)]}{\sum_{j \in I_0} K_h[P(X_j) - P(X_i)]} \right) Y_j,$$
where I0 is the set of non-participants and
$$K_h[P(X_j) - P(X_i)] = K\left[ \frac{P(X_j) - P(X_i)}{h} \right],$$
with K the kernel and h a window or "bandwidth".
Each non-participant contributes to the construction of the counterfactual for participant i. Individual weights vary according to the distance between the scores.
The estimator is:
$$\hat{\alpha} = \frac{1}{N_1} \sum_{i \in I_1} \left[ Y_i - \sum_{j \in I_0} \left( \frac{K_h[P(X_j) - P(X_i)]}{\sum_{j \in I_0} K_h[P(X_j) - P(X_i)]} \right) Y_j \right],$$
where N1 is the number of participants.
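The kernel matching estimator can be written in a few lines. The sketch below is illustrative only: it assumes a Gaussian kernel, a bandwidth of 0.05, a propensity score that is known rather than estimated, and simulated data with a true effect of 1; none of these choices comes from the slides.

```python
import numpy as np

def gaussian_kernel(z):
    return np.exp(-0.5 * z**2)

def kernel_matching_att(y, t, pscore, h=0.05):
    """Kernel matching estimate of the effect on participants."""
    y1, p1 = y[t == 1], pscore[t == 1]      # participants (set I1)
    y0, p0 = y[t == 0], pscore[t == 0]      # non-participants (set I0)
    # k[i, j] = K[(P(X_j) - P(X_i)) / h] for participant i, non-participant j
    k = gaussian_kernel((p0[None, :] - p1[:, None]) / h)
    w = k / k.sum(axis=1, keepdims=True)    # weights sum to one for each i
    counterfactual = w @ y0                 # weighted mean of the Y_j
    return np.mean(y1 - counterfactual)

# Simulated data: participation and the untreated outcome both rise with x.
rng = np.random.default_rng(5)
n = 3000
x = rng.uniform(size=n)
pscore = 0.2 + 0.6 * x                      # score taken as known here
t = (rng.uniform(size=n) < pscore).astype(int)
y = 2.0 * x + 1.0 * t + rng.normal(size=n)  # true effect is 1

naive = y[t == 1].mean() - y[t == 0].mean()
att = kernel_matching_att(y, t, pscore)

print(naive, att)
```

The naive difference in means picks up the fact that participants have higher x, whereas the matching estimate should be close to 1.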
Propensity score matching and DD
CIA is a strong assumption that is easily violated in many circumstances.
It certainly will be violated if there are unobservable individual fixed effects, λ, that affect both the impact of participation and the probability of participating.
These fixed effects are not problematic if they can somehow be removed from the estimators.
It can be shown that CIA can be generalized to models with fixed effects. Indeed, we can write (8) as:
$$Y^0 \perp T \mid X, \lambda \quad \text{and} \quad g(Y^0, X) \perp \lambda \mid X \;\Rightarrow\; g(Y^0, X) \perp T \mid X,$$
where g(·) is a linear transformation of Y 0.
CIA will be valid if the effects of λ can be removed from $Y^0$.
To remove λ from the equations it is necessary to have information on $Y^1_{it}$ and $Y^0_{it}$ before and after participation.
Simple differencing will eliminate individual fixed effects.
The estimator generalizes the DD estimator within a PSM framework:
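$$\hat{\alpha} = \frac{1}{N_1} \sum_{i \in I_1} \left[ \Delta Y_i - \sum_{j \in I_0} \left( \frac{K_h[P(X_j) - P(X_i)]}{\sum_{j \in I_0} K_h[P(X_j) - P(X_i)]} \right) \Delta Y_j \right]$$
where ∆Yi = Yit − Yit′, and where t′ < k. This estimator is very common in the literature on program evaluation.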
Practical considerations
Tests on the distribution of X (“balancing score property”): “PSCORE” in Stata. Observations with the same score P(X) should have the same distribution of observables, X, independently of T. ⇒ Observational data then have the same properties as a random experiment.
Procedure
1 Estimate the model
Prob(Ti = 1|Xi) = Φ{h(xi)}
2 Divide the sample into k sub-samples on the basis of equidistant intervals of the scores P(X).
3 For each interval, test for the equality of the mean scores for the two groups (treated and untreated).
4 If the test is rejected, sub-divide the interval anew and redo the test.
5 Continue until the test passes for each interval.
6 Verify that the X have the same mean for both groups within each interval.
7 If rejected, change the model specification (usually more parsimonious).
8 If not rejected, estimate the impact of the program.
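Steps 2 to 5 of the procedure can be sketched as a recursive split of the score range. This is not the PSCORE implementation: the Welch t statistic, the 1.96 critical value, the rule that stops splitting when a cell has fewer than 10 observations in either group, and the simulated scores are all assumptions of this example.

```python
import numpy as np

def t_stat(a, b):
    """Welch t statistic for the equality of two means."""
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

def balanced(pscore, t, lo, hi, crit=1.96, min_n=10):
    """True if mean scores of the two groups do not differ on [lo, hi)."""
    inside = (pscore >= lo) & (pscore < hi)
    a = pscore[inside & (t == 1)]
    b = pscore[inside & (t == 0)]
    if min(a.size, b.size) < min_n or hi - lo < 1e-6:
        return True          # too little data to test: stop splitting here
    return abs(t_stat(a, b)) < crit

def split_until_balanced(pscore, t, lo, hi):
    """Halve the interval until the test passes in every piece."""
    if balanced(pscore, t, lo, hi):
        return [(lo, hi)]
    mid = 0.5 * (lo + hi)
    return (split_until_balanced(pscore, t, lo, mid)
            + split_until_balanced(pscore, t, mid, hi))

# Simulated scores; treatment is drawn with probability equal to the score.
rng = np.random.default_rng(6)
n = 5000
pscore = 1.0 / (1.0 + np.exp(-rng.normal(size=n)))
t = (rng.uniform(size=n) < pscore).astype(int)

# Step 2: start from k = 5 equidistant intervals, then apply steps 3 to 5.
edges = np.linspace(0.0, 1.0, 6)
strata = []
for lo, hi in zip(edges[:-1], edges[1:]):
    strata += split_until_balanced(pscore, t, lo, hi)

print(len(strata), strata)
```

Step 6 would then repeat the same kind of test for each covariate within each final interval.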
Common support
Common Support Restriction
For each participant we must construct a counterfactual using s non-participants (s ≥ 1) ⇒ We can only construct such a counterfactual for individuals whose scores lie in the range common to both groups.
The estimator is thus "local" in a certain sense, i.e. E(α | P(X) ∈ S∩, T = 1), where S∩ = ST ∩ SNT.
Consequently, the estimator of the score (probit) must not be "too" good . . . A score model that predicted participation perfectly would leave no overlap between the scores of the two groups.
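One simple way to impose the restriction is the min-max rule: keep only observations whose score lies between the larger of the two group minima and the smaller of the two group maxima. The sketch below uses that rule, which is an assumption of this example rather than the rule used in the slides, on a tiny hand-made data set.

```python
import numpy as np

def common_support(pscore, t):
    """Min-max common support: returns (lower, upper, keep_mask)."""
    lo = max(pscore[t == 1].min(), pscore[t == 0].min())
    hi = min(pscore[t == 1].max(), pscore[t == 0].max())
    keep = (pscore >= lo) & (pscore <= hi)
    return lo, hi, keep

# Four non-participants followed by four participants (made-up scores).
pscore = np.array([0.05, 0.20, 0.40, 0.60, 0.30, 0.50, 0.80, 0.95])
t = np.array([0, 0, 0, 0, 1, 1, 1, 1])

lo, hi, keep = common_support(pscore, t)
print(lo, hi, keep)
```

Here the common range is [0.3, 0.6], so the two participants with scores 0.80 and 0.95 have no comparable non-participant and drop out of the estimation.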
Other types of matching estimators
1 Nearest-Neighbour Matching
Let Ci = arg minj ||pi − pj||. Each participant is associated with a single non-participant, namely the one whose score is closest to that of participant i.
2 Radius Matching
Let Ci = {pj | ||pi − pj|| < r}. Each participant is associated with all the non-participants whose scores are within a given distance r of his own score.
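Let $N_i^C$ be the number of non-participants in Ci and
$$\omega_{ij} = \begin{cases} \dfrac{1}{N_i^C}, & \text{if } j \in C_i, \\ 0, & \text{otherwise.} \end{cases}$$
The matching estimator in both cases is given by:
$$\begin{aligned}
\hat{\alpha} &= \frac{1}{N^T} \sum_{i \in T} \left[ Y_i^T - \sum_{j \in C_i} \omega_{ij} Y_j^C \right] \\
&= \frac{1}{N^T} \left[ \sum_{i \in T} Y_i^T - \sum_{i \in T} \sum_{j \in C_i} \omega_{ij} Y_j^C \right] \\
&= \frac{1}{N^T} \sum_{i \in T} Y_i^T - \frac{1}{N^T} \sum_{j \in C} \omega_j Y_j^C,
\end{aligned}$$
where $\omega_j = \sum_i \omega_{ij}$.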
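Both estimators share the same weighting formula and differ only in how the set Ci is built. The following sketch uses a tiny hand-made data set; the function name and the choice to drop participants with an empty Ci under radius matching are assumptions of this example.

```python
import numpy as np

def matching_att(y, t, p, radius=None):
    """Nearest-neighbour (radius=None) or radius matching estimate."""
    y1, p1 = y[t == 1], p[t == 1]
    y0, p0 = y[t == 0], p[t == 0]
    dist = np.abs(p1[:, None] - p0[None, :])     # |p_i - p_j|
    if radius is None:
        # C_i holds the single non-participant with the closest score.
        in_c = np.zeros(dist.shape, dtype=bool)
        in_c[np.arange(p1.size), dist.argmin(axis=1)] = True
    else:
        # C_i holds every non-participant within the given radius.
        in_c = dist < radius
    n_c = in_c.sum(axis=1)                        # N_i^C
    matched = n_c > 0                             # drop empty C_i
    w = in_c[matched] / n_c[matched, None]        # omega_ij
    return np.mean(y1[matched] - w @ y0)

# Two participants followed by three non-participants (made-up values).
t = np.array([1, 1, 0, 0, 0])
p = np.array([0.30, 0.70, 0.28, 0.35, 0.72])
y = np.array([10.0, 20.0, 7.0, 9.0, 15.0])

att_nn = matching_att(y, t, p)                    # (10 - 7 + 20 - 15) / 2
att_radius = matching_att(y, t, p, radius=0.06)   # (10 - 8 + 20 - 15) / 2
print(att_nn, att_radius)
```

With a radius of 0.06 the first participant is compared with the average of two non-participants (outcomes 7 and 9), which changes the estimate from 4.0 to 3.5.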
Heterogeneous Treatment Effect Analysis
A basic paradigm of the literature based on the potential outcomes model is that there can be individual heterogeneity in treatment effects, which stands in contrast to traditional regression modelling assuming constant parameters.
The view that treatment effects can be heterogeneous led to new methods for causal inference and also to new uses and interpretations of existing methods (e.g. LATE interpretation of IV estimators, revival of matching and regression discontinuity designs).
Surprisingly, however, not much attention is usually paid to the explicit analysis of the heterogeneity of treatment effects in applied studies.
Heterogeneous Treatment Effect Analysis (Cont.)
Recall that:
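$$\begin{aligned}
Y_i &= \mu_i + \alpha T_i + u_i \\
&= \alpha + X_i\beta + \delta T_i + u_i \quad \text{(homogeneous treatment), or} \\
&= \alpha + X_i\beta + \delta_i T_i + u_i \quad \text{(heterogeneous treatment)}
\end{aligned}$$
Equivalently:
$$\begin{aligned}
Y_{i,1} &= \alpha_1 + X_i\beta_1 + u_{i,1}, \quad \text{if } T_i = 1, \\
Y_{i,0} &= \alpha_0 + X_i\beta_0 + u_{i,0}, \quad \text{if } T_i = 0,
\end{aligned}$$
$$T_i = \begin{cases} 1, & \text{if } T_i^* \geq 0 \\ 0, & \text{if } T_i^* < 0 \end{cases}$$
where $T_i^* = \gamma Z_i - V_i$.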
Observed outcome:
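$$\begin{aligned}
Y_i &= T_i \times Y_{i,1} + (1 - T_i) \times Y_{i,0} \\
&= T_i(\alpha_1 + X_i\beta_1 + u_{i,1}) + (1 - T_i)(\alpha_0 + X_i\beta_0 + u_{i,0}) \\
&= \alpha_0 + X_i\beta_0 + [(\alpha_1 - \alpha_0) + (\beta_1 - \beta_0)X_i + (u_{i,1} - u_{i,0})]T_i + u_{i,0}
\end{aligned}$$
Therefore:
$$\delta_i = \underbrace{(\alpha_1 - \alpha_0)}_{ATE_\alpha} + \underbrace{(\beta_1 - \beta_0)X_i}_{ATE_x} + \underbrace{(u_{i,1} - u_{i,0})}_{\text{Het on Unobs}}$$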
Unbiasedness of OLS
If δi and ui,0 are uncorrelated with Ti, the population average treatment effect may be estimated by OLS.
Homogeneous treatment effect: ATE = ATEα
Yi = α + Xiβ + δTi + ui
Heterogeneous treatment effect: ATE = ATEα + ATEx
Yi = α + Xiβ0 + (β1 − β0)Xi × Ti + δTi + ui
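When is OLS wrong?
OLS is biased if δi or ui,0 is correlated with Ti:
$$\begin{aligned}
\delta(x_i) &= E(Y_i \mid X_i = x_i, T_i = 1) - E(Y_i \mid X_i = x_i, T_i = 0) \\
&= E(\alpha_1 + X_i\beta_1 + u_{i,1} \mid T_i = 1) - E(\alpha_0 + X_i\beta_0 + u_{i,0} \mid T_i = 0) \\
&= (\alpha_1 - \alpha_0) + (\beta_1 - \beta_0)x_i + E(u_{i,1} \mid T_i = 1) - E(u_{i,0} \mid T_i = 0) \\
&= E(\delta_i \mid X_i = x_i) + E(u_{i,1} \mid T_i = 1) - E(u_{i,0} \mid T_i = 0) \\
&= ATE(x_i) + E(u_{i,1} \mid T_i = 1) - E(u_{i,0} \mid T_i = 1) + E(u_{i,0} \mid T_i = 1) - E(u_{i,0} \mid T_i = 0) \\
&= ATE(x_i) + \underbrace{E(u_{i,1} - u_{i,0} \mid T_i = 1)}_{\text{Sorting on Gains}} + \underbrace{E(u_{i,0} \mid T_i = 1) - E(u_{i,0} \mid T_i = 0)}_{\text{Selection Bias}} \\
&= ATE(x_i) + SGE_{u_{i,1}} + SB_{1 \to i,0}
\end{aligned}$$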
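The decomposition can be checked numerically on simulated potential outcomes. The sketch below drops the covariates (so that ATE = α1 − α0) and uses a made-up participation rule under which individuals sort on both their gains and their untreated outcome; all numerical values are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50000
a1, a0 = 3.0, 1.0                      # alpha_1 and alpha_0 (made up)
u0 = rng.normal(size=n)
u1 = 0.5 * u0 + rng.normal(size=n)
v = rng.normal(size=n)

# Participation rises with the gain (u1 - u0) and with u0 itself.
T = (u1 - u0) + 0.8 * u0 + v > 0

y1 = a1 + u1                           # outcome with the program
y0 = a0 + u0                           # outcome without the program
y = np.where(T, y1, y0)                # observed outcome

naive = y[T].mean() - y[~T].mean()     # what the simple comparison delivers
ate = a1 - a0
sge = (u1 - u0)[T].mean()              # sorting on gains among participants
sb = u0[T].mean() - u0[~T].mean()      # selection bias

tt = ate + sge                         # effect on the treated
tut = ate + (u1 - u0)[~T].mean()       # effect on the untreated

print(naive, ate + sge + sb, tt, tut)
```

The first two printed numbers coincide, and both sorting on gains and selection bias are positive in this design, so the naive comparison overstates the ATE.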
Biases, reverse order
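$$\begin{aligned}
\delta(x_i) &= ATE(x_i) + \underbrace{E(u_{i,1} - u_{i,0} \mid T_i = 0)}_{\text{Sorting on Gains}} + \underbrace{E(u_{i,1} \mid T_i = 1) - E(u_{i,1} \mid T_i = 0)}_{\text{Selection Bias}} \\
&= ATE(x_i) + SGE_{u_{i,0}} + SB_{0 \to i,1}
\end{aligned}$$
Different estimators
$$\begin{aligned}
TT(x_i) &= E(Y_{i,1} - Y_{i,0} \mid X_i = x_i, T_i = 1) \\
&= ATE(x_i) + SGE_{u_{i,1}} \\
TUT(x_i) &= E(Y_{i,1} - Y_{i,0} \mid X_i = x_i, T_i = 0) \\
&= ATE(x_i) + SGE_{u_{i,0}}
\end{aligned}$$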
Population estimates
$$ATE = \int_{X} ATE(x)\,dF_X(x)$$
$$TT = \int_{X \mid T=1} TT(x)\,dF_{X \mid T=1}(x)$$
$$TUT = \int_{X \mid T=0} TUT(x)\,dF_{X \mid T=0}(x)$$
$$\delta = ATE + SGE + SB_{1 \to 0}$$
Heterogeneous Treatment Effect Analysis in Practice
For example, in the literature on economic returns to higher education various theories have been proposed that imply heterogeneous effects depending on the probability of going to college.
Human-capital theory in economics predicts positive selection into treatment, because people choose to go to college based on the expected economic returns. This is a widely accepted view.
More sociologically oriented literature suggests that college attendance is strongly influenced by social origin, which leads to negative selection into treatment under certain conditions.
To evaluate these theories it is therefore crucial to analyze how treatment effects vary with treatment probability.
Ben Jann, Jennie E. Brand and Yu Xie have developed a useful Stata module to perform this analysis (HTE).
Brand, J. E., Y. Xie (2010). Who Benefits Most From College? Evidence for Negative Selection in Heterogeneous Economic Returns to Higher Education. American Sociological Review 75:273–302.
HTE comes in 3 flavours: hte, hte2, hte3
The approach of hte is to assume conditional unconfoundedness given a set of covariates and to use propensity score stratification to estimate treatment effects at various points over the range of the propensity score.
In hte the strata-specific effects are then analyzed to determine whether there is a trend in treatment effects.
hte2 and hte3 instead provide a non-parametric analysis of the treatment effect in relation to the individual scores.
The hte algorithm consists of four basic steps:
1 Estimation of the propensity score (i.e. the conditional probability of receiving treatment).
2 Construction of balanced propensity score strata (using PSCORE or PSMATCH2)
3 Estimation of strata-specific average treatment effects
4 Estimation of the trend of treatment effects across propensity score strata.
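The four steps can be mimicked outside Stata with a few lines of NumPy. This is a rough sketch, not the hte code: the propensity score is taken as known instead of estimated, the strata are five equal-width intervals that are not tested for balance, the stratum effects are simple differences in means, and the trend is fitted by variance-weighted least squares on the stratum rank. The data are simulated so that the effect rises with the score.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
x = rng.uniform(size=n)

# Step 1: propensity score (known by construction in this simulation).
p = 0.1 + 0.8 * x
t = rng.uniform(size=n) < p
effect = 1.0 + 2.0 * p                 # effect increases with the score
y = x + effect * t + rng.normal(size=n)

# Step 2: five equal-width strata over the range of the score.
edges = np.linspace(0.1, 0.9, 6)
stratum = np.clip(np.digitize(p, edges) - 1, 0, 4)

# Step 3: stratum-specific effects and their sampling variances.
te, var = [], []
for s in range(5):
    treated = y[(stratum == s) & t]
    control = y[(stratum == s) & ~t]
    te.append(treated.mean() - control.mean())
    var.append(treated.var(ddof=1) / treated.size
               + control.var(ddof=1) / control.size)
te, var = np.array(te), np.array(var)

# Step 4: linear trend of the effects across strata, weighting each
# stratum by the inverse of its variance.
rank = np.arange(1, 6)
Z = np.column_stack([np.ones(5), rank])
w = 1.0 / var
intercept, slope = np.linalg.solve(Z.T @ (Z * w[:, None]), Z.T @ (w * te))

print(te, slope)
```

Each stratum spans 0.16 of the score range, so with an effect of 1 + 2p the slope across strata should be near 0.32.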
HTE Examples
[Figure: four panels of estimated treatment effects, each with a 95% CI.]
Panel "Difference earnings": treatment effect within propensity score strata 1 to 7, with linear trend; slope of linear trend (s.e.) = 785.728 (311.864).
Panel "Diff Earnings": treatment effect against the propensity score (.2 to 1), lpoly fit.
Panel "Immigrant Diff Earnings": treatment effect within propensity score strata 1 to 7, with linear trend; slope of linear trend (s.e.) = 414.933 (640.637).
Panel "Natives Diff Earnings": treatment effect within propensity score strata 1 to 7, with linear trend; slope of linear trend (s.e.) = 1001.051 (640.637).