Matching and Selection on Observables

Research Design for Causal Inference
Northwestern University, 2012

Alberto Abadie
Harvard University
Motivation: Smoking and Mortality (Cochran, 1968)
Table 1
Death Rates per 1,000 Person-Years
Smoking group Canada U.K. U.S.
Non-smokers     20.2   11.3   13.5
Cigarettes      20.5   14.1   13.5
Cigars/pipes    35.5   20.7   17.4
Motivation: Smoking and Mortality (Cochran, 1968)
Table 2
Mean Ages, Years
Smoking group Canada U.K. U.S.
Non-smokers     54.9   49.1   57.0
Cigarettes      50.5   49.8   53.2
Cigars/pipes    65.9   55.7   59.7
Subclassification
To control for differences in age, we would like to compare mortality across different smoking-habit groups with the same age distribution.

One possibility is to use subclassification:

Divide the smoking-habit samples into age groups

For each of the smoking-habit samples, calculate the mortality rates for the age groups

Construct weighted averages of the age-group mortality rates for each smoking-habit sample using a fixed set of weights across all samples (that is, for a fixed distribution of the age variable across all samples)
Subclassification: Example

                 Death Rates     # Pipe-    # Non-
                 Pipe Smokers    Smokers    Smokers
Age 20-50             15            11         29
Age 50-70             35            13          9
Age 70+               50            16          2
Total                               40         40

Question: What is the average death rate for Pipe Smokers?

Answer: 15 · (11/40) + 35 · (13/40) + 50 · (16/40) = 35.5
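The weighted averages in this example (and in the reweighted version on the next slide) can be checked with a short script. All numbers come from the table above; the function name is mine.

```python
# Death rates and age-group counts from the pipe-smoker table above.
death_rate = {"20-50": 15, "50-70": 35, "70+": 50}   # pipe smokers, per 1,000
n_pipe     = {"20-50": 11, "50-70": 13, "70+": 16}   # # pipe smokers per age group
n_non      = {"20-50": 29, "50-70": 9,  "70+": 2}    # # non-smokers per age group

def weighted_rate(rates, weights):
    """Weighted average of the cell rates, weights proportional to the counts."""
    total = sum(weights.values())
    return sum(rates[k] * weights[k] / total for k in rates)

print(weighted_rate(death_rate, n_pipe))  # 35.5  (pipe smokers' own age distribution)
print(weighted_rate(death_rate, n_non))   # 21.25 (non-smokers' age distribution)
```

The second call is the age-adjusted rate: same cell-specific death rates, but weighted by the non-smokers' age distribution.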
Subclassification: Example

                 Death Rates     # Pipe-    # Non-
                 Pipe Smokers    Smokers    Smokers
Age 20-50             15            11         29
Age 50-70             35            13          9
Age 70+               50            16          2
Total                               40         40

Question: What would be the average death rate for Pipe Smokers if they had the same age distribution as Non-Smokers?

Answer: 15 · (29/40) + 35 · (9/40) + 50 · (2/40) = 21.2
Smoking and Mortality (Cochran, 1968)
Table 3
Adjusted Death Rates using 3 Age groups
Smoking group Canada U.K. U.S.
Non-smokers     20.2   11.3   13.5
Cigarettes      28.3   12.8   17.7
Cigars/pipes    21.2   12.0   14.2
Covariates and Outcomes

Definition (Predetermined Covariates)
Variable X is predetermined with respect to the treatment D (also called "pretreatment") if for each individual i, X0i = X1i, i.e., the value of Xi does not depend on the value of Di. Such characteristics are called covariates.

This does not imply that X and D are independent.

Predetermined variables are often time invariant (sex, race, etc.), but time invariance is not necessary.

Definition (Outcomes)
Those variables, Y, that are (possibly) not predetermined are called outcomes (for some individual i, Y0i ≠ Y1i).

In general, one should not condition on outcomes, because this may induce bias.
Adjustment for Observables in Observational Studies
Subclassification
Matching
Propensity Score Methods
Regression
Identification in Randomized Experiments

Randomization implies:

(Y1, Y0) independent of D, or (Y1, Y0) ⊥⊥ D.

Therefore:

E[Y | D = 1] − E[Y | D = 0] = E[Y1 | D = 1] − E[Y0 | D = 0]
                            = E[Y1] − E[Y0]
                            = E[Y1 − Y0]

Also, we have that

E[Y1 − Y0] = E[Y1 − Y0 | D = 1].
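A small simulation illustrates the identity: under randomization, the raw difference in observed means recovers E[Y1 − Y0]. The data-generating process below is mine, purely illustrative.

```python
# Illustrative simulation (not from the slides): with randomized D,
# the difference in observed group means estimates E[Y1 - Y0].
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + 2.0                     # constant treatment effect of 2
d = rng.integers(0, 2, n)         # treatment assigned independently of (y1, y0)
y = np.where(d == 1, y1, y0)      # observed outcome

diff_in_means = y[d == 1].mean() - y[d == 0].mean()
print(diff_in_means)              # close to 2 (sampling noise only)
```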
Identification under Selection on Observables

Identification Assumption

1. (Y1, Y0) ⊥⊥ D | X (selection on observables)
2. 0 < Pr(D = 1 | X) < 1 with probability one (common support)

Identification Result
Given selection on observables we have

E[Y1 − Y0 | X] = E[Y1 − Y0 | X, D = 1]
              = E[Y | X, D = 1] − E[Y | X, D = 0]

Therefore, under the common support condition:

αATE = E[Y1 − Y0] = ∫ E[Y1 − Y0 | X] dP(X)
     = ∫ ( E[Y | X, D = 1] − E[Y | X, D = 0] ) dP(X)
Identification under Selection on Observables

Identification Assumption

1. (Y1, Y0) ⊥⊥ D | X (selection on observables)
2. 0 < Pr(D = 1 | X) < 1 with probability one (common support)

Identification Result
Similarly,

αATET = E[Y1 − Y0 | D = 1]
      = ∫ ( E[Y | X, D = 1] − E[Y | X, D = 0] ) dP(X | D = 1)

To identify αATET the selection on observables and common support conditions can be relaxed to:

Y0 ⊥⊥ D | X
Pr(D = 1 | X) < 1 (with Pr(D = 1) > 0)
Subclassification Estimator

The identification result is:

αATE = ∫ ( E[Y | X, D = 1] − E[Y | X, D = 0] ) dP(X)
αATET = ∫ ( E[Y | X, D = 1] − E[Y | X, D = 0] ) dP(X | D = 1)

Assume X takes on K different cells {X^1, ..., X^k, ..., X^K}. Then, the analogy principle suggests the following estimators:

α̂ATE = Σ_{k=1}^{K} (Ȳ_k1 − Ȳ_k0) · (N_k / N);   α̂ATET = Σ_{k=1}^{K} (Ȳ_k1 − Ȳ_k0) · (N_k1 / N_1)

N_k is the # of obs. and N_k1 is the # of treated obs. in cell k
Ȳ_k1 is the mean outcome for the treated in cell k
Ȳ_k0 is the mean outcome for the untreated in cell k
Subclassification by Age (K = 2)

       Death Rate   Death Rate              #
X^k    Smokers      Non-Smokers   Diff.   Smokers   # Obs.
Old        28           24          4        3        10
Young      22           16          6        7        10
Total                                       10        20

Question: What is α̂ATE = Σ_{k=1}^{K} (Ȳ_k1 − Ȳ_k0) · (N_k / N)?

α̂ATE = 4 · (10/20) + 6 · (10/20) = 5
Subclassification by Age (K = 2)

       Death Rate   Death Rate              #
X^k    Smokers      Non-Smokers   Diff.   Smokers   # Obs.
Old        28           24          4        3        10
Young      22           16          6        7        10
Total                                       10        20

Question: What is α̂ATET = Σ_{k=1}^{K} (Ȳ_k1 − Ȳ_k0) · (N_k1 / N_1)?

α̂ATET = 4 · (3/10) + 6 · (7/10) = 5.4
Subclassification by Age and Gender (K = 4)

                Death Rate   Death Rate              #
X^k             Smokers      Non-Smokers   Diff.   Smokers   # Obs.
Old, Male           28           24          4        3         7
Old, Female                      24                   0         3
Young, Male         21           16          5        3         4
Young, Female       23           17          6        4         6
Total                                                10        20

Problem: What is α̂ATE = Σ_{k=1}^{K} (Ȳ_k1 − Ȳ_k0) · (N_k / N)?

Not identified! (The Old, Female cell contains no smokers, so common support fails.)
Subclassification by Age and Gender (K = 4)

                Death Rate   Death Rate              #
X^k             Smokers      Non-Smokers   Diff.   Smokers   # Obs.
Old, Male           28           24          4        3         7
Old, Female                      24                   0         3
Young, Male         21           16          5        3         4
Young, Female       23           17          6        4         6
Total                                                10        20

Problem: What is α̂ATET = Σ_{k=1}^{K} (Ȳ_k1 − Ȳ_k0) · (N_k1 / N_1)?

α̂ATET = 4 · (3/10) + 5 · (3/10) + 6 · (4/10) = 5.1
Subclassification and the "Curse of Dimensionality"

Subclassification becomes infeasible with many covariates.

Assume we have k covariates and divide each of them into 3 coarse categories (e.g., age could be "young", "middle age" or "old", and income could be "low", "medium" or "high").

The number of subclassification cells is 3^k. For k = 10, we obtain 3^10 = 59049.

Many cells may contain only treated or only untreated observations, so we cannot use subclassification.

Subclassification is also problematic if the cells are "too coarse". But using "finer" cells worsens the curse of dimensionality: e.g., using 10 variables and 5 categories for each variable we obtain 5^10 = 9765625.
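A quick illustration of how fast the cells empty out. The random categorical data below are mine, purely illustrative:

```python
# c categories for each of k covariates give c**k cells; with a realistic
# sample size, almost all cells are empty.
import random

random.seed(0)
k, c, n = 10, 3, 1000                  # 10 covariates, 3 categories, 1,000 obs
n_cells = c ** k
print(n_cells)                         # 59049

# Draw n random covariate profiles and count the distinct cells they occupy.
occupied = {tuple(random.randrange(c) for _ in range(k)) for _ in range(n)}
print(len(occupied) / n_cells)         # only a tiny fraction of cells is occupied
```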
Matching

An alternative way to estimate αATET is by "imputing" the missing potential outcome of each treated unit using the observed outcome from the "closest" untreated unit:

α̂ATET = (1/N_1) Σ_{D_i = 1} (Y_i − Y_{j(i)})

where Y_{j(i)} is the outcome of an untreated observation such that X_{j(i)} is the closest value to X_i among the untreated observations.

We can also use the average of the M closest matches:

α̂ATET = (1/N_1) Σ_{D_i = 1} { Y_i − (1/M) Σ_{m=1}^{M} Y_{j_m(i)} }

Works well when we can find good matches for each treated unit, so M is usually small (typically, M = 1 or M = 2).
Matching

We can also use matching to estimate αATE. In that case, we match in both directions:

1. If observation i is treated, we impute Y0i using untreated matches, {Y_{j_1(i)}, ..., Y_{j_M(i)}}
2. If observation i is untreated, we impute Y1i using treated matches, {Y_{j_1(i)}, ..., Y_{j_M(i)}}

The estimator is:

α̂ATE = (1/N) Σ_{i=1}^{N} (2D_i − 1) { Y_i − (1/M) Σ_{m=1}^{M} Y_{j_m(i)} }
Matching: Example with single X

        Potential Outcome    Potential Outcome
unit    under Treatment      under Control
i            Y1i                  Y0i          Di    Xi
1             6                    ?            1     3
2             1                    ?            1     1
3             0                    ?            1    10
4             ?                    0            0     2
5             ?                    9            0     3
6             ?                    1            0    −2
7             ?                    1            0    −4

Question: What is α̂ATET = (1/N_1) Σ_{D_i = 1} (Y_i − Y_{j(i)})?

Match and plug in the closest untreated outcomes:

i       Y1i    Y0i    Di    Xi
1        6      9      1     3
2        1      0      1     1
3        0      9      1    10
4        ?      0      0     2
5        ?      9      0     3
6        ?      1      0    −2
7        ?      1      0    −4

α̂ATET = 1/3 · (6 − 9) + 1/3 · (1 − 0) + 1/3 · (0 − 9) ≈ −3.7
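The example can be replayed in a few lines of Python; the data are exactly the table above.

```python
# Single-covariate 1-NN matching for the ATET, using the table above.
x = {1: 3, 2: 1, 3: 10, 4: 2, 5: 3, 6: -2, 7: -4}   # X_i
y = {1: 6, 2: 1, 3: 0, 4: 0, 5: 9, 6: 1, 7: 1}      # observed outcome Y_i
treated, untreated = [1, 2, 3], [4, 5, 6, 7]

gaps = []
for i in treated:
    j = min(untreated, key=lambda u: abs(x[u] - x[i]))  # closest untreated unit
    gaps.append(y[i] - y[j])                            # Y_i - Y_j(i)

atet = sum(gaps) / len(gaps)
print(round(atet, 1))   # -3.7
```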
A Training Example

Trainees                    Non-Trainees
unit  age  earnings         unit  age  earnings
1 28 17700 1 43 20900
2 34 10200 2 50 31000
3 29 14400 3 30 21000
4 25 20800 4 27 9300
5 29 6100 5 54 41100
6 23 28600 6 48 29800
7 33 21900 7 39 42000
8 27 28800 8 28 8800
9 31 20300 9 24 25500
10 26 28100 10 33 15500
11 25 9400 11 26 400
12 27 14300 12 31 26600
13 29 12500 13 26 16500
14 24 19700 14 34 24200
15 25 10100 15 25 23300
16 43 10700 16 24 9700
17 28 11500 17 29 6200
18 27 10700 18 35 30200
19 28 16300 19 32 17800
Average: 28.5 16426 20 23 9500
21 32 25900
Average: 33 20724
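The matched sample on the following slides can be reproduced by nearest-age matching, averaging earnings when several non-trainees tie on age (as the slides do for the tied pairs). A sketch; the (age, earnings) pairs are copied from the non-trainee table above, and the function name is mine:

```python
# Nearest-age matching of trainees to non-trainees.
controls = [(43, 20900), (50, 31000), (30, 21000), (27, 9300), (54, 41100),
            (48, 29800), (39, 42000), (28, 8800), (24, 25500), (33, 15500),
            (26, 400), (31, 26600), (26, 16500), (34, 24200), (25, 23300),
            (24, 9700), (29, 6200), (35, 30200), (32, 17800), (23, 9500),
            (32, 25900)]

def matched_earnings(trainee_age):
    """Earnings of the closest-in-age non-trainee(s), averaged over ties."""
    best = min(abs(a - trainee_age) for a, _ in controls)
    ties = [e for a, e in controls if abs(a - trainee_age) == best]
    return sum(ties) / len(ties)

print(matched_earnings(28))   # 8800.0 (non-trainee 8, age 28)
print(matched_earnings(26))   # 8450.0 (non-trainees 11 and 13, both age 26)
```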
A Training ExampleTrainees Non-Trainees Matched Sample
unit age earnings unit age earnings unit age earnings
1 28 17700 1 43 20900 8 28 8800
2 34 10200 2 50 31000 14 34 24200
3 29 14400 3 30 21000 17 29 6200
4 25 20800 4 27 9300 15 25 23300
5 29 6100 5 54 41100 17 29 6200
6 23 28600 6 48 29800 20 23 9500
7 33 21900 7 39 42000 10 33 15500
8 27 28800 8 28 8800 4 27 9300
9 31 20300 9 24 25500 12 31 26600
10 26 28100 10 33 15500 11,13 26 8450
11 25 9400 11 26 400 15 25 23300
12 27 14300 12 31 26600 4 27 9300
13 29 12500 13 26 16500 17 29 6200
14 24 19700 14 34 24200 9,16 24 17700
15 25 10100 15 25 23300 15 25 23300
16 43 10700 16 24 9700 1 43 20900
17 28 11500 17 29 6200 8 28 8800
18 27 10700 18 35 30200
19 28 16300 19 32 17800
Average: 28.5 16426 20 23 9500 Average:
21 32 25900
Average: 33 20724
![Page 54: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/54.jpg)
A Training ExampleTrainees Non-Trainees Matched Sample
unit age earnings unit age earnings unit age earnings
1 28 17700 1 43 20900 8 28 8800
2 34 10200 2 50 31000 14 34 24200
3 29 14400 3 30 21000 17 29 6200
4 25 20800 4 27 9300 15 25 23300
5 29 6100 5 54 41100 17 29 6200
6 23 28600 6 48 29800 20 23 9500
7 33 21900 7 39 42000 10 33 15500
8 27 28800 8 28 8800 4 27 9300
9 31 20300 9 24 25500 12 31 26600
10 26 28100 10 33 15500 11,13 26 8450
11 25 9400 11 26 400 15 25 23300
12 27 14300 12 31 26600 4 27 9300
13 29 12500 13 26 16500 17 29 6200
14 24 19700 14 34 24200 9,16 24 17700
15 25 10100 15 25 23300 15 25 23300
16 43 10700 16 24 9700 1 43 20900
17 28 11500 17 29 6200 8 28 8800
18 27 10700 18 35 30200 4 27 9300
19 28 16300 19 32 17800
Average: 28.5 16426 20 23 9500 Average:
21 32 25900
Average: 33 20724
![Page 55: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/55.jpg)
A Training ExampleTrainees Non-Trainees Matched Sample
unit age earnings unit age earnings unit age earnings
1 28 17700 1 43 20900 8 28 8800
2 34 10200 2 50 31000 14 34 24200
3 29 14400 3 30 21000 17 29 6200
4 25 20800 4 27 9300 15 25 23300
5 29 6100 5 54 41100 17 29 6200
6 23 28600 6 48 29800 20 23 9500
7 33 21900 7 39 42000 10 33 15500
8 27 28800 8 28 8800 4 27 9300
9 31 20300 9 24 25500 12 31 26600
10 26 28100 10 33 15500 11,13 26 8450
11 25 9400 11 26 400 15 25 23300
12 27 14300 12 31 26600 4 27 9300
13 29 12500 13 26 16500 17 29 6200
14 24 19700 14 34 24200 9,16 24 17700
15 25 10100 15 25 23300 15 25 23300
16 43 10700 16 24 9700 1 43 20900
17 28 11500 17 29 6200 8 28 8800
18 27 10700 18 35 30200 4 27 9300
19 28 16300 19 32 17800 8 28 8800
Average: 28.5 16426 20 23 9500 Average:
21 32 25900
Average: 33 20724
![Page 56: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/56.jpg)
A Training ExampleTrainees Non-Trainees Matched Sample
unit age earnings unit age earnings unit age earnings
1 28 17700 1 43 20900 8 28 8800
2 34 10200 2 50 31000 14 34 24200
3 29 14400 3 30 21000 17 29 6200
4 25 20800 4 27 9300 15 25 23300
5 29 6100 5 54 41100 17 29 6200
6 23 28600 6 48 29800 20 23 9500
7 33 21900 7 39 42000 10 33 15500
8 27 28800 8 28 8800 4 27 9300
9 31 20300 9 24 25500 12 31 26600
10 26 28100 10 33 15500 11,13 26 8450
11 25 9400 11 26 400 15 25 23300
12 27 14300 12 31 26600 4 27 9300
13 29 12500 13 26 16500 17 29 6200
14 24 19700 14 34 24200 9,16 24 17700
15 25 10100 15 25 23300 15 25 23300
16 43 10700 16 24 9700 1 43 20900
17 28 11500 17 29 6200 8 28 8800
18 27 10700 18 35 30200 4 27 9300
19 28 16300 19 32 17800 8 28 8800
Average: 28.5 16426 20 23 9500 Average: 28.5 13982
21 32 25900
Average: 33 20724
![Page 57: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/57.jpg)
Age Distribution: Before Matching

[Histograms of age by group (A: Trainees, B: Non-Trainees); x-axis: age, 20–60; y-axis: frequency.]
![Page 58: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/58.jpg)
Age Distribution: After Matching

[Histograms of age by group (A: Trainees, B: Non-Trainees); x-axis: age, 20–60; y-axis: frequency.]
![Page 59: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/59.jpg)
Training Effect Estimates

Difference in average earnings between trainees and non-trainees:

Before matching: 16426 − 20724 = −4298

After matching: 16426 − 13982 = 2444
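The before/after comparison can be reproduced with a short script. This is an illustrative Python sketch, not part of the slides: it re-implements nearest-neighbor matching on age with replacement, averaging exact ties (as in the 11, 13 and 9, 16 rows), using the data transcribed from the training-example table.

```python
# Nearest-neighbor matching on age, with replacement; exact ties averaged.
# (age, earnings) pairs transcribed from the slides' table.
trainees = [(28, 17700), (34, 10200), (29, 14400), (25, 20800), (29, 6100),
            (23, 28600), (33, 21900), (27, 28800), (31, 20300), (26, 28100),
            (25, 9400), (27, 14300), (29, 12500), (24, 19700), (25, 10100),
            (43, 10700), (28, 11500), (27, 10700), (28, 16300)]
controls = [(43, 20900), (50, 31000), (30, 21000), (27, 9300), (54, 41100),
            (48, 29800), (39, 42000), (28, 8800), (24, 25500), (33, 15500),
            (26, 400), (31, 26600), (26, 16500), (34, 24200), (25, 23300),
            (24, 9700), (29, 6200), (35, 30200), (32, 17800), (23, 9500),
            (32, 25900)]

def matched_outcome(age):
    """Earnings of the nearest control(s) by age; exact ties are averaged."""
    best = min(abs(a - age) for a, _ in controls)
    ys = [y for a, y in controls if abs(a - age) == best]
    return sum(ys) / len(ys)

mean = lambda v: sum(v) / len(v)
before = mean([y for _, y in trainees]) - mean([y for _, y in controls])
after = mean([y - matched_outcome(a) for a, y in trainees])
```

The raw gap comes out near −4298 and the matched estimate near the slides' 2444; small differences reflect rounding in the slides.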
![Page 60: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/60.jpg)
Matching: Distance Metric

When the vector of matching covariates,

$$X = (X_1, X_2, \ldots, X_k)',$$

has more than one dimension ($k > 1$) we need to define a distance metric to measure “closeness”. The usual Euclidean distance is:

$$\|X_i - X_j\| = \sqrt{(X_i - X_j)'(X_i - X_j)} = \sqrt{\sum_{n=1}^{k}(X_{ni} - X_{nj})^2}.$$

⇒ The Euclidean distance is not invariant to changes in the scale of the $X$'s.

⇒ For this reason, we often use alternative distances that are invariant to changes in scale.
![Page 61: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/61.jpg)
Matching: Distance Metric

A commonly used distance is the normalized Euclidean distance:

$$\|X_i - X_j\| = \sqrt{(X_i - X_j)'\hat{V}^{-1}(X_i - X_j)}$$

where

$$\hat{V} = \begin{pmatrix} \hat\sigma_1^2 & 0 & \cdots & 0 \\ 0 & \hat\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat\sigma_k^2 \end{pmatrix}$$

Notice that the normalized Euclidean distance is equal to:

$$\|X_i - X_j\| = \sqrt{\sum_{n=1}^{k}\frac{(X_{ni} - X_{nj})^2}{\hat\sigma_n^2}}.$$

⇒ Changes in the scale of $X_{ni}$ also affect $\hat\sigma_n$, and the normalized Euclidean distance does not change.
![Page 62: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/62.jpg)
Matching: Distance Metric

Another popular scale-invariant distance is the Mahalanobis distance:

$$\|X_i - X_j\| = \sqrt{(X_i - X_j)'\hat\Sigma_X^{-1}(X_i - X_j)},$$

where $\hat\Sigma_X$ is the sample variance-covariance matrix of $X$.

We can also define arbitrary distances:

$$\|X_i - X_j\| = \sqrt{\sum_{n=1}^{k}\omega_n \cdot (X_{ni} - X_{nj})^2}$$

(with all $\omega_n \ge 0$) so that we assign large $\omega_n$'s to those covariates that we want to match particularly well.
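As a sketch (not from the slides; function names are illustrative), the three distances can be written with NumPy, and the scale-invariance claims are easy to check numerically:

```python
import numpy as np

def norm_euclidean(xi, xj, X):
    """Normalized Euclidean distance: V-hat = diag of sample variances."""
    v = X.var(axis=0, ddof=1)          # per-covariate sample variances
    d = xi - xj
    return float(np.sqrt(np.sum(d ** 2 / v)))

def mahalanobis(xi, xj, X):
    """Mahalanobis distance: inverse sample variance-covariance matrix."""
    S = np.cov(X, rowvar=False)
    d = xi - xj
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

def weighted_distance(xi, xj, w):
    """Arbitrary weighted distance with user-chosen weights w_n >= 0."""
    return float(np.sqrt(np.sum(w * (xi - xj) ** 2)))
```

Rescaling a covariate (say, measuring income in dollars instead of thousands) changes the plain Euclidean distance but leaves the normalized Euclidean and Mahalanobis distances unchanged, because the variances and covariances rescale with the data.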
![Page 63: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/63.jpg)
Matching and the Curse of Dimensionality

Matching discrepancies $\|X_i - X_{j(i)}\|$ tend to increase with $k$, the dimension of $X$

Matching discrepancies converge to zero. But they converge very slowly if $k$ is large

Mathematically, it can be shown that $\|X_i - X_{j(i)}\|$ converges to zero at the same rate as $1/N^{1/k}$

It is difficult to find good matches in large dimensions: you need many observations if $k$ is large
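A small simulation (illustrative, not from the slides) makes the point concrete: with $n$ points uniform on the unit cube $[0,1]^k$, the typical nearest-neighbor distance grows sharply with the dimension $k$, mirroring the $N^{-1/k}$ rate.

```python
import math
import random

def mean_nn_distance(k, n=500, trials=100, seed=0):
    """Average distance from a random query point to its nearest
    neighbor among n uniform draws on the unit cube [0, 1]^k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pts = [[rng.random() for _ in range(k)] for _ in range(n)]
        q = [rng.random() for _ in range(k)]
        total += min(math.dist(q, p) for p in pts)
    return total / trials
```

With n = 500 the average discrepancy is tiny for k = 1 but already substantial for k = 10: more matching covariates demand far more observations for matches of comparable quality.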
![Page 64: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/64.jpg)
Matching: Bias Problem

$$\hat\alpha_{ATET} = \frac{1}{N_1}\sum_{D_i=1}(Y_i - Y_{j(i)}),$$

where $X_i \simeq X_{j(i)}$ and $D_{j(i)} = 0$. Let

$$\mu_0(x) = E[Y|X = x, D = 0] = E[Y_0|X = x],$$
$$\mu_1(x) = E[Y|X = x, D = 1] = E[Y_1|X = x],$$
$$Y_i = \mu_{D_i}(X_i) + \varepsilon_i.$$

Then,

$$\hat\alpha_{ATET} = \frac{1}{N_1}\sum_{D_i=1}\Big((\mu_1(X_i) + \varepsilon_i) - (\mu_0(X_{j(i)}) + \varepsilon_{j(i)})\Big) = \frac{1}{N_1}\sum_{D_i=1}\big(\mu_1(X_i) - \mu_0(X_{j(i)})\big) + \frac{1}{N_1}\sum_{D_i=1}(\varepsilon_i - \varepsilon_{j(i)}).$$
![Page 65: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/65.jpg)
Matching: Bias Problem

$$\hat\alpha_{ATET} - \alpha_{ATET} = \frac{1}{N_1}\sum_{D_i=1}\big(\mu_1(X_i) - \mu_0(X_{j(i)}) - \alpha_{ATET}\big) + \frac{1}{N_1}\sum_{D_i=1}(\varepsilon_i - \varepsilon_{j(i)}).$$

Therefore,

$$\hat\alpha_{ATET} - \alpha_{ATET} = \frac{1}{N_1}\sum_{D_i=1}\big(\mu_1(X_i) - \mu_0(X_i) - \alpha_{ATET}\big) + \frac{1}{N_1}\sum_{D_i=1}(\varepsilon_i - \varepsilon_{j(i)}) + \frac{1}{N_1}\sum_{D_i=1}\big(\mu_0(X_i) - \mu_0(X_{j(i)})\big).$$
![Page 66: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/66.jpg)
Matching: Bias Problem

We hope that we can apply a Central Limit Theorem and

$$\sqrt{N_1}\,(\hat\alpha_{ATET} - \alpha_{ATET})$$

converges to a Normal distribution with zero mean. However,

$$E\big[\sqrt{N_1}\,(\hat\alpha_{ATET} - \alpha_{ATET})\big] = E\big[\sqrt{N_1}\,\big(\mu_0(X_i) - \mu_0(X_{j(i)})\big)\,\big|\,D = 1\big].$$

Now, if $k$ is large:

⇒ The difference between $X_i$ and $X_{j(i)}$ converges to zero very slowly

⇒ The difference $\mu_0(X_i) - \mu_0(X_{j(i)})$ converges to zero very slowly

⇒ $E[\sqrt{N_1}\,(\mu_0(X_i) - \mu_0(X_{j(i)}))\,|\,D = 1]$ may not converge to zero!

⇒ $E[\sqrt{N_1}\,(\hat\alpha_{ATET} - \alpha_{ATET})]$ may not converge to zero!

⇒ Bias is often an issue when we match in many dimensions
![Page 67: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/67.jpg)
Matching: Solutions to the Bias Problem

The bias of the matching estimator is caused by large matching discrepancies $\|X_i - X_{j(i)}\|$. However:

1. The matching discrepancies are observed. We can always check in the data how well we are matching the covariates.

2. For $\hat\alpha_{ATET}$ we can always make the matching discrepancies small by using a large reservoir of untreated units to select the matches (that is, by making $N_0$ large).

3. If the matching discrepancies are large, so we are worried about potential biases, we can apply bias correction techniques.

4. Partial solution: Propensity score methods (to come).
![Page 68: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/68.jpg)
Matching with Bias Correction

Each treated observation contributes

$$\mu_0(X_i) - \mu_0(X_{j(i)})$$

to the bias.

Bias-corrected matching:

$$\hat\alpha_{ATET}^{BC} = \frac{1}{N_1}\sum_{D_i=1}\Big((Y_i - Y_{j(i)}) - \big(\hat\mu_0(X_i) - \hat\mu_0(X_{j(i)})\big)\Big),$$

where $\hat\mu_0(x)$ is an estimate of $E[Y|X = x, D = 0]$ (e.g., OLS).

Under some conditions, the bias correction eliminates the bias of the matching estimator without affecting the variance.
![Page 69: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/69.jpg)
Bias Adjustment in Matched Data

| unit $i$ | Potential Outcome under Treatment $Y_{1i}$ | Potential Outcome under Control $Y_{0i}$ | $D_i$ | $X_i$ |
|---|---|---|---|---|
| 1 | 10 | 8 | 1 | 3 |
| 2 | 4 | 1 | 1 | 1 |
| 3 | 10 | 9 | 1 | 10 |
| 4 | | 8 | 0 | 4 |
| 5 | | 1 | 0 | 0 |
| 6 | | 9 | 0 | 8 |

$$\hat\alpha_{ATET} = (10 - 8)/3 + (4 - 1)/3 + (10 - 9)/3 = 2$$

For the bias correction, estimate $\hat\mu_0(x) = \hat\beta_0 + \hat\beta_1 x = 2 + x$.

$$\hat\alpha_{ATET}^{BC} = \Big((10 - 8) - (\hat\mu_0(3) - \hat\mu_0(4))\Big)/3 + \Big((4 - 1) - (\hat\mu_0(1) - \hat\mu_0(0))\Big)/3 + \Big((10 - 9) - (\hat\mu_0(10) - \hat\mu_0(8))\Big)/3 = 1.33$$
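The example can be verified numerically. This illustrative Python sketch (not part of the slides) fits $\hat\mu_0(x) = \hat\beta_0 + \hat\beta_1 x$ by OLS on the three control units and then applies both formulas:

```python
# (X, Y) for the control units (D = 0)
controls = [(4.0, 8.0), (0.0, 1.0), (8.0, 9.0)]
# matched pairs: (treated X, treated Y, matched control X, matched control Y)
pairs = [(3.0, 10.0, 4.0, 8.0), (1.0, 4.0, 0.0, 1.0), (10.0, 10.0, 8.0, 9.0)]

# simple-regression OLS on the controls: slope, then intercept
xbar = sum(x for x, _ in controls) / len(controls)
ybar = sum(y for _, y in controls) / len(controls)
b1 = (sum((x - xbar) * (y - ybar) for x, y in controls)
      / sum((x - xbar) ** 2 for x, _ in controls))
b0 = ybar - b1 * xbar            # gives mu0_hat(x) = 2 + x, as on the slide
mu0 = lambda x: b0 + b1 * x

att = sum(y1 - y0 for _, y1, _, y0 in pairs) / len(pairs)
att_bc = sum((y1 - y0) - (mu0(x1) - mu0(x0))
             for x1, y1, x0, y0 in pairs) / len(pairs)
```

The plain matching estimate is 2; correcting each pair for the regression-predicted gap between $X_i$ and $X_{j(i)}$ brings it down to $4/3 \approx 1.33$.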
![Page 72: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/72.jpg)
Matching Bias: Implications for Practice

Bias arises because of the effect of large matching discrepancies on $\mu_0(X_i) - \mu_0(X_{j(i)})$. To minimize matching discrepancies:

1. Use a small $M$ (e.g., $M = 1$). Large values of $M$ produce large matching discrepancies.

2. Use matching with replacement. Because matching with replacement can use untreated units as a match more than once, matching with replacement produces smaller matching discrepancies than matching without replacement.

3. Try to match covariates with a large effect on $\mu_0(\cdot)$ particularly well.
![Page 73: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/73.jpg)
Matching Estimators: Large Sample Distribution ($\hat\alpha_{ATET}$)

Matching estimators have a Normal distribution in large samples (provided that the bias is small):

$$\sqrt{N_1}\,(\hat\alpha_{ATET} - \alpha_{ATET}) \xrightarrow{d} N(0, \sigma^2_{ATET}).$$

For matching without replacement, the “usual” variance estimator,

$$\hat\sigma^2_{ATET} = \frac{1}{N_1}\sum_{D_i=1}\left(Y_i - \frac{1}{M}\sum_{m=1}^{M} Y_{j_m(i)} - \hat\alpha_{ATET}\right)^2,$$

is valid.
![Page 74: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/74.jpg)
Matching Estimators: Large Sample Distribution ($\hat\alpha_{ATET}$)

For matching with replacement:

$$\hat\sigma^2_{ATET} = \frac{1}{N_1}\sum_{D_i=1}\left(Y_i - \frac{1}{M}\sum_{m=1}^{M} Y_{j_m(i)} - \hat\alpha_{ATET}\right)^2 + \frac{1}{N_1}\sum_{D_i=0}\left(\frac{K_i(K_i - 1)}{M^2}\right)\widehat{\mathrm{var}}(\varepsilon_i|X_i, D_i = 0),$$

where $K_i$ is the number of times observation $i$ is used as a match.

$\widehat{\mathrm{var}}(Y_i|X_i, D_i = 0)$ can be estimated also by matching. For example, take two observations with $D_i = D_j = 0$ and $X_i \simeq X_j$; then

$$\widehat{\mathrm{var}}(Y_i|X_i, D_i = 0) = \frac{(Y_i - Y_j)^2}{2}$$

is an unbiased estimator of $\mathrm{var}(\varepsilon_i|X_i, D_i = 0)$.

The bootstrap does not work!
![Page 75: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/75.jpg)
Matching (with replacement) in Stata: nnmatch

Basic usage:

    nnmatch Y D X1 X2 ..., robust(1) pop

M ≠ 1 (default is M = 1):

    nnmatch Y D X1 X2 ..., m(2) robust(1) pop

ATT (default is ATE):

    nnmatch Y D X1 X2 ..., tc(att) robust(1) pop

Mahalanobis metric (default is normalized Euclidean):

    nnmatch Y D X1 X2 ..., metric(maha) robust(1) pop

Exact matches for some covariates:

    nnmatch Y D X1 X2 ..., exact(Xexact) robust(1) pop

Bias correction (default is no bias correction):

    nnmatch Y D X1 X2 ..., biasadj(bias) robust(1) pop
![Page 81: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/81.jpg)
Matching as a conditioning strategy

Propensity score matching
![Page 82: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/82.jpg)
Propensity Score

• Overview:
  – Motivation: what do we use a propensity score for?
  – Constructing the propensity score
  – Implementing the propensity score in Stata
• I'll post these slides on the web
![Page 83: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/83.jpg)
Motivation: a Case in which the Propensity Score is Useful

• This is just an illustrative example:
• Suppose that you want to evaluate the effect of a scholarship program for secondary school on years of schooling.
• Suppose every primary school graduate is eligible.
• You have data on every child, including test scores, family income, age, gender, etc.
• Scholarships are awarded based on some combination of test scores, family income, gender, etc., but you don't know the exact formula.
![Page 84: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/84.jpg)
Motivation (cont.)

• If scholarships were assigned completely at random, we could just compare the treatment and control groups (randomized experiment)
• In our example, assume you know that children with higher test scores are more likely to get the scholarship, but you don't know how important this and other factors are; you just know that the decision is based on information you have and some randomness.
• What can you do with this information?
![Page 85: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/85.jpg)
Motivation (cont.)

• In principle, you could just run the following regression:

  years of schooling$_i$ = $\beta_0 + \beta_1 I_i^{scholarship} + \beta_2 X_i + \varepsilon_i$

  where $I_i^{scholarship}$ is a dummy variable for receiving the scholarship and $X_i$ are all the variables that you think affect the probability of receiving a scholarship ($X_i$ can be a whole vector of things); $\varepsilon_i$ is the error term
• This works only if you know that the probability of getting a scholarship is just a linear function of X.
• But we may have no idea how the selection depends on X. For instance, they may use discrete cutoffs rather than a linear function
![Page 86: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/86.jpg)
Motivation (cont.)

• Suppose your variables are not continuous, but they are categories (somewhat arbitrarily).
  – E.g. family income above or below $50 per week, scores above or below the mean, sex, age, etc.
• Now, you could put in dummy variables for each category and interactions between all dummies. This would distinguish every group formed by the categories.
• Or you could run separate regressions for each group
  – This is more flexible since it allows the effect of the scholarship to differ by group.
• These methods are in principle correct, but they are only feasible if you have a lot of data and few categories.
![Page 87: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/87.jpg)
Constructing the Propensity Score

• Estimation based on the propensity score can deal with these kinds of situations.
• For it to work, you need to have selection into the treatment (in our case the scholarship) that is based on observables (observed covariates).
• The following gives a brief overview of how the propensity score is constructed.
• In practice, you can download a canned Stata command that will do all of this for you.
![Page 88: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/88.jpg)
Definition and General Idea

• Definition: The propensity score is the probability of receiving treatment (in our case a scholarship), conditional on the covariates (in our case $X_i$).
• The idea is to compare individuals who, based on observables, have a very similar probability of receiving treatment (similar propensity score), but one of them received treatment and the other didn't.
• We then think that all the difference in the outcome variable is due to the treatment.
![Page 89: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/89.jpg)
First Stage

• In the first stage of a propensity score estimation, you find the propensity score.
• To do this, you run a regression (probit or logit) that has the probability of receiving treatment on the left-hand side, and the covariates that determine selection into the treatment on the right-hand side:

  $I_i^{treat} = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots\; (+\,\gamma_1 X_{i1}^2 + \gamma_2 X_{i2}^2 \ldots) + \varepsilon_i$

• The propensity score is just the predicted value $\hat{I}_i^{treat}$ that you get from this regression.
• You start with a simple specification (e.g. just linear terms). Then you follow the following algorithm to decide whether this specification is good enough:
Algorithm

1) Sort your data by the propensity score and divide it into blocks (groups) of observations with similar propensity scores.
2) Within each block, test (using a t-test) whether the means of the covariates are equal in the treatment and control group.
   If so → stop, you're done with the first stage
3) If a particular block has one or more unbalanced covariates, divide that block into finer blocks and re-evaluate
4) If a particular covariate is unbalanced for multiple blocks, modify the initial logit or probit equation by including higher-order terms and/or interactions with that covariate and start again.
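The balance-checking step can be sketched as follows (illustrative Python with my own function names; the slides do this with a canned Stata command): within each score block, a Welch two-sample t-statistic compares the covariate means of treated and controls.

```python
import math

def t_stat(a, b):
    """Welch two-sample t statistic for the difference in means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def unbalanced_blocks(rows, cuts, crit=1.96):
    """rows: (pscore, D, x) tuples; cuts: block boundaries on the score.
    Returns the indices of blocks where the covariate is unbalanced."""
    bad = []
    for k in range(len(cuts) - 1):
        lo, hi = cuts[k], cuts[k + 1]
        xt = [x for p, d, x in rows if lo <= p < hi and d == 1]
        xc = [x for p, d, x in rows if lo <= p < hi and d == 0]
        if len(xt) > 1 and len(xc) > 1 and abs(t_stat(xt, xc)) > crit:
            bad.append(k)
    return bad
```

A block flagged here would be split into finer blocks (step 3) or, if the same covariate fails in several blocks, the score specification would be enriched with higher-order terms (step 4).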
![Page 91: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/91.jpg)
Second Stage

• In the second stage, we look at the effect of treatment on the outcome (in our example, of getting the scholarship on years of schooling), using the propensity score.
• Once you have determined your propensity score with the procedure above, there are several ways to use it. I'll present two of them (canned version in Stata for both):

• Stratifying on the propensity score
  – Divide the data into blocks based on the propensity score (blocks are determined with the algorithm). Run the second stage regression within each block. Calculate the weighted mean of the within-block estimates to get the average treatment effect.

• Matching on the propensity score
  – Match each treatment observation with one or more control observations, based on similar propensity scores. You then include a dummy for each matched group, which controls for everything that is common within that group.
![Page 92: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/92.jpg)
Implementation in Stata

• Search pscore in help
• Click on the word “pscore”
• Download the command
• See Stata help for how to implement it
• First stage:

    pscore treat X1 X2 X3 ..., pscore(scorename)

• Second stage: attr (for matching) or atts (for stratifying):

    attr outcome treat, pscore(scorename)
![Page 93: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/93.jpg)
General Remarks

• The propensity score approach becomes more appropriate the more we have randomness determining who gets treatment (closer to a randomized experiment).
• The propensity score doesn't work very well if almost everyone with a high propensity score gets treatment and almost everyone with a low score doesn't: we need to be able to compare people with similar propensities who did and did not get treatment.
• The propensity score approach doesn't correct for unobservable variables that affect whether observations receive treatment.
![Page 94: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/94.jpg)
Propensity Score Methods

Definition
The propensity score is defined as the selection probability conditional on the confounding variables: $p(X) = P(D = 1|X)$

Identification Assumption

1. $(Y_1, Y_0) \perp\!\!\!\perp D \,|\, X$ (selection on observables)
2. $0 < \Pr(D = 1|X) < 1$ (common support)

Identification Result (Rosenbaum and Rubin, 1983)

Selection on observables implies:

$$(Y_1, Y_0) \perp\!\!\!\perp D \,|\, p(X)$$

⇒ conditioning on the propensity score is enough to have independence between the treatment indicator and the potential outcomes

⇒ substantial dimension reduction in the matching variables!
![Page 95: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/95.jpg)
Propensity Score Methods

Proof.
Assume that $(Y_1, Y_0) \perp\!\!\!\perp D \,|\, X$. Then:

$$P(D = 1|Y_1, Y_0, p(X)) = E[D|Y_1, Y_0, p(X)]$$
$$= E\big[E[D|Y_1, Y_0, X]\,\big|\,Y_1, Y_0, p(X)\big]$$
$$= E\big[E[D|X]\,\big|\,Y_1, Y_0, p(X)\big]$$
$$= E[p(X)|Y_1, Y_0, p(X)]$$
$$= p(X)$$

Using a similar argument, we obtain

$$P(D = 1|p(X)) = E[D|p(X)] = E\big[E[D|X]\,\big|\,p(X)\big] = E[p(X)|p(X)] = p(X)$$

⇒ $P(D = 1|Y_1, Y_0, p(X)) = P(D = 1|p(X))$

⇒ $(Y_1, Y_0) \perp\!\!\!\perp D \,|\, p(X)$
![Page 96: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/96.jpg)
Matching on the Propensity Score

Corollary
If $(Y_1, Y_0) \perp\!\!\!\perp D|X$, then

$$E[Y_1 - Y_0|p(X)] = E[Y|D = 1, p(X)] - E[Y|D = 0, p(X)]$$

Suggests a two-step procedure to estimate causal effects under selection on observables:

1. estimate the propensity score $p(X) = P(D = 1|X)$ (e.g., using logit or probit regression)
2. do matching or subclassification on the estimated propensity score

Hugely popular method to estimate treatment effects. However, a valid method to calculate standard errors was not known until (very) recently.
![Page 97: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/97.jpg)
Propensity Score: Balancing Property

Because the propensity score $p(X)$ is a function of $X$:

$$\Pr(D = 1|X, p(X)) = \Pr(D = 1|X) = p(X)$$

⇒ conditional on $p(X)$, the probability that $D = 1$ does not depend on $X$.

⇒ $D$ and $X$ are independent conditional on $p(X)$:

$$D \perp\!\!\!\perp X \,|\, p(X).$$

So we obtain the balancing property of the propensity score:

$$P(X|D = 1, p(X)) = P(X|D = 0, p(X)),$$

⇒ conditional on the propensity score, the distribution of the covariates is the same for treated and non-treated.

We can use this to check if our estimated propensity score actually produces balance:

$$P(X|D = 1, \hat{p}(X)) = P(X|D = 0, \hat{p}(X))$$
![Page 98: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/98.jpg)
Weighting on the Propensity Score

Proposition
If $(Y_1, Y_0) \perp\!\!\!\perp D|X$, then

$$\alpha_{ATE} = E[Y_1 - Y_0] = E\left[Y \cdot \frac{D - p(X)}{p(X)\cdot(1 - p(X))}\right]$$

$$\alpha_{ATET} = E[Y_1 - Y_0|D = 1] = \frac{1}{P(D = 1)}\cdot E\left[Y \cdot \frac{D - p(X)}{1 - p(X)}\right]$$

Proof.

$$E\left[Y\,\frac{D - p(X)}{p(X)(1 - p(X))}\,\middle|\,X\right] = E\left[\frac{Y}{p(X)}\,\middle|\,X, D = 1\right]p(X) + E\left[\frac{-Y}{1 - p(X)}\,\middle|\,X, D = 0\right](1 - p(X))$$

$$= E[Y|X, D = 1] - E[Y|X, D = 0]$$

And the results follow from integration over $P(X)$ and $P(X|D = 1)$.
![Page 99: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/99.jpg)
Weighting on the Propensity Score

$$\alpha_{ATE} = E[Y_1 - Y_0] = E\left[Y \cdot \frac{D - p(X)}{p(X)\cdot(1 - p(X))}\right]$$

$$\alpha_{ATET} = E[Y_1 - Y_0|D = 1] = \frac{1}{P(D = 1)}\cdot E\left[Y \cdot \frac{D - p(X)}{1 - p(X)}\right]$$

The analogy principle suggests a two-step estimator:

1. Estimate the propensity score: $\hat{p}(X)$
2. Use the estimated score to produce analog estimators:

$$\hat\alpha_{ATE} = \frac{1}{N}\sum_{i=1}^{N} Y_i \cdot \frac{D_i - \hat{p}(X_i)}{\hat{p}(X_i)\cdot(1 - \hat{p}(X_i))}$$

$$\hat\alpha_{ATET} = \frac{1}{N_1}\sum_{i=1}^{N} Y_i \cdot \frac{D_i - \hat{p}(X_i)}{1 - \hat{p}(X_i)}$$
![Page 100: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/100.jpg)
Weighting on the Propensity Score

$$\hat\alpha_{ATE} = \frac{1}{N}\sum_{i=1}^{N} Y_i \cdot \frac{D_i - \hat{p}(X_i)}{\hat{p}(X_i)\cdot(1 - \hat{p}(X_i))}$$

$$\hat\alpha_{ATET} = \frac{1}{N_1}\sum_{i=1}^{N} Y_i \cdot \frac{D_i - \hat{p}(X_i)}{1 - \hat{p}(X_i)}$$

Standard errors:

We need to adjust the s.e.'s for first-step estimation of $p(X)$

Parametric first-step: Newey & McFadden (1994)

Non-parametric first-step: Newey (1994)

Or bootstrap the entire two-step procedure.
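The two-step weighting estimator can be sketched in a few lines. This illustrative Python simulation (not from the slides) plugs in the true propensity score to keep the example self-contained, whereas in practice $\hat{p}(X)$ comes from a first-step logit or probit; with a constant treatment effect of 2, both analog estimators should land near 2.

```python
import random

random.seed(1)
N = 20000
data = []
for _ in range(N):
    x = random.random()
    p = 0.25 + 0.5 * x          # propensity score, bounded away from 0 and 1
    d = 1 if random.random() < p else 0
    y = 1.0 + 2.0 * d + 3.0 * x + random.gauss(0.0, 1.0)   # true effect = 2
    data.append((y, d, p))

# analog estimators from the slide (here with the true score in place of p-hat)
ate = sum(y * (d - p) / (p * (1.0 - p)) for y, d, p in data) / N
n1 = sum(d for _, d, _ in data)
atet = sum(y * (d - p) / (1.0 - p) for y, d, p in data) / n1
```

Note how the ATET weight $\,(D - p)/(1 - p)\,$ leaves treated observations unweighted while reweighting controls to look like the treated population.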
![Page 101: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/101.jpg)
Regression Estimators

Can we use regression estimators?

1. Least squares as an approximation
2. Least squares as a weighting scheme
3. Estimators of average treatment effects based on nonparametric regression
![Page 102: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/102.jpg)
Least Squares as an Approximation

$(Y_1, Y_0) \perp\!\!\!\perp D|X$ implies that the conditional expectation $E[Y|D, X]$ can be interpreted as a conditional causal response function:

$$E[Y|D = 1, X] = E[Y_1|D = 1, X] = E[Y_1|X],$$
$$E[Y|D = 0, X] = E[Y_0|D = 0, X] = E[Y_0|X]$$

So $E[Y|D, X]$ provides average potential responses with and without the treatment.

The functional form of $E[Y|D, X]$ is typically unknown, but Least Squares provides a well-defined approximation.
![Page 103: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/103.jpg)
Least Squares as an Approximation
Linear OLS is:

(α̂, β̂) = argmin_{a,b} (1/N) Σ_{i=1}^{N} ( Yi − a·Di − Xi′b )²

As a result, (α̂, β̂) converges to

(α, β) = argmin_{a,b} E[ ( Y − a·D − X′b )² ]

It can be shown that the (α, β) that solve this minimization problem also solve:

(α, β) = argmin_{a,b} E[ ( E[Y | D, X] − (a·D + X′b) )² ]

⇒ Even if E[Y | D, X] is nonlinear, α̂D + X′β̂ provides an estimate of a well-defined approximation to E[Y | D, X].
⇒ The result is also true for nonlinear specifications.
⇒ But the approximation could be a bad one if the regression is severely misspecified.
![Page 104: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/104.jpg)
Least Squares as a Weighting Scheme
Suppose that the covariates take on a finite number of values: x1, x2, ..., xK. Then, from the subclassification section we know that:

αATE = Σ_{k=1}^{K} ( E[Y | D = 1, X = xk] − E[Y | D = 0, X = xk] ) · Pr(X = xk)

Now, suppose that you run a regression that is saturated in X: a regression including one dummy variable, Zk, for each possible value of X:

Y = αD + Σ_{k=1}^{K} Zk·βk + u,

where

Zk = 1 if X = xk,  Zk = 0 if X ≠ xk
![Page 105: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/105.jpg)
Least Squares as a Weighting Scheme

It can be shown that the coefficient α̂OLS of the saturated regression converges to:

αOLS = Σ_{k=1}^{K} ( E[Y | D = 1, X = xk] − E[Y | D = 0, X = xk] ) · wk

where

wk = var(D | X = xk) · Pr(X = xk) / Σ_{k=1}^{K} var(D | X = xk) · Pr(X = xk).

Strata k with a higher

var(D | X = xk) = Pr(D = 1 | X = xk) · (1 − Pr(D = 1 | X = xk))

(i.e., propensity scores close to 0.5) receive higher weight. Strata with propensity scores close to 0 or 1 receive lower weight.

OLS down-weights strata where the average causal effects are less precisely estimated.
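The weighting result is easy to check numerically. The sketch below (illustrative names, simulated data) runs the saturated regression and compares the coefficient on D with the variance-weighted average of the stratum-level contrasts; with heterogeneous effects, both differ from αATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.integers(0, 3, n)                      # K = 3 strata
p = np.array([0.2, 0.5, 0.8])[x]               # Pr(D = 1 | X = xk)
d = (rng.random(n) < p).astype(float)
tau = np.array([0.0, 1.0, 5.0])[x]             # stratum-specific effects; ATE = 2
y = tau * d + x + rng.normal(size=n)

# Saturated OLS: Y on D plus one dummy per value of X (no intercept)
Z = np.column_stack([d] + [(x == k).astype(float) for k in range(3)])
alpha_ols = np.linalg.lstsq(Z, y, rcond=None)[0][0]

# Variance-weighted average of within-stratum differences in means
num = den = 0.0
for k in range(3):
    m = x == k
    diff = y[m & (d == 1)].mean() - y[m & (d == 0)].mean()
    w = d[m].var() * m.mean()                  # var(D | X = xk) Pr(X = xk)
    num, den = num + w * diff, den + w
alpha_weighted = num / den

# alpha_ols and alpha_weighted coincide (an exact finite-sample identity),
# and both sit below the ATE of 2 because the extreme-propensity strata
# are down-weighted relative to the p = 0.5 stratum.
```

The agreement between the two numbers is exact in any sample, not just in the limit; the gap between them and the ATE is the weighting distortion described above.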
![Page 106: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/106.jpg)
Estimators based on Nonparametric Regression
αATE = ∫ ( E[Y | X, D = 1] − E[Y | X, D = 0] ) dP(X)

αATET = ∫ ( E[Y | X, D = 1] − E[Y | X, D = 0] ) dP(X | D = 1)

This suggests the following estimators:

α̂ATE = (1/N) Σ_{i=1}^{N} ( μ̂1(Xi) − μ̂0(Xi) )

α̂ATET = (1/N1) Σ_{Di=1} ( μ̂1(Xi) − μ̂0(Xi) ),

where μ̂1(·) and μ̂0(·) are nonparametric estimators of E[Y | X, D = 1] and E[Y | X, D = 0], respectively.

But estimating these regressions nonparametrically is difficult if the dimension of X is large (curse of dimensionality again!).
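A minimal sketch of these plug-in estimators, using k-nearest-neighbor averages as the (illustrative) nonparametric smoothers μ̂1 and μ̂0:

```python
import numpy as np

def knn_imputation(y, d, x, k=10):
    """Plug-in ATE/ATET with k-NN regressions as mu1_hat, mu0_hat.

    k-NN is just one possible nonparametric smoother; any consistent
    estimator of E[Y | X, D = w] could be substituted.
    """
    x = np.atleast_2d(x.T).T                   # ensure shape (n, dim)

    def mu(x_arm, y_arm):
        # for every unit, average y over the k nearest units in the arm
        dist = np.linalg.norm(x[:, None, :] - x_arm[None, :, :], axis=2)
        idx = np.argsort(dist, axis=1)[:, :k]
        return y_arm[idx].mean(axis=1)

    gap = mu(x[d == 1], y[d == 1]) - mu(x[d == 0], y[d == 0])
    return gap.mean(), gap[d == 1].mean()      # (ATE, ATET)
```

With one or two covariates this works well; as dim(X) grows, the k nearest neighbors drift far away and the bias blows up, which is exactly the curse-of-dimensionality point made above.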
![Page 107: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/107.jpg)
Estimators based on Nonparametric Regression
It is enough, however, to adjust for the propensity score. So we can proceed in two steps:

1 Estimate the propensity score, p̂(X).
2 Calculate:

α̂ATE = (1/N) Σ_{i=1}^{N} ( ν̂1(p̂(Xi)) − ν̂0(p̂(Xi)) )

α̂ATET = (1/N1) Σ_{Di=1} ( ν̂1(p̂(Xi)) − ν̂0(p̂(Xi)) ),

where ν̂1(·) and ν̂0(·) are nonparametric estimators of E[Y | p(X), D = 1] and E[Y | p(X), D = 0], respectively.
![Page 108: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/108.jpg)
Dehejia and Wahba (1999)
![Page 109: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/109.jpg)
Abadie and Imbens (2011)
Journal of Business & Economic Statistics, January 2011
tistics for the experimental treatment group. The second pair of columns presents summary statistics for the experimental controls. The third pair of columns presents summary statistics for the nonexperimental comparison group constructed from the PSID. The last two columns present normalized differences between the covariate distributions, between the experimental treated and controls, and between the experimental treated and the PSID comparison group, respectively. These normalized differences are calculated as

nor-dif = (X̄1 − X̄0) / sqrt( (S0² + S1²) / 2 ),

where X̄w = Σ_{i: Wi = w} Xi / Nw and Sw² = Σ_{i: Wi = w} (Xi − X̄w)² / (Nw − 1). Note that this differs from the t-statistic for the test of the null hypothesis that E[X | W = 0] = E[X | W = 1], which will be

t-stat = (X̄1 − X̄0) / sqrt( S0²/N0 + S1²/N1 ).

The normalized difference provides a scale-free measure of the difference in the location of the two distributions, and is useful for assessing the degree of difficulty in adjusting for differences in covariates.
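The two statistics behave very differently as the sample grows, which is why the normalized difference, not the t-statistic, is the useful diagnostic here. A small sketch (illustrative names):

```python
import numpy as np

def normalized_difference(x1, x0):
    """Scale-free measure of covariate imbalance; does not grow with N."""
    return (x1.mean() - x0.mean()) / np.sqrt(
        (x0.var(ddof=1) + x1.var(ddof=1)) / 2)

def t_statistic(x1, x0):
    """t-stat for equality of means; grows like sqrt(N) for a fixed gap."""
    return (x1.mean() - x0.mean()) / np.sqrt(
        x0.var(ddof=1) / len(x0) + x1.var(ddof=1) / len(x1))

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, 10_000)
x1 = rng.normal(0.5, 1.0, 10_000)   # a fixed half-sd location shift
# normalized_difference stays near 0.5 at any sample size,
# while t_statistic is already very large at this N.
```

Doubling the sample size leaves the normalized difference unchanged in expectation but inflates the t-statistic by sqrt(2), so a "significant" t says little about how hard the covariate adjustment will be.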
Panel A contains the results for pretreatment variables and Panel B for outcomes. Notice the large differences in background characteristics between the program participants and the PSID sample. This is what makes drawing causal inferences from comparisons between the PSID sample and the treatment group a tenuous task. From Panel B, we can obtain an unbiased estimate of the effect of the NSW program on earnings in 1978 by comparing the averages for the experimental treated and controls, 6.35 − 4.55 = 1.80, with a standard error of 0.67 (earnings are measured in thousands of dollars). Using a normal approximation to the limiting distribution of the effect of the program on earnings in 1978, we obtain a 95% confidence interval, which is [0.49, 3.10].

Table 2 presents estimates of the causal effect of the NSW program on earnings using various matching, regression, and weighting estimators. Panel A reports estimates for the experimental data (treated and controls). Panel B reports estimates based on the experimental treated and the PSID comparison group. The first set of rows in each case reports matching estimates for M equal to 1, 4, 16, 64, and 2490 (the size of the PSID comparison group). The matching estimates include simple matching with no bias adjustment and bias-adjusted matching, first by matching on the covariates and then by matching on the estimated propensity score. The covariate matching estimators use the matrix Ane (the diagonal matrix with inverse sample variances on the diagonal) as the distance measure. Because we are focused on the average effect for the treated, the bias correction only requires an estimate of µ0(Xi). We estimate this regression function using linear regression on all nine pretreatment covariates in Table 1, Panel A, but do not include any higher order terms or interactions, with only the control units that are used as a match [the units j such that Wj = 0 and
Table 2. Experimental and nonexperimental estimates for the NSW data

                                   M = 1         M = 4         M = 16        M = 64        M = 2490
                                   Est.  (SE)    Est.  (SE)    Est.  (SE)    Est.  (SE)    Est.   (SE)

Panel A: Experimental estimates
Covariate matching                 1.22 (0.84)   1.99 (0.74)   1.75 (0.74)   2.20 (0.70)    1.79 (0.67)
Bias-adjusted cov matching         1.16 (0.84)   1.84 (0.74)   1.54 (0.75)   1.74 (0.71)    1.72 (0.68)
Pscore matching                    1.43 (0.81)   1.95 (0.69)   1.85 (0.69)   1.85 (0.68)    1.79 (0.67)
Bias-adjusted pscore matching      1.22 (0.81)   1.89 (0.71)   1.78 (0.70)   1.67 (0.69)    1.72 (0.68)

Regression estimates
Mean difference                    1.79 (0.67)
Linear                             1.72 (0.68)
Quadratic                          2.27 (0.80)
Weighting on pscore                1.79 (0.67)
Weighting and linear regression    1.69 (0.66)

Panel B: Nonexperimental estimates
Simple matching                    2.07 (1.13)   1.62 (0.91)   0.47 (0.85)  -0.11 (0.75)  -15.20 (0.61)
Bias-adjusted matching             2.42 (1.13)   2.51 (0.90)   2.48 (0.83)   2.26 (0.71)    0.84 (0.63)
Pscore matching                    2.32 (1.21)   2.06 (1.01)   0.79 (1.25)  -0.18 (0.92)   -1.55 (0.80)
Bias-adjusted pscore matching      3.10 (1.21)   2.61 (1.03)   2.37 (1.28)   2.32 (0.94)    2.00 (0.84)

Regression estimates
Mean difference                  -15.20 (0.66)
Linear                             0.84 (0.88)
Quadratic                          3.26 (1.04)
Weighting on pscore                1.77 (0.67)
Weighting and linear regression    1.65 (0.66)

NOTE: The outcome is earnings in 1978 in thousands of dollars.
![Page 110: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/110.jpg)
What to Match On: A Brief Introduction to DAGs
A Directed Acyclic Graph (DAG) is a set of nodes (vertices) anddirected edges (arrows) with no directed cycles.
[Figure: DAG with Z → X → Y, and U → X, U → Y]
Nodes represent variables.
Arrows represent direct causal effects (“direct” means not mediated by other variables in the graph).

A causal DAG must include:
1 All direct causal effects among the variables in the graph
2 All common causes (even if unmeasured) of any pair of variables in the graph
![Page 111: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/111.jpg)
Some DAG Concepts
In the DAG:
[Figure: DAG with Z → X → Y, and U → X, U → Y]
U is a parent of X and Y .
X and Y are descendants of Z .
There is a directed path from Z to Y .
There are two paths from Z to U (but no directed path).
X is a collider on the path Z → X ← U.

X is a noncollider on the path Z → X → Y.
![Page 112: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/112.jpg)
Confounding
Confounding arises when the treatment and the outcome havecommon causes.
[Figure: DAG with X → D, X → Y, and D → Y]

The association between D and Y does not only reflect the causal effect of D on Y.

Confounding creates backdoor paths, that is, paths starting with incoming arrows. In the DAG we can see a backdoor path from D to Y (D ← X → Y).

However, once we “block” the backdoor path by conditioning on the common cause, X, the association between D and Y only reflects the effect of D on Y.

[Figure: the same DAG with X conditioned on]
![Page 113: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/113.jpg)
Blocked Paths
A path is blocked if and only if:
It contains a noncollider that has been conditioned on,

Or, it contains a collider that has not been conditioned on and has no descendants that have been conditioned on.

Examples (where [·] denotes conditioning):

1 Conditioning on a noncollider blocks a path: X → [Z] → Y
2 Conditioning on a collider opens a path: Z → [X] ← Y
3 Not conditioning on a collider (or its descendants) leaves a path blocked: Z → X ← Y
![Page 114: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/114.jpg)
Backdoor Criterion
Suppose that:

D is a treatment,
Y is an outcome,
X1, ..., Xk is a set of covariates.

Is it enough to match on X1, ..., Xk in order to estimate the causal effect of D on Y? Pearl’s Backdoor Criterion provides sufficient conditions.

Backdoor criterion: X1, ..., Xk satisfies the backdoor criterion with respect to (D, Y) if:

1 No element of X1, ..., Xk is a descendant of D.
2 All backdoor paths from D to Y are blocked by X1, ..., Xk.

If X1, ..., Xk satisfies the backdoor criterion with respect to (D, Y), then matching on X1, ..., Xk identifies the causal effect of D on Y.
![Page 115: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/115.jpg)
Implications for Practice

Matching on all common causes is sufficient: There are two backdoor paths from D to Y.

[Figure: DAG in which X1 and X2 are common causes of D and Y, and D → Y]

Conditioning on X1 and X2 blocks the backdoor paths.

Matching may work even if not all common causes are observed: U and X1 are common causes.

[Figure: DAG with unobserved common cause U and observed covariates X1 and X2]

Conditioning on X1 and X2 is enough.
![Page 119: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/119.jpg)
Implications for Practice (cont.)

Matching on an outcome may create bias: There is only one backdoor path from D to Y.

[Figure: DAG in which X1 is a common cause of D and Y, and X2 is a common outcome (collider) of D and Y]

Conditioning on X1 blocks the backdoor path. Conditioning on X2 would open a path!

Matching on all pretreatment covariates is not always the answer: There is one backdoor path and it is closed.

[Figure: DAG with unobserved U1 and U2, where X is a collider on the backdoor path D ← U1 → X ← U2 → Y]

No confounding. Conditioning on X would open a path!
![Page 123: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/123.jpg)
Implications for Practice (cont.)
There may be different sets of conditioning variables that satisfy the backdoor criterion:

[Figure: two copies of a DAG in which the backdoor paths from D to Y through the common causes X1 and X2 also pass through X3]

Conditioning on the common causes, X1 and X2, is sufficient, as always.

But conditioning on X3 only also blocks the backdoor paths.
![Page 124: Matching and Selection on Observables](https://reader030.fdocuments.us/reader030/viewer/2022012809/61bf3866b8eb1e764d18aa33/html5/thumbnails/124.jpg)
DAGs: Final Remarks
The Backdoor Criterion provides a useful graphical test that can be used to select matching variables.

The Backdoor Criterion is only a set of sufficient conditions, but it covers most of the interesting cases. However, there are more general results (Shpitser, VanderWeele, and Robins, 2012).

Applying the backdoor criterion requires knowledge of the DAG. There exist results on the validity of matching under limited knowledge of the DAG (VanderWeele and Shpitser, 2011):

Suppose that there is a set of observed pretreatment covariates and that we know whether they are causes of the treatment and/or the outcome.

Then, as long as there exists a subset of the observed covariates such that matching on them is enough to control for confounding, matching on the subset of the observed covariates that are either a cause of the treatment or a cause of the outcome (or both) is also enough to control for confounding.