Transcript of slides: Sensitivity to Missing Data Assumptions: Theory and An Evaluation of the U.S. Wage Structure (pkline/papers/Slides_KS.pdf)
Sensitivity to Missing Data Assumptions: Theory and An Evaluation of the U.S. Wage Structure
Patrick Kline (UC Berkeley)
Andres Santos (UC San Diego)
September 21, 2012
Patrick Kline UC Berkeley
The Problem

Missing data is ubiquitous in modern economic research.

• Roughly one quarter of earnings observations are missing in the CPS and Census.
• The problem can be worse in proprietary surveys and experiments.

An equally ubiquitous solution: Missing at Random (MAR)

• Justification for imputation procedures in the CPS and Census.
• And for ignoring missingness altogether...

Question: How can we evaluate the sensitivity of conclusions to MAR?

• Want to consider plausible deviations from MAR without presuming much about the selection mechanism.
• And to enable study of sensitivity at different points in the conditional distribution (the tails are likely more sensitive).
The Model

Consider a triplet (Y, X, D) with Y ∈ R, X ∈ R^l, D ∈ {0, 1}:

Y = X′β(τ) + ε,  P(ε ≤ 0 | X) = τ

and D = 1 if Y is observable, D = 0 if Y is missing.

Without Missing Data
• Quantile regression as a summary of the conditional distribution.
• Under misspecification ⇒ best approximation to the true model.

Without MAR
• Point identification fails.
• Need to consider misspecification and partial identification.
Missing Data

Define the conditional distribution functions

Fy|x(c) ≡ P(Y ≤ c | X = x),  Fy|d,x(c) ≡ P(Y ≤ c | D = d, X = x)

Missing at Random
• Rubin (1974). Assume that Fy|x(c) = Fy|1,x(c) for all c ∈ R.

Nonparametric Bounds
• Manski (1994). Exploit that 0 ≤ Fy|0,x(c) ≤ 1 for all c ∈ R.
• The bounds are often uninformative.
• And typically overly conservative.

Between these extremes lies a continuum of selection mechanisms...
Deviation from MAR
Characterize selection as distance between Fy|1,x and Fy|0,x:
d(Fy|1,x, Fy|0,x) ≤ k
• A way of nonparametrically indexing the set of selection mechanisms:

◦ Missing at random corresponds to imposing k = 0.

◦ Manski bounds correspond to imposing k = ∞.

• Allows study of sensitivity to deviations from MAR (e.g., what level of k is necessary to overturn conclusions regarding β(τ)?)
• And in some cases k may be estimated using validation data.
General Outline

Nominal Identified Set
• Find the possible quantiles under the restriction d(Fy|1,x, Fy|0,x) ≤ k.
• Bound β(τ) as a function of (τ, k), allowing for misspecification.

Inference
• Obtain the distribution of estimates of the boundary of the nominal identified set.
• Exploit the distribution as a function of (τ, k) for sensitivity analysis.

Changes in Wage Structure
• Examine changes in the wage structure across Decennial Censuses.
• Measure departures from MAR in matched CPS-SSA data.
Literature Review

Missing Data:
Rubin (1974), Greenlees, Reece & Zieschang (1982), Lillard, Smith & Welch (1986), Manski (1994, 2003), DiNardo, McCrary & Sanbonmatsu (2006), Lee (2008)

Sensitivity Analysis:
Altonji, Elder & Taber (2005); Rosenbaum & Rubin (1983); Rosenbaum (1987, 2002)

Misspecification:
White (1980, 1982), Chamberlain (1994), Angrist, Chernozhukov & Fernandez-Val (2006)

Misspecification and Partial Identification:
Horowitz & Manski (2006), Stoye (2007), Ponomareva & Tamer (2009), Bugni, Canay & Guggenberger (2010)
1 Nominal Identified Set
2 Parametric Approximation
3 Inference
4 Changes in Wage Structure
5 CPS-SSA Analysis
Bounds on Conditional Quantiles

Define the true conditional quantile q(τ|x) and the non-missing probability p(x):

Fy|x(q(τ|x)) = τ,  p(x) ≡ P(D = 1 | X = x)

Goal: Obtain the identified set for q(τ|x) under the hypothetical d(Fy|1,x, Fy|0,x) ≤ k.

For the distance metric we use Kolmogorov-Smirnov (KS), which is given by:

S(F) ≡ sup_{x ∈ X} KS(Fy|1,x, Fy|0,x) = sup_{x ∈ X} sup_{c ∈ R} |Fy|1,x(c) − Fy|0,x(c)|

Comments
• KS provides control over the maximal distance between Fy|1,x and Fy|0,x.
• Nests a wide nonparametric class of potential selection mechanisms.
Choice of Metric

Information is necessarily lost with a scalar index of selection, but...
• Not ruling out selection mechanisms as done in parametric approaches.
• Different levels of selection can be considered at each quantile.
• Easy extension to different weights on covariate realizations.
• A scalar metric is well suited to sensitivity analysis.

What is a big S(F)?

Example: Suppose the data generating process is given by:

(Y, v) ∼ N(0, [1 ρ; ρ 1]),  D = 1{μ + v > 0},

where μ is chosen so that the missing probability is 25% to match the data.

| ρ    | S(F)   | ρ    | S(F)   | ρ    | S(F)   |
|------|--------|------|--------|------|--------|
| 0.10 | 0.0672 | 0.40 | 0.2778 | 0.70 | 0.5165 |
| 0.20 | 0.1355 | 0.50 | 0.3520 | 0.80 | 0.6158 |
| 0.30 | 0.2069 | 0.60 | 0.4304 | 0.90 | 0.7377 |
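Table values like these can be reproduced (up to numerical error) directly from the model: under D = 1{μ + v > 0}, both Fy|1 and Fy|0 are Gaussian sub-CDFs, so S(F) is a one-dimensional search over c. A minimal sketch in Python using only the standard library; the function names are ours, not from the paper's code, and the quadrature/grid settings are illustrative.

```python
from statistics import NormalDist

_ND = NormalDist()

def bvn_cdf(c, b, rho, steps=800):
    """P(Y <= c, v <= b) for a standard bivariate normal with correlation rho,
    via trapezoidal integration of P(Y <= c | v) * phi(v) over v in [-8, b]."""
    s = (1.0 - rho * rho) ** 0.5
    lo = -8.0
    h = (b - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid endpoint weights
        total += w * _ND.pdf(v) * _ND.cdf((c - rho * v) / s)
    return total * h

def ks_selection(rho, miss_prob=0.25):
    """S(F) = sup_c |Fy|1(c) - Fy|0(c)| when D = 1{mu + v > 0},
    with mu chosen so that P(D = 0) = miss_prob."""
    mu = _ND.inv_cdf(1.0 - miss_prob)       # P(v <= -mu) = miss_prob
    p = 1.0 - miss_prob
    best = 0.0
    for j in range(401):                    # grid search over c in [-4, 4]
        c = -4.0 + 8.0 * j / 400.0
        joint = bvn_cdf(c, -mu, rho)        # P(Y <= c, missing)
        f_miss = joint / miss_prob          # Fy|0(c)
        f_obs = (_ND.cdf(c) - joint) / p    # Fy|1(c)
        best = max(best, abs(f_obs - f_miss))
    return best

print(round(ks_selection(0.5), 2))  # should land near the table's 0.3520
```

The same routine gives a quick feel for how much selection a given ρ implies: S(F) rises smoothly from near zero under weak selection toward one as ρ → 1.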
Figure: Missing and Observed Outcome CDFs (observed vs. missing outcome CDFs for ρ = .1 and ρ = .5)
Figure: Distance Between Missing and Observed Outcome CDFs (vertical distance between the CDFs for ρ = .1 and ρ = .5)
Mixture Interpretation

Suppose a fraction k of the missing population is distributed according to an arbitrary CDF F̃y|x, while the remaining fraction 1 − k of that population is missing at random in the sense of being distributed according to Fy|1,x. Then:

Fy|0,x(c) = (1 − k)Fy|1,x(c) + kF̃y|x(c),

where F̃y|x is unknown and the above holds for all x ∈ X and any c ∈ R. Now:

S(F) = sup_{x ∈ X} sup_{c ∈ R} |Fy|1,x(c) − kF̃y|x(c) − (1 − k)Fy|1,x(c)|
     = k × sup_{x ∈ X} sup_{c ∈ R} |Fy|1,x(c) − F̃y|x(c)|.

Worst Case: S(F) = k. Thus, k bounds the fraction of the missing sample that is not well represented by the observed data distribution.
Nominal Identified Set for q(τ|·)

Assumption (A)
(i) X ∈ R^l has finite support X.
(ii) Fy|d,x(c) is continuous and strictly increasing at all c with 0 < Fy|d,x(c) < 1.
(iii) D equals one if Y is observable and zero otherwise.

Lemma: Under (A), if S(F) ≤ k, then the identified set for q(τ|·) is:

C(τ, k) ≡ {θ : X → R : qL(τ, k|x) ≤ θ(x) ≤ qU(τ, k|x)}

where the bounds qL(τ, k|x) and qU(τ, k|x) are given by:

qL(τ, k|x) ≡ F⁻¹y|1,x( [τ − min{τ + kp(x), 1}(1 − p(x))] / p(x) )
qU(τ, k|x) ≡ F⁻¹y|1,x( [τ − max{τ − kp(x), 0}(1 − p(x))] / p(x) )
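Given a consistent estimate of Fy|1,x (e.g., the empirical CDF of observed outcomes in cell x) and of p(x), the bounds are just two quantiles of the observed data. A minimal sketch, assuming p is known for the cell and using our own helper names:

```python
import math

def emp_quantile(sorted_y, a):
    """Left-continuous empirical quantile; returns +/- infinity outside (0, 1),
    mirroring the bounds becoming uninformative at extreme levels."""
    if a <= 0.0:
        return -math.inf
    if a >= 1.0:
        return math.inf
    i = min(math.ceil(a * len(sorted_y)) - 1, len(sorted_y) - 1)
    return sorted_y[max(i, 0)]

def quantile_bounds(y_obs, p, tau, k):
    """[qL, qU] for the tau-quantile in one covariate cell when
    KS(Fy|1,x, Fy|0,x) <= k and P(D = 1 | X = x) = p, per the Lemma."""
    ys = sorted(y_obs)
    a_lo = (tau - min(tau + k * p, 1.0) * (1.0 - p)) / p
    a_hi = (tau - max(tau - k * p, 0.0) * (1.0 - p)) / p
    return emp_quantile(ys, a_lo), emp_quantile(ys, a_hi)

# k = 0 reproduces MAR (both bounds collapse to the observed tau-quantile);
# k = 1 reproduces the Manski worst-case bounds.
y = [i / 1000 for i in range(1000)]
print(quantile_bounds(y, 0.75, 0.5, 0.0))
print(quantile_bounds(y, 0.75, 0.5, 1.0))
```

Note how the adjusted levels a_lo and a_hi both equal τ at k = 0, so MAR is nested exactly, and saturate at the Manski levels once kp(x) exceeds min{τ, 1 − τ}.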
Example – No Covariates and p(x) = 2/3

Figure: Bounds on quantiles of the observed data as a function of k (upper and lower bounds for the 25th percentile, median, and 75th percentile, for k ∈ [0, 1])
Sensitivity Example 1 (Pointwise Analysis)

Suppose X is binary, so that X ∈ {0, 1}, and write:

Y = q(τ|X) + ε,  P(ε ≤ 0 | X) = τ

Suppose that under MAR we have q(τ0|X = 1) ≠ q(τ0|X = 0) for some τ0.

We can evaluate the sensitivity of this conclusion to MAR by defining:

k0 ≡ inf{k : qL(τ0, k|X = 1) − qU(τ0, k|X = 0) ≤ 0 ≤ qU(τ0, k|X = 1) − qL(τ0, k|X = 0)}

Comments
• k0 is the minimal level of selection needed to overturn q(τ0|X = 1) ≠ q(τ0|X = 0).
• A large k0 indicates a robust conclusion.
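Because the bound intervals only widen as k grows, k0 can be found by bisection on the overlap condition. A self-contained sketch with simulated data (uniform outcome grids stand in for the two cells' empirical CDFs; the 0.2 quantile gap, the common p = 0.75, and all names are our own illustration, not the paper's application):

```python
import math

def emp_quantile(sorted_y, a):
    # left-continuous empirical quantile, +/- infinity outside (0, 1)
    if a <= 0.0:
        return -math.inf
    if a >= 1.0:
        return math.inf
    return sorted_y[max(min(math.ceil(a * len(sorted_y)) - 1, len(sorted_y) - 1), 0)]

def quantile_bounds(y_obs, p, tau, k):
    # [qL, qU] from the Lemma's closed form, with p known
    ys = sorted(y_obs)
    a_lo = (tau - min(tau + k * p, 1.0) * (1.0 - p)) / p
    a_hi = (tau - max(tau - k * p, 0.0) * (1.0 - p)) / p
    return emp_quantile(ys, a_lo), emp_quantile(ys, a_hi)

def overlaps(y1, y0, p, tau, k):
    """True when the k-bounds for the two cells intersect, i.e. the
    conclusion q(tau|X=1) != q(tau|X=0) can no longer be maintained."""
    lo1, hi1 = quantile_bounds(y1, p, tau, k)
    lo0, hi0 = quantile_bounds(y0, p, tau, k)
    return lo1 - hi0 <= 0.0 <= hi1 - lo0

def breakdown_k(y1, y0, p, tau, tol=1e-4):
    """Smallest k in [0, 1] at which the bounds overlap; infinity if none."""
    if overlaps(y1, y0, p, tau, 0.0):
        return 0.0
    if not overlaps(y1, y0, p, tau, 1.0):
        return math.inf           # conclusion robust to any KS deviation
    lo, hi = 0.0, 1.0             # no overlap at lo, overlap at hi
    while hi - lo > tol:          # overlap is monotone in k, so bisect
        mid = 0.5 * (lo + hi)
        if overlaps(y1, y0, p, tau, mid):
            hi = mid
        else:
            lo = mid
    return hi

y0 = [i / 2000 for i in range(2000)]           # observed outcomes, X = 0
y1 = [0.2 + i / 2000 for i in range(2000)]     # observed outcomes, X = 1
k0 = breakdown_k(y1, y0, 0.75, 0.5)
print(round(k0, 3))  # roughly 0.4 for this 0.2 median gap with p = 0.75
```

Sweeping τ in the last call traces out the breakdown function κ0(τ) discussed in Example 3 below.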
Example 2 (Distributional Analysis)

We want to know if Fy|x=1 first-order stochastically dominates Fy|x=0.

Suppose that under MAR we find q(τ|X = 1) > q(τ|X = 0) at all τ.

We evaluate the sensitivity of the FOSD conclusion by examining:

k0 ≡ inf{k : qL(τ, k|X = 1) ≤ qU(τ, k|X = 0) for some τ ∈ (0, 1)}

Comment
• k0 is the minimal level of selection under which the conclusion of FOSD may be undermined.
Example 3 (Breakdown Analysis)

Y = q(τ|X) + ε,  P(ε ≤ 0 | X) = τ

Suppose that under MAR we have q(τ|X = 1) ≠ q(τ|X = 0) for multiple τ.

A more nuanced analysis can consider the quantile-specific critical level:

κ0(τ) ≡ inf{k : qL(τ, k|X = 1) − qU(τ, k|X = 0) ≤ 0 ≤ qU(τ, k|X = 1) − qL(τ, k|X = 0)}

Comments
• Changes in τ map out a "breakdown function" τ ↦ κ0(τ).
• Reveals the differential sensitivity of the entire conditional distribution.
1 Nominal Identified Set
2 Parametric Approximation
3 Inference
4 Changes in Wage Structure
5 CPS-SSA Analysis
Adding Parametric Structure

With many covariates, it is convenient to assume a linear parametric model:

q(τ|X) = X′β(τ)

The identified set for β(τ) is then the intersection of C(τ, k) with the parametric models:

{β(τ) ∈ R^l : qL(τ, k|X) ≤ X′β(τ) ≤ qU(τ, k|X)}

Comments:
• The set of functions in the identified set may be severely restricted.
• Inadvertently rewards misspecification: identification by misspecification.
Figure: Linear Conditional Quantile Functions as a Subset of the Identified Set (quantile bounds together with the lowest- and highest-slope lines consistent with them)
Adding Parametric Structure

Instead, allow for misspecification in the linear quantile model:

Y = X′β(τ) + η

• If identified, misspecification yields a "pseudo-true" approximation:

β(τ) ≡ argmin_{γ ∈ R^l} ∫ (q(τ|x) − x′γ)² dS(x)

• If partially identified, each θ ∈ C(τ, k) implies a pseudo-true vector β(τ):

P(τ, k) ≡ {β ∈ R^l : β = argmin_{γ ∈ R^l} ∫ (θ(x) − x′γ)² dS(x) for some θ ∈ C(τ, k)}

⇒ i.e., consider the β ∈ R^l that are best approximations to some θ ∈ C(τ, k).
Figure: Conditional Quantile and its Pseudo-True Approximation (quantile bounds, the conditional quantile function, and its pseudo-true linear approximation)
Misspecification

The choice of quadratic loss allows a simple characterization of P(τ, k).

Lemma: Under (A), if S(F) ≤ k and ∫ xx′ dS(x) is invertible:

P(τ, k) = {β = [∫ xx′ dS(x)]⁻¹ ∫ xθ(x) dS(x) : qL(τ, k|x) ≤ θ(x) ≤ qU(τ, k|x)}

Note: It follows that P(τ, k) is convex.

One more assumption: We will assume the measure S is known.
• Analogous to having a known loss function.
Parameter of Interest

Inference on parameters of the form λ′β(τ) for some λ ∈ R^l.

Corollary: The identified set for λ′β(τ) is [πL(τ, k), πU(τ, k)], where:

πL(τ, k) ≡ inf_θ λ′[∫ xx′ dS(x)]⁻¹ ∫ xθ(x) dS(x)  s.t. qL(τ, k|x) ≤ θ(x) ≤ qU(τ, k|x)
πU(τ, k) ≡ sup_θ λ′[∫ xx′ dS(x)]⁻¹ ∫ xθ(x) dS(x)  s.t. qL(τ, k|x) ≤ θ(x) ≤ qU(τ, k|x)

Comments:
• Examples: individual coefficients and fitted values.
• The bounds are sharp for fixed τ and k.
• The bounds become wider with k and change across τ.
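With finite support, θ enters the objective linearly, so each bound is attained by setting θ(x) to qL or qU point by point according to the sign of its weight λ′[∫ xx′ dS]⁻¹ x s(x). A sketch in pure Python under that observation; the helper names are ours, and S is taken as a known design measure on the support points:

```python
def solve(mat, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small linear system."""
    n = len(mat)
    a = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def pi_bounds(lam, xs, s, q_lo, q_hi):
    """Sharp bounds on lam' beta(tau), where beta is the pseudo-true projection
    of any theta with q_lo[i] <= theta(xs[i]) <= q_hi[i]."""
    l, m = len(lam), len(xs)
    # M = sum_i s_i x_i x_i'  (the design second-moment matrix under S)
    M = [[sum(s[i] * xs[i][a] * xs[i][b] for i in range(m)) for b in range(l)]
         for a in range(l)]
    v = solve(M, list(lam))        # v = M^{-1} lam (M is symmetric)
    # weight on theta(x_i) in lam' beta: w_i = s_i * v' x_i
    w = [s[i] * sum(v[a] * xs[i][a] for a in range(l)) for i in range(m)]
    pi_lo = sum(w[i] * (q_lo[i] if w[i] > 0 else q_hi[i]) for i in range(m))
    pi_hi = sum(w[i] * (q_hi[i] if w[i] > 0 else q_lo[i]) for i in range(m))
    return pi_lo, pi_hi

# Point-identified check: q_lo = q_hi = x1 should recover slope 1.
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
s = [1.0 / 3] * 3
print(pi_bounds((0.0, 1.0), xs, s, [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]))
```

The sign-splitting rule is exactly the solution of the linear program in the Corollary: the objective is a weighted sum of the θ(x), so each coordinate sits at whichever bound its weight's sign dictates.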
Bounds on the Process

The previous corollary implies that if S(F) ≤ k, then λ′β(·) belongs to:

G(k) ≡ {g : [0, 1] → R : πL(τ, k) ≤ g(τ) ≤ πU(τ, k) for all τ}

Unfortunately, G(k) is not a sharp identified set for the process λ′β(·).

However...
• The bounds πL(·, k) and πU(·, k) are in the identified set where finite.
• The bounds of G(k) are sharp at every point of evaluation τ.
• If θ ∉ G(k), then the function θ(·) cannot equal λ′β(·).
• Ease of analysis and graphical representation.
Example 1 (cont)

Y = α(τ) + X′β(τ) + η

Suppose that under MAR we have β(τ0) ≠ 0 for some specific quantile τ0.

We can evaluate the sensitivity of this conclusion to MAR by defining:

k0 ≡ inf{k : πL(τ0, k) ≤ 0 ≤ πU(τ0, k)}

Comment
• k0 is the minimal level of selection necessary to overturn β(τ0) ≠ 0.
Example 2 (cont)

Y = α(τ) + X′β(τ) + η

Suppose that under MAR we have β(τ) > 0 for multiple τ.

We evaluate the sensitivity of the conclusion that Fy|x is increasing at some τ:

k0 ≡ inf{k : πL(τ, k) ≤ 0 for some τ ∈ [0, 1]}

Comments
• k0 is the minimal level of selection that overturns β(τ) > 0 for some τ.
• πL(·, k0) is in the identified set for β(·) under S(F) ≤ k0.
Example 3 (cont)

Y = α(τ) + X′β(τ) + η

Suppose that under MAR we have β(τ) ≠ 0 for multiple τ.

A more nuanced analysis can consider the quantile-specific critical level:

κ0(τ) ≡ inf{k : πL(τ, k) ≤ 0 ≤ πU(τ, k)}

Comments
• Changing τ maps out a "breakdown function" τ ↦ κ0(τ).
• Reveals the differential sensitivity of the entire conditional distribution.
1 Nominal Identified Set
2 Parametric Approximation
3 Inference
4 Changes in Wage Structure
5 CPS-SSA Analysis
Estimating Bounds

• Study estimators for the bound functions πL(τ, k), πU(τ, k) given by:

π̂L(τ, k) ≡ inf_θ λ′[∫ xx′ dS(x)]⁻¹ ∫ xθ(x) dS(x)  s.t. q̂L(τ, k|x) ≤ θ(x) ≤ q̂U(τ, k|x)
π̂U(τ, k) ≡ sup_θ λ′[∫ xx′ dS(x)]⁻¹ ∫ xθ(x) dS(x)  s.t. q̂L(τ, k|x) ≤ θ(x) ≤ q̂U(τ, k|x)

• Need the distribution as processes on L∞(B), where for 0 < 2ε < inf_x p(x):

B ≡ {(τ, k) : (i) kp(x)(1 − p(x)) + 2ε ≤ τp(x), (ii) kp(x)(1 − p(x)) + 2ε ≤ (1 − τ)p(x), (iii) k ≤ τ, (iv) k ≤ 1 − τ}

Comments:
• The bounds πL(τ, k) and πU(τ, k) are finite everywhere on B.
• Large or small values of τ must be accompanied by small values of k.
Estimating Bounds

• Recall that qL(τ, k|x) and qU(τ, k|x) were defined as quantiles of Fy|1,x:

qL(τ, k|x) = argmin_{c ∈ R} Qx(c|τ, τ + kp(x))   qU(τ, k|x) = argmin_{c ∈ R} Qx(c|τ, τ − kp(x))

where the family of criterion functions Qx(c|τ, b) is given by:

Qx(c|τ, b) ≡ (P(Y ≤ c, X = x, D = 1) + bP(D = 0, X = x) − τP(X = x))²

• This suggests an extremum estimation approach given by:

q̂L(τ, k|x) = argmin_{c ∈ R} Qx,n(c|τ, τ + kp̂(x))   q̂U(τ, k|x) = argmin_{c ∈ R} Qx,n(c|τ, τ − kp̂(x))

where the criterion function Qx,n(c|τ, b) is the immediate sample analogue.
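In the no-covariate case the sample criterion reduces to matching the empirical sub-CDF of the observed outcomes to a k-shifted target level, so the minimizer can be found by scanning the observed order statistics. A minimal sketch (our own function names; the conditioning on X is suppressed for brevity):

```python
def qhat_bounds(y_obs, n_missing, tau, k):
    """Extremum-estimator bounds [qhat_L, qhat_U] for the tau-quantile,
    minimizing the sample analogue of Q(c | tau, b) over the observed support."""
    n = len(y_obs) + n_missing
    p_hat = len(y_obs) / n
    ys = sorted(y_obs)

    def argmin_c(b):
        # Q_n(c | tau, b) = ((1/n) sum 1{Yi <= c, Di = 1} + b(1 - p_hat) - tau)^2,
        # so the minimizer puts the observed sub-CDF as close as possible to:
        target = tau - b * (1.0 - p_hat)
        best_c, best_val = ys[0], float("inf")
        for j, c in enumerate(ys):
            val = ((j + 1) / n - target) ** 2
            if val < best_val:
                best_c, best_val = c, val
        return best_c

    return argmin_c(tau + k * p_hat), argmin_c(tau - k * p_hat)

y_obs = [i / 750 for i in range(750)]       # 750 observed outcomes, 250 missing
print(qhat_bounds(y_obs, 250, 0.5, 0.0))    # k = 0: both estimates coincide (MAR)
print(qhat_bounds(y_obs, 250, 0.5, 0.3))    # k = 0.3: a nondegenerate interval
```

Plugging b = τ ± kp̂ into the target level reproduces, at the sample level, exactly the adjusted quantile levels from the Lemma's closed form.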
Asymptotic Distribution

Assumptions (B)
(i) Fy|1,x has a continuous bounded derivative fy|1,x.
(ii) fy|1,x has a continuous bounded derivative f′y|1,x.
(iii) The matrix ∫ xx′ dS(x) is invertible.
(iv) fy|1,x is positive "over the relevant range".

Theorem: Under Assumptions (A) and (B), if {Yi, Xi, Di}, i = 1, ..., n, is i.i.d., then:

√n (π̂L − πL, π̂U − πU) →_L G,

where G is a Gaussian process on L∞(B) × L∞(B).
![Page 44: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/44.jpg)
Proof Outline
Step 1: Study the distribution of the minimizers of Q_{x,n}(c|τ, b) as a function of (τ, b).
• Obtain uniform asymptotic expansions for the minimizers.
• Q_{x,n}(c|τ, b) has enough structure to establish equicontinuity.

Step 2: Find the distribution of (q̂_L, q̂_U) in $L^\infty(B \times \mathcal{X}) \times L^\infty(B \times \mathcal{X})$.
• Simply a restriction of the process derived in Step 1.

Step 3: Establish the distribution of (π̂_L, π̂_U) on $L^\infty(B)$.
• Straightforward because the bounds solve a linear program.
![Page 45: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/45.jpg)
Example 1 (cont)
Suppose under MAR we find that β(τ_0) ≠ 0 at some specific quantile τ_0. The minimal level of selection necessary to undermine this conclusion is:

$$k_0 \equiv \inf\{k : \pi_L(\tau_0,k) \le 0 \le \pi_U(\tau_0,k)\}$$

Let $r^{(i)}_{1-\alpha}(k)$ be the 1−α quantile of $\mathbb{G}^{(i)}(\tau_0,k)$ and define:

$$\hat k_0 \equiv \inf\Big\{k : \hat\pi_L(\tau_0,k) - \frac{r^{(1)}_{1-\alpha}(k)}{\sqrt{n}} \le 0 \le \hat\pi_U(\tau_0,k) + \frac{r^{(2)}_{1-\alpha}(k)}{\sqrt{n}}\Big\}$$

Then $k_0 \in [\hat k_0, 1]$ with asymptotic probability at least 1−α.
Comments
• A one-sided confidence interval (rather than two-sided) is natural here.
• The relevant critical value depends on a transformation of $\mathbb{G}$.
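The definition of k̂_0 suggests a simple grid scan once the estimated bounds and critical values are in hand. A stylized sketch (the bound and critical-value functions below are hypothetical stand-ins, not estimates from the paper):

```python
import numpy as np

def k_hat0(pi_L, pi_U, r1, r2, n, k_grid):
    """Smallest k on the grid at which the debiased bounds bracket zero.
    pi_L, pi_U: estimated bound functions of k at the fixed quantile tau_0;
    r1, r2: critical-value functions of k; n: sample size."""
    root_n = np.sqrt(n)
    for k in k_grid:
        if pi_L(k) - r1(k) / root_n <= 0.0 <= pi_U(k) + r2(k) / root_n:
            return k
    return 1.0  # the conclusion is never undermined on the grid
```

Scanning k upward mirrors the inf in the display above; returning 1.0 reflects that k_0 is only known to lie in [k̂_0, 1].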
![Page 46: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/46.jpg)
Example 2 (cont)
Suppose under MAR we find that β(τ) > 0 for multiple τ, and recall that:

$$k_0 \equiv \inf\{k : \pi_L(\tau,k) \le 0 \text{ for all } \tau \in [0,1]\}$$

Let $r_{1-\alpha}(k)$ be the 1−α quantile of $\sup_\tau \mathbb{G}^{(1)}(\tau,k)/\omega_L(\tau,k)$ and define:

$$\hat k_0 \equiv \inf\Big\{k : \sup_\tau\Big[\hat\pi_L(\tau,k) - \frac{r_{1-\alpha}(k)}{\sqrt{n}}\,\omega_L(\tau,k)\Big] \le 0\Big\}$$

Then $k_0 \in [\hat k_0, 1]$ with asymptotic probability at least 1−α.
Comments
• The weight function ω_L allows adjustment for different asymptotic variances.
• The result exploits uniformity in τ but not in k.
![Page 47: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/47.jpg)
Example 3 (cont)
Suppose under MAR we find that β(τ) ≠ 0 for multiple τ, and recall that:

$$\kappa_0(\tau) \equiv \inf\{k : \pi_L(\tau,k) \le 0 \le \pi_U(\tau,k)\}$$

For positive weight functions (ω_L, ω_U), let $r_{1-\alpha}$ be the 1−α quantile of:

$$\sup_{\tau,k}\;\max\Big\{\frac{|\mathbb{G}^{(1)}(\tau,k)|}{\omega_L(\tau,k)},\; \frac{|\mathbb{G}^{(2)}(\tau,k)|}{\omega_U(\tau,k)}\Big\}$$

Then with asymptotic probability at least 1−α, for all τ, κ_0(τ) lies between:

$$\hat\kappa_L(\tau) \equiv \inf\Big\{k : \hat\pi_L(\tau,k) - \frac{r_{1-\alpha}}{\sqrt{n}}\,\omega_L(\tau,k) \le 0 \;\text{ and }\; 0 \le \hat\pi_U(\tau,k) + \frac{r_{1-\alpha}}{\sqrt{n}}\,\omega_U(\tau,k)\Big\}$$

$$\hat\kappa_U(\tau) \equiv \sup\Big\{k : \hat\pi_L(\tau,k) + \frac{r_{1-\alpha}}{\sqrt{n}}\,\omega_L(\tau,k) \ge 0 \;\text{ or }\; 0 \ge \hat\pi_U(\tau,k) - \frac{r_{1-\alpha}}{\sqrt{n}}\,\omega_U(\tau,k)\Big\}$$
![Page 48: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/48.jpg)
Weighted Bootstrap
Question: How do we obtain a consistent estimator for $r_{1-\alpha}$?

Answer: Perturb the objective function and recompute (the weighted bootstrap).
In all examples, $r_{1-\alpha}$ is a quantile of $L(\mathbb{G}_\omega)$, where L is Lipschitz and

$$\mathbb{G}_\omega(\tau,k) = \begin{pmatrix}\mathbb{G}^{(1)}(\tau,k)/\omega_L(\tau,k) \\ \mathbb{G}^{(2)}(\tau,k)/\omega_U(\tau,k)\end{pmatrix}$$

In particular:
• In Example 1, θ ↦ L(θ) gives $L(\mathbb{G}_\omega) = \mathbb{G}^{(i)}_\omega(\tau_0,k)$.
• In Example 2, $L(\mathbb{G}_\omega) = \sup_\tau \mathbb{G}^{(1)}_\omega(\tau,k)$.
• In Example 3, $L(\mathbb{G}_\omega) = \sup_{\tau,k}\max\{|\mathbb{G}^{(1)}_\omega(\tau,k)|,\, |\mathbb{G}^{(2)}_\omega(\tau,k)|\}$.
Goal: Construct a general bootstrap procedure for quantiles of $L(\mathbb{G}_\omega)$.
![Page 49: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/49.jpg)
Weighted Bootstrap
Step 1: Generate a random sample of weights {W_i} and define the criterion:

$$\tilde Q_{x,n}(c|\tau,b) \equiv \Big(\frac{1}{n}\sum_{i=1}^n W_i\big\{1\{Y_i \le c, X_i = x, D_i = 1\} + b\,1\{D_i = 0, X_i = x\} - \tau\,1\{X_i = x\}\big\}\Big)^2$$

Using $\tilde Q_{x,n}$ instead of $Q_{x,n}$, obtain analogues to q̂_L(τ, k|x) and q̂_U(τ, k|x):

$$\tilde q_L(\tau,k|x) = \arg\min_{c\in\mathbb{R}} \tilde Q_{x,n}(c|\tau,\,\tau+k\tilde p(x)) \qquad \tilde q_U(\tau,k|x) = \arg\min_{c\in\mathbb{R}} \tilde Q_{x,n}(c|\tau,\,\tau-k\tilde p(x))$$

where $\tilde p(x) \equiv \big(\sum_i W_i 1\{D_i = 1, X_i = x\}\big)\big/\big(\sum_i W_i 1\{X_i = x\}\big)$.

Step 2: Using the bounds $\tilde q_L(\tau,k|x)$ and $\tilde q_U(\tau,k|x)$ from Step 1, obtain:

$$\tilde\pi_L(\tau,k) \equiv \inf_{\theta}\; \lambda'\Big[\int x x'\,dS(x)\Big]^{-1}\!\int x\,\theta(x)\,dS(x) \quad \text{s.t.}\;\; \tilde q_L(\tau,k|x) \le \theta(x) \le \tilde q_U(\tau,k|x)$$

$$\tilde\pi_U(\tau,k) \equiv \sup_{\theta}\; \lambda'\Big[\int x x'\,dS(x)\Big]^{-1}\!\int x\,\theta(x)\,dS(x) \quad \text{s.t.}\;\; \tilde q_L(\tau,k|x) \le \theta(x) \le \tilde q_U(\tau,k|x)$$
![Page 50: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/50.jpg)
Weighted Bootstrap
Step 3: Using the bounds $\tilde\pi_L(\tau,k)$ and $\tilde\pi_U(\tau,k)$ from Step 2, define:

$$\tilde{\mathbb{G}}_\omega(\tau,k) = \sqrt{n}\begin{pmatrix}(\tilde\pi_L - \hat\pi_L)/\hat\omega_L \\ (\tilde\pi_U - \hat\pi_U)/\hat\omega_U\end{pmatrix}$$

where $\hat\omega_L(\tau,k)$ and $\hat\omega_U(\tau,k)$ are estimators for $\omega_L(\tau,k)$ and $\omega_U(\tau,k)$.

Step 4: Estimate $r_{1-\alpha}$, the 1−α quantile of $L(\mathbb{G}_\omega)$, by $\tilde r_{1-\alpha}$ defined as:

$$\tilde r_{1-\alpha} \equiv \inf\Big\{r : P\big(L(\tilde{\mathbb{G}}_\omega) \le r \,\big|\, \{Y_i, X_i, D_i\}_{i=1}^n\big) \ge 1-\alpha\Big\}$$

Comments
• Note that the probability is conditional on $\{Y_i, X_i, D_i\}_{i=1}^n$ but not on $\{W_i\}_{i=1}^n$.
• In practice $\tilde r_{1-\alpha}$ can be obtained through simulation.
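The full loop can be sketched compactly for a scalar functional. Everything below is a simplified stand-in: the statistic is a weighted mean rather than the bound functionals, the weights are standard exponentials (so E[W] = Var(W) = 1, positive a.s.), and L is the identity:

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_bootstrap_quantile(stat_fn, data, n_boot=200, alpha=0.05):
    """Empirical 1 - alpha quantile of sqrt(n) * (perturbed stat - stat)."""
    n = len(data)
    theta_hat = stat_fn(np.ones(n), data)         # unperturbed estimate
    draws = np.empty(n_boot)
    for b in range(n_boot):
        W = rng.exponential(1.0, size=n)          # positive, E[W]=1, Var(W)=1
        draws[b] = np.sqrt(n) * (stat_fn(W, data) - theta_hat)
    return np.quantile(draws, 1.0 - alpha)

def weighted_mean(W, y):
    """Toy stand-in for the perturbed functional."""
    return np.sum(W * y) / np.sum(W)
```

In the application the perturbed functional would be $(\tilde\pi_L, \tilde\pi_U)$, and the returned quantile plays the role of the critical value $\tilde r_{1-\alpha}$.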
![Page 51: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/51.jpg)
Weighted Bootstrap
Assumptions (C)
(i) ω_L and ω_U are strictly positive and continuous on B.
(ii) ω̂_L and ω̂_U are uniformly consistent on B.
(iii) W is positive a.s. and independent of (Y, X, D).
(iv) W satisfies E[W] = 1 and Var(W) = 1.
(v) The transformation L is Lipschitz continuous.
(vi) The cdf of $L(\mathbb{G}_\omega)$ is strictly increasing and continuous at $r_{1-\alpha}$.

Theorem: Under Assumptions (A)-(C), if $\{Y_i, X_i, D_i, W_i\}_{i=1}^n$ are IID, then:

$$\tilde r_{1-\alpha} \xrightarrow{\;p\;} r_{1-\alpha}$$
![Page 52: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/52.jpg)
1 Nominal Identified Set
2 Parametric Approximation
3 Inference
4 Changes in Wage Structure
5 CPS-SSA Analysis
![Page 53: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/53.jpg)
Roadmap
Goal: Revisit the results of Angrist, Chernozhukov and Fernandez-Val (2006) regarding changes across Decennial Censuses in quantile-specific returns to schooling.
• Assess sensitivity of the results to deviations from MAR.

Then... How worried should we be?
• Investigate the nature of deviations from MAR in matched CPS-SSA data.
• Test for and measure departures from ignorability using the KS metric.
![Page 55: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/55.jpg)
Quantile Specific Returns
Like Angrist, Chernozhukov and Fernandez-Val (2006), we estimate:

$$Y_i = X_i'\gamma(\tau) + E_i\beta(\tau) + \varepsilon_i, \qquad P(\varepsilon_i \le 0 \,|\, X_i, E_i) = \tau$$

where Y_i is log average weekly earnings, E_i is years of schooling, and X_i consists of an intercept, a black dummy, and a quadratic in potential experience.

Sample Restrictions
• 1% unweighted extracts of the 1980, 1990, and 2000 PUMS samples.
• Black and white men age 40-49 with education ≥ 6 years.
• Y_i treated as missing for all observations with allocated earnings or weeks worked.
![Page 56: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/56.jpg)
Data quality is deteriorating
Table: Fraction of Observations in Estimation Sample with Missing Weekly Earnings
| Census Year | Total Observations | Allocated Earnings | Allocated Weeks Worked | Fraction of Total Missing |
|---|---|---|---|---|
| 1980 | 80,128 | 12,839 | 5,278 | 19.49% |
| 1990 | 111,070 | 17,370 | 11,807 | 23.09% |
| 2000 | 131,265 | 26,540 | 17,455 | 27.70% |
| Total | 322,463 | 56,749 | 34,540 | 23.66% |
![Page 57: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/57.jpg)
Figure: Worst Case Nonparametric Bounds on 1990 Medians and Linear Model Fitsfor Two Experience Groups of White Men.
[Figure: log weekly earnings plotted against years of schooling (8-20); 95% confidence regions, estimated bounds, and parametric sets shown for experience groups Exp=26 and Exp=20.]
![Page 58: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/58.jpg)
Figure: Nonparametric Bounds on 1990 Medians and Best Linear Approximationsfor Two Experience Groups of White Men Under S(F ) ≤ 0.05.
[Figure: log weekly earnings plotted against years of schooling (8-20); 95% confidence regions, estimated bounds, and sets of BLAs shown for experience groups Exp=26 and Exp=20.]
![Page 59: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/59.jpg)
Figure: Uniform Confidence Regions for Schooling Coefficients by Quantile and YearUnder Missing at Random Assumption (S(F ) = 0).
[Figure: β(τ) plotted against τ (.1 to .9) for 1980, 1990, and 2000.]
![Page 60: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/60.jpg)
Figure: Uniform Confidence Regions for Schooling Coefficients by Quantile and YearUnder S(F ) ≤ 0.05.
[Figure: β(τ) plotted against τ (.1 to .9) for 1980, 1990, and 2000.]
![Page 61: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/61.jpg)
Figure: Uniform Confidence Regions for Schooling Coefficients by Quantile and YearUnder S(F ) ≤ 0.175 (1980 vs. 1990).
[Figure: β(τ) plotted against τ (.2 to .8) for 1980 and 1990.]
![Page 62: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/62.jpg)
Figure: Confidence Intervals for Fitted Values Under S(F ) ≤ 0.05.
[Figure: log weekly earnings plotted against years of schooling (8-20) for τ = .1, .5, .9 in 1980, 1990, and 2000.]
![Page 63: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/63.jpg)
Distributional Sensitivity
Found a critical k at which $\pi^{80}_U(\tau,k) \ge \pi^{90}_L(\tau,k)$ for all τ.

... it is more informative to find a τ-specific critical k for each τ.

Define the τ-"breakdown" point κ_0(τ) as the smallest k ∈ [0, 1] for which

$$\pi^{80}_U(\tau,k) - \pi^{90}_L(\tau,k) \ge 0$$

⇒ pointwise, this defines a function κ_0 giving the critical k at each τ.

Comments
• The κ_0 function summarizes distributional sensitivity to the MAR assumption.
• Use (τ, k) uniformity to build a confidence interval for κ_0(τ) that is uniform in τ.
![Page 64: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/64.jpg)
[Figure: 1980 Sample Upper Bound; surface of dY/dE plotted over τ and k.]
![Page 65: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/65.jpg)
[Figure: 1990 Sample Lower Bound; surface of dY/dE plotted over τ and k.]
![Page 67: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/67.jpg)
Figure: Breakdown Curve (1980 vs 1990).
[Figure: point estimate and 95% confidence interval for the breakdown curve, k plotted against τ (.1 to .9).]
![Page 68: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/68.jpg)
1 Nominal Identified Set
2 Parametric Approximation
3 Inference
4 Changes in Wage Structure
5 CPS-SSA Analysis
![Page 69: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/69.jpg)
How worried should we be?
Goal: Employ the 1973 CPS-SSA file to assess S(F).

Data on SSA and IRS earnings for respondents to the March CPS.

Sample Restrictions
• Black and white men between the ages of 25 and 55.
• More than 6 years of schooling.
• Must have reported working at least one week in the past year.
• Drop the self-employed and occupations likely to receive tips.
• Drop observations with IRS earnings ≤ $1,000 or ≥ $50,000.

Comments
• Roughly 7.2% of observations have unreported CPS earnings.
• Use IRS rather than SSA earnings due to topcoding.
![Page 70: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/70.jpg)
Assessing S(F )
Define

$$p_L(x,\tau) \equiv P(D=1 \,|\, X=x,\, F_{y|x}(Y) \le \tau) \qquad p_U(x,\tau) \equiv P(D=1 \,|\, X=x,\, F_{y|x}(Y) > \tau)$$

This leads to an alternative expression for the distance between F_{y|1,x} and F_{y|0,x}:

$$\big|F_{y|1,x}(q(\tau|x)) - F_{y|0,x}(q(\tau|x))\big| = \frac{\sqrt{(p_L(x,\tau) - p(x))(p(x) - p_U(x,\tau))\,\tau(1-\tau)}}{p(x)(1-p(x))}$$

Comments
• Emphasizes the effect of selection.
• Only need an estimate of $P(D=1 \,|\, X=x,\, F_{y|x}(Y)=\tau)$.
• Use earnings information on nonrespondents to estimate selection.
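The identity follows directly from the definitions, since the law of total probability pins down both conditional cdfs at q(τ|x) given (τ, p_L, p_U). A numerical check with arbitrary illustrative probabilities (written with the radicand as (p_L − p)(p − p_U) ≥ 0):

```python
import math

# Response rates below and above the tau cutoff of the outcome distribution at x
# (made-up values for illustration only).
tau, p_L, p_U = 0.4, 0.9, 0.7
p = tau * p_L + (1 - tau) * p_U        # overall response rate P(D=1 | X=x)
F1 = tau * p_L / p                     # F_{y|1,x} evaluated at q(tau|x)
F0 = tau * (1 - p_L) / (1 - p)         # F_{y|0,x} evaluated at q(tau|x)
lhs = abs(F1 - F0)
rhs = math.sqrt((p_L - p) * (p - p_U) * tau * (1 - tau)) / (p * (1 - p))
```

The check exploits that $p(x) = \tau p_L + (1-\tau)p_U$, so $\tau(p_L - p)$ and $(1-\tau)(p - p_U)$ are equal, which is what turns the product under the root into a perfect square.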
![Page 71: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/71.jpg)
Three Logit Models
$$P(D=0 \,|\, X=x,\, F_{y|x}(Y)=\tau) = \Lambda(\beta_1\tau + \beta_2\tau^2 + \delta_x) \tag{1}$$

$$P(D=0 \,|\, X=x,\, F_{y|x}(Y)=\tau) = \Lambda(\beta_1\tau + \beta_2\tau^2 + \gamma_1\delta_x\tau + \gamma_2\delta_x\tau^2 + \delta_x) \tag{2}$$

$$P(D=0 \,|\, X=x,\, F_{y|x}(Y)=\tau) = \Lambda(\beta_{1,x}\tau + \beta_{2,x}\tau^2 + \delta_x) \tag{3}$$

Comments
• Five-year age categories; four schooling categories (< 12, 12, 13-15, 16).
• Drop small cells (< 50 obs).
• Only need an estimate of $P(D=1 \,|\, X=x,\, F_{y|x}(Y)=\tau)$.
• Model (2) substantially increases the likelihood over model (1).
• An LR test cannot reject model (2) against model (3).
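To make model (2) concrete, a sketch that evaluates the implied nonresponse probability at the Model 2 point estimates (β1 = 0.05, β2 = 3.75, γ1 = 0.45, γ2 = 1.15, taken from the estimates table; the cell effect δx is a made-up illustrative value):

```python
import math

def Lam(z):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))

def p_miss(tau, dx, b1=0.05, b2=3.75, g1=0.45, g2=1.15):
    """Model (2): P(D=0 | X=x, F_{y|x}(Y)=tau) with cell effect dx."""
    return Lam(b1 * tau + b2 * tau ** 2 + g1 * dx * tau + g2 * dx * tau ** 2 + dx)
```

The interaction terms γ1 δx τ and γ2 δx τ² let the τ-profile of nonresponse differ across demographic cells, which is exactly the heterogeneity the KS distances summarize.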
![Page 72: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/72.jpg)
Table: Logit Estimates of P(D = 0 | X = x, F_{y|x}(Y) = τ) in the 1973 CPS-IRS Sample

| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| β1 | -1.06 (0.43) | 0.05 (5.44) | |
| β2 | 1.09 (0.41) | 3.75 (4.08) | |
| γ1 | | 0.45 (2.30) | |
| γ2 | | 1.15 (1.73) | |
| Log-Likelihood | -3,802.91 | -3,798.48 | -3,759.97 |
| Parameters | 37 | 39 | 105 |
| Number of observations | 15,027 | 15,027 | 15,027 |
| Demographic Cells | 35 | 35 | 35 |
| Ages 25-55: | | | |
| Min KS Distance | 0.02 | 0.02 | 0.01 |
| Median KS Distance | 0.02 | 0.05 | 0.12 |
| Max KS Distance (S(F)) | 0.02 | 0.17 | 0.67 |
| Ages 40-49: | | | |
| Min KS Distance | 0.02 | 0.02 | 0.01 |
| Median KS Distance | 0.02 | 0.05 | 0.08 |
| Max KS Distance (S(F)) | 0.02 | 0.09 | 0.39 |

Note: Asymptotic standard errors in parentheses.
![Page 73: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/73.jpg)
Comments
MAR clearly violated
• Very high and very low earning individuals are the most likely to have missing earnings on average.
• But the missingness pattern appears to be heterogeneous across demographic cells.
• The pattern would have been difficult to guess a priori.

Degree of Heterogeneity Affects Bottom Line
• Model 1: S(F) = 0.02
• Model 2: S(F) = 0.09
• Model 3: S(F) = 0.39
![Page 74: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/74.jpg)
[Figure: Distance Between Missing and Nonmissing CDFs by Quantile of IRS Earnings and Demographic Cell; k plotted against τ.]
![Page 75: Sensitivity to Missing Data Assumptions: Theory and An Evaluation …pkline/papers/Slides_KS.pdf · 2015. 7. 26. · Sensitivity to Missing Data Assumptions: Theory and An Evaluation](https://reader036.fdocuments.us/reader036/viewer/2022071505/6125a75e9ed4b2615e6b17d6/html5/thumbnails/75.jpg)
Conclusion
Theory: When data are poor, it is useful to check sensitivity to violations of MAR.
• KS provides a natural metric for assessing violations of MAR.
• The methods developed here enable the study of parametric approximating models.
• And they allow assessment of distributional sensitivity to the MAR assumption.

Empirics: Reexamine quantile-specific returns to education.
• Measured changes in the wage structure between 1980 and 1990 are fairly robust (except at the low end of the distribution).
• But changes over 1990-2000 are easily confounded by a bit of selection together with the deterioration in the quality of Census data.
• The 1973 CPS-SSA file provides evidence of selection and heterogeneity.