Generating Plausible Causal Hypotheses
By Larry V. Hedges
Northwestern University
Presented at the 2010 IES Research Conference
Goals
Provide a brief introduction to causal inference
Explain why experiments provide model-free estimates of causal effects
Examine the possibility of causal inference from a few quasi-experimental designs
-Assignment based on a covariate
-Regression discontinuity design
-Nonequivalent control group design
Examine the difference in differences approach in more detail
What is Causal Inference?
We all think we know what we mean by cause and effect
But a formal treatment is useful
It turns out that there are several treatments of cause and effect
The modern statistical approach is often called the Rubin-Holland-Rosenbaum model
(But its roots go back as far as Neyman, 1923)
The Rubin Holland Model
Key concepts
Units (e.g., individuals)
Treatments (e.g., 0, 1)
Responses (e.g., r0, r1)
$r_i^0$: the response of unit i if it got treatment 0
$r_i^1$: the response of unit i if it got treatment 1
Causal effect of treatment 1 versus 0 on unit i: $\tau_i = r_i^1 - r_i^0$
The Rubin Holland Model
The definition of the causal effect of treatment 1 versus 0 on unit i
$\tau_i = r_i^1 - r_i^0$
• This is a relative definition: The effect of treatment 1 compared to treatment 0
• This is a counterfactual definition: you can’t observe both $r_i^0$ and $r_i^1$
• The (relative) causal effect of a treatment on a single unit cannot be estimated without additional assumptions (although, with additional assumptions, single-subject designs attempt to do so)
Causal Inference and Missing Data
Note that causal inference is a missing data problem
You cannot observe both $r_i^0$ and $r_i^1$; one of them is always missing
Not surprisingly, modern ideas for causal inference sometimes draw on modern ideas for handling missing data
Missing data methods try to find conditions under which the missing data are (conditionally) “as if” randomly sampled
Methods for causal inference try to find conditions under which the missing data are (conditionally) “as if” randomly assigned
We will discuss some of these later
The Rubin Holland Model
Example
Note that we assume that both $r_i^0$ and $r_i^1$ are known for the purposes of illustration
Unit  r1  r0   τ
1     20  10  10
2     20  10  10
3     20  10  10
4     20  10  10
5     11  20  -9
6     11  20  -9
7     11  20  -9
8     11  20  -9
Any particular experiment would assign some units to treatment and others to control, so $r_i^1$ would be observed for the treated units and $r_i^0$ for the control units
Each possible experiment would give a different estimate of the average treatment effect, but the average of these estimates over all possible assignments would be the average treatment effect
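The averaging claim can be checked by brute force on the eight-unit table. The Python sketch below assumes a completely randomized design that treats exactly four of the eight units (the slide does not fix the number treated); it enumerates all 70 such assignments, computes the observed treatment-control mean difference for each, and averages them using exact fractions.

```python
from fractions import Fraction
from itertools import combinations

# Potential outcomes from the example table, units 1..8 (stored as 0..7).
R1 = [20, 20, 20, 20, 11, 11, 11, 11]
R0 = [10, 10, 10, 10, 20, 20, 20, 20]


def mean_difference(treated):
    """Observed treatment-control mean difference for one assignment.

    Units in `treated` reveal r1; every other unit reveals r0.
    """
    control = [i for i in range(len(R1)) if i not in treated]
    mean_t = Fraction(sum(R1[i] for i in treated), len(treated))
    mean_c = Fraction(sum(R0[i] for i in control), len(control))
    return mean_t - mean_c


# True average treatment effect: the mean of tau_i = r1_i - r0_i.
ate = Fraction(sum(R1) - sum(R0), len(R1))

# Hypothetical design: every way of treating exactly 4 of the 8 units.
estimates = [mean_difference(set(t)) for t in combinations(range(8), 4)]
average_estimate = sum(estimates) / len(estimates)

print(len(estimates), ate, average_estimate)  # 70 1/2 1/2
print(min(estimates), max(estimates))         # 0 1
```

Individual experiments give only five distinct values (0, 1/4, 1/2, 3/4 and 1), yet their average over all assignments is exactly the average treatment effect of 1/2.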
Note that assigning the best treatment to each unit does not give an unbiased estimate of the average treatment effect
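A quick check with the same table, assuming “best treatment” means giving each unit whichever treatment has the larger response for that unit (treatment 1 for units 1 to 4, treatment 0 for units 5 to 8). Because this assignment depends on (r0, r1), the mean difference it produces is 0 rather than the average treatment effect of 1/2.

```python
from fractions import Fraction

# Potential outcomes from the example table, units 1..8 (stored as 0..7).
R1 = [20, 20, 20, 20, 11, 11, 11, 11]
R0 = [10, 10, 10, 10, 20, 20, 20, 20]

# Assumed "best treatment" rule: treat a unit only if its own effect r1 - r0 > 0.
treated = [i for i in range(8) if R1[i] > R0[i]]
control = [i for i in range(8) if R1[i] <= R0[i]]

estimate = (Fraction(sum(R1[i] for i in treated), len(treated))
            - Fraction(sum(R0[i] for i in control), len(control)))
ate = Fraction(sum(R1) - sum(R0), len(R1))

print(estimate, ate)  # 0 1/2
```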
The Rubin Holland Model
Randomized experiments
Define the assignment variable Z via Z = 0 if a unit gets control and Z = 1 if a unit gets treatment
Random assignment means that (r0, r1) is independent of Z (assignment)
Therefore
$$E[r^0 \mid Z = 0] = E[r^0 \mid Z = 1] = E[r^0]$$
$$E[r^1 \mid Z = 0] = E[r^1 \mid Z = 1] = E[r^1]$$
The Rubin Holland Model
Randomized experiments give model-free estimates of the average (relative) causal effect of a treatment
Why?
Because independence of Z (assignment) and (r0, r1) implies
$$E[r^1 - r^0] = E[r^1] - E[r^0] = E[r^1 \mid Z = 1] - E[r^0 \mid Z = 0]$$
The Rubin Holland Model
This is all very simple
But this is deceptive
I have already embedded assumptions into the model (as did Rubin, 1974)
Why are there only 2 possible outcomes?
What if the treatment I get affects your response to treatment?
This assumption is called “no interference between units” (e.g., Cox, 1958) or the stable unit treatment value (SUTV) assumption (e.g., Rubin)
The Rubin Holland Model
SUTV can be wrong!
Consider response to vaccines
The response to the smallpox vaccine (or not) depends on who else is vaccinated
This is how eradication is possible
Consider classrooms or schools where social interaction is possible (indeed probable)
Contamination is a violation of SUTV
The Rubin Holland Model
Some associations cannot be causal
Suppose one of $r_i^0$ or $r_i^1$ does not exist
• Some individuals would never accept treatment (refusers)
• Some individuals would always get treatment (always takers)
• Some individuals would always do the opposite of what they were assigned (defiers)
This leads to the concept of compliers and the complier average treatment effect
The Rubin Holland Model
On a more philosophical level, not all “what if” questions have causal answers
The idea of a randomized experiment helps clarify what effects might be causal
If you cannot imagine an experiment that assigns the treatments being compared, it may not be sensible to talk of causal effects
It may not be sensible to talk of sex differences as causal effects
But it might be sensible to talk of gender (social) differences as causal effects
The Rubin Holland Model
Similarly, it may not make sense to talk about causal effects of treatments on
• Never takers
• Always takers
• Defiers
It makes sense to explicitly limit the scope of our attempts at causal inference to the compliers
Scope of Causal Inference
Randomized experiments give model-free estimates of average causal effects
Is there any other way to get them?
No other model-free methods are known
Many other methods can give estimates of causal effects given that a model is true
The key problem with these methods is that the model must be assumed to be true, and the model assumptions are often difficult or impossible to verify
But such methods are useful when experiments cannot be done or to suggest plausible causal hypotheses
Estimating Treatment Effects
Consider treatment assignment (dummy variable) Z and outcome Y
Regress Y on Z
Yi = β0 + β1 Zi + εi
The estimate of β1 is just the difference between the mean Y for Z = 1 (the treatment group) and the mean Y for Z = 0 (the control group)
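Thus, because $\bar{Y}_1 = \beta_0 + \beta_1 + \bar{\varepsilon}_1$ and $\bar{Y}_0 = \beta_0 + \bar{\varepsilon}_0$, the OLS estimate is
$$\bar{Y}_1 - \bar{Y}_0 = \beta_1 + (\bar{\varepsilon}_1 - \bar{\varepsilon}_0)$$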
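As a numerical sanity check, a small Python sketch with hypothetical data computes the OLS slope of Y on a 0/1 dummy Z from the usual covariance formula and compares it with the treatment-control difference in means; both come out at 3.5 up to floating-point rounding.

```python
from statistics import mean


def ols_slope(z, y):
    """Slope from the simple OLS regression of y on z (with an intercept)."""
    z_bar, y_bar = mean(z), mean(y)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    return sxy / sxx


# Hypothetical data: Z is the 0/1 treatment dummy, Y the outcome.
z = [0, 0, 0, 1, 1, 1, 1]
y = [3.0, 5.0, 4.0, 7.0, 9.0, 6.0, 8.0]

slope = ols_slope(z, y)
mean_diff = (mean(yi for zi, yi in zip(z, y) if zi == 1)
             - mean(yi for zi, yi in zip(z, y) if zi == 0))
print(slope, mean_diff)
```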
Estimating Treatment Effects (With Random Assignment)
If the treatment is randomly assigned, then Z is uncorrelated with ε (Z is exogenous)
Z is uncorrelated with ε if and only if $E[\bar{\varepsilon}_1] = E[\bar{\varepsilon}_0]$
But if $E[\bar{\varepsilon}_1] = E[\bar{\varepsilon}_0]$, then the expected mean difference is
$$E[\bar{Y}_1 - \bar{Y}_0] = \beta_1 + E[\bar{\varepsilon}_1 - \bar{\varepsilon}_0] = \beta_1$$
This implies that standard methods (OLS) give an unbiased estimate of β1, which is the average treatment effect
That is, the treatment-control mean difference $\bar{Y}_1 - \bar{Y}_0$ is an unbiased estimate of β1
What goes wrong without randomization? (Simple Case)
If we do not have randomization, there is no guarantee that Z is uncorrelated with ε (Z may be endogenous)
Thus the OLS estimate is still
$$\bar{Y}_1 - \bar{Y}_0 = \beta_1 + (\bar{\varepsilon}_1 - \bar{\varepsilon}_0)$$
If Z is correlated with ε, then $E[\bar{\varepsilon}_1] \neq E[\bar{\varepsilon}_0]$
Hence $\bar{Y}_1 - \bar{Y}_0$ does not estimate β1, but some other quantity that depends on the correlation of Z and ε
If Z is correlated with ε, then standard methods give a biased estimate of β1
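A minimal Monte Carlo sketch of the contrast between random and non-random assignment. The data are simulated from Yi = β0 + β1 Zi + εi with β1 = 2; the endogenous selection rule (treat when ε plus independent standard normal noise is positive) is an assumption chosen for illustration. Under random assignment the mean difference is close to 2; under this rule it is close to 2 + 2/√π ≈ 3.13, because E[ε | Z = 1] − E[ε | Z = 0] = 2/√π when ε and the noise are standard normal.

```python
import random
from statistics import mean


def mean_difference(z, y):
    """Treatment-control difference in mean outcome (the OLS slope on Z)."""
    return (mean(yi for zi, yi in zip(z, y) if zi == 1)
            - mean(yi for zi, yi in zip(z, y) if zi == 0))


def simulate(n, beta0, beta1, endogenous, rng):
    """Draw Y = beta0 + beta1*Z + eps under random or endogenous assignment.

    Hypothetical selection rule: when endogenous, units whose eps plus
    independent noise is positive select into treatment, so Z is correlated with eps.
    """
    z, y = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        if endogenous:
            zi = 1 if eps + rng.gauss(0.0, 1.0) > 0 else 0
        else:
            zi = 1 if rng.random() < 0.5 else 0
        z.append(zi)
        y.append(beta0 + beta1 * zi + eps)
    return z, y


rng = random.Random(2010)
beta1 = 2.0
random_est = mean_difference(*simulate(100_000, 1.0, beta1, False, rng))
endog_est = mean_difference(*simulate(100_000, 1.0, beta1, True, rng))
print(round(random_est, 2), round(endog_est, 2))
```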
Instrumental Variables
One way to see this is in terms of two regression equations
Yi = β0 + β1Zi + εi
Zi = γ0 + γ1Xi + ηi
Note that in this model Z is endogenous (may be correlated with ε)
The instrumental variables model requires that:
1. γ1 ≠ 0 so that X predicts Z, and
2. X uncorrelated with ε (X is exogenous) [Cov{ε, X} = 0]
Estimating Causal Effects (IV Studies)
Angrist, Imbens, & Rubin (1996) showed that IV can estimate average causal effects of Z on Y, if the following assumptions hold:
1. SUTVA
2. Random assignment of X
3. Exclusion restriction (exogeneity of X)
4. Nonzero causal effect of X on Z
5. Monotonicity (no defiers)
Then the IV estimate is an estimate of the average treatment effect for those who comply with assignment
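A simulation sketch of this result, with hypothetical compliance types, shares and effects: X is randomly assigned, compliers take treatment Z exactly when X = 1, always takers and never takers ignore X, there are no defiers, and X affects Y only through Z. The IV (Wald) ratio, the effect of X on Y divided by the effect of X on Z, lands near the complier effect of 2, while comparing units by treatment received (about 0.8 here) does not.

```python
import random
from statistics import mean

# Hypothetical compliance types: (name, population share, baseline outcome, effect of Z on Y).
# Baselines differ by type, so comparing units by treatment received is confounded.
TYPES = [
    ("complier", 0.6, 0.0, 2.0),
    ("never_taker", 0.2, 8.0, 0.0),     # never treated, so this effect is never used
    ("always_taker", 0.2, -3.0, 10.0),
]


def draw_type(rng):
    """Draw a compliance type according to the hypothetical shares."""
    u, cumulative = rng.random(), 0.0
    for name, share, base, effect in TYPES:
        cumulative += share
        if u < cumulative:
            return name, base, effect
    return TYPES[-1][0], TYPES[-1][2], TYPES[-1][3]  # guard against round-off


def simulate(n, rng):
    x, z, y = [], [], []
    for _ in range(n):
        name, base, effect = draw_type(rng)
        xi = 1 if rng.random() < 0.5 else 0          # X: randomly assigned instrument
        if name == "complier":
            zi = xi                                   # takes treatment exactly when assigned
        else:
            zi = 1 if name == "always_taker" else 0   # ignores X; no defiers
        x.append(xi)
        z.append(zi)
        y.append(base + effect * zi + rng.gauss(0.0, 1.0))  # X affects Y only via Z
    return x, z, y


def diff_by(group, values):
    """Mean of values where group == 1 minus mean where group == 0."""
    return (mean(v for g, v in zip(group, values) if g == 1)
            - mean(v for g, v in zip(group, values) if g == 0))


rng = random.Random(1996)
x, z, y = simulate(200_000, rng)
wald = diff_by(x, y) / diff_by(x, z)   # IV (Wald) estimate
naive = diff_by(z, y)                   # compares units by treatment received
print(round(wald, 2), round(naive, 2))
```

The always takers' effect (10 here) does not enter the IV estimate, which is why it is an average effect for compliers only.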
Assignment by Covariate Value
Let X be a covariate and x be a value of X
Suppose that units with the same X value are randomly assigned with probability π(x), where 0 < π(x) < 1
Thus
$$E[r^0 \mid X, Z = 0] = E[r^0 \mid X, Z = 1] = E[r^0 \mid X]$$
$$E[r^1 \mid X, Z = 0] = E[r^1 \mid X, Z = 1] = E[r^1 \mid X]$$
Conditional independence of Z (assignment) and (r0, r1) given X implies
$$E[r^1 - r^0 \mid X] = E[r^1 \mid X] - E[r^0 \mid X] = E[r^1 \mid X, Z = 1] - E[r^0 \mid X, Z = 0]$$
Thus the experiment estimates the conditional causal effect given X
Assignment by Covariate Value
The conditional causal effect of treatment τ(x) might be called the local average treatment effect at X = x
The weighted average of local average treatment effects
$$\sum_x \tau(x)\, P(X = x)$$
estimates the average causal effect of treatment
Note that the overall treatment-control mean difference (even controlling for X) does not necessarily estimate the average causal effect of treatment, because there may be more units assigned to treatment at some values of X than at others (π(x) may vary with x)
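A worked example with hypothetical numbers makes the last point concrete. The sketch uses expected (population-level) counts rather than a random sample: two strata of equal size with different assignment probabilities π(x) and different local effects τ(x). Weighting τ(x) by P(X = x) gives the average causal effect of 4; the raw treatment-control difference does not, and neither does an OLS regression on Z with stratum dummies, which (a standard result for a saturated set of stratum dummies) weights each τ(x) by n_x π(x)(1 − π(x)).

```python
from fractions import Fraction as F

# Hypothetical strata: x -> (size n_x, assignment probability pi(x), mean r0, mean r1).
STRATA = {
    0: (100, F(1, 2), 10, 12),    # tau(0) = 2
    1: (100, F(9, 10), 20, 26),   # tau(1) = 6
}

n_total = sum(n for n, _, _, _ in STRATA.values())

# Weighted average of local effects with weights P(X = x): the average causal effect.
ate = sum(F(n, n_total) * (r1 - r0) for n, _, r0, r1 in STRATA.values())

# Expected overall treatment-control mean difference, ignoring X.
n_treated = sum(n * p for n, p, _, _ in STRATA.values())
n_control = sum(n * (1 - p) for n, p, _, _ in STRATA.values())
mean_treated = sum(n * p * r1 for n, p, _, r1 in STRATA.values()) / n_treated
mean_control = sum(n * (1 - p) * r0 for n, p, r0, _ in STRATA.values()) / n_control
naive = mean_treated - mean_control

# OLS of Y on Z plus stratum dummies: tau(x) weighted by n_x * pi(x) * (1 - pi(x)).
w = {x: n * p * (1 - p) for x, (n, p, _, _) in STRATA.items()}
regression = sum(w[x] * (r1 - r0) for x, (_, _, r0, r1) in STRATA.items()) / sum(w.values())

print(ate, naive, regression)  # 4 28/3 52/17
```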
Regression Discontinuity Designs
Regression discontinuity designs (RDD) assign to treatment by covariate value, but assign all units with X > c to treatment (and all others to control), so that
$$\pi(x) = \begin{cases} 1 & \text{if } x > c \\ 0 & \text{if } x \le c \end{cases}$$
which violates the principle that 0 < π(x) < 1
However, RDDs can estimate the local average causal effect of the treatment at X = c
The reason is that the RDD is a randomized experiment at the cutpoint X = c
More properly, the limit as x → c is a randomized experiment
Regression Discontinuity Designs
Note that the RDD can support estimation of causal effects
The causal effect that can be estimated, τ(c), is
$$\tau(c) = \lim_{x \downarrow c} E[r^1 \mid X = x, Z = 1] - \lim_{x \uparrow c} E[r^0 \mid X = x, Z = 0]$$
In other words, it is the causal effect (local average treatment effect) at the value X = c, which is the gap or discontinuity at X = c
But not every analysis of the design estimates the causal effect
Analyses that use models assuming a functional form (e.g., linear regression) depend on that functional form assumption
Regression Discontinuity Designs
Nonparametric regression methods can, in principle, provide model-free estimates of the causal effect of treatment at X = c
But these methods themselves make technical assumptions (e.g., about bandwidth)
Thus estimation of treatment effects in RDDs is in practice somewhat model dependent
Designs with multiple cutpoints can provide estimates of treatment effects at multiple points or more externally valid average causal effects
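A sketch of one common nonparametric choice, local linear regression with a triangular kernel fitted separately on each side of the cutpoint. The kernel, the bandwidth h = 0.2 and the simulated data (a curved outcome with a jump of 3 at c = 0) are assumptions for illustration, not the author's method. The estimate is the difference between the two fitted values at c; changing h changes it, which is the model dependence noted above.

```python
import random


def weighted_line(xs, ys, ws):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    xb = sum(w * x for w, x in zip(ws, xs)) / sw
    yb = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xb) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xb) * (y - yb) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return yb - b * xb, b


def rdd_local_linear(x, y, c, h):
    """Local linear estimate of the jump in E[Y | X = x] at the cutpoint c.

    Assumes units with x > c are treated. Fits a triangular-kernel weighted
    line within bandwidth h on each side and differences the intercepts at c.
    """
    def fitted_at_c(keep):
        pts = [(xi - c, yi) for xi, yi in zip(x, y) if keep(xi) and abs(xi - c) < h]
        xs = [d for d, _ in pts]
        ys = [v for _, v in pts]
        ws = [1 - abs(d) / h for d in xs]          # triangular kernel weights
        return weighted_line(xs, ys, ws)[0]        # intercept = fitted value at x = c
    return fitted_at_c(lambda v: v > c) - fitted_at_c(lambda v: v <= c)


rng = random.Random(7)
c, tau = 0.0, 3.0
x = [rng.uniform(-1, 1) for _ in range(20_000)]
# Hypothetical curved outcome with a jump of tau at the cutpoint.
y = [1 + 0.5 * xi + 0.8 * xi ** 2 + tau * (xi > c) + rng.gauss(0, 0.5) for xi in x]
print(round(rdd_local_linear(x, y, c, h=0.2), 2))
```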
Nonequivalent Control Group Designs
These designs compare a treatment group with a (non-randomized) comparison group
There is a huge range of quality in these designs, ranging from pretty good to awful
Often matching or adjustment for covariates (a form of pseudo-matching), or both, are used
Can such designs ever provide estimates of average causal effects?
Yes, but essentially never estimates that are model free
Nonequivalent Control Group Designs
How well they work depends on how well the analytic model captures essential features of the data
This is not always possible to determine empirically
If we can assume conditional independence of Z (assignment) and (r0, r1) given X, or even just that
$$E[r^0 \mid X, Z = 0] = E[r^0 \mid X]$$
$$E[r^1 \mid X, Z = 1] = E[r^1 \mid X]$$
then the design can estimate the causal effect of treatment, since
$$E[r^1 - r^0 \mid X] = E[r^1 \mid X] - E[r^0 \mid X] = E[r^1 \mid X, Z = 1] - E[r^0 \mid X, Z = 0]$$
Nonequivalent Control Group Designs
Note that this is equivalent to making the treatment assignment “as if random” conditional on the covariate (or matching variable) X
This is the basic strategy of matching for causal inference (e.g., Rubin, Rosenbaum, Cochran)
It is also the basic strategy for inference with missing data: find covariates such that, conditional on the observed covariates, the missing data are “as if random”
In missing data theory, this is called “strong ignorability”
Nonequivalent Control Group Designs
This is all very abstract
Make it concrete by considering response functions, that is, r0 or r1 as a function of covariates or other effects
For example, suppose that
$r_i^0 = \alpha + \beta x_i + \varepsilon_i^0$
$r_i^1 = \alpha + \tau + \beta x_i + \varepsilon_i^1$
and that $\varepsilon_i^0$ and $\varepsilon_i^1$ are independent of x
Then it easily follows that the usual estimate of the average treatment effect is unbiased
Nonequivalent Control Group Designs
But suppose that the response functions are a little different
$r_i^0 = \alpha + \beta_0 x_i + \varepsilon_i^0$
$r_i^1 = \alpha + \tau + \beta_1 x_i + \varepsilon_i^1$
and that $\varepsilon_i^0$ and $\varepsilon_i^1$ are independent of x
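Then it easily follows that the usual (covariance-adjusted) estimate of the treatment effect has expectation (given the x values)
$$\tau + (\beta_1 - \bar{\beta})\,\bar{x}_1 - (\beta_0 - \bar{\beta})\,\bar{x}_0$$
where $\bar{\beta}$ is an “average” of β0 and β1, and $\bar{x}_1$ and $\bar{x}_0$ are the treatment and control group means of x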
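The effect of unequal slopes can be checked exactly with noise-free, hypothetical response functions. The sketch assumes “the usual estimate” is the coefficient on Z from an OLS regression of Y on Z and x (analysis of covariance). By the Frisch-Waugh-Lovell theorem that coefficient equals Ȳ1 − Ȳ0 − b(x̄1 − x̄0), with b the pooled within-group slope, so plain Python suffices. With β1 ≠ β0 the estimate lands on τ + (β1 − b)x̄1 − (β0 − b)x̄0 rather than τ; with parallel slopes it recovers τ exactly.

```python
from statistics import mean


def ancova_estimate(x, y, z):
    """Coefficient on Z from OLS of y on (1, z, x), plus the pooled slope and group x means.

    By Frisch-Waugh-Lovell it equals ybar1 - ybar0 - b * (xbar1 - xbar0),
    where b is the pooled within-group slope of y on x.
    """
    groups = {g: [(xi, yi) for xi, yi, zi in zip(x, y, z) if zi == g] for g in (0, 1)}
    xbar = {g: mean(p[0] for p in pts) for g, pts in groups.items()}
    ybar = {g: mean(p[1] for p in pts) for g, pts in groups.items()}
    sxy = sum((xi - xbar[g]) * (yi - ybar[g]) for g, pts in groups.items() for xi, yi in pts)
    sxx = sum((xi - xbar[g]) ** 2 for g, pts in groups.items() for xi, _ in pts)
    b = sxy / sxx
    return ybar[1] - ybar[0] - b * (xbar[1] - xbar[0]), b, xbar


# Hypothetical noise-free response functions; treated units have larger x.
alpha, tau, beta0, beta1 = 1.0, 2.0, 0.5, 1.5
x0 = [0.0, 1.0, 2.0, 3.0]         # control covariate values
x1 = [2.0, 3.0, 4.0, 5.0, 6.0]    # treatment covariate values
x = x0 + x1
z = [0] * len(x0) + [1] * len(x1)
y = [alpha + beta0 * xi for xi in x0] + [alpha + tau + beta1 * xi for xi in x1]

est, b, xbar = ancova_estimate(x, y, z)
predicted = tau + (beta1 - b) * xbar[1] - (beta0 - b) * xbar[0]
print(round(est, 4), round(predicted, 4), round(b, 4))  # 4.3333 4.3333 1.1667

# With parallel slopes (beta1 == beta0) the same estimator recovers tau exactly.
y_par = [alpha + beta0 * xi for xi in x0] + [alpha + tau + beta0 * xi for xi in x1]
print(ancova_estimate(x, y_par, z)[0])  # 2.0
```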
Nonequivalent Control Group Designs
The analysis could be “fixed up” to remove the bias if we knew the response function
But that is exactly the point
To get an unbiased estimate of the causal effect, you have to know the right model, so analyses will be model dependent
It is not easy (maybe impossible) to know what the right model is
Moreover, I chose a very simple model (homogeneous treatment effects with responses a linear function of the observed covariates)
Differences in Differences
The difference in differences idea can be seen as a particular kind of nonequivalent control group design
It is frequently used to evaluate the effects of policies in education and elsewhere
Assume that there is a series of longitudinal observations in locations (e.g., states) where a policy has been implemented at some time in some locations
Crudely, we estimate the effect of a policy by comparing
• the difference in outcome before and after the policy is implemented for individuals affected by the policy, with
• the same difference for individuals unaffected by the policy
That is why it is a difference in differences estimator
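The crude calculation as a minimal sketch with hypothetical individual records: average each of the four group-by-period cells, then subtract the unaffected group's before-after change from the affected group's change.

```python
from statistics import mean

# Hypothetical individual outcomes: (affected_by_policy, after_policy, outcome).
records = [
    (True, False, 48.0), (True, False, 52.0),
    (True, True, 57.0), (True, True, 59.0),
    (False, False, 45.0), (False, False, 47.0),
    (False, True, 49.0), (False, True, 51.0),
]


def cell_mean(treated, post):
    """Mean outcome for one group-by-period cell."""
    return mean(y for t, p, y in records if t == treated and p == post)


def diff_in_diff():
    """(post - pre) for the affected group minus (post - pre) for the unaffected group."""
    change_treated = cell_mean(True, True) - cell_mean(True, False)
    change_control = cell_mean(False, True) - cell_mean(False, False)
    return change_treated - change_control


print(diff_in_diff())  # 4.0
```

The affected group's raw change is 8; the unaffected group's change of 4 stands in for what would have happened to the affected group without the policy.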
Differences in Differences
More elaborate (and convincing) analyses control for location and time or model variation as random effects
Let Yist be the outcome for individual i in location s at time t
Let Xist be the corresponding individual level covariates
Then the model might be
Yist = αs + πt + γXist + βTst + εist
where αs and πt are location and time fixed effects, γ is a vector of covariate effects, Tst is a dummy variable for treatment, and εist is a residual
There may be clustering by location, which needs to be taken into account
Differences in Differences
Obviously the difference in differences estimator has great appeal
Given a good longitudinal data set, it is easy to use
It is simple to understand and explain to policy makers
It is a natural analysis to learn from “natural experiments” where a policy has been tried some place and not others or has been tried at different times in different locations
Differences in Differences
This model may seem hard to formulate in causal model terms
The treatment effect is identified by the difference between post-policy and pre-policy outcomes in the treatment (got policy) group versus the control group
Let $r_i^0$ and $r_i^1$ be the possible outcomes after treatment and X be the pretreatment variable
This estimate is estimating
$$E[r^1 - X \mid Z = 1] - E[r^0 - X \mid Z = 0]$$
not $E[r^1] - E[r^0]$
It can estimate the average causal effect under several circumstances
For example, if the response functions are
$r_i^0 = \alpha_i + x_i + \varepsilon_i^0$
$r_i^1 = \alpha_i + \tau + x_i + \varepsilon_i^1$
and $\varepsilon_i^0$ and $\varepsilon_i^1$ are independent of $x_i$, then the difference in differences estimate does estimate τ, the average causal effect of treatment
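A simulation sketch of this case, under assumptions that go beyond the slide: α_i and the errors are independent of assignment, and assignment Z depends only on the pretreatment variable x. Because the coefficient on x is 1, the difference in mean gains (r − x) recovers τ = 1, while the post-treatment difference alone is biased upward because treated units have higher x.

```python
import random
from statistics import mean


def simulate(n, tau, rng):
    """Hypothetical data from r0 = alpha_i + x + e0 and r1 = alpha_i + tau + x + e1.

    Assumptions (ours): alpha_i and the errors are independent of assignment,
    and assignment z depends only on the pretreatment variable x.
    """
    rows = []
    for _ in range(n):
        alpha_i = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0)
        z = 1 if x + rng.gauss(0.0, 1.0) > 0.5 else 0   # selection on x
        r0 = alpha_i + x + rng.gauss(0.0, 1.0)
        r1 = alpha_i + tau + x + rng.gauss(0.0, 1.0)
        rows.append((z, x, r1 if z == 1 else r0))       # only one outcome is observed
    return rows


def group_mean(rows, z, f):
    """Mean of f(x, r) over rows with assignment z."""
    return mean(f(x, r) for zi, x, r in rows if zi == z)


rng = random.Random(42)
rows = simulate(100_000, tau=1.0, rng=rng)
post_only = group_mean(rows, 1, lambda x, r: r) - group_mean(rows, 0, lambda x, r: r)
did = group_mean(rows, 1, lambda x, r: r - x) - group_mean(rows, 0, lambda x, r: r - x)
print(round(post_only, 2), round(did, 2))
```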
What Can Go Wrong?
One big problem
Z can be correlated with (r0 − X, r1 − X)
• X can both cause the policy and be correlated with the outcome
• Something else can cause both X and Z
• This is the general endogeneity problem
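A short sketch of this failure mode, using response functions of the form in the difference in differences example above together with a hypothetical selection rule in which units with larger α_i are more likely to be treated. Then Z is correlated with r0 − X and r1 − X, and the difference in mean gains is biased by E[α_i | Z = 1] − E[α_i | Z = 0] = 2/√π ≈ 1.13 in this setup, so it lands near 2.13 instead of τ = 1.

```python
import random
from statistics import mean


def did_estimate(n, tau, rng):
    """Difference in mean gains (r - x) when assignment depends on alpha_i.

    Hypothetical data: observed r = alpha_i + tau*z + x + e. Because z depends
    on alpha_i, z is correlated with (r0 - x, r1 - x) and the estimate is biased.
    """
    gains = {0: [], 1: []}
    for _ in range(n):
        alpha_i = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0)
        z = 1 if alpha_i + rng.gauss(0.0, 1.0) > 0 else 0   # selection on alpha_i
        r = alpha_i + tau * z + x + rng.gauss(0.0, 1.0)
        gains[z].append(r - x)
    return mean(gains[1]) - mean(gains[0])


estimate = did_estimate(100_000, tau=1.0, rng=random.Random(42))
print(round(estimate, 2))
```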
What Can Go Wrong?
Informal checks
• Look at trends beyond the time of policy implementation
• Estimate effects of treatment where there is no policy change as a check (you should see no effect)
These are suggestive, not definitive
They can invalidate an analysis, not validate one
What Can Go Wrong?
One smaller problem
The data often exhibit large autocorrelations, and this can lead to large underestimates of standard errors, making tests reject (far) too often
There are three reasons for this:
• Data are often based on long time series
• Data are highly positively correlated over time
• The treatment variable does not change much
What Can Go Wrong?
The standard error problem is difficult to solve
Parametric analysis (generalized least squares with autocorrelation) can be done, but inference for autocorrelation is poor
Randomization tests seem to perform well for problems like these
Collapsing the data into two time periods is sometimes useful and improves performance of tests
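A sketch combining the last two remedies: collapse each state's series into one pre-policy and one post-policy mean, then judge the difference in differences estimate against a randomization (permutation) distribution obtained by reshuffling which states count as treated. The AR(1) panel, 20 states, 10 periods, ρ = 0.8, effect size, and policy start at period 5 are all hypothetical.

```python
import random
from statistics import mean


def ar1_panel(states, periods, rho, rng):
    """Hypothetical state-by-period outcomes with AR(1) errors (no effect yet)."""
    panel = []
    for _ in range(states):
        e, series = rng.gauss(0.0, 1.0), []
        for _ in range(periods):
            e = rho * e + rng.gauss(0.0, 1.0)
            series.append(e)
        panel.append(series)
    return panel


def collapse(panel, start):
    """Collapse each state's series to (mean from start on) - (mean before start)."""
    return [mean(s[start:]) - mean(s[:start]) for s in panel]


def did_stat(gains, treated):
    """Mean collapsed gain for treated states minus that for control states."""
    return (mean(g for g, d in zip(gains, treated) if d)
            - mean(g for g, d in zip(gains, treated) if not d))


def permutation_pvalue(gains, treated, draws, rng):
    """Two-sided randomization p-value from reshuffling which states are treated."""
    observed = abs(did_stat(gains, treated))
    labels = list(treated)
    extreme = 0
    for _ in range(draws):
        rng.shuffle(labels)
        if abs(did_stat(gains, labels)) >= observed:
            extreme += 1
    return (extreme + 1) / (draws + 1)


rng = random.Random(2001)
panel = ar1_panel(states=20, periods=10, rho=0.8, rng=rng)
treated = [True] * 10 + [False] * 10   # hypothetical: the first 10 states adopt the policy
effect, start = 4.0, 5                 # hypothetical effect from period 5 onward
for s, is_treated in enumerate(treated):
    if is_treated:
        panel[s] = [y + (effect if t >= start else 0.0) for t, y in enumerate(panel[s])]

gains = collapse(panel, start)
print(round(did_stat(gains, treated), 2), permutation_pvalue(gains, treated, 2000, rng))
```

Collapsing to one gain per state sidesteps the serial correlation within each series, and the permutation distribution does not rely on an estimated standard error; both rest on the assumption that states are exchangeable when the policy has no effect.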
Conclusion
Without randomization, causal inference is much harder and more model dependent
References
Abadie, A. (2000). Semiparametric difference-in-differences estimators. Working Paper, Kennedy School of Government, Harvard University.
Bertrand, M., Duflo, E., & Mullainathan, S. (2001). How much should we trust difference in differences estimators? MIT Department of Economics Working Paper Series 01-34.
Meyer, B. (1995). Natural and quasi-natural experiments in economics. Journal of Business and Economic Statistics, 13, 151-162.
Moulton, B. R. (1990). An illustration of a pitfall in estimating the effects of aggregate variables in micro units. Review of Economics and Statistics, 72, 334-338.
Newey, W., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703-708.
Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica, 49, 1417-1426.
Rosenbaum, P. (1993). Hodges-Lehmann point estimates of treatment effect in observational studies. Journal of the American Statistical Association, 88, 1250-1253.
Rosenbaum, P. (1996). Observational studies and nonrandomized experiments. In S. Ghosh & C. R. Rao (Eds.), Handbook of Statistics (Vol. 13).
Solon, G. (1984). Estimating auto-correlations in fixed-effects models. NBER Technical Working Paper No. 32.