Randomized Controlled Trials: How Did We Get Here?
Why Do Policy Makers Find Impact Evaluation (IE) Valuable?
1. Need evidence on what works
o Budgets are limited, and bad policies could hurt
2. Information key to sustainability
o Budget and beliefs
3. Improve program/policy implementation
o Design (eligibility, benefits)
• Most influential at time of design
– Relays experience of other countries
– International public good
When is IE Most Valuable for Policy?
Evaluate impact when the project is:
– Innovative
– Replicable/scalable
– Strategically relevant for policy decisions
– Large: affects many people and has a large budget
o Evaluation will fill a knowledge gap
o Substantial policy impact
• Use evaluation within a program to test alternatives and improve programs
Types of Evaluation
• Efficacy – Small-scale proof of concept
– Implemented under ideal conditions
– Test theory
– Limited external validity
• Effectiveness – Large-scale, in the context of program rollout
– Implemented as in the field
– Better external validity
– Harder to test theory or separate program components
Problem of Missing Data
For a program beneficiary, the impact α of program P on outcome Y is:
α = (Y | P=1) − (Y | P=0)
We observe (Y | P=1),
but we do not observe (Y | P=0).
We must therefore estimate (Y | P=0), the counterfactual: what would have happened to Y in the absence of P.
Rubin Causal Model
• Treatment effect: y(1) − y(0)
y(1) = outcome if treated
y(0) = outcome if not treated
Observed outcome: y = (1 − P) × y(0) + P × y(1),
where P = 1 if treated and P = 0 if not
• Usual parameters of interest:
– Average Treatment Effect (ATE): E(y(1) − y(0))
– Average Treatment Effect on the Treated (TT): E(y(1) − y(0) | P = 1)
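To make the missing-data problem concrete, the sketch below simulates hypothetical potential outcomes, something real data never provide, since each unit reveals only y = (1 − P) × y(0) + P × y(1). The outcome distributions, the average effect of about 1, and the function name `simulate_potential_outcomes` are illustrative assumptions, not part of the original material.

```python
import random
from statistics import fmean

def simulate_potential_outcomes(n, seed=0):
    """Hypothetical data: both potential outcomes are known only because we simulate them."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)            # y(0): outcome if not treated (assumed distribution)
        y1 = y0 + rng.gauss(1.0, 0.5)        # y(1): outcome if treated; unit-level effect averages 1
        p = 1 if rng.random() < 0.5 else 0   # P: treatment indicator
        y = (1 - p) * y0 + p * y1            # observed outcome: y = (1 - P) y(0) + P y(1)
        units.append({"y0": y0, "y1": y1, "P": p, "y": y})
    return units

units = simulate_potential_outcomes(10_000)

# ATE = E(y(1) - y(0)) over everyone
ate = fmean(u["y1"] - u["y0"] for u in units)
# TT = E(y(1) - y(0) | P = 1) over the treated only
tt = fmean(u["y1"] - u["y0"] for u in units if u["P"] == 1)

print(f"ATE ≈ {ate:.3f}, TT ≈ {tt:.3f}")
```

With real data only the `y` and `P` fields would exist; `y0` and `y1` are never both observed for the same unit, which is exactly why the counterfactual must be estimated.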
Selection Bias
E(y | P = 1) − E(y | P = 0) = E(y(1) | P = 1) − E(y(0) | P = 0)
= E(y(1) − y(0) | P = 1) + E(y(0) | P = 1) − E(y(0) | P = 0)
The first term is the TT; the remainder, E(y(0) | P = 1) − E(y(0) | P = 0), is selection bias.
• The treated may do better because they would have done better
anyway, even without treatment
• Confounds interpretation of the treatment effect
• Cannot separately identify the treatment effect from
selection bias (omitted variables)
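The decomposition of the naive comparison into TT plus selection bias is an accounting identity, so it holds exactly in any sample. The sketch below checks it on hypothetical data in which units with a higher y(0) are more likely to enrol (self-selection, an assumption chosen to mimic the story on this slide) and the true effect is a constant 1. The function name and all numbers are illustrative.

```python
import random
from statistics import fmean

def simulate_self_selection(n, seed=1):
    """Hypothetical population where treatment take-up depends on y(0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)                         # outcome without treatment
        y1 = y0 + 1.0                                     # constant true effect of 1 (assumption)
        p = 1 if y0 + rng.gauss(0.0, 1.0) > 10.0 else 0   # better-off units self-select into treatment
        rows.append((y0, y1, p))
    return rows

rows = simulate_self_selection(20_000)
treated = [(y0, y1) for y0, y1, p in rows if p == 1]
control = [(y0, y1) for y0, y1, p in rows if p == 0]

# Naive comparison of observed outcomes: E(y|P=1) - E(y|P=0)
naive = fmean(y1 for _, y1 in treated) - fmean(y0 for y0, _ in control)
# TT: E(y(1) - y(0) | P=1)
tt = fmean(y1 - y0 for y0, y1 in treated)
# Selection bias: E(y(0)|P=1) - E(y(0)|P=0)
bias = fmean(y0 for y0, _ in treated) - fmean(y0 for y0, _ in control)

print(f"naive = {naive:.3f}, TT = {tt:.3f}, selection bias = {bias:.3f}")
```

The naive difference lands well above the true effect of 1, and the gap is exactly the selection-bias term, which in real data cannot be computed because E(y(0) | P = 1) is unobserved.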
Solution: Random Assignment
[Figure: 1. Population (eligible and ineligible units) → 2. Evaluation sample drawn from the eligible population (external validity) → 3. Randomize treatment into Treatment and Comparison groups (internal validity)]
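Re-running a hypothetical self-selected population with P assigned by a coin flip illustrates why random assignment helps: assignment is then independent of y(0), so E(y(0) | P = 1) = E(y(0) | P = 0) in expectation and the simple difference in means recovers the effect. The sample size, seed, distributions and the function name `difference_in_means` are assumptions made for illustration.

```python
import random
from statistics import fmean

def difference_in_means(n, randomize, seed=2):
    """Naive treated-minus-comparison difference under self-selection or random assignment."""
    rng = random.Random(seed)
    treated_y, comparison_y = [], []
    for _ in range(n):
        y0 = rng.gauss(10.0, 2.0)        # hypothetical outcome without treatment
        y1 = y0 + 1.0                    # true effect is 1 for everyone (assumption)
        if randomize:
            p = rng.random() < 0.5       # coin flip: independent of y0
        else:
            p = y0 + rng.gauss(0.0, 1.0) > 10.0   # self-selection on y0
        if p:
            treated_y.append(y1)         # we observe y(1) for the treated
        else:
            comparison_y.append(y0)      # and y(0) for the comparison group
    return fmean(treated_y) - fmean(comparison_y)

print(difference_in_means(50_000, randomize=False))  # effect plus selection bias
print(difference_in_means(50_000, randomize=True))   # close to the true effect of 1
```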
Clinical Trials in Medicine
Randomized clinical trials pioneered in medicine:
– Norm in the US since the 1962 Kefauver-Harris Amendments (FDA)
• Proof of efficacy required prior to marketing
• Based on adequate and well-controlled studies
– Interpreted as random assignment to treatment and
control groups with adequate power
• Medical journals
• Use RCTs for clinical studies
• Basis of clinical practice guidelines
• Ethical treatment of human subjects
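"Adequate power" can be made concrete with the textbook normal-approximation formula for the sample size per arm needed to detect a difference δ in means between two equal-sized groups with outcome standard deviation σ: n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `n_per_arm` is hypothetical, and the 5% significance and 80% power defaults are common conventions rather than requirements stated on the slide. It ignores clustering and attrition, both of which raise the required sample.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per arm for a two-sided test of equal means (normal approximation)."""
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for a two-sided test
    z_beta = z.inv_cdf(power)            # quantile giving the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                  # round up to whole participants

# Detecting an effect of 0.25 standard deviations
print(n_per_arm(delta=0.25, sigma=1.0))  # 252
```

Because n scales with 1/δ², halving the detectable effect roughly quadruples the required sample: `n_per_arm(0.5, 1.0)` gives 63 per arm.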
Statistics …
• Introduced to clinical trials in the 1950s to address:
– Sampling variability (how big is big?)
– Causality
• Alliance of statisticians and clinical researchers to
– discipline physicians in their prescriptions
– guide insurance companies on which drugs to pay for
• Coincided with diffusion of statistics across the sciences
(psychology, economics, physics, genetics)
Main argument: statistics provides objectivity to
combat subjective and “self-interest” bias.
RCTs not used much in the social sciences until recently
• Formal RCT theory developed by Fisher in the 1930s
– Reluctance to apply it to the social sciences due to their “non-experimental” nature
• Most econometrics avoided RCTs until the last 15 years
– Used to test theory, so causal identification became a big issue
– Simultaneous equations (supply & demand) led to IV (1940s)
– Returns to education led to “selection” or Roy models (1960s)
– Agricultural productivity led to panel data (FE) (1960s)
– Natural policy experiments (1990s)
• Diff-in-diff & regression discontinuity
– Ex post matching: a flash in the pan
Can non-experimental evaluations match
experimental results?
LaLonde (1986)
– Took data from a randomized experiment and
analyzed them as if they were observational data
– Used a two-step Heckman selection correction method with
different sets of exclusion variables
– Non-experimental results depended heavily on the choice
of exclusion variables and were far from the experimental benchmark
Dehejia and Wahba (2002) got much closer using
propensity score matching methods
A Credibility Revolution in the 1990s
• Selection bias taken more seriously
– Influential within-study comparisons (LaLonde, 1986)
• Standard selection correction procedures
– Questioned for lack of robustness
– Require strong “untestable” assumptions
– Never know whether all omitted variables have been controlled for
• Search for:
– Credible sources of variation
– Less restrictive parametric and distributional assumptions
• Embrace RCTs as feasible, desirable and mainstream
Policy World More Accepting of RCTs
• Positive experience with RCTs
– E.g. PROGRESA/OPORTUNIDADES
• Fewer ethical concerns in some cases
– Limited resources: number of eligible beneficiaries exceeds resources
– Phased rollout
– Concern is over excluding a group
– Good governance
• Equitable, accountable & transparent
– PROGRESA example
• Expanded funding
– 3ie, DFID, World Bank SIEF, IADB, BMGF, USAID & other bilaterals
Examples of Influential RCTs
• Academic literature (efficacy)
– Perry Preschool (1961)
– Jamaica ECD (1986)
– Kenya Deworming (2004)
• Some policy evaluation (effectiveness)
– RAND Health Insurance (1972)
– Manpower Job Training Act (1974)
– Mexican PROGRESA/OPORTUNIDADES (1997)
Perry Pre-School program (1961)
• First to demonstrate the value of early childhood development (ECD) programs for
disadvantaged children
– Results continue to influence research today
– Tracked participants up to 2000
• Evaluation design:
– 123 children aged 3-6 years in Michigan
– Half randomly assigned to treatment
– Specialized education for 30 weeks
– Follow-ups at ages 14, 15, 19, 27, 40
Jamaican ECD Study Design
• Psychosocial stimulation:
– Improve the quality of mother-child interaction
– Weekly community health worker (CHW) visits for 24 months
• Recruited 129 stunted children (height-for-age z-score, HAZ < -2) aged 9 to 24 months from poor neighborhoods
– Randomly assigned
– Follow-ups at endline and at ages 7, 11, 18 and 22 years
• Led to ECD movement in developing countries
[Figure: Wage Distributions for Treatment (T) and Control (C) Groups; density estimate, bandwidth 0.37]
Deworming School Children in Kenya
• Effect of deworming on schooling
• Cluster randomization at school level
– Impact on enrollment and attendance
– Spillover effects
– Long-term effect on earnings
• Influenced worldwide scale-up
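A minimal sketch of cluster randomization, assuming a hypothetical mapping from school IDs to student IDs: whole schools are shuffled and split in half, and every student inherits the assignment of their school. One usual motivation, consistent with the spillover effects listed for this study, is that treating only some children within a school would also affect their untreated classmates. The function and data names are illustrative; the later analysis would also need to account for clustering (for example with cluster-robust standard errors), which is not shown.

```python
import random

def cluster_randomize(students_by_school, seed=0):
    """Assign whole schools (clusters) to treatment; students inherit their school's arm."""
    rng = random.Random(seed)
    schools = sorted(students_by_school)           # fixed order so a given seed is reproducible
    rng.shuffle(schools)
    treated_schools = set(schools[: len(schools) // 2])
    return {
        student: school in treated_schools         # True = treatment, False = comparison
        for school, students in students_by_school.items()
        for student in students
    }

# Hypothetical data: 10 schools with 5 students each
schools = {f"school_{i}": [f"student_{i}_{j}" for j in range(5)] for i in range(10)}
assignment = cluster_randomize(schools, seed=3)
treated = sorted(s for s, kids in schools.items() if assignment[kids[0]])
print(treated)
```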
RAND Health Insurance Experiment
• Impact of cost-sharing on access to health
care and health outcomes
• Evaluation design:
– 5809 people randomly assigned to insurance plans
with 0%, 25%, 50%, or 95% cost-sharing rates
– Main results: cost-sharing reduced use of “superfluous”
care with little harm to health
– But some heterogeneity: not true for the poorest
• Influenced use of cost-sharing in
health insurance
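Assigning participants to several cost-sharing plans is a multi-arm extension of simple randomization. The sketch below assumes complete randomization of individuals into arms of nearly equal size, with arms numbered 0 to k − 1; this is an illustrative scheme, not necessarily the allocation procedure the RAND study actually used, and the function name `assign_arms` is hypothetical.

```python
import random
from collections import Counter

def assign_arms(n_units, n_arms, seed=0):
    """Complete randomization of n_units into n_arms groups whose sizes differ by at most one."""
    rng = random.Random(seed)
    labels = [i % n_arms for i in range(n_units)]  # arm labels 0..n_arms-1, as evenly as possible
    rng.shuffle(labels)                             # random order = random assignment
    return labels                                   # labels[k] is the arm of unit k

arms = assign_arms(5809, 4, seed=42)
print(sorted(Counter(arms).items()))  # [(0, 1453), (1, 1452), (2, 1452), (3, 1452)]
```

Shuffling a fixed list of labels, rather than drawing each unit's arm independently, guarantees balanced arm sizes, which helps statistical power for a given total sample.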
PROGRESA/OPORTUNIDADES
• Conditional Cash Transfers (CCTs)
• Rural Areas
– Randomized rollout 506 communities, 24,000 households
– Large effects on health, education, and poverty
– Used to sell CCTs to 35 countries
– Large number of studies
• Urban Areas
– Matched difference-in-difference
– Results weaker, less stable and controversial
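The urban evaluation relied on matched difference-in-differences. Once comparison units have been matched to treated units (a step not shown here), the core estimate is the change in the treated group minus the change in the comparison group, which nets out fixed differences between the groups under the parallel-trends assumption. The function name and the group means below are hypothetical.

```python
def diff_in_diff(treat_before, treat_after, comp_before, comp_after):
    """Difference-in-differences estimate from four group means."""
    treated_change = treat_after - treat_before     # change over time, treated group
    comparison_change = comp_after - comp_before    # change over time, comparison group
    return treated_change - comparison_change       # attributed to the program under parallel trends

# Hypothetical mean outcomes before and after the program
print(diff_in_diff(50.0, 58.0, 52.0, 55.0))  # 5.0
```

If the groups would have trended differently even without the program, this estimate absorbs that difference, one reason such results can be less stable than randomized ones.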
Manpower Job Training
• Individuals applying for unemployment benefits randomized to various job training and counseling interventions
• Large-scale national study
• Found little or no impact on labor market outcomes
• Despite this, job training in the US and internationally is still widely used and well funded
• Problem with negative results
RCTs in development economics today
• Since the mid-1990s, an explosion of evaluations
– e.g. Health, education, HIV/AIDS, CCTs, Active Labor
Markets, Microfinance, Water & Sanitation, Infrastructure,
Management, etc.
– Still fewer than quasi-experimental studies
• Moving toward increasingly “smart designs”
– Alternative program designs
– Alternative pathways and mechanisms (testing theory)
– Validation of structural models for ex ante evaluation