Randomized Controlled Trials: How Did We Get Here?

Paul Gertler, UC Berkeley, J-PAL


Why Do Policy Makers Find Impact Evaluation (IE) Valuable?

• Need evidence on what works

– Limited budget and bad policies could hurt

• Information key to sustainability

– Budget and beliefs

• Improve program/policy implementation

– Design (eligibility, benefits)

• Most influential at time of design

– Relay experience of other countries

– International public good

When is IE Most Valuable for Policy?

Evaluate impact when:

o Evaluation will fill knowledge gap

– Innovative

– Replicable/scalable

o Substantial policy impact

– Strategically relevant for policy decisions

– Affects many and has large budget

Use evaluation within a program to test alternatives and improve programs

Types of Evaluation

• Efficacy – Small scale proof of concept

– Implemented under ideal conditions

– Test theory

– Limited external validity

• Effectiveness – Large scale in context of program rollout

– Implemented as in field

– Better external validity

– Harder to test theory or separate program components


Causal Inference

What is the impact of (P) on (Y)?

α = (Y | P=1) − (Y | P=0)

Can we all go home?

Problem of Missing Data

For a program beneficiary:

α = (Y | P=1) − (Y | P=0)

we observe (Y | P=1)

but we do not observe (Y | P=0)

So we must estimate (Y | P=0), i.e. the counterfactual: what would have happened to Y in the absence of P.

Rubin Causal Model

• Treatment effect: y(1) − y(0)

y(1) = outcome if treated

y(0) = outcome if not treated

In equation form: y = (1 − P) × y(0) + P × y(1),

where P = 1 if treated and P = 0 if not

• Usual parameters of interest:

– Average Treatment Effect (ATE): E(y(1) − y(0))

– Average Treatment Effect on the Treated (TT): E(y(1) − y(0) | P = 1)

Selection Bias

E(y | P = 1) − E(y | P = 0) = E(y(1) | P = 1) − E(y(0) | P = 0)

= E(y(1) − y(0) | P = 1) + [E(y(0) | P = 1) − E(y(0) | P = 0)]

The first term is the TT. The second term, E(y(0) | P = 1) − E(y(0) | P = 0), is the selection bias.

• The treated may do better because they would have done better anyway, even without treatment

• Confounds interpretation of the treatment effect

• Cannot separately identify the treatment effect from selection bias (omitted variables)
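
The decomposition above can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes a hypothetical unobserved trait called `ability`, a constant treatment effect of 2, and self-selection of high-ability units into the program. For contrast, the same function can assign treatment by coin flip instead. All names and parameter values are illustrative assumptions.

```python
import random
from statistics import fmean

def decompose(n=100_000, effect=2.0, randomized=False, seed=0):
    """Simulate potential outcomes and split the naive comparison
    E(y|P=1) - E(y|P=0) into TT plus selection bias."""
    rng = random.Random(seed)
    y1_treated, y0_treated, y0_untreated = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)        # hypothetical unobserved trait
        y0 = ability + rng.gauss(0, 1)   # outcome without the program
        y1 = y0 + effect                 # outcome with the program
        if randomized:
            p = rng.random() < 0.5       # coin flip, unrelated to ability
        else:
            p = ability > 0              # high-ability units select in
        if p:
            y1_treated.append(y1)
            y0_treated.append(y0)        # counterfactual, unobserved in real data
        else:
            y0_untreated.append(y0)
    naive = fmean(y1_treated) - fmean(y0_untreated)
    tt = fmean(a - b for a, b in zip(y1_treated, y0_treated))
    bias = fmean(y0_treated) - fmean(y0_untreated)
    return naive, tt, bias

for label, flag in [("self-selected", False), ("randomized", True)]:
    naive, tt, bias = decompose(randomized=flag)
    print(f"{label}: naive={naive:.2f}  TT={tt:.2f}  selection bias={bias:.2f}")
```

In the simulation the counterfactual y(0) of the treated is known by construction, which is exactly what real data lack. Under self-selection the naive difference overstates the effect by the bias term; under coin-flip assignment the bias term should be close to zero.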

Solution: Random Assignment

[Figure: three steps. 1. Population, with eligible and ineligible units marked. 2. Evaluation sample (external validity). 3. Randomize treatment into treatment and comparison groups (internal validity).]
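
The three steps listed for random assignment (population, evaluation sample, randomize treatment) can be sketched in a few lines. This is an illustrative sketch only: the `eligible` flag, the sample size, and the 50/50 split are assumptions, and it assumes the evaluation sample is drawn at random from eligible units.

```python
import random

def design_evaluation(population, sample_size, seed=0):
    """Draw an evaluation sample from eligible units, then randomize treatment."""
    rng = random.Random(seed)
    # Step 1: population, restricted to eligible units
    eligible = [unit for unit in population if unit["eligible"]]
    # Step 2: random evaluation sample (supports external validity)
    sample = rng.sample(eligible, sample_size)
    # Step 3: random assignment within the sample (supports internal validity)
    rng.shuffle(sample)
    half = sample_size // 2
    return sample[:half], sample[half:]

# Hypothetical population: every third unit is ineligible
population = [{"id": i, "eligible": i % 3 != 0} for i in range(300)]
treatment, comparison = design_evaluation(population, sample_size=100)
print(len(treatment), len(comparison))
```

Sampling at random from the eligible population is what lets results speak for that population; randomizing within the sample is what makes the comparison group a valid counterfactual.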

Clinical Trials in Medicine

Randomized clinical trial pioneered in medicine:

– Norm in the US since the 1962 Kefauver-Harris Amendments strengthened FDA drug regulation

• Proof of efficacy required prior to marketing

• Based on adequate and well-controlled studies

– Interpreted as random assignment to treatment and control groups with adequate power

• Medical journals

• Use RCTs for clinical studies

• Basis of clinical practice guidelines

• Ethical treatment of human subjects

Statistics …

• Introduced to clinical trials in 1950s

– Sampling variability (how big is big?)

– Causality

• Alliance of statisticians and clinical researchers to

– discipline physicians in their prescriptions

– guide insurance companies to pay for the drugs

• Coincided with diffusion of statistics in science (psychology, economics, physics, genetics)

Main argument: Statistics provides objectivity to combat subjective and “self-interest” bias.

RCTs not used much until recently

• Formal RCT theory developed by Fisher in 1930s

– Reluctance to apply to social science due to its “non-experimental” nature

• Most econometrics avoided RCTs until the last 15 years

– Used to test theory, so causal identification became a big issue

– Simultaneous equations (supply & demand) led to IV (1940s)

– Returns to education led to “selection” or Roy models (1960s)

– Ag productivity led to panel data (FE) (1960s)

– Natural policy experiments (1990s)

• Difference-in-differences & regression discontinuity

– Ex post matching: a flash in the pan

Can non-experimental evaluations match experimental results?

LaLonde (1986)

– Took data from a randomized experiment and analyzed them as if they were observational data

– Used a two-step Heckman selection correction method with different sets of exclusion variables

– Non-experimental results depend heavily on the choice of exclusion variables and are far from the benchmark results

– Results very different from the experimental results

Dehejia and Wahba (2002) got much closer using propensity score matching methods

A Credibility Revolution in 1990s

• Selection bias taken more seriously

– Influential within-study comparisons (LaLonde, 1986)

• Standard selection correction procedures

– Questioned for lack of robustness

– Require strong “untestable” assumptions

– Never know when able to control for all omitted variables

• Search for:

– Credible sources of variation

– Less restrictive parametric and distributional assumptions

• Embrace RCTs as feasible, desirable and mainstream

Policy World More Accepting of RCTs

• Positive experience with RCTs

– E.g. PROGRESA/OPORTUNIDADES

• Fewer ethical concerns in some cases

– Limited resources: # beneficiaries > resources

– Phased rollout

– Concern is over excluding a group

– Good governance

• Equitable, accountable & transparent

– PROGRESA example

• Expanded funding

– 3ie, DFID, World Bank SIEF, IADB, BMGF, USAID & other bilaterals

Examples of Influential RCTs

• Academic literature (efficacy)

– Perry Preschool (1961)

– Jamaica ECD (1986)

– Kenya Deworming (2004)

• Some policy evaluation (effectiveness)

– RAND Health Insurance (1972)

– Manpower Job Training Act (1974)

– Mexican PROGRESA/OPORTUNIDADES (1997)

Perry Pre-School program (1961)

• First to demonstrate value of ECD for disadvantaged children

– Results continue to influence papers today

– Tracked participants up to 2000

• Evaluation design:

– 123 children aged 3-6 years in Michigan

– Half randomly assigned to treatment

– Specialized education for 30 weeks

– Followups at ages 14, 15, 19, 27, 40

Perry Pre-School program: Results

Jamaican ECD Study Design

• Psycho-social stimulation:

– Improve quality of maternal-child interaction

– Weekly CHW visits for 24 months

• Recruited 129 children from poor neighborhood

– Stunted (HAZ < -2), between 9 & 24 months

– Randomly assigned

– Followups at endline, and at ages 7, 11, 18 and 22 years

• Led to ECD movement in developing countries

Effect of stimulation on Cognitive Development

Wage Distributions for Treatment and Control Groups

[Figure: density curves for treatment (T) and control (C) groups; horizontal axis from -2 to 2, vertical axis from 0 to 0.8; bandwidth 0.37]

Deworming School Children in Kenya

• Effect of deworming on schooling

• Cluster randomization at school level

– Impact on enrollment and attendance

– Spillover effect

– Long term effect on earnings

• Influenced worldwide scaleup
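
Cluster randomization at the school level means the school, not the pupil, is the unit that is randomized, so every pupil in a school shares the same arm. A minimal sketch follows, assuming a hypothetical mapping from pupil IDs to school IDs and an even split of schools; it does not reproduce the actual design of the Kenya study.

```python
import random

def cluster_randomize(pupil_school, seed=0):
    """Assign whole schools to treatment ('T') or comparison ('C').

    pupil_school maps each pupil ID to a school ID; the returned dict
    maps each pupil ID to the arm of that pupil's school.
    """
    rng = random.Random(seed)
    schools = sorted(set(pupil_school.values()))
    rng.shuffle(schools)                          # randomize at the cluster level
    treated = set(schools[: len(schools) // 2])   # half of the schools are treated
    return {pupil: ("T" if school in treated else "C")
            for pupil, school in pupil_school.items()}

# Hypothetical data: 200 pupils spread over 10 schools
pupil_school = {f"pupil{i}": f"school{i % 10}" for i in range(200)}
arms = cluster_randomize(pupil_school)
```

Randomizing whole schools is also what makes it possible to study spillovers, since untreated pupils never share a school with treated ones. The trade-off is that the effective sample size is driven by the number of schools rather than the number of pupils.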


RAND Health Insurance Experiment

• Impact of cost-sharing on access to health care and health outcomes

• Evaluation design:

– 5809 people randomly assigned to insurance with 0%, 25%, 50%, and 95% cost-sharing rates

– Main results: cost-sharing reduces “superfluous” care with little harm to health

– But some heterogeneity: not true for the poorest

• Influenced use of cost-sharing in health insurance

PROGRESA/OPORTUNIDADES

• Conditional Cash Transfers

• Rural Areas

– Randomized rollout: 506 communities, 24,000 households

– Large effects on health, education, and poverty

– Used to sell CCTs to 35 countries

– Large number of studies

• Urban Areas

– Matched difference-in-difference

– Results weaker, less stable and controversial

Manpower Job Training

• Individuals applying for unemployment randomized to various job training and counseling interventions

• Large scale national study

• Found little or no impact on labor market outcomes

• Despite this, job training is still used and well funded in the US and internationally

• Problem with negative results


RCTs in development economics today

• Since the mid-1990s, explosion of evaluations

– e.g. health, education, HIV/AIDS, CCTs, active labor markets, microfinance, water & sanitation, infrastructure, management, etc.

– Still fewer than quasi-experimental studies

• Moving toward increasingly “smart designs”

– Alternative program designs

– Alternative pathways and mechanisms (testing theory)

– Validation of structural models for ex ante evaluation