
Emergent U.S. Design and Analysis Strategies for Learning More from Social

Experiments – with Development Applications (PART 1)

Stephen H. Bell, Ph.D., Abt Associates

September 1, 2014

The Three Biggest RCT Challenges in the U.S.

• Making randomized exclusions acceptable

– for internal validity

• Doing so under characteristic circumstances

– for external validity

• Finding ways to have experimental evidence guide program improvements

– for policy relevance

• Only time for two of these: the first and the last

– Furthest along in the States

– External validity: see, e.g., Klerman (2014); Olsen et al. (2013)

Bell & Bradley, APPAM Presentation, Nov 6-8, 2008

Outline of Workshop (pre-break: 14.00 – 15.30)

Making random assignment acceptable

• Constituencies and their concerns

• Estimating long-run impacts after the control group receives the intervention

• Building local agency priorities into the random assignment design

Questions

Discussion: possible development applications


Outline of Workshop (post-break: 16.00 – 17.30)

Learning “what works” to guide program improvements

• Case study: Training for health care occupations

– Learning which local program models work best

• Case study: The role of quality in early childhood programs

– Learning what in-program experiences help individuals most

Questions

Discussion: possible development applications


Making Program Exclusions for the Control Group Palatable

• Goal = be able to do RCTs more often

• Caveat = do so without sacrificing scientific integrity

• Constituencies

– Political / community leaders

– Implementation agencies

– Target population for the intervention (a major issue in the U.S.; not apparent as a challenge in the developing world, so not addressed here)


For Political / Community Leaders

Graduated phase-in of intervention (“step wedge” design)

• Fits resource and implementation capacity circumstances

• Lottery = fairest way to determine “who goes first”

• No “losers” in the long run

What about evidence of long-run impacts / sustainability?

Use recursive estimation (Bell & Bradley, 2013)


For Implementation Agencies

• Agencies care about

– How long exclusions will last

– How many cases go into the C group

– Which cases go in

• How long = “step wedge” design

• How many = uneven random assignment ratio

• Which ones = “Wild card” exemptions

“Agency-Preferred Random Assignment”


Estimation of Long-Run Impacts after the Control Group Receives the Intervention

OUTLINE:

• Motivation of problem

• Method for estimating impacts in RCTs after the control group receives the intervention

• The sole assumption behind the method . . . and the conditions under which it is fulfilled

• Designing RCTs to test the assumption

• Making future satisfaction of the assumption more likely


Context of the Problem

• For equity, or to obtain leader and implementation agency cooperation, studies often guarantee all communities or families the new intervention within a discrete time period (e.g., one year)

– Once included, the control group can no longer provide the no-intervention counterfactual with which to estimate impacts experimentally

• Need to know longer-term impacts to judge success / test sustainability

• Example: Aflasafe pilot in Nigeria

– Rolling out to all maize-producing villages within 3 years


Getting Past the End of the Control Group: Bad and Better Options

• Studies providing the lagged intervention to the control group

– Most stop reporting impacts, so there are no long-run findings

– Those that don’t stop typically do pre/post or interrupted time series analysis to go further

• Biased if . . .

– An exogenous trend shift is concurrent with program start

– The trend shift differs (if there is a comparison group for the ITS)

• What to do instead? Take continued advantage of the experimental design . . .


Desired Time 2 Comparison

[Figure: outcome trajectories (scale 0-9) for the Treatment and Control groups at Baseline, Time 1, and Time 2]

Observed Time 2 Comparison

[Figure: outcome trajectories (scale 0-9) for the Treatment and Control groups once the control group begins receiving the intervention after Time 1]

Recursive Method for Estimating Time 2 Impact

[Figure: outcome trajectories (scale 0-9) for the Treatment, Control, and “Lag Control” groups at Baseline, Time 1, and Time 2]

Computation

• Subtract I1 from Y2C to get Y2C*

• Estimate I2 = Y2T - Y2C* = (Y2T - Y2C) + (Y1T - Y1C)

• Bell & Bradley (2013) provide

– A standard error formula

– A recursive extension to impacts in the third period and beyond . . .
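A minimal numerical sketch of the recursion above (function and variable names are illustrative): the Time-1 impact is estimated experimentally, and each later period adds the observed treatment-control gap to the previous period’s impact estimate, which is what subtracting I1 (then I2, and so on) from the lag group’s outcome implies.

```python
def recursive_impacts(y_t, y_c):
    """Recursive impact estimates when the control group receives the
    intervention one period after the treatment group.

    y_t, y_c: sequences of mean outcomes for the T and C groups at
    periods 0 (baseline), 1, 2, ...  From period 2 onward the control
    ("lag") group's outcome embeds its own intervention impact, which
    under the constancy assumption equals the prior period's estimate.
    """
    impacts = {1: y_t[1] - y_c[1]}  # I1: purely experimental
    for k in range(2, len(y_t)):
        # Y_kC* = Y_kC - I_{k-1}, so I_k = Y_kT - Y_kC* reduces to:
        impacts[k] = (y_t[k] - y_c[k]) + impacts[k - 1]
    return impacts
```

For example, `recursive_impacts([0, 5, 8], [0, 2, 7])` yields I1 = 3 and I2 = (8 - 7) + 3 = 4.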


Sufficient Condition for Unbiased Estimation

Impact on T group in its initial year of intervention

= Impact on C group in its initial year of intervention

Things that affect impacts must stay constant over time;

Things that affect outcomes can change!

Next . . .

• Identify conditions under which the assumption of constancy holds

• Discuss how one might test the conditions

• Suggest how those conditions might be made more likely


Intervention-Related Factors that Need to Remain Stable over Time

Sponsor’s guidelines for the intervention’s design and implementation

– Eligibility guidelines / intake process

– Design of services / service delivery guidelines

Central implementation agency’s desire and evolving ability to support the intervention

– OK if local partner agencies’ desires and abilities differ between the two years, if random (i.e., not a time trend)


Consequences of Delay that Must Not Happen (Part I)

Different types of cases choose to participate in the intervention in the C group than in the T group because . . .

– They learn the study’s early findings (unlikely)

– The economy shifts over time

– Other programs become available

• Don’t need 100% participation for either the T or the C group

• Don’t even need the same % participation

– Can do a separate Bloom no-show adjustment for each group
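The Bloom no-show adjustment referenced in the last bullet divides the intent-to-treat impact by the group’s participation rate, assuming no impact on non-participants. A minimal sketch (the function name is illustrative), which can be applied separately to the T-year and lagged C-year cohorts:

```python
def bloom_adjusted_impact(itt_impact, participation_rate):
    """Bloom (1984) no-show adjustment: the impact on participants is
    the intent-to-treat impact scaled up by 1 / participation rate,
    assuming zero impact on non-participants."""
    if not 0.0 < participation_rate <= 1.0:
        raise ValueError("participation_rate must be in (0, 1]")
    return itt_impact / participation_rate
```

For example, a T-group ITT impact of 2.0 with 80% participation implies an impact of 2.5 on participants.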


Consequences of Delay that Must Not Happen (Part II)

Local implementation agencies systematically invest less effort in the intervention in the later C-group year than did the same or different local agencies in the T-group year

For example, a one-year delay in the opportunity to launch the intervention may

– Reduce enthusiasm

– Involve the agency in other new initiatives


An Experimental Test of the Method

• 3-way random assignment to:

– Immediate treatment (T)

– Lagged treatment (L)

– Permanent control (C)

• Use T and L to compute the recursive estimate

• Compare to the purely experimental long-run result (T vs. C)

• At least 3 such tests exist in the U.S. – by happenstance

– all are too small to be informative


Making Satisfaction of Unbiasedness Conditions More Likely

• Lock in sponsor/developer commitment to keep the intervention unchanged over time

• Maximize all local implementation agencies’ up-front commitment to fully implement with fidelity, regardless of timing

• Minimize circulation of early study results

• Shorten the lag before C-group implementation? (fewer things change)

• Lengthen the lag? (fewer years of assumptions)


Known Applications and Extensions

Two known uses of the recursive method in K-12 education research in the U.S.

• PCI Reading Evaluation

– No significant impact in Year 2 using a QED method

– Significant impact in Year 2 with the recursive approach

• Alabama Mathematics, Science, and Technology Initiative (AMSTI) Evaluation

– Significant impact in Year 2 with the recursive approach

– Also applied, through two iterations, to Year 3

Making Intervention Exclusions Palatable to Implementation Agencies

• Many experiments seek to randomize

– Facilities (e.g., health care clinics)

– Workers (e.g., farmers)

– Target clients (e.g., poor children)

• Consider cases where randomization will be carried out entirely within one organization

– The organization usually has preferences for inclusions and exclusions

– It always prefers fewer and shorter exclusions


Concerns about Control Group Exclusions for Implementing Agencies

• In other words, regarding control group members, agencies care about

– How long the exclusion lasts

– How many cases are excluded

– Which cases are excluded

• How long = limit the embargo & use recursive estimation

• How many = use an uneven random assignment ratio

• Which ones = “wild card” exemptions

“agency-preferred random assignment”


Options for Addressing Agency Concerns about “How Many” and “Which Ones”

• Allow more than half to participate (immediately)

– Tilt the random assignment ratio away from 50:50, toward the treatment group [does not create mismatch or bias]

• Allow “wild card” exemptions from RA for a few cases

– Automatically “in”, no questions asked

– Excluded from the research

• Higher odds of inclusion for “preferred” cases

– Above 50:50 for those the agency most wants included

– Below 50:50 for others


Tilting the Random Assignment Ratio

Ability to detect smaller impacts deteriorates slowly as the treatment group share of the sample goes up . . .

Treatment Group Share   Random Assignment Ratio (T:C)   Minimum Detectable Impact
0.50                    1:1                             100 units
0.60                    3:2                             102
0.67                    2:1                             106
0.75                    3:1                             116
0.83                    5:1                             135
0.90                    9:1                             167
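The pattern in the table follows from the variance of a two-group difference in means, which is proportional to 1/(p(1-p)) for treatment share p, so for a fixed total sample the MDI scales with 1/sqrt(p(1-p)). A minimal sketch under that standard power-analysis assumption (the function name is illustrative); it reproduces the table up to rounding of the shares (0.67 and 0.83 are rounded 2/3 and 5/6):

```python
import math

def relative_mdi(treatment_share, benchmark=100.0):
    """Minimum detectable impact for a fixed total sample, indexed to
    100 for a balanced 1:1 design.  MDI scales with 1/sqrt(p*(1-p)),
    where p is the treatment-group share of the sample."""
    p = treatment_share
    return benchmark * math.sqrt(0.25 / (p * (1.0 - p)))
```

For example, `relative_mdi(0.9)` is about 167 and `relative_mdi(0.6)` about 102, matching the table’s last and second rows.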


“Wild Card” Exemptions from Random Assignment

Minor distortions (< 10%) result unless a large impact ratio (3 to 1) is combined with generous exemptions (> 1 in 20)

Exempted Share   Impact Ratios that Hold Distortion < 10%   Distortion with 3-to-1 Impact Ratio
1 in 5           < 1.6 to 1                                 29%
1 in 10          < 2.1 to 1                                 23%
1 in 15          < 2.6 to 1                                 17%
1 in 20          < 3.2 to 1                                 9%
1 in 33          < 4.7 to 1                                 7%
1 in 50          < 6.5 to 1                                 4%

“Agency-Preferred Random Assignment”

• Set the treatment group assignment probability higher for preferred cases than for other cases (Olsen et al., 2007)

Example: 2:1 (T vs. C) for preferred cases; 1:2 (T vs. C) for others

• If equal shares are “preferred” and “other,” results are identical to a uniform 1:1 (T vs. C) ratio for both groups with respect to

– Total number randomized

– Share (50%) and number excluded as the control group

– Expected value of the impact estimate
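A minimal sketch of such a two-probability lottery, using the slide’s 2:1 / 1:2 example (the function name and inputs are illustrative):

```python
import random

def agency_preferred_assignment(preferred_flags, p_pref=2/3,
                                p_other=1/3, seed=42):
    """Assign each case to 'T' or 'C', giving agency-preferred cases
    2:1 odds of treatment and other cases 1:2 odds.

    preferred_flags: iterable of booleans marking preferred cases.
    """
    rng = random.Random(seed)
    return ["T" if rng.random() < (p_pref if pref else p_other) else "C"
            for pref in preferred_flags]
```

With equal numbers of preferred and other cases, the expected treatment share is (2/3 + 1/3) / 2 = 1/2, so the totals match a uniform 1:1 lottery; impact estimation must then weight cases by (or stratify on) their assignment probability, since cases no longer share one probability of treatment.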


How Findings Change with “Agency-Preferred Random Assignment”

• Minimum detectable impacts increase

– With uneven T:C ratios, MDIs rise above the illustrative benchmark of 100 (reversing the ratios for “preferred” vs. other cases makes no difference to MDIs)

• For an ongoing program, impacts can be calculated separately for

– The “usually included” group (the “agency-preferred” cases), whose finding cannot be distorted by added cases with different impact magnitudes

– Cases that would be added by expansion (the “other” cases), which are also important to policy


Minimum Detectable Impacts for “Agency-Preferred Random Assignment”

For the “usually included” group, confining analysis to half the sample escalates the penalty from an uneven RA ratio

T:C Ratios Used   Overall Minimum Detectable Impact   MDI for Usually-Included Group
1:1 and 1:1       100                                 141
3:2 and 2:3       102                                 144
2:1 and 1:2       106                                 149
3:1 and 1:3       116                                 164
5:1 and 1:5       135                                 190
9:1 and 1:9       167                                 235
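The second column repeats the earlier tilt-ratio figures, and the third appears to be larger by a factor of sqrt(2), since only half the sample (one ratio stratum) is analyzed. A sketch under that interpretation (names are illustrative; it matches the table to within about one unit of rounding):

```python
import math

def overall_mdi(p, benchmark=100.0):
    """Overall MDI when the two halves of the sample are assigned at
    treatment shares p and 1-p; both halves share the same p*(1-p)
    variance term, so the full-sample MDI depends only on that term."""
    return benchmark * math.sqrt(0.25 / (p * (1.0 - p)))

def usually_included_mdi(p, benchmark=100.0):
    """Restricting analysis to the 'usually included' half of the
    sample halves n, inflating the MDI by a factor of sqrt(2)."""
    return overall_mdi(p, benchmark) * math.sqrt(2.0)
```

For example, `usually_included_mdi(0.5)` is about 141 and `usually_included_mdi(5/6)` about 190, consistent with the first and fifth rows.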
