Transcript of EEF Evaluators’ Conference, 25th June 2015.

Page 1

EEF Evaluators’ Conference

25th June 2015

Page 2

Session 1: Interpretation / impact

25th June 2015

Page 3

Rethinking the EEF Padlocks

Calum Davey, Education Endowment Foundation

25th June 2015

Page 4

Overview

→ Background
→ Problems
→ Attrition
→ Power/chance
→ Testing
→ Proposal
→ Discussion

Page 5

Background

• Summary of the security of evaluation findings
• ‘Padlocks’ developed in consultation with evaluators

Page 6

Background

• Summary of the security of evaluation findings
• ‘Padlocks’ developed in consultation with evaluators

Group | Number of pupils | Effect size | Estimated months’ progress | Evidence strength
Literacy intervention | 550 | 0.10 (0.03, 0.18) | +2 | [padlock icons]

Page 8

Background

• Summary of the security of evaluation findings
• ‘Padlocks’ developed in consultation with evaluators
• Five categories – combined to create overall rating:

Group | Number of pupils | Effect size | Estimated months’ progress | Evidence strength
Literacy intervention | 550 | 0.10 (0.03, 0.18) | +2 | [padlock icons]

Rating | 1. Design | 2. Power (MDES) | 3. Attrition | 4. Balance | 5. Threats to validity
5 | Fair and clear experimental design (RCT) | < 0.2 | < 10% | Well-balanced on observables | No threats to validity
4 | Fair and clear experimental design (RCT, RDD) | < 0.3 | < 20% | |
3 | Well-matched comparison (quasi-experiment) | < 0.4 | < 30% | |
2 | Matched comparison (quasi-experiment) | < 0.5 | < 40% | |
1 | Comparison group with poor or no matching | < 0.6 | < 50% | |
0 | No comparator | > 0.6 | > 50% | Imbalanced on observables | Significant threats
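To make the thresholds concrete, here is a minimal sketch of how the two numeric criteria in the table map onto ratings. The limits come from the table above; combining categories by taking the minimum is an illustrative assumption, not the EEF’s documented combination rule.

```python
# Hedged sketch: map MDES and attrition onto the criteria ratings in the
# table above. Taking the minimum across categories is an assumed rule
# for illustration only.
def rating_from_thresholds(value: float, limits: list[float]) -> int:
    """Return 5 for the tightest limit met, down to 0 when none are met."""
    for rating, limit in zip(range(5, 0, -1), limits):
        if value < limit:
            return rating
    return 0

MDES_LIMITS = [0.2, 0.3, 0.4, 0.5, 0.6]            # ratings 5..1
ATTRITION_LIMITS = [0.10, 0.20, 0.30, 0.40, 0.50]  # ratings 5..1

mdes_rating = rating_from_thresholds(0.24, MDES_LIMITS)             # -> 4
attrition_rating = rating_from_thresholds(0.16, ATTRITION_LIMITS)   # -> 4
print(min(mdes_rating, attrition_rating))  # assumed overall rating: 4
```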

Page 9

Background

[Bar chart: number of padlocks awarded across 37 trials – 0 padlocks: 5 trials; 1: 7; 2: 4; 3: 13; 4: 6; 5: 2]

Note: count does not include pilots, which often don’t get a security rating.

Page 10

Oxford Improving Numeracy and Literacy

[Padlock criteria table repeated from Page 8]

Page 11

Act, Sing, Play

[Padlock criteria table repeated from Page 8]

Page 12

Team Alphie

[Padlock criteria table repeated from Page 8]

Page 13

Problems: power

• MDES at baseline
• MDES changes
• Confusion with p-values and CIs:
  – Effect bigger than MDES! E.g. Calderdale: ES = 0.74, MDES < 0.5
  – P-value < 0.05! E.g. Butterfly Phonics: ES = 0.43, p < 0.05, MDES > 0.5

Rating | 2. Power (MDES)
5 | < 0.2
4 | < 0.3
3 | < 0.4
2 | < 0.5
1 | < 0.6
0 | > 0.6
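For reference, a minimal sketch of the usual normal-approximation MDES calculation for a simple two-arm, individually randomised trial; the assumptions of no clustering, covariate adjustment or attrition are mine, not the slide’s.

```python
# Minimal MDES sketch for a two-arm, individually randomised trial,
# using the standard multiplier (~2.8 for 80% power at alpha = 0.05).
from scipy.stats import norm

def mdes(n_treat: int, n_control: int,
         alpha: float = 0.05, power: float = 0.80) -> float:
    """MDES in standard-deviation units for a two-sided test."""
    multiplier = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # ~2.80
    se = (1 / n_treat + 1 / n_control) ** 0.5  # approx. SE of the effect size
    return multiplier * se

# e.g. 275 pupils per arm (a 550-pupil trial split evenly):
print(round(mdes(275, 275), 2))  # ~0.24 -> rating 4 on the power criterion
```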

Page 14

Problems: attrition

• Calculated overall at the level of randomisation
• 10% of pupils are off school each day
• Disadvantages individually-randomised trials:
  – Act, Sing, Play (pupil-randomised): 0% attrition at school or class level, 10% at pupil level
  – Oxford Science (school-randomised): 3% attrition at school level, 16% at pupil level
• Are the levels right?

Rating | 3. Attrition
5 | < 10%
4 | < 20%
3 | < 30%
2 | < 40%
1 | < 50%
0 | > 50%
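The Oxford Science contrast above is simply a matter of which units you count. A minimal sketch; the raw counts below are hypothetical, chosen only to reproduce the slide’s 3% and 16% figures.

```python
# Hedged sketch: attrition depends on the level at which it is counted.
# The counts are hypothetical, chosen to match the slide's 3% / 16%.
def attrition(randomised: int, analysed: int) -> float:
    return 1 - analysed / randomised

school_level = attrition(randomised=100, analysed=97)    # 3% of schools lost
pupil_level = attrition(randomised=5000, analysed=4200)  # 16% of pupils lost
print(f"school-level: {school_level:.0%}, pupil-level: {pupil_level:.0%}")
```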

Page 15

Problems: testing

• Lots of testing administered by teachers

• Teachers rarely blinded to intervention status

• What is the threat to validity when effect sizes are small?

Rating | 5. Threats to validity
5 | No threats to validity
4 |
3 |
2 |
1 |
0 | Significant threats

Page 16

Potential solution?

• Assess ‘chance’ as well as MDES in padlock?
• Assess attrition at pupil level for all trials?
• Randomise invigilation of testing to assess bias?
• Number of pupils (number with intervention)
• Confidence interval for months’ progress?
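On the third point, randomising invigilation could be layered onto routine test administration. A minimal sketch, with hypothetical class identifiers, assigning half of the classes an independent invigilator so that teacher-invigilated and independently-invigilated scores can later be compared:

```python
# Hedged sketch: randomly assign half of the classes an independent
# invigilator. Class identifiers here are hypothetical.
import random

def assign_invigilation(class_ids: list[str], seed: int = 1) -> dict[str, str]:
    rng = random.Random(seed)
    shuffled = class_ids[:]
    rng.shuffle(shuffled)
    independent = set(shuffled[: len(shuffled) // 2])
    return {c: ("independent" if c in independent else "teacher")
            for c in class_ids}

print(assign_invigilation(["class_a", "class_b", "class_c", "class_d"]))
```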

Page 17

Discussion

• Could p-values, confidence intervals, power, sample size, etc. be combined in a measure of ‘chance’?

• What are the advantages and disadvantages of reporting confidence intervals alongside the security rating?

• Is it right to include all attrition in the security rating? What potential disadvantages are there?

• What is the most appropriate way to ensure unbiasedness in testing? Would it be possible to conduct a trial across evaluations?

Page 18

Session 2: Implementation

25th June 2015

Page 19

Implementation and process evaluation in intervention development and research

Neil Humphrey and Ann Lendrum

Manchester Institute of Education

[email protected]

0161 275 3404

Page 20

Implementation and process evaluation

• The (mistaken) assumption of effective implementation (Berman & McLaughlin, 1978)

– “When faced with the realities of human services, implementation outcomes should not be assumed any more than intervention outcomes are assumed” (Fixsen et al, 2005, p.6)

• Parallel development of ‘implementation’ and ‘process evaluation’ literatures in different disciplines
  – Implementation – psychology and education
  – Process – health
• “Implementation is defined as a specified set of activities designed to put into practice an activity or program of known dimensions” (Fixsen et al, 2005, p.5)
  – Term also used more broadly in reference to supporting the uptake of ‘evidence-based’ interventions

• “Process evaluation involves gathering data to assess the delivery of programs” (Domitrovich, 2009, p.195)

• Put simply - looking inside the ‘black box’ (Saunders et al, 2005)

Page 21

Implementation science

• “The goals of implementation science have been to understand barriers to and facilitators of implementation, to develop new approaches to improving implementation, and to examine relationships between an intervention and its impact. Implementation science has investigated a number of issues, including: influences on the professional behavior of practitioners; influences on the functioning of health and mental health care practice organizations; the process of change; strategies for improving implementation, including how organizations can support the implementation efforts of staff members; appropriate adaptation of interventions according to population and setting; identification of approaches to scaling-up effective interventions; implementation measurement methods; and implementation research design” (Forman et al, 2013, p.83)

Page 34

Implementation science

[Diagram: Implementation and Process Evaluation (IPE) comprises understanding, supporting, and evaluating implementation processes]

Page 35

What is an intervention?

• Interventions are “purposively implemented change strategies” (Fraser & Galinsky, 2010, p.459)

• Key elements:
  – Purposive
  – Implementation
  – Change
  – Strategic

Page 36

The intervention development and research cycle

[Cycle diagram] Define and understand the problem → Design and describe the proposed solution → Articulate an intervention theory → Pilot and refine → Establish intervention efficacy → Establish intervention effectiveness → Scale-up intervention

Iterative and cyclical, not linear

Page 37

Design and describe the proposed solution

• Design of intervention – build on accrued knowledge base
  – Review by stakeholders and experts in the field
  – Use of Type 1 translational research (applying basic science to inform intervention development)

• Basic intervention features can include:
  – Form (e.g. universal, selective, indicated)
  – Function (e.g. environmental, developmental, informational)
  – Level and location (e.g. individual, group, family, school, community, societal)
  – Complexity and structure (e.g. single component, multi-component)
  – Prescriptiveness and specificity (e.g. manualised, flexible)
  – Components (e.g. curriculum, environment/ethos, parents/wider community)
  – Intervention agents (e.g. teachers, external staff)
  – Recipients (e.g. teachers, pupils)
  – Procedures and materials (e.g. what is done, how often)

Page 38

Design and describe the proposed solution

• “The quality of description of interventions in publications… is remarkably poor” (Hoffman et al, 2014, p.1)

• Without a complete description of an intervention:
  – The person/s responsible for delivery cannot reliably implement it
  – The recipient/s do not know exactly what they are ‘signing up for’
  – Researchers cannot properly replicate or build upon existing findings
  – Researchers cannot adequately evaluate the implementation of the intervention
  – It is difficult, if not impossible, to understand how and why it works

• Fewer than 40% of non-pharmacological interventions were found to be described adequately in papers, appendices or websites (Hoffman et al, 2013)

• Hence, the Template for Intervention Description and Replication (TIDieR) (Hoffman et al, 2014) offers a useful tool that can improve the quality of how interventions are described and subsequently understood

• TIDieR (adapted version) = 1. Brief name, 2. Why? (theory/rationale), 3. Who (recipients), 4. What (materials), 5. What (procedures), 6. Who (provider), 7. How (format), 8. Where (location), 9. When and how much (dosage), 10. Tailoring (e.g. adaptation)
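As an illustration only, the ten adapted TIDieR items above could be captured as a simple structured record; the field names below are hypothetical and are not part of TIDieR itself.

```python
# Hypothetical structured record for the adapted 10-item TIDieR framework.
from dataclasses import dataclass

@dataclass
class TIDieRDescription:
    brief_name: str       # 1. Brief name
    why: str              # 2. Theory / rationale
    who_recipients: str   # 3. Who (recipients)
    what_materials: str   # 4. What (materials)
    what_procedures: str  # 5. What (procedures)
    who_provider: str     # 6. Who (provider)
    how_format: str       # 7. How (format)
    where_location: str   # 8. Where (location)
    when_how_much: str    # 9. When and how much (dosage)
    tailoring: str        # 10. Tailoring (e.g. adaptation)
```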

Page 39

Design and describe the proposed solution

• Think about a recent or current pilot or trial of an intervention in which you are involved

• Can you provide a full description of the intervention?

• Reminder of the TIDieR (adapted version) items: 1. Brief name, 2. Why? (theory/rationale), 3. Who (recipients), 4. What (materials), 5. What (procedures), 6. Who (provider), 7. How (format), 8. Where (location), 9. When and how much (dosage), 10. Tailoring (e.g. adaptation)

• ‘Just a Minute’ format – responses to the 10 items in the adapted TIDieR framework

• Questions for reflection
  – Why are these 10 items important? Were some harder to populate with information than others? Which ones? Why?
  – Are there any fundamental ways of describing an intervention that TIDieR misses? What are these?
  – Is TIDieR better suited to describing certain kinds of interventions than others? If so, what kinds of interventions and why?
  – Would TIDieR be useful as a standardised reporting framework for EEF projects?

Page 40

Articulate an intervention theory

• Without understanding intervention theory, we are effectively left with a ‘black box’ view (e.g. we think about interventions in terms of effects without paying attention to how and why those effects are produced)

– “Seasoned travellers would not set out on a cross country motor trip without having a destination in mind, at least some idea of how to get there, and, preferably, a detailed map to provide direction and guide progress along the way” (Stinchcomb, 2001, p.48).

• A logic model “describes the sequence of events for bringing about change by synthesizing the main program elements into a picture of how the program is supposed to work” (CDCP, 1999, p.9)
  – Often articulated in terms of inputs, processes/mechanisms, and outcomes
  – Sometimes factors affecting inputs and processes are also added
  – Typically displayed in diagrammatic form

1. What is done in the intervention? (inputs)

2. What are you trying to achieve? (outcomes)

3. What are the mechanisms/processes that link 1 and 2 above? (change mechanisms)

4. What factors could impact on the above? (moderators)
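As a minimal illustration of the four questions above, a logic model can be held as a simple structure; the example entries describe a hypothetical reading-tutoring intervention, not any EEF project.

```python
# Hedged sketch: the four logic-model questions as a structured record.
# All example content is hypothetical.
logic_model = {
    "inputs": ["weekly one-to-one tutoring", "tutor training", "reading materials"],
    "outcomes": ["improved reading comprehension", "greater reading enjoyment"],
    "change_mechanisms": ["more practice time", "immediate corrective feedback"],
    "moderators": ["pupil attendance", "tutor experience", "school support"],
}
for component, entries in logic_model.items():
    print(f"{component}: {', '.join(entries)}")
```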

Page 41

Articulate an intervention theory

Page 42

Articulate an intervention theory

• Try to create a basic logic model for the intervention you described in the previous activity using the worksheet provided

• Questions for reflection
  – Which component(s) of the logic model were the most difficult to complete? Why?
  – How might you go about empirically testing the assumptions of your intervention logic model in a pilot or trial context?
  – Is logic modeling better suited to theorising certain kinds of interventions than others? If so, what kinds of interventions and why?
  – What are the limitations of logic modeling, and what alternative methods might be used in order to articulate intervention theory?

Page 43

Pilot and refine

• What are we trying to achieve when we pilot an intervention?

• One possible organising framework for a pilot study is that of social validity
  – The value and social importance attributed to a given innovation by those who are direct or indirect consumers of it (Hurley, 2012; Luiselli & Reid, 2011)

• Adapting Wolf’s (1978) classic taxonomy:
  – Acceptability – are the intended outcomes of the intervention wanted, needed and/or socially significant?
  – Feasibility – is the intervention considered to be ‘doable’?
  – Utility – are the outcomes of the intervention satisfactory, and worth the effort required to achieve them?

• Consideration of phase of implementation (Fixsen et al, 2005):
  – Exploration
  – Installation
  – Initial implementation
  – Full implementation

Page 44

Pilot and refine

• Design, data generation and analysis
  – Small scale
  – Mixed methods – the assumption that ‘pilot’ = ‘qualitative’ is not helpful
  – Review of materials by stakeholders and experts
  – Key implementation-related questions may include: can implementers deliver the intervention in the time allotted? Does the sequencing of content and other aspects of intervention design make sense to implementers and recipients? Are suggested activities congruent with the context of delivery (e.g. target population, setting)? Are recipients engaged? (Fraser & Galinsky, 2010)
  – Key outcome-related questions may include: are there indications of impact on intended outcomes? Of what magnitude? For whom?

• What kinds of refinements are made?
  – Intervention theory
  – Intervention design
  – Context – required contextual characteristics, foundations for change, implementers
  – Methodological considerations for evaluation

Page 45

Establish intervention efficacy and effectiveness

• Can the intervention produce intended outcomes under optimal conditions?
  – How do we define ‘optimal’? Likely to be informed by intervention theory

• There is an implicit assumption that implementation will be uniformly high quality in efficacy trials (see for example Flay et al, 2005) – this is rarely the case in school-based interventions
  – We know that implementation variability predicts outcome variability
  – Interventions do not happen in a vacuum – understanding context and social processes is crucial

• IPE is therefore essential in randomised trials
  – Studying how the intervention is implemented (including how and why this varies)
  – Distinguishing between different intervention components and identifying those that are critical (‘active ingredients’) through analysis of natural variation or experimental manipulation (e.g. multi-arm or factorial trials) (Bonell et al, 2012)
  – Planned sub-group analyses to identify differential responsiveness/gains (Petticrew et al, 2012)
  – Investigating contextual factors that may influence the achievement of expected outcomes
  – Empirical validation of intervention theory (Bonell et al, 2012)
  – Interpretation of outcomes, regardless of their valence

• Intervention theory, implementation, evaluation

• Requires a move toward ‘realist’ RCTs (Bonell et al, 2012) with expectation of some natural variation

Page 46

Establish intervention efficacy and effectiveness

• IPE in a trial context should consider:
  – Aspects of implementation (e.g. fidelity/adherence, dosage, quality, participant responsiveness, programme differentiation, reach, adaptation, monitoring of comparison conditions)
    • Important to avoid Type III error
  – Factors affecting implementation (e.g. preplanning and foundations, implementation support system, implementation environment, implementer factors, intervention characteristics) (Durlak & DuPre, 2008; Greenberg et al, 2005; Forman et al, 2009)

• Not a quant/qual division!
  – 62% quantitative, 21% qualitative, 17% both in health promotion research (Oakley et al, 2006)

• Use of a range of methods and informants

• There is no one set way to do things – IPE in a trial has to be pragmatic!

[Figure: Implementation quality model (Domitrovich et al, 2008)]

Page 47

Establish intervention efficacy and effectiveness

• Promoting Alternative Thinking Strategies (Humphrey et al, 2015)
  – PATHS is a social-emotional learning curriculum that aims to help children manage their behaviour, understand their emotions and work well with others
  – Cluster RCT; 23 PATHS vs 22 control schools (N=4516)
  – Training provided by developers; teachers supported by trained coaches (in turn supervised by developers); all materials provided free of charge
  – Assessment of outcomes: social-emotional competence, mental health, attainment, health-related quality of life
  – Assessment of implementation: surveys of usual practice, structured observations, teacher implementation surveys, teacher factors affecting implementation surveys, interviews with teachers and school staff, focus groups with pupils, interviews with parents

• Outcome analysis showed no impact of PATHS on children’s attainment in English/reading or maths

• Structured observational data indicated that fidelity, quality, reach, and participant responsiveness were generally high; however, in terms of dosage, teachers were on average 20 lessons (10 weeks) behind schedule at the point of observation

Page 48

Establish intervention efficacy and effectiveness

• Quantitative analysis of usual practice surveys and structured observational data indicated that increased provision of targeted interventions, higher levels of implementation quality, and optimal intervention reach were associated with improved academic outcomes

– Most consistent finding was for reach; largest effect sizes were for quality

• Qualitative interview data was extremely helpful in illuminating the processes underpinning the above findings

– Philosophical fit
– Meeting perceived needs
– Practical fit
– Pedagogical fit
– Barriers and facilitators to effective implementation
– Technical support and assistance
– School leadership

Page 49

Establish intervention efficacy and effectiveness

• Will the intervention produce intended outcomes in ‘real world’ conditions?
  – Emphasis on natural settings – increased external validity, decreased internal validity
  – Intervention developer much less likely to be involved
  – Success is heavily dependent upon the relationship between the research team and the host institutions (Flay et al, 2005)
  – Paradox of researcher involvement

• Implementation is likely to be even more variable in an effectiveness trial, so it is vital that the aforementioned aspects and factors are documented and analysed

• Increased implementation variability is one possible reason why effects observed in efficacy trials are not always replicated in effectiveness trials
  – The so-called ‘voltage drop’ (Chambers, Glasgow & Stange, 2013)

• IPE in effectiveness trials should therefore include a particular focus on how the intervention is ‘interpreted’ in real world conditions
  – What form(s) does this interpretation take? e.g. dilution and drift?
  – What real world constraints and processes influence this?

[Figure: Academic achievement effect sizes for social-emotional learning interventions in efficacy and effectiveness trials (Wigelsworth, Lendrum & Oldfield, in press)]

Page 50

Scale-up intervention

• How can we take an intervention “from science to service” (August, Gerwitz & Realmuto, 2010, p.72)?
  – “There is a broad consensus that schools do not have a good record in accessing the available knowledge base on empirically validated interventions… Developers and advocates of effective practices have a shared responsibility with educators to create the awareness, conditions, incentives and context(s) that will allow achievement of this important goal” (Walker, 2004, p.399)
  – Evidence-to-routine-practice ‘lag’ can be 20 years

• Two related issues – scaling up (bringing the intervention to a wider audience) and sustainability (maintaining effective use and impact of the intervention) (Forman, 2015)

• “The implementation stage begins after the adoption decision is made and culminates when the innovation ‘disappears’ either because it has become so thoroughly integrated into everyday practices that it is no longer visible as an innovation or because it has been discontinued” (Bosworth et al, 1999, p.1)

• Body of work on Type 2 translational research “examines factors associated with the adoption, maintenance, and sustainability of science-based interventions at the practice level” (Greenberg, 2010, p.37)
  – This kind of research is by no means confined to the scale-up phase – indeed, it is also critical in effectiveness trials
  – ‘Implementation is the outcome’

Page 51

Scale-up intervention

• High quality IPE is needed here more than ever! We need to understand the factors that influence:
  – intervention engagement and reach (e.g. who takes it on and why?)
  – implementation quality (e.g. when it is delivered well, what supports this?)
  – sustainability over time (e.g. what is sustained? How?) (Greenberg, 2010)

• Important to document how and why the intervention evolves as it goes to scale

• Building capacity and partnerships for scale-up and sustainability: example of PROSPER (PROmoting School-community-university Partnerships to Enhance Resilience) (Spoth, Greenberg, Bierman & Redmond, 2004)

Page 52

Scale-up intervention

• Imagine that the intervention you focused upon in the previous activities has passed successfully through development, piloting, efficacy and effectiveness stages and is ready to be ‘taken to scale’ and disseminated more broadly

• What factors do you think are most likely to influence the engagement, reach, implementation quality and sustainability of the intervention when it is scaled-up?

• How might you go about researching the above?
• How could the knowledge generated be used to improve the scaling-up process?

Page 53

RCTs and instrumental variables

Anna Vignoles

University of Cambridge

Page 54

Why do you need an IV in an RCT?

• RCTs randomise the allocation of the treatment
• But not everyone complies
• People used to analyse the data “as treated”
  – Treatment on the treated, ignoring the fact that some people who were randomised into the treatment did not participate

• This is generally a bad solution because those who choose to participate are not the same as those who don’t!

Page 55

Why do you need an IV in an RCT?

• Nowadays the preferred analytical solution is Intention to Treat (ITT) approaches
  – Difference in outcomes between those who are randomised into the treatment and those who are not
• But ITT tells you the impact of offering the programme
• We would still like to know the effect of the treatment on the treated
• But the treated are not a random subset…

Page 56

What about an IV solution?

• IV often used post hoc to evaluate a programme
  – Maimonides’ rule in Israel (Victor Lavy)

• Can be used ex ante
  – Design an IV into an evaluation
  – Design an IV into an RCT

• In the medical literature, use of IV in a trial is called contamination-adjusted intention to treat

Page 57

Why use an IV in an RCT?

• Computing the ITT
  – Straight difference in average outcomes between the group to whom you offered treatment and the group to whom you did not offer treatment

• Computing the Effect of Treatment on the Treated (TOT)
  – Use whether or not the person was randomised into the intervention (Z) to predict whether or not the individual actually participated in the intervention (D)
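A minimal sketch of both quantities, assuming a simple binary-offer, binary-uptake trial; the variable names y (outcome), z (random assignment) and d (actual participation) are mine. The TOT here is the Wald/IV estimator: ITT divided by the compliance difference.

```python
# Minimal sketch: ITT and TOT (Wald / IV) estimates from trial data.
# y = outcome, z = randomised to treatment (0/1), d = actually treated (0/1).
import numpy as np

def itt_and_tot(y: np.ndarray, z: np.ndarray, d: np.ndarray) -> tuple[float, float]:
    itt = y[z == 1].mean() - y[z == 0].mean()          # effect of the offer
    first_stage = d[z == 1].mean() - d[z == 0].mean()  # compliance difference
    tot = itt / first_stage                            # Wald estimator
    return itt, tot
```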

Page 58

Instrumental Variables: a refresher

• Y_{1i} is the value of the outcome if the treatment is received by individual i

• Y_{0i} is the value of the outcome if the treatment is not received by individual i

• D_i = 1 if treatment is received by individual i

• D_i = 0 if treatment is not received by individual i

• X_i denotes the set of observed characteristics before the intervention/treatment for individual i

Page 59

Instrumental Variables: a refresher

• D_i is composed of two parts: one that is correlated with the error term u (the endogenous part) and one that is independent of the error term (the exogenous part)

• IV uses an additional variable (or variables) Z, called an instrument, to isolate the part of D that is independent of the error term

• In this case Z is the randomisation process

Page 60

Instrumental Variables: a refresher

• For a valid instrument the following must be true:
  – corr(Z_i, D_i) ≠ 0: the instrument is relevant
  – E(u_i | Z_i, X_i) = E(u_i | X_i) = 0: the instrument affects D, but not Y directly (only through its impact on D)

• The instrument must predict D
• The instrument must also only affect Y through its impact on D (untestable assumption)
• IV is estimated by Two-Stage Least Squares (2SLS)
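A minimal 2SLS sketch for this setting (randomisation Z instrumenting treatment D, intercept only, no covariates), written in plain numpy rather than a dedicated IV library so the two stages are explicit. It returns the point estimate only; 2SLS standard errors need a separate correction not shown here.

```python
# Minimal two-stage least squares sketch: Z instruments D, no covariates.
import numpy as np

def two_sls(y: np.ndarray, d: np.ndarray, z: np.ndarray) -> float:
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: regress D on Z and keep the fitted (exogenous) part of D
    d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
    # Stage 2: regress Y on the fitted values of D
    X = np.column_stack([np.ones(n), d_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]  # coefficient on D
```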

Page 61

Problems with IV

• If instruments are weakly correlated with the endogenous variable, the instruments are said to be weak

• When using weak instruments the IV 2SLS estimator is biased even in large samples

• In small samples IV estimates are biased anyway

• Finite sample bias will lessen as sample size increases

• In this case there is a clear, strong instrument: randomisation itself

Page 62

Advantages and disadvantages

• Essentially adjusts the estimate for the degree of non-compliance

• Information on non-compliance can be revealing in itself for understanding the impact of the intervention

• Non-compliance may be difficult to measure in practice
  – incomplete or partial compliance

• Assumes that if the non-compliers had received the treatment, the effect for them would have been the same as for the compliers

• Assumption behind ITT – the effect of the treatment is averaged over those who actually receive it and those who do not

Page 63

Some examples

• Vitamin A supplementation in malnourished children reduced mortality by 41% using ITT

• Supplementation was found to reduce mortality by two thirds (72%) using CA-ITT
  – Sommer and Zeger (1991)

Page 64

References

• Angrist, J. D. and A. B. Krueger (2001). “Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments.” Journal of Economic Perspectives, 15(4).

• Heckman, J. J. (1995). “Randomization as an Instrumental Variable.”

• Imbens, G. W. and J. D. Angrist (1994). “Identification and Estimation of Local Average Treatment Effects.” Econometrica, 62(2).

• Sommer, A. and S. L. Zeger (1991). “On Estimating Efficacy from Clinical Trials.” Statistics in Medicine, 10: 45–52.

• Sussman, J. B. and R. A. Hayward (2010). “An IV for the RCT: Using Instrumental Variables to Adjust for Treatment Contamination in Randomised Controlled Trials.” BMJ, 340: c2073.

Page 65

Session 3: Costs

25th June 2015