The Hype and Futility of Measuring Implementation Fidelity v5


The Hype and Futility of Measuring Implementation Fidelity (in GRTs)

David Judkins

Presentation at Evaluation 2009, Orlando



The Hype

"Effectiveness research is now at the point of sophistication wherein black-box outcomes studies are no longer acceptable." Mowbray, Holter, Teague and Bybee, 2003

89,900 Google hits on October 10, 2009, for the phrase "what works best for whom"



Lofty Goals

What social programs, policies, and interventions work?

For whom do they work, and under what conditions?

And why do they work or fall short?

Preface to Learning More from Social Experiments, edited by Howard Bloom

(34,000 Google hits on book title)



And why do they work or fall short?

Bloom expands on the question in an MDRC announcement about the book publication:

"But, in the past, there have been questions that randomized experiments have not been able to address effectively. What component of a social policy made it successful?"



Can such questions be answered?

I will argue that the answer is generally negative

Worse, attempting to answer them compromises the first objective of determining whether the intervention works at all

(This is all in the context of group randomized trials)



Thesis

It is better not to attempt to measure fidelity in GRTs

It is counter-productive to try to answer questions about efficacy (intervention effects under ideal conditions) in a trial designed to measure effectiveness (intervention effects under realistic conditions)

Other forms of measurement about the intervention process, in the hopes of learning more about alternate interventions, are also vain and wasteful



Outline

Opportunities & perspectives

The preconditions for useful fidelity measurement

Operational challenges in fidelity measurement

Statistical issues in the estimation of fidelity-adjusted intervention effectiveness

A case study

Another perspective



Opportunities

Many educational, social, and behavioral interventions are complex

Multidimensional

Incorporate aspects of culturally accepted best practices (traditions and fads)

Require the participation of trained intervenors and of intervention subjects over extended periods of time

Can never be detailed enough to handle every eventuality



Cynic's Perspective

If there is failure:

Blame the subjects

Blame the intervenors

Disqualify or discount the work of control intervenors who, by virtue of superior skill, appear to infringe on the developer's recipe, possibly merely by implementing the culturally accepted best practices



Kaspar Hauser (Jeder für sich und Gott gegen alle)

The wolf child raised in isolation from most humanity would make an ideal foil for many educational and parenting interventions



The Thrifty Perspective

It is urgent that an effective intervention be found

Limited number of fresh ideas in circulation

Limited dollars for research

It would be nice to be able to learn, from an experiment designed to measure the impact of a complex intervention A3B7C2, what would be the effect of A1B22C4



Preconditions for Fidelity Measurement

Well-defined intervention

A way of splitting an intervention into components that could be recombined in alternative strengths and mixtures

Some theory about what aspects of interactions between intervenor and subject are relevant and consequential



Operational Challenges

Choice of informant

Subject

Intervenor

Trainer/ senior intervenor adviser

Neutral observer

How to make fidelity measurement reliable, valid, and cost-effective?



Intervenor Informant

Likely to think they are doing just fine if asked to summarize their fidelity

[Figure: four frequency charts of Project Director rating (1 to 5) against Frequency (0 to 24). Panels: Let's Begin with the Letter People: ECE; Play & Learning Strategies (PALS): PE; Partners for Literacy: ECE; Partners for Literacy: PE]



Intervenor Informant (2)

If asked to keep detailed logs, they will likely do a poor job

For those who do a good job on detailed activity reporting, it will probably detract from their effectiveness



Trainer/Advisor Informant

Can have vested interest

Blind neither to treatment status nor outcome outlook

Possible to read the writing on the wall and rate the intervenors with unfavorable average outcomes as having low fidelity, thereby protecting the fidelity-adjusted effectiveness of the intervention



Trainer/Advisor Informant (2)

Even if unbiased, how sound an opinion can be formed from initial training and occasional (often telephone) contact with intervenors?



Neutral Observer

Very costly

Need staff who fully understand the intervention model

Need extensive training for consistent rating

Usually need travel

Results in strong pressure for additional clustering



Neutral Observer (2)

High cost of training, travel, and salary directly reduces power for primary effectiveness research by reducing subject sample size (for fixed budget)

Pressure for stronger clustering indirectly reduces power by reducing the number of intervenors and/or intervention sites
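The indirect power loss from clustering can be illustrated with the textbook design effect for equal cluster sizes, 1 + (m - 1) * ICC. The sketch below is not from the presentation: the function names, the site counts, the cluster sizes, and the intraclass correlation of 0.10 are all hypothetical, and the normal approximation ignores the small number of cluster-level degrees of freedom.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def minimum_detectable_effect(clusters_per_arm: int, cluster_size: float, icc: float,
                              z_alpha: float = 1.96, z_power: float = 0.84) -> float:
    """Approximate minimum detectable effect (in SD units) for a two-arm trial.

    Uses a normal approximation with a two-sided 5% test and 80% power.
    """
    n_effective = clusters_per_arm * cluster_size / design_effect(cluster_size, icc)
    return (z_alpha + z_power) * math.sqrt(2.0 / n_effective)


# Hypothetical designs with the same 600 subjects per arm.
dispersed = minimum_detectable_effect(clusters_per_arm=30, cluster_size=20, icc=0.10)
clustered = minimum_detectable_effect(clusters_per_arm=15, cluster_size=40, icc=0.10)
print(f"30 sites x 20 subjects: MDES = {dispersed:.2f}")
print(f"15 sites x 40 subjects: MDES = {clustered:.2f}")
```

Under these assumed numbers, halving the number of sites while keeping the subject count fixed raises the smallest effect the trial can detect, which is the mechanism the slide describes.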



Statistical Issues

Most advocates of fidelity measurement have unwarranted optimism about the ability of statisticians to do anything useful with the data

Of course, one can always hunt for the statistician who will provide rosy promises of artful analyses



Statistical Issues (2)

The statistician who offers multi-level causal path mediated analyses will be loved by many, but as Tukey said:

"The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."



The Best We (Statisticians and Econometricians) Can Offer

Requires heroic assumptions. Either:

Randomization provides an instrumental variable for fidelity; or

The collection of measured covariates is rich enough to render fidelity conditionally independent of potential outcomes



Heroic Assumption #1

Can only render the mediating role of one (one!) unidimensional summary of fidelity

By definition, Z is an instrumental variable for the effect of X on Y if the only effect of Z on Y is through X

In other words, one must be able to rule out a priori that there could be any effects of Z on Y that do not run through X

In the context of fidelity-adjusted effect estimation, this means that there is a unique plausible summarization of fidelity
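In potential-outcomes notation, the definition above can be written as an exclusion restriction. This is a sketch using the slide's Z (random assignment), X (fidelity), and Y (outcome); the notation Y(z, x) for the outcome under assignment z and fidelity x, and the ratio form of the estimator for a binary Z, are standard textbook conventions rather than anything stated in the presentation.

```latex
% Exclusion restriction: assignment Z affects Y only through fidelity X
Y(z, x) = Y(z', x) \quad \text{for all } z,\ z',\ x

% With a binary Z, the IV (Wald) estimate is the ITT effect on Y
% rescaled by the effect of Z on X
\hat{\beta}_{IV} = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{X}_{Z=1} - \bar{X}_{Z=0}}
```

The denominator is a single difference in means, so X has to be one scalar summary of fidelity for the ratio to be defined at all.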



Heroic Assumption #1 (cont.)

Might not be so heroic if the intervention is very simple and nearly instantaneous

Then a binary measure of fidelity might be the unique plausible choice

Or if the intervention is purely unidimensional, perhaps a uniquely plausible ordinal measure of fidelity could be developed




The little recognized but ironic kick is that even if you make this assumption, the formal hypothesis tests for fidelity-adjusted intervention effectiveness based on the IV approach yield the same star pattern as the original analysis

The point estimate will be altered, but if the ITT analysis found no statistically significant treatment effect, an IV analysis with randomization as the instrumental variable will yield the same finding
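Why the stars do not move can be seen in a small numerical sketch. With randomization as the only instrument, the IV estimate is the ITT estimate divided by the arm difference in mean fidelity, and to a first-order approximation (treating that difference as fixed) the standard error is divided by the same number. The function name and all three input values below are hypothetical, and a full two-stage least squares fit would give a slightly different standard error.

```python
from typing import Tuple


def wald_iv(itt_effect: float, itt_se: float, first_stage: float) -> Tuple[float, float]:
    """Rescale an ITT estimate and its SE by the first-stage effect of Z on X.

    first_stage is the treatment-minus-control difference in mean fidelity.
    The SE ignores sampling error in first_stage (first-order approximation).
    """
    if first_stage == 0:
        raise ValueError("randomization must shift fidelity for IV to be defined")
    return itt_effect / first_stage, itt_se / abs(first_stage)


# Hypothetical inputs: a non-significant ITT effect and a fidelity gap of 0.5.
itt_effect, itt_se, first_stage = 0.08, 0.06, 0.5
iv_effect, iv_se = wald_iv(itt_effect, itt_se, first_stage)

t_itt = itt_effect / itt_se
t_iv = iv_effect / iv_se
print(f"ITT: estimate={itt_effect:.2f}, t={t_itt:.2f}")
print(f"IV:  estimate={iv_effect:.2f}, t={t_iv:.2f}")
```

The point estimate doubles in this example, but the ratio of estimate to standard error is unchanged, so a result that missed significance under ITT still misses it.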



Heroic Assumption #2

If one relies upon the adequacy of covariate measurement, one quickly runs up against sample size problems

A typical group randomized trial will have only a few dozen intervenors per arm (maybe just one or two dozen, and I have seen fewer than one dozen)




If we agree that it would probably take on the order of 30 covariates to fully explain why some intervenors are more faithful than others (the propensity scoring approach) or more effective than others (the ANCOVA approach), then we need on the order of 1,000 intervenors before we even consider interactions among the covariates
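The arithmetic behind that gap can be sketched as follows. The slide does not say which rule produces the figure of 1,000, so the code only counts model terms and reports intervenors per term; the helper names are hypothetical, and the 24 intervenors per arm is an illustrative reading of "one or two dozen".

```python
from math import comb


def model_terms(n_covariates: int, max_order: int = 1) -> int:
    """Count main effects plus interactions up to max_order among the covariates."""
    return sum(comb(n_covariates, k) for k in range(1, max_order + 1))


def intervenors_per_term(n_intervenors: int, n_terms: int) -> float:
    """How many intervenors are available to inform each model term."""
    return n_intervenors / n_terms


main_effects = model_terms(30, max_order=1)
with_pairwise = model_terms(30, max_order=2)

print(f"main effects only: {main_effects} terms")
print(f"adding pairwise interactions: {with_pairwise} terms")
print(f"1,000 intervenors: {intervenors_per_term(1000, main_effects):.1f} per main effect")
print(f"24 intervenors:    {intervenors_per_term(24, main_effects):.1f} per main effect")
```

With 30 covariates there are already more main-effect terms than intervenors in a two-dozen-per-arm trial, and pairwise interactions multiply the term count more than fifteenfold.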




However, instrument designers generally have no clue how to design intervenor background questionnaires that would explain intervenor fidelity

And if we knew how to measure intervenor effectiveness, then the entire experiment would be unnecessary



CASE STUDY



CLIO

Randomized field trial of curricula for Even Start Centers

5-arm study: 4 active, 1 control

Three fidelity measurements:

Local Even Start center director

Curriculum designer

Neutral observer



Fidelity Instrument Development

Several of the top national experts in the evaluation of early education interventions designed the neutral observer instruments and training

Curriculum designers were consulted

Curriculum designers had ongoing contact with intervenors through technical assistance contracts



Correlations Between Developer and Observer Fidelity Ratings

Across 96 active projects for early childhood curriculum:

o 0.48 in year 1

o 0.39 in year 2

Across 48 active projects for parenting curriculum:

o 0.10 in year 1

o -0.01 in year 2
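To get a rough sense of how much sampling noise sits around correlations of this size, one can compute approximate 95% intervals with the Fisher z transformation. This is a sketch and not part of the original analysis: it assumes the projects are independent, that the ratings are roughly bivariate normal, and that the sample size for each correlation equals the project count on the slide. The function name is hypothetical.

```python
import math
from typing import Tuple


def fisher_interval(r: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for a Pearson correlation via Fisher's z."""
    if n <= 3 or not -1.0 < r < 1.0:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# (curriculum, year) -> (reported correlation, number of active projects)
reported = {
    ("early childhood", 1): (0.48, 96),
    ("early childhood", 2): (0.39, 96),
    ("parenting", 1): (0.10, 48),
    ("parenting", 2): (-0.01, 48),
}

for (curriculum, year), (r, n) in reported.items():
    low, high = fisher_interval(r, n)
    print(f"{curriculum}, year {year}: r={r:+.2f}, 95% interval [{low:+.2f}, {high:+.2f}]")
```

Under these assumptions the parenting-curriculum intervals straddle zero, which is consistent with the slide's point that developer and observer ratings there barely agree.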



Relationship between developer-rated fidelity and emergent child English literacy (arm A2)

[Figure: plot with axes Fidelity and Outcome]




Relationship between developer-rated fidelity and emergent child English literacy (arm B2)

[Figure: plot with axes Fidelity and Outcome]





[Figure: plot with axes Fidelity and Outcome; slide title not recovered]





[Figure: plot with axes Fidelity and Outcome; slide title not recovered]




Relationship between developer-rated fidelity and emergent child English literacy (control)

[Figure: plot with axes Fidelity and Outcome]



Relationship between observer-rated fidelity

[Figure: plot with axes Fidelity and Outcome; remainder of slide title not recovered]





[Figure: plot with axes Fidelity and Outcome; slide title not recovered]





[Figure: plot with axes Fidelity and Outcome; slide title not recovered]





[Figure: plot with axes Fidelity and Outcome; slide title not recovered]




Relationship between observer-rated fidelity and emergent child English literacy (control)

[Figure: plot with axes Fidelity and Outcome]



Methods and Results

Multiplied arm indicators by fidelity scores (constrained to lie between 0 and 1) in multi-level model

Generally similar results

Fidelity-adjusted estimates not always larger than ITT estimates!

Two more stars

One positive

One negative!
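The regressor construction described in the first line of this slide might look like the sketch below. The presentation does not say how scores were constrained to the unit interval or what the arms were called, so clipping, the arm labels, and the function name are all assumptions; fitting the multi-level model itself is not shown.

```python
from typing import List, Sequence

# Hypothetical labels for the four active arms; the control arm gets all zeros.
ACTIVE_ARMS = ("arm1", "arm2", "arm3", "arm4")


def fidelity_weighted_indicators(arm: str, fidelity: float,
                                 active_arms: Sequence[str] = ACTIVE_ARMS) -> List[float]:
    """Return one regressor per active arm: arm indicator times fidelity.

    Fidelity is clipped to [0, 1] (an assumed way of enforcing the constraint).
    """
    score = min(1.0, max(0.0, fidelity))
    return [score if arm == label else 0.0 for label in active_arms]


# Illustrative projects only, not study data: (arm, raw fidelity score).
projects = [("arm1", 0.7), ("arm3", 1.2), ("control", 0.4)]
for arm, fidelity in projects:
    print(arm, fidelity_weighted_indicators(arm, fidelity))
```

Replacing the plain 0/1 arm indicators with these columns makes each arm coefficient an estimate of the effect at full fidelity, which is why the adjusted estimates can differ from the ITT estimates in either direction.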



Case Study Wrap Up

A lot of money spent with little discernible return

We still don't know how to develop good preschool curricula for Even Start projects



Other voices

Peter Schochet, Mathematica Policy Research, in a recent IES white paper, final line:

"Thus, these classroom practice mediators may be of little help in confirming the study's conceptual model and identifying teacher practices that are most associated with student learning gains."



Josh Angrist

Instrumental Variables Methods in Experimental Criminological Research: What, Why, and How? 2004. Journal of Experimental Criminology.

"Especially noteworthy is the fact that, in marked contrast with an unfortunate trend in education research, criminologists do not appear to have been afflicted with what social scientist Tom Cook (2001) calls 'sciencephobia.' This is a tendency to eschew rigorous quantitative research designs in favor of a softer approach that emphasizes process over outcomes."
