The Hype and Futility of Measuring Implementation Fidelity v5


  • 7/31/2019 The Hype and Futility of Measuring Implementation Fidelity v5


    The Hype and Futility of Measuring Implementation Fidelity (in GRTs)

    David Judkins

    Presentation at Evaluation 2009, Orlando


    The Hype

    "Effectiveness research is now at the point of sophistication wherein black-box outcomes studies are no longer acceptable." (Mowbray, Holter, Teague and Bybee, 2003)

    89,900 Google hits on October 10, 2009, for the phrase "what works best for whom"


    Lofty Goals

    What social programs, policies, and interventions work?

    For whom do they work, and under what conditions?

    And why do they work, or fall short?

    Preface to Learning More from Social Experiments, edited by Howard Bloom (34,000 Google hits on book title)


    And why do they work, or fall short?

    Bloom expands on the question in an MDRC announcement about the book publication:

    "But, in the past, there have been questions that randomized experiments have not been able to address effectively. What component of a social policy made it successful?"


    Can such questions be answered?

    I will argue that the answer is generally negative

    Worse, that attempting to answer them compromises the first objective of determining whether the intervention works at all

    (This is all in the context of group randomized trials)


    Thesis

    It is better not to attempt to measure fidelity in GRTs

    It is counter-productive to try to answer questions about efficacy (intervention effects under ideal conditions) in a trial designed to measure effectiveness (intervention effects under realistic conditions)

    Other forms of measurement about the intervention process in the hopes of learning more about alternate interventions are also vain and wasteful


    Outline

    Opportunities & perspectives

    The preconditions for useful fidelity measurement

    Operational challenges in fidelity measurement

    Statistical issues in the estimation of fidelity-adjusted intervention effectiveness

    A case study

    Another perspective


    Opportunities

    Many educational, social, and behavioral interventions are complex

    o Multidimensional

    o Incorporate aspects of culturally accepted best practices (traditions and fads)

    o Require the participation of trained intervenors and of intervention subjects over extended periods of time

    o Can never be detailed enough to handle every eventuality


    Cynic's Perspective

    If there is failure:

    Blame the subjects

    Blame the intervenors

    Disqualify or discount the work of control intervenors who, by virtue of superior skill, appear to infringe on the developer's recipe, possibly merely by implementing the culturally accepted best practices


    Kaspar Hauser (Jeder für sich und Gott gegen alle)

    The wolf child raised in isolation from most of humanity would make an ideal foil for many educational and parenting interventions


    The Thrifty Perspective

    It is urgent that an effective intervention be found

    Limited number of fresh ideas in circulation

    Limited dollars for research

    It would be nice to be able to learn, from an experiment designed to measure the impact of a complex intervention A3B7C2, what the effect of A1B22C4 would be


    Preconditions for Fidelity Measurement

    Well-defined intervention

    A way of splitting an intervention into components that could be recombined in alternative strengths and mixtures

    Some theory about what aspects of interactions between intervenor and subject are relevant and consequential


    Operational Challenges

    Choice of informant

    Subject

    Intervenor

    Trainer/senior intervenor adviser

    Neutral observer

    How to make fidelity measurement reliable, valid, and cost effective?


    Intervenor Informant

    Likely to think they are doing just fine if asked to summarize their fidelity

    [Figure: four histograms of fidelity ratings (horizontal axis: Project Director rating, 1 to 5; vertical axis: Frequency, 0 to 24). Panels: Let's Begin with the Letter People: ECE; Play & Learning Strategies (PALS): PE; Partners for Literacy: ECE; Partners for Literacy: PE]


    Intervenor Informant (2)

    If asked to keep detailed logs, they will likely do a poor job

    For those who do a good job on detailed activity reporting, it will probably detract from their effectiveness


    Trainer/Advisor Informant

    Can have vested interest

    Blind neither to treatment status nor outcome outlook

    Possible to read the writing on the wall and rate the intervenors with unfavorable average outcomes as having low fidelity, thereby protecting the fidelity-adjusted effectiveness of the intervention


    Trainer/Advisor Informant (2)

    Even if unbiased, how sound an opinion can be formulated from initial training and occasional (often telephone) contact with intervenors?


    Neutral Observer

    Very costly

    Need staff who fully understand the intervention model

    Need extensive training for consistent rating

    Usually need travel

    Results in strong pressure for additional clustering


    Neutral Observer (2)

    High costs of training, travel, and salary directly reduce power for primary effectiveness research by reducing subject sample size (for fixed budget)

    Pressure for stronger clustering indirectly reduces power by reducing the number of intervenors and/or intervention sites
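    The two power losses described above can be sketched numerically. The block below is a rough illustration, not a calculation from the presentation: it uses a common normal-approximation formula for the minimum detectable effect size (MDES) of a two-arm group randomized trial, and the cluster counts, cluster sizes, and intraclass correlation are all hypothetical values chosen only to show the direction of the effects.

```python
from math import sqrt
from statistics import NormalDist


def mdes(n_clusters, subjects_per_cluster, icc,
         alpha=0.05, power=0.80, p_treated=0.5):
    """Approximate minimum detectable effect size (SD units) for a two-arm GRT.

    Normal approximation only, so it is optimistic when clusters are few.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    alloc = p_treated * (1 - p_treated)
    between = icc / (alloc * n_clusters)
    within = (1 - icc) / (alloc * n_clusters * subjects_per_cluster)
    return multiplier * sqrt(between + within)


# Hypothetical designs (all numbers are assumptions for illustration).
baseline = mdes(n_clusters=60, subjects_per_cluster=10, icc=0.15)
# Observation costs eat into the budget: same clusters, fewer subjects.
fewer_subjects = mdes(n_clusters=60, subjects_per_cluster=7, icc=0.15)
# Travel costs push toward clustering: same subjects, half the clusters.
more_clustered = mdes(n_clusters=30, subjects_per_cluster=20, icc=0.15)

for label, value in [("baseline", baseline),
                     ("fewer subjects", fewer_subjects),
                     ("more clustered", more_clustered)]:
    print(f"{label:15s} MDES = {value:.3f}")
```

    Under these assumed inputs, both changes raise the MDES, and halving the number of clusters costs more than trimming subjects within clusters, because the between-cluster variance term depends only on the number of clusters.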


    Statistical Issues

    Most advocates of fidelity measurement have unwarranted optimism about the ability of statisticians to do anything useful with the data

    Of course, one can always hunt for the statistician who will provide rosy promises of artful analyses


    Statistical Issues (2)

    The statistician who offers multi-level causal path mediated analyses will be loved by many, but as Tukey said:

    "The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."


    The Best We (Statisticians and Econometricians) Can Offer

    Requires heroic assumptions. Either:

    Randomization provides an instrumental variable for fidelity; or

    The collection of measured covariates is rich enough to render fidelity conditionally independent of potential outcomes


    Heroic Assumption #1

    Can only render the mediating role of one (one!) unidimensional summary of fidelity

    By definition, Z is an instrumental variable for the effect of X on Y if the only effect of Z on Y is through X

    In other words, one must be able to rule out a priori that there could be any effects of Z on Y that do not run through X

    In the context of fidelity-adjusted effect estimation, this means that there is a unique plausible summarization of fidelity
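    The definition above can be written compactly in potential-outcomes notation. This notation is an assumption added for illustration and does not appear on the slides: Z is random assignment, X is the single fidelity summary, and Y_i(z, x) is the outcome unit i would show under assignment z and fidelity x.

```latex
% Exclusion restriction: assignment affects the outcome only through fidelity.
\[
  Y_i(z, x) = Y_i(x) \quad \text{for all } z \text{ and } x .
\]
% With a binary Z, the resulting IV (Wald) estimand rescales the
% intention-to-treat contrast by the assignment-induced change in fidelity.
\[
  \beta_{\mathrm{IV}}
    = \frac{E[\,Y \mid Z = 1\,] - E[\,Y \mid Z = 0\,]}
           {E[\,X \mid Z = 1\,] - E[\,X \mid Z = 0\,]} .
\]
```

    If a second fidelity dimension also responded to assignment and affected the outcome, the first line would fail for X, which is why only one unidimensional summary can be handled this way.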


    Heroic Assumption #1 (cont.)

    Might not be so heroic if the intervention is very simple and nearly instantaneous

    Then a binary measure of fidelity might be the unique plausible choice

    Or if the intervention is purely unidimensional, perhaps a uniquely plausible ordinal measure of fidelity could be developed


    Heroic Assumption #1 (cont.)

    The little-recognized, ironic kick is that even if you make this assumption, the formal hypothesis tests for fidelity-adjusted intervention effectiveness based on the IV approach yield the same star pattern as the original analysis

    The point estimate will be altered, but if the ITT analysis found no statistically significant treatment effect, an IV analysis with randomization as the instrumental variable will yield the same finding
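    A small simulation can make the "same star pattern" point concrete. The sketch below rests on several assumptions that are not from the presentation: the cluster-level data are simulated, the effect size and fidelity range are hypothetical, and the IV standard error uses the first-order delta-method approximation under the null of no effect, which amounts to dividing the ITT standard error by the first-stage difference.

```python
import random
from math import sqrt
from statistics import mean, variance

random.seed(0)

# Simulated cluster-level records (z = assignment, x = fidelity, y = outcome).
# All values are hypothetical.
clusters = []
for i in range(40):
    z = i % 2
    x = random.uniform(0.3, 1.0) if z == 1 else 0.0  # no fidelity in control
    y = 0.2 * x + random.gauss(0.0, 1.0)
    clusters.append((z, x, y))

x1 = [x for z, x, y in clusters if z == 1]
x0 = [x for z, x, y in clusters if z == 0]
y1 = [y for z, x, y in clusters if z == 1]
y0 = [y for z, x, y in clusters if z == 0]

# Intention-to-treat contrast and its standard error.
itt = mean(y1) - mean(y0)
se_itt = sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))

# First stage: how much assignment moves measured fidelity.
first_stage = mean(x1) - mean(x0)

# Wald (IV) estimate; standard error approximated under the null.
wald = itt / first_stage
se_wald = se_itt / abs(first_stage)

t_itt = itt / se_itt
t_wald = wald / se_wald

print(f"ITT  estimate {itt:+.3f}, t = {t_itt:+.3f}")
print(f"Wald estimate {wald:+.3f}, t = {t_wald:+.3f}")
```

    Because the estimate and its approximate standard error are divided by the same first-stage quantity, the test statistic is unchanged: the fidelity adjustment rescales the point estimate without adding or removing significance stars.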


    Heroic Assumption #2

    If one relies upon the adequacy of covariate measurement, one quickly runs up against sample size problems

    A typical group randomized trial will have only a few dozen intervenors per arm (maybe just one or two dozen, and I have seen fewer than one dozen)


    Heroic Assumption #2 (cont.)

    If we agree that it would probably take on the order of 30 covariates to fully explain why some intervenors are more faithful than others (the propensity scoring approach) or more effective than others (the ANCOVA approach), then we need on the order of 1,000 intervenors before we even consider interactions among the covariates


    Heroic Assumption #2 (cont.)

    However, instrument designers generally have no clue how to design intervenor background questionnaires that would explain intervenor fidelity

    And if we knew how to measure intervenor effectiveness, then the entire experiment would be unnecessary


    CASE STUDY


    CLIO

    Randomized field trial of curricula for Even Start Centers

    5-arm study: 4 active, 1 control

    Three fidelity measurements:

    o Local Even Start center director

    o Curriculum designer

    o Neutral observer


    Fidelity Instrument Development

    Several of the top national experts in the evaluation of early education interventions designed the neutral observer instruments and training

    Curriculum designers were consulted

    Curriculum designers had ongoing contact with intervenors through technical assistance contracts


    Correlations Between Developer and Observer Fidelity Ratings

    Across 96 active projects for early childhood curriculum:

    o 0.48 in year 1

    o 0.39 in year 2

    Across 48 active projects for parenting curriculum:

    o 0.10 in year 1

    o -0.01 in year 2
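    To gauge how much sampling noise surrounds these correlations, one can attach approximate confidence intervals using the Fisher z-transformation. The intervals are not part of the presentation; the sketch assumes the projects are independent and the two ratings are roughly bivariate normal, and it uses only the correlations and project counts reported above.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    center = atanh(r)                 # Fisher z-transform of r
    half_width = z_crit / sqrt(n - 3)  # approximate standard error is 1/sqrt(n-3)
    return tanh(center - half_width), tanh(center + half_width)


# (curriculum, year) -> (reported correlation, number of active projects)
reported = {
    ("early childhood", 1): (0.48, 96),
    ("early childhood", 2): (0.39, 96),
    ("parenting", 1): (0.10, 48),
    ("parenting", 2): (-0.01, 48),
}

for (curriculum, year), (r, n) in reported.items():
    low, high = fisher_ci(r, n)
    print(f"{curriculum:15s} year {year}: r = {r:+.2f}, "
          f"95% CI ({low:+.2f}, {high:+.2f})")
```

    Under these assumptions the early childhood intervals stay above zero while the parenting intervals straddle it, so agreement between developer and observer ratings of the parenting curriculum cannot be distinguished from none at all.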


    Relationship between developer-rated fidelity and emergent child English literacy

    [Figures: five plots, one each for arms A2, B2, A1, B1, and control; axes labeled Fidelity and Outcome]


    Relationship between observer-rated fidelity and emergent child English literacy

    [Figures: five plots, one each for arms A2, B2, A1, B1, and control; axes labeled Fidelity and Outcome]


    Methods and Results

    Multiplied arm indicators by fidelity scores (constrained to lie between 0 and 1) in multi-level model

    Generally similar results

    Fidelity-adjusted estimates not always larger than ITT estimates!

    Two more stars

    o One positive

    o One negative!
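    The construction of the fidelity-weighted predictors can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the arm labels are taken from the figure captions, the control arm is treated as the reference category, fidelity is assumed to be already scaled to the 0 to 1 range (clamping is only a safeguard), and the function names are hypothetical. The multi-level model fit itself is omitted.

```python
ACTIVE_ARMS = ("A1", "A2", "B1", "B2")  # control is the reference category


def clamp01(score):
    """Constrain a fidelity score to lie between 0 and 1."""
    return max(0.0, min(1.0, score))


def itt_row(arm):
    """Plain arm indicators, as used in an intention-to-treat model."""
    return {name: (1.0 if arm == name else 0.0) for name in ACTIVE_ARMS}


def fidelity_weighted_row(arm, fidelity):
    """Arm indicators multiplied by the project's fidelity score."""
    score = clamp01(fidelity)
    return {name: (score if arm == name else 0.0) for name in ACTIVE_ARMS}


# Hypothetical projects: (arm, fidelity score).
projects = [("A2", 0.6), ("B1", 1.3), ("control", 0.4)]
for arm, fidelity in projects:
    print(arm, itt_row(arm), fidelity_weighted_row(arm, fidelity))
```

    With this coding, each arm coefficient is read as the projected effect at full fidelity (score of 1) rather than at the fidelity actually achieved, which is why the adjusted estimates are usually expected to exceed the ITT estimates.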


    Case Study Wrap Up

    A lot of money spent with little discernible return

    We still don't know how to develop good preschool curricula for Even Start projects


    Other voices

    Peter Schochet, Mathematica Policy Research, in a recent IES white paper, final line:

    "Thus, these classroom practice mediators may be of little help in confirming the study's conceptual model and identifying teacher practices that are most associated with student learning gains."


    Josh Angrist

    Instrumental Variables Methods in Experimental Criminological Research: What, Why, and How? 2004. Journal of Experimental Criminology.

    "Especially noteworthy is the fact that, in marked contrast with an unfortunate trend in education research, criminologists do not appear to have been afflicted with what social scientist Tom Cook (2001) calls 'sciencephobia.' This is a tendency to eschew rigorous quantitative research designs in favor of a softer approach that emphasizes process over outcomes."
