Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.


Page 1: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

IOCE proposes more holistic perspectives

*What’s involved in “rigorous impact evaluation”?

Presented by Jim Rugh to NONIE Conference in Paris, 28 March 2011

Page 2: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Join me in a review of the basics of:

1. Evaluation Design

2. Logic models

3. Counterfactuals

4. Context (simple-complicated-complex)

5. Evaluation Implementation

Page 3: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

*1. Evaluation Design

Page 4: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

An introduction to various evaluation designs

Illustrating the need for quasi-experimental longitudinal time series evaluation design

[Figure: scale of major impact indicator plotted for project participants and the comparison group at baseline, end of project evaluation, and post project evaluation]

Page 5: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

OK, let’s stop the action to identify each of the major types of evaluation (research) design … one at a time, beginning with the most rigorous design.

Page 6: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

First of all: the key to the traditional symbols:

X = Intervention (treatment), i.e. what the project does in a community

O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation)

P (top row): Project participants

C (bottom row): Comparison (control) group


Page 7: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Design #1: Longitudinal Quasi-experimental

Project participants:  P1  X  P2  X  P3  P4
Comparison group:      C1     C2     C3  C4

Observations: baseline (P1, C1), midterm (P2, C2), end of project evaluation (P3, C3), post project evaluation (P4, C4)

Page 8: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Design #2: Quasi-experimental (pre+post, with comparison)

Project participants:  P1  X  P2
Comparison group:      C1     C2

Observations: baseline (P1, C1), end of project evaluation (P2, C2)
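One common way to use the four observations of the pre+post design with a comparison group is a difference-in-differences calculation: the change among project participants minus the change in the comparison group. The slides do not name this estimator, so treat the following as a minimal sketch under that assumption; the function name and the indicator values are hypothetical.

```python
def difference_in_differences(p1, p2, c1, c2):
    """Return (P2 - P1) - (C2 - C1).

    p1, p2: indicator value for project participants at baseline and endline.
    c1, c2: indicator value for the comparison group at baseline and endline.
    """
    participant_change = p2 - p1
    comparison_change = c2 - c1
    return participant_change - comparison_change


# Hypothetical indicator values, for illustration only.
estimate = difference_in_differences(p1=40.0, p2=65.0, c1=42.0, c2=52.0)
print(estimate)
```

The comparison group's change stands in for what would have happened to participants without the project, which is only credible to the extent the two groups are similar.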

Page 9: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Design #2+: Typical Randomized Control Trial

Project participants:  P1  X  P2
Control group:         C1     C2

Observations: baseline (P1, C1), end of project evaluation (P2, C2)

Research subjects randomly assigned either to project or control group.
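Random assignment can be sketched in a few lines. This is only an illustration of the idea, assuming a simple 50/50 split of a list of subjects; the function name, the subject labels, and the seed are hypothetical, and real trials often randomize by cluster or stratum instead.

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into project and control groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(subjects)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subjects, for illustration only.
subjects = [f"household_{i}" for i in range(10)]
project_group, control_group = randomly_assign(subjects, seed=2011)
print(project_group)
print(control_group)
```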

Page 10: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Design #3: Truncated QED

Project participants:  X  P1  X  P2
Comparison group:         C1     C2

Observations: midterm (P1, C1), end of project evaluation (P2, C2)

Page 11: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Design #4: Pre+post of project; post-only comparison

Project participants:  P1  X  P2
Comparison group:             C

Observations: baseline (P1), end of project evaluation (P2, C)

Page 12: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Design #5: Post-test only of project and comparison

Project participants:  X  P
Comparison group:         C

Observations: end of project evaluation (P, C)

Page 13: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Design #6: Pre+post of project; no comparison

Project participants:  P1  X  P2

Observations: baseline (P1), end of project evaluation (P2)

Page 14: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Design #7: Post-test only of project participants

Project participants:  X  P

Observations: end of project evaluation (P)

Need to fill in missing data through other means:
• What change occurred during the life of the project?
• What would have happened without the project (counterfactual)?
• How sustainable is that change likely to be?

Page 15: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Design  T1          X               T2         X                      T3         T4
        (baseline)  (intervention)  (midterm)  (intervention, cont.)  (endline)  (ex-post)
1       P1, C1      X               P2, C2     X                      P3, C3     P4, C4
2       P1, C1      X                          X                      P2, C2
3                   X               P1, C1     X                      P2, C2
4       P1          X                          X                      P2, C2
5                   X                          X                      P1, C1
6       P1          X                          X                      P2
7                   X                          X                      P1

Note: These 7 evaluation designs are described in the RealWorld Evaluation book
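The seven designs summarized above differ only in which groups are observed at which points in time. As a rough sketch (the dictionary layout, function name, and key names are illustrative assumptions, not taken from the RealWorld Evaluation book), they can be encoded and queried like this:

```python
# Observation points from the summary table: T1 baseline, T2 midterm,
# T3 endline, T4 ex-post.
TIMES = ("T1", "T2", "T3", "T4")

# For each design: which groups are observed at each time.
# "P" = project participants, "C" = comparison group.
DESIGNS = {
    1: {"T1": "PC", "T2": "PC", "T3": "PC", "T4": "PC"},
    2: {"T1": "PC", "T3": "PC"},
    3: {"T2": "PC", "T3": "PC"},
    4: {"T1": "P", "T3": "PC"},
    5: {"T3": "PC"},
    6: {"T1": "P", "T3": "P"},
    7: {"T3": "P"},
}


def describe(design_number):
    """Report which kinds of data a design collects."""
    observed = DESIGNS[design_number]
    participant_times = [t for t in TIMES if "P" in observed.get(t, "")]
    comparison_times = [t for t in TIMES if "C" in observed.get(t, "")]
    return {
        "participant_baseline": "T1" in participant_times,
        "comparison_group": bool(comparison_times),
        "comparison_baseline": "T1" in comparison_times,
        "ex_post_follow_up": "T4" in participant_times,
    }


for number in sorted(DESIGNS):
    print(number, describe(number))
```

Every flag that comes back False marks data the evaluator would have to reconstruct through other means.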

Page 16: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

What kinds of evaluation designs are actually used in the real world (of international development)? Findings from meta-evaluations of 336 evaluation reports of an INGO.

Post-test only 59%

Before-and-after 25%

With-and-without 15%

Other counterfactual 1%
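To get a feel for the numbers behind these shares, the percentages can be converted to approximate report counts. The percentages on the slide are rounded, so the counts below are estimates and need not sum exactly to 336; the variable names are illustrative.

```python
TOTAL_REPORTS = 336

# Shares as shown on the slide, in percent.
SHARES = {
    "Post-test only": 59,
    "Before-and-after": 25,
    "With-and-without": 15,
    "Other counterfactual": 1,
}

# Approximate number of reports per design type.
approx_counts = {
    design: round(TOTAL_REPORTS * pct / 100) for design, pct in SHARES.items()
}

for design, count in approx_counts.items():
    print(f"{design}: about {count} reports")
```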

Page 17: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Even proponents of RCTs have acknowledged that RCTs are only appropriate for perhaps 5% of development interventions. An empirical study by Forss and Bandstein, examining evaluations in the OECD/DAC DEReC database by bilateral and multilateral organisations, found that only 5% used even a counterfactual design.

While we recognize that experimental and quasi-experimental designs have a place in the toolkit for impact evaluations, we think that more attention needs to be paid to the roughly 95% of situations where these designs would not be possible or appropriate.

Page 18: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

*2. Logic Models

Page 19: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

One form of Program Theory (Logic) Model

Design, Inputs, Implementation Process, Outputs, Outcomes, Impacts, Sustainability

• Economic context in which the project operates
• Political context in which the project operates
• Institutional and operational context
• Socio-economic and cultural characteristics of the affected populations

Note: The orange boxes are included in conventional Program Theory Models. The addition of the blue boxes provides the recommended more complete analysis.


Page 21: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

PROBLEM
    PRIMARY CAUSE 1
    PRIMARY CAUSE 2
        Secondary cause 2.1
        Secondary cause 2.2
            Tertiary cause 2.2.1
            Tertiary cause 2.2.2
            Tertiary cause 2.2.3
        Secondary cause 2.3
    PRIMARY CAUSE 3

Consequences, Consequences, Consequences

Page 22: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

DESIRED IMPACT
    OUTCOME 1
    OUTCOME 2
        OUTPUT 2.1
        OUTPUT 2.2
            Intervention 2.2.1
            Intervention 2.2.2
            Intervention 2.2.3
        OUTPUT 2.3
    OUTCOME 3

Consequences, Consequences, Consequences

Page 23: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Children are malnourished

Diarrheal disease

Insufficient food

Poor quality of food

Unsanitary practices

Need for improved health policies

Contaminated water

Flies and rodents

Do not use facilities correctly

People do not wash hands before eating

High infant mortality rate

Page 24: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Women empowered

Young women educated

Women in leadership roles

Economic opportunities for women

Female enrollment rates increase

Curriculum improved

Improved educational policies

Parents persuaded to send girls to school

Schools built

School system hires and pays teachers

Reduction in poverty

Page 25: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Advocacy Project Goal: Improved educational policies enacted

Program Goal: Young women educated

Construction Project Goal: More classrooms built

Teacher Education Project Goal: Improve quality of curriculum

Program goal at impact level

ASSUMPTION (that others will do this)

PARTNER will do this

OUR project

To have synergy and achieve impact all of these need to address the same target population.

Page 26: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

We need to recognize which evaluative process is most appropriate for measurement at various levels

• Impact
• Outcomes
• Output
• Activities
• Inputs

PERFORMANCE MONITORING

PROJECT EVALUATION

PROGRAM EVALUATION

Page 27: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

The “Rosetta Stone of Logical Frameworks”

(terms listed from highest level to lowest)

Generic levels: Ultimate Impact, End Outcomes, Intermediate Outcomes, Outputs, Interventions
Needs-based: Higher Consequence, Specific Problem, Cause, Solution, Process, Inputs
American Red Cross: Program Goal, Project Impact, Outcomes, Outputs, Activities, Inputs
AusAID: Scheme Goal, Major Development Objectives, Outputs, Activities, Inputs
CARE logframe: Program Goal, Project Final Goal, Intermediate Objectives, Outputs, Activities, Inputs
CARE terminology: Program Impact, Project Impact, Effects, Outputs, Activities, Inputs
CIDA + GTZ: Overall goal, Project purpose, Results/Outputs, Activities, Inputs
CRS Proframe: Goal, Strategic Objective, Intermediate Results, Outputs, Activities, Inputs
DANIDA + DfID: Goal, Purpose, Outputs, Activities
EIDHR: Overall Objectives, Specific Objective, Expected Results, Activities
European Union: Overall Objective, Project Purpose, Results, Activities
FAO + UNDP + NORAD: Development Objective, Immediate Objectives, Outputs, Activities, Inputs
PC/LogFrame: Goal, Purpose, Outputs, Activities
Peace Corps: Purpose, Goals, Results, Objectives, Activities, Volunteers
SAVE – Results Framework: Goal, Strategic Objective, Intermediate Results, Outputs, Activities, Inputs
UNHCR: Sector Objective, Goal, Project Objective, Outputs, Activities, Input/Resources
USAID LogFrame: Final Goal, Strategic Objective, Intermediate Results, Activities, Inputs
USAID Results Framework: Goal, Strategic Objective, Intermediate Results, (Outputs), (Activities), (Inputs)
World Bank: Long-term Objectives, Short-term Objectives, Outputs, Inputs
World Vision International: Program Goal, Project Goal, Outcomes, Outputs, Activities, (Inputs)

Page 28: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

*3. Alternative Counterfactuals

Page 29: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Attribution and counterfactuals

How do we know if the observed changes in the project participants or communities
• income, health, attitudes, school attendance, etc.
are due to the implementation of the project
• credit, water supply, transport vouchers, school construction, etc.
or to other unrelated factors?
• changes in the economy, demographic movements, other development programs, etc.

Page 30: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.


The Counterfactual

What change would have occurred in the relevant condition of the target population if there had been no intervention by this project?

Page 31: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.


Control group and comparison group

Control group = randomized allocation of subjects to project and non-treatment group

Comparison group = separate procedure for sampling project and non-treatment groups that are as similar as possible in all aspects except the treatment (intervention)
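Where randomization is not possible, a comparison group has to be assembled by some matching procedure. The sketch below shows one simple possibility, greedy nearest-neighbour matching on a single baseline characteristic. It is an illustrative assumption rather than a procedure given in this presentation, and the function name, field names, and records are hypothetical.

```python
def match_comparison_group(participants, candidates, characteristic):
    """Pair each participant with the most similar unused non-participant."""
    available = list(candidates)   # copy so the caller's list is not modified
    pairs = []
    for person in participants:
        if not available:
            break                  # ran out of non-participants to match
        closest = min(
            available,
            key=lambda c: abs(c[characteristic] - person[characteristic]),
        )
        available.remove(closest)  # match without replacement
        pairs.append((person, closest))
    return pairs


# Hypothetical records, for illustration only.
participants = [
    {"id": "P-a", "baseline_income": 100},
    {"id": "P-b", "baseline_income": 250},
]
candidates = [
    {"id": "N-1", "baseline_income": 240},
    {"id": "N-2", "baseline_income": 90},
    {"id": "N-3", "baseline_income": 400},
]
pairs = match_comparison_group(participants, candidates, "baseline_income")
for person, match in pairs:
    print(person["id"], "matched with", match["id"])
```

Matching on one observed characteristic leaves unobserved differences between the groups untouched, which is the key limitation of a comparison group relative to a randomized control group.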

Page 32: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Some recent developments in impact evaluation in international development

J-PAL is best understood as a network of affiliated researchers … united by their use of the randomized trial methodology…

[Figure labels: 2003, 2010, 2008, 2006, 2009]

Page 33: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

So, are Randomized Control Trials (RCTs) the Gold Standard, and should they be used in most if not all program impact evaluations?

Yes or no?

Why or why not?

If so, under what circumstances should they be used?

If not, under what circumstances would they not be appropriate?

Page 34: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Evidence-based policy for simple interventions (or simple aspects): when RCTs may be appropriate

Question needed for evidence-based policy: What works?

What interventions look like: Discrete, standardized intervention

How interventions work: Pretty much the same everywhere

Process needed for evidence uptake: Knowledge transfer

Adapted from Patricia Rogers, RMIT University

Page 35: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

When might rigorous evaluations of higher-level “impact” indicators require much more than a simple RCT?

• Complicated, complex programs where there are multiple interventions by multiple actors

• Projects working in evolving contexts (e.g. countries in transition, conflicts, natural disasters)

• Projects with multiple layered logic models, or unclear cause-effect relationships between outputs and higher level “vision statements” (as is often the case in the real world of international development projects)

Page 36: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

There are other methods for assessing the counterfactual

• Reliable secondary data that depicts relevant trends in the population

• Longitudinal monitoring data (if it includes non-reached population)

• Qualitative methods to obtain perspectives of key informants, participants, neighbors, etc.

Page 37: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

There are situations in which a statistical counterfactual is not appropriate – even when budget and time are not constraints

A conventional statistical counterfactual (with random selection into treatment and control groups) is often not possible/appropriate:

• When conducting the evaluation of complex interventions

• When the project involves a number of interventions which may be used in different combinations in different locations

• When each project location is affected by a different set of contextual factors

• When it is not possible to use standard implementation procedures for all project locations

• When many outcomes involve complex behavioral changes

• When many outcomes are multidimensional or difficult to measure through standardized quantitative indicators.

Page 38: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Some of the alternative approaches for constructing a counterfactual

A: Theory based approaches
1. Program theory / logic models
2. Realistic evaluation
3. Process tracing
4. Venn diagrams and many other PRA methods
5. Historical methods
6. Forensic detective work
7. Compilation of a list of plausible alternative causes
8. …

(for more details see www.RealWorldEvaluation.org)

Page 39: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Some of the alternative approaches for constructing a counterfactual

B: Quantitatively oriented approaches
1. Pipeline design
2. Natural variations
3. Creative uses of secondary data
4. Creative creation of comparison groups
5. Comparison with other programs
6. Comparing different types of interventions
7. Cohort analysis
8. …

(for more details see www.RealWorldEvaluation.org)

Page 40: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Some of the alternative approaches for constructing a counterfactual

C: Qualitatively oriented approaches
1. Concept mapping
2. Creative use of secondary data
3. Many PRA techniques
4. Process tracing
5. Compiling a book of possible causes
6. Comparisons between different projects
7. Comparisons among project locations with different combinations and levels of treatment

(for more details see www.RealWorldEvaluation.org)

Page 41: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

*4. Context

Page 42: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Different lenses needed for different situations in the RealWorld

Simple: Following a recipe
• Recipes are tested to assure easy replication
• The best recipes give good results every time

Complicated: Sending a rocket to the moon
• Sending one rocket to the moon increases assurance that the next will also be a success
• There is a high degree of certainty of outcome

Complex: Raising a child
• Raising one child provides experience but is no guarantee of success with the next
• Uncertainty of outcome remains

Sources: Westley et al (2006) and Stacey (2007), cited in Patton 2008; also presented by Patricia Rogers at Cairo impact conference 2009.

Page 43: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

What’s a conscientious evaluator to do when facing such a complex world?

Page 44: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

DESIRED IMPACT
    OUTCOME 1
    OUTCOME 2
        OUTPUT 2.1
        OUTPUT 2.2
            Intervention 2.2.1
            Intervention 2.2.2
            Intervention 2.2.3
        OUTPUT 2.3
    OUTCOME 3

Consequences, Consequences, Consequences

A Simple RCT

A more comprehensive design

Page 45: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Expanding the results chain for multi-donor, multi-component program

Levels: Inputs, Outputs, Intermediate outcomes, Impacts

Donor

Government

Other donors

Credit for small farmers

Rural roads

Schools

Health services

Increased rural H/H income

Increased production

Increased school enrolment

Increased use of health services

Access to off-farm employment

Improved education performance

Improved health

Increased political participation

Attribution gets very difficult! Consider plausible contributions each makes.

Page 46: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

*5. Evaluation Implementation

Page 47: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Definition of impact evaluation

OECD-DAC (2002: 24) defines impact as “the positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended. These effects can be economic, sociocultural, institutional, environmental, technological or of other types”.

Is it limited to direct attribution? Or does it point to the need for counterfactuals or Randomized Control Trials (RCTs)?

Page 48: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

So what should be included in a “rigorous impact evaluation”?

1. Direct cause-effect relationship between one output (or a very limited number of outputs) and an outcome that can be measured by the end of the research project? Pretty clear attribution.

… OR …

2. Changes in higher-level indicators of sustainable improvement in the quality of life of people, e.g. the MDGs (Millennium Development Goals)? More significant. But assessing plausible contribution is more feasible than assessing unique direct attribution.

Page 49: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Rigorous impact evaluation should include (but is not limited to):

1) thorough consultation with and involvement by a variety of stakeholders,

2) articulating a comprehensive logic model that includes relevant external influences,

3) getting agreement on desirable ‘impact level’ goals and indicators,

4) adapting evaluation design as well as data collection and analysis methodologies to respond to the questions being asked, …

Page 50: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Rigorous impact evaluation should include (but is not limited to):

5) adequately monitoring and documenting the process throughout the life of the program being evaluated,

6) using an appropriate combination of methods to triangulate evidence being collected,

7) being sufficiently flexible to account for evolving contexts, …

Page 51: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Rigorous impact evaluation should include (but is not limited to):

8) using a variety of ways to determine the counterfactual,

9) estimating the potential sustainability of whatever changes have been observed,

10) communicating the findings to different audiences in useful ways,

11) etc. …

Page 52: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

The point is that the list of what’s required for ‘rigorous’ impact evaluation goes way beyond initial randomization into treatment and ‘control’ groups.

Page 53: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

To attempt to conduct an impact evaluation of a program using only one pre-determined tool is to suffer from myopia, which is unfortunate. On the other hand, to prescribe to donors and senior managers of major agencies that there is a single preferred design and method for conducting all impact evaluations can have, and has had, unfortunate consequences for all of those who are involved in the design, implementation and evaluation of international development programs.

Page 54: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

We must be careful that in using the “Gold Standard” we do not violate the “Golden Rule”:

“Judge not that you not be judged!”

In other words: “Evaluate others as you would have them evaluate you.”

Page 55: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

Caution: Too often what is called Impact Evaluation is based on a “we will examine and judge you” paradigm. When we want our own programs evaluated we prefer a more holistic approach.

Page 56: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

To use the language of the OECD/DAC, let’s be sure our evaluations are consistent with these criteria:

RELEVANCE: The extent to which the aid activity is suited to the priorities and policies of the target group, recipient and donor.

EFFECTIVENESS: The extent to which an aid activity attains its objectives.

EFFICIENCY: Efficiency measures the outputs – qualitative and quantitative – in relation to the inputs.

IMPACT: The positive and negative changes produced by a development intervention, directly or indirectly, intended or unintended.

SUSTAINABILITY is concerned with measuring whether the benefits of an activity are likely to continue after donor funding has been withdrawn. Projects need to be environmentally as well as financially sustainable.

Page 57: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.

The bottom line is defined by this question: Are our programs making plausible contributions towards positive impact on the quality of life of our intended beneficiaries? Let’s not forget them!

Page 58: Presented by Jim Rugh to NONIE Conference in Paris 28 March 2011.


Thank you!
