The Goldilocks Challenge: learning to structure a ‘right-fit’ and effective monitoring and evaluation system

Dean Karlan
Professor of Economics, Yale University
President, Innovations for Poverty Action
The Goldilocks Problem
Overview
• Where to begin?
  – Theory of change
• What to evaluate?
  – Monitoring vs. impact evaluation
  – What is monitoring?
  – When is an impact evaluation appropriate?
  – When is an impact evaluation not appropriate?
• How to measure?
  – The core problem
  – Framework for ‘right-fit’ M&E systems
    • Credible (accurate data, appropriate analysis)
    • Actionable
    • Responsible
    • Transportable
Theory of Change
A theory of change explains the “why” of a program by telling a sequential story:
• what goes in,
• what gets done,
• what comes out, and
• how the world thus (hopefully) changes for the better
Clearly articulating this helps organizations design sound programs and lays the foundation for right-fit data collection.
Theory of change
Example from the Deworm the World Initiative
INPUT: Staff time, deworming tablets
↓
ACTIVITY: Trainings, deworming days
↓
OUTPUT: Number of students dewormed
↓
OUTCOME: Decreased worm prevalence
↓
IMPACT: Higher incomes
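The chain above can be written down as a tiny data structure, which makes it easier to attach indicators to each stage later. A minimal sketch (the stage names and representation are our own, not from the talk):

```python
# Theory of change as an ordered stage -> description mapping,
# using the Deworm the World example from the slide.
# (Python 3.7+ dicts preserve insertion order.)
theory_of_change = {
    "input":    "Staff time, deworming tablets",
    "activity": "Trainings, deworming days",
    "output":   "Number of students dewormed",
    "outcome":  "Decreased worm prevalence",
    "impact":   "Higher incomes",
}

for stage, description in theory_of_change.items():
    print(f"{stage.upper():>8}: {description}")
```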
Monitoring vs. Impact Evaluation
Monitoring asks: What did the program/policy/business use, do, and produce? (inputs, activities, outputs)

Impact evaluation asks: How have lives changed compared to how they would have changed had the program/policy/business not happened? (outcomes, impact)
A new program starts with tinkering to figure out implementation; once we know what we’re doing, monitoring takes over.
What is monitoring?
• Collecting and analyzing high-quality, actionable data on program implementation
• Monitoring can help:
  – Internal learning: answer questions related to program progress and program improvement
  – External reporting: demonstrate accountability and transparency
Once we know what we’re doing, ask: Is there evidence?
• Yes → Keep on doing that!
• No → Can you generate it?
  – Yes → Monitoring AND impact evaluation!
  – No → Stick with monitoring while tinkering to figure out implementation.
When is an impact evaluation appropriate?
Design issues:
• Sample size is sufficient
  – Often the deal breaker
  – Note that this means enough separable units
• Timing is right
  – Not yet implemented
  – Some path for expansion
• Knowledge gap worth filling
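To see why sample size is often the deal breaker, a back-of-the-envelope power calculation helps. The sketch below uses the standard normal-approximation formula n ≈ 2(z₁₋α/₂ + z₁₋β)² / MDE² for a two-arm comparison of means; the effect sizes are illustrative assumptions, not figures from the talk.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(mde_sd, alpha=0.05, power=0.80):
    """Approximate units needed per arm to detect an effect of
    mde_sd standard deviations, at the given significance and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / mde_sd ** 2)

# A small effect (0.2 SD) needs roughly 400 units per arm;
# a large effect (0.5 SD) needs far fewer.
print(sample_size_per_arm(0.2))  # 393
print(sample_size_per_arm(0.5))  # 63
```

Note that the units here are *separable* units: if treatment is assigned at the village level, n counts villages, not people.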
When is an impact evaluation appropriate?
Practical issues:
• There is budget and capacity to do it well
  – Quasi-experimental methods often cost just as much as RCTs: randomizing is not expensive, surveying is
  – Field work is done well and results are disseminated
• The partner cares deeply about the answer to the research question and will use the results

Proof of concept:
• The intervention has been implemented before and the process refined
  – but the idea needs to be validated
  – and there is a clear return on investment
When is an impact evaluation not appropriate?
Design issues:
• Not enough separable units
  – Sample size is too small
  – Macro policy (monetary policy, trade, etc.)
• Spillovers rampant and unmeasurable
• No resource constraint that can be randomized well
  – Example: refugee camp

Ethical issues:
• When we know the answer already
  – We don’t keep running RCTs on vaccines: they work, and continuing to evaluate is unethical
How to measure? The Core Problem
• Too little data
• Too much data
• Wrong data
From which set would you choose?
[Photos: a display of 24 jams vs. a display of 6 jams]
Jams: The results

Sheena Iyengar and Mark Lepper. “When Choice is Demotivating: Can One Desire Too Much of a Good Thing?” Journal of Personality and Social Psychology, 2000, Vol. 79, No. 6, 995-1006

[Bar chart: number of people who stopped at the booth, 24-jam display vs. 6-jam display; more shoppers stopped at the 24-jam display]
[Bar chart: for each display, the number of people who stopped at the booth and the number who purchased; far more purchases came from the 6-jam display]
Finding the ‘right-fit’
[Graph: usefulness plotted against amount of data; usefulness peaks at a moderate amount of data, the ‘right fit’, and falls off with too little or too much]
Framework for ‘right-fit’ M&E Systems
CART principles:
• Credible: Collect accurate, high-quality data and analyze them appropriately
• Actionable: Commit to act on the data you collect
• Responsible: Ensure the benefits of data collection outweigh the costs
• Transportable: Collect data that generate knowledge for other programs
Credible
• Data must accurately measure what they are supposed to measure
• Appropriate analysis must be conducted in a credible way

Bad quality data and data analyzed badly are similar to snake oil: worse than doing nothing at all!
Accurate Data

Validity: is it what you were trying to measure? To be valid, data should capture the essence of what one is seeking to measure.

Reliability: can you trust the data that are collected? Reliability implies that the same data collection procedure will produce the same data repeatedly.
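One simple way to probe reliability is a test-retest check: administer the same instrument twice to the same respondents and see how closely the answers agree. A minimal sketch with invented survey responses:

```python
# Test-retest reliability check (invented data): the same 8 respondents
# answer the same question in two survey rounds. High agreement
# suggests the data collection procedure is reliable.
round_1 = [3, 5, 4, 2, 5, 1, 4, 3]
round_2 = [3, 5, 4, 2, 4, 1, 4, 3]   # one respondent answers differently

agreement = sum(a == b for a, b in zip(round_1, round_2)) / len(round_1)
print(agreement)  # 0.875
```

In practice a correlation or more formal statistic (e.g. Cohen's kappa) would be used, but the idea is the same: the same procedure should produce the same data.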
Appropriate Analysis
• Impact implies causality; it tells us how an organization has changed the world around it
• That means comparing what happened to what would have happened in the absence of the program:

  what happened with the program
  - what would have happened without the program
  = IMPACT of the program

• To measure impact it is necessary to find a way to credibly estimate the counterfactual, i.e. how program participants would have fared if the program had not occurred
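As a worked example of the subtraction above, with invented numbers:

```python
# Invented numbers for illustration: suppose participants average 62
# on a test after the program, and we credibly estimate they would
# have averaged 56 without it (the counterfactual).
outcome_with_program = 62.0
counterfactual_without_program = 56.0

impact = outcome_with_program - counterfactual_without_program
print(impact)  # 6.0
```

The hard part is never the subtraction; it is producing a credible estimate of the counterfactual term.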
Appropriate Analysis: Impact Evaluation
Appropriate Analysis: Constructing the counterfactual
• The counterfactual is often constructed by selecting a group not affected by the program
• Non-randomized: argue that a certain excluded group mimics the counterfactual
• Randomized: use random assignment of the program to create a control group which mimics the counterfactual
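The randomized approach can be sketched in a few lines (an invented setup, not code from the talk): shuffle the eligible units, split them in half, and the control half's average outcome estimates the counterfactual.

```python
# Random assignment: split a sample into treatment and control groups
# so the control group's average outcome serves as an estimate of
# the counterfactual. Units and sizes are illustrative.
import random

random.seed(0)
units = list(range(1000))          # e.g. 1,000 eligible students
random.shuffle(units)
treatment = set(units[:500])       # first half receives the program
control = set(units[500:])         # second half mimics the counterfactual

print(len(treatment), len(control))  # 500 500
```

Because assignment is random, the two groups are comparable on average in both observed and unobserved characteristics, which is what non-randomized designs must argue for rather than guarantee.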
Appropriate Analysis: Impact of a Remedial Education Program (Balsakhi)

Method                      Impact Estimate
Pre-post                     26.42*
Simple difference            -5.05*
Difference-in-difference      6.82*
Regression                    1.92
Randomized experiment        5.87*

*: statistically significant at the 5% level
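A small simulation with made-up numbers (not the Balsakhi data) shows why a subset of these estimators can disagree so sharply: if the program targets weaker students and scores rise over time anyway, pre-post picks up the time trend and simple difference picks up the selection, while differencing out both recovers the true effect.

```python
# Illustrative data-generating process: true impact of 5 points,
# a secular trend of +10 points, and negative selection of -8 points
# (the program targets students with weaker baseline scores).
import random

random.seed(1)
true_impact = 5.0
trend = 10.0
selection = -8.0

part_pre  = [50 + selection + random.gauss(0, 5) for _ in range(2000)]
part_post = [x + trend + true_impact + random.gauss(0, 5) for x in part_pre]
non_pre   = [50 + random.gauss(0, 5) for _ in range(2000)]
non_post  = [x + trend + random.gauss(0, 5) for x in non_pre]

mean = lambda xs: sum(xs) / len(xs)

pre_post = mean(part_post) - mean(part_pre)                 # picks up trend + impact (~15)
simple_diff = mean(part_post) - mean(non_post)              # picks up selection + impact (~-3)
diff_in_diff = pre_post - (mean(non_post) - mean(non_pre))  # recovers impact (~5)
print(round(pre_post, 1), round(simple_diff, 1), round(diff_in_diff, 1))
```

Difference-in-difference works here only because selection affects levels, not trends; when treated and comparison groups trend differently, it too is biased, which is the case for randomization.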
RCTs may be the gold standard for measuring impact, but they aren’t the right fit for every organization or every project.
Actionable
Organizations should ask three questions of each and every piece of data that they want to collect:
• Is there a specific action that we will take based on the findings?
• Do we have the resources necessary to implement that action?
• Do we have the commitment required to take that action?
Responsible
• The responsibility principle can help organizations assess tradeoffs in a number of different areas, including:
  – Data collection methods: Is there a cheaper or more efficient method of data collection that does not compromise quality?
  – Use of respondents’ time: Does the information to be gained justify taking a beneficiary’s time to answer?
  – Resource use: Is the total amount of spending on data collection justified, given the information it will provide, when compared to the amount spent on other areas of the organization (such as administrative and programmatic costs)?
Transportable
• Your analysis should help others too
• Sharing successes and failures
• External validity
  – The Balsakhi program helped inform the development of an education program in Ghana
Two line summary
Data: Have a plan! Preferably a good one.
Thank you!
Please refer to http://www.poverty-action.org/goldilocks for additional resources and case studies.