Indicators for Malaria Impact Evaluation (Impact Evaluation Team)
Impact Evaluation Overview
Transcript of Impact Evaluation Overview
What is Impact & How to Measure It
Philip Jakob, January 12, 2017
The Burden of Proof
Evaluation in Development
There is a growing awareness that robust program evaluation is essential for organizations to optimize their impact
• COMMON GOAL: To effectively execute programming that is impactful for beneficiaries
[Diagram: the Results Based Management cycle: Planning & Design → Implementation & Monitoring → Evaluation → Assessment & Learning]
Results Based Management (RBM)
• Results Based Management has two essential evaluation components:
1. Program Monitoring is a continuous process of collecting data on operations
• Informs implementation and program management on effectiveness and accountability
• Compares how well a project or policy strategy is performing against initial design
2. Impact Evaluation is the periodic assessment of the causal effect of a project, program or policy on beneficiary outcomes
• Estimates the change in outcomes attributable to the intervention
[Diagram: results chain: Project Inputs → Project Activities → Project Outputs → Project Beneficiaries → Behavioral Changes → Direct Outcomes → Impact. Monitoring covers the input-to-output stages; Evaluation covers the outcome and impact stages.]
Results Based Management is the empirical validation of a program’s Theory of Change (ToC), built upon the Project Impact Pathway (PIP) or logic model
• For a guide on how to build a PIP see: the-project-impact-pathway (presentation)
Results Based Management (RBM)
Example: Evaluation of a Water, Sanitation and Hygiene (WASH) Project
Causal chain, with the causal assumptions underlying each step:
1. WASH project implementation
• Materials produced
• Outreach channels established
2. Exposure to handwashing (HW) with soap promotions
• Beneficiaries reached
• Materials are deemed appropriate
3. Changes in beliefs, knowledge and availability
• Materials understood
• Knowledge gained
• Attitudes influenced
4. Improved HW behavior among mothers and caretakers
• Beneficiaries have access to HW facilities
• Beneficiaries want to improve child health
5. Improved children’s health
• Disease burden diminished
• Morbidity rate diminished
SCOPE: Program Monitoring covers the early steps of the chain; Impact Evaluation covers the later outcome and impact steps.
Example: Evaluation of a WASH Project (continued)
Evaluation methods for each step of the causal chain:
1. WASH project implementation: materials produced, personnel employed, resources disbursed
2. Exposure to HW with soap promotions: participation rate, media access rate, materials uptake
3. Changes in beliefs, knowledge and availability: participant survey, willingness to pay, purchase records
4. Improved HW behavior among mothers and caretakers: observed behavior, household survey, physical tests
5. Improved children’s health: medical statistics, anthropometry, reported wellbeing
SCOPE: Program Monitoring covers the early steps; Impact Evaluation covers the later ones.
Where’s the Impact?
Given the complex nature of measuring impact for interventions in the real world, significant effort must be made in designing the Data Generating Process (DGP) of an evaluation
• What do we mean when we talk about the effect of a program?
A. The difference in outcomes between people who participate in the program and those who don’t
• Observed effect: many informal evaluations focus on this
B. What happens to someone after she participates in the program?
• The Average Treatment on the Treated (ATT or TOT) effect
C. The difference between what happened to the person who participated in the program and what would have happened to that same person if she hadn’t participated in the program?
• The true Average Treatment Effect (ATE)
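The gap between the observed effect (A) and the true ATE (C) can be made concrete with a small simulation. All numbers below are hypothetical and the motivation-driven selection mechanism is an illustrative assumption, not part of the presentation:

```python
import random
import statistics

random.seed(0)

# Hypothetical population: "motivation" raises both the chance of joining
# the program and the outcome a person would have had anyway.
people = []
for _ in range(10_000):
    motivation = random.random()
    y0 = 10 * motivation + random.gauss(0, 1)  # outcome without the program
    y1 = y0 + 2.0                              # outcome with the program (true effect = 2)
    joins = random.random() < motivation       # motivated people self-select in
    people.append((joins, y0, y1))

# A. Observed effect: participants vs non-participants
naive = (statistics.mean(y1 for j, _, y1 in people if j)
         - statistics.mean(y0 for j, y0, _ in people if not j))

# C. True ATE: mean of (y1 - y0), knowable only because we simulated both outcomes
true_ate = statistics.mean(y1 - y0 for _, y0, y1 in people)

print(f"naive difference: {naive:.2f}")  # inflated well above 2 by self-selection
print(f"true ATE: {true_ate:.2f}")       # 2.00 by construction
```

The naive comparison bundles the program’s effect with the pre-existing motivation gap between joiners and non-joiners; only the counterfactual contrast recovers the effect of 2.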
Example 1: Microfinance
A. Difference in outcomes between people who participate in the program and those who don’t: micro-borrowers may be more highly motivated than others
B. What happens to someone after she participates in the program: change in a borrower’s outcomes determined by outside factors that caused her to borrow as well as the effect of microfinance
C. Difference between what happened and what would have happened: the difference between what happened to a borrower’s business (family, health, etc.) and what would have happened if microfinance were not available to them
Example 2: Improved Wood-burning Stoves
A. Difference in outcomes between people who participate in the program and those who don’t: those who adopt high-tech stoves may be more concerned with good health than those who don’t
B. What happens to someone after she participates in the program: the desire to have an improved woodstove could be triggered by someone in the family having become sick, and sick people usually get better
C. Difference between what happened and what would have happened: the difference between a family’s respiratory health after adopting the stove compared to what their health would have been if the stove were not available to them
The Role of the Counterfactual Control Group
• Concept of the counterfactual: Program performance is relative to what/whom?
• At every stage of assessment we need a valid reference for comparison
• Without a counterfactual it is easy to draw false conclusions or misrepresent impact
[Chart: expected growth trend based on historical and control group data]
The Importance of a Valid Counterfactual
Without a legitimate counterfactual, most impact evaluations lose credibility
• Participant-to-non-participant comparisons
• e.g. comparing student performance in private schools with kids in public schools; because of self-selection, outcomes are likely to be different anyway
• “Before-and-after” studies
• e.g. income of microfinance loan recipients before and after taking loans from an MFI; microfinance borrowers take loans when they have investment opportunities, so the majority of the apparent impact of microfinance is actually an illusion
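The before-and-after illusion can be sketched with a toy simulation (all numbers hypothetical): if loans are taken exactly when an income-raising opportunity appears, the pre/post difference bundles the opportunity with the loan.

```python
import random
import statistics

random.seed(1)

# Hypothetical borrowers: each takes a loan only when a business opportunity
# appears, and the opportunity raises income on its own.
OPPORTUNITY = 20  # income gain from the opportunity itself
LOAN_EFFECT = 5   # the true effect of the loan

before, after = [], []
for _ in range(5_000):
    base = random.gauss(100, 10)
    before.append(base)
    after.append(base + OPPORTUNITY + LOAN_EFFECT + random.gauss(0, 2))

apparent = statistics.mean(after) - statistics.mean(before)
print(f"apparent 'impact' of the loan: {apparent:.1f}")  # ~25, though the loan adds only 5
```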
Example: Evaluation of a WASH Project (continued)
Counterfactual used at each step of the causal chain:
1. WASH project implementation: internal performance
2. Exposure to HW with soap promotions: program pre/post
3. Changes in beliefs, knowledge and availability: beneficiaries pre/post vs. non-beneficiaries
4. Improved HW behavior among mothers and caretakers: randomized non-beneficiaries
5. Improved children’s health: randomized non-beneficiaries
SCOPE: Program Monitoring covers the early steps; Impact Evaluation covers the later ones.
Impact Evaluations Provide Critical Data
While monitoring data is essential to the efficient implementation of programs (resources used, goods & services produced, reach and reaction), only Impact Evaluation can answer questions about effectiveness:
• Determine if a program had impact, by measuring the causal effect between an intervention and an outcome of interest
• Estimate the level of impact
• Compare real impact with the expected impact at the time of designing the intervention
• Determine adequate intensity of intervention
• Compare differential impact among geographical areas, communities, or interventions
• What is the effect of different sub-components of a program on specific outcomes?
• What is the right level of subsidy for a service?
• How would outcomes be different if the program design changed?
• Is the program cost-effective?
Good Impact Data Feeds Robust Statistical Analysis
Correlation Does NOT Imply Causation
Even with a valid counterfactual, evaluators must ensure that they are drawing conclusions based on causal inference
• Causal Inference: evaluating whether a change in one variable (x) will lead to a change in another variable (y), assuming that nothing else changes (ceteris paribus)
Statistical tools can tell us a lot about how two variables covary, but this can lead to false conclusions
• Correlation does NOT imply causation
• To get to causal inference we generally need to know how the problem works in real life
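A minimal illustration of covariation without causation (hypothetical data): a hidden third variable drives both x and y, so they correlate strongly even though intervening on x would not move y.

```python
import random
import statistics

random.seed(2)

# A hidden confounder z drives both x and y; neither causes the other.
xs, ys = [], []
for _ in range(10_000):
    z = random.gauss(0, 1)
    xs.append(z + random.gauss(0, 0.5))
    ys.append(z + random.gauss(0, 0.5))

mx, my = statistics.mean(xs), statistics.mean(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
corr = cov / (statistics.pstdev(xs) * statistics.pstdev(ys))
print(f"correlation(x, y): {corr:.2f}")  # strong, yet x has no causal effect on y
```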
The Endogeneity Problem
The challenge in defining causality in impact evaluations is that many factors in development are endogenous, and not always observable
• Endogenous: originating from inside the system; in the case of evaluations this typically means a factor that is co-influential, or a possible third variable that affects both (e.g. aspirations)
• Education and earnings
• Voluntary participation and ambition
• Prices of substitute or complementary goods
• Exogenous: originating outside the system
• Interpreting an endogenous relationship as exogenous means risking interpreting a system with reverse causality as strictly causal
Evaluations that imply a causal relationship without accounting for endogeneity lack internal and external validity
• There is a high probability that if the intervention were tested again it would produce different outcomes
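The education-and-earnings case can be sketched numerically (all coefficients hypothetical): unobserved ability raises both schooling and earnings, so a simple OLS slope of earnings on schooling overstates the return to schooling.

```python
import random
import statistics

random.seed(3)

TRUE_RETURN = 1000  # true earnings gain per year of schooling (hypothetical)

school, earn = [], []
for _ in range(10_000):
    ability = random.gauss(0, 1)               # unobserved confounder
    s = 12 + 2 * ability + random.gauss(0, 1)  # ability raises schooling
    e = TRUE_RETURN * s + 5000 * ability + random.gauss(0, 500)
    school.append(s)
    earn.append(e)

# OLS slope of earnings on schooling, ignoring ability
ms = statistics.mean(school)
me = statistics.mean(earn)
slope = (sum((s - ms) * (e - me) for s, e in zip(school, earn))
         / sum((s - ms) ** 2 for s in school))
print(f"OLS slope: {slope:.0f}")  # far above the true 1000: endogeneity bias
```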
Validity Through Randomization
Randomization allows an evaluator to eliminate the possibility that they are arguing for a causal, exogenous interpretation of an endogenous relationship
• Randomized Control Trials (RCTs) assign treatment through a lottery or another random process
• Generates two statistically identical groups
• The only difference is the treatment
[Diagram: random sampling and random assignment: randomly sample from the area of interest, randomly assign to treatment and control, then randomly sample from both treatment and control]
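A minimal RCT sketch (hypothetical numbers): even when motivation strongly drives outcomes, lottery assignment balances it across groups, so a simple difference in means recovers the true effect.

```python
import random
import statistics

random.seed(4)

# Hypothetical population where motivation strongly drives outcomes,
# but treatment is assigned by coin flip rather than self-selection.
treated, control = [], []
for _ in range(10_000):
    motivation = random.random()
    y0 = 10 * motivation + random.gauss(0, 1)
    y1 = y0 + 2.0                     # true treatment effect = 2
    if random.random() < 0.5:         # lottery assignment
        treated.append(y1)
    else:
        control.append(y0)

estimate = statistics.mean(treated) - statistics.mean(control)
print(f"RCT estimate: {estimate:.2f}")  # close to the true effect of 2
```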
Why Run Randomized Evaluations?
1. For programming:
• Gives all eligible beneficiaries the same probability of receiving the intervention
• Oversubscription: # eligible > available resources
• Selection criteria are ethical, quantitative, fair and transparent
2. For analysis:
• To ensure that the evaluation is measuring a causal relationship
• So that one can employ straightforward statistical analysis that is unbiased (OLS)
• Allows costs and benefits to be more accurately quantified
3. For donors:
• Produces the most accurate counterfactual, making evaluation intuitive to all stakeholders
• So that innovative programs can be piloted and the most impactful scaled with confidence
• Ensures that programs are constantly working to optimize their outcomes
Why Run Randomized Evaluations?
Randomized evaluations allow evaluators and managers to measure outcomes and improve programs while minimizing the need to test behavioral assumptions
Randomized Evaluations Are Not Always Appropriate
Running randomized evaluations requires significant time and resources that may not be justified for some programs, especially:
• When the program is premature and still requires considerable “tinkering” to work well
• When the project is on too small a scale to randomize into two “representative groups”
• If a positive impact has been proven using rigorous methodology and resources are sufficient to cover everyone
• After the program has already begun and it is not expanding elsewhere
• In emergency situations where ethical considerations suggest that acting to relieve suffering is the immediate priority
Alternative Evaluation Methods
While randomized evaluation is the gold standard, there are equally valid “quasi-experimental” methods that can be used:
• Natural experiments that account for “as if random” program participation across individuals
• e.g. political boundaries, exogenous shocks
• Regression Discontinuity: comparing individuals just above and below an eligibility threshold
• e.g. idiosyncratic program prerequisites
• Difference in Difference: comparing beneficiaries with themselves and other similar groups over time
• Statistical Matching: comparing beneficiaries to individuals with similar observable traits
These methods often require making fundamental assumptions and involve more sophisticated statistical analysis, which can undermine results
• For an overview of methodologies see: J-PAL impact-evaluation-methods (pdf)
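As one illustration, the difference-in-differences idea can be sketched in a few lines (hypothetical numbers): the treated group may differ at baseline and both groups share a time trend, yet differencing twice isolates the program effect, assuming the common-trend assumption holds.

```python
import random
import statistics

random.seed(5)

TREND = 10   # growth both groups experience over time (hypothetical)
EFFECT = 3   # true program effect, received only by the treated group

def pre_post_means(treated: bool, baseline: float):
    """Simulate one group's average outcome before and after the program period."""
    pre, post = [], []
    for _ in range(5_000):
        base = random.gauss(baseline, 5)
        pre.append(base)
        post.append(base + TREND + (EFFECT if treated else 0) + random.gauss(0, 2))
    return statistics.mean(pre), statistics.mean(post)

t_pre, t_post = pre_post_means(treated=True, baseline=50)   # treated start higher
c_pre, c_post = pre_post_means(treated=False, baseline=40)

did = (t_post - t_pre) - (c_post - c_pre)
print(f"DiD estimate: {did:.1f}")  # close to 3 despite the baseline gap and trend
```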