Indicators for Malaria Impact Evaluation (Impact Evaluation Team)
Impact Evaluation Overview
Transcript of Impact Evaluation Overview
What is Impact & How to Measure It
Philip Jakob, January 12, 2017
The Burden of Proof
Evaluation in Development
There is a growing awareness that robust program evaluation is essential for organizations to optimize their impact
• COMMON GOAL: To effectively execute programming that is impactful for beneficiaries
[Diagram: the Results Based Management cycle: Planning & Design → Implementation & Monitoring → Evaluation → Assessment & Learning]
Results Based Management (RBM)
• Results Based Management has two essential evaluation components:
1. Program Monitoring is a continuous process of collecting data on operations
• Informs implementation and program management on effectiveness and accountability
• Compares how well a project or policy strategy is performing against initial design
2. Impact Evaluation is the periodic assessment of the causal effect of a project, program or policy on beneficiary outcomes
• Estimates the change in outcomes attributable to the intervention
[Diagram: results chain: Project Inputs → Project Activities → Project Outputs → Project Beneficiaries → Behavioral Changes → Direct Outcomes → Impact. Monitoring covers the input-to-output stages; Evaluation covers the outcome and impact stages.]
Results Based Management is the empirical validation of a program’s Theory of Change (ToC), built upon the Project Impact Pathway (PIP) or logic model
• For a guide on how to build a PIP see: the-project-impact-pathway (presentation)
Results Based Management (RBM)
Example: Evaluation of a Water, Sanitation and Hygiene (WASH) Project
Causal chain, with the causal assumptions underlying each step:
1. WASH project implementation
• Materials produced
• Outreach channels established
2. Exposure to handwashing (HW) with soap promotions
• Beneficiaries reached
• Materials are deemed appropriate
3. Changes in beliefs, knowledge and availability
• Materials understood
• Knowledge gained
• Attitudes influenced
4. Improved HW behavior among mothers and caretakers
• Beneficiaries have access to HW facilities
• Beneficiaries want to improve child health
5. Improved children’s health
• Disease burden diminished
• Morbidity rate diminished
SCOPE: Program Monitoring covers the early steps of the chain; Impact Evaluation covers the later outcome and impact steps.
Example: Evaluation of a WASH Project (continued)
Evaluation methods for each step of the causal chain:
1. WASH project implementation: materials produced, personnel employed, resources disbursed
2. Exposure to HW with soap promotions: participation rate, media access rate, materials uptake
3. Changes in beliefs, knowledge and availability: participant survey, willingness to pay, purchase records
4. Improved HW behavior among mothers and caretakers: observed behavior, household survey, physical tests
5. Improved children’s health: medical statistics, anthropometry, reported wellbeing
SCOPE: Program Monitoring covers the early steps; Impact Evaluation covers the later ones.
Where’s the Impact?
Given the complex nature of measuring impact for interventions in the real world, significant effort must be made in designing the Data Generating Process (DGP) of an evaluation
• What do we mean when we talk about the effect of a program?
A. The difference in outcomes between people who participate in the program and those who don’t
• Observed effect: many informal evaluations focus on this
B. What happens to someone after she participates in the program?
• The Average Treatment on the Treated (ATT or TOT) effect
C. The difference between what happened to the person who participated in the program and what would have happened to that same person if she hadn’t participated in the program?
• The true Average Treatment Effect (ATE)
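The gap between the observed effect (A) and the true ATE (C) can be made concrete with a small simulation. All numbers below are hypothetical and the motivation-driven selection mechanism is an illustrative assumption, not part of the presentation:

```python
import random
import statistics

random.seed(0)

# Hypothetical population: "motivation" raises both the chance of joining
# the program and the outcome a person would have had anyway.
people = []
for _ in range(10_000):
    motivation = random.random()
    y0 = 10 * motivation + random.gauss(0, 1)  # outcome without the program
    y1 = y0 + 2.0                              # outcome with the program (true effect = 2)
    joins = random.random() < motivation       # motivated people self-select in
    people.append((joins, y0, y1))

# A. Observed effect: participants vs non-participants
naive = (statistics.mean(y1 for j, _, y1 in people if j)
         - statistics.mean(y0 for j, y0, _ in people if not j))

# C. True ATE: mean of (y1 - y0), knowable only because we simulated both outcomes
true_ate = statistics.mean(y1 - y0 for _, y0, y1 in people)

print(f"naive difference: {naive:.2f}")  # inflated well above 2 by self-selection
print(f"true ATE: {true_ate:.2f}")       # 2.00 by construction
```

The naive comparison bundles the program’s effect with the pre-existing motivation gap between joiners and non-joiners; only the counterfactual contrast recovers the effect of 2.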
Example 1: Microfinance
A. Difference in outcomes between people who participate in the program and those who don’t: micro-borrowers may be more highly motivated than others
B. What happens to someone after she participates in the program: change in a borrower’s outcomes determined by outside factors that caused her to borrow as well as the effect of microfinance
C. Difference between what happened and what would have happened: the difference between what happened to a borrower’s business (family, health, etc.) and what would have happened if microfinance were not available to them
Example 2: Improved Wood-burning Stoves
A. Difference in outcomes between people who participate in the program and those who don’t: those who adopt high-tech stoves may be more concerned with good health than those who don’t
B. What happens to someone after she participates in the program: the desire to have an improved woodstove could be triggered by someone in the family having become sick, and sick people usually get better
C. Difference between what happened and what would have happened: the difference between a family’s respiratory health after adopting the stove compared to what their health would have been if the stove were not available to them
The Role of the Counterfactual Control Group
• Concept of the counterfactual: Program performance is relative to what/whom?
• At every stage of assessment we need a valid reference for comparison
• Without a counterfactual it is easy to draw false conclusions or misrepresent impact
[Chart: expected growth trend based on historical and control group data]
The Importance of a Valid Counterfactual
Without a legitimate counterfactual, most impact evaluations lose credibility
• Participant-to-non-participant comparisons
• e.g. comparing student performance in private schools with kids in public schools; because of self-selection, outcomes are likely to be different anyway
• “Before-and-after” studies
• e.g. income of microfinance loan recipients before and after taking loans from an MFI; microfinance borrowers take loans when they have investment opportunities, so the majority of the apparent impact of microfinance is actually an illusion
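The before-and-after illusion can be sketched with a toy simulation (all numbers hypothetical): if loans are taken exactly when an income-raising opportunity appears, the pre/post difference bundles the opportunity with the loan.

```python
import random
import statistics

random.seed(1)

# Hypothetical borrowers: each takes a loan only when a business opportunity
# appears, and the opportunity raises income on its own.
OPPORTUNITY = 20  # income gain from the opportunity itself
LOAN_EFFECT = 5   # the true effect of the loan

before, after = [], []
for _ in range(5_000):
    base = random.gauss(100, 10)
    before.append(base)
    after.append(base + OPPORTUNITY + LOAN_EFFECT + random.gauss(0, 2))

apparent = statistics.mean(after) - statistics.mean(before)
print(f"apparent 'impact' of the loan: {apparent:.1f}")  # ~25, though the loan adds only 5
```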
Example: Evaluation of a WASH Project (continued)
Counterfactual used at each step of the causal chain:
1. WASH project implementation: internal performance
2. Exposure to HW with soap promotions: program pre/post
3. Changes in beliefs, knowledge and availability: beneficiaries pre/post vs. non-beneficiaries
4. Improved HW behavior among mothers and caretakers: randomized non-beneficiaries
5. Improved children’s health: randomized non-beneficiaries
SCOPE: Program Monitoring covers the early steps; Impact Evaluation covers the later ones.
Impact Evaluations Provide Critical Data
While monitoring data is essential to the efficient implementation of programs (resources used, goods & services produced, reach and reaction), only Impact Evaluation can answer questions about effectiveness:
• Determine if a program had impact, by measuring the causal effect between an intervention and an outcome of interest
• Estimate the level of impact
• Compare real impact with the expected impact at the time of designing the intervention
• Determine adequate intensity of intervention
• Compare differential impact among geographical areas, communities, or interventions
• What is the effect of different sub-components of a program on specific outcomes?
• What is the right level of subsidy for a service?
• How would outcomes be different if the program design changed?
• Is the program cost-effective?
Good Impact Data Feeds Robust Statistical Analysis
Correlation Does NOT Imply Causation
Even with a valid counterfactual, evaluators must ensure that they are drawing conclusions based on causal inference
• Causal Inference: evaluating whether a change in one variable (x) will lead to a change in another variable (y), assuming that nothing else changes (ceteris paribus)
Statistical tools can tell us a lot about how two variables covary, but this can lead to false conclusions
• Correlation does NOT imply causation
• To get to causal inference we generally need to know how the problem works in real life
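A minimal illustration of covariation without causation (hypothetical data): a hidden third variable drives both x and y, so they correlate strongly even though intervening on x would not move y.

```python
import random
import statistics

random.seed(2)

# A hidden confounder z drives both x and y; neither causes the other.
xs, ys = [], []
for _ in range(10_000):
    z = random.gauss(0, 1)
    xs.append(z + random.gauss(0, 0.5))
    ys.append(z + random.gauss(0, 0.5))

mx, my = statistics.mean(xs), statistics.mean(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
corr = cov / (statistics.pstdev(xs) * statistics.pstdev(ys))
print(f"correlation(x, y): {corr:.2f}")  # strong, yet x has no causal effect on y
```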
The Endogeneity Problem
The challenge in defining causality in impact evaluations is that many factors in development are endogenous, and not always observable
• Endogenous: originating from inside the system; in the case of evaluations this typically means a factor that is co-influential, or a possible third variable that affects both (e.g. aspirations)
• Education and earnings
• Voluntary participation and ambition
• Prices of substitute or complementary goods
• Exogenous: originating outside the system
• Interpreting an endogenous relationship as exogenous means risking interpreting a system with reverse causality as strictly causal
Evaluations that imply a causal relationship without accounting for endogeneity lack internal and external validity
• There is a high probability that if the intervention were tested again it would produce different outcomes
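The education-and-earnings case can be sketched numerically (all coefficients hypothetical): unobserved ability raises both schooling and earnings, so a simple OLS slope of earnings on schooling overstates the return to schooling.

```python
import random
import statistics

random.seed(3)

TRUE_RETURN = 1000  # true earnings gain per year of schooling (hypothetical)

school, earn = [], []
for _ in range(10_000):
    ability = random.gauss(0, 1)               # unobserved confounder
    s = 12 + 2 * ability + random.gauss(0, 1)  # ability raises schooling
    e = TRUE_RETURN * s + 5000 * ability + random.gauss(0, 500)
    school.append(s)
    earn.append(e)

# OLS slope of earnings on schooling, ignoring ability
ms = statistics.mean(school)
me = statistics.mean(earn)
slope = (sum((s - ms) * (e - me) for s, e in zip(school, earn))
         / sum((s - ms) ** 2 for s in school))
print(f"OLS slope: {slope:.0f}")  # far above the true 1000: endogeneity bias
```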
Validity Through Randomization
Randomization allows an evaluator to eliminate the possibility that they are arguing for a causal, exogenous interpretation of an endogenous relationship
• Randomized Control Trials (RCTs) assign treatment through a lottery or another random process
• Generates two statistically identical groups
• The only difference is the treatment
[Diagram: random sampling and random assignment: randomly sample from the area of interest, randomly assign to treatment and control, then randomly sample from both treatment and control]
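A minimal RCT sketch (hypothetical numbers): even when motivation strongly drives outcomes, lottery assignment balances it across groups, so a simple difference in means recovers the true effect.

```python
import random
import statistics

random.seed(4)

# Hypothetical population where motivation strongly drives outcomes,
# but treatment is assigned by coin flip rather than self-selection.
treated, control = [], []
for _ in range(10_000):
    motivation = random.random()
    y0 = 10 * motivation + random.gauss(0, 1)
    y1 = y0 + 2.0                     # true treatment effect = 2
    if random.random() < 0.5:         # lottery assignment
        treated.append(y1)
    else:
        control.append(y0)

estimate = statistics.mean(treated) - statistics.mean(control)
print(f"RCT estimate: {estimate:.2f}")  # close to the true effect of 2
```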
Why Run Randomized Evaluations?
1. For programming:
• Gives all eligible beneficiaries the same probability of receiving the intervention
• Oversubscription: # eligible > available resources
• Selection criteria are ethical, quantitative, fair and transparent
2. For analysis:
• To ensure that the evaluation is measuring a causal relationship
• So that one can employ straightforward statistical analysis that is unbiased (OLS)
• Allows costs and benefits to be more accurately quantified
3. For donors:
• Produces the most accurate counterfactual, making evaluation intuitive to all stakeholders
• So that innovative programs can be piloted and the most impactful scaled with confidence
• Ensures that programs are constantly working to optimize their outcomes
Why Run Randomized Evaluations?
Randomized evaluations allow evaluators and managers to measure outcomes and improve programs while minimizing the need to test behavioral assumptions
Randomized Evaluations Are Not Always Appropriate
Running randomized evaluations requires significant time and resources that may not be justified for some programs, especially:
• When the program is premature and still requires considerable “tinkering” to work well
• When the project is on too small a scale to randomize into two “representative groups”
• If a positive impact has been proven using rigorous methodology and resources are sufficient to cover everyone
• After the program has already begun and it is not expanding elsewhere
• In emergency situations where ethical considerations suggest that acting to relieve suffering is the immediate priority
Alternative Evaluation Methods
While randomized evaluation is the gold standard, there are equally valid “quasi-experimental” methods that can be used:
• Natural experiments that account for “as if random” program participation across individuals
• e.g. political boundaries, exogenous shocks
• Regression Discontinuity: comparing individuals just above and below an eligibility threshold
• e.g. idiosyncratic program prerequisites
• Difference in Difference: comparing beneficiaries with themselves and other similar groups over time
• Statistical Matching: comparing beneficiaries to individuals with similar observable traits
These methods often require making fundamental assumptions and involve more sophisticated statistical analysis, which can undermine results
• For an overview of methodologies see: J-PAL impact-evaluation-methods (pdf)
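As one illustration, the difference-in-differences idea can be sketched in a few lines (hypothetical numbers): the treated group may differ at baseline and both groups share a time trend, yet differencing twice isolates the program effect, assuming the common-trend assumption holds.

```python
import random
import statistics

random.seed(5)

TREND = 10   # growth both groups experience over time (hypothetical)
EFFECT = 3   # true program effect, received only by the treated group

def pre_post_means(treated: bool, baseline: float):
    """Simulate one group's average outcome before and after the program period."""
    pre, post = [], []
    for _ in range(5_000):
        base = random.gauss(baseline, 5)
        pre.append(base)
        post.append(base + TREND + (EFFECT if treated else 0) + random.gauss(0, 2))
    return statistics.mean(pre), statistics.mean(post)

t_pre, t_post = pre_post_means(treated=True, baseline=50)   # treated start higher
c_pre, c_post = pre_post_means(treated=False, baseline=40)

did = (t_post - t_pre) - (c_post - c_pre)
print(f"DiD estimate: {did:.1f}")  # close to 3 despite the baseline gap and trend
```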