Post on 30-May-2015
Nicholas Minot (IFPRI/Uganda)
Atsuko Toda (IFAD/Vietnam)
Nguyen Ngoc Anh (DEPOCEN)
RIMS+ surveys:
A tool for project design and evaluation
Background on RIMS in Vietnam
Results and Impact Management System (RIMS)
Third-level results are associated with project impact on child malnutrition and household living standards.
IFPRI project focuses on the household survey used to collect third-level results
Background on RIMS
RIMS survey guidelines
Should be implemented for large, national IFAD projects
Should be done before, and at end of project
Sample size: 900 beneficiary households
Returning to same households not recommended
Concern about concentration of IFAD program efforts
Administrative complications of finding old households
Background on RIMS
RIMS questionnaire
Objective is to measure assets and child nutrition
Divided into three sections:
Section 1 – Household demographics
Section 2 – Housing, assets, and food security
Section 3 – Anthropometry
Background on RIMS
Standardization of RIMS questionnaire
Ensures comparability across countries
Makes analysis relatively quick
Assures quality
But little flexibility in questionnaire design & analysis
Does not collect intermediary indicators
Changes in RIMS+
Overview of changes
Changes | Rationale
1. Expanded questionnaire | Collect additional information to diagnose farmer constraints, improve design of interventions, and measure impact on intermediate indicators
2. Use of control group | Better measurement of impact of project by controlling for broader changes in rural conditions
3. Additional training and supervision | Improve quality of data
4. GPS to geo-reference households | Facilitate return to same households (panel) and better supervision of enumerators
5. Flexible questionnaire & analysis | Address information needs of the IFAD project and IFAD planning in general
Changes in RIMS+
1. Expanded questionnaire
RIMS+ | RIMS | New info in RIMS+
A. Member characteristics | 1. Household demographics | + ethnicity, school attendance, & reasons for not attending
B. Housing | 2. Survey questions | + roof material, ownership status, location of toilet
C. Assets | 2. Survey questions | + agricultural equipment
D. Land | (no info) | Farm size, ownership, irrigation, distance
E. Crop production | (no info) | Production, sales, & prices for 25 crops; cost of 6 inputs
F. Livestock & fisheries | (no info) | Herd size, sales, & costs for 12 types of animals, use of vet services, type of feeding
Changes in RIMS+
1. Expanded questionnaire (continued)
RIMS+ | RIMS | New info in RIMS+
G. Extension & market access | (no info) | Access to extension, who uses, cooperatives, details of sales, distance to markets
H. Non-farm activities | (no info) | Income and business expenses for 11 non-farm income sources, gender roles
I. Food security | 2. Survey questions | + coping strategies and quality of diet
J. Credit & borrowing | (no info) | Access to credit, info on loans received
K. Socio-Economic Development Plan | (no info) | Knowledge of and participation in SEDP process
L. Risk & vulnerability | (no info) | Perceived risk of six natural disasters
M. Anthropometry | 3. Anthropometry | No new information
Changes in RIMS+
2. Use of control group
Control group is 300 households that are similar to beneficiaries but not in project area
Useful to control for changes in rural areas due to other factors
Example | Beneficiary households | Control households | Impact according to current before-after comparison | Actual impact using info from control group
Example 1 | Income rises 8% | Income rises 4% due to economic growth | Suggests that project caused 8% increase in income | Actually, only a 4% increase due to project
Example 2 | Income does not change | Income falls 4% due to drought | Suggests that project had no effect | Actually, 4% increase in income due to project
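The arithmetic behind the two examples can be sketched in a few lines of Python. This is only an illustration: the function name `project_impact` is hypothetical, and it assumes that the control group's income change is a fair estimate of what would have happened to beneficiaries without the project.

```python
def project_impact(beneficiary_change, control_change):
    """Return (before-after estimate, control-adjusted estimate).

    Both inputs are income changes in percentage points. The adjusted
    estimate assumes the control group shows what would have happened
    to beneficiaries without the project.
    """
    before_after = beneficiary_change
    adjusted = beneficiary_change - control_change
    return before_after, adjusted


# The two examples from the slide: (beneficiary change, control change)
examples = {
    "Example 1": (8.0, 4.0),   # economic growth lifts both groups
    "Example 2": (0.0, -4.0),  # drought lowers income in the control group
}

for name, (beneficiary, control) in examples.items():
    before_after, adjusted = project_impact(beneficiary, control)
    print(f"{name}: before-after = {before_after:+.0f}%, "
          f"control-adjusted = {adjusted:+.0f}%")
```

In both examples the control-adjusted estimate is a 4-point increase, while the before-after comparison alone gives 8 points and 0 points.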
Changes in RIMS+
2. Use of control group (continued)
[Figure: outcome indicator plotted against time (before project, after project) for beneficiary households and the control group. A third line is the hypothetical path of beneficiary households without the project, based on growth in the control group. The before-after difference and the actual effect of the project are both marked.]
Changes in RIMS+
3. Additional training and supervision
Because questionnaire is longer and somewhat more complicated, need for additional training & supervision of enumerators
IFPRI & DEPOCEN prepared detailed enumerator manual
DEPOCEN provided 5 days of training plus testing of questionnaire
DEPOCEN also provided additional supervision during data collection, particularly important in first week of data collection
Changes in RIMS+
4. Use of GPS units
GPS units are sometimes used in RIMS surveys
Main purpose is to make it easier to find household to interview in later round of survey
Additional benefit of verifying that enumerators have visited households in village
Changes in RIMS+
5. Flexible questionnaire & analysis of results
Original RIMS is analyzed in a “black box”
Advantage is analysis is fast, reliable, and comparable
But little opportunity to customize results for project
RIMS+ questionnaire can be customized for project
Type of IFAD project | Possible customization of questionnaire
Farmer training & extension | Access to extension, sources of info, perception of usefulness, adoption of advice, yield
Linking farmers to market | Travel time to markets, types of buyers, degree of competition, prices received, share sold
Promotion of non-farm enterprises | Number & composition of NFEs, profitability, training needs, perceived constraints, factors affecting success
Improved access to credit | Sources of credit, interest rates paid, use of credit, reasons for use of informal credit, factors affecting repayment rate
Changes in RIMS+
5. Flexible questionnaire & analysis of results
RIMS+ analysis can be customized to address questions relevant for project design & implementation:
Is access to extension services different for female-headed households?
Can pepper be successfully grown by small-scale farmers with limited resources?
Is targeting landless households more (or less) pro-poor than targeting farmers with less than 0.5 hectares?
Is satisfaction with project services higher in one district than in another?
Expanded questionnaire
More information and more complicated questionnaire
Requires additional training and supervision
Longer interview time (double at least)
Requires a new data entry program
Separate data entry in CSPro for 1200 questionnaires
At least 2 days to prepare the CSPro data entry form
Another 2 days for training in data entry in CSPro, in addition to RIMS training
Increased complexity in analysis and reporting
Cost and implementation issues
Use of control group
Increased workload with financial implications (additional 300 non-project households)
Implementing survey in non-project area is more difficult due to logistics, cooperation
Data entry in both RIMS and CSPro
RIMS software to enter RIMS core questions for 900 beneficiary households
Data entry in CSPro for full questionnaire for the 1200-household sample
Additional training/supervision
Project managers do not see immediate benefit
Cost and implementation issues
Use of GPS
Cost and implementation issues
Increased training time (1/2 day) and additional time at household (10 minutes)
Not easy to use due to language barrier
Additional burden, since interviewers already have to carry weight and scale
Cost and implementation issues
Cost estimates
Component | First-time costs | Per survey costs
Expanded questionnaire in data collection | Already carried out under IFAD-IFPRI Partnership | Interview time is approximately doubled
Use of control group | No fixed cost | Increases field costs by 50-100%
Additional training & supervision | Enumerator manual prepared under Partnership | Approximately US$ 10-15k per survey
Use of GPS units | Cost to purchase = US$ 100 x 20 units = US$ 2000 | Modest: GPS units can be shared across projects or rented
Analysis of data | Large initial cost of preparing analysis programs, already undertaken by Partnership | For standard analysis, negligible. For customized analysis, requires Stata skills
Questions
Results of Vietnam RIMS+
Which crops are pro-poor?
How does crop commercialization vary across farmers?
Do female-headed households have equal access to modern inputs?
How important is income from non-farm activities?
How do farmers perceive the risks of natural disasters?
Is food security threatened by crop commercialization?
How involved are farmers in the preparation of the Socio-Economic Development Plans?
Will raising farmer income improve child nutrition?
Which crops are pro-poor?
Results of Vietnam RIMS+
• Rice is grown by majority of the poor, but fewer high-income households
• Maize, groundnut, red onion, bananas, tea, and vegetables are grown by both poor and non-poor
• Avocado, mango, durian, pepper, sugarcane, coffee, and cashew are grown disproportionately by high-income farms
• This is not to say they can’t be grown by poor farmers, but any untargeted support to these crops will not be pro-poor
Is input use less among female-headed households?
Results of Vietnam RIMS+
• Not much evidence that input use per hectare is lower
• But smaller farm sizes lead to smaller crop production and lower income
What is the importance of non-farm income?
Results of Vietnam RIMS+
• Even the 20% of farms with the smallest area (less than 0.10 hectares) earn the bulk of their income from crop production
• 45% of the smallest farms rent, sharecrop, borrow, or illegally use other land
How do farmers perceive the risk of different natural disasters?
Results of Vietnam RIMS+
• Perception of disaster risk varies by province
• Also, perception of likely losses is greater for poor households
Is food security threatened by commercialization?
Results of Vietnam RIMS+
• Commercialization is defined as the share of the value of crop production that is sold
• Relationship holds even after controlling for per capita income and farm size in regression analysis
Will raising farmer income improve child nutrition?
Results of Vietnam RIMS+
• Yes, but effect is weak
• Many other variables influence child nutrition: sanitation, health care, education, child rearing practices, etc.
[Figure: lowess curves of length/height-for-age Z-score (haz06) and weight-for-length/height Z-score (whz06) against log of per capita income (lnpcinc)]
Summary & conclusions
When is RIMS+ most suitable?
RIMS+ surveys probably not suitable for all IFAD projects because of additional costs
Conditions under which it is most suitable:
IFAD project design is flexible, can be revised in light of new information from survey
IFAD project focuses on a new topic or new region, so there is a need for information
There are gaps in knowledge about farm household livelihoods and behavior relevant to project
IFAD project is relatively large, implying an adequate M&E budget
Additional issues
Size of control group
At the moment, 900 treatment households (to meet the standard RIMS requirement) and 300 control households
But typically the control group is similar in size to the treatment group
It would reduce costs to develop a Core Module and additional modules that are selected depending on project (e.g. agricultural marketing, credit, extension)
RIMS+ would require additional capacity building for IFAD project staff
Project has prepared an enumerator manual and data entry programs and could also prepare implementation guidelines if needed
Summary & conclusions
Objective of Impact Evaluation
Measure the effect of the program on its beneficiaries (and eventually on its non-beneficiaries) by answering the counterfactual question:
How would individuals who participated in a program have fared in the absence of the program?
How would those who were not exposed to the program have fared in the presence of the program?
Two main problems arise: confounding factors and selection biases.
Comparing averages
Individual-level measure of impact: what would the outcome (e.g. farm income) have been had he/she not participated in the program (in our case, the treatment)?
Compare the individual with the program to the same individual without the program, at the same time?
- Can never observe both: a missing data problem.
Instead: average impact on given groups of individuals
Compare mean outcome in group of participants (treatment group) to mean outcome in similar group of non-participants (control group)
Average treatment effect on the treated (ATT):
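In potential-outcomes notation, the ATT and its relation to a simple comparison of group means can be written as follows. The symbols are assumptions introduced here for illustration: Y_i(1) and Y_i(0) are the outcomes of individual i with and without the program, and D_i = 1 marks participants.

```latex
\mathrm{ATT} = E\left[\, Y_i(1) - Y_i(0) \mid D_i = 1 \,\right]

E\left[ Y_i(1) \mid D_i = 1 \right] - E\left[ Y_i(0) \mid D_i = 0 \right]
  = \mathrm{ATT}
  + \Big( E\left[ Y_i(0) \mid D_i = 1 \right] - E\left[ Y_i(0) \mid D_i = 0 \right] \Big)
```

The second line says that the difference in observed means equals the ATT only when the bracketed term is zero, that is, when participants and non-participants would have had the same average outcome without the program.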
Building a control group
Compare what is comparable.
“Treatment” and “Control” groups must look the same if there were no program.
Generally, those individuals who benefit from the program initially differ from those who don’t.
External selection: programs are explicitly targeted (particular areas, particular individuals).
Self selection: the decision to participate is voluntary.
Problem with comparing beneficiaries and non-beneficiaries: the difference can be attributed to both the impact and the original differences.
SELECTION BIAS: when individuals or groups are selected or self-select for treatment on characteristics that may also affect their outcomes.
Program selection does not lead to selection bias (from Bernard 2006)
[Figure: the initial population, shown by quintile from Quintile I (poorer) to Quintile V (richer), is split by selection into a treatment group (receives procedure X) and a control group (does not receive procedure X). Impact = Y Exp – Y Control]
Program selection leads to selection bias
[Figure: the initial population, shown by quintile from Quintile I (poorer) to Quintile V (richer), is split by selection into a treatment group (receives procedure X) and a control group (does not receive procedure X). Impact ≠ Y Exp – Y Control]
“Sign” of the selection bias (1)
Program targeted on “worse-off” households
[Figure: outcomes of the treatment and control groups, with the actual impact marked. Observed difference is negative.]
“Sign” of the selection bias (2)
Program targeted on “better-off” households
[Figure: outcomes of the treatment and control groups, with the actual impact marked. Observed difference is very large.]
Impact evaluation for policy decisions
Impact evaluations are needed to inform decisions ranging from:
curtailing inefficient programs,
to scaling up interventions,
to adjusting program benefits,
to selecting among various program alternatives.
The Mexican Progresa/Oportunidades evaluation became influential because of:
the innovative nature of the program
the credible and strong evidence provided by its impact evaluation
Role of qualitative data
Qualitative data are a key supplement to quantitative impact evaluations, providing complementary perspectives on a program’s performance.
Employ mixed methods (Bamberger, Rao & Woolcock 2010).
Approaches include focus group discussions (FGD), expert elicitation, and key informant interviews (Rao and Woolcock 2003).
Useful at three stages:
1. Can be used to develop hypotheses as to how and why the program would work
2. Before quantitative IE results become available, qualitative work can provide quick insights into what is happening in the program
3. In the analysis stage, it can provide context and explanations for the quantitative results
Focusing on quantitative methods
A central feature of IE is the use of longitudinal data to apply “difference-in-differences” or “double difference” methods.
These methods rely on baseline data collected before project implementation and follow-up data collected after it starts, to develop a “before/after” comparison.
Data are collected from households receiving the program and those that do not (“with the program” / “without the program”).
Double difference methods: continued
Why are both “before/after” and “with/without” data necessary?
Suppose data were collected only from beneficiaries.
Suppose that, between the baseline and follow-up, some adverse event occurs, with the benefits of the program being more than offset by the damage from the bad event. These effects would show up in the difference over time in the intervention group, in addition to the effects attributable to the program.
More generally, restricting the evaluation to only “before/after” comparisons makes it impossible to separate program impacts from the influence of other events that affect beneficiary households.
To guard against this, add a second dimension to the evaluation design that includes data on households “with” and “without” the program.
Illustration of double difference
Survey round | Intervention group (Group I) | Control group (Group C) | Difference across groups
Follow-up | I1 | C1 | I1 – C1
Baseline | I0 | C0 | I0 – C0
Difference across time | I1 – I0 | C1 – C0 | Double-difference: (I1 – C1) – (I0 – C0)
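The double-difference table can be sketched as a short calculation on group means. The function name and the household incomes below are hypothetical, chosen only to illustrate the formula. The sketch also checks that differencing across groups first gives the same answer as differencing across time first.

```python
from statistics import mean


def double_difference(i0, i1, c0, c1):
    """Double difference (I1 - C1) - (I0 - C0) computed on group means.

    i0, i1: intervention-group outcomes at baseline and follow-up
    c0, c1: control-group outcomes at baseline and follow-up
    """
    I0, I1, C0, C1 = mean(i0), mean(i1), mean(c0), mean(c1)
    across_groups = (I1 - C1) - (I0 - C0)
    across_time = (I1 - I0) - (C1 - C0)
    # The two orderings are algebraically identical.
    assert abs(across_groups - across_time) < 1e-9
    return across_groups


# Hypothetical household incomes, for illustration only
intervention_baseline = [100, 110, 90]
intervention_followup = [112, 120, 104]
control_baseline = [105, 95, 100]
control_followup = [108, 100, 104]

dd = double_difference(intervention_baseline, intervention_followup,
                       control_baseline, control_followup)
print(f"Double-difference estimate: {dd}")
```

With these numbers the intervention group gains 12 and the control group gains 4, so the double difference is 8.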
Randomization
With random program assignment, all individuals have the same chance of receiving the program.
With a well-designed randomized evaluation, beneficiaries and non-beneficiaries have, on average, the same observed and, more importantly, unobserved characteristics (the latter being more difficult to control for).
In this way a credible basis for comparison is established, freed from selectivity concerns, and the direction of causality is certain.
A further advantage of a randomized design is that program impact is easy to calculate and easier to understand and explain.
Heckman and Smith (1995), however, point to two problems:
Randomization bias: the process of randomization itself leads to a different beneficiary pool than would otherwise have been treated
Substitution bias: non-beneficiaries obtain similar treatments from different sources, a form of “contamination”
Matching
Matching methods of program evaluation construct a comparison group by “matching” treatment households to comparison group households based on observable characteristics.
The impact is estimated as the average, across treatment households, of the difference between each treatment household’s outcome and a weighted average of the outcomes of similar comparison households in the matched sample.
Matching methods differ in the selection of the matched comparison and in how these weighted average differences in outcomes are constructed.
One popular approach is propensity score matching (PSM).
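As a rough illustration of matching, the sketch below pairs each treatment household with the single comparison household that has the closest propensity score. This is one-nearest-neighbour matching with replacement, which is only one of several weighting schemes. It assumes the propensity scores have already been estimated elsewhere. The function name and all numbers are hypothetical.

```python
def matched_impact(treated, comparison):
    """Average treated-minus-matched-comparison difference in outcomes.

    treated, comparison: lists of (propensity_score, outcome) pairs.
    Each treated household is matched to the comparison household with
    the nearest propensity score; a comparison household may be reused.
    """
    differences = []
    for score_t, outcome_t in treated:
        # Nearest comparison household on the propensity score
        _, outcome_c = min(comparison, key=lambda hh: abs(hh[0] - score_t))
        differences.append(outcome_t - outcome_c)
    return sum(differences) / len(differences)


# Hypothetical (propensity score, income) pairs; scores assumed pre-estimated
treated = [(0.80, 120.0), (0.60, 110.0), (0.40, 100.0)]
comparison = [(0.75, 112.0), (0.55, 104.0), (0.35, 97.0), (0.10, 80.0)]

impact = matched_impact(treated, comparison)
print(f"Matched estimate of impact: {impact:.2f}")
```

The comparison household with a score of 0.10 is never used, because no treatment household resembles it. Leaving out such households is the purpose of matching.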
Regression discontinuity
The regression discontinuity design (RDD) is a method that can be used for programs that have a continuous eligibility index with a clearly defined cutoff score to determine eligibility.
To apply RDD, two main conditions are needed:
1. A continuous eligibility index.
2. A clearly defined cutoff score, that is, a point on the index above or below which the population is classified as eligible for the program.
RDD: continued
The regression discontinuity measures the difference in post-intervention outcomes, such as incomes, between units just above and just below the eligibility cutoff
The difference is estimated using a regression on the sub-sample around the cutoff point
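A minimal sketch of this idea is to fit a separate straight line on each side of the cutoff, using only units within a chosen bandwidth, and to take the gap between the two lines at the cutoff. The function names, the bandwidth, the simulated data, and the assumption that units below the cutoff are eligible are all illustrative choices.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_estimate(index, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff (eligible minus ineligible).

    Assumes units with an index below the cutoff are eligible.
    Only units within `bandwidth` of the cutoff are used.
    """
    near = [(x - cutoff, y) for x, y in zip(index, outcome)
            if abs(x - cutoff) <= bandwidth]
    eligible = [(x, y) for x, y in near if x < 0]
    ineligible = [(x, y) for x, y in near if x >= 0]
    # The index is centred on the cutoff, so each intercept is the
    # fitted outcome exactly at the cutoff.
    intercept_eligible, _ = fit_line(*zip(*eligible))
    intercept_ineligible, _ = fit_line(*zip(*ineligible))
    return intercept_eligible - intercept_ineligible


# Simulated data: outcome rises with the index, plus a 10-unit program effect
index = list(range(40, 61))
outcome = [100 + 0.5 * x + (10 if x < 50 else 0) for x in index]

estimate = rdd_estimate(index, outcome, cutoff=50, bandwidth=5)
print(f"Estimated jump at the cutoff: {estimate:.1f}")
```

A narrower bandwidth keeps the two groups more alike but leaves fewer observations, so the choice of bandwidth is a trade-off rather than a fixed rule.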
Encouragement design
Encouragement design is useful when the intervention cannot be randomly administered to some and not others.
The method requires that a randomly selected group of beneficiaries receive extra encouragement to undertake the intervention.
Encouragement can take the form of additional information or incentives.
By randomizing encouragement and carefully tracking outcomes for those who do and do not receive encouragement, it is possible to obtain reliable estimates of the effects of both the encouragement and the intervention itself.
Compare results for the randomly selected encouraged group with results for the randomly selected not-encouraged group. This quantity of interest, known as the “Intention-to-Treat” effect, or ITT, is the effect of the encouragement itself.
Encouragement design: continued
The effect of the treatment is obtained by adjusting the ITT for the amount of non-compliance, giving the local average treatment effect (LATE):
LATE = ITT / Compliance rate
Compliance rate = fraction of subjects treated in the treatment group minus fraction of subjects treated in the control group
With a 100% compliance rate, LATE = ITT: all those assigned to the treatment take the treatment and all those assigned to the control do not.
The compliance rate can be thought of as the fraction of subjects that fall into the sub-population of “compliers”, the group for whom the decision to take treatment was directly affected by the assignment.
This is the group induced by the encouragement to take advantage of the treatment.
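The ITT and LATE formulas can be sketched as follows. The function name and the data are hypothetical. In this made-up sample, 7 of 10 encouraged households and 2 of 10 not-encouraged households take up the treatment.

```python
def mean(values):
    return sum(values) / len(values)


def itt_and_late(y_encouraged, d_encouraged, y_not, d_not):
    """Return (ITT, compliance rate, LATE).

    y_*: outcomes; d_*: 1 if the household took the treatment, else 0.
    ITT is the difference in mean outcomes between the encouraged and
    not-encouraged groups; LATE = ITT / compliance rate.
    """
    itt = mean(y_encouraged) - mean(y_not)
    compliance_rate = mean(d_encouraged) - mean(d_not)
    return itt, compliance_rate, itt / compliance_rate


# Hypothetical data, for illustration only
d_encouraged = [1] * 7 + [0] * 3          # 70% take-up when encouraged
d_not = [1] * 2 + [0] * 8                 # 20% take-up without encouragement
y_encouraged = [56.0] * 7 + [46.0] * 3    # mean outcome 53
y_not = [56.0] * 2 + [48.5] * 8           # mean outcome 50

itt, compliance_rate, late = itt_and_late(y_encouraged, d_encouraged,
                                          y_not, d_not)
print(f"ITT = {itt:.2f}, compliance rate = {compliance_rate:.2f}, "
      f"LATE = {late:.2f}")
```

Here the ITT of 3 is divided by a compliance rate of 0.5, so the estimated effect on compliers is about twice the effect of the encouragement itself.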
Finally on encouragement design
Compliers are the group of people who actually stick to the experimental protocol: they take the treatment if assigned to the treatment group and do not if assigned to control.
For policy, compliers are the only ones who are actually affected by the encouragement.
Usually, the compliance rate < 1
The LATE estimates the effect of treatment only for the sub-population of compliers; it does not constitute the effect of the treatment for the whole sample.
Special case: when the control group can be excluded from taking the treatment, non-compliance can only occur in the treatment group and LATE = ATT
In general, the compliance rate depends on the encouragement.
Power calculation
Power
The ability of a study to detect an impact. Conducting a power calculation is a crucial step in impact evaluation design.
Power calculation
A calculation of the sample size required for the impact evaluation, which depends on the minimum effect size and the required level of confidence.
Power: continued
We discuss the basic intuition behind power calculations by focusing on the simplest case: an evaluation conducted using an RCT, assuming that noncompliance is not an issue.
Power calculations indicate the minimum sample size needed to conduct an IE. They help to:
Assess whether existing data sets are large enough for the purpose of conducting an impact evaluation.
Avoid collecting too much information, which can be very costly.
Large samples better resemble the population (both treatment and control) (World Bank 2008)
Type I and Type II errors
A type I error is made when an evaluation concludes that a program has had an impact, when in reality it had no impact.
A type II error occurs when an evaluation concludes that the program has had no impact, when in fact it has had an impact.
The likelihood of a type I error can be set by a parameter called the “confidence level.”
Many factors affect the likelihood of committing a type II error, but the sample size is crucial
Power: continued
If the average weight of 50,000 treated units is the same as the average weight of 50,000 comparison units, then one can probably conclude with confidence that the program has had no impact.
By contrast, if a sample of two treatment children weigh on average the same as a sample of two comparison children, it is harder to reach a reliable conclusion.
The power (or statistical power) of an impact evaluation is the probability that it will detect a difference between the treatment and comparison groups, when in fact one exists. An impact evaluation has a high power if there is a low risk of not detecting real program impacts, that is, of committing a type II error.
Power calculations: continued (World Bank 2008)
Involves the following steps
1. Does the program create clusters?
2. What is the outcome indicator?
3. Is it required to compare program impacts between subgroups?
4. What is the minimum level of impact that would justify the investment made in the intervention?
5. What is a reasonable level of power for the evaluation being conducted?
6. What are the baseline mean and variance of the outcome indicators?
Power calculations: continued
Power calculations involve different steps, depending on whether the program randomly assigns benefits among clusters or simply assigns benefits randomly among all units in a population.
No clusters: take a random sample of the entire population
If subgroups are to be compared (for example, both male and female), a larger sample will be needed
Minimum level of impact: below what level will the program be treated as not successful?
For an evaluation to identify small differences in mean outcomes, the sample will need to be larger; the minimum detectable effect should be chosen carefully
There can be different power levels; the standard is 80 percent, i.e. finding an impact in 80 percent of cases when one has occurred
Get the baseline mean and variance right: more variance in the baseline will require a larger sample to capture the effect
Think about the sensitivity of the sample size to assumptions: a lower expected impact, higher variance in the outcome indicator, or a higher power level all require a larger sample
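These sensitivities can be seen in a minimal sketch of the textbook sample-size formula for comparing two means with equal-sized groups and no clustering. The formula itself, the normal approximation, the 5% two-sided significance level, and the function name are assumptions added here for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(min_effect, sd, alpha=0.05, power=0.80):
    """Units needed in each group to detect a difference in means.

    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / min_effect^2,
    a normal approximation for a two-sided test with equal group sizes
    and no clustering.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / min_effect ** 2
    return ceil(n)


# Illustrative inputs: effect and standard deviation in the same units
base = sample_size_per_group(min_effect=0.2, sd=1.0)
smaller_effect = sample_size_per_group(min_effect=0.1, sd=1.0)
more_variance = sample_size_per_group(min_effect=0.2, sd=1.5)
higher_power = sample_size_per_group(min_effect=0.2, sd=1.0, power=0.90)

print(base, smaller_effect, more_variance, higher_power)
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why that effect should be chosen carefully.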
Brief blurb on power with clusters
In the presence of clustering, the guiding principle is that the number of clusters matters more than the number of individuals within the clusters. A sufficient number of clusters is required to test whether a program has had an impact by comparing outcomes in samples of treatment and comparison units.
If the district is the cluster, compare 2 districts with 100 districts: on average the latter is more likely to give similar treatment and comparison groups, but can be costly
All steps 1-6 as before, plus:
How variable is the outcome indicator within clusters?
In general, higher intra-cluster correlation in outcomes increases the number of clusters required to achieve a given power level: one gains less by adding one more person from the same village than from another village
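One common way to quantify this is the design-effect approximation, 1 + (m - 1) × ICC, where m is the number of units per cluster and ICC is the intra-cluster correlation. The approximation, the function names, and the numbers below are assumptions added for illustration. The starting point of 393 units per group is a hypothetical sample size for an unclustered design.

```python
from math import ceil


def design_effect(cluster_size, icc):
    """Inflation factor for the sample size under clustering.

    Standard approximation for equal-sized clusters:
    1 + (cluster_size - 1) * icc.
    """
    return 1 + (cluster_size - 1) * icc


def clusters_per_group(n_unclustered, cluster_size, icc):
    """Clusters needed in each group, given the unclustered sample size."""
    n_needed = n_unclustered * design_effect(cluster_size, icc)
    return ceil(n_needed / cluster_size)


# Hypothetical design: 393 units per group if unclustered, 20 per village
for icc in (0.0, 0.05, 0.2):
    villages = clusters_per_group(393, cluster_size=20, icc=icc)
    print(f"ICC = {icc:.2f}: {villages} villages per group")
```

With no intra-cluster correlation, 20 villages per group are enough. With an ICC of 0.2 the same design needs 95, because each additional household in a village adds little new information.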
In this project
There are both clustered and unclustered interventions