Practical Sampling for Impact Evaluations
AADAPT Workshop South Asia, Goa, December 17-21, 2009
Introduction
How do we construct a sample to credibly detect a meaningful effect?
• Which populations or groups are we interested in, and where do we find them?
• How many people/firms/units should be interviewed/observed from that population?
• How does this affect the evaluation budget?
Warning!
• The goal of this presentation is not to make you a sampling expert.
• The goal is also not to give you a headache.
• Rather, it is an overview: how do sampling features affect what it is possible to learn from an impact evaluation?
Outline
1. Sampling frame
  – What populations or groups are we interested in?
  – How do we find them?
2. Sample size
  – Why it is so important: confidence in results
  – Determinants of appropriate sample size
  – Further issues
  – Examples
3. Budgets
Sampling frame
Who are we interested in?
  a) All SMEs?
  b) All formal SMEs?
  c) All formal SMEs in a particular sector?
  d) All formal SMEs in a particular sector in a particular region?
• Need to keep in mind external validity
  – Can findings from population (c) inform appropriate programs to help informal firms in a different sector?
  – Can findings from population (d) inform national policy?
• But should also keep in mind feasibility and what you want to learn
  – Might not be possible or desirable to pilot a very broadly defined program or policy
Sampling frame: Finding the units we’re interested in
Depends on size and type of experiment
• Lottery among applicants
  – Example: BDS program among informal firms in a particular area
  – Can use treatment and comparison units from the applicant pool
  – If surveying everyone is not feasible (e.g. 50,000 get the treatment), need to draw a sample to measure impact
• Policy change
  – Example: A change in business registration rules in randomly selected districts
  – To measure impact on profits, cannot survey all informal businesses in treatment and comparison districts.
  – Will need to draw a sample of firms within districts.
• Required information before sampling
  – Complete listing of all units of observation available for sampling in each area or group
  – Tricky for units like informal firms, but there are techniques to overcome this
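Once a listing exists, drawing the sample within districts is mechanical. The sketch below is only an illustration: the district names, firm IDs, and the choice of 25 firms per district are hypothetical, and a simple random sample within each district is assumed.

```python
import random

def sample_within_districts(listing, n_per_district, seed=0):
    """Draw a simple random sample of firms inside each district.

    listing: dict mapping district name -> list of firm IDs (the sampling frame).
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    sample = {}
    for district, firms in sorted(listing.items()):
        k = min(n_per_district, len(firms))  # cannot sample more firms than are listed
        sample[district] = rng.sample(firms, k)  # sampling without replacement
    return sample

# Hypothetical listing: 3 districts with 200 listed firms each.
listing = {f"district_{d}": [f"d{d}_firm{i}" for i in range(200)] for d in range(3)}
sample = sample_within_districts(listing, n_per_district=25)
print({district: len(firms) for district, firms in sample.items()})
```

The quality of the draw depends entirely on the listing: a firm that is missing from the listing can never be sampled.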
Sample size and confidence
Start with a simpler question than program impact
• Say we wanted to know the average annual profits of an SME in Dakar.
  – Option 1: We go out and track down 5 business owners and take the average of their responses.
  – Option 2: We track down 1,000 business owners and average their responses.
• Which average is likely to be closer to the true average?
Sample size and confidence:

5 firms
Profits               Number of firms
$0 - $1,000           1
$1,001 - $5,000       2
$5,001 - $10,000      1
$10,001 - $15,000     0
$15,001 +             1

1,000 firms
Profits               Number of firms
$0 - $1,000           70
$1,001 - $5,000       150
$5,001 - $10,000      650
$10,001 - $15,000     125
$15,001 +             5
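A small simulation makes the 5-versus-1,000 comparison concrete. This is a sketch under invented assumptions: the population of 20,000 firms and its lognormal profit distribution are made up for illustration and are not data from Dakar.

```python
import random
import statistics

def spread_of_sample_means(population, n, draws=500, seed=1):
    """Standard deviation of the sample mean across repeated samples of size n."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)

# Hypothetical population of annual profits (skewed, as profit data often are).
population_rng = random.Random(0)
population = [population_rng.lognormvariate(8.5, 0.8) for _ in range(20000)]

spread_5 = spread_of_sample_means(population, 5)
spread_1000 = spread_of_sample_means(population, 1000)
print(f"true mean: {statistics.mean(population):,.0f}")
print(f"typical error with 5 firms: {spread_5:,.0f}")
print(f"typical error with 1,000 firms: {spread_1000:,.0f}")
```

The average from 1,000 firms lands much closer to the true mean because the spread of a sample mean shrinks roughly with the square root of the sample size.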
Sample size and confidence
Similarly, when determining program impact
• Need many observations to say with confidence whether the average outcome of the treatment group is higher/lower than in the comparison group
• What do I mean by confidence?
  – Minimizing statistical error
• Types of errors
  – Type 1 error: You say there is a program impact when there really isn’t one.
  – Type 2 error: There really is a program impact but you cannot detect it.
Sample size and confidence
• Type 1 error: Find program impact when there’s none
  – Error can be minimized after data collection, during statistical analysis
  – Need to adjust the significance levels of impact estimates (e.g. 99% or 95% confidence intervals)
• Type 2 error: Cannot see that there really is a program impact
  – In jargon: statistical test has low power
  – Error must be minimized before data collection
  – Best method of doing this: ensuring you have a large enough sample
• Whole point of an impact evaluation is to learn something
  – Ex ante: We don’t know how large the impact of this program is
  – Low-powered ex post: This program might have increased firms’ profits by 50%, but we cannot distinguish a 50% increase from an increase of zero with any confidence
Calculating sample size
There’s actually a formula. Don’t get scared.
Main things to be aware of:
1. Detectable effect size
2. Probability of type 1 and 2 errors
3. Variance of outcome(s)
4. Units (firms, banks) per treated area
\[
N = \frac{4 \sigma^2 \left( z_{\alpha/2} + z_{\beta} \right)^2}{D^2} \left[ 1 + \rho (H - 1) \right]
\]
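The sample-size formula on this slide can be evaluated directly. The sketch below assumes the usual reading of its symbols, which the transcript does not spell out: N is the total sample across both arms, sigma is the standard deviation of the outcome, D is the detectable effect size, the z terms are standard normal quantiles for the significance level and the power, rho is the intra-cluster correlation, and H is the number of units per treated area. The function name and the example inputs are illustrative.

```python
from statistics import NormalDist

def total_sample_size(sd, effect, alpha=0.05, power=0.80, icc=0.0, cluster_size=1):
    """Total N (both arms combined) for a two-arm comparison of means.

    sd: standard deviation of the outcome (sigma)
    effect: smallest difference in means to detect (D)
    icc: intra-cluster correlation (rho); cluster_size: units per treated area (H)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test at level alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    design_effect = 1 + icc * (cluster_size - 1)   # inflation from clustered assignment
    return 4 * sd**2 * (z_alpha + z_beta) ** 2 / effect**2 * design_effect

# Illustrative inputs: sd of 50, detectable effect of 5.
n_individual = total_sample_size(sd=50, effect=5)
n_clustered = total_sample_size(sd=50, effect=5, icc=0.1, cluster_size=11)
print(round(n_individual), round(n_clustered))
```

Halving the detectable effect quadruples N, and with an intra-cluster correlation of 0.1 and 11 units per area the clustered design needs twice the sample of an individually randomized one.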
Calculating sample size
Detectable effect size
• Smallest effect you want to be able to distinguish from zero
  – A 30% increase in sales, a 25% decrease in bribes paid
• Larger samples make it easier to detect smaller effects
• Do female and male entrepreneurs work similar hours?
  – Claim: On average, women work 40 hours/week, men work 44 hours/week
  – If the statistic came from a sample of 10 women and 10 men
    – Hard to say if they are different
    – Would be easier to say they are different if women worked 30 hours/week and men worked 80 hours/week
  – But if the statistic came from a sample of 500 women and 500 men
    – More likely that they truly are different
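The hours example can be checked with a rough z-statistic for a difference in means. The slide gives no measure of spread, so the standard deviation of 15 hours/week used below is purely an assumption, as is treating it as equal for women and men.

```python
from math import sqrt

def z_statistic(mean_a, mean_b, sd, n_per_group):
    """Difference in means divided by its standard error (equal sd and group sizes)."""
    standard_error = sd * sqrt(2 / n_per_group)
    return abs(mean_a - mean_b) / standard_error

ASSUMED_SD = 15  # hypothetical spread of weekly hours; not from the slides

z_small = z_statistic(40, 44, ASSUMED_SD, n_per_group=10)    # 10 women, 10 men
z_large = z_statistic(40, 44, ASSUMED_SD, n_per_group=500)   # 500 women, 500 men
z_big_gap = z_statistic(30, 80, ASSUMED_SD, n_per_group=10)  # huge gap, tiny sample
print(round(z_small, 2), round(z_large, 2), round(z_big_gap, 2))
```

Under this assumption the 4-hour gap falls well short of the conventional 1.96 threshold with 10 per group but clears it easily with 500 per group, while a 50-hour gap is detectable even in the tiny sample.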
Calculating sample size
How do you choose the detectable effect size?
• Smallest effect that would prompt a policy response
• Smallest effect that would allow you to say that a program was not a failure
  – This program significantly increased sales by 40%. Great - let’s think about how we can scale this up.
  – This program significantly increased sales by 10%. Great... uh... wait: we spent all of that money and it only increased sales by that much?
Calculating sample size
Type 1 and Type 2 errors
• Type 1
  – Significance level of estimates usually set to 1% or 5%
  – 1% or 5% probability that there is no effect but we think we found one
• Type 2
  – Power usually set to 80% or 90%
  – 20% or 10% probability that there is an effect but we cannot detect it
  – Larger samples mean higher power
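The link between sample size and power can be sketched with a normal approximation for a two-sided test of a difference in means. The inputs (an effect of 5 on an outcome with a standard deviation of 50) are illustrative, and equal group sizes with a known common standard deviation are assumed.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sd * sqrt(2 / n_per_group))  # true effect in standard-error units
    # Probability the test statistic lands beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for n in (50, 200, 800, 1570):
    print(n, round(power_two_sample(effect=5, sd=50, n_per_group=n), 2))
```

With no true effect the function returns the significance level itself, which is the Type 1 error rate; with a true effect, power rises toward 1 as the groups grow.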
Calculating sample size
Variance of outcomes
• Less underlying variance
  – Easier to detect a difference
  – Can have a smaller sample size
Calculating sample size
Variance of outcomes
• How do we know this before we decide our sample size and collect our data?
  – Ideal pre-existing data often... non-existent
  – Can use pre-existing data from a similar population
  – Example: Enterprise Surveys, labor force surveys
• Makes this a bit of guesswork, not an exact science
Further issues
1. Multiple treatment arms
2. Group-disaggregated results
3. Take-up
4. Data quality
Further issues
• Multiple treatment arms
  – Straightforward to compare each treatment separately to the comparison group
  – Comparing treatment groups with each other requires very large samples
    – Especially if treatments are very similar, differences between the treatment groups would be smaller
    – In effect, it’s like fixing a very small detectable effect size
• Group-disaggregated results
  – Are effects different for men and women? For different sectors?
  – If genders/sectors are expected to react in a similar way, then estimating differences in treatment impact also requires very large samples
Who is taller?
Detecting smaller differences is harder
Further issues
• Group-disaggregated results
  – To ensure balance across treatment and comparison groups, good to divide sample into strata before assigning treatment
• Strata
  – Sub-populations
  – Common strata: geography, gender, sector, initial values of outcome variable
  – Treatment assignment (or sampling) occurs within these groups
Why do we need strata?
• Geography example [figure: units marked T = treatment, C = comparison]
Why do we need strata?
• What’s the impact in a particular region?
• Sometimes hard to say with any confidence
Why do we need strata?
• Random assignment to treatment within geographical units
• Within each unit, ½ will be treatment, ½ will be comparison.
• Similar logic for gender, industry, firm size, etc.
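Assignment within strata can be sketched in a few lines. The firm IDs and the three regions below are hypothetical; the only idea taken from the slides is that half of each stratum goes to treatment and half to comparison.

```python
import random
from collections import defaultdict

def stratified_assignment(units, stratum_of, seed=0):
    """Randomly assign half of each stratum to treatment (T), the rest to comparison (C)."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)
    assignment = {}
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)       # random order within the stratum
        half = len(members) // 2   # if the stratum size is odd, the extra unit goes to C
        for position, unit in enumerate(members):
            assignment[unit] = "T" if position < half else "C"
    return assignment

# Hypothetical frame: 60 firms spread evenly over three regions.
firms = [(f"firm_{i}", region) for i, region in enumerate(["north", "south", "east"] * 20)]
assignment = stratified_assignment(firms, stratum_of=lambda firm: firm[1])

for region in ("north", "south", "east"):
    treated = sum(1 for firm, arm in assignment.items() if firm[1] == region and arm == "T")
    print(region, "treated:", treated)
```

Because the split happens inside each region, no region can end up with all treatment or all comparison firms by chance, which is what makes region-level comparisons possible.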
Further issues
• Take-up
  – Low take-up increases detectable effect size
    – Can only find an effect if it is really large
    – Effectively decreases sample size
  – Example: Offering matching grants to SMEs for BDS services
    – Offer to 5,000 firms
    – Only 50 participate
    – Probably can only say there is an effect on sales with confidence if they become Fortune 500 companies
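The arithmetic behind the matching-grant example can be sketched as follows. It assumes that firms that do not participate are unaffected, and the 100% sales increase among participants is a made-up figure for illustration.

```python
def average_effect_among_offered(effect_on_participants, offered, participants):
    """Average effect across everyone offered, if non-participants are unaffected."""
    take_up = participants / offered
    return take_up * effect_on_participants

def sample_inflation(take_up):
    """Factor by which required sample grows, since N scales with 1 / effect squared."""
    return 1 / take_up**2

# Hypothetical: participants double their sales (effect of 1.00 = +100%).
diluted = average_effect_among_offered(effect_on_participants=1.00, offered=5000, participants=50)
print(f"average effect among offered firms: {diluted:.0%}")
print(f"sample inflation at 1% take-up: {sample_inflation(50 / 5000):,.0f}x")
```

Even a doubling of sales among the 50 participants shows up as only a 1% average increase across the 5,000 firms offered the grant, which is far too small to detect with a realistic sample.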
Further issues
• Data quality
  – Poor data quality effectively increases required sample size
    – Missing observations
    – Increased noise
  – Can be partly addressed with a field coordinator on the ground monitoring data collection
Example from Ghana
• Calculations can be made in many statistical packages, e.g. Stata, Optimal Design (OD)
• Experiment in Ghana designed to increase the profits of microenterprise firms
• Baseline profits
  – 50 cedi per month.
  – Profits data typically noisy, so a coefficient of variation >1 is common.
• Example Stata code to detect a 10% increase in profits:
  – sampsi 50 55, p(0.8) pre(1) post(1) r1(0.5) sd1(50) sd2(50)
  – Having both a baseline and endline decreases required sample size (pre and post)
• Results
  – 10% increase (from 50 to 55): 1,178 firms in each group
  – 20% increase (from 50 to 60): 295 firms in each group
  – 50% increase (from 50 to 75): 48 firms in each group (but this effect size is not realistic)
• What if take-up is only 50%?
  – Offer business training that increases profits by 20%, but only half the firms do it.
  – Mean for treated group = 0.5*50 + 0.5*60 = 55
  – Equivalent to detecting a 10% increase with 100% take-up: need 1,178 in each group instead of 295 in each group
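The Ghana figures can be reproduced outside Stata. The sketch below is not Stata's implementation; it assumes a two-sided 5% test, 80% power, equal standard deviations in both groups, and an analysis that controls for one baseline measurement correlated 0.5 with the endline, which scales the endline-only sample size by (1 - correlation squared). The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def firms_per_group(mean_control, mean_treated, sd, corr, alpha=0.05, power=0.80):
    """Firms needed per group with one baseline and one endline measurement."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Endline-only comparison of two means with a common standard deviation.
    n_endline_only = 2 * sd**2 * z**2 / (mean_treated - mean_control) ** 2
    # Controlling for the baseline removes the share of variance it explains.
    return ceil((1 - corr**2) * n_endline_only)

for treated_mean in (55, 60, 75):
    print(treated_mean, firms_per_group(50, treated_mean, sd=50, corr=0.5))

# 50% take-up of a program that raises profits by 20% among those who take it up.
diluted_mean = 0.5 * 50 + 0.5 * 60
print("with 50% take-up:", firms_per_group(50, diluted_mean, sd=50, corr=0.5))
```

Under these assumptions the function returns the same per-group counts as the slide, and halving take-up roughly quadruples the required sample because the detectable difference is halved.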
Budgets
What is required?
• Data collection
  – Survey firm
  – Data entry
• Field coordinator to ensure treatment follows randomization protocol and to monitor data collection
• Data analysis
Budgets
How much will all of this cost?
• Huge range. Often depends on
  – Length of survey
  – Ease of finding respondents
  – Spatial dispersion of respondents
  – Security issues
  – Formal vs informal firms
  – Required human capital of enumerator
  – Et cetera...
• Firm-level survey data: $40-350/firm
• Household survey data: $40+/household
• Field coordinator: $10,000-$40,000/year
  – Depends on whether you can find a local hire
• Administrative data: Usually free
  – Sometimes has limited outcomes, can miss most of the informal sector
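Sample size feeds straight into the budget. As a back-of-the-envelope sketch, the calculation below combines the per-firm and coordinator cost ranges quoted above with the 1,178 firms per group from the Ghana example. Two survey rounds (baseline and endline) and a two-year coordinator contract are assumptions, and data analysis costs are left out.

```python
def survey_budget(firms, rounds, cost_per_firm, coordinator_per_year, years):
    """Data collection cost plus field coordinator cost, in dollars."""
    return firms * rounds * cost_per_firm + coordinator_per_year * years

firms = 2 * 1178  # treatment and comparison groups from the Ghana example

low = survey_budget(firms, rounds=2, cost_per_firm=40, coordinator_per_year=10_000, years=2)
high = survey_budget(firms, rounds=2, cost_per_firm=350, coordinator_per_year=40_000, years=2)
print(f"${low:,} to ${high:,}")
```

The gap between the low and high estimates is driven almost entirely by the per-firm survey cost, so that is the figure worth pinning down first.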
Summing up
• The sample size of your impact evaluation will determine how much you can learn from your experiment
• Some judgment and guesswork in calculations, but important to spend time on them
  – If sample size is too low: waste of time and money because you will not be able to detect a non-zero impact with any confidence
  – If little effort put into sample design and data collection: See above.
Questions?