Sampling - docs.wfp.org
Sampling
Basic concepts
Overview
Why do sampling?
Steps for deciding sampling methodology
Sampling methods
Representative vs. bias
Probability vs. non-probability
Simple random, systematic and cluster sampling
What is the objective of sampling?
The objective of sampling is to estimate an indicator for the larger population if we cannot measure everybody.
[Figure: population units numbered 1 to 30]
Population of Papua New Guinea
726,680 children less than 5 years of age
1,298,503 women 15-49 years of age
With 6 teams who each measure 13 women and 13 children per day, data collection would take 16,648 days or 45.6 years
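The arithmetic behind this estimate can be checked with a short Python sketch. It assumes that the larger group (women) determines the duration, that teams work every day of the year, and that a partial day is rounded up; the variable names are illustrative.

```python
import math

children_under_5 = 726_680
women_15_49 = 1_298_503
teams = 6
per_team_per_day = 13  # women (and children) measured by one team per day

# Women are the larger group, so they determine how long data collection takes.
people_per_day = teams * per_team_per_day
days = math.ceil(max(children_under_5, women_15_49) / people_per_day)
years = days / 365

print(days, round(years, 1))  # 16648 45.6
```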
What is necessary to achieve this objective?
The sample must be representative
of the larger population.
Representative versus bias…
Bias
Some members have a greater chance of being included than others (e.g. interviewer bias, main road bias).
Results will differ from the actual population prevalence.
This error cannot be corrected during the analysis.
Representative
All members of a population have an equal chance of being included in the sample.
Results will be close to the population’s true value.
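A small simulation can illustrate the difference. The population below is entirely hypothetical: 1,000 households, of which the 200 near the main road have a lower prevalence (10%) than the 800 remote ones (30%). Sampling only along the road stands in for main road bias.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: True = household with a malnourished child.
near_road = [True] * 20 + [False] * 180   # 200 households, 10% prevalence
remote = [True] * 240 + [False] * 560     # 800 households, 30% prevalence
population = near_road + remote

def prevalence(households):
    """Share of households with a malnourished child."""
    return sum(households) / len(households)

true_prevalence = prevalence(population)
# Every household has an equal chance of being included.
representative = prevalence(rng.sample(population, 100))
# Remote households have no chance of being included.
main_road_only = prevalence(rng.sample(near_road, 100))

print(f"true: {true_prevalence:.2f}")
print(f"representative sample: {representative:.2f}")
print(f"main-road sample: {main_road_only:.2f}")
```

In this setup the main-road estimate can never exceed 20%, because at most 20 of its 100 households are affected, so it stays below the true 26% however the analysis is done.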
Random or biased sample?
A survey of child malnutrition is conducted by measuring the children of women who were advised over the radio to bring their under-fives to the health clinic on Tuesday morning
BIASED
Random or biased sample?
The proportion of the population affected by HIV/AIDS is 5.8%, based on statistics from health facilities that frequently take blood samples from pregnant women
BIASED
Steps for deciding sampling methodology
Define objectives and geographic area
Identify what info to collect
Determine sampling method
Calculate sample size
Additional factors: time available, financial resources, physical access (security)
Types of sampling
Non-probability sampling
Probability sampling
non-probability sampling…
sampling that doesn’t use random selection
to choose units to be examined or measured:
non-representative results
non-probability sampling…
When is it used?
Rapid appraisal methods (e.g. key informant/community group interviews/focus group discussions)
Often used in rapid assessments
Sampling with “a purpose” in mind: generally one or more pre-defined groups or areas to assess
Useful to reach targeted sample quickly
probability sampling…
sampling that uses random selection to
choose units. Results are representative
of the larger population
Pros and cons of probability and non-probability sampling
Factor                                 Probability                 Non-probability
Precision                              ++                          +
Time                                   ++                          +
Cost                                   ++                          +
If lack of access due to insecurity    +                           ++
Skill requirements                     Statistics skills needed    Qualitative analysis skills needed
key concepts for probability sampling
population: the group of people for which indicators are measured
sampling frame: the population list from which the sample is to be drawn
sample: the randomly selected subset of the population
sampling unit: the unit that is selected during the process of sampling (e.g. first stage: community, second stage: household)
Example
A food security and nutrition survey is conducted in Flexiland. 100,000 households live in the area in 1,000 villages. First, 30 villages will be selected. In each village, 15 households will be visited. The household head or spouse reports on all food items consumed by the household over the last 7 days. In addition, all children 6-59 months are measured. On average, households have 1.5 children in this age group.
Identify
• Population
• Sampling frame
• Sample
• Respondent
• Sampling units
Example cont.
Population: Flexiland
Sampling frame:
• First stage: List of villages
• Second stage: List of households within villages
Sample: 450 HHs (30*15), 675 children (450*1.5)
Respondent: Household head or spouse
Sampling units:
• Primary: Villages
• Secondary: Households, children (6-59 months)
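The sample sizes in this example follow directly from the design figures. A minimal Python check, with illustrative variable names:

```python
villages_selected = 30
households_per_village = 15
children_per_household = 1.5  # average number of children aged 6-59 months

sample_households = villages_selected * households_per_village
expected_children = sample_households * children_per_household

print(sample_households, expected_children)  # 450 675.0
```

The number of children is an expectation based on the average, so the count actually measured in the field will differ somewhat.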
Types of probability sampling
A: Simple random
B: Systematic
C: Cluster
A: Simple Random Sampling
Each household/person is randomly selected from the population list.
Easier to use when the population of interest is small and confined to a small geographic area.
Steps:
1. Number each sampling unit
2. Choose a new random number for each selection (random number table or lottery)
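In software, the lottery can be replaced by a pseudo-random number generator. The following Python sketch assumes a hypothetical list of 100 numbered households and uses the standard library's `random.sample`, which draws without replacement so that no unit is selected twice. The function name `simple_random_sample` is illustrative.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(units, n)

# Step 1: number each sampling unit (hypothetical list of 100 households).
households = [f"household {i}" for i in range(1, 101)]

# Step 2: draw the sample; the fixed seed only makes the run repeatable.
sample = simple_random_sample(households, 10, seed=42)
print(sample)
```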
Example: Select 5 people out of 10
Random number table
2352 6959 7678 1937 2554 6804 9098 4316 4318 2346 7276 1880 7136 9603 0163 3152 7000 2865 8357 4475 9804 0042 1106 7949 2932 9958 9582 2235 1140 1164 7841 1688 4097 8995 5030 1785 5420 0125 4953 1332 5540 6278 1584 4392 3258 1374 1617 7427
Number    Household
1         Edmond
2         Daniel
3         Jyoti
4         Victor
5         Anne
6         Sheriff
7         Vandi
8         Iye
9         Victor
0         Rauf
Selection:
1. Person = 2
2. Person = 3
3. Person = 5
4. Person = 6
5. Person = 9
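The five picks above are consistent with reading the table one digit at a time from the start (2, 3, 5, 2, 6, 9, ...) and skipping a digit that has already been drawn. The following Python sketch reproduces that reading; the left-to-right order and the rule for repeats are inferred from the example, not stated in it.

```python
table = "2352 6959 7678 1937 2554 6804 9098 4316"  # start of the table above

households = {1: "Edmond", 2: "Daniel", 3: "Jyoti", 4: "Victor", 5: "Anne",
              6: "Sheriff", 7: "Vandi", 8: "Iye", 9: "Victor", 0: "Rauf"}

def select_by_digit(table, n_select):
    """Read single digits left to right, skipping numbers already selected."""
    selected = []
    for char in table:
        if not char.isdigit():
            continue  # ignore the spaces between table entries
        number = int(char)
        if number not in selected:
            selected.append(number)
        if len(selected) == n_select:
            break
    return selected

picks = select_by_digit(table, 5)
print(picks)                           # [2, 3, 5, 6, 9]
print([households[n] for n in picks])  # ['Daniel', 'Jyoti', 'Anne', 'Sheriff', 'Victor']
```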
Using Random Number Tables
If units < 10, then use 1 digit of table numbers
If units < 100, then use 2 digits of table numbers
If units < 1000, then use 3 digits of table numbers
Example: You want to randomly select 6 out of 71 towns
1. You number them from 1 to 71.
2. Close eyes and place fingertip on the table to start
3. Decide if you want to move right, left, up or down
4. Select first two digits of each number in the table
5. Cross out those that start with 72 or higher
TABLE OF RANDOM NUMBERS
39634 62349 74088 65564 16379 19713 39153 69459 17986 24537
14595 35050 40469 27478 44526 67331 93365 54526 22356 93208
30734 71571 83722 79712 25775 65178 07763 82928 31131 30196
64628 89126 91254 99090 25752 03091 39411 73146 06089 15630
42831 95113 43511 42082 15140 34733 68076 18292 69486 80468
80583 70361 41047 26792 78466 03395 17635 09697 82447 31405
00209 90404 99457 72570 42194 49043 24330 14939 09865 45906
05409 20830 01911 60767 55248 79253 12317 84120 77772 50103
95836 22530 91785 80210 34361 52228 33869 94332 83868 61672
65358 70469 87149 89509 72176 18103 55169 79954 72002 20582
6 villages are selected
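The procedure for the 71 towns can be sketched in Python using the first row of the table above. The starting point (top left) and the direction (rightwards) are assumptions made for illustration, since the steps leave both to chance. The sketch also skips 00 and repeated numbers, which the steps do not mention explicitly.

```python
def select_from_table(entries, n_units, n_select):
    """Use as many leading digits of each entry as n_units has digits."""
    width = len(str(n_units))  # 71 towns -> 2 digits
    selected = []
    for entry in entries:
        number = int(entry[:width])
        if number < 1 or number > n_units:
            continue  # cross out 00 and anything from 72 upwards
        if number in selected:
            continue  # a town can only be selected once
        selected.append(number)
        if len(selected) == n_select:
            break
    return selected

first_row = "39634 62349 74088 65564 16379 19713 39153 69459 17986 24537".split()
towns = select_from_table(first_row, n_units=71, n_select=6)
print(towns)  # [39, 62, 65, 16, 19, 69]
```

A different starting point or direction would give a different, equally valid selection.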
Class exercise
Randomly select 4 members of this class using the random number table
Random number table
3647 2352 6959 1937 2554 6804 9098 4316 4318 2346 7276 1880 7136
9603 0163 3152 7000 2865 8357 4475 9804 0042 1106 7949 2932 9958
9582 2235 1140 1164 7841 1688 4097 8995 5030 1785 5420 0125 4953
1332 5540 6278 1584 4392 3258 1374 1617 7427 3320
Using SPSS
SPSS can help to randomly select cases by using the “select cases” function
Data > Select cases > Random sample of cases (option 1: xx% of all cases; option 2: x cases from the first x cases)
Simple Random Sampling
B: Systematic Random Sampling
Similar to simple random sampling; works well in well-organized refugee/IDP camps or neighborhoods
• First person chosen randomly
• Systematic selection of subsequent people
• Statistics same as simple random sampling
Steps:
• List or map all units in the population
• Compute the sampling interval (population size / sample size)
• Select a random start between 1 and the sampling interval
• Repeatedly add the sampling interval to select subsequent sampling units
Example 1 (household list): selection of 15 households in a community of 47 households
1. Peter Smith
2. John Edward
3. Mary McLean
4. George Williams
5. Morris Tamba
6. Sayba Kolubah
7. James Tamba
8. Clifford Howard
9. Thomas Tarr
10. Jerry Morris
11. Jules Sana
12. Lisa Miller
13. David Harper
14. Peter Smith
15. John Edward
16. Mary McLean
17. George Williams
18. Morris Tamba
19. Sayba Kolubah
20. James Tamba
21. Clifford Howard
22. Thomas Tarr
23. Jerry Morris
24. Lisa Miller
25. David Harper
26. Hilary Scott
27. Smith Suba
28. Zoe Mulbah
29. Roosevelt Hill
30. Johnson Snow
31. Salif Jensen
32. Fassou Clements
33. Massa Kru
34. Emanuel Liberty
35. Stella Morris
36. Peter Smith
37. John Edward
38. Mary McLean
39. George Williams
40. Morris Tamba
41. Sayba Kolubah
42. James Tamba
43. Clifford Howard
44. Thomas Tarr
45. Jerry Morris
46. Lisa Miller
47. David Harper
Sampling interval: 47/15 = 3.1, rounded down to 3
Randomly select a starting point: 1, 2 or 3 (counting, lottery)
15 HHs are selected
Systematic Sampling
Example 2 (refugee camp): selection of 40 households in a camp made up of 480 households
Sampling interval: 480/40 = 12
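Both examples follow the same steps, sketched here in Python. The function name is illustrative, the interval is rounded down to a whole number as in the 47-household example, and the random start is drawn by the computer rather than by lottery.

```python
import random

def systematic_sample(population_size, sample_size, rng=random):
    """Return the interval and the 1-based list positions of the selected units."""
    interval = population_size // sample_size  # e.g. 47 // 15 = 3
    start = rng.randint(1, interval)           # random start between 1 and the interval
    positions = [start + k * interval for k in range(sample_size)]
    return interval, positions

rng = random.Random(7)  # fixed seed so the run is repeatable

# Example 1: 15 households out of 47.
interval_1, community = systematic_sample(47, 15, rng)
print(interval_1, community)

# Example 2: 40 households out of 480.
interval_2, camp = systematic_sample(480, 40, rng)
print(interval_2, camp)
```

Rounding the interval down means the last units on the list (46 and 47 in Example 1) can never be selected; using a fractional interval is one common way around this.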
Example 1: Which sampling method if no registration took place yet?
Stankovic I camp, Macedonia
Example 2: Which sampling method if registration already took place?
Chaman camp, Pakistan
Example 3: Which sampling method?
Kabumba camp, Zaire
What is required for both simple and systematic random sampling?
Both require a complete list of
sampling units arranged in some order.
C: Cluster Sampling
What do we do when no accurate list of all basic sampling units is available?
Used when sampling frame or geographic area is large
Saves time and resources
Objective: To choose smaller geographic areas in which simple or systematic random sampling can be done
Two-stage Cluster Sampling
1st stage: sites (= “clusters”) are selected using ‘probability proportional to size (PPS)’ methodology
2nd stage: within each cluster, households are randomly selected
Example 1: 25 clusters per district, 15 households per cluster = 375 households in each district
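A common way to implement the first stage is systematic PPS selection on a cumulative population list. The slides name PPS but do not spell out the procedure, so the Python sketch below is one possible reading. The 1,000 community sizes are randomly generated placeholders, not real data, and the function name is illustrative.

```python
import bisect
import itertools
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical frame: number of households in each of 1,000 communities.
sizes = [rng.randint(50, 150) for _ in range(1000)]

def pps_systematic(sizes, n_clusters, rng):
    """Select cluster indices with probability proportional to size."""
    cumulative = list(itertools.accumulate(sizes))
    interval = cumulative[-1] / n_clusters
    start = rng.random() * interval  # random start within the first interval
    chosen = []
    for k in range(n_clusters):
        point = start + k * interval
        index = bisect.bisect_right(cumulative, point)
        chosen.append(min(index, len(sizes) - 1))  # guard against float rounding
    return chosen

# 1st stage: 25 clusters. The interval is far larger than any community here,
# so no community can be selected twice.
clusters = pps_systematic(sizes, 25, rng)

# 2nd stage: 15 households drawn at random within each selected cluster.
households = {c: rng.sample(range(1, sizes[c] + 1), 15) for c in clusters}

print(len(clusters), sum(len(h) for h in households.values()))  # 25 375
```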
Two-stage Cluster Sampling in Flexiland
1. Step: Select randomly 25 communities
2. Step: Within each cluster (community), select 15 households using random or systematic random sampling
Flexiland
Example 4: Which sampling method?
1500 kms
Stratification
Stratification is the process of grouping members of the population into relatively homogeneous subgroups (e.g. regions, districts, livelihood zones)
The strata should be mutually exclusive: every element in the population must be assigned to only one stratum
Within each stratum, random, systematic or two-stage cluster sampling is applied
Advantages:
• Sub-groups can be compared
• Representativeness is improved as the units within each stratum are more homogeneous
During the analysis, weighting is used to generate results that are representative at the aggregate level (e.g. nation, rural/urban population)
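The weighting step amounts to a population-weighted average of the stratum results. The Python sketch below uses two hypothetical strata with made-up population sizes and prevalences, and compares the weighted aggregate with a naive unweighted average.

```python
# Hypothetical strata: population size and prevalence estimated from each stratum's sample.
strata = {
    "urban": {"population": 40_000, "prevalence": 0.10},
    "rural": {"population": 60_000, "prevalence": 0.30},
}

total_population = sum(s["population"] for s in strata.values())

# Each stratum counts in proportion to its share of the total population.
weighted = sum(s["population"] * s["prevalence"] for s in strata.values()) / total_population

# Ignoring the weights treats both strata as if they were equally large.
unweighted = sum(s["prevalence"] for s in strata.values()) / len(strata)

print(round(weighted, 2), round(unweighted, 2))  # 0.22 0.2
```

The gap between the two figures grows with the difference in stratum sizes and in stratum results.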
Example 5: How many strata?
Example 6: How many strata?
Final panel exercise:
Which sampling method would you choose?
Rapid emergency food security assessments following a flood in the Northern Atlantic Coast region of Nicaragua?
Nutrition survey in IDP-camp in Darfur?
Comprehensive Food Security and Vulnerability Analysis (CFSVA) in Zambia?
Market assessment in Yemen?
Questions