1. Objectives and Challenges 4. Sampling Weight · 2021. 4. 21. · Objectives and Challenges:...
Outline
1. Objectives and Challenges
2. Proposed Methodology
3. Sampling Approach
4. Sampling Weight
5. Implementation
6. Challenges
Introduction to adaptive sampling designs 1
Objectives and Challenges: Informal sector
• The informal sector is believed to account for a significant share of the economy in many developing countries.
• It is estimated at 55% of Sub-Saharan Africa’s (SSA) GDP (AfDB 2013).
• The 2014 Zimbabwe Labour Force Survey (LFS) shows that about 94% of people employed in the non-farm sector are in informal employment (up from 84% in 2011).
• The official estimate of the unemployment rate is 11%.
• It is a means of livelihood for the poor and vulnerable segments of society.
• Policy-making cannot ignore this sector: there is a need to better understand its operations and needs.
• A better understanding requires better data.
Objectives and Challenges: Objectives
i. Collect in-depth information on the business environment facing
informal businesses
• Key business environment and performance indicators
• Information specific to the informal sector (e.g. barriers to registration)
ii. Derive estimates of the TOTAL number of informal firms in Harare
Objectives and Challenges: Challenges
• Non-availability of a sampling frame
• Units are expected to be clustered across the survey area
➢ The distribution is expected to be heterogeneous within the survey area
• Construction of a list frame from other data sources is not feasible.
Conduct a representative survey by using a probability sample
Proposed Methodology (1)
1. Use of a spatial area frame:
➢ Meets the criteria of a good sampling frame (exhaustive, up-to-date; see Greig-Smith, 1964)
2. Stratified Adaptive Cluster Sampling (Thompson 1990, 1991)
➢ Addresses both the clustering of units and their heterogeneous distribution
The approach was originally developed for biostatistical applications; applications to human populations are rare.
Selection of a suitable sampling approach to address both challenges
THE PSU SAMPLING FRAME
• The total number of PSUs covering the area was 20,510.
• The frame was stratified according to the assumed density of final sampling units in the area.
• Sample sizes were 100, 200 and 300 units for high-, medium- and low-density areas, respectively.
• Sample sizes were derived through simulation.
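The stratified selection of starting squares can be sketched as independent simple random sampling without replacement (SRSWOR) within each stratum. This is a minimal Python sketch under stated assumptions: the frame is held as a dictionary mapping each stratum to its list of square IDs, the stratum sizes in the toy frame are made up, and `select_starting_squares` and `base_weights` are hypothetical helper names, not part of the original R/shiny tool. The base weight of a starting square is N_h/n_h, the inverse of its selection probability n_h/N_h.

```python
import random


def select_starting_squares(frame, sample_sizes, seed=None):
    """Draw an SRSWOR of starting squares independently within each stratum.

    frame        -- dict: stratum name -> list of square (PSU) IDs (assumed layout)
    sample_sizes -- dict: stratum name -> number of starting squares n_h
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, squares in frame.items():
        n_h = sample_sizes[stratum]
        if n_h > len(squares):
            raise ValueError(f"stratum {stratum!r}: n_h={n_h} exceeds N_h={len(squares)}")
        sample[stratum] = rng.sample(squares, n_h)
    return sample


def base_weights(frame, sample_sizes):
    """Design weight N_h / n_h of each starting square, by stratum."""
    return {h: len(frame[h]) / sample_sizes[h] for h in frame}


# Hypothetical toy frame: stratum sizes are invented for illustration only.
frame = {
    "high": [f"H{i}" for i in range(1000)],
    "medium": [f"M{i}" for i in range(3000)],
    "low": [f"L{i}" for i in range(6000)],
}
sizes = {"high": 100, "medium": 200, "low": 300}  # sample sizes from the slide

sample = select_starting_squares(frame, sizes, seed=1)
weights = base_weights(frame, sizes)
```

Drawing each stratum with its own call keeps the strata independent, which is what allows the stratum-level weights to be computed separately.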
Proposed Methodology – Construction of the grid
1. Use of a shapefile with area boundaries and stratum identifiers
➢ Stratification accounts for an (expected) heterogeneous distribution.
2. Choice of an appropriate cell size
➢ Needs to balance the expected workload (i.e. the number of final sampling units) and area coverage.
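Choosing the cell size amounts to deciding how many units fall into each square. A minimal sketch, assuming unit locations are available as projected (x, y) coordinates and that the square grid is anchored at a hypothetical origin (x0, y0); in practice the grid would be built from the shapefile, which this sketch does not read. Comparing the per-square counts for two candidate cell sizes shows the trade-off between workload per square and area coverage.

```python
import math
from collections import Counter


def cell_index(x, y, x0, y0, cell_size):
    """(column, row) of the grid square containing point (x, y).

    The grid is anchored at the lower-left corner (x0, y0); coordinates are
    assumed to be in a projected system (e.g. metres).
    """
    return (math.floor((x - x0) / cell_size), math.floor((y - y0) / cell_size))


def units_per_cell(points, x0, y0, cell_size):
    """Count units per occupied square for a candidate cell size."""
    return Counter(cell_index(x, y, x0, y0, cell_size) for x, y in points)


# Hypothetical unit locations (metres from the grid origin).
points = [(10, 10), (40, 20), (60, 10), (160, 170), (170, 180)]

coarse = units_per_cell(points, 0, 0, cell_size=100)  # fewer, busier squares
fine = units_per_cell(points, 0, 0, cell_size=50)     # more, lighter squares
```

Halving the cell size here splits one busy square into two lighter ones, which is the workload side of the trade-off.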
Proposed Methodology – Determination of the sample size
1. An analytical solution is not feasible.
➢ However, the relative efficiency of adaptive cluster sampling (ACS) compared with simple random sampling (SRS) has been demonstrated.
2. Use of an empirical micro-simulation approach (Meindl et al., 2017)
➢ Construction of a synthetic population including mean sales as the target variable.
Stratum   Number of units   Avg. units per square   Mean sales
Low       3679              4.39                    1018.7815
Medium    16223             7.14                    1002.3281
High      12757             15.69                   995.1539
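A synthetic population of this kind can be sketched in Python as a clustered point process with a right-skewed sales variable. This is only an illustrative stand-in for the empirical micro-simulation approach described above: the cluster process, the lognormal sales distribution and every parameter value below are assumptions, not the values used to produce the table. `summarise` reproduces the table's three columns, assuming that "Avg. units per square" refers to occupied squares.

```python
import math
import random
from collections import Counter
from statistics import mean


def synthetic_population(n_clusters, mean_size, spread, extent, seed=None):
    """Hypothetical clustered population of firms: list of (x, y, sales).

    Cluster centres are uniform on an extent x extent square; each cluster has
    between 1 and 2*mean_size - 1 firms (mean = mean_size), scattered with
    Gaussian noise and wrapped around the edges (a torus) to avoid edge effects.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n_clusters):
        cx, cy = rng.uniform(0, extent), rng.uniform(0, extent)
        for _ in range(rng.randint(1, 2 * mean_size - 1)):
            x = rng.gauss(cx, spread) % extent
            y = rng.gauss(cy, spread) % extent
            sales = rng.lognormvariate(6.4, 0.9)  # illustrative parameters
            units.append((x, y, sales))
    return units


def summarise(units, cell_size):
    """Number of units, average units per occupied square, mean sales."""
    occupied = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y, _ in units
    )
    return {
        "units": len(units),
        "avg_units_per_square": len(units) / len(occupied),
        "mean_sales": mean(s for _, _, s in units),
    }


pop = synthetic_population(n_clusters=200, mean_size=8, spread=30, extent=2000, seed=42)
stats = summarise(pop, cell_size=100)
```

Running the generator once per stratum, with a different cluster density and size in each, would give a population whose strata differ in the way the table does.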
Proposed Methodology – Determination of the sample size
1. Vary the population parameters (e.g. the distribution and density of units).
2. Vary the design parameters (e.g. square size, number of starting squares, expansion factor).
3. Repeat the sampling design n times.
4. Compare the estimated outcome with the (known) true population mean.
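Steps 1 to 4 can be sketched as a Monte Carlo loop. The sketch is a simplification under stated assumptions: a single stratum held as a grid of firm counts, an expansion condition of `count >= threshold` with four-neighbour (rook) adjacency, an initial SRSWOR of `n0` squares, and the modified Hansen-Hurwitz estimator of Thompson (1990). It estimates the total number of firms (objective ii) rather than mean sales, and `network_labels` and `simulate` are hypothetical names. Varying `threshold` and `n0` across runs corresponds to step 2.

```python
import random
from statistics import mean


def network_labels(grid, threshold):
    """Assign a network id to every square of a 2-D grid of counts.

    Squares with count >= threshold that touch via rook adjacency form one
    network; every other square is a network of size one.
    """
    rows, cols = len(grid), len(grid[0])
    label = [[-1] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if label[r][c] != -1:
                continue
            label[r][c] = next_id
            if grid[r][c] >= threshold:
                stack = [(r, c)]  # flood fill over qualifying neighbours
                while stack:
                    i, j = stack.pop()
                    for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if (0 <= a < rows and 0 <= b < cols
                                and label[a][b] == -1 and grid[a][b] >= threshold):
                            label[a][b] = next_id
                            stack.append((a, b))
            next_id += 1
    return label


def simulate(grid, threshold, n0, reps, seed=None):
    """Repeat the design `reps` times; return (true total, bias, RMSE)."""
    rng = random.Random(seed)
    label = network_labels(grid, threshold)
    totals, sizes = {}, {}
    for r, row in enumerate(grid):
        for c, y in enumerate(row):
            k = label[r][c]
            totals[k] = totals.get(k, 0) + y
            sizes[k] = sizes.get(k, 0) + 1
    net_mean = {k: totals[k] / sizes[k] for k in totals}
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    true_total = sum(totals.values())

    estimates = []
    for _ in range(reps):
        start = rng.sample(cells, n0)
        # Modified Hansen-Hurwitz: average the network means of the starting squares.
        mu_hat = mean(net_mean[label[r][c]] for r, c in start)
        estimates.append(len(cells) * mu_hat)

    bias = mean(estimates) - true_total
    rmse = mean((e - true_total) ** 2 for e in estimates) ** 0.5
    return true_total, bias, rmse


# Usage on a hypothetical 20 x 20 grid of firm counts.
grid_rng = random.Random(0)
grid = [[grid_rng.choice([0, 0, 0, 1, 2, 9]) for _ in range(20)] for _ in range(20)]
print(simulate(grid, threshold=5, n0=40, reps=500, seed=1))
```

The estimator is design-unbiased, so the bias should shrink towards zero as `reps` grows, while the RMSE is the quantity to compare across candidate values of `n0` and `threshold`.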
Implementation – Selection of the starting squares
1. Each starting square defines a potential network.
2. The starting square is expanded if the (stratum-specific) threshold level is reached.
3. All squares surrounding the square that triggered the expansion are surveyed.
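The expansion rule in steps 1 to 3 can be sketched as a breadth-first search that starts from one square and keeps adding neighbours while the threshold is met. This is a minimal sketch, assuming square counts are known as a dictionary keyed by (row, column) and that "surrounding" means the four rook neighbours. Passing `QUEEN` switches to all eight surrounding squares. `expand`, `ROOK` and `QUEEN` are hypothetical names. Squares that are visited but fall below the threshold are surveyed but trigger no further expansion.

```python
from collections import deque

ROOK = ((1, 0), (-1, 0), (0, 1), (0, -1))
QUEEN = ROOK + ((1, 1), (1, -1), (-1, 1), (-1, -1))


def expand(counts, start, threshold, neighbours=ROOK):
    """Survey a starting square and expand adaptively.

    counts     -- dict (row, col) -> number of units found in that square;
                  only squares inside the frame/stratum are keys (assumption)
    start      -- (row, col) of the starting square
    threshold  -- stratum-specific expansion threshold
    Returns (network, surveyed): squares meeting the threshold that were
    reached, and every square visited (network plus edge squares).
    """
    surveyed = {start}
    network = set()
    queue = deque([start])
    while queue:
        square = queue.popleft()
        if counts.get(square, 0) < threshold:
            continue  # below threshold: surveyed, but no further expansion
        network.add(square)
        r, c = square
        for dr, dc in neighbours:
            nb = (r + dr, c + dc)
            if nb in counts and nb not in surveyed:
                surveyed.add(nb)
                queue.append(nb)
    return network, surveyed


counts = {(0, 0): 6, (0, 1): 2, (1, 0): 7, (1, 1): 0, (2, 0): 1, (0, 2): 9}
network, surveyed = expand(counts, start=(0, 0), threshold=5)
workload = len(surveyed)  # squares the enumerator has to visit
```

In this toy example the square at (0, 2) meets the threshold but is never reached, because its only rook neighbour in the frame, (0, 1), is below the threshold and stops the expansion.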
Proposed Methodology – Determination of the sample size
• The simulation code is implemented as a web application using R’s shiny package (Chang et al., 2017).
• Results also include the (expected) expansion and therefore facilitate survey cost estimation.
Stratum   Level   Number of units   Avg. units per square   Mean sales
Low       0       38                7.6                     1122.6063
Medium    0       137               8.56                    1003.1202
Medium    1       29                7.25                    865.1046
Medium    2       30                6                       921.7877
Medium    4       7                 3.5                     1014.4759
High      0       366               17.43                   1036.9261
High      1       242               17.29                   1046.3947
High      2       96                16                      887.846
High      3       66                22                      997.4784
High      4       36                18                      1069.0724
Proposed Methodology – Computations of Weight
1. Weights for the starting squares.
2. Weights for the squares subject to expansion (based on the intersection probability).
3. (Accounting for stratum overlap and edge units).
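1. $p_0 = \frac{n_0}{N}$
2. $p_1 = 1 - \binom{N - m_i}{n_0} \Big/ \binom{N}{n_0}$
where $N$ is the number of squares in the stratum, $n_0$ the number of starting squares and $m_i$ the number of squares in network $i$.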
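Both probabilities can be computed directly with Python's `math.comb`. A minimal sketch, assuming the starting squares are an SRSWOR within one stratum, with hypothetical values N = 1000 and n_0 = 100. A network consisting of a single square has p_1 = p_0, which is a useful check. The inverse probabilities then give a Horvitz-Thompson estimate of a total from the distinct networks intersected by the starting squares, with edge units excluded as in Thompson (1990). The function names are illustrative.

```python
from math import comb


def p0(N, n0):
    """Selection probability of a starting square under SRSWOR."""
    return n0 / N


def p1(N, n0, m_i):
    """Probability that network i (m_i squares) is intersected by the initial sample."""
    return 1 - comb(N - m_i, n0) / comb(N, n0)


def ht_total(networks, N, n0):
    """Horvitz-Thompson estimate of a total.

    networks -- list of (m_i, y_i) for the distinct networks intersected by the
                starting squares, y_i being the network total (e.g. firms).
    """
    return sum(y_i / p1(N, n0, m_i) for m_i, y_i in networks)


N, n0 = 1000, 100                 # hypothetical stratum
weight_start = 1 / p0(N, n0)      # weight of a starting square
estimate = ht_total([(1, 4), (2, 10)], N, n0)
```

Larger networks have a higher intersection probability and therefore a smaller weight, which offsets the fact that the expansion makes them more likely to enter the sample.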
Implementation – Sampling Design
1. Create the grid and a synthetic population to test the design.
2. Run a simulation based on the provided parameters to advise on the sample size (i.e. the number of PSUs).
3. Provide estimates of the expected workload.
Implementation – Questionnaire and Navigation
1. Develop a “smart” questionnaire
➢ Count the number of units in the square
➢ Expand the square in line with the predefined rule
➢ Show the enumerator the correct boundary file on the map
2. Create navigation files that can be used on the tablets.
3. Combine the two instruments so that they are usable even in low-skill environments.
Implementation – Monitoring and Adjustments
Fieldwork update
• 141 networks completed.
• 1840 informal firms listed
• 469 standard interviews completed; refusal rate of about 4%
Fieldwork update…
[Figure: Completed squares and yield per square by stratum (High, Medium, Low). Series: sample, completed networks, completed squares, average firms per square, % resulting in expansion.]
[Figure: Share of firms and response rate by stratum (High, Medium, Low). Series: % of firms found, % of long-form questionnaires, refusal rate (%).]
Fieldwork update…
• Sectoral composition, based on data for 1017 listed and 261 interviewed
firms.
Sector          Listed firms   Interviewed firms
Manufacturing   103 (10%)      43 (16.5%)
Retail          754 (74%)      175 (67%)
Services        160 (16%)      43 (16.5%)
Some Descriptives
SALES (variable: sales)
Percentiles: 1%: 20; 5%: 35; 10%: 50; 25%: 100; 50%: 250; 75%: 600; 90%: 1500; 95%: 2000; 99%: 4500
Four smallest values: 20, 20, 20, 25
Four largest values: 4000, 4500, 4800, 5400
Obs: 245; Sum of Wgt.: 245; Mean: 555.6939; Std. Dev.: 846.0865; Variance: 715862.4; Skewness: 3.055291; Kurtosis: 13.56255
Some Descriptives
[Figure: Kernel density estimates of sales (Epanechnikov kernel, bandwidth = 111.0113) and of ln_sales (Epanechnikov kernel, bandwidth = 0.3747).]
Some Descriptives
[Figure: Density of emp_total.]
Mean number of paid employees: 1.24
Median number of paid employees: 1.00
Mean number of unpaid employees: 0.47
Median number of unpaid employees: 0.00
Mean number of female employees: 0.82
Median number of female employees: 1.00
Some Descriptives
DURATION FOR LONG FORM (variable: time)
Percentiles: 1%: 8.033334; 5%: 11.90833; 10%: 12.96667; 25%: 16.6; 50%: 20.25833; 75%: 24.28333; 90%: 27.89167; 95%: 31.74167; 99%: 41.96667
Four smallest values: 0.3666667, 0.5833333, 8.033334, 9.966666
Four largest values: 40.98333, 41.96667, 42.15, 117.5833
Obs: 260; Sum of Wgt.: 260; Mean: 20.92051; Std. Dev.: 8.613615; Variance: 74.19437; Skewness: 5.585522; Kurtosis: 62.62024

DURATION FOR SHORT FORM (variable: time)
Percentiles: 1%: 1.166667; 5%: 1.533333; 10%: 1.683333; 25%: 2.083333; 50%: 2.616667; 75%: 3.366667; 90%: 4.308333; 95%: 5.583333; 99%: 12.76667
Four smallest values: 0.45, 0.5833333, 1.1, 1.1
Four largest values: 14, 17.93333, 49.91667, 357.3167
Obs: 680; Sum of Wgt.: 680; Mean: 3.530686; Std. Dev.: 13.79753; Variance: 190.3719; Skewness: 24.90674; Kurtosis: 637.7595
REFERENCES
Chang, W., Cheng, J., Allaire, J. J., Xie, Y., & McPherson, J. (2017). shiny: Web Application Framework for R. R package version 1.0.0. https://CRAN.R-project.org/package=shiny
Greig-Smith, P. (1964). Quantitative plant ecology (2nd ed.). London: Butterworths.
Meindl, B., Templ, M., Alfons, A., & Kowarik, A., with contributions from Ribatet, M. (2017). simPop: Simulation of Synthetic Populations for Survey Data Considering Auxiliary Information. R package version 0.6.0. https://CRAN.R-project.org/package=simPop
Solomon, H., & Zacks, S. (1970). Optimal design of sampling from finite populations: A critical review and indication of new research areas. Journal of the American Statistical Association, 65(330), 653-677.
Sudman, S., Sirken, M., & Cowan, C. D. (1988). Sampling rare and elusive populations. Science, 240(4855), 991.
Thompson, S. K. (1990). Adaptive cluster sampling. Journal of the American Statistical Association, 85(412), 1050-1059.
Thompson, S. K. (1991). Stratified adaptive cluster sampling. Biometrika, 78(2), 389-397.
Thank You!