1 Cluster sampling DEFN: A cluster is a group of observation units (or “elements”)
valerie-bell -
1
Cluster sampling
DEFN: A cluster is a group of observation units (or “elements”).

Population                Obs unit     Cluster
U.S. residents            person       household
Lincoln households        household    city block, or postal route
UNL employees             employee     department
Maple trees in Vermont    tree         1 km x 1 km plot
2
Cluster sample
DEFN: A cluster sample is a probability sample in which a sampling unit is a cluster.

Frame                      SU              OU
List of phone numbers      phone number    person
List of blocks             block           household
List of UNL departments    department      faculty member
List of plots              plot            tree
3
Cluster sample – 2
1-stage cluster sampling
- Divide the population (of $N$ elements) into $N_I$ clusters (of size $N_i$ for cluster $i$)
  - Cluster = group of elements
  - An element belongs to 1 and only 1 cluster
- Sampling unit
  - Cluster = group of elements = PSU = primary sampling unit
  - Can use any design to select clusters (ST, PPS)
- Data collection
  - Collect information on ALL elements in the cluster
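The selection step can be sketched in a few lines of Python. This is a minimal illustration that assumes an SRS of clusters; the function name `one_stage_cluster_sample`, the dictionary layout, and the toy population are hypothetical and not part of the slides.

```python
import random

def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Draw an SRS of clusters and keep every element of each sampled cluster.

    clusters: dict mapping cluster id -> list of elements (hypothetical layout).
    """
    rng = random.Random(seed)
    # Sampling unit is the cluster: SRS of cluster ids, without replacement.
    sampled_ids = rng.sample(sorted(clusters), n_clusters)
    # Data collection: observe ALL elements in each sampled cluster.
    return {cid: list(clusters[cid]) for cid in sampled_ids}

# Hypothetical population: N_I = 4 clusters with unequal sizes N_i.
population = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1"],
}
sample = one_stage_cluster_sample(population, n_clusters=2, seed=1)
```

Each element appears under exactly one cluster id, matching the requirement that an element belongs to 1 and only 1 cluster.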
4
1-stage CS vs. ST
[Figure: two panels comparing a stratified sample (ST) with a 1-stage cluster sample (CS)]
ST: Take an SRS from every stratum
- A block of cells is a stratum
- SU is an element (or OU)
- Sample from every stratum
1-stage CS: Take an SRS of clusters; observe all elements within the clusters in the sample
- A block of cells is a cluster
- SU is a cluster
- Don’t sample from every cluster
Sample of 40 elements
5
Cluster vs. stratified sampling
Cluster sample
- Divide $N$ elements into $N_I$ clusters
  - Cluster or PSU $i$ has $N_i$ elements
    $$N = \sum_{i=1}^{N_I} N_i$$
- Take a sample of $n_I$ clusters
Stratified sampling
- $N$ elements divided into $H$ strata
  - An element belongs to 1 and only 1 stratum
- Take a sample of $n$ elements, consisting of $n_h$ elements from stratum $h$ for each of the $H$ strata
6
Cluster sample – 3
2-stage cluster sampling
- Process
  - Select PSUs (stage 1)
  - Select elements within each sampled PSU (stage 2)
- First stage sampling unit is a …
  - PSU = primary sampling unit = cluster
- Second stage sampling unit is a …
  - SSU = secondary sampling unit = element = OU
- Only collect data on the SSUs that were sampled from the cluster
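The two stages can be sketched as follows. This is only an illustration that assumes an SRS at both stages and the same target subsample size in every PSU; the function name `two_stage_cluster_sample`, its parameters, and the toy population are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_psus, m_per_psu, seed=None):
    """Stage 1: SRS of PSUs. Stage 2: SRS of SSUs within each sampled PSU.

    clusters: dict mapping PSU id -> list of SSUs (hypothetical layout).
    """
    rng = random.Random(seed)
    # Stage 1: select PSUs (clusters).
    sampled_psus = rng.sample(sorted(clusters), n_psus)
    sample = {}
    for psu in sampled_psus:
        ssus = clusters[psu]
        # Stage 2: select elements within the sampled PSU
        # (take the whole PSU if it has fewer than m_per_psu elements).
        m_i = min(m_per_psu, len(ssus))
        sample[psu] = rng.sample(ssus, m_i)
    return sample

# Hypothetical population of 4 PSUs.
population = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2", "c3", "c4"],
    "D": ["d1"],
}
sample = two_stage_cluster_sample(population, n_psus=2, m_per_psu=2, seed=1)
```

Data are then collected only on the SSUs listed in `sample`, not on the rest of each sampled PSU.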
7
1-stage vs. 2-stage cluster sampling
[Figure: two panels showing the sampled PSUs and the SSUs selected within them]
Panel 1: 1-stage cluster sample (stop here) OR Stage 1 of 2-stage cluster sample (select PSUs)
- 1-stage: Sample all SSUs in sampled PSUs
Panel 2: Stage 2 of 2-stage cluster sample (select SSUs w/in PSUs)
- Take an SRS of $m_i$ SSUs in sampled PSU $i$
8
Why use cluster sampling?
- May not have a list of OUs for a frame, but a list of clusters may be available
  - List of Lincoln phone numbers (= group of residents) is available, but a list of Lincoln residents is not available
  - List of all NE primary and secondary schools (= group of students) is available, but a list of all students in NE schools is not available
- May be cheaper to conduct the study if OUs are clustered
  - Occurs when cost of data collection increases with distance between elements
  - Household surveys using in-person interviews (household = cluster of people)
  - Field data collection (plot = cluster of plants, or animals)
9
Defining clusters due to frame limitations
- A cluster (or PSU) is a group of elements corresponding to a record (row) in the frame
- Example
  - Population = employees in McDonald’s franchises
  - Element = employee
  - Frame = list of McDonald’s stores
  - PSU = store = cluster of employees
10
Defining clusters to reduce travel costs
- A cluster (or PSU) is a group of nearby elements
- Example
  - Population = all farms
  - Element = farm
  - Frame = list of sections (1 mi x 1 mi areas) in rural area
  - PSU = section = cluster of farms
11
Cluster samples usually lead to less precise estimates
- Elements within clusters tend to be correlated due to exposure to similar conditions
  - Members of a household
  - Employees in a business
  - Plants or soil within a field plot
- We get less information than if we had selected the same number of unrelated elements
- Example: select a sample of city blocks (clusters of households)
  - Ask each household: Should the city upgrade the storm sewer system?
  - PSU (city block) 1: no storm sewer, so households will tend to say yes
  - PSU (city block) 2: new development, so households will tend to say no
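The loss of precision can be made concrete by enumerating every possible sample from a tiny population. The population below is entirely hypothetical (4 city blocks of 2 households each, with identical answers within a block), chosen as an extreme case of within-cluster correlation; it is a sketch, not data from the slides.

```python
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical population: 4 blocks x 2 households; 1 = "yes", 0 = "no".
# Households in the same block give the same answer (perfect correlation).
blocks = [[1, 1], [1, 1], [0, 0], [0, 0]]
households = [y for block in blocks for y in block]

# All possible 1-stage cluster samples of 2 blocks (4 households each).
cluster_means = [
    mean(y for block in pair for y in block)
    for pair in combinations(blocks, 2)
]

# All possible SRSs of 4 households drawn directly from the 8 households.
srs_means = [mean(s) for s in combinations(households, 4)]

# Exact design variance of the sample proportion under each design
# (every possible sample is equally likely, so pvariance is appropriate).
var_cluster = pvariance(cluster_means)
var_srs = pvariance(srs_means)
```

Both designs observe 4 households and both sample proportions average to 0.5, but here `var_cluster` is 1/12 while `var_srs` is 1/28, so the cluster sample is less precise for the same number of elements.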
12
Defining clusters for improved precision
- Define clusters for which within-cluster variation is high (rarely possible)
  - Make each cluster as heterogeneous as possible
    - Like making each cluster a mini-population that reflects variation in population
  - Minimizes the amount of correlation among elements in the cluster
  - Opposite of the approach to stratification
    - Large variation among strata, homogeneous within strata
- Define clusters that are relatively small
  - Extreme case is cluster = element
  - Decreasing the number of correlated observations in the sample
13
Example for single-stage cluster sampling w/ equal prob (CSE1)
- Dorm has $N_I = 100$ suites (clusters)
- Each suite has $N_i = 4$ students (4 elements in cluster $i$, $i = 1, 2, \dots, N_I$)
  - Note that there are
    $$N = \sum_{i=1}^{N_I} N_i = 100(4) = 400 \text{ students in population}$$
- Take SRS of $n_I = 5$ suites (clusters)
- Ask each student living in each of the 5 suites
  - How many nights per week do you eat dinner in the dining hall?
- Will get observations from a sample of 20 students = 5 suites x 4 students/suite
14
Dorm example – 2

Student    Suite 6    Suite 21    Suite 28    Suite 54    Suite 89
1          5          3           6           5           1
2          5          2           4           4           4
3          4          4           4           6           3
4          6          5           5           6           2
Total      20         14          19          21          10
15
Dorm example – 3
- SRS of $n_I = 5$ dorm rooms
- Data on each cluster (all students in dorm room)
  - $t_i$ = total number of dining hall dinners for dorm room $i$
  - $t_2 = 14$ dining hall dinners for 4 students in dorm room 2
- Estimated total number of dining hall nights for the dorm students
  - HT estimator of total = pop size x sample mean (of cluster totals)
    $$\hat{t} = \frac{N_I}{n_I} \sum_{i=1}^{n_I} t_i = 100 \cdot \frac{1}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680 \text{ dining hall dinners}$$
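The same arithmetic can be checked with a short script. The data are the five sampled suites from the dorm example; the variable names are illustrative choices, not notation from the slides.

```python
# Dining-hall dinners per week for each student in the 5 sampled suites.
suites = {
    6: [5, 5, 4, 6],
    21: [3, 2, 4, 5],
    28: [6, 4, 4, 5],
    54: [5, 4, 6, 6],
    89: [1, 4, 3, 2],
}

N_I = 100          # suites in the dorm (population number of clusters)
n_I = len(suites)  # suites in the sample

# Cluster totals t_i: sum over all students in each sampled suite.
cluster_totals = {suite: sum(ys) for suite, ys in suites.items()}

# Estimated population total: N_I times the sample mean of cluster totals.
t_hat = N_I * sum(cluster_totals.values()) / n_I
```

With these data `cluster_totals` holds 20, 14, 19, 21 and 10, and `t_hat` evaluates to 1680.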
16
Notation
- Response variable for SSU $j$ in PSU $i$: $y_{ij}$
  - e.g., age of $j$-th resident in household $i$
  - e.g., whether or not dorm resident $j$ in room $i$ owns a computer
17
Cluster-level population parameters (for cluster $i$)
- Cluster size = $N_i$ elements
- Cluster population total
  $$t_i = \sum_{j=1}^{N_i} y_{ij}$$
- Note that we observe the cluster population total (or mean or variance) for each sample cluster in 1-stage cluster sampling
- We will estimate cluster parameters in 2-stage cluster sampling
18
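[Figure: population divided into clusters, annotated with cluster-level parameters, alongside a 1-stage cluster sample]
Population
- Cluster 1: $N_1 = 9$, $t_1 = 39$, $\bar{y}_{1U} = 4.33$, $S_1^2 = 7.75$
- Cluster 2: $N_2 = 12$ boxes, $t_2 = 46$, $\bar{y}_{2U} = 3.83$, $S_2^2 = 6.88$
- Cluster 6: $N_6 = 9$, $t_6 = 30$, $\bar{y}_{6U} = 3.33$, $S_6^2 = 9.00$
1-stage cluster sample
- Sampled cluster 1: $N_1 = 20$, $t_1 = 99$, $\bar{y}_{1U} = 4.95$, $S_1^2 = 7.00$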
19
Data from cluster samples
- Work with element and cluster-level data
- Element data set will have columns for
  - Cluster id
  - Element id within cluster
  - Variable ($y$)
- Will also summarize this data set to generate cluster parameters (1-stage) or estimates of cluster parameters (2-stage)
  - Cluster id
  - Cluster total (or estimate)
  - Cluster mean (or estimate)
  - Cluster variance (or estimate)
20
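1-stage cluster sample

Element data
$i$    $j$    $y_{ij}$
1      1      $y_{11}$
1      2      $y_{12}$
1      3      $y_{13}$
1      4      $y_{14}$
2      1      $y_{21}$
2      2      $y_{22}$
2      3      $y_{23}$
3      1      $y_{31}$
…

Cluster summary
$i$    $t_i$    $\bar{y}_{iU}$    $S_i^2$
1      $t_1$    $\bar{y}_{1U}$    $S_1^2$
2      $t_2$    $\bar{y}_{2U}$    $S_2^2$
3      $t_3$    $\bar{y}_{3U}$    $S_3^2$
…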
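Going from the element data set to the cluster summary is a group-by on the cluster id. The sketch below uses two suites from the dorm example as rows of (cluster id, element id, y); it assumes the within-cluster variance is defined with an $N_i - 1$ divisor, which the slides do not state, and the dictionary keys are illustrative names.

```python
from collections import defaultdict
from statistics import mean, variance

# Element-level rows: (cluster id i, element id j within cluster, y_ij).
rows = [
    (6, 1, 5), (6, 2, 5), (6, 3, 4), (6, 4, 6),
    (21, 1, 3), (21, 2, 2), (21, 3, 4), (21, 4, 5),
]

# Group the y values by cluster id.
by_cluster = defaultdict(list)
for i, j, y in rows:
    by_cluster[i].append(y)

# One summary record per cluster: size, total, mean, variance.
# ASSUMPTION: variance uses the N_i - 1 divisor (statistics.variance).
summary = {
    i: {
        "N_i": len(ys),
        "t_i": sum(ys),
        "ybar_iU": mean(ys),
        "S2_i": variance(ys),
    }
    for i, ys in by_cluster.items()
}
```

In a 1-stage cluster sample these summaries are the cluster parameters themselves, because every element of a sampled cluster is observed.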
21
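CSE1 unbiased estimation under SI – total t
- Estimator for population total using data collected from a 1-stage cluster sample, SI of clusters
  $$\hat{t} = \frac{N_I}{n_I} \sum_{i=1}^{n_I} t_i$$
- Estimator of variance of $\hat{t}$
  $$\hat{V}(\hat{t}) = N_I^2 \left(1 - \frac{n_I}{N_I}\right) \frac{s_t^2}{n_I}, \quad \text{where } s_t^2 = \frac{1}{n_I - 1} \sum_{i=1}^{n_I} \left(t_i - \frac{\hat{t}}{N_I}\right)^2$$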
22
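Dorm example – 4
- Estimated population total
  $$\hat{t} = \frac{N_I}{n_I} \sum_{i=1}^{n_I} t_i = 100 \cdot \frac{1}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680 \text{ dining hall dinners}$$
- Estimated variance
  $$s_t^2 = \frac{1}{5 - 1}\left[(20 - 16.8)^2 + \dots + (10 - 16.8)^2\right] = 21.7$$
  $$\hat{V}(\hat{t}) = 100^2 \left(1 - \frac{5}{100}\right) \frac{21.7}{5} = 41{,}230$$
  $$SE(\hat{t}) = \sqrt{41{,}230} \approx 203.05$$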
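The dorm calculation can be reproduced numerically. The helper below is a sketch of the CSE1 total and variance estimators applied to the five suite totals; the function name `cse1_total` and its return layout are hypothetical.

```python
from math import sqrt

def cse1_total(cluster_totals, N_I):
    """Estimate the population total and its variance from an SRS of clusters.

    cluster_totals: observed totals t_i for the n_I sampled clusters.
    N_I: number of clusters in the population.
    """
    n_I = len(cluster_totals)
    t_bar = sum(cluster_totals) / n_I  # sample mean of cluster totals = t_hat / N_I
    t_hat = N_I * t_bar                # estimated population total
    # Sample variance of the cluster totals (divisor n_I - 1).
    s2_t = sum((t - t_bar) ** 2 for t in cluster_totals) / (n_I - 1)
    # Estimated variance with the finite population correction (1 - n_I / N_I).
    v_hat = N_I ** 2 * (1 - n_I / N_I) * s2_t / n_I
    return t_hat, s2_t, v_hat, sqrt(v_hat)

t_hat, s2_t, v_hat, se = cse1_total([20, 14, 19, 21, 10], N_I=100)
```

Up to floating-point rounding, this returns an estimated total of 1680, a sample variance of 21.7, an estimated variance of 41,230, and a standard error of about 203.05.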
23
Dorm example – 5
- Inclusion probability for student $j$ in dorm room $i$
  - $N_I = 100$ dorm rooms
  - $n_I = 5$ sample dorm rooms
  - Take all 4 students in dorm room
  $$\pi_{ij} = \frac{n_I}{N_I} = \frac{5}{100} = \frac{1}{20} = 0.05$$
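Because every student has the same inclusion probability, each sampled student can be given the sampling weight $1 / \pi_{ij} = 20$, and the weighted sum of the responses gives the estimated total. The sketch below assumes this standard weighting and uses the 20 observed responses from the dorm example; the variable names are illustrative.

```python
N_I, n_I = 100, 5

# A student is in the sample exactly when the student's dorm room is selected.
pi = n_I / N_I        # inclusion probability for every student
weight = N_I / n_I    # sampling weight = 1 / pi

# Responses of all 20 sampled students (5 rooms x 4 students).
ys = [
    5, 5, 4, 6,   # suite 6
    3, 2, 4, 5,   # suite 21
    6, 4, 4, 5,   # suite 28
    5, 4, 6, 6,   # suite 54
    1, 4, 3, 2,   # suite 89
]

# Weighted sum over sampled elements: each student represents 20 students.
t_hat = sum(weight * y for y in ys)
```

The weighted sum equals 20 x 84 = 1680, the same estimate obtained from the cluster totals.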