1 Cluster sampling DEFN: A cluster is a group of observation units (or “elements”)


Page 1:

Cluster sampling

DEFN: A cluster is a group of observation units (or “elements”)

Population | Obs unit | Cluster
U.S. residents | person | household
Lincoln households | household | city block, or postal route
UNL employees | employee | department
Maple trees in Vermont | tree | 1 km x 1 km plot

Page 2:

Cluster sample

DEFN: A cluster sample is a probability sample in which a sampling unit is a cluster

Frame | SU | OU
List of phone numbers | phone number | person
List of blocks | block | household
List of UNL departments | department | faculty member
List of plots | plot | tree

Page 3:

Cluster sample – 2

1-stage cluster sampling
- Divide the population (of N elements) into N_I clusters (of size N_i for cluster i)
  - Cluster = group of elements
  - An element belongs to 1 and only 1 cluster
- Sampling unit
  - Cluster = group of elements = PSU = primary sampling unit
  - Can use any design to select clusters (ST, PPS)
- Data collection
  - Collect information on ALL elements in the cluster

Page 4:

ST vs. 1-stage CS

ST: Take an SRS from every stratum
- A block of cells is a stratum
- SU is an element (or OU)
- Sample from every stratum

1-stage CS: Take an SRS of clusters; observe all elements within the clusters in the sample
- A block of cells is a cluster
- SU is a cluster
- Don’t sample from every cluster

Sample of 40 elements
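The contrast between the two designs can be sketched in a few lines of Python. This is only an illustration: the population of 10 blocks with 20 elements each, the per-stratum sample size of 4, and the function names are assumptions made up for the example, chosen so that both designs yield 40 elements as on the slide.

```python
import random

# Hypothetical population: 10 blocks of 20 elements each (200 elements).
# These sizes are assumptions for illustration, not taken from the slide.
N_BLOCKS, BLOCK_SIZE = 10, 20
blocks = {b: [(b, e) for e in range(BLOCK_SIZE)] for b in range(N_BLOCKS)}

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def stratified_sample(blocks, n_per_stratum, rng):
    """Treat each block as a stratum: SRS of elements from EVERY block."""
    sample = []
    for elements in blocks.values():
        sample.extend(rng.sample(elements, n_per_stratum))
    return sample


def one_stage_cluster_sample(blocks, n_clusters, rng):
    """Treat each block as a cluster: SRS of blocks, then take ALL elements."""
    chosen = rng.sample(sorted(blocks), n_clusters)
    return [unit for b in chosen for unit in blocks[b]]


st = stratified_sample(blocks, n_per_stratum=4, rng=rng)
cs = one_stage_cluster_sample(blocks, n_clusters=2, rng=rng)

print(len(st), len({b for b, _ in st}))  # 40 elements spread over all 10 blocks
print(len(cs), len({b for b, _ in cs}))  # 40 elements drawn from only 2 blocks
```

Both samples contain 40 elements, but the stratified sample touches every block while the cluster sample is concentrated in the selected blocks.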

Page 5:

Cluster vs. stratified sampling

Cluster sample
- Divide N elements into N_I clusters
- Cluster or PSU i has N_i elements, so that
  $$N = \sum_{i=1}^{N_I} N_i$$
- Take a sample of n_I clusters

Stratified sampling
- N elements divided into H strata
- An element belongs to 1 and only 1 stratum
- Take a sample of n elements, consisting of n_h elements from stratum h for each of the H strata

Page 6:

Cluster sample – 3

2-stage cluster sampling
- Process
  - Select PSUs (stage 1)
  - Select elements within each sampled PSU (stage 2)
- First stage sampling unit is a …
  - PSU = primary sampling unit = cluster
- Second stage sampling unit is a …
  - SSU = secondary sampling unit = element = OU
- Only collect data on the SSUs that were sampled from the cluster

Page 7:

1-stage vs. 2-stage cluster sampling

Sample all SSUs in sampled PSUs:
- 1-stage cluster sample (stop here), OR
- Stage 1 of 2-stage cluster sample (select PSUs)

Take an SRS of m_i SSUs in sampled PSU i:
- Stage 2 of 2-stage cluster sample (select SSUs w/in PSUs)
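A minimal sketch of the two selection schemes, assuming SRS without replacement at both stages. The frame, the PSU and SSU labels, the function names, and the choice of a common subsample size m (capped at the cluster size) are all hypothetical.

```python
import random


def select_psus(frame, n_I, rng):
    """Stage 1: SRS (without replacement) of n_I PSUs from the frame."""
    return rng.sample(list(frame), n_I)


def one_stage(frame, sampled_psus):
    """1-stage cluster sample: keep every SSU in each sampled PSU."""
    return {i: list(frame[i]) for i in sampled_psus}


def two_stage(frame, sampled_psus, m, rng):
    """Stage 2: SRS of m_i = min(m, N_i) SSUs within each sampled PSU."""
    return {
        i: rng.sample(list(frame[i]), min(m, len(frame[i])))
        for i in sampled_psus
    }


# Hypothetical frame: 6 PSUs with unequal sizes N_i = 3, 4, ..., 8.
frame = {i: [f"psu{i}-ssu{j}" for j in range(3 + i)] for i in range(6)}

rng = random.Random(1)
psus = select_psus(frame, n_I=3, rng=rng)   # shared first stage
full = one_stage(frame, psus)               # stop here for a 1-stage sample
sub = two_stage(frame, psus, m=2, rng=rng)  # or subsample within each PSU
```

The first stage is identical in both designs; they differ only in whether data are collected on all SSUs or on a subsample within each selected PSU.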

Page 8:

Why use cluster sampling?
- May not have a list of OUs for a frame, but a list of clusters may be available
  - List of Lincoln phone numbers (= group of residents) is available, but a list of Lincoln residents is not available
  - List of all NE primary and secondary schools (= group of students) is available, but a list of all students in NE schools is not available
- May be cheaper to conduct the study if OUs are clustered
  - Occurs when cost of data collection increases with distance between elements
  - Household surveys using in-person interviews (household = cluster of people)
  - Field data collection (plot = cluster of plants, or animals)

Page 9:

Defining clusters due to frame limitations
- A cluster (or PSU) is a group of elements corresponding to a record (row) in the frame
- Example
  - Population = employees in McDonald’s franchises
  - Element = employee
  - Frame = list of McDonald’s stores
  - PSU = store = cluster of employees

Page 10:

Defining clusters to reduce travel costs
- A cluster (or PSU) is a group of nearby elements
- Example
  - Population = all farms
  - Element = farm
  - Frame = list of sections (1 mi x 1 mi areas) in rural area
  - PSU = section = cluster of farms

Page 11:

Cluster samples usually lead to less precise estimates
- Elements within clusters tend to be correlated due to exposure to similar conditions
  - Members of a household
  - Employees in a business
  - Plants or soil within a field plot
- We are getting less information than if we had selected the same number of unrelated elements
- Example: select a sample of city blocks (clusters of households)
  - Ask each household: Should the city upgrade the storm sewer system?
  - PSU (city block) 1: no storm sewer, so households will tend to say yes
  - PSU (city block) 2: new development, so households will tend to say no

Page 12:

Defining clusters for improved precision
- Define clusters for which within-cluster variation is high (rarely possible)
  - Make each cluster as heterogeneous as possible
    - Like making each cluster a mini-population that reflects variation in population
    - Minimizes the amount of correlation among elements in the cluster
  - Opposite of the approach to stratification
    - Large variation among strata, homogeneous within strata
- Define clusters that are relatively small
  - Extreme case is cluster = element
  - Decreasing the number of correlated observations in the sample

Page 13:

Example for single-stage cluster sampling w/ equal prob (CSE1)
- Dorm has N_I = 100 suites (clusters)
- Each suite has N_i = 4 students (4 elements in cluster i, i = 1, 2, …, N_I)
  - Note that there are
    $$N = \sum_{i=1}^{N_I} N_i = 100(4) = 400 \text{ students in population}$$
- Take SRS of n_I = 5 suites (clusters)
- Ask each student living in each of the 5 suites
  - How many nights per week do you eat dinner in the dining hall?
- Will get observations from a sample of 20 students = 5 suites x 4 students/suite

Page 14:

Dorm example – 2

Student | Suite 6 | Suite 21 | Suite 28 | Suite 54 | Suite 89
1 | 5 | 3 | 6 | 5 | 1
2 | 5 | 2 | 4 | 4 | 4
3 | 4 | 4 | 4 | 6 | 3
4 | 6 | 5 | 5 | 6 | 2
Total | 20 | 14 | 19 | 21 | 10

Page 15:

Dorm example – 3
- SRS of n_I = 5 dorm rooms
- Data on each cluster (all students in dorm room)
  - t_i = total number of dining hall dinners for dorm room i
  - t_2 = 14 dining hall dinners for 4 students in dorm room 2
- Estimated total number of dining hall nights for the dorm students
  - HT estimator of total = pop size (number of clusters, N_I) x sample mean (of cluster totals)

$$\hat{t} = N_I \cdot \frac{1}{n_I} \sum_{i=1}^{n_I} t_i = 100 \cdot \frac{1}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680 \text{ dining hall dinners}$$
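The arithmetic for the estimated total can be checked with a short Python sketch. The variable names are illustrative; the numbers are the cluster totals and population size from the dorm example.

```python
# Number of suites (clusters) in the dorm and the five sampled cluster totals.
N_I = 100
cluster_totals = [20, 14, 19, 21, 10]
n_I = len(cluster_totals)

# Estimated total = number of clusters x sample mean of the cluster totals.
t_hat = N_I * sum(cluster_totals) / n_I
print(t_hat)  # 1680.0
```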

Page 16:

Notation
- Response variable for SSU j in PSU i: y_ij
  - e.g., age of j-th resident in household i
  - e.g., whether or not dorm resident j in room i owns a computer

Page 17:

Cluster-level population parameters (for cluster i)
- Cluster size = N_i elements
- Cluster population total
  $$t_i = \sum_{j=1}^{N_i} y_{ij}$$
- Note that we observe the cluster population total (or mean or variance) for each sample cluster in 1-stage cluster sampling
- We will estimate cluster parameters in 2-stage cluster sampling

Page 18:

[Figure: a population of boxes grouped into clusters, with a 1-stage cluster sample. Cluster-level parameters shown in the figure:]

Cluster i | N_i | t_i | \bar{y}_{iU} | S_i^2
1 | 20 | 99 | 4.95 | 7.00
2 | 12 boxes | 46 | 3.83 | 6.88
6 | 9 | 30 | 3.33 | 9.00
11 | 9 | 39 | 4.33 | 7.75

Page 19:

Data from cluster samples
- Work with element and cluster-level data
- Element data set will have columns for
  - Cluster id
  - Element id within cluster
  - Variable (y)
- Will also summarize this data set to generate cluster parameters (1-stage) or estimates of cluster parameters (2-stage)
  - Cluster id
  - Cluster total (or estimate)
  - Cluster mean (or estimate)
  - Cluster variance (or estimate)

Page 20:

1-stage cluster sample

Element data

i | j | y_ij
1 | 1 | y_11
1 | 2 | y_12
1 | 3 | y_13
1 | 4 | y_14
2 | 1 | y_21
2 | 2 | y_22
2 | 3 | y_23
3 | 1 | y_31

Cluster summary

i | t_i | \bar{y}_{iU} | S_i^2
1 | t_1 | \bar{y}_{1U} | S_1^2
2 | t_2 | \bar{y}_{2U} | S_2^2
3 | t_3 | \bar{y}_{3U} | S_3^2
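As a rough sketch of how an element data set might be collapsed into a cluster summary, the Python below uses the dorm responses from Page 14 as the element rows. The function name and dictionary keys are hypothetical, and the cluster variance is assumed to use the N_i - 1 divisor, which the slides do not state explicitly.

```python
from statistics import mean, variance

# Element data set: (cluster id i, element id j within cluster, y_ij).
element_rows = [
    (6, 1, 5), (6, 2, 5), (6, 3, 4), (6, 4, 6),
    (21, 1, 3), (21, 2, 2), (21, 3, 4), (21, 4, 5),
    (28, 1, 6), (28, 2, 4), (28, 3, 4), (28, 4, 5),
    (54, 1, 5), (54, 2, 4), (54, 3, 6), (54, 4, 6),
    (89, 1, 1), (89, 2, 4), (89, 3, 3), (89, 4, 2),
]


def summarize_clusters(rows):
    """Collapse element rows into one summary record per cluster."""
    by_cluster = {}
    for cluster_id, _element_id, y in rows:
        by_cluster.setdefault(cluster_id, []).append(y)
    return {
        cid: {
            "size": len(ys),
            "total": sum(ys),
            "mean": mean(ys),
            # Assumption: divisor N_i - 1, as used by statistics.variance.
            "variance": variance(ys),
        }
        for cid, ys in by_cluster.items()
    }


summary = summarize_clusters(element_rows)
for cid, row in summary.items():
    print(cid, row["total"], row["mean"])
```

In a 1-stage design every element of a sampled cluster is observed, so these summaries are the cluster parameters themselves; in a 2-stage design the same computation would only give estimates.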

Page 21:

CSE1 unbiased estimation under SI – total t
- Estimator for population total using data collected from a 1-stage cluster sample, SI of clusters
  $$\hat{t} = \frac{N_I}{n_I} \sum_{i=1}^{n_I} t_i$$
- Estimator of variance of \hat{t}
  $$\hat{V}(\hat{t}) = N_I^2 \left(1 - \frac{n_I}{N_I}\right) \frac{s_t^2}{n_I}, \quad \text{where } s_t^2 = \frac{1}{n_I - 1} \sum_{i=1}^{n_I} \left(t_i - \frac{\hat{t}}{N_I}\right)^2$$

Page 22:

Dorm example – 4
- Estimated population total
  $$\hat{t} = N_I \cdot \frac{1}{n_I} \sum_{i=1}^{n_I} t_i = 100 \cdot \frac{1}{5}(20 + 14 + 19 + 21 + 10) = 100(16.8) = 1680 \text{ dining hall dinners}$$
- Estimated variance
  $$s_t^2 = \frac{1}{5 - 1}\left[(20 - 16.8)^2 + \dots + (10 - 16.8)^2\right] = 21.7$$
  $$\hat{V}(\hat{t}) = 100^2 \left(1 - \frac{5}{100}\right) \frac{21.7}{5} = 41{,}230$$
  $$SE(\hat{t}) = \sqrt{41{,}230} \approx 203.05$$
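The variance calculation can be reproduced with a small Python function. This is a sketch for a 1-stage cluster sample with an SRS of clusters; the function name `cse1_total` and its return layout are made up for illustration. It should give an estimated total of 1680 and a standard error of roughly 203.

```python
import math


def cse1_total(cluster_totals, N_I):
    """Estimated total, s_t^2, estimated variance, and SE for an SRS of clusters."""
    n_I = len(cluster_totals)
    t_bar = sum(cluster_totals) / n_I  # sample mean of cluster totals
    t_hat = N_I * t_bar                # estimated population total
    # Sample variance of the cluster totals (divisor n_I - 1).
    s2_t = sum((t - t_bar) ** 2 for t in cluster_totals) / (n_I - 1)
    # Finite population correction (1 - n_I/N_I) applied at the cluster level.
    v_hat = N_I ** 2 * (1 - n_I / N_I) * s2_t / n_I
    return t_hat, s2_t, v_hat, math.sqrt(v_hat)


t_hat, s2_t, v_hat, se = cse1_total([20, 14, 19, 21, 10], N_I=100)
print(round(t_hat), round(s2_t, 1), round(v_hat), round(se, 2))
# 1680 21.7 41230 203.05
```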

Page 23:

Dorm example – 5
- Inclusion probability for student j in dorm room i
  - N_I = 100 dorm rooms
  - n_I = 5 sample dorm rooms
  - Take all 4 students in dorm room

$$\pi_{ij} = \frac{n_I}{N_I} = \frac{5}{100} = \frac{1}{20} = 0.05$$
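One way to see what this inclusion probability implies is to weight each sampled student by its inverse, 1/0.05 = 20. The Python sketch below does this with the dorm responses from Page 14. The weighting view is an added illustration, not something stated on the slide, and the variable names are hypothetical.

```python
N_I, n_I = 100, 5

# Every student in a sampled room is observed, so a student's inclusion
# probability equals the room's selection probability.
pi_ij = n_I / N_I   # 0.05
weight = N_I / n_I  # 1 / pi_ij = 20 students represented per sampled student

# Responses y_ij for the five sampled suites (from the dorm example table).
responses = {
    6: [5, 5, 4, 6],
    21: [3, 2, 4, 5],
    28: [6, 4, 4, 5],
    54: [5, 4, 6, 6],
    89: [1, 4, 3, 2],
}

n_students = sum(len(ys) for ys in responses.values())
sum_of_weights = weight * n_students
t_hat = sum(weight * y for ys in responses.values() for y in ys)

print(pi_ij, sum_of_weights, t_hat)  # 0.05 400.0 1680.0
```

The weights sum to the 400 students in the dorm, and the weighted sum of responses matches the estimated total of 1680 dining hall dinners.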