The Generalized Random Tessellation Stratified Sampling Design for Selecting Spatially-Balanced Samples

Don L. Stevens, Jr.

Department of Statistics

Oregon State University

Monitoring Science & Technology Symposium

September 20 - 24, 2004

Denver, Colorado

This presentation was developed under STAR Research Assistance Agreement No. CR82-9096-01, Program on Designs and Models for Aquatic Resource Surveys, awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been subjected to the Agency's review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.

R82-9096-01

Historical Context

• GRTS design evolved from EMAP work on global tessellations in the early 1990s

– Scott Overton, Denis White, and Jon Kimerling developed EMAP’s triangular grid + hexagonal tessellation

Historical Context

• EMAP began with a triangular grid & hexagonal tessellation

– Expected to intensify grid as needed

– Triangular grid has several advantages

• More compact than square grid

• More subdivision factors

– Became clear that the basic concept did not have enough flexibility to accommodate the characteristics of environmental resource sampling

Environmental Resource Populations

• Point-like

– Finite population of discrete units, e.g., small- to medium-sized lakes

• Linear

– Width is very small relative to length, e.g., streams or riparian vegetation belts

• Extensive

– Covers large area in a more or less continuous and connected fashion, e.g., a large estuary

Environmental Resource Populations

• Tobler's First Law of Geography: Things that are close together in space tend to have more similar properties than things that are far apart.

OR

• Spatial correlation functions tend to decrease with distance

Sampling Environmental Resource Populations

• Environmental Resource Populations exist in a spatial matrix

• Population elements close to one another tend to be more similar than widely separated elements

• Good sampling designs tend to spread out the sample points more or less regularly

• Simple random sampling tends to exhibit uneven spatial patterns

[Figure: Simple random sample of a domain with 3 subdomains (A, B, C); counts 28, 28, 15]

Sampling Environmental Resource Populations

• Patterned response (gradients, patches, periodic responses)

• Variable inclusion probability

• 0-, 1-, and 2-dimensional populations (points, lines, & areas)

• Pattern in population occurrence (density)

• Unreliable frame material

• Temporal panels often needed

Environmental Resource Populations

Ecological importance, environmental stressor levels, scientific interest, and political importance are not uniform over the extent of the resource

Desirable Properties of Environmental Resource Samples

• (1) Accommodate varying spatial sample intensity

• (2) Spread the sample points evenly and regularly over the domain, subject to (1)

• (3) Allow augmentation of the sample after-the-fact, while maintaining (2)

Desirable Properties of Environmental Resource Samples

• (4) Accommodate varying population spatial density for finite & linear populations, subject to (1) & (2).

• (2) + (4) Sample spatial pattern should reflect the (finite or linear) population spatial pattern

Sampling Environmental Resource Populations

• Systematic sample has substantial disadvantages

– Well-known problems with periodic response

– Less well recognized problem: patch-like response

[Figures: systematic samples of a domain with 3 subdomains (A, B, C); counts 26, 24, 15 and 32, 20, 16]

Sampling Environmental Resource Populations

• Systematic sample has substantial disadvantages

– Well-known problems with periodic response

– Less well recognized problem: patch-like response

– Difficult to apply to finite populations, e.g., lakes

– Limited flexibility to change sample point density

– Difficult to accommodate variable inclusion probability or sample adjustment for frame errors

[Figure: Sample point intensity can be changed using nested grids; subdomains A, B, C with counts 26, 88, 15]

RANDOM-TESSELLATION STRATIFIED (RTS) DESIGN

• Compromise between systematic & SRS that resolves periodic/patchy response

• Cover the population domain with a grid

– Randomly located

– Regular (square or triangular)

– Spacing chosen to give required spatial resolution

– Tile the domain with equal-sized regular polygons containing the grid points

– Select one sample point at random from each tessellation polygon
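The RTS steps listed above can be sketched in a few lines. This is a minimal illustration, assuming a square grid over the unit square; the function name `rts_sample` and the parameter `n_side` are hypothetical and do not come from the presentation.

```python
import numpy as np

def rts_sample(n_side, rng):
    """One random point per cell of a randomly located square grid on [0, 1)^2."""
    h = 1.0 / n_side                                  # grid spacing
    offset = rng.uniform(0.0, h, size=2)              # random grid location
    # Use one extra row and column of cells so the shifted grid covers the square.
    ix, iy = np.meshgrid(np.arange(-1, n_side), np.arange(-1, n_side), indexing="ij")
    corners = np.column_stack([ix.ravel(), iy.ravel()]) * h + offset
    # Select one point uniformly at random inside each grid cell.
    pts = corners + rng.uniform(0.0, h, size=corners.shape)
    # Keep only the points that land inside the domain.
    inside = np.all((pts >= 0.0) & (pts < 1.0), axis=1)
    return pts[inside]

rng = np.random.default_rng(2004)
sample = rts_sample(8, rng)
print(len(sample), "points selected")
```

Because the grid is shifted at random, cells on the boundary only partly overlap the domain, so the realized sample size varies slightly from draw to draw.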

RANDOM-TESSELLATION STRATIFIED (RTS) DESIGN

• Solves some of the systematic sample problems

– Non-zero pairwise inclusion probability

– Alignment with geographic features of population

– Lets points get close together with low probability

RTS DESIGN

• Does not resolve systematic sample difficulties with

– variable probability

– finite & linear populations

– pattern in population occurrence (density)

– unreliable frame material

– limited ability to change density

Generalized Random-Tessellation Stratified (GRTS) Design

• Conceptual structure:

– Population indexed by points contained within a region R

– Have inclusion probability π(s) defined on R

– Select a sample by picking points

• Finite: points represent units; π(s) is the usual inclusion probability

• Linear: points on the lines; π(s) is a density: # sample points/unit length

• Extensive: points are in the region area; π(s) is a density: # sample points/unit area

GRTS Design Mechanics

• Map R into first quadrant of unit square, & add a random offset

• Subdivide unit square into “small” grid cells

– At least small enough so that total inclusion probability for a cell (expected number of samples in the cell) is less than 1

– Total inclusion probability for cell is sum or integral of π(s) over the extent of the cell
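For a finite population, the subdivision rule above can be sketched as follows. This is an illustrative sketch, assuming the units have already been mapped into the unit square; `refine_grid`, `xy`, `pi`, and `max_level` are hypothetical names, not part of the presentation.

```python
import numpy as np

def refine_grid(xy, pi, max_level=10):
    """Halve the cell size until every cell's total inclusion probability is < 1.

    xy : (N, 2) array of unit locations in [0, 1]^2
    pi : (N,) array of inclusion probabilities (each assumed < 1)
    Returns the subdivision level and the grid of cell totals.
    """
    for level in range(max_level + 1):
        n = 2 ** level                                   # n x n grid at this level
        ix = np.minimum((xy[:, 0] * n).astype(int), n - 1)
        iy = np.minimum((xy[:, 1] * n).astype(int), n - 1)
        # Cell total = sum of pi over the units falling in the cell.
        totals = np.bincount(ix * n + iy, weights=pi, minlength=n * n)
        if totals.max() < 1.0:
            return level, totals.reshape(n, n)
    raise ValueError("cells still hold total probability >= 1 at max_level")

rng = np.random.default_rng(1)
xy = rng.uniform(size=(200, 2))
pi = np.full(200, 20 / 200)          # equal probability, expected sample size 20
level, totals = refine_grid(xy, pi)
print("levels of subdivision:", level)
```

For linear or extensive resources the sum would be replaced by an integral of the density over each cell, which this sketch does not attempt.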

[Figure: Population region image (both axes 0.0 to 1.0)]

[Figure: Population region image + random offset (both axes 0.0 to 1.0)]

GRTS Design Mechanics

Order the cells so that some 2-dimensional proximity relationships are preserved

– Can’t preserve everything, because a 1-1, onto, continuous map from unit square to unit interval is impossible

– Can get 1-1, onto, & measurable, which is good enough

– GRTS uses a quadrant-recursive function, similar to the space-filling curve developed by Giuseppe Peano in 1890.

Assign each cell an address corresponding to the order of subdivision

The address of the shaded quadrant is 0.213

Order the cells following the address order
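The addressing scheme can be sketched as repeated quadrant subdivision. The sketch below assumes a particular labeling of the four quadrants (0 = lower left, 1 = upper left, 2 = lower right, 3 = upper right), which may differ from the labeling used in the figure; the function names are hypothetical.

```python
def quadrant_address(x, y, levels):
    """Base-4 digits of the cell containing (x, y) in [0, 1)^2, coarsest level first."""
    digits = []
    for _ in range(levels):
        x, y = 2.0 * x, 2.0 * y
        bx, by = int(x), int(y)          # 0 or 1: which half in each coordinate
        digits.append(2 * bx + by)       # assumed quadrant labeling, see lead-in
        x, y = x - bx, y - by            # rescale to the chosen quadrant
    return digits

def address_value(digits):
    """Interpret the digits as the base-4 fraction 0.d1 d2 d3 ..."""
    return sum(d / 4 ** (k + 1) for k, d in enumerate(digits))

# Order the 16 cells of a 4 x 4 grid by their two-digit addresses.
cells = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
ordered = sorted(cells, key=lambda c: address_value(quadrant_address(c[0], c[1], 2)))
print(ordered[:4])   # the four cells of one top-level quadrant come first
```

Sorting by address keeps all cells of a quadrant in consecutive positions at every level, which is the proximity property the ordering is meant to preserve.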

GRTS Design Mechanics

• If we carry the process to the limit, letting the grid cell size → 0, the result is a quadrant-recursive function, that is, a function that maps the unit square onto the unit interval such that the image of every quadrant is an interval.

• Apply a restricted randomization that preserves “quadrant recursiveness”

HIERARCHICAL RANDOMIZATION

Each cell address is a base-4 fraction, that is, $t = 0.t_1 t_2 t_3 \ldots$, where each digit $t_i$ is either 0, 1, 2, or 3. A function $h_p$ is a hierarchical permutation if

$$h_p(t) = 0.\,p_1(t_1)\,p_2^{t_1}(t_2)\,p_3^{t_1 t_2}(t_3)\ldots$$

where $p_n^{t_1 t_2 \ldots t_{n-1}}(\cdot)$ is a possibly distinct permutation of {0,1,2,3} for each unique combination of digits $t_1, t_2, \ldots, t_{n-1}$.

HIERARCHICAL RANDOMIZATION

• If the permutations that define $h_p(\cdot)$ are chosen at random and independently from the set of all possible permutations, we call $h_p(\cdot)$ a hierarchical randomization function, and the process of applying $h_p(\cdot)$ hierarchical randomization.

• Compose the basic q-r map with a hierarchical randomization function
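A hierarchical randomization function can be sketched by drawing one independent random permutation of {0, 1, 2, 3} for each distinct prefix of digits. In the sketch below the permutations are drawn lazily, the first time a prefix is seen; the function name and the `seed` parameter are hypothetical.

```python
import random

def make_hierarchical_randomization(seed=None):
    """Return a function h that applies a hierarchical randomization to digit lists."""
    rng = random.Random(seed)
    perms = {}                       # digit prefix -> permutation of [0, 1, 2, 3]

    def h(digits):
        out = []
        for n, d in enumerate(digits):
            prefix = tuple(digits[:n])           # digits t1 ... t(n-1) of the input
            if prefix not in perms:
                p = [0, 1, 2, 3]
                rng.shuffle(p)                   # independent uniform permutation
                perms[prefix] = p
            out.append(perms[prefix][d])
        return out

    return h

h = make_hierarchical_randomization(seed=1)
for address in ([0, 0], [0, 1], [2, 1], [2, 3]):
    print(address, "->", h(address))
```

Addresses that share their leading digits are mapped to addresses that also share their leading digits, so cells from the same quadrant stay together after randomization.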

[Figures: unit-square panels (x and y axes from 0 to 1) showing the 16 cells of a 4 × 4 grid numbered 1 to 16 in quadrant-recursive order and in hierarchically randomized order]

GRTS Design Mechanics

• The result is a random order of the “small” grid cells such that

– All grid cells in the same quadrant have consecutive order positions

• But will be randomly ordered within those positions

– This holds for all quadrant levels

• This induces a random ordering of population elements

GRTS Design Mechanics

• Assign each grid cell a length equal to its total inclusion probability

• String the lengths in the random order

– Result is a line with length equal to target sample size

• Take systematic sample along line (random start + unit interval)

• Map back to population using inverse random q-r function
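The line-sampling step can be sketched as below, assuming the cell totals are already arranged in the randomized order and each total is less than 1. The function name `systematic_on_line` is hypothetical, and the sketch stops at identifying the selected cells; mapping a selection back to a point within its cell is not shown.

```python
import numpy as np

def systematic_on_line(cell_totals_in_order, rng):
    """Return indices (in line order) of the cells hit by a unit-interval systematic sample."""
    edges = np.cumsum(cell_totals_in_order)      # right end of each cell's segment
    total = edges[-1]                            # line length = target sample size
    start = rng.uniform(0.0, 1.0)                # random start in [0, 1)
    ticks = np.arange(start, total, 1.0)         # sample positions one unit apart
    # A tick in [edges[i-1], edges[i]) selects cell i.
    idx = np.searchsorted(edges, ticks, side="right")
    return np.minimum(idx, len(edges) - 1)       # guard against float round-off

rng = np.random.default_rng(5)
totals = np.full(64, 10 / 64)                    # 64 cells, target sample size 10
print(systematic_on_line(totals, rng))
```

Because every segment is shorter than the unit sampling interval, no cell can be selected more than once.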

GRTS Design Mechanics

• Points will be in ‘hierarchical random order’

• Re-ordering into ‘reverse hierarchical order’ gives some very useful features to the sample

Reverse Hierarchical Order

• Illustrate for 2-levels of addressing:

First 16 addresses as base-4 fractions

00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33

Reversed digits

00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33

Reversed digits as base 10 numbers

0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15
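The digit reversal illustrated above takes only a few lines of code. The sketch reproduces the two-level example; the function name `reverse_hierarchical_rank` is hypothetical.

```python
def reverse_hierarchical_rank(index, levels):
    """Reverse the base-4 digits of `index`, written with `levels` digits."""
    rank = 0
    for _ in range(levels):
        index, digit = divmod(index, 4)   # peel off the last base-4 digit
        rank = rank * 4 + digit           # append it to the reversed number
    return rank

ranks = [reverse_hierarchical_rank(i, 2) for i in range(16)]
print(ranks)
# [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]

# Positions in hierarchical order, re-sorted into reverse hierarchical order.
order = sorted(range(16), key=lambda i: ranks[i])
print(order[:4])   # [0, 4, 8, 12]: one address from each top-level quadrant
```

The first four positions in the new order have leading digits 0, 1, 2, and 3, so they come from four different top-level quadrants.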

SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE

• The complete sample is nearly regular, capturing much of the potential efficiency of a systematic sample without the potential flaws

• Any subsample consisting of a consecutive subsequence is almost as regular as the full sample; in particular, the subsequence $S_k = \{s_1, s_2, \ldots, s_k\}$, for $k \le M$, is a spatially well-balanced sample.

• Any consecutive sequence subsample, restricted to the accessible domain, is a spatially well-balanced sample of the accessible domain.

[Figure: Inclusion probability density surface; region is (0,1) × (0,0.8)]

SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE

• Assess spatial balance by variance of size of Voronoi polygons, compared to SRS sample of the same size.

• Voronoi polygons for a set of points $\{s_1, s_2, \ldots, s_k\}$: the $i$th polygon is the collection of points in the domain that are closer to $s_i$ than to any other $s_j$ in the set.

• Estimate variance by 1000 replications of a sample of size 256 in unit square
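The Voronoi-based comparison can be approximated numerically. The sketch below is not the presentation's computation: it approximates polygon areas by assigning the nodes of a fine grid to their nearest sample point, uses a one-point-per-cell stratified sample as a stand-in for a spatially balanced design (not GRTS itself), and uses 64 points and 20 replications rather than 256 and 1000 to keep the run short. All function names are hypothetical.

```python
import numpy as np

def voronoi_areas(points, grid_n=100):
    """Approximate Voronoi polygon areas in the unit square on a grid_n x grid_n lattice."""
    g = (np.arange(grid_n) + 0.5) / grid_n
    gx, gy = np.meshgrid(g, g, indexing="ij")
    dx = gx.ravel()[:, None] - points[None, :, 0]
    dy = gy.ravel()[:, None] - points[None, :, 1]
    owner = np.argmin(dx * dx + dy * dy, axis=1)      # nearest sample point per node
    return np.bincount(owner, minlength=len(points)) / (grid_n * grid_n)

def stratified_sample(n_side, rng):
    """One uniform random point in each cell of an n_side x n_side grid."""
    i, j = np.meshgrid(np.arange(n_side), np.arange(n_side), indexing="ij")
    corners = np.column_stack([i.ravel(), j.ravel()]) / n_side
    return corners + rng.uniform(0.0, 1.0 / n_side, size=corners.shape)

def variance_ratio(n_side=8, reps=20, seed=0):
    """Mean polygon-area variance of the stratified sample relative to SRS."""
    rng = np.random.default_rng(seed)
    n = n_side * n_side
    v_bal = np.mean([voronoi_areas(stratified_sample(n_side, rng)).var() for _ in range(reps)])
    v_srs = np.mean([voronoi_areas(rng.uniform(size=(n, 2))).var() for _ in range(reps)])
    return v_bal / v_srs

print("variance ratio (stratified / SRS):", variance_ratio())
```

A ratio below 1 indicates polygons of more nearly equal size than under simple random sampling, which is the sense of spatial balance used here.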

[Figure: Voronoi polygons for a GRTS sample and a uniform sample]

SPATIAL PROPERTIES OF REVERSE HIERARCHICAL ORDERED GRTS SAMPLE

• Compare regularity as points are added one at a time, following reverse hierarchical order, under 4 scenarios:

– Complete, continuous domain

– Domains with “holes” excluding 20%, modeling non-response/access refusal

• 20 randomly-located square holes, constant size

• 20 randomly-located square holes, increasing linearly in size

• 10 randomly-located square holes, increasing exponentially in size

[Figure: Inaccessible Domain Patterns (20% Inaccessible); panels: Constant Size, Linear Increase, Exponential Increase]

[Figure: Voronoi polygons for a uniform sample and a GRTS sample]

[Figure: polygon area variance ratio (y-axis, 0.0 to 1.0) versus point density (x-axis, 0 to 250). Curves: exponentially increasing polygon size, total perimeter = 4.2; linearly increasing polygon size, total perimeter = 7.6; constant polygon size, total perimeter = 8; continuous domain with no voids]

20-point GRTS Sample

Four 20-point GRTS Panels

Five 20-point GRTS Panels

Five 20-point GRTS Panels + Special Study Area

Finite Population Example

Equi-probable GRTS Sample

GRTS Sample: Probability inversely proportional to population density

[Figure panels: Equi-probable; Inverse density]