MODULE 5: Spatial Statistics in Epidemiology and Public Health

Lecture 3: Point Processes

Jon Wakefield and Lance Waller

(Slides: faculty.washington.edu/.../2020_SISMID_5_3.pdf)


Outline

Preliminaries

Random patterns
  Monte Carlo testing
  Heterogeneous Poisson process

Estimating intensities

Second order properties
  K functions
  Estimating K functions
  Edge correction
  Monte Carlo envelopes


References

- Baddeley, A., Rubak, E., and Turner, R. (2015) Spatial Point Patterns: Methodology and Applications in R. Boca Raton, FL: CRC/Chapman & Hall.

- Diggle, P.J. (1983) Statistical Analysis of Spatial Point Patterns. London: Academic Press.

- Diggle, P.J. (2013) Statistical Analysis of Spatial and Spatio-Temporal Point Patterns, Third Edition. CRC/Chapman & Hall.

- Waller and Gotway (2004, Chapter 5) Applied Spatial Statistics for Public Health Data. New York: Wiley.

- Møller, J. and Waagepetersen (2004) Statistical Inference and Simulation for Spatial Point Processes. Boca Raton, FL: CRC/Chapman & Hall.


Goals

- Describe basic types of spatial point patterns.

- Introduce mathematical models for random patterns of events.

- Introduce analytic methods for describing patterns in observed collections of events.

- Illustrate the approaches using examples from archaeology, conservation biology, and epidermal neurology.


Terminology

- Realization: An observed set of event locations (a data set).

- Point: A location where an event could occur.

- Event: A location where an event did occur.


Complete Spatial Randomness (CSR)

- Start with a model of “lack of pattern”.

- Events are equally likely to occur anywhere in the study area (uniform distribution).

- Sometimes this is referred to as a “random” pattern, but...

- Any probability model generates patterns so, in effect, all of the patterns we consider are “random”; let’s use “uniform”.

- Event locations are independent of each other.


Six realizations of CSR

[Figure: six panels, each a simulated realization of CSR on the unit square; axes u and v, each from 0.0 to 1.0.]


CSR as a boundary condition

CSR serves as a boundary between:

- Patterns that are more “clustered” than CSR.

- Patterns that are more “regular” than CSR.


Too Clustered (top), Too Regular (bottom)

[Figure: six panels on the unit square (axes u and v, each from 0.0 to 1.0): three panels titled "Clustered" and three panels titled "Regular".]


The role of scale

- An observed pattern may be clustered at one spatial scale and regular at another.

- The scale of the process, the scale of observation, and the scale of intervention might be different!

- How might this impact our ability to detect patterns?

- Many ecology papers address estimating the scale of a process (e.g., plant disease, animal territories), but there is little in the public health literature.


Clusters of regular patterns/Regular clusters

[Figure: two panels on the unit square (axes u and v, each from 0.0 to 1.0), titled "Regular pattern of clusters" and "Cluster of regular patterns".]


Spatial Point Processes

- Mathematically, we treat our point patterns as realizations of a spatial stochastic process.

- A stochastic process is a collection of random variables $X_1, X_2, \ldots, X_N$.

- Example: Number of people in line at a grocery store.

- For us, each random variable represents an event location.


Stationarity/Isotropy

- Stationarity: Properties of the process are invariant to translation.

- Isotropy: Properties of the process are invariant to rotation around an origin.

- Why do we need these? They provide a sort of replication in the data that allows statistical estimation.

- Not required, but they make the development easier.


CSR as a Stochastic Process

Let N(A) = number of events observed in region A, and λ = a positive constant.

A homogeneous spatial Poisson point process is defined by:

(a) N(A) ∼ Pois(λ|A|), where |A| is the area of A, and
(b) given N(A) = n, the locations of the events are uniformly distributed over A.

λ is the intensity of the process (mean number of events expected per unit area).


Is this CSR?

- Criterion (b) describes our notion of events being uniformly and independently distributed in space.

- Criteria (a) and (b) give a “recipe” for simulating realizations of this process:

  * Generate a Poisson random number of events.
  * Distribute that many events uniformly across the study area.

    runif(n, min(x), max(x))
    runif(n, min(y), max(y))
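
The recipe above can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the function name `simulate_csr`, the rectangular study area, and the intensity value of 100 are all chosen here for the example and do not come from the slides.

```r
# Simulate one realization of CSR (a homogeneous Poisson process) on a rectangle.
# lambda, xrange and yrange are illustrative values, not taken from the slides.
simulate_csr <- function(lambda, xrange = c(0, 1), yrange = c(0, 1)) {
  area <- diff(xrange) * diff(yrange)
  # Criterion (a): Poisson number of events with mean lambda * |A|
  n <- rpois(1, lambda * area)
  # Criterion (b): given n, place the events uniformly over the rectangle
  data.frame(u = runif(n, xrange[1], xrange[2]),
             v = runif(n, yrange[1], yrange[2]))
}

set.seed(1)
pattern <- simulate_csr(lambda = 100)
plot(pattern$u, pattern$v, asp = 1, xlab = "u", ylab = "v",
     main = "One realization of CSR")
```

Calling `simulate_csr()` repeatedly with different seeds gives a feel for how much apparent "clumping" arises by chance alone.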


Compared to temporal Poisson process

Recall three features of a Poisson process in time:

- The numbers of events in non-overlapping intervals are Poisson distributed,

- Conditional on the number of events, events are uniformly distributed within a fixed interval, and

- Interevent times are exponentially distributed.

In space there is no ordering, so there are no (uniquely defined) interevent distances.


Monte Carlo hypothesis testing

Review of basic hypothesis testing framework:

- T = a random variable representing the test statistic.

- Under H0: T ∼ F0.

- From the data, observe T = t_obs (the observed value of the test statistic).

  p-value = 1 − F0(t_obs) (for an upper-tail test)

- Sometimes it is easier to simulate from F0(·) than to calculate F0(·) exactly.

Besag and Diggle (1977).


Steps in Monte Carlo testing

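1. Observe t_1, the value of the test statistic for the data.

2. Simulate t_2, ..., t_m from F0.

3. p-value = (rank of t_1 among t_1, ..., t_m) / m, with rank 1 the most extreme value.

For tests of CSR (complete spatial randomness), Monte Carlo tests are very useful since CSR is easy to simulate but the distribution of the test statistic may be complex.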
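
The three steps can be sketched in base R. The test statistic used here (mean nearest-neighbour distance), the unit-square study area, the conditioning on the observed number of events, and the clustered example data are all assumptions made for illustration; the slides do not prescribe a particular statistic.

```r
# Mean nearest-neighbour distance: small values suggest clustering.
mean_nn_dist <- function(u, v) {
  d <- as.matrix(dist(cbind(u, v)))
  diag(d) <- Inf                      # ignore each event's distance to itself
  mean(apply(d, 1, min))
}

# Monte Carlo test of CSR on the unit square, conditional on the number of events.
mc_test_csr <- function(u, v, m = 100) {
  n <- length(u)
  t1 <- mean_nn_dist(u, v)                                     # step 1
  t_sim <- replicate(m - 1, mean_nn_dist(runif(n), runif(n)))  # step 2
  # Step 3: rank of t1 among all m values, counting from the smallest
  # (lower-tail test: clustering gives small nearest-neighbour distances).
  rank_t1 <- 1 + sum(t_sim <= t1)
  list(statistic = t1, p.value = rank_t1 / m)
}

set.seed(2)
# Hypothetical clustered data: 50 events scattered around 5 parent locations.
parents <- cbind(runif(5), runif(5))
idx <- sample(5, 50, replace = TRUE)
u <- pmin(pmax(parents[idx, 1] + rnorm(50, sd = 0.02), 0), 1)
v <- pmin(pmax(parents[idx, 2] + rnorm(50, sd = 0.02), 0), 1)
result <- mc_test_csr(u, v, m = 100)
result$p.value
```

With m = 100 the smallest attainable p-value is 1/100, which is why m is usually chosen so that the desired significance level is a multiple of 1/m.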


Monte Carlo tests very helpful

1. when the distribution of the test statistic U is complex but the spatial distribution associated with H0 is easy to simulate.

2. for permutation tests

e.g., 592 cases of leukemia in ∼1 million people in 790 regions

- A full permutation test requires all possible permutations.
- Monte Carlo assigns cases to regions under H0.
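
A rough sketch of the Monte Carlo alternative for the regional example is given below. Only the totals (592 cases, 790 regions, roughly one million people) come from the slide. The regional populations, the "observed" counts, the constant-risk null hypothesis, and the choice of a Pearson-type statistic are hypothetical stand-ins chosen for illustration.

```r
set.seed(12)
n_regions <- 790
n_cases <- 592

# Hypothetical region populations (total close to one million).
pop <- round(runif(n_regions, 500, 2000))

# Expected counts under H0: every person has the same risk.
expected <- n_cases * pop / sum(pop)

# Pearson-type statistic comparing counts with their expectations.
pearson_stat <- function(counts, expected) sum((counts - expected)^2 / expected)

# Hypothetical "observed" counts, generated with tripled risk in 20 regions.
risk <- rep(1, n_regions)
risk[1:20] <- 3
observed <- rmultinom(1, n_cases, prob = pop * risk)[, 1]
t1 <- pearson_stat(observed, expected)

# Monte Carlo: assign the 592 cases to regions under H0, many times.
n_sim <- 999
t_sim <- replicate(n_sim,
                   pearson_stat(rmultinom(1, n_cases, prob = pop)[, 1], expected))

# Upper-tail Monte Carlo p-value: rank of t1 among all n_sim + 1 values.
p_value <- (1 + sum(t_sim >= t1)) / (n_sim + 1)
p_value
```

Because the expected count per region is below one, a chi-square reference distribution would be a poor approximation here, which is one reason to simulate the null distribution instead.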


Activity: Trees and CSR

- Event locations for a stand of 327 myrtle trees in a rectangular plot 170.5 × 213.0 meters.

- 221 healthy trees.

- 106 diseased trees.

- Research question: Is the spatial pattern of diseased trees the same as the spatial pattern of healthy trees?

- Start with:
  - Are all locations consistent with CSR?
  - Are diseased locations consistent with CSR?
  - Are healthy locations consistent with CSR?


Final CSR notes

CSR:

1. is the “white noise” of spatial point processes.

2. characterizes the absence of structure (signal) in data.

3. is often the null hypothesis in statistical tests to determine if there is structure in an observed point pattern.

4. is not as useful in public health? Why not?


Heterogeneous population density

[Figure: two panels of point patterns on the unit square; axes u and v, each from 0.0 to 1.0.]


Heterogeneous Poisson Process

1. What if λ, the intensity of the process (mean number of events expected per unit area), varies by location?

2. $N(A) \sim \mathrm{Pois}\left(\int_{s \in A} \lambda(s)\, ds\right)$, where $|A| = \int_{s \in A} ds$.

3. Given N(A) = n, events are distributed in A as an independent sample from a distribution on A with p.d.f. proportional to λ(s).
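
One way to simulate such a process is by thinning: simulate a homogeneous process at a dominating intensity and keep each event with probability λ(s)/λ_max. Thinning is a standard technique swapped in here for illustration; it is not described on the slide. The function names, the rectangular window, and the Gaussian-bump intensity below are hypothetical.

```r
# Simulate a heterogeneous Poisson process on a rectangle by thinning.
# lambda_fun must be vectorised and bounded above by lambda_max on the window.
simulate_hpp <- function(lambda_fun, lambda_max,
                         xrange = c(0, 20), yrange = c(0, 20)) {
  area <- diff(xrange) * diff(yrange)
  # Homogeneous process with the dominating intensity lambda_max
  n <- rpois(1, lambda_max * area)
  u <- runif(n, xrange[1], xrange[2])
  v <- runif(n, yrange[1], yrange[2])
  # Keep each candidate independently with probability lambda(s) / lambda_max
  keep <- runif(n) < lambda_fun(u, v) / lambda_max
  data.frame(u = u[keep], v = v[keep])
}

# Hypothetical intensity: a single Gaussian bump centred at (10, 10).
bump <- function(u, v) 2 * exp(-((u - 10)^2 + (v - 10)^2) / (2 * 3^2))

set.seed(3)
events <- simulate_hpp(bump, lambda_max = 2)
plot(events$u, events$v, asp = 1, xlab = "u", ylab = "v",
     main = "One realization, heterogeneous Poisson process")
```

The retained events are independent of one another, yet the realization looks "clustered" around the peak of the intensity.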


Example intensity function

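[Figure: example intensity function shown in two panels: a surface with axes u, v, and lambda, and a second panel with axes u and v, each from 0 to 20.]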


Six realizations

[Figure: six panels, each a simulated realization of the heterogeneous Poisson process; axes u and v, each from 0 to 20.]


IMPORTANT FACT!

Without additional information, no analysis can differentiate between:

1. independent events in a heterogeneous (non-stationary) environment

2. dependent events in a homogeneous (stationary) environment


How do we estimate intensities?

For a heterogeneous Poisson process, λ(s) is closely related to the density of events over A (λ(s) ∝ density).

So density estimators provide a natural approach (for details on density estimation see Silverman (1986) and Wand and Jones (1995, KernSmooth R library)).

Main idea: Put a little “kernel” of density at each data point, then sum to give the estimate of the overall density function.


Kernels and bandwidths

[Figure: four panels of one-dimensional kernel estimates over s from 0.0 to 1.0, with kernel variance = 0.02, 0.03, 0.04, and 0.1.]


1-dim kernel estimation

If we have observations $u_1, u_2, \ldots, u_N$ (in one dimension), the kernel density estimate is

$$\hat{f}(u) = \frac{1}{Nb} \sum_{i=1}^{N} \mathrm{Kern}\left(\frac{u - u_i}{b}\right) \tag{1}$$

where Kern(·) is a kernel function satisfying

$$\int_D \mathrm{Kern}(s)\, ds = 1$$

and b is the bandwidth. To estimate the intensity function, replace $N^{-1}$ by $|D|^{-1}$.
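
Equation (1) can be coded directly. The sketch below assumes a standard Gaussian kernel for Kern(·) and uses made-up bimodal data; the function name `kde_1d` and the two bandwidths are illustrative choices, not values from the slides.

```r
# Kernel density estimate from equation (1) with a Gaussian kernel.
kde_1d <- function(u, obs, b) {
  # For each evaluation point, sum Kern((u - u_i) / b) over the data
  # and divide by N * b.
  sapply(u, function(u0) sum(dnorm((u0 - obs) / b)) / (length(obs) * b))
}

set.seed(4)
# Hypothetical data: a mixture of two normal components.
obs <- c(rnorm(40, mean = 0.3, sd = 0.05), rnorm(60, mean = 0.7, sd = 0.1))
grid <- seq(-0.5, 1.5, length.out = 401)

# A small and a large bandwidth: the first is bumpy, the second is smooth.
f_narrow <- kde_1d(grid, obs, b = 0.02)
f_wide   <- kde_1d(grid, obs, b = 0.10)

plot(grid, f_narrow, type = "l", xlab = "u", ylab = "density estimate")
lines(grid, f_wide, lty = 2)
```

For a Gaussian kernel the bandwidth b plays the role of the kernel standard deviation, so base R's `density(obs, bw = b)` should give a very similar curve.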


Kernel estimation in R

base

- density(): one-dimensional kernel density estimation

library(KernSmooth)

- bkde2D(x, bandwidth, gridsize=c(51, 51), range.x=<<see below>>, truncate=TRUE): binned kernel density estimation

library(splancs)

- kernel2d(pts, poly, h0, nx=20, ny=20, kernel='quartic')

library(spatstat)

- ksmooth.ppp(x, sigma, weights, edge=TRUE) (older spatstat name; current releases provide density.ppp())


Data Break: Early Medieval Grave Sites

What do we have?

- Alt and Vach (1991). (Data sent from Richard Wright, Emeritus Professor, School of Archaeology, University of Sydney.)

- Archaeological dig in Neresheim, Baden-Württemberg, Germany.

- The anthropologists and archaeologists involved wonder if this particular culture tended to place grave sites according to family units.

- 143 grave sites.


What do we want?

- 30 with missing or reduced wisdom teeth (“affected”).

- Does the pattern of “affected” graves (cases) differ from the pattern of the 113 non-affected graves (controls)?

- How could estimates of the intensity functions for the affected and non-affected grave sites, respectively, help answer this question?


Outline of analysis

- Read in data.

- Plot data (axis intervals important!).

- Call 2-dimensional kernel smoothing functions (choose kernel and bandwidth).

- Plot surface (persp()) and contour (contour()) plots.

- Visual comparison of two intensities.
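
The outline can be sketched in base R as below. Several things are assumptions made for illustration: the grave coordinates are not part of this transcript, so uniformly simulated locations stand in for them; the smoother is a hand-written Gaussian product kernel rather than one of the package functions listed earlier; the bandwidth of 700 is arbitrary; and the estimate is scaled to integrate to the number of events, which is one common convention (other normalisations differ by a constant factor and give the same surface shape). No edge correction is applied.

```r
# Two-dimensional Gaussian kernel intensity estimate on a regular grid.
kernel_intensity <- function(u, v, b, gu, gv) {
  # Kernel weights along each axis: rows index grid points, columns index events
  ku <- outer(gu, u, function(g, x) dnorm(g, mean = x, sd = b))
  kv <- outer(gv, v, function(g, x) dnorm(g, mean = x, sd = b))
  ku %*% t(kv)   # entry [i, j] sums ku[i, k] * kv[j, k] over events k
}

set.seed(5)
# Hypothetical stand-ins for the data: 30 "cases" and 113 "controls".
cases    <- data.frame(u = runif(30, 4000, 10000),  v = runif(30, 4000, 10000))
controls <- data.frame(u = runif(113, 4000, 10000), v = runif(113, 4000, 10000))

gu <- seq(4000, 10000, length.out = 41)
gv <- seq(4000, 10000, length.out = 41)
lam_cases    <- kernel_intensity(cases$u, cases$v, b = 700, gu, gv)
lam_controls <- kernel_intensity(controls$u, controls$v, b = 700, gu, gv)

# Same grid and axis ranges in every panel so the surfaces are comparable.
op <- par(mfrow = c(2, 2))
persp(gu, gv, lam_cases, xlab = "u", ylab = "v", zlab = "Intensity",
      theta = 30, phi = 30, main = "Cases")
contour(gu, gv, lam_cases, xlab = "u", ylab = "v", main = "Cases")
persp(gu, gv, lam_controls, xlab = "u", ylab = "v", zlab = "Intensity",
      theta = 30, phi = 30, main = "Controls")
contour(gu, gv, lam_controls, xlab = "u", ylab = "v", main = "Controls")
par(op)
```

Using the same kernel and bandwidth for both groups keeps the visual comparison from being driven by smoothing choices.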


Plot of the data

[Figure: grave locations (* = grave, O = affected); axes u and v, each from 4000 to 10000.]


Case intensity

[Figure: two panels for the cases: "Estimated intensity function" as a surface (axes u, v, Intensity) and "Affected grave locations" as a point plot (axes u and v).]


Control intensity

[Figure: two panels for the controls: "Estimated intensity function" as a surface (axes u, v, Intensity) and "Non-affected grave locations" as a point plot (axes u and v, each from 4000 to 10000).]


What we have/don’t have

- Kernel estimates suggest similarities and differences.

- They suggest locations where there might be differences.

- No significance testing (yet!)


First and Second Order Properties

- The intensity function describes the mean number of events per unit area, a first order property of the underlying process.

- What about second order properties relating to the variance/covariance/correlation between event locations (if events are not independent...)?


Ripley’s K function

Ripley (1976, 1977) introduced the reduced second moment measure or K function

$$K(h) = \frac{E[\#\text{ events within } h \text{ of a randomly chosen event}]}{\lambda},$$

for any positive spatial lag h.

NOTE: Use of λ implies assumption of stationary process!


Properties of K(h)

- Ripley (1977) shows that specifying K(h) for all h > 0 is equivalent to specifying Var[N(A)] for any subregion A.

- Under CSR, $K(h) = \pi h^2$ (area of a circle with radius h).

- Clustered? $K(h) > \pi h^2$.

- Regular? $K(h) < \pi h^2$.
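
A short worked step may help explain the CSR benchmark. It assumes the count in the definition refers to the other events within distance h of the chosen event. Under CSR the events are independent, so conditioning on an event at a given location does not change the distribution of the rest, and the expected number of other events in a disc of radius h is the intensity times the disc area:

```latex
\[
K(h) \;=\; \frac{E[\#\text{ other events within } h \text{ of a randomly chosen event}]}{\lambda}
     \;=\; \frac{\lambda \, \pi h^{2}}{\lambda}
     \;=\; \pi h^{2}.
\]
```

Clustering places more neighbours than this within distance h, and regularity places fewer, which gives the two inequalities above.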


Estimating K(h)

Starting with the definition and replacing the expectation with an average yields

$$\hat{K}(h) = \frac{1}{\hat{\lambda} N} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} \delta(d(i,j) < h),$$

for N events, where

- d(i, j) = distance between events i and j

- δ(d(i, j) < h) = 1 if d(i, j) < h, 0 otherwise.
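
The double sum above translates almost line for line into base R. This is a sketch without edge correction; the function name `khat_naive`, the use of N divided by the study-area size as the intensity estimate, and the simulated CSR data are assumptions for illustration.

```r
# Naive estimate of K(h) (no edge correction), following the double sum.
khat_naive <- function(u, v, h, area) {
  n <- length(u)
  lambda_hat <- n / area                # estimated intensity
  d <- as.matrix(dist(cbind(u, v)))     # d[i, j] = distance between events i and j
  diag(d) <- Inf                        # exclude the j = i terms
  # For each lag, count ordered pairs closer than h and rescale
  sapply(h, function(h0) sum(d < h0) / (lambda_hat * n))
}

set.seed(7)
n <- 200
u <- runif(n)
v <- runif(n)                           # CSR on the unit square
h <- seq(0.01, 0.1, by = 0.01)
k_est <- khat_naive(u, v, h, area = 1)
cbind(h, k_est, csr = pi * h^2)
```

Without edge correction the estimate tends to fall below the CSR value as h grows, because discs around events near the boundary extend outside the study area.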


Edges

- Think about events near “edges” of the study area.

- How do we count events within h if the distance between the edge and an observed event is < h?

- More of a problem as h increases...

- “Edge effects” call for “edge correction”.


Edge correction

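- Limit estimates to smaller distances h.

- Ripley’s (1976) edge correction

$$\hat{K}_{ec}(h) = \frac{1}{\hat{\lambda} N} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} w_{ij}^{-1}\, \delta(d(i,j) < h)$$

  where $w_{ij}$ = proportion of the circumference of the circle centered at event i with radius d(i, j) that lies within the study area. (The factor 1/N matches the uncorrected estimator, to which this reduces when every $w_{ij} = 1$.)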
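
The weights can be approximated numerically for a rectangular study area. The sketch below is an illustration under assumptions rather than the algorithm used by splancs or spatstat: it samples equally spaced angles on each circle and takes the fraction that falls inside the rectangle. The function names, the angle count, and the simulated data are hypothetical, and the estimator divides by N so that it reduces to the uncorrected version when all weights equal 1.

```r
# Proportion of the circle centred at (u0, v0) with radius r that lies inside
# the rectangle, approximated with n_angle equally spaced angles.
ripley_weight <- function(u0, v0, r, xrange = c(0, 1), yrange = c(0, 1),
                          n_angle = 720) {
  theta <- 2 * pi * (seq_len(n_angle) - 0.5) / n_angle
  cu <- u0 + r * cos(theta)
  cv <- v0 + r * sin(theta)
  inside <- cu >= xrange[1] & cu <= xrange[2] & cv >= yrange[1] & cv <= yrange[2]
  max(mean(inside), 1 / n_angle)      # guard against a zero weight
}

# Edge-corrected K estimate: each ordered pair (i, j) counts 1 / w_ij.
khat_ec <- function(u, v, h, xrange = c(0, 1), yrange = c(0, 1)) {
  n <- length(u)
  lambda_hat <- n / (diff(xrange) * diff(yrange))
  d <- as.matrix(dist(cbind(u, v)))
  w <- matrix(1, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) w[i, j] <- ripley_weight(u[i], v[i], d[i, j], xrange, yrange)
    }
  }
  diag(d) <- Inf                      # exclude the j = i terms
  sapply(h, function(h0) sum((d < h0) / w) / (lambda_hat * n))
}

set.seed(8)
u <- runif(100)
v <- runif(100)
h <- c(0.05, 0.1, 0.2)
rbind(h = h, corrected = khat_ec(u, v, h), csr = pi * h^2)
```

An event on a straight edge far from any corner gets weight close to 1/2 for small radii, so each of its neighbours counts roughly double.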


Ripley’s weights

- Conceptually, $w_{ij}$ is the conditional probability of observing an event at distance d(i, j) given that an event occurs d(i, j) from event i.

- Works for “holes” in the study area too (astronomy application).

- Requires definition of the study boundary (and a way of calculating $w_{ij}$).


Calculating K(h) in R

library(splancs)

- khat(pts, poly, s, newstyle=FALSE)

- poly defines polygon boundary (important!!!).

library(spatstat)

- Kest(X, r, correction=c("border", "isotropic", "Ripley", "translate"))

- Boundary part of X (point process “object”).


Plots with K(h)

- Plotting (h, K(h)) for CSR gives a parabola.

- $K(h) = \pi h^2$ implies $\left(K(h)/\pi\right)^{1/2} = h$.

- Besag (1977) suggests plotting h versus $\hat{L}(h)$, where

$$\hat{L}(h) = \left(\frac{\hat{K}_{ec}(h)}{\pi}\right)^{1/2} - h$$
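
The transformation is a one-liner in R. In the sketch below the function name `l_from_k` and the three example K curves (the CSR value scaled by 1, 1.5, and 0.5) are hypothetical inputs chosen to show the sign of the result.

```r
# Besag's transformation: L(h) = sqrt(K(h) / pi) - h, which is 0 under CSR.
l_from_k <- function(k, h) sqrt(k / pi) - h

h <- seq(0.05, 0.5, by = 0.05)
l_csr       <- l_from_k(pi * h^2, h)         # CSR reference
l_clustered <- l_from_k(1.5 * pi * h^2, h)   # hypothetical K above the CSR value
l_regular   <- l_from_k(0.5 * pi * h^2, h)   # hypothetical K below the CSR value

plot(h, l_clustered, type = "l", ylim = range(l_clustered, l_regular),
     xlab = "Distance (h)", ylab = "sqrt(K/pi) - h")
lines(h, l_regular, lty = 2)
abline(h = 0, lty = 3)                       # CSR reference line
```

Plotting against a horizontal reference line makes departures from CSR easier to judge by eye than comparing two parabolas.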


Variability and Envelopes

Are there any distributional results for K(h)?

Some, but mostly for particular region shapes. Monte Carlo approaches are more general.

- Observe K(h) from data.

- Simulate a realization of events from CSR.

- Find K(h) for the simulated data.

- Repeat simulations many times.

- Create simulation “envelopes” from simulation-based K(h)’s.
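
The envelope procedure can be sketched in base R as follows. The assumptions are: a unit-square study area, simulations that condition on the observed number of events, a K estimate without edge correction (applied identically to observed and simulated patterns, so the comparison is like with like), pointwise minimum and maximum envelopes, and a made-up clustered "observed" pattern.

```r
# Naive K estimate (no edge correction) and the corresponding L transform.
khat_naive <- function(u, v, h, area) {
  n <- length(u)
  d <- as.matrix(dist(cbind(u, v)))
  diag(d) <- Inf
  sapply(h, function(h0) sum(d < h0) / ((n / area) * n))
}
l_hat <- function(u, v, h, area) sqrt(khat_naive(u, v, h, area) / pi) - h

# Pointwise min/max envelopes of L from nsim CSR simulations of n events.
csr_envelope <- function(n, h, nsim = 99) {
  sims <- replicate(nsim, l_hat(runif(n), runif(n), h, area = 1))
  list(lower = apply(sims, 1, min), upper = apply(sims, 1, max))
}

set.seed(10)
h <- seq(0.01, 0.15, by = 0.01)
# Hypothetical observed pattern: 100 events clustered around 10 parents.
parents <- cbind(runif(10), runif(10))
idx <- sample(10, 100, replace = TRUE)
u <- (parents[idx, 1] + rnorm(100, sd = 0.02)) %% 1   # wrap into the unit square
v <- (parents[idx, 2] + rnorm(100, sd = 0.02)) %% 1

l_obs <- l_hat(u, v, h, area = 1)
env <- csr_envelope(n = 100, h = h, nsim = 99)

plot(h, l_obs, type = "l", ylim = range(l_obs, env$lower, env$upper),
     xlab = "Distance (h)", ylab = "sqrt(Khat/pi) - h")
lines(h, env$lower, lty = 2)
lines(h, env$upper, lty = 2)
```

Distances at which the observed curve leaves the band are the scales at which the pattern looks inconsistent with CSR; the band is pointwise, so it is a visual guide rather than a single formal test.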


Example: Regular clusters and clusters of regularity

[Figure: two panels plotting sqrt(Khat/pi) − h against Distance (h), for h from 0.0 to 0.7: "Estimated K function, regular pattern of clusters" and "Estimated K function, cluster of regular patterns".]


Data break: Medieval gravesites

[Figure: three L plots (Lhat − distance against Distance, 0 to 5000) computed with a rectangular boundary: all gravesites, affected gravesites, and non-affected gravesites.]

Clustering!

- Looks like strong clustering, but wait...
- The plots compare all sites, cases, and controls to CSR.
- Oops, we forgot to adjust for the polygon: this is clustering within the enclosing square area!
- Significant clustering, but is it interesting clustering?
- Let's try again with the polygon adjustment.
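One way to read the polygon adjustment is that the CSR reference patterns should be simulated inside the actual study polygon rather than its bounding rectangle. The sketch below shows that idea with rejection sampling and a ray-casting point-in-polygon test. It is illustrative only: the triangle is a hypothetical study region (the gravesite polygon is not reproduced here), and the function names are assumptions.

```python
import random

def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def csr_in_polygon(n, vertices, rng):
    """Draw n uniform events in the polygon by rejection from its bounding box."""
    xs = [v[0] for v in vertices]
    ys = [v[1] for v in vertices]
    events = []
    while len(events) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, vertices):
            events.append((x, y))
    return events

# Hypothetical study region: a triangle covering half of the unit square.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
rng = random.Random(3)
events = csr_in_polygon(100, triangle, rng)
print(len(events), "events simulated inside the polygon")
```

Events that are uniform within the triangle would look clustered if compared with CSR simulated over the whole square, which is the effect the slide describes.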

Medieval graves: K functions with polygon adjustment

[Figure: three panels, each plotting Lhat − distance (vertical axis, ticks at −400, 0, 400) against Distance (horizontal axis, 0 to 5000).
Panel 1: L plot for all gravesites, polygon.
Panel 2: L plot for affected gravesites, polygon.
Panel 3: L plot for non-affected gravesites, polygon.]

Clustering?

- Clustering of cases at the very shortest distances.
- Likely due to two coincident-pair sites (both members are cases in both pairs).
- Could we try random labelling here?
- Construct envelopes based on random samples of 30 "cases" from the set of 143 locations.
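The random labelling idea in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions: the 143 locations are simulated stand-ins (not the gravesite data), the first 30 are arbitrarily treated as the observed cases, the K estimate has no edge correction, and all function names are hypothetical.

```python
import math
import random

def k_hat(points, h_values, area):
    """Naive K estimate (no edge correction), normalized by n^2."""
    n = len(points)
    dists = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return [2 * sum(d <= h for d in dists) * area / (n * n) for h in h_values]

def random_labelling_envelope(locations, n_cases, h_values, area, n_sim, rng):
    """Pointwise min/max of K over random 'case' subsets of fixed locations."""
    sims = []
    for _ in range(n_sim):
        # Relabel: pick n_cases locations at random, without replacement.
        pseudo_cases = rng.sample(locations, n_cases)
        sims.append(k_hat(pseudo_cases, h_values, area))
    lower = [min(s[k] for s in sims) for k in range(len(h_values))]
    upper = [max(s[k] for s in sims) for k in range(len(h_values))]
    return lower, upper

rng = random.Random(2)
h_values = [0.05, 0.10, 0.20]

# Hypothetical stand-in for the 143 locations, with 30 labelled as cases.
locations = [(rng.random(), rng.random()) for _ in range(143)]
cases = locations[:30]

k_cases = k_hat(cases, h_values, 1.0)
lower, upper = random_labelling_envelope(locations, 30, h_values, 1.0, 99, rng)
for h, k, lo, hi in zip(h_values, k_cases, lower, upper):
    print(f"h={h:.2f}  K_cases={k:.4f}  envelope=[{lo:.4f}, {hi:.4f}]")
```

Because every simulated set of "cases" is drawn from the same 143 locations, the envelope conditions on the observed site pattern, so the comparison asks whether cases cluster relative to all sites rather than relative to CSR.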

A few notes

- Since K(h) is a function of distance h, it can indicate clustering at one scale and regularity at another.
- K(h) measures cumulative aggregation (number of events up to h), which may result in a delayed response to a change from clustering to regularity.
- The pair correlation function (based on K′(h)) may respond more instantaneously.
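For reference, the standard relationship between the pair correlation function g and the K function, assuming a stationary, isotropic process in the plane, is:

```latex
g(h) = \frac{K'(h)}{2\pi h},
\qquad
K(h) = 2\pi \int_{0}^{h} u \, g(u) \, du .
```

Under CSR, K(h) = πh² gives g(h) = 1 at every distance. Since g depends on the derivative of K at h rather than on the accumulated count up to h, it reflects behavior at distance h itself.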

Notes

- Just as the first and second moments do not uniquely define a distribution, λ(s) and K(h) do not uniquely define a spatial point pattern (Baddeley and Silverman 1984, and in Section 5.3.4).
- Analyses based on λ(s) typically assume independent events, describing the observed pattern entirely through a heterogeneous intensity.
- Analyses based on K(h) typically assume a stationary process (with constant λ), describing the observed pattern entirely through correlations between events.
- Remember Bartlett's (1964) result (IMPORTANT FACT! above).

What questions can we answer?

- Are events uniformly distributed in space?
  - Test CSR.
- If not, where are events more or less likely?
  - Intensity estimation.
- Do events tend to occur near other events, and, if so, at what scale?
  - K functions with Monte Carlo envelopes.

Activity

- How could these questions be helpful in infectious disease surveillance?
- Intensities?
- K-functions?
