MODULE 5: Spatial Statistics in Epidemiology and …faculty.washington.edu/.../2020_SISMID_5_3.pdfI...
Transcript of MODULE 5: Spatial Statistics in Epidemiology and …faculty.washington.edu/.../2020_SISMID_5_3.pdfI...
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
MODULE 5: Spatial Statistics in Epidemiologyand Public Health
Lecture 3: Point Processes
Jon Wakefield and Lance Waller
1 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Preliminaries
Random patternsMonte Carlo testingHeterogeneous Poisson process
Estimating intensities
Second order propertiesK functionsEstimating K functionsEdge correctionMonte Carlo envelopes
2 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
References
I Baddeley, A., Rubak, E., and Turner. R. (2015) Spatial PointPatterns: Methodology and Applications in R. Boca Raton,FL: CRC/Chapman & Hall.
I Diggle, P.J. (1983) Statistical Analysis of Spatial PointPatterns. London: Academic Press.
I Diggle, P.J. (2013) Statistical Analysis of Spatial andSpatio-Temporal Point Patterns, Third EditionCRC/Chapman & Hall.
I Waller and Gotway (2004, Chapter 5) Applied SpatialStatistics for Public Health Data. New York: Wiley.
I Møller, J. and Waagepetersen (2004) Statistical Inference andSimulation for Spatial Point Processes. Boca Raton, FL:CRC/Chapman & Hall.
3 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Goals
I Describe basic types of spatial point patterns.
I Introduce mathematical models for random patterns of events.
I Introduce analytic methods for describing patterns in observedcollections of events.
I Illustrate the approaches using examples from archaeology,conservation biology, and epidermal neurology.
4 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Terminology
I Realization: An observed set of event locations (a data set).
I Point: Where an event could occur.
I Event: Where an event did occur.
5 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Complete Spatial Randomness (CSR)
I Start with a model of “lack of pattern”.I Events equally likely to occur anywhere in the study area
(uniform distribution).I Sometimes this is referred to as a “random” pattern, but...I Any probability model generates patterns so, in effect, all of
the patterns we consider are “random”, let’s use “uniform”.
I Event locations independent of each other.
6 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Six realizations of CSR
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.00.
00.
20.
40.
60.
81.
0u
v
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v ●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
7 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
CSR as a boundary condition
CSR serves as a boundary between:
I Patterns that are more “clustered” than CSR.
I Patterns that are more “regular” than CSR.
8 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Too Clustered (top), Too Regular (bottom)
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
Clustered
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.00.
00.
20.
40.
60.
81.
0
u
v
Clustered
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
Clustered
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
Regular
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
Regular
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
Regular
9 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
The role of scale
I An observed pattern may be clustered at one spatial scale,and regular at another.
I The scale of process, scale of observation, and scale ofintervention might be different!
I How might this impact our ability to detect patterns?
I Many ecology papers on estimating scale of a process (e.g.,plant disease, animal territories) but little in public healthliterature.
10 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Clusters of regular patterns/Regular clusters
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
●
●
●
● ●
●
●●
●●
●●
●
●
●●●
●●●
●●
●●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●●
●●
●
●●
●
●
●●●
●●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
Regular pattern of clusters
● ●
●● ●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
● ●
●● ●● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
● ●
●● ●
Cluster of regular patterns
11 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Spatial Point Processes
I Mathematically, we treat our point patterns as realizations ofa spatial stochastic process.
I A stochastic process is a collection of random variablesX1,X2, . . . ,XN .
I Examples: Number of people in line at grocery store.
I For us, each random variable represents an event location.
12 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Stationarity/Isotropy
I Stationarity: Properties of process invariant to translation.
I Isotropy: Properties of process invariant to rotation around anorigin.
I Why do we need these? They provide a sort of replication inthe data that allows statistical estimation.
I Not required, but development easier.
13 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
CSR as a Stochastic Process
Let N(A) = number of events observed in region A, and λ = apositive constant.
A homogenous spatial Poisson point process is defined by:
(a) N(A) ∼ Pois(λ|A|)(b) given N(A) = n, the locations of the events are uniformly
distributed over A.
λ is the intensity of the process (mean number of events expectedper unit area).
14 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Is this CSR?
I Criterion (b) describes our notion of uniform andindependently distributed in space.
I Criteria (a) and (b) give a “recipe” for simulating realizationsof this process:
* Generate a Poisson random number of events.* Distribute that many events uniformly across the study area.runif(n,min(x),max(x))
runif(n,min(y),max(y))
15 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Compared to temporal Poisson process
Recall three features of a Poisson process in time:
I Number of events in non-overlapping intervals are Poissondistributed,
I Conditional on the number of events, events are uniformlydistributed within a fixed interval, and
I Interevent times are exponentially distributed.
In space, no ordering so no (uniquely defined) interevent distances.
16 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Monte Carlo hypothesis testing
Review of basic hypothesis testing framework:
I T= a random variable representing the test statistic.
I Under H0 : T ∼ F0.
I From the data, observe T = tobs (the observed value of thetest statistic).p-value = 1− F0(tobs)
I Sometimes it is easier to simulate F0(·) than to calculate F0(·)exactly.
Besag and Diggle (1977).
17 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Steps in Monte Carlo testing
1. observe t1.
2. simulate t2, ..., tm from F0.
3. p.value = rank of t1m .
For tests of CSR (complete spatial randomness), M.C. tests arevery useful since CSR is easy to simulate but distribution of teststatistic may be complex.
18 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Monte Carlo tests very helpful
1. when distribution of U is complex but the spatial distributionassociated with H0 is easy to simulate.
2. for permutation tests
e.g., 592 cases of leukemia in ∼ 1 million people in 790regionsI permutation test requires all possible permutations.I Monte Carlo assigns cases to regions under H0.
19 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Activity: Trees and CSR
I Event locations for a strand of 327 myrtle trees in arectangular plot 170.5 × 213.0 meters.
I 221 healthy trees.
I 106 diseased trees.
I Research question: Is the spatial pattern of diseased trees thesame as the spatial pattern of health trees?
I Start with:I Are all locations consistent with CSR?I Are diseased locations consistent with CSR?I Are health locations consistent with CSR?
20 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Final CSR notes
CSR:
1. is the “white noise” of spatial point processes.
2. characterizes the absence of structure (signal) in data.
3. often the null hypothesis in statistical tests to determine ifthere is structure in an observed point pattern.
4. not as useful in public health? Why not?
21 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Heterogeneous population density
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
u
v
22 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Heterogeneous Poisson Process
1. What if λ, the intensity of the process (mean number ofevents expected per unit area), varies by location?
2. N(A) = Pois(∫
(s)∈A λ(s)ds)
(|A| =∫(s)∈A ds)
3. Given N(A) = n, events distributed in A as an independentsample from a distribution on A with p.d.f. proportional toλ(s).
23 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Example intensity function
u
v
lambda
u
v
0 5 10 15 20
05
1015
20
24 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
Six realizations
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0 5 10 15 20
05
1015
20
u
v
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
● ●
● ●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
0 5 10 15 200
510
1520
u
v ●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
0 5 10 15 20
05
1015
20
u
v
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●
●
0 5 10 15 20
05
1015
20
u
v
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
0 5 10 15 20
05
1015
20
u
v
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
● ●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
0 5 10 15 20
05
1015
20
u
v
25 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Monte Carlo testingHeterogeneous Poisson process
IMPORTANT FACT!
Without additional information, no analysis can differentiatebetween:
1. independent events in a heterogeneous (non-stationary)environment
2. dependent events in a homogeneous (stationary) environment
26 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
How do we estimate intensities?
For an heterogeneous Poisson process, λ(s) is closely related to thedensity of events over A (λ(s) ∝ density).
So density estimators provide a natural approach (for details ondensity estimation see Silverman (1986) and Wand and Jones(1995, KernSmooth R library)).
Main idea: Put a little “kernel” of density at each data point, thensum to give the estimate of the overall density function.
27 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Kernels and bandwidths
s
0.0 0.2 0.4 0.6 0.8 1.0
Kernel variance = 0.02
s
0.0 0.2 0.4 0.6 0.8 1.0
Kernel variance = 0.03
s
0.0 0.2 0.4 0.6 0.8 1.0
Kernel variance = 0.04
s
0.0 0.2 0.4 0.6 0.8 1.0
Kernel variance = 0.1
28 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
1-dim kernel estimation
If we have observations u1, u2, . . . , uN (in one dimension), thekernel density estimate is
f (u) =1
Nb
N∑i=1
Kern
(u − ui
b
)(1)
where Kern(·) is a kernel function satisfying∫D
Kern(s)ds = 1
and b the bandwidth. To estimate the intensity function, replaceN−1 by |D|−1.
29 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Kernel estimation in R
base
I density() one-dimensional kernel
library(KernSmooth)
I bkde2D(x, bandwidth, gridsize=c(51, 51),
range.x=<<see below>>, truncate=TRUE) block kerneldensity estimation
library(splancs)
I kernel2d(pts,poly,h0,nx=20,
ny=20,kernel=’quartic’)
library(spatstat)
I ksmooth.ppp(x, sigma, weights, edge=TRUE)
30 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Data Break: Early Medieval Grave Sites
What do we have?
I Alt and Vach (1991). (Data sent from Richard WrightEmeritus Professor, School of Archaeology, University ofSydney.)
I Archeaological dig in Neresheim, Baden-Wurttemberg,Germany.
I The anthropologists and archaeologists involved wonder if thisparticular culture tended to place grave sites according tofamily units.
I 143 grave sites.
31 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
What do we want?
I 30 with missing or reduced wisdom teeth (“affected”).
I Does the pattern of “affected” graves (cases) differ from thepattern of the 113 non-affected graves (controls)?
I How could estimates of the intensity functions for the affectedand non-affected grave sites, respectively, help answer thisquestion?
32 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Outline of analysis
I Read in data.
I Plot data (axis intervals important!).
I Call 2-dimensional kernel smoothing functions (choose kerneland bandwidth).
I Plot surface (persp()) and contour (contour()) plots.
I Visual comparison of two intensities.
33 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Plot of the data
*
*
**
**
*
** * *
**
*
*
*
*
*
*
**
*
*
**
**
*
*
*
***
**
*
*
*
*
*
*
*
**
*
*
*
*
*
*
*
**
*
*
*
**
**
*
**
*
*
*
**
*
*
*
**
*
*
*
***
* *
**
*
*
*
*
**
**
*
**
*
*
*
**
*
*
*
*
**
*
*
*
*
*
*
**
** *
*
*
*
*
**
*
*
*
*
****
*
*
*
**
*
*
*
***
*
*
4000 6000 8000 10000
4000
6000
8000
10000
u
v
Grave locations (*=grave, O=affected)
OO
O
O
O
O
O
O
OO
OOO
O
O
OO
O
O
O
O
O
O
O
O
OO
OO
O
34 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Case intensity
u
v
Intensity
Estimated intensity function
**
*
*
*
*
*
*
* *
***
*
*
**
**
*
*
*
**
*** **
*
4000 8000
4000
8000
u
v
Affected grave locations
35 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
Control intensity
u
v
Intensity
Estimated intensity function
*
*
**
*
** * *
**
*
*
*
*
*
**
*
*
**
**
*
*
***
***
*
**
**
*
*
*
*
*
*
**
*
*
*
*
*
**
*
*
**
*
*
*
*
*
** *
***
*
*
***
*
*
*
**
*
*
*
*
**
*
*
*
*
*
*
** *
*
*
*
**
*
*
**
****
*
*
**
*
*
*
*
4000 6000 8000 10000
4000
6000
8000
1000
0
u
v
Non−affected grave locations
36 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
What we have/don’t have
I Kernel estimates suggest similarities and differences.
I Suggest locations where there might be differences.
I No significance testing (yet!)
37 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
First and Second Order Properties
I The intensity function describes the mean number of eventsper unit area, a first order property of the underlying process.
I What about second order properties relating to thevariance/covariance/correlation between event locations (ifevents non independent...)?
38 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Ripley’s K function
Ripley (1976, 1977 introduced) the reduced second momentmeasure or K function
K (h) =E [# events within h of a randomly chosen event]
λ,
for any positive spatial lag h.
NOTE: Use of λ implies assumption of stationary process!
39 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Properties of K (h)
I Ripley (1977) shows specifying K (h) for all h > 0, equivalentto specifying Var[N(A)] for any subregion A.
I Under CSR, K (h) = πh2 (area of circle of with radius h).
I Clustered? K (h) > πh2.
I Regular? K (h) < πh2.
40 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Estimating K (h)
Start with definition, replacing expectation with average yields
K (h) =1
λN
N∑i=1
N∑j=1
j 6=i
δ(d(i , j) < h),
for N events, where
I d(i , j) = distance between events i and j
I δ(d(i , j) < h) = 1 if d(i , j) < h, 0 otherwise.
41 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Edges
I Think about events near “edges” of study area.
I How do we count events within h if distance between edgeand observed event is < h?
I More of a problem as h increases...
I “Edge effects” call for “edge correction”.
42 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Edge correction
I Limit estimates to smaller distances h.
I Ripley’s (1976) edge correction
Kec(h) = λ−1N∑i=1
N∑j=1
j 6=i
w−1ij δ(d(i , j) < h)
where wij = proportion of the circumference of the circlecentered at event i with radius d(i , j) within the study area.
43 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Ripley’s weights
I Conceptually, conditional probability of observing an event atdistance d(i , j) given an event occurs d(i , j) from event i .
I Works for “holes” in study area too. (Astronomy application).
I Requires definition of study boundary (and way of calculatingwij).
44 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Calculating K (h) in R
library(splancs)
I khat(pts,poly,s,newstyle=FALSE)
I poly defines polygon boundary (important!!!).
library(spatstat)
I Kest(X, r, correction=c("border", "isotropic",
"Ripley", "translate"))
I Boundary part of X (point process “object”).
45 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Plots with K (h)
I Plotting (h,K (h)) for CSR is a parabola.
I K (h) = πh2 implies (K (h)
π
)1/2
= h.
I Besag (1977) suggests plotting
h versus L(h)
where
L(h) =
(Kec(h)
π
)1/2
− h
46 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Variability and Envelopes
Are there any distributional results for K (h)?
Some, but mostly for particular region shapes. Monte Carloapproaches more general.
I Observe K (h) from data.
I Simulate a realization of events from CSR.
I Find K (h) for the simulated data.
I Repeat simulations many times.
I Create simulation “envelopes” from simulation-based K (h)’s.
47 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Example: Regular clusters and clusters of regularity
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
−0.
10.
00.
10.
20.
3
Distance (h)
sqrt
(Kha
t/pi)
− h
Estimated K function, regular pattern of clusters
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
−0.
10.
00.
10.
20.
3
Distance (h)
sqrt
(Kha
t/pi)
− h
Estimated K function, cluster of regular patterns
48 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Data break: Medieval gravesites
0 1000 2000 3000 4000 5000
−40
040
0
Distance
Lhat
− d
ista
nce
L plot for all gravesites, rectangle
0 1000 2000 3000 4000 5000
−40
040
0
Distance
Lhat
− d
ista
nce
L plot for affected gravesites, rectangle
0 1000 2000 3000 4000 5000
−40
040
0
Distance
Lhat
− d
ista
nce
L plot for non−affected gravesites, rectangle
49 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Clustering!
I Looks like strong clustering, but wait...
I Compares all, cases, controls to CSR.
I Oops, forgot to adjust for polygon, clustering inside squarearea!
I Significant clustering, but interesting clustering?
I Let’s try again with polygon adjustment.
50 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Medieval graves: K functions with polygon adjustment
0 1000 2000 3000 4000 5000
−40
040
0
Distance
Lhat
− d
ista
nce
L plot for all gravesites, polygon
0 1000 2000 3000 4000 5000
−40
040
0
Distance
Lhat
− d
ista
nce
L plot for affected gravesites, polygon
0 1000 2000 3000 4000 5000
−40
040
0
Distance
Lhat
− d
ista
nce
L plot for non−affected gravesites, polygon
51 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Clustering?
I Clustering of cases at very shortest distances.
I Likely due to two coincident-pair sites (both cases in bothpairs).
I Could we try random labelling here?
I Construct envelopes based on random samples of 30 ”cases”from set of 143 locations.
52 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
A few notes
I Since K (h) is a function of distance (h), it can indicateclustering at one scale and regularity at another.
I K (h) measures cumulative aggregation (# events up to h).May result in delayed response to change from clustering toregularity.
I The pair correlation function (based on K ′(h)) may respondmore instantaneously.
53 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Notes
I Just as the first and second moments do not uniquely define adistribution, λ(s) and K (h) do not uniquely define a spatialpoint pattern (Baddeley and Silverman 1984, and in Section5.3.4 ).
I Analyses based on λ(s) typically assume independent events,describing the observed pattern entirely through aheterogeneous intensity.
I Analyses based on K (h) typically assume a stationary process(with constant λ), describing the observed pattern entirelythrough correlations between events.
I Remember Barlett’s (1964) result (IMPORTANT FACT!above).
54 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
What questions can we answer?
I Are events uniformly distributed in space?I Test CSR.
I If not, where are events more or less likely?I Intensity estimation.
I Do events tend to occur near other events, and, if so, at whatscale?I K functions with Monte Carlo envelopes.
55 / 56
OutlinePreliminaries
Random patternsEstimating intensities
Second order properties
K functionsEstimating K functionsEdge correctionMonte Carlo envelopes
Activity
I How could these questions be helpful in infectious diseasesurveillance?
I Intensities?
I K-functions?
56 / 56