Interaction Analysis of Spatial Point Patterns
Geog 210C
Introduction to Spatial Data Analysis
Phaedon C. Kyriakidis
www.geog.ucsb.edu/~phaedon
Department of Geography
University of California Santa Barbara
Santa Barbara, CA 93106-4060
Spring Quarter 2009
Spatial Point Patterns
Definition
Set of point locations with recorded “events” within a study region, e.g., locations of trees, disease or crime incidents
[Figure: two panels, each showing a 100 × 100 study region: N=100 clustered events; N=100 random events]
- point locations could correspond to all possible events or to subsets of them (mapped versus sampled point pattern)
- attribute values could have also been measured at event locations, e.g., tree diameter (marked point pattern); not considered in this handout
Objective of this handout
- Introduce statistical tools for quantifying spatial interaction of events, e.g., clustering versus randomness or regularity
Ph. Kyriakidis (UCSB) Geog 210C Spring 2009 2 / 27
Outline
Concepts & Notation
Distance & Distance Matrices
Distances Involved in Spatial Point Patterns
Quantifying Spatial Interaction: G Function
Quantifying Spatial Interaction: F Function
Quantifying Spatial Interaction: K Function
Points To Remember
Concepts & Notation
Some Notation
Point events
Set of N locations of events occurring in a study area:
\[ \{\mathbf{u}_i,\ i = 1, \ldots, N\}, \quad \mathbf{u}_i \in D \subset \mathbb{R}^K \]
$\mathbf{u}_i$ = coordinate vector of the i-th event location, e.g., in 2D $\mathbf{u}_i = (x_i, y_i)$; $\in$ = belongs to; D = study domain, a subset ($\subset$) of a K-dimensional space $\mathbb{R}^K$
Variable of interest
y(s) = number of events (a count) within an arbitrary domain or support s with measure (length, area, volume) |s|; support s is centered at an arbitrary location $\mathbf{u}$ and can also be denoted as s(u); in statistics, y(s) is treated as a realization of a random variable (RV) Y(s)
Objective
Quantify interaction, e.g., covariation, between outcomes of any two RVs Y(s) and Y(s'). To do so, all RVs must lie in the same “environment”; in other words, the long-term average (expectation) of RV Y(s) should be similar to that of Y(s')
Intensity of Events
Local intensity $\lambda(\mathbf{u})$
Mean number of events per unit area at an arbitrary location or point $\mathbf{u}$, formally defined as:
\[ \lambda(\mathbf{u}) = \lim_{|s| \to 0} \left\{ \frac{E\{Y(s)\}}{|s|} \right\}, \quad \mathbf{u} \in D \]
where E{Y(s)} denotes the expectation (mean) of RV Y(s) within region s(u) centered at $\mathbf{u}$, and |s| is the area of that region
Overall intensity $\lambda$
Estimated as:
\[ \hat{\lambda} = \frac{N}{|D|} \]
where N = number of events and |D| = measure (area) of study region D
First-order stationarity
Any RV Y(s) should have the same long-term average, for a fixed areal unit s. This implies a constant intensity: $\lambda(\mathbf{u}) = \lambda,\ \forall \mathbf{u} \in D$, and the expected number of events within a region s is just a function of |s|: $E\{Y(s)\} = \lambda |s|,\ s \subset D$
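As a rough illustration of the overall intensity estimate and of E{Y(s)} = λ|s|, the following Python sketch may help. The helper names and the toy grid pattern are assumptions made for this example, not part of the handout.

```python
# Hypothetical helper names; events are (x, y) tuples in a square study region D.

def overall_intensity(events, area):
    """Estimate the overall intensity as N / |D|."""
    return len(events) / area


def expected_count(intensity, support_area):
    """Expected number of events in a support s when the intensity is constant."""
    return intensity * support_area


def count_in_rectangle(events, xmin, ymin, xmax, ymax):
    """Observed count y(s) for a rectangular support s = [xmin, xmax) x [ymin, ymax)."""
    return sum(1 for (x, y) in events if xmin <= x < xmax and ymin <= y < ymax)


# Assumed toy pattern: 100 events on a regular grid in a 100 x 100 region.
events = [(5 + 10 * i, 5 + 10 * j) for i in range(10) for j in range(10)]
lam_hat = overall_intensity(events, area=100 * 100)

# Compare the expected and the observed count in a 20 x 20 support.
print(lam_hat, expected_count(lam_hat, 20 * 20))
print(count_in_rectangle(events, 0, 0, 20, 20))
```

For this perfectly regular toy pattern the observed count in the 20 × 20 support matches the expected count; for a real pattern the two would generally differ from one support to the next.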
Interaction Between Count RVs
Second-order intensity
Long-term average (expectation) of products of counts per unit area at any two arbitrary points $\mathbf{u}$ and $\mathbf{u}'$, formally defined as:
\[ \gamma(\mathbf{u}, \mathbf{u}') = \lim_{|s|, |s'| \to 0} \left\{ \frac{E\{Y(s) Y(s')\}}{|s| |s'|} \right\}, \quad \mathbf{u}, \mathbf{u}' \in D \]
Note: E{Y(s)Y(s')} is not the same as E{Y(s)} E{Y(s')}, unless the RVs are independent
Some terminology
- second-order stationarity: expectation of all RVs is constant (first-order stationarity), and second-order intensity is a function of the separation vector between any two locations $\mathbf{u}$ and $\mathbf{u}'$
- isotropy: only distance (not orientation) of separation vector matters
Outlook
Quantifying interaction in spatial point patterns within the above assumptions or working hypotheses amounts to studying distances between events
Distance & Distance Matrices
Distance
A measure of proximity (typically along a crow’s flight path) between any two locations or spatial entities
Euclidean distance
Consider two points in a 2D (geographical or other) space with coordinates $\mathbf{u}_i = (x_i, y_i)$ and $\mathbf{u}_j = (x_j, y_j)$. The Euclidean distance $d_{ij}$ between points $\mathbf{u}_i$ and $\mathbf{u}_j$ is computed via Pythagoras’s theorem as:
\[ d_{ij} = d(\mathbf{u}_i, \mathbf{u}_j) = \|\mathbf{u}_i - \mathbf{u}_j\| = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \]
$\|\mathbf{u}_i - \mathbf{u}_j\|$ is called the 2-norm of vector $\mathbf{h}_{ij} = \mathbf{u}_i - \mathbf{u}_j$
locations $\mathbf{u}_i$ and $\mathbf{u}_j$ are called, respectively, the tail and head of vector $\mathbf{h}_{ij}$
[Figure: points $\mathbf{u}_i = (x_i, y_i)$ and $\mathbf{u}_j = (x_j, y_j)$ plotted in the x-y plane, separated by distance $d_{ij}$]
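The Euclidean distance between two 2D points can be sketched in a few lines of Python. The function name is hypothetical and the coordinates are made up for illustration.

```python
import math


def euclidean_distance(u_i, u_j):
    """Euclidean (2-norm) distance between two 2D points given as (x, y) tuples."""
    dx = u_i[0] - u_j[0]
    dy = u_i[1] - u_j[1]
    return math.sqrt(dx * dx + dy * dy)


# A 3-4-5 right triangle: the distance between (0, 0) and (3, 4) is 5.
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```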
Distance Metric
Formal characteristics of a distance metric
A measure $d_{ij}$ of proximity between locations $\mathbf{u}_i$ and $\mathbf{u}_j$ is a valid distance metric if it satisfies the following requirements:
- distance between a point and itself is always zero: $d_{ii} = 0$
- distance between a point and another one is always positive: $d_{ij} > 0$ for $i \neq j$
- distance between two points is the same no matter which point you consider first: $d_{ij} = d_{ji}$
- the triangular inequality holds: the sum of the lengths of two sides of a triangle cannot be smaller than the length of the third side: $d_{ij} \le d_{il} + d_{lj}$
A metric $d_{ij}$ need not always be Euclidean, hence it should be checked to ensure that it is a valid distance metric
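One way to check the four requirements above on a candidate distance matrix is sketched below. The function name, the tolerance argument, and the two small matrices are assumptions made for this illustration.

```python
def is_valid_metric(d, tol=1e-9):
    """Check the four distance-metric requirements on a square matrix d (list of lists)."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:                       # d_ii = 0
            return False
        for j in range(n):
            if i != j and d[i][j] <= 0:              # d_ij > 0 for i != j
                return False
            if abs(d[i][j] - d[j][i]) > tol:         # d_ij = d_ji
                return False
            for l in range(n):
                if d[i][j] > d[i][l] + d[l][j] + tol:  # d_ij <= d_il + d_lj
                    return False
    return True


# Distances between three collinear points at positions 0, 1 and 3.
euclidean = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
# Made-up travel times that differ by direction (d_01 = 5 but d_10 = 7).
travel_time = [[0, 5, 9], [7, 0, 4], [9, 4, 0]]

print(is_valid_metric(euclidean))    # True
print(is_valid_metric(travel_time))  # False (not symmetric)
```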
Non-Euclidean Distances
Alternative “distance” measures
(i) over a road or railway, (ii) along a river, (iii) over a network
[Figure: five locations $\mathbf{u}_1, \ldots, \mathbf{u}_5$; legend: Euclidean distance between locations, network distance between locations]
Even more exotic “distance” measures
(i) travel time over a network, (ii) perceived travel time between urban landmarks, (iii) volume of exports/imports
Euclidean distances between network nodes $\neq$ actual or perceived distances on the network
the latter might not even be formal distance metrics, e.g., $d_{ij} \neq d_{ji}$
Minkowski’s Generalized Distance
Definition
Consider two points in a K-dimensional (geographical or other) space $\mathbb{R}^K$ with coordinate vectors $\mathbf{u}_i = [u_{i1}, \ldots, u_{ik}, \ldots, u_{iK}]$ and $\mathbf{u}_j = [u_{j1}, \ldots, u_{jk}, \ldots, u_{jK}]$. The Minkowski distance of order p (with $p \ge 1$), denoted as $d^{(p)}_{ij}$, between points $\mathbf{u}_i$ and $\mathbf{u}_j$ is computed as:
\[ d^{(p)}_{ij} = \left( \sum_{k=1}^{K} |u_{ik} - u_{jk}|^p \right)^{1/p} \]
Particular cases
- Manhattan or city-block distance: $d^{(1)}_{ij} = \sum_{k=1}^{K} |u_{ik} - u_{jk}|$
- Euclidean distance: $d^{(2)}_{ij} = \sqrt{\sum_{k=1}^{K} |u_{ik} - u_{jk}|^2}$
- infinity norm or Chebyshev distance, as $p \to \infty$: $\max(|u_{i1} - u_{j1}|, \ldots, |u_{ik} - u_{jk}|, \ldots, |u_{iK} - u_{jK}|)$
Distances computed from points in multidimensional spaces are routinely used in statistical pattern recognition; points represent objects or cases, each described by K attribute values
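The three particular cases can be reproduced with a single function. The sketch below uses a hypothetical function name and handles the limit p → ∞ by passing `math.inf`.

```python
import math


def minkowski_distance(u_i, u_j, p):
    """Minkowski distance of order p >= 1 between two K-dimensional points."""
    if p < 1:
        raise ValueError("order p must be at least 1")
    diffs = [abs(a - b) for a, b in zip(u_i, u_j)]
    if math.isinf(p):
        # Limit p -> infinity: Chebyshev distance (largest coordinate difference).
        return max(diffs)
    return sum(x ** p for x in diffs) ** (1.0 / p)


u_i, u_j = (0.0, 0.0), (3.0, 4.0)
print(minkowski_distance(u_i, u_j, 1))         # Manhattan: 7.0
print(minkowski_distance(u_i, u_j, 2))         # Euclidean: 5.0
print(minkowski_distance(u_i, u_j, math.inf))  # Chebyshev: 4.0
```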
Euclidean Distance Matrix: Single Set of Points
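Definition
Consider a set of N points $\{\mathbf{u}_1, \ldots, \mathbf{u}_i, \ldots, \mathbf{u}_N\}$ in a K-dimensional (geographical or other) space. The distance matrix $\mathbf{D}$ is a square (N × N) matrix containing the distances $\{d(\mathbf{u}_i, \mathbf{u}_j),\ i = 1, \ldots, N,\ j = 1, \ldots, N\}$ between all N × N possible pairs of points in the set

u_i   u_1  u_2  u_3  u_4  u_5
x_i   x_1  x_2  x_3  x_4  x_5
y_i   y_1  y_2  y_3  y_4  y_5

by convention, $\mathbf{u}_1$ is the coordinate vector of the 1st point in the set (1st entry in data file)
\[ \mathbf{D} =
\begin{bmatrix}
d_{11} & d_{12} & d_{13} & d_{14} & d_{15} \\
d_{21} & d_{22} & d_{23} & d_{24} & d_{25} \\
d_{31} & d_{32} & d_{33} & d_{34} & d_{35} \\
d_{41} & d_{42} & d_{43} & d_{44} & d_{45} \\
d_{51} & d_{52} & d_{53} & d_{54} & d_{55}
\end{bmatrix}
=
\begin{bmatrix}
0 & d_{12} & d_{13} & d_{14} & d_{15} \\
d_{12} & 0 & d_{23} & d_{24} & d_{25} \\
d_{13} & d_{23} & 0 & d_{34} & d_{35} \\
d_{14} & d_{24} & d_{34} & 0 & d_{45} \\
d_{15} & d_{25} & d_{35} & d_{45} & 0
\end{bmatrix}
= [d_{ij}] \]
i-th row (or column) contains distances between i-th point $\mathbf{u}_i$ and all others (including itself)
$\mathbf{D}$ is symmetric with zeros along its diagonal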
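A distance matrix for a single set of points might be built as in the following sketch. It relies on `math.dist` from the Python standard library (Python 3.8 or later); the function name and the three sample points are assumptions.

```python
import math


def distance_matrix(points):
    """N x N matrix of Euclidean distances between all pairs of points in one set."""
    return [[math.dist(u_i, u_j) for u_j in points] for u_i in points]


# Three made-up points on a line through the origin.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = distance_matrix(points)
for row in D:
    print(row)
```

The printed matrix has zeros on its diagonal and equal entries at positions (i, j) and (j, i), as stated above.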
Euclidean Distance Matrix: Two Sets of Points
Definition
Consider 2 sets of points $\{\mathbf{u}_1, \ldots, \mathbf{u}_i, \ldots, \mathbf{u}_N\}$ and $\{\mathbf{t}_1, \ldots, \mathbf{t}_j, \ldots, \mathbf{t}_M\}$ in a K-dimensional (geographical or other) space. The distance matrix $\mathbf{D}$ is an (N × M) matrix containing the Euclidean distances $\{d(\mathbf{u}_i, \mathbf{t}_j),\ i = 1, \ldots, N,\ j = 1, \ldots, M\}$ between all N × M possible pairs formed by these two sets of points

u_i   u_1  u_2  u_3  u_4  u_5
x_i   x_1  x_2  x_3  x_4  x_5
y_i   y_1  y_2  y_3  y_4  y_5

t_j   t_1  t_2  t_3  t_4  t_5  t_6  t_7
x_j   x_1  x_2  x_3  x_4  x_5  x_6  x_7
y_j   y_1  y_2  y_3  y_4  y_5  y_6  y_7

by convention, $\mathbf{u}_1$ is the coordinate vector of the 1st datum in data set #1, and similarly for $\mathbf{t}_1$
\[ \mathbf{D} =
\begin{bmatrix}
d_{11} & d_{12} & d_{13} & d_{14} & d_{15} & d_{16} & d_{17} \\
d_{21} & d_{22} & d_{23} & d_{24} & d_{25} & d_{26} & d_{27} \\
d_{31} & d_{32} & d_{33} & d_{34} & d_{35} & d_{36} & d_{37} \\
d_{41} & d_{42} & d_{43} & d_{44} & d_{45} & d_{46} & d_{47} \\
d_{51} & d_{52} & d_{53} & d_{54} & d_{55} & d_{56} & d_{57}
\end{bmatrix}
= [d_{ij}] \]
i-th row contains distances between i-th point $\mathbf{u}_i$ in set #1 and all points in set #2
j-th column contains distances between j-th point $\mathbf{t}_j$ in set #2 and all points in set #1
$\mathbf{D}$ is not symmetric, i.e., $d_{12} \neq d_{21}$: pair $\{\mathbf{u}_1, \mathbf{t}_2\}$ is not the same as pair $\{\mathbf{u}_2, \mathbf{t}_1\}$
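The two-set case differs only in that rows and columns index different point sets. A minimal sketch, with a hypothetical function name and made-up coordinates (N = 2, M = 3):

```python
import math


def cross_distance_matrix(set_u, set_t):
    """N x M matrix of Euclidean distances between points of set #1 (rows) and set #2 (columns)."""
    return [[math.dist(u, t) for t in set_t] for u in set_u]


set_u = [(0.0, 0.0), (0.0, 3.0)]               # set #1, N = 2
set_t = [(4.0, 0.0), (0.0, 0.0), (4.0, 3.0)]   # set #2, M = 3
D = cross_distance_matrix(set_u, set_t)
for row in D:
    print(row)

# d_12 pairs u_1 with t_2, d_21 pairs u_2 with t_1: generally different values.
print(D[0][1], D[1][0])
```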
Distances Involved in Spatial Point Patterns
Distances Between Events in A Point Pattern
Event-to-event distance
Distance $d_{ij}$ between event at location $\mathbf{u}_i$ and another event at location $\mathbf{u}_j$:
\[ d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \]
Point-to-event distance
Distance $d_{pj}$ between a randomly chosen point at location $\mathbf{t}_p$ and an event at location $\mathbf{u}_j$:
\[ d_{pj} = \sqrt{(x_p - x_j)^2 + (y_p - y_j)^2} \]
Event-to-nearest-event distance
Distance $d_{\min}(\mathbf{u}_i)$ between an event at location $\mathbf{u}_i$ and its nearest neighbor event:
\[ d_{\min}(\mathbf{u}_i) = \min\{d_{ij},\ j = 1, \ldots, N,\ j \neq i\} \]
Point-to-nearest-event distance
Distance $d_{\min}(\mathbf{t}_p)$ between a randomly chosen point at location $\mathbf{t}_p$ and its nearest neighbor event:
\[ d_{\min}(\mathbf{t}_p) = \min\{d_{pj},\ j = 1, \ldots, N\} \]
Event-to-Nearest-Event Distances
[Figure: pattern with N=5 events $\mathbf{u}_1, \ldots, \mathbf{u}_5$ in a 100 × 100 study region]
Distance matrix
\[ \begin{bmatrix}
0.00 & 76.24 & 59.81 & 92.21 & 77.70 \\
76.24 & 0.00 & 42.83 & 20.35 & 17.62 \\
59.81 & 42.83 & 0.00 & 46.03 & 30.58 \\
92.21 & 20.35 & 46.03 & 0.00 & 15.94 \\
77.70 & 17.62 & 30.58 & 15.94 & 0.00
\end{bmatrix} \]
e.g., $59.81 = d_{\min}(\mathbf{u}_1)$, $17.62 = d_{\min}(\mathbf{u}_2)$
Some events might be nearest neighbors of each other, e.g., $\mathbf{u}_4$ and $\mathbf{u}_5$, or have the same nearest neighbor, e.g., $\mathbf{u}_5$ is the nearest neighbor of $\mathbf{u}_2$, $\mathbf{u}_3$, and $\mathbf{u}_4$
Mean nearest neighbor distance
Average of all $d_{\min}(\mathbf{u}_i)$ values:
\[ \bar{d}_{\min} = \frac{1}{N} \sum_{i=1}^{N} d_{\min}(\mathbf{u}_i) \]
Drawback: a single number does not suffice to describe a point pattern
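Using the 5 × 5 distance matrix printed on this slide, the event-to-nearest-event distances and their mean can be recovered as in the sketch below. The function name is hypothetical; the numbers are those of the handout's matrix.

```python
# Distance matrix of the N=5 event pattern, copied from the slide.
D = [
    [0.00, 76.24, 59.81, 92.21, 77.70],
    [76.24, 0.00, 42.83, 20.35, 17.62],
    [59.81, 42.83, 0.00, 46.03, 30.58],
    [92.21, 20.35, 46.03, 0.00, 15.94],
    [77.70, 17.62, 30.58, 15.94, 0.00],
]


def nearest_event_distances(d):
    """For each event i, the smallest d_ij over all j != i."""
    return [
        min(d_ij for j, d_ij in enumerate(row) if j != i)
        for i, row in enumerate(d)
    ]


d_min = nearest_event_distances(D)
mean_d_min = sum(d_min) / len(d_min)
print(d_min)       # [59.81, 17.62, 30.58, 15.94, 15.94]
print(mean_d_min)  # approximately 27.978
```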
Quantifying Spatial Interaction: G Function
The G Function
Definition
Proportion of event-to-nearest-event distances $d_{\min}(\mathbf{u}_i)$ no greater than a given distance cutoff d, estimated as:
\[ G(d) = \frac{\#\{d_{\min}(\mathbf{u}_i) \le d,\ i = 1, \ldots, N\}}{N} \]
Cumulative distribution function (CDF) of all N event-to-nearest-event distances; instead of computing the average $\bar{d}_{\min}$ of the $d_{\min}$ values, compute their CDF
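The sample G function is a counting exercise over the nearest-neighbor distances. The sketch below uses a hypothetical function name and the five $d_{\min}$ values that follow from the handout's N=5 distance matrix; no edge correction is attempted.

```python
def sample_g(d_min, d):
    """Proportion of event-to-nearest-event distances that are <= d."""
    return sum(1 for v in d_min if v <= d) / len(d_min)


# Nearest-neighbor distances of the five events in the handout's example pattern.
d_min = [59.81, 17.62, 30.58, 15.94, 15.94]

for d in (10, 20, 40, 60):
    print(d, sample_g(d_min, d))
# 10 0.0
# 20 0.6
# 40 0.8
# 60 1.0
```

With only five events the function is a coarse step function, jumping at each distinct $d_{\min}$ value.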
For the point pattern on the previous page:
[Figure: sample histogram of event-to-nearest-neighbor distances (frequency versus event-to-nearest-neighbor distance, d); sample G function (G(d) versus d)]
for a larger number of events N, G(d) becomes smoother
Event-to-Nearest-Event (E2NE) Distance Histograms
[Figure: N=100 random stratified events in a study region; N=100 clustered events in a study region; histogram of E2NE distances (evenly-spaced events); histogram of E2NE distances (clustered events); axes: frequency versus event-to-nearest-neighbor distance, d]
- for evenly-spaced events, more E2NE distances are similar to the spacing of events
- for clustered events, there are more small E2NE distances and fewer large ones
Sample G Function Examples
[Figure: N=100 random stratified events in a study region; N=100 clustered events in a study region; sample G function (evenly-spaced events); sample G function (clustered events); axes: G(d) versus event-to-nearest-neighbor distance, d]
- for evenly-spaced events, G(d) rises gradually up to the distance at which most events are spaced, and then increases rapidly
- for clustered events, G(d) rises rapidly at short distances, and then levels off at larger d-values
Quantifying Spatial Interaction: F Function
The F Function
Definition
Proportion of point-to-nearest-event distances $d_{\min}(\mathbf{t}_j)$ no greater than a given distance cutoff d, estimated as:
\[ F(d) = \frac{\#\{d_{\min}(\mathbf{t}_j) \le d,\ j = 1, \ldots, M\}}{M} \]
Cumulative distribution function (CDF) of all M point-to-nearest-event distances
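A sample F function can be sketched by scattering M random points over the study region and measuring each point's distance to its nearest event. The event coordinates, the seed, and the function names below are assumptions for illustration, and no edge correction is applied.

```python
import math
import random


def nearest_event_distance(point, events):
    """Distance from an arbitrary point to its nearest event."""
    return min(math.dist(point, e) for e in events)


def sample_f(events, points, d):
    """Proportion of point-to-nearest-event distances that are <= d."""
    hits = sum(1 for t in points if nearest_event_distance(t, events) <= d)
    return hits / len(points)


# Made-up pattern of N=5 events in a 100 x 100 region.
events = [(10.0, 80.0), (70.0, 30.0), (40.0, 55.0), (85.0, 20.0), (75.0, 15.0)]

# M=100 random points, drawn with a fixed seed so the run is repeatable.
rng = random.Random(0)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(100)]

for d in (10, 20, 40, 80):
    print(d, sample_f(events, points, d))
```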
[Figure: pattern with N=5 events and M=100 random points; sample F function (F(d) versus point-to-nearest-neighbor distance, d)]
for a larger number M of random points, F(d) becomes even smoother
Note: The F function provides information on event proximity to voids
Point-to-Nearest-Event (P2NE) Distance Histograms
[Figure: N=100 random stratified events in a study region; N=100 clustered events in a study region; histogram of P2NE distances (evenly-spaced events); histogram of P2NE distances (clustered events); axes: frequency versus point-to-nearest-neighbor distance, d]
- for evenly-spaced events, there are more nearest events at small distances from randomly placed points
- for clustered events, P2NE distances are generally larger than in the previous case, and there are a few large such distances
Sample F Function Examples
[Figure: N=100 random stratified events in a study region; N=100 clustered events in a study region; sample F function (evenly-spaced events); sample F function (clustered events); axes: F(d) versus point-to-nearest-neighbor distance, d]
- for evenly-spaced events, F(d) rises rapidly up to the distance at which most events are spaced, and then levels off (more nearest neighbors at small distances from randomly placed points)
- for clustered events, F(d) rises rapidly at short distances, and then levels off at larger d-values
Comparing Sample G and F Functions
[Figure: N=100 random stratified events in a study region; N=100 clustered events in a study region; sample G and F functions (evenly-spaced events); sample G and F functions (clustered events); axes: proportion versus distance, d; legend: G(d), F(d)]
- for evenly-spaced events, there is more “open” space (smaller point-to-event distances), hence F(d) rises faster than G(d)
- for clustered events, the reverse is true
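The G-versus-F comparison for evenly-spaced events can be reproduced on an idealized case. The sketch below assumes a perfectly regular 10 × 10 grid of events (not the random stratified pattern shown in the slides) and uses hypothetical helper names.

```python
import math
import random


def cdf_at(values, d):
    """Proportion of values that are <= d."""
    return sum(1 for v in values if v <= d) / len(values)


# Assumed idealized pattern: 100 events on a regular grid with spacing 10.
events = [(5 + 10 * i, 5 + 10 * j) for i in range(10) for j in range(10)]

# Event-to-nearest-event distances (every event has a neighbor 10 units away).
e2ne = [
    min(math.dist(e, other) for k, other in enumerate(events) if k != i)
    for i, e in enumerate(events)
]

# Point-to-nearest-event distances for 200 random points (fixed seed).
rng = random.Random(1)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
p2ne = [min(math.dist(t, e) for e in events) for t in points]

# At d = 8, no event has a neighbor yet, but every random point does:
# a point is never farther than 5 * sqrt(2) (about 7.07) from a grid event.
print(cdf_at(e2ne, 8.0))  # G(8) = 0.0
print(cdf_at(p2ne, 8.0))  # F(8) = 1.0
```

In this extreme case F(d) reaches 1 before G(d) leaves 0, which is the "F rises faster than G" behavior described for evenly-spaced events.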
Quantifying Spatial Interaction: K Function
The Sample K Function
Concept building
1. construct a set of concentric circles (of increasing radius d) around each event
2. count the number of events in each distance “band”
3. cumulative number of events up to radius d around all events = sample K function K(d)
[Figure: example of K function estimation around events $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$: counts of 3, 6, and 4 events within distance h=6 units of an event]
Formal definition
\[ K(d) = \frac{E\{\#\text{ of events within distance } d \text{ of any arbitrary event}\}}{E\{\#\text{ of events per unit area within study domain}\}}
\simeq \frac{1}{\hat{\lambda}} \frac{1}{N} \#\{d_{ij} \le d,\ i = 1, \ldots, N,\ j (\neq i) = 1, \ldots, N\} = \hat{K}(d) \]
Interpreting The Sample K Function
Re-expressing
\[ \begin{aligned}
\hat{K}(d) &= \frac{1}{\hat{\lambda}} \frac{1}{N} \#\{d_{ij} \le d,\ i = 1, \ldots, N,\ j (\neq i) = 1, \ldots, N\} \\
&= \frac{|D|}{N} \frac{1}{N} \#\{d_{ij} \le d,\ i = 1, \ldots, N,\ j (\neq i) = 1, \ldots, N\} \\
&= |D| \, (\text{proportion of event-to-event distances} \le d)
\end{aligned} \]
In other words: the function $\hat{K}(d)$ is the sample cumulative distribution function (CDF) of all $N^2 - N$ event-to-event distances, scaled by |D|
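The re-expressed estimator translates almost directly into code. The sketch below uses a hypothetical function name and a made-up pattern of four events, divides the pair count by N² as in the expression above, and applies no edge correction.

```python
import math


def sample_k(events, area, d):
    """Sample K function: |D| / N^2 times the number of ordered pairs (i, j), i != j, with d_ij <= d."""
    n = len(events)
    count = sum(
        1
        for i, u_i in enumerate(events)
        for j, u_j in enumerate(events)
        if i != j and math.dist(u_i, u_j) <= d
    )
    # (1 / lambda_hat) * (1 / N) * count, with lambda_hat = N / |D|.
    return area * count / n ** 2


# Four events at the corners of a 10 x 10 square inside a 100 x 100 region.
events = [(40.0, 40.0), (50.0, 40.0), (40.0, 50.0), (50.0, 50.0)]
area = 100.0 * 100.0

print(sample_k(events, area, 5.0))   # 0.0    (no pair closer than 10)
print(sample_k(events, area, 10.0))  # 5000.0 (8 ordered pairs along the sides)
print(sample_k(events, area, 15.0))  # 7500.0 (plus 4 ordered pairs on the diagonals)
```

Each unordered pair is counted twice, once from each event, which matches the double index over i and j in the estimator.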
[Figure: pattern with N=5 events $\mathbf{u}_1, \ldots, \mathbf{u}_5$ in a 100 × 100 study region; sample histogram of event-to-event distances (frequency versus event-to-event distance, d); sample K function divided by the domain area of 10000 (K(d)/|A| versus d)]
Note: Ignore the bin at d = 0 (center plot) and the point at d = 0 (right plot)
Event-to-Event Distance Histograms
[Figure: N=100 random stratified events in a study region; N=100 clustered events in a study region; histogram of event-to-event distances (evenly-spaced); histogram of event-to-event distances (clustered); axes: frequency versus event-to-event distance]
- for evenly-spaced events, there are more medium-sized E2E distances than small or large ones
- for clustered events, the distribution of E2E distances is multi-modal
Event-to-Event Distance CDFs
[Figure: N=100 random stratified events in a study region; N=100 clustered events in a study region; CDF of event-to-event distances (evenly-spaced); CDF of event-to-event distances (clustered); axes: cumulative frequency versus event-to-event distance]
- for clustered events, there are multiple bumps in the CDF of E2E distances due to the grouping of events in space
Sample K Function Examples
[Figure: N=100 random stratified events in a study region; N=100 clustered events in a study region; sample K function (evenly-spaced events); sample K function (clustered events); axes: area × proportion, K(d), versus event-to-event distance, d]
- sample K function K(d) is monotonically increasing and is a scaled (by domain measure |D|) version of the CDF of E2E distances
Points To Remember
Recap
Quantifying interaction in spatial point patterns
- event-to-nearest-event distances → use the sample G function G(d)
- point-to-nearest-event distances → use the sample F function F(d)
- event-to-event distances → use the sample K function K(d)
K function looks at information beyond nearest neighbors
Caveats
- clustering is always a function of the overall intensity of a point pattern
- clustering might occur due to local intensity variations or due to interaction; it is very difficult to disentangle each contribution
Watch out for
- boundaries and edge effects
- distance distortions due to map projections
- sampled versus mapped point patterns