Spatial analysis: a roadmap
David O’Sullivan
University of Auckland
School of Geography and Environmental Science
NIH >> GIS and Population Science >> Penn State >> June 3 2004 >> David O'Sullivan
Overview
• Definition of spatial analysis
– Data types and basic questions
• Classic difficulties of spatial analysis (the bad news)
– Spatial dependence
• Potential of spatial analysis (the good news)
– Some generally useful concepts in the analysis of spatial data
One definition of spatial analysis
• According to O’Sullivan ☺ and Unwin (2003) in Geographic Information Analysis: “concerned with investigating the patterns that arise as a result of processes that may be operating in space. Techniques and methods to enable the representation, description, measurement, comparison and generation of spatial patterns are central to the study of geographic information analysis”
O'Sullivan, D. and Unwin, D. J. 2003. Geographic Information Analysis. Wiley: Hoboken, NJ.
What else is ‘spatial analysis’?
• It depends on your point of view:
1. Spatial data manipulation, often in a GIS, is frequently referred to as ‘spatial analysis’
2. Spatial data analysis is descriptive and exploratory
3. Spatial statistical analysis employs statistical methods to investigate data with respect to some statistical model
4. Spatial modeling is about constructing models to predict or to better understand spatial phenomena
Two key aspects in spatial analysis
• Spatial data
– What types of spatial data exist?
• Applying standard statistical ideas to spatial data
– What problems does this introduce?
– Why are spatial data special?
[Figure: examples of points, lines, areas and fields]
Spatial data
• There are, broadly speaking, two main ways of representing the world:
– Vector objects
– Raster fields
Vector object types
• Points
– Cities, trees, houses, sightings (of bears, whatever), archaeological finds, crimes, disease incidence and mortality…
• Lines
– Drainage, road, communications and power networks, commutes…
• Areas
– Census districts, cities, boroughs, townships, school districts, police precincts, land cover units…
Fields
• Field data are useful when a phenomenon is measurable at all locations
– Field data are common in physical geography: air pressure, wind speed, etc.
– Land elevation is the classic example
– In social/human/population geography, field data are sometimes used to represent estimated densities, since a density may be considered to be measurable everywhere (e.g., crime mapping)
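One common way to turn point events such as crimes into a density "field" is kernel density estimation. The sketch below assumes a Gaussian kernel and an analyst-chosen bandwidth; the function name `kernel_density` and the sample coordinates are hypothetical, and real crime-mapping tools offer other kernels and bandwidth rules.

```python
import math

def kernel_density(events, locations, bandwidth):
    """Gaussian kernel density estimate (events per unit area) at each location.

    events, locations: lists of (x, y) tuples in the same projected units.
    bandwidth: kernel standard deviation, an assumed smoothing parameter.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)  # makes each kernel integrate to 1
    surface = []
    for gx, gy in locations:
        total = 0.0
        for ex, ey in events:
            d2 = (gx - ex) ** 2 + (gy - ey) ** 2
            total += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
        surface.append(total)
    return surface

# Hypothetical crime locations, evaluated on a 5 x 5 grid of locations
crimes = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.0)]
grid = [(x, y) for x in range(5) for y in range(5)]
density = kernel_density(crimes, grid, bandwidth=0.5)
```

The estimated density is highest near the cluster around (1, 1) and close to zero far from any event; a larger bandwidth would give a smoother surface.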
Attribute data types
• At each location, we have measurements of various attributes
• There are two broad classes:
– Numerical data, which are either ratio or interval; and
– Categorical data, which are either ordinal or nominal
Scales of measurement
• Nominal – descriptive categories
• Ordinal – ranked
• Interval – gaps are comparable
• Ratio – ratios are comparable
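The entity-attribute model
[Figure: entity types (points, lines, areas, fields) against scales of measurement (nominal, ordinal, interval, ratio), with examples including a state highway, a spot height and electoral districts]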
Some possible combinations
Source: O'Sullivan and Unwin (2003)
Some reservations
• This model is reductive
– In particular, it is a statistician’s and GIS person’s view
– How would you represent yourself in this framework? Your home? Your car? Photographs? Sound-clips? Hyperlinks?
– Scale is often (make that always…) an issue
OK, but… why the interest in spatial data?
• We assume that location makes a difference, so…
– statistical distributions are (still) of interest…
– … but now spatial patterns in the data are also of interest…
– … and the possible relationships between the two are really what matter
Statistics and spatial analysis
• Statistics is about
– Describing observed data
– Comparing observed data to expectations based on a statistical model
– Inferring from the comparison whether the observed data are compatible with the assumptions
Source: O'Sullivan and Unwin (2003)
Simplification in statistics
• To make the mathematics work, assumptions about observed data are made, in particular that
– Observations are random samples from a population
– Observations are independent of one another
• These assumptions are almost never true of spatial data
The problem with spatial data, or: why “spatial [data] is special”
• In a nutshell: “Everything is related to everything else, but near things are more related than distant things” (Tobler, W. 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46, 234–40)
– This is sometimes called the First Law of Geography … because it is generally true!
– It follows that spatial data cannot be considered independent random samples from a population
Why “spatial is special”, more specifically
• Some commonly identified ‘problems’ with spatial data are:
– Autocorrelation
– The modifiable areal unit problem (or MAUP)
– Scale effects
– Non-uniformity of space and ‘edge effects’
Spatial autocorrelation
• This follows directly from the observation that “… near things are more related than distant things”
– Spatial data are self-correlated
– There is redundancy in spatial data because observations made at locations near one another tend to be similar
Problem, or opportunity?
• Autocorrelation is only a problem if we choose to see it as one. Equally, we can
– Describe or measure the autocorrelation structure of spatial data, in order to characterize it
– Then use the description to improve subsequent analysis (e.g., simple interpolation becomes kriging in geostatistics)
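In geostatistics, the usual description of autocorrelation structure is the empirical semivariogram: half the mean squared difference between values, grouped by how far apart the observations are. The sketch below uses the classical (Matheron) estimator with fixed-width distance bins; `empirical_semivariogram`, the bin settings and the transect data are assumptions for illustration, and the sketch does not fit a model or perform kriging.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Classical (Matheron) semivariogram estimate.

    For each distance bin, gamma = half the mean squared difference between
    the values at all pairs of points whose separation falls in that bin.
    Returns (mean separation, gamma, number of pairs) for non-empty bins.
    """
    sq_diff = defaultdict(float)
    dist_sum = defaultdict(float)
    pairs = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            k = int(d // lag_width)  # index of the distance bin
            if k < n_lags:
                sq_diff[k] += (values[i] - values[j]) ** 2
                dist_sum[k] += d
                pairs[k] += 1
    return [(dist_sum[k] / pairs[k], sq_diff[k] / (2 * pairs[k]), pairs[k])
            for k in range(n_lags) if pairs[k] > 0]

# Hypothetical transect: ten samples one unit apart, values rising steadily
pts = [(float(x), 0.0) for x in range(10)]
vals = [float(x) for x in range(10)]
semivariogram = empirical_semivariogram(pts, vals, lag_width=1.0, n_lags=4)
# [(1.0, 0.5, 9), (2.0, 2.0, 8), (3.0, 4.5, 7)]
```

Nearby pairs differ little and distant pairs differ more, which is autocorrelation expressed as a function of distance. A semivariogram that keeps rising without levelling off, as here, can also hint at a large-scale trend.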
An example: Election 2000
• Consider the county-level results expressed in terms of the percent share for George W. Bush
This sort of geographical pattern is typical, and is an example of positive autocorrelation
Describing autocorrelation
• Two broad effects can be distinguished:
– First-order variation
• Large-scale variation in the mean value: a trend or background effect
– Second-order variation
• Local variation, perhaps due to interaction effects between observations
• May be isotropic (no directional effects) or anisotropic (with a directional component)
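One common way to separate the two effects is to fit a trend surface for the first-order variation and then examine the residuals for second-order structure. This is a minimal sketch, assuming a planar trend z = b0 + b1·x + b2·y fitted by ordinary least squares; `fit_trend_plane` and the test surface are hypothetical, and practical work may use higher-order polynomials or a statistics library.

```python
def fit_trend_plane(points, values):
    """Least-squares planar trend z = b0 + b1*x + b2*y.

    Returns (coefficients, residuals); the residuals retain whatever local
    (second-order) variation the plane does not explain.
    Points must not all lie on one line, or the system is singular.
    """
    rows = [(1.0, x, y) for x, y in points]            # design matrix rows
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, values)) for i in range(3)]
    m = [ata[i] + [atz[i]] for i in range(3)]           # augmented normal equations
    for col in range(3):                                 # elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, 3):
            factor = m[k][col] / m[col][col]
            m[k] = [vk - factor * vc for vk, vc in zip(m[k], m[col])]
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                                  # back substitution
        b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]
    residuals = [z - (b[0] + b[1] * x + b[2] * y) for (x, y), z in zip(points, values)]
    return b, residuals

# Hypothetical surface: a planar trend plus a local checkerboard alternation
pts = [(float(x), float(y)) for x in range(4) for y in range(4)]
z = [2.0 + 0.5 * x - 1.0 * y + (0.3 if (x + y) % 2 else -0.3) for x, y in pts]
coef, resid = fit_trend_plane(pts, z)
```

Here the plane recovers the trend and the residuals carry the alternating ±0.3 pattern. With real data the split is rarely this clean, which is the point of the next slide.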
First or second order?
• In practice, 1st- and 2nd-order effects are hard to distinguish…
Source: O'Sullivan and Unwin (2003)
Autocorrelation statistics
• A number of formal measures exist:
– Joins count statistics for binary or classed data
– Moran’s I and Geary’s c for numeric data, at points or aggregated to areas
– Semivariogram and covariogram functions for point data
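For binary data, the joins count approach tallies each adjacent pair by type: black–black (BB), white–white (WW) or black–white (BW). The sketch assumes a raster of 0/1 cells with rook-case adjacency; `join_counts` and the example grids are hypothetical, and a full test would also compare these counts with their expected values under randomness, which this sketch omits.

```python
def join_counts(grid):
    """Count BB, WW and BW joins in a binary raster using rook adjacency.

    grid: list of rows, each cell 1 ('black') or 0 ('white').
    """
    bb = ww = bw = 0
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # Look right and down only, so each join is counted exactly once
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    a, b = grid[r][c], grid[rr][cc]
                    if a == b == 1:
                        bb += 1
                    elif a == b == 0:
                        ww += 1
                    else:
                        bw += 1
    return bb, ww, bw

# Two hypothetical 4 x 4 patterns with the same number of black cells
clustered = [[1, 1, 0, 0] for _ in range(4)]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
```

Few BW joins (as in `clustered`) suggest positive autocorrelation; an excess of BW joins (as in `checkerboard`, where every join is BW) suggests negative autocorrelation.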
The modifiable areal unit problem (MAUP)
• Areas are ‘arbitrary’: they are designed for convenience of data collection, not with respect to underlying patterns
– Standard statistical techniques are sensitive to the choice of units
– In one study*, it was shown that the correlation between two variables can be estimated anywhere between –1 and +1, depending on the spatial units used!
*Openshaw, S. and Taylor, P. J. 1979. A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In N. Wrigley (ed.) Statistical Methods in the Spatial Sciences. Pion: London, 127–44.
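A toy illustration of the zoning effect follows. The six values are hypothetical and deliberately constructed, not data from the study cited above. Two equally plausible ways of pairing adjacent units into three zones give correlations of +1 and –1 for the same underlying data. With only three zones the result is extreme by design, but the mechanism is the same as in real aggregations.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def aggregate(values, zones):
    """Mean value in each zone; zones is a list of lists of unit indices."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]

# Hypothetical data for six small units arranged in a ring (unit 5 touches unit 0)
x = [1, 1, 1, 3, 1, 5]
y = [0, 2, 4, 0, 4, 2]
zoning_a = [[0, 1], [2, 3], [4, 5]]   # one way of pairing adjacent units
zoning_b = [[1, 2], [3, 4], [5, 0]]   # the same ring, pairs shifted by one unit

r_units = pearson(x, y)                                          # about -0.27
r_a = pearson(aggregate(x, zoning_a), aggregate(y, zoning_a))    # +1.0
r_b = pearson(aggregate(x, zoning_b), aggregate(y, zoning_b))    # -1.0
```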
Redistricting
• Perhaps the clearest example of MAUP in practice…
Source: The Economist
Scale
• Geographic scale effects are fundamental:
– Different data types are appropriate at different scales, so available spatial data may be dependent on scale, ruling out some types of analysis
– Scale is a factor in autocorrelation, in the distinction between first- and second-order effects
– It is also a factor in MAUP, through the level of aggregation used
Non-uniformity of space
• The non-uniformity of space refers to problems arising from tacit assumptions about the uniform spatial density of ‘background’ populations
– For example, ‘clusters’ of crime are expected in urban areas because more people live there
– This leads to numerous analytic complexities
Example: ISO 9000-certified firms in the United States
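A basic response to a non-uniform background is to compare observed counts with the counts expected if a single study-wide rate applied to each zone's population. This is similar in spirit to a standardized ratio, though without age or other adjustment. The sketch is minimal, and `observed_to_expected` and the three-zone figures are hypothetical.

```python
def observed_to_expected(counts, populations):
    """Observed/expected ratio per zone, where 'expected' assumes the
    study-wide rate applies uniformly to each zone's population."""
    overall_rate = sum(counts) / sum(populations)
    return [c / (overall_rate * p) for c, p in zip(counts, populations)]

# Hypothetical crime counts and resident populations for three zones
crimes = [50, 10, 40]
people = [10000, 2000, 4000]
ratios = observed_to_expected(crimes, people)  # about [0.8, 0.8, 1.6]
```

The zone with the most crimes has fewer than expected given its population, while the third zone, with fewer crimes, stands out once population is taken into account.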
Edge effects
• Entities on the edge of a study area only have neighbors in one direction (toward the middle)
– Unless care is taken, this can distort results
– Again, coping with this leads to numerous analytic complexities
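One widely used correction in point pattern analysis is a guard zone, or buffer. Points close to the boundary are not measured themselves, because their true nearest neighbor might lie outside the study area, but they can still serve as neighbors for interior points. The sketch assumes a rectangular study area; `nn_distances_with_guard` is a hypothetical name, and other corrections (such as toroidal wrapping or edge weighting) exist.

```python
import math

def nn_distances_with_guard(points, bbox, guard):
    """Nearest-neighbor distance for each point at least `guard` inside the
    rectangle bbox = (xmin, ymin, xmax, ymax). Points in the guard zone are
    skipped as subjects but still count as possible neighbors."""
    xmin, ymin, xmax, ymax = bbox
    result = []
    for i, p in enumerate(points):
        edge_gap = min(p[0] - xmin, xmax - p[0], p[1] - ymin, ymax - p[1])
        if edge_gap < guard:
            continue  # too close to the edge: its nearest neighbor may lie outside
        result.append(min(math.dist(p, q) for j, q in enumerate(points) if j != i))
    return result

# Hypothetical 5 x 5 lattice of points filling the study area (0, 0)-(4, 4)
lattice = [(float(x), float(y)) for x in range(5) for y in range(5)]
interior = nn_distances_with_guard(lattice, (0.0, 0.0, 4.0, 4.0), guard=1.0)
```

The cost of the correction is lost data: here only 9 of the 25 points are measured.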
The bad news summarized
• Data are spatially dependent
– Spatial autocorrelation
• Data are also dependent on how you look at them spatially:
– Aggregation
– Scale
– Non-uniformity of space
– Edge effects
Some good news
• Spatial data do have an intrinsic advantage, however…
• In addition to the data, we have a record of where the data were observed
• Making the most of this extra information lies at the heart of spatial analysis
Some useful general concepts
• A number of concepts are frequently invoked in spatial analysis
– Distance
– Adjacency
– Interaction
– Neighborhood
– Proximity polygons
Distance
• Easily calculated from two coordinates
– Use Pythagoras’s theorem:
$d = \sqrt{\Delta x^2 + \Delta y^2}$
– It gets trickier on a sphere, but for projected data at sub-regional scales the approximation is usually adequate
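The sketch below sets planar Pythagorean distance beside a great-circle distance computed with the haversine formula on a spherical Earth. The function names and the 6371 km mean radius are assumptions, and a sphere is itself only an approximation of the Earth's shape.

```python
import math

def euclidean(p, q):
    """Pythagoras for projected (planar) coordinates."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.sqrt(dx * dx + dy * dy)

def great_circle_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Haversine distance on a sphere; coordinates in decimal degrees.
    radius_km is an assumed mean Earth radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

planar = euclidean((0.0, 0.0), (3.0, 4.0))          # 5.0
one_degree_lat = great_circle_km(0, 0, 0, 1)        # roughly 111.2 km
one_degree_lon_60n = great_circle_km(0, 60, 1, 60)  # roughly 55.6 km
```

The last two lines show why Pythagoras cannot be applied directly to unprojected latitude and longitude: a degree of longitude shrinks with latitude.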
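Other distance metrics
• Network distance on a transport system
• Travel time
• Perceived distance
– Mathematicians generally insist that a distance satisfy the following conditions, which these alternatives may not always meet:
$d_{AA} = 0$
$d_{AB} \geq 0$
$d_{AB} = d_{BA}$
$d_{AC} \leq d_{AB} + d_{BC}$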
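A quick way to see whether a matrix of "distances" behaves like a metric is to test the four conditions directly. The function `metric_violations` and both matrices below are hypothetical. The second matrix is the kind of thing direct travel times could produce: asymmetric, and with a detour that is faster than the direct trip.

```python
from itertools import product

def metric_violations(d, tol=1e-9):
    """Return which of the four metric conditions the square matrix d breaks."""
    n = len(d)
    problems = set()
    for a in range(n):
        if abs(d[a][a]) > tol:
            problems.add("self-distance not zero")
    for a, b in product(range(n), repeat=2):
        if d[a][b] < -tol:
            problems.add("negative distance")
        if abs(d[a][b] - d[b][a]) > tol:
            problems.add("asymmetry")
    for a, b, c in product(range(n), repeat=3):
        if d[a][c] > d[a][b] + d[b][c] + tol:  # going via b beats going direct
            problems.add("triangle inequality")
    return sorted(problems)

euclidean_3_4_5 = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
travel_minutes = [[0, 10, 25], [15, 0, 8], [25, 8, 0]]  # hypothetical values
```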
Adjacency
• This is a sort of binary distance: two spatial objects are either adjacent or not
– Often we use distance to decide: if d = 0, then two objects are adjacent
– This can get complex for some kinds of object
– The meaning is clearest for polygons
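Queens and rooks
• These terms are fairly self-explanatory, referring to which types of adjacency we choose to ‘allow’: in the rook’s case, neighbors must share an edge; in the queen’s case, sharing an edge or just a corner is enough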
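For a regular grid of square cells, the two rules reduce to different sets of offsets. `grid_neighbors` is a hypothetical helper for illustration. For irregular polygons, the same idea is usually applied by checking for shared boundary segments (rook) or shared vertices (queen).

```python
def grid_neighbors(r, c, rows, cols, rule="queen"):
    """Neighbors of cell (r, c) in a rows x cols grid.

    rule='rook': cells sharing an edge; rule='queen': an edge or a corner.
    """
    if rule == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    # Drop offsets that fall outside the grid
    return [(r + dr, c + dc) for dr, dc in offsets
            if 0 <= r + dr < rows and 0 <= c + dc < cols]

centre_rook = grid_neighbors(1, 1, 3, 3, "rook")     # 4 neighbors
centre_queen = grid_neighbors(1, 1, 3, 3, "queen")   # 8 neighbors
corner_queen = grid_neighbors(0, 0, 3, 3, "queen")   # 3 neighbors
```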
A simple example
Adjacency applied to measuring autocorrelation
• For each pair of neighboring cases, calculate the covariance:
$(x_i - \bar{x})(x_j - \bar{x})$
– This produces a positive number when the two values lie on the same side of the mean, and a negative number when they lie on opposite sides
• Averaged over all neighboring cases, and scaled by dividing by the variance of the data, we get a number usually between –1 and +1
– This is interpreted in the same way as a standard correlation coefficient
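The averaging and scaling above give Moran’s I, here in its binary-weights form I = (n / S0) · Σ w_ij (x_i − x̄)(x_j − x̄) / Σ (x_i − x̄)², where S0 is the sum of all weights. A common way to judge whether an observed I is unusual is a permutation test: shuffle the values across locations many times and see how often chance alone produces an I as large. This is a sketch under those assumptions. `morans_i`, `permutation_test`, the 999 permutations and the ten-area chain are hypothetical choices, not a particular package's implementation.

```python
import random

def morans_i(values, neighbors):
    """Moran's I with binary weights (w_ij = 1 when j is a neighbor of i).

    neighbors maps each index i to a list of its neighbors' indices.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(neighbors[i]) for i in range(n))            # sum of all weights
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    return (n / s0) * cross / sum(d * d for d in dev)

def permutation_test(values, neighbors, n_perm=999, seed=1):
    """Pseudo p-value for positive autocorrelation: the share of random
    relabellings of the values whose I is at least the observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)

# Ten areas in a row, each adjacent to the next (a hypothetical layout)
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
clustered = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
alternating = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
observed_i, p_value = permutation_test(clustered, chain)
```

The clustered pattern gives I = 7/9 (about 0.78) with a small pseudo p-value. The alternating pattern gives I = –1, the strongest possible negative autocorrelation on this layout.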
Election 2000 again
• Those county-level results expressed in terms of the percent share for George W. Bush
The clear spatial pattern in these data is confirmed by a Moran’s I value of around 0.45
Interaction
• Interaction (often denoted $w_{ij}$) is a measure of the likely strength of relationship between two entities
– The most common form is inverse distance:
$w_{ij} \propto \frac{1}{d_{ij}}$
Other measures of interaction
– Inverse distance powered (usually squared):
$w_{ij} \propto \frac{1}{d_{ij}^k}$
– Negative exponential:
$w_{ij} \propto e^{-k d_{ij}}$
– Weighted inverse distance:
$w_{ij} \propto \frac{A_i A_j}{d_{ij}^k}$
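These proportionalities translate directly into small functions. The sketch drops the constants of proportionality and assumes d > 0, since the inverse-distance forms are undefined for coincident objects. The function names are hypothetical, and the meaning of A_i and A_j is left as in the slide: some attribute of each entity.

```python
import math

def inverse_distance(d, k=1.0):
    """w proportional to 1 / d**k (k = 1 is plain inverse distance)."""
    return 1.0 / d ** k

def negative_exponential(d, k=1.0):
    """w proportional to exp(-k * d); stays finite at d = 0."""
    return math.exp(-k * d)

def weighted_inverse_distance(a_i, a_j, d, k=1.0):
    """w proportional to A_i * A_j / d**k, weighting each pair by A values."""
    return a_i * a_j / d ** k

# Doubling the distance from 1 to 2 halves the plain inverse-distance weight,
# quarters the squared version, and multiplies the exponential (k = 0.5)
# by exp(-0.5).
w1 = inverse_distance(2.0)
w2 = inverse_distance(2.0, k=2)
w3 = negative_exponential(2.0, k=0.5)
w4 = weighted_inverse_distance(100, 50, 10, k=2)
```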
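Matrices and spatial pattern
• Many of these concepts assign a value to describe the relationship between every pair of objects
• This lends itself to being recorded in a matrix:
$$D = \begin{bmatrix}
0 & 66 & 68 & 68 & 24 & 41 \\
66 & 0 & 51 & 110 & 99 & 101 \\
68 & 51 & 0 & 67 & 91 & 116 \\
68 & 110 & 67 & 0 & 60 & 108 \\
24 & 99 & 91 & 60 & 0 & 45 \\
36 & 101 & 116 & 108 & 45 & 0
\end{bmatrix}$$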
Spatial weights matrices
• A particularly common matrix is the spatial weights matrix, usually denoted by W
• This records the interaction between each pair of objects, and appears in
– autocorrelation, point pattern analysis, interpolation, spatial regression, geographically weighted regression, spatial interaction modeling…
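One simple recipe for W starts from the distance matrix D. It gives w_ij = 1 when two different objects are within a threshold distance, then row-standardizes so each row sums to 1. This is only one of many possible choices (contiguity-based or inverse-distance weights are common alternatives). The functions and points below are hypothetical.

```python
import math

def distance_matrix(points):
    """Full matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def weights_matrix(points, threshold, row_standardize=True):
    """Binary distance-band weights: w_ij = 1 if i != j and d_ij <= threshold.
    Optionally divide each row by its sum; rows with no neighbors stay zero."""
    d = distance_matrix(points)
    n = len(points)
    w = [[1.0 if i != j and d[i][j] <= threshold else 0.0 for j in range(n)]
         for i in range(n)]
    if row_standardize:
        for row in w:
            total = sum(row)
            if total > 0:
                row[:] = [v / total for v in row]
    return w

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]  # last point is isolated
W = weights_matrix(pts, threshold=1.5)
```

Row standardization makes each object's weighted sum of neighbors an average, but isolated objects (such as the last point) end up with an all-zero row, which later calculations must handle.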
Neighborhood
• Neighborhood is a less clear-cut concept
– It can mean the region of space around some object
– Or the set of objects considered to be neighbors of that object
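The two readings lead to different operational definitions. The first is "everything within a fixed radius" (a region of space). The second is "the k nearest objects" (a set of neighbors). Both helper names and the points are hypothetical. Note that a fixed radius can leave an isolated object with no neighbors, while k-nearest always returns k neighbors, however far away they are.

```python
import math

def within_radius(points, i, radius):
    """Neighborhood as a region: every other object within `radius` of object i."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= radius]

def k_nearest(points, i, k):
    """Neighborhood as a set: the k objects nearest to object i."""
    others = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    return [j for _, j in others[:k]]

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
near_origin = within_radius(pts, 0, 1.5)   # [1]
two_nearest = k_nearest(pts, 0, 2)         # [1, 2]
isolated_radius = within_radius(pts, 3, 1.5)
isolated_knn = k_nearest(pts, 3, 1)
```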
Proximity polygons
• Proximity polygons (also known as Voronoi or Thiessen polygons) are an increasingly important example of the neighborhood concept
• A proximity polygon is associated with each spatial object and is the region of space nearer to that object than to any other
• A good demonstration of the idea is Voroglide, by Praktische Informatik VI, FernUniversität Hagen: Christian Icking, Rolf Klein, Peter Köllner, Lihong Ma
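The definition translates directly into a raster approximation: label every cell with the index of the nearest object, and the blocks of equal labels approximate the proximity polygons. This is a brute-force sketch. `proximity_raster` and the two sites are hypothetical, ties go to the lower-numbered site, and exact polygon construction would use a computational geometry library instead.

```python
import math

def proximity_raster(sites, width, height, cell=1.0):
    """Label each cell centre with the index of its nearest site: a raster
    approximation of the proximity (Voronoi/Thiessen) polygons."""
    labels = []
    for r in range(height):
        row = []
        for c in range(width):
            centre = ((c + 0.5) * cell, (r + 0.5) * cell)
            # min() returns the first index on ties
            row.append(min(range(len(sites)), key=lambda i: math.dist(centre, sites[i])))
        labels.append(row)
    return labels

labels = proximity_raster([(1.0, 1.0), (5.0, 1.0)], width=6, height=2)
# [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1]]
```

With two sites the boundary falls on the perpendicular bisector between them (here x = 3), which is exactly the edge the true proximity polygons would share.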
Uses of proximity polygons
• Proximity polygons are commonly used in
– Interpolation
– Point pattern analysis
• Increasingly, they are used throughout spatial analysis, especially in location decision making
Just to review…
• Distance
• Adjacency
• Interaction
• Neighborhood
GIS and spatial analysis
• GIS vendors often claim to offer ‘spatial analysis’
– This usually doesn’t mean statistical spatial analysis, but spatial data manipulation: buffering, overlay, etc.
– However, GIS has increased the need for spatial analysis, because more people are making maps and asking questions about them!
If spatial analysis is so useful, why is it not integrated into GIS?!
• Different perspectives on spatial data
– GIS is built around the entity-attribute model. Spatial analysis uses this data (because that’s the way it comes). Conceptually, spatial analysis sees data as patterns which are the outcomes of processes, which can be quite different.
• Spatial analysis is not widely understood
– It has been a specialized field, and therefore hard to justify incorporating into GIS as a standard tool
• Spatial analysis can make GIS hard to sell
– Spatial analysis is about asking difficult questions, not about easy answers
Questions?