
Spatial analysis: a roadmap

David O’Sullivan
University of Auckland
School of Geography and Environmental Science


NIH >> GIS and Population Science >> Penn State >> June 3 2004 >> David O'Sullivan

Overview
• Definition of spatial analysis
  – Data types and basic questions
• Classic difficulties of spatial analysis (the bad news)
  – Spatial dependence
• Potential of spatial analysis (the good news)
  – Some generally useful concepts in the analysis of spatial data


One definition of spatial analysis
• According to O’Sullivan ☺ and Unwin (2003) in Geographic Information Analysis:
  “concerned with investigating the patterns that arise as a result of processes that may be operating in space. Techniques and methods to enable the representation, description, measurement, comparison and generation of spatial patterns are central to the study of geographic information analysis”

O'Sullivan, D. and Unwin, D. J. 2003. Geographic Information Analysis. Wiley: Hoboken, NJ.


What else is ‘spatial analysis’?
• It depends on your point of view:
  1. Spatial data manipulation, often in a GIS, is frequently referred to as ‘spatial analysis’
  2. Spatial data analysis is descriptive and exploratory
  3. Spatial statistical analysis employs statistical methods to investigate data with respect to some statistical model
  4. Spatial modeling is about constructing models to predict or to better understand spatial phenomena


Two key aspects in spatial analysis
• Spatial data
  – What types of spatial data exist?
• Applying standard statistical ideas to spatial data
  – What problems does this introduce?
  – Why is spatial data special?


Spatial data
• There are, broadly speaking, two main ways of representing the world:
  – Vector objects
  – Raster fields

[Figure: vector objects (points, lines, areas) and a raster field]


Vector object types
• Points
  – Cities, trees, houses, sightings (of bears, whatever), archaeological finds, crimes, disease incidence and mortality…
• Lines
  – Drainage, road, communications, power networks, commutes…
• Areas
  – Census districts, cities, boroughs, townships, school districts, police precincts, land cover units…


Fields
• Field data are useful when a phenomenon is measurable at all locations
  – Field data are common in physical geography: air pressure, wind speed etc.
  – Land elevation is the classic example
  – In social/human/population geography field data are sometimes used to represent estimated densities, since a density may be considered to be measurable everywhere (e.g., crime mapping)


Attribute data types
• At each location, we have measurements of various attributes
• There are two broad classes:
  – Numerical data, which are either ratio or interval; and
  – Categorical data, which are either ordinal or nominal


Scales of measurement
• Nominal – descriptive categories
• Ordinal – ranked
• Interval – gaps are comparable
• Ratio – ratios are comparable


The entity-attribute model

[Figure: entity types (points, lines, areas, fields) crossed with attribute scales (nominal, ordinal, interval, ratio), showing some possible combinations, e.g. state highway, spot height, electoral districts]

Source: O'Sullivan and Unwin. 2003.


Some reservations
• This model is reductive
  – In particular, it is a statistician’s and GIS person’s view
  – How would you represent yourself in this framework? Your home? Your car? Photographs? Sound-clips? Hyperlinks?
  – Scale is often (make that always…) an issue


OK, but… why the interest in spatial data?
• We assume that location makes a difference, so…
  – statistical distributions are (still) of interest…
  – … but now spatial patterns in the data are also of interest…
  – … and the possible relationships between the two are really what matter


Statistics and spatial analysis
• Statistics is about
  – Describing observed data
  – Comparing observed data to expectations based on a statistical model
  – Inferring from the comparison whether the observed data are compatible with the assumptions

Source: O'Sullivan and Unwin. 2003.


Simplification in statistics
• To make the mathematics work, assumptions about observed data are made. In particular that
  – Observations are random samples from a population
  – Observations are independent of one another
• These assumptions are almost never true of spatial data


The problem with spatial data
or: why “spatial [data] is special”
• In a nutshell:
  “Everything is related to everything else, but near things are more related than distant things”
  Tobler, W. 1970. A computer movie simulating urban growth in the Detroit region, Economic Geography 46, 234–40
  – This is sometimes called the First Law of Geography … because it is generally true!
  – It follows that: spatial data cannot be considered independent random samples from a population


Why “spatial is special” more specifically
• Some commonly identified ‘problems’ with spatial data are:
  – Autocorrelation
  – The modifiable areal unit problem (or MAUP)
  – Scale effects
  – Non-uniformity of space and ‘edge effects’


Spatial autocorrelation
• This follows directly from the observation that “… near things are more related than distant things”
  – Spatial data are self-correlated
  – There is redundancy in spatial data because observations made at locations near one another tend to be similar


Problem, or opportunity?
• Autocorrelation is only a problem if we choose to see it as one. Equally, we can
  – Describe or measure the autocorrelation structure of spatial data, in order to characterize it
  – Then use the description to improve subsequent analysis (e.g., simple interpolation becomes kriging in geostatistics)


An example: Election 2000
• Consider the county-level results expressed in terms of the percent share for George W. Bush

This sort of geographical pattern is typical, and is an example of positive autocorrelation


Describing autocorrelation
• Two broad effects can be distinguished:
  – First order variation
    • Large scale variation in the mean value: a trend or background effect
  – Second order variation
    • Local variation perhaps due to interaction effects between observations
    • May be isotropic (no directional effects) or anisotropic (with a directional component)


First or second order?
• In practice, 1st and 2nd order effects are hard to distinguish…

Source: O’Sullivan and Unwin. 2003.


Autocorrelation statistics
• A number of formal measures exist:
  – Joins count statistics for binary or classed data
  – Moran’s I and Geary’s c for numeric data, at points or aggregated to areas
  – Semivariogram and covariogram functions for point data


The modifiable areal unit problem (MAUP)
• Areas are ‘arbitrary’: they are designed for convenience of data collection, not with respect to underlying patterns
  – Standard statistical techniques are sensitive to the choice of units
  – In one study* it was shown that the correlation between two variables can be estimated anywhere between –1 and +1, depending on the spatial units used!

*Openshaw, S. and P. J. Taylor. 1979. A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In N. Wrigley (ed.) Statistical Methods in the Spatial Sciences, Pion: London, 127-44.
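The sensitivity to the choice of units can be illustrated with a toy sketch. The six locations, their x and y values, and the two zonings below are hypothetical numbers invented for illustration (they are not from Openshaw and Taylor); `pearson` and `aggregate` are helper names assumed here. The sketch averages the same data into contiguous zones in two different ways and compares the resulting correlations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def aggregate(values, zones):
    """Average the values of the locations that fall in each zone."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]

# Hypothetical observations at six locations along a line.
x = [1, 5, 2, 6, 3, 7]
y = [5, 1, 6, 2, 7, 3]

# Two ways of grouping the same contiguous locations into zones.
zoning_a = [[0, 1], [2, 3], [4, 5]]
zoning_b = [[0], [1, 2], [3, 4], [5]]

r_individual = pearson(x, y)
r_zoning_a = pearson(aggregate(x, zoning_a), aggregate(y, zoning_a))
r_zoning_b = pearson(aggregate(x, zoning_b), aggregate(y, zoning_b))
print(r_individual, r_zoning_a, r_zoning_b)
```

With these made-up numbers the correlation is negative for the individual locations and for zoning B, but strongly positive for zoning A, even though the underlying data never change.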


Redistricting
• Perhaps the clearest example of MAUP in practice…

Source: The Economist


Scale
• Geographic scale effects are fundamental:
  – Different data types are appropriate at different scales, so available spatial data may be dependent on scale, ruling out some types of analysis
  – Scale is a factor in autocorrelation, in the distinction between first and second order
  – It is also a factor in MAUP in the level of aggregation used


Non-uniformity of space
• The non-uniformity of space refers to problems arising from tacit assumptions about the uniform spatial density of ‘background’ populations
  – For example, ‘clusters’ of crime are expected in urban areas because more people live there
  – This leads to numerous analytic complexities

Example: ISO9000 certified firms in the United States
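One common way of allowing for an uneven background population is to compare rates instead of raw counts. The sketch below uses invented counts and populations for two hypothetical areas; `rate_per_1000` is a helper name assumed here, and the approach is only one of several possible adjustments.

```python
def rate_per_1000(count, population):
    """Events per 1000 people in the background population."""
    return 1000.0 * count / population

# Hypothetical (event count, population) pairs for two areas.
areas = {
    "urban": (500, 100_000),
    "rural": (10, 1_000),
}

rates = {name: rate_per_1000(count, pop) for name, (count, pop) in areas.items()}
print(rates)
```

In this made-up case the urban area has far more events, yet the rural area has the higher rate once population is taken into account.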


Edge effects
• Entities on the edge of a study area only have neighbors in one direction (toward the middle)
  – Unless care is taken, this can distort things
  – Again, coping with this leads to numerous analytic complexities


The bad news summarized
• Data are spatially dependent
  – Spatial autocorrelation
• Data are also dependent on how you look at them spatially:
  – Aggregation
  – Scale
  – Non-uniformity of space
  – Edge effects


Some good news
• Spatial data do have an intrinsic advantage, however…
• In addition to data we have a record of where the data were observed
• Making the most of this extra information lies at the heart of spatial analysis


Some useful general concepts
• A number of concepts are frequently invoked in spatial analysis
  – Distance
  – Adjacency
  – Interaction
  – Neighborhood
  – Proximity polygons


Distance
• Easily calculated from two coordinates
  – Use Pythagoras’s theorem, where $\Delta x$ and $\Delta y$ are the coordinate differences:
    $d = \sqrt{\Delta x^2 + \Delta y^2}$
  – It gets trickier on a sphere, but for projected data at sub-regional scales the approximation is usually adequate
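As a rough sketch of the two cases just mentioned, the following Python compares planar (Pythagorean) distance with great-circle distance on a sphere. The function names, the haversine formulation, and the 6371 km mean Earth radius are assumptions made for illustration and do not come from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def euclidean_distance(x1, y1, x2, y2):
    """Planar distance between two projected coordinates (Pythagoras)."""
    return math.hypot(x2 - x1, y2 - y1)

def haversine_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

print(euclidean_distance(0, 0, 3, 4))  # 5.0
```

The planar version is appropriate only for projected coordinates over a limited extent; for latitude and longitude over long distances the spherical version is the safer choice.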


Other distance metrics
• Network distance on a transport system
• Travel time
• Perceived distance
  – Although mathematicians generally insist that
    $d_{AA} = 0$
    $d_{AB} \geq 0$
    $d_{AB} = d_{BA}$
    $d_{AB} \leq d_{AC} + d_{CB}$
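The four conditions mathematicians insist on (zero self-distance, non-negativity, symmetry, and the triangle inequality) can be checked mechanically for any table of pairwise distances. The sketch below is a minimal illustration; `is_metric` and both example matrices are hypothetical, with the travel-time values invented to mimic one-way streets.

```python
def is_metric(d, tol=1e-9):
    """Return True if the square matrix d satisfies the four metric conditions."""
    n = len(d)
    for a in range(n):
        if abs(d[a][a]) > tol:                    # d_AA = 0
            return False
        for b in range(n):
            if d[a][b] < -tol:                    # d_AB >= 0
                return False
            if abs(d[a][b] - d[b][a]) > tol:      # d_AB = d_BA
                return False
            for c in range(n):
                if d[a][b] > d[a][c] + d[c][b] + tol:  # d_AB <= d_AC + d_CB
                    return False
    return True

# Hypothetical straight-line distances between three places.
straight_line = [[0, 3, 4],
                 [3, 0, 5],
                 [4, 5, 0]]

# Hypothetical travel times: A to B takes 10, but B to A takes 15.
travel_time = [[0, 10, 20],
               [15, 0, 12],
               [20, 12, 0]]

print(is_metric(straight_line))  # True
print(is_metric(travel_time))    # False
```

Travel time and perceived distance often fail the symmetry condition in this way, which is why they are useful in analysis without being metrics in the strict sense.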


Adjacency
• This is a sort of binary distance: two spatial objects are either adjacent or not
  – Often we use distance to decide: if d = 0, then two objects are adjacent
  – This can get complex for some kinds of object
  – The meaning is clearest for polygons


Queens and rooks
• These terms are fairly self-explanatory, referring to which types of adjacency we choose to ‘allow’
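By analogy with chess moves, rook adjacency counts only cells that share an edge, while queen adjacency also counts cells that touch at a corner. The sketch below applies both rules to a regular grid; the function name `grid_neighbors` and the grid setting are assumptions for illustration, since the slides apply the idea to polygons in general.

```python
def grid_neighbors(row, col, n_rows, n_cols, rule="rook"):
    """List the (row, col) cells adjacent to a grid cell under the given rule."""
    if rule == "rook":
        # Shared edges only: up, down, left, right.
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        # Shared edges and shared corners: all eight surrounding cells.
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols]

print(len(grid_neighbors(1, 1, 3, 3, "rook")))   # 4
print(len(grid_neighbors(1, 1, 3, 3, "queen")))  # 8
print(len(grid_neighbors(0, 0, 3, 3, "queen")))  # 3
```

The corner cell has fewer neighbors than the center cell under either rule, which is a small-scale instance of the edge effects described earlier.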


A simple example


Adjacency applied to measuring autocorrelation
• For each pair of neighboring cases, calculate the covariance:
    $(x_i - \bar{x})(x_j - \bar{x})$
  – This produces a positive number when two values are similar (both above or both below the mean), and a negative number when they are different
• Averaged over all neighboring cases, and scaled by dividing by the variance of the data, we get a number that usually falls between –1 and +1
  – This is interpreted in the same way as a standard correlation coefficient
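The calculation described above corresponds to Moran’s I with binary adjacency weights. The sketch below is a minimal version under that assumption; the function name `morans_i`, the dictionary representation of neighbors, and the four-cell example values are all hypothetical.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights.

    values    -- list of observations, one per spatial unit
    neighbors -- dict mapping each unit index to a list of adjacent unit indices
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    sum_sq = sum(d * d for d in deviations)

    cross_products = 0.0  # sum of (x_i - mean)(x_j - mean) over neighboring pairs
    n_pairs = 0           # number of (i, j) neighbor relations (sum of the weights)
    for i, adjacent in neighbors.items():
        for j in adjacent:
            cross_products += deviations[i] * deviations[j]
            n_pairs += 1

    # Average covariance over neighboring pairs, divided by the variance.
    return (n / n_pairs) * (cross_products / sum_sq)

# Four cells in a row; each is adjacent to the cells on either side.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1, 1, 5, 5], chain)    # similar values side by side
alternating = morans_i([1, 5, 1, 5], chain)  # dissimilar values side by side
print(clustered, alternating)
```

The clustered arrangement gives a positive value and the alternating arrangement a negative one, matching the interpretation as a correlation coefficient.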


Election 2000 again
• Those county-level results expressed in terms of the percent share for George W. Bush

The clear spatial pattern in these data is confirmed by a Moran’s I value of around 0.45


Interaction
• Interaction (often denoted $w_{ij}$) is a measure of the likely strength of relationship between two entities
  – The most common form is inverse-distance:
    $w_{ij} \propto \frac{1}{d_{ij}}$


Other measures of interaction
  – Inverse distance powered (usually squared)
    $w_{ij} \propto \frac{1}{d_{ij}^{k}}$
  – Negative exponential
    $w_{ij} \propto e^{-k d_{ij}}$
  – Weighted inverse distance
    $w_{ij} \propto \frac{A_i A_j}{d_{ij}^{k}}$
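These interaction measures translate directly into small functions. In the sketch below the constant of proportionality is set to 1, which is an assumption (the slides give proportionalities only), and the function and parameter names are chosen for illustration.

```python
import math

def inverse_distance(d, k=1.0):
    """w proportional to 1 / d**k (k = 1 plain inverse distance, k = 2 squared)."""
    return 1.0 / d ** k

def negative_exponential(d, k=1.0):
    """w proportional to exp(-k * d)."""
    return math.exp(-k * d)

def weighted_inverse_distance(d, a_i, a_j, k=2.0):
    """w proportional to A_i * A_j / d**k, with A an attribute such as size."""
    return a_i * a_j / d ** k

print(inverse_distance(2.0))                     # 0.5
print(inverse_distance(2.0, k=2))                # 0.25
print(weighted_inverse_distance(2.0, 3.0, 4.0))  # 3.0
```

Note that the inverse-distance forms are undefined at d = 0, whereas the negative exponential stays finite there, which can matter when two objects share a location.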


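Matrices and spatial pattern
• Many of these concepts assign a value to describe the relationship between every pair of objects
• This lends itself to being recorded in a matrix:

$$
D = \begin{bmatrix}
0 & 66 & 68 & 68 & 24 & 36 \\
66 & 0 & 51 & 110 & 99 & 101 \\
68 & 51 & 0 & 67 & 91 & 116 \\
68 & 110 & 67 & 0 & 60 & 108 \\
24 & 99 & 91 & 60 & 0 & 45 \\
41 & 101 & 116 & 108 & 45 & 0
\end{bmatrix}
$$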
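A distance matrix of this kind can be built from coordinates in a few lines. The three points below are hypothetical and unrelated to the values in the matrix D; `distance_matrix` is a helper name assumed for this sketch, which relies on `math.dist` (available from Python 3.8).

```python
import math

def distance_matrix(points):
    """Matrix of straight-line distances between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]

# Hypothetical projected coordinates.
points = [(0, 0), (3, 4), (6, 8)]
D = distance_matrix(points)
for row in D:
    print(row)
```

Row i, column j holds the distance from object i to object j, so the diagonal is zero and, for straight-line distance, the matrix is symmetric.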


Spatial weights matrices
• A particularly common matrix is the spatial weights matrix, usually denoted by W
• This records the interaction between each pair of objects, and appears in
  – autocorrelation, point pattern analysis, interpolation, spatial regression, geographically weighted regression, spatial interaction modeling…
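As one possible construction, a weights matrix W can be derived from a distance matrix using inverse-distance interaction. The sketch below also row-standardizes W so that each row sums to 1, a common convention that is assumed here and not stated in the slides; `spatial_weights` is a hypothetical name, and the code assumes no two distinct objects share a location.

```python
def spatial_weights(dist, k=1.0):
    """Row-standardized inverse-distance weights from a distance matrix.

    Assumes dist[i][j] > 0 whenever i != j.
    """
    n = len(dist)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # an object does not interact with itself
                w[i][j] = 1.0 / dist[i][j] ** k
        row_sum = sum(w[i])
        if row_sum > 0:
            w[i] = [value / row_sum for value in w[i]]
    return w

# Hypothetical distances between three objects on a line.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
W = spatial_weights(dist)
for row in W:
    print(row)
```

Replacing the inverse-distance line with a binary adjacency test or a negative exponential gives other weighting schemes with the same matrix structure.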


Neighborhood
• Neighborhood is a less clear-cut concept
  – It can mean the region of space around some object
  – Or the set of objects considered to be neighbors of that object


Proximity polygons
• Proximity polygons are an increasingly important example of the neighborhood concept
• A proximity polygon is associated with each spatial object and is the region of space nearer to that object than to any other
• A good demonstration of the idea is Voroglide by: Praktische Informatik VI, FernUniversität Hagen, Christian Icking, Rolf Klein, Peter Köllner, Lihong Ma
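The definition of a proximity polygon can be approximated by brute force: label every cell of a grid with the index of its nearest object. This is a rasterized sketch of the idea, not a geometric construction of the polygons themselves; the function names, the two sites, and the grid size are hypothetical.

```python
import math

def nearest_site(location, sites):
    """Index of the site closest to a location (ties go to the lower index)."""
    return min(range(len(sites)), key=lambda i: math.dist(location, sites[i]))

def proximity_grid(sites, width, height):
    """Label each integer grid cell (x, y) with the index of its nearest site."""
    return [[nearest_site((x, y), sites) for x in range(width)]
            for y in range(height)]

# Two hypothetical point objects on a 6 x 3 grid.
sites = [(1, 1), (4, 1)]
grid = proximity_grid(sites, 6, 3)
for row in grid:
    print(row)
```

Cells sharing a label form the rasterized proximity polygon of that site; here the boundary falls halfway between the two sites.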


Uses of proximity polygons
• Proximity polygons are commonly used in
  – Interpolation
  – Point pattern analysis
• Increasingly they are used throughout spatial analysis, especially in location decision making


Just to review…
• Distance
• Adjacency
• Interaction
• Neighborhood


GIS and spatial analysis
• GIS vendors often claim to offer ‘spatial analysis’
  – This usually doesn’t mean statistical spatial analysis, but spatial data manipulation: buffering, overlay etc.
  – However, GIS has increased the need for spatial analysis, because more people are making maps, and asking questions about them!


If spatial analysis is so useful, why is it not integrated into GIS?!
• Different perspectives on spatial data
  – GIS is built around the entity-attribute model. Spatial analysis uses this data (because that’s the way it comes). Conceptually, spatial analysis sees data as patterns which are the outcomes of processes, which can be quite different.
• Spatial analysis is not widely understood
  – It has been a specialized field, and therefore hard to justify incorporating into GIS as a standard tool
• Spatial analysis can make GIS hard to sell
  – Spatial analysis is about asking difficult questions, not about easy answers


Questions?
