Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to...

42
Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1 Briggs Henan University 2010

Transcript of Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to...

Page 1: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Spatial Autocorrelation:The Single Most Important Concept

in Geography and GIS!Introduction to Concepts

1Briggs Henan University 2010

Page 2: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Briggs Henan University 2010 2

Spatial StatisticsDescriptive Spatial Statistics: Centrographic Statistics (This time)

– single, summary measures of a spatial distribution–- Spatial equivalents of mean, standard deviation, etc..

Inferential Spatial Statistics: Point Pattern Analysis (Next time)

Analysis of point location only--no quantity or magnitude (no attribute variable)--Quadrat Analysis

--Nearest Neighbor Analysis, Ripley’s K function

Spatial Autocorrelation (Weeks 5 and 6)– One attribute variable with different magnitudes at each location

The Weights Matrix

Global Measures of Spatial Autocorrelation (Moran’s I, Geary’s C, Getis/Ord Global G)

Local Measures of Spatial Autocorrelation (LISA and others)

Prediction with Correlation and Regression (Week 7)–Two or more attribute variables

Standard statistical models

Spatial statistical models

Page 3: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Point Pattern Analysis (last time)

--points only, and only their location

--there is no “magnitude” value

Spatial Autocorrelation: (this time)

--points and polygons, with different “magnitudes”

-- there is an attribute variable.

--income, rainfall, crime rate, etc.

Point Pattern Analysis (PPA) and Spatial Autocorrelation (SA) :

differences and similarities

3Briggs Henan University 2010

Page 4: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Spatial AutocorrelationMany ways to define it!

1. The confirmation of Tobler’s first law of geographyEverything is related to everything else, but near things are more related than distant things.

2. Using similarityThe degree to which characteristics at one location are similar (or

dissimilar) to those nearby.3. Using probabilityMeasure of the extent to which the occurrence of an event in one

geographic area makes more probable, or less probable, the occurrence of a similar event in a neighboring geographic area.

4. Using correlation Correlation of a variable with itself through space.The correlation between an observation’s value on a variable and the value

of near-by observations on the same variable

Lets look at these in more detail. 4

Page 5: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Spatial Autocorrelation:1. Tobler’s Law

The confirmation of Tobler’s first law of geography*:Everything is related to everything else, but near

things are more related than distant things.

The single most important concept in geography and GIS!

*Tobler W., (1970) "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240 5Briggs Henan University 2010

Page 6: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

6

Spatial:On a map Auto: Self Correlation: Degree of relative similarity

Positive: similar values cluster together on a map

Negative: dissimilar (different) values cluster together on a map

SpatialAutocorrelation

Source: Dr Dan Griffith, with modification

Positive Spatial Autocorrelation

Negative Spatial Autocorrelation

Briggs Henan University 2010

Page 7: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

2002 populationdensity

Positive spatial autocorrelation

- high values

surrounded by nearby high values

- intermediate values surrounded

by nearby intermediate values

- low values surrounded by

nearby low values

Page 8: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Negative spatial autocorrelation

- high values

surrounded by nearby low values

- intermediate values surrounded

by nearby intermediate values

- low values surrounded by

nearby high values

competition for space

Grocery store density

Page 9: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Spatial Autocorrelation:more ways to describe it

2. Based on SimilarityThe degree to which characteristics at one location are similar to (or different from)

those nearby.

Similar to = positive spatial autocorrelationDifferent from (dissimilar) = negative spatial autocorrelation

Positive spatial autocorrelation much more common than negative

9Briggs Henan University 2010

Page 10: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Spatial Autocorrelation Exists Everywhere!

POLLUTION MONITORING SATELLITE IMAGE

HOUSEHOLD SAMPLING AGRICULTURAL EXPERIMENT

10

Briggs Henan University 2010

Page 11: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

11Briggs Henan University 2010

high negative spatial autocorrelation

no spatial autocorrelation*

high positive spatial autocorrelation

Dispersed Pattern Random Pattern Clustered Pattern

CLU

STERED

UNIFO

RM/

DISPERSED

Spatial Autocorrelation:more ways to describe it

3. Based on ProbabilityMeasure of the extent to which the occurrence of an event in one

geographic unit (polygon) makes more probable, or less probable, the occurrence of a similar event in a neighboring unit.

Do you recognize this from earlier discussion?It’s the same concept as clustered, random, dispersed!

Page 12: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Even More Ways to Describe SA

12

Crime rate in an area

Crime rate in near-by area

4. Using correlationCorrelation of a variable with itself through space.

The correlation between an observation’s value on a variable and the value of near-by observations on the same variable.

Correlation = “similarity”, “association”, or “relationship”

Scatter diagram

Briggs Henan University 2010

Page 13: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Spatial Autocorrelation: shows the association or relationship between the same variable in “near-by” areas.

Scatter Diagram: how is it different?Standard Statistics:

shows the association or relationship between two different variables

13education

income

education

Education“next door”

In a neighboring or near-b y area

Each point is ageographic location

Briggs Henan University 2010

Page 14: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

14

Why is Spatial Autocorrelation Important?Two reasons1. Spatial autocorrelation is important because it implies

the existence of a spatial process – Why are near-by areas similar to each other?– Why do high income people live “next door” to each other?– These are GEOGRAPHICAL questions.

• They are about location

2. It invalidates most traditional statistical inference tests– If SA exists, then the results of standard statistical inference

tests may be incorrect (wrong!)– We need to use spatial statistical inference tests

CreateProcesses Pattern

Population

Infer

Sample

Briggs Henan University 2010

Page 15: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

15

Why are standard statistical tests wrong?• Statistical tests are based on the assumption that the

values of observations in each sample are independent of one another

• spatial autocorrelation violates this– samples taken from nearby areas are related to each other and

are not independent

Briggs Henan University 2010

Values near each other are similar in magnitude.

Implies a relationship between nearby observations

Page 16: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

What is the correlation coefficient (r)?• The most common statistic in all of science• measures the strength of the relationship (or “association”) between

two variables e.g. income and education• Varies on a scale from –1 thru 0 to +1

+1 implies a perfect positive association• As values go up () on one, they also go up () on the other

• income and education0 implies no association-1 implies perfect negative association• As values go up on one () , they go down () on the other

• price and quantity purchased• Full name is the Pearson Product Moment correlation coefficient,

16

Why are standard statistical tests wrong?Example for the correlation coefficient (r)

Briggs Henan University 2010

-1 0 +1() () () ()

Page 17: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Examples of Scatter Diagrams and the Correlation Coefficient

r = 1perfectpositive

r = -1

r = 0.26

weakpositive

perfectnegative

EducationIn

com

e

r = -0.71

strongnegative

Price

Qu

anti

tyPositive

Negative

r = 0.72

strongpositive

17Briggs Henan University 2010

Page 18: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

18

Why are standard statistical tests wrong?Example for the correlation coefficient (r)

If Spatial Autocorrelation exists:1.Correlation coefficients appear to be bigger than

they really are, and2.They are more likely to be found “statistically

significant”

You are “fooled twice”:--you are more likely to incorrectly conclude a relationship exists when it does not--You believe that the relationship is stronger than it really is

Briggs Henan University 2010

Page 19: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

19

Why are standard statistical tests wrong?Example for the correlation coefficient (r)

If Spatial Autocorrelation exists:• Correlation coefficients bigger than they really are

• because income and education are similar in near by areas• Correlation coefficient is “biased upward”

• Also, more likely to appear “statistically significant”– standard error is smaller because spatial autocorrelation

“artificially” reduces variability– there is actually more variability than it appears

• “exagerated precision”Briggs Henan University 2010

Page 20: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Measuring Spatial Autocorrelation:

the problem of measuring

“nearness” or “proximity”

20Briggs Henan University 2010

Page 21: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Measuring Spatial Autocorrelation:the problem of measuring “nearness”

To measure spatial autocorrelation, we must know the “nearness” of our observations

• Which points or polygons are “ near” or “next to” other points or polygons?

– Which provinces are near Henan?

–How measure this?Seems simple and obvious, but it is not!

21Briggs Henan University 2010

Page 22: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Measuring Spatial Autocorrelation:the Spatial Weights matrix

• Wij the spatial weights matrix measures the relative location of all points i and j,

• Different methods of calculating Wij can result in different values for autocorrelation and different conclusions from statistical significance tests!

Wij

22

Admin_Name Anhui BeijingChongqing Fujian Gansu

Guangdong Guangxi Guizhou Hainan Hebei

Heilongjiang Henan ……. Zhejiang

Anhui 0Beijing 0Chongqing 0Fujian 0Gansu 0Guangdong 0Guangxi 0Guizhou 0Hainan 0Hebei 0Heilongjiang 0Henan 0………Zhejiang 0

?Briggs Henan University 2010

Page 23: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

23

Measuring Relative Spatial Location:Contiguity and Distance

Two methods for measuring nearness1. Weights based on Contiguity--binary (0,1)

– If zone j is next to zone i, it receives a weight of 1– otherwise it receives a weight of 0,

• It is essentially excluded

– But what constitutes contiguity? Not as easy as it seems!

2. Weights based on Distance—continuous variable– Measure the actual distance between points, or between

polygon centroids– But what measure do we use, and – distance to what points -- All? Some?Briggs Henan University 2010

Page 24: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Spatial neighbors based on contiguity* (adjacency)

• Sharing a border or boundary– Rook: sharing a border– Queen: sharing a border or a point

24

rookrook queenqueen Hexagons Irregular

Which use?

* Shares common border

Briggs Henan University 2010

Page 25: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Spatial weights matrix for Rook case• Matrix contains a:

– 1 if share a border– 0 if do not share a border

25

A B

C D

A B C D

A 0 1 1 0

B 1 0 0 1

C 1 0 0 1

D 0 1 1 0

4 areal units 4x4 matrix

W =

associatedgeographic connectivity/

weights matrix

associatedgeographic connectivity/

weights matrix

Briggs Henan University 2010

Common border

Page 26: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

26

“Close” but no common border– Include polygons which have a centroid within the “convex hull” for the centroids of

polygons that do share a common border

Length of border– Is Shanxi “as close to” Nei Mongol as to Henan?– Base “closeness” on proportion of shared border, not just one (1) or zero (0)– wij = border lengthij /border lengthj)

Problem Situations for Irregular PolygonsMany!

X

Briggs Henan University 2010

Page 27: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Measuring Contiguity: Lagged ContiguityShould we include second order contiguity?

hexagonrook queen

1st

order

2nd

order

27Briggs Henan University 2010

Next nearest neighbor

Nearest neighbor

Page 28: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Formats for Weights Matrix

• Raw versus row standardized

• Full contiguity versus sparse contiguity

Briggs Henan University 2010 28

Page 29: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Briggs Henan University 2010 29

A B C

D E F

Row-standardized geographic contiguity matrices

  A B C D E FRow Sum

A 0 1 0 1 0 0 2B 1 0 1 0 1 0 3C 0 1 0 0 0 1 2D 1 0 0 0 1 0 2E 0 1 0 1 0 1 3F 0 0 1 0 1 0 2

Total number of neighbors--some have more than others

  A B C D E FRow Sum

A 0.0 0.5 0.0 0.5 0.0 0.0 1B 0.3 0.0 0.3 0.0 0.3 0.0 1

C 0.0 0.5 0.0 0.0 0.0 0.5 1D 0.5 0.0 0.0 0.0 0.5 0.0 1

E 0.0 0.3 0.0 0.3 0.0 0.3 1F 0.0 0.0 0.5 0.0 0.5 0.0 1

Row standardized--usually use this

Divide each number by the row sum

Page 30: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Queens Case Full Contiguity Matrix for US States• Column headings not shown (same as rows) • Principal diagonal has 0s (blanks)• other 0s omitted for simplicity• Can be very large, thus inefficient to use. 30Briggs Henan University 2010

Page 31: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Name Fips Ncount N1 N2 N3 N4 N5 N6 N7 N8Alabama 1 4 28 13 12 47Arizona 4 5 35 8 49 6 32Arkansas 5 6 22 28 48 47 40 29California 6 3 4 32 41Colorado 8 7 35 4 20 40 31 49 56Connecticut 9 3 44 36 25Delaware 10 3 24 42 34District of Columbia 11 2 51 24Florida 12 2 13 1Georgia 13 5 12 45 37 1 47Idaho 16 6 32 41 56 49 30 53Illinois 17 5 29 21 18 55 19Indiana 18 4 26 21 17 39Iowa 19 6 29 31 17 55 27 46Kansas 20 4 40 29 31 8Kentucky 21 7 47 29 18 39 54 51 17Louisiana 22 3 28 48 5Maine 23 1 33Maryland 24 5 51 10 54 42 11Massachusetts 25 5 44 9 36 50 33Michigan 26 3 18 39 55Minnesota 27 4 19 55 46 38Mississippi 28 4 22 5 1 47Missouri 29 8 5 40 17 21 47 20 19 31Montana 30 4 16 56 38 46Nebraska 31 6 29 20 8 19 56 46Nevada 32 5 6 4 49 16 41New Hampshire 33 3 25 23 50New Jersey 34 3 10 36 42New Mexico 35 5 48 40 8 4 49New York 36 5 34 9 42 50 25North Carolina 37 4 45 13 47 51North Dakota 38 3 46 27 30Ohio 39 5 26 21 54 42 18Oklahoma 40 6 5 35 48 29 20 8Oregon 41 4 6 32 16 53Pennsylvania 42 6 24 54 10 39 36 34Rhode Island 44 2 25 9South Carolina 45 2 13 37South Dakota 46 6 56 27 19 31 38 30Tennessee 47 8 5 28 1 37 13 51 21 29Texas 48 4 22 5 35 40Utah 49 6 4 8 35 56 32 16Vermont 50 3 36 25 33Virginia 51 6 47 37 24 54 11 21Washington 53 2 41 16West Virginia 54 5 51 21 24 39 42Wisconsin 55 4 26 17 19 27Wyoming 56 6 49 16 31 8 46 30

Sparse Contiguity Matrix for US States -- obtained from Anselin's web site (see powerpoint for link)

Queens Case Sparse Contiguity Matrix for US States•Ncount is the number of neighbors for each state•Max is 8 (Missouri and Tennessee)•Sum of Ncount is 218•Number of common borders (joins) ncount / 2 = 109•N1, N2… FIPS codes for neighbors

31Briggs Henan University 2010

Page 32: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Challenge for You

• Which China province has the most neighbors?– How many does it have?

• Create contiguity matrices for the Provinces of China– Can be done with GeoDA or with ArcGIS– Or you can do it “by hand” – Use the software to see if you get it correct

Briggs Henan University 2010 32

Page 33: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

33

Weights Based on Distanceagain, not that simple

1.Functional Form to use?2.Distance metric to use ?3.Which points/polygons to include?4.How measure distance between polygons?

Briggs Henan University 2010

Page 34: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

34

Weights Based on Distance

1. Functional Form• We want “nearness” not distance

• Most common choice is the inverse (reciprocal) of the distance between locations i and j (wij = 1/dij)

• Other functions also used– inverse of squared distance (wij =1/dij

2), or

– negative exponential (wij = e-d or wij = e-d2)

Briggs Henan University 2010

nearness

distance

Page 35: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Briggs Henan University 2010 35

2. Distance metric• 2-D Cartesian distance via Pythagorus

Use for projected data

• 3-D Spherical distance via spherical coordinatesCos d = (sin a sin b) + (cos a cos b cos P) where: d = arc distance a = Latitude of A

b = Latitude of BP = degrees of long. A to B

Use for unprojected data

• possible distance metrics:– Euclidean straight line/airline– city block/manhattan metric– distance through network

22 )()( jijiij YYXXd

Appropriate if within a city

Weights Based on Distance

Page 36: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

3. What points/polygons to include?

• Distances to all points/polygons?– If use all, may make it impossible to solve

necessary equations: matrix too big– May not make theoretical sense: effects may

only be ‘local’• Is Henan influenced by Xinjiang?

• Include distance to only the “nth” nearest neighbors– How many is n? First? Second?

• Include distances to locations only within a buffer distance

36

Weights based on Distance

Page 37: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

4. Measuring distance between polygons– distances usually measured

centroid to centroid, but – could be measured from

boundary of one polygon to centroid of others

– could be measured between the two closest boundary points

• adjustment required for contiguous polygons since distance for these would be zero

37

Weights based on Distance

Briggs Henan University 2010

Page 38: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Many decisions!Many challenges!

That is what makes research fun!

Briggs Henan University 2010 38

Page 39: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

What have we learned today?• The concept of spatial autocorrelation.

– “Near things are more similar than distant things”

• The use of the weights matrix Wij to measure “nearness”

• The difficulty of measuring “nearness”– That is a surprise!

Next Time• Measures of Spatial Autocorrelation

– Join Count statistic --Geary’s C– Moran’s I --Getis-Ord G statistic

Briggs Henan University 2010 39

Page 40: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Challenge for You

• Which China province has the most neighbors?– How many does it have?

• Create contiguity matrices for the Provinces of China– Can be done with GeoDA or with ArcGIS– Or you can do it “by hand” – Use the software to see if you get it correct

Briggs Henan University 2010 40

Page 41: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Appendix: A Note on Sampling Assumptions• Another factor which influences results from these tests is the assumption made

regarding the type of sampling involved:– Free (or normality) sampling

• Analogous to sampling with replacement• After a polygon is selected for a sample, it is returned to the population set• The same polygon can occur more than one time in a sample

– Non-free (or randomization) sampling• Analogous to sampling without replacement• After a polygon is selected for a sample, it is not returned to the population set• The same polygon can occur only one time in a sample

• The formulae used to calculate test statistics (particularly the standard error) differ depending on which assumption is made– Generally, the formulae are substantially more complex for randomization sampling—

unfortunately, it is also the more common situation!– Usually, assuming normality sampling requires knowledge about larger trends from outside the

region or access to additional information within the region in order to estimate parameters.

41Briggs Henan University 2010

Page 42: Spatial Autocorrelation: The Single Most Important Concept in Geography and GIS! Introduction to Concepts 1Briggs Henan University 2010.

Briggs Henan University 2010 42