Spatial Data Mining: Cluster Analysis and Space-Time Analysis · 2020. 7. 2. · fishnet grid...

Post on 22-Aug-2021

9 views 0 download

Transcript of Spatial Data Mining: Cluster Analysis and Space-Time Analysis · 2020. 7. 2. · fishnet grid...

Lauren Bennett, Flora Vale, Alberto Nieto

Spatial Data Mining: Cluster Analysis and Space-Time Analysis

esriurl.com/spatialstats

Subjectivity of Maps

High Priority 911 Calls in Baltimore

The map as data

Where are the hot spots? Where is the variation greater?

High Priority 911 Calls in Baltimore

The map as data

High Priority 911 Calls in Baltimore

The map as data

Where are the hot spots? Where is the variation greater?

The subjectivity of visual pattern analysisNatural Breaks Quantile

Where are the hot spots? Where is the variation greater?

Minimizing the subjectivityTurning the map into information

Inferential Statistics

Complete Spatial RANDOMNESS

Is there a PATTERN?

3565 89 799214 15 32 38

46 26 43 38 32 79 50 28 84

19 71 69 39 93 75 10 58 20

97 49 44 59 23 07 81 64 06

28 62 08 99 86 28 03 48 25

35

65

8979

92

14

15

62 3846

26

4338

32

79

5028

84

19

71

69

39

93

7510

5820

97

49

4459

23

07

81

64

03

2832

08

99

83

28

0648

25

35

65

8979

92

14

15

62 3846

26

4338

32

79

5028

84

19

71

69

39

93

7510

5820

97

49

4459

23

07

81

64

03

2832

08

99

83

28

0648

25

z-scoresp-values

99%99%

95%95%

90% 90%random

z-scores …-2.58 -1.96 -1.65 0 1.65 1.96 2.58…p-value … 0.01 0.05 0.10 0 0.10 0.05 0.01…

z-scores and p-values

Minimizing the subjectivityTurning the map into information

“…everything is related to everything else, but near things are more related than distant things.”

Hot Spot Analysis

given a set of weighted features, identifies statistically significant hot spots and cold spots using the Getis-Ord Gi* statistic

Polygons

feature

each feature has a

value

Neighborhood

Study Area

is this

significantly different from this?

if significantly higher…

feature is marked as a

hot spot!

…how do we know if it’s

SIGNIFICANTLY different???

MATH!

Getis-Ord Gi* Statistic

Points

“Where are the

Hot Spots?”

“Where are

the highest values?”

not (necessarily)

the same as

How are neighborhoodsizes determined?

Fixed Distance Band

Fixed Distance Band

Fixed Distance Band

Fixed Distance Band

Fixed Distance Band

Fixed Distance Band

z-sc

ore

Spatial Autocorrelation by Distance5

4

3

2

1

Distance (meters)60 80 100 120 140 160 180

Spatial Autocorrelation by Distance

z-sc

ore

5

4

3

2

1

Distance (meters)60 80 100 120 140 160 180

Spatial Autocorrelation by Distance

z-sc

ore

Distance (meters)60 80 100 120 140 160 180

5

4

3

2

1

Fixed Distance Band = 70.3 meters

16

1617

14

32

16

1617

14

32

16

1617

14

32

18

2

dispersed

clustered

Contiguity Edges

Contiguity Rook’s Case

Contiguity Edges/Corners

Contiguity Queen’s Case

K Nearest Neighbors

K Nearest Neighbors

K Nearest Neighbors

Network Spatial Weights

Network Spatial Weights

demo

given a set of weighted features, identifies statistically significant hot spots, cold spots, and spatial outliers using the Anselin Local Moran's I statistic

Cluster and Outlier Analysis

is the neighborhood significantly different from all other neighborhoods?

is the neighborhood significantly different from all other neighborhoods?

is the feature significantly different from all other features?

HH

feature is higher than other features, neighborhood is higherthan other neighborhoods

HH

LL

feature is higher than other features, neighborhood is higherthan other neighborhoods

feature is lower than other features, neighborhood is lower than other neighborhoods

HHHL

LL

feature is higher than other features, neighborhood is higherthan other neighborhoods

feature is lower than other features, neighborhood is lower than other neighborhoods

feature is higher than other features, neighborhood is lowerthan other neighborhoods

LH

HHHL

LL

feature is higher than other features, neighborhood is higherthan other neighborhoods

feature is lower than other features, neighborhood is lower than other neighborhoods

feature is lower than other features, neighborhood is higher than other neighborhoods

feature is higher than other features, neighborhood is lowerthan other neighborhoods

outliers clustersz-score

demo

911 CallsSpace Time Pattern MiningDefining neighbors in space and time

Hot Spot Analysis Cluster and Outlier Analysis

Quantifying Spatial Clusters

What about time?

Where have things changed? Is it meaningful?

January 2016911 Calls

January 2015

911 Calls

“…everything is related to everything else, but

things are more related than distant things.”

near and recent

Quantifying Spatiotemporal Clusters

The Space Time Pattern Mining Tools

Create Space Time Cube By Aggregating Points

aggregates a set of points into space-time bins, counting the number of points and optionally calculating statistics for attributes

10km

February

March

April

May

June

July

10km10km

10km10km

10km10km

January

10km10km

10km10km

10km10km

10km

bin

location

time slice

counts

trend analysis

summaryfieldssummeanmedianminmaxstandard deviation

filling empty bins

zeros

spatial neighbors

space time neighbors

space time neighbors

temporal neighbors

temporal neighbors

past present

3/22 3/23 3/24 3/25 3/26 3/27 3/28 3/29 3/30 3/31 4/1 4/2 4/3 4/4 4/5 4/6 4/7 4/8 4/9 4/10 4/11 4/12 4/13 4/14 4/15 4/16

First data point:March 22, 2018

(start)

Last data point:April 16, 2018

(end)

past present

End Time

3/213/203/19

ending on 3/23back to but not including

3/19

ending on 3/31back to but not including

3/27

ending on 4/8back to but not including

4/4

ending on 4/12back to but not including

4/8

ending on 4/16back to but not including

4/12

ending on 4/4back to but not including

3/31

ending on 3/27back to but not including

3/23

3/22 3/23 3/24 3/25 3/26 3/27 3/28 3/29 3/30 3/31 4/1 4/2 4/3 4/4 4/5 4/6 4/7 4/8 4/9 4/10 4/11 4/12 4/13 4/14 4/15 4/16

First data point:March 22, 2018

(start)

Last data point:April 16, 2018

(end)

past present

End Time

3/213/203/19

ending on 3/23back to but not including

3/19

ending on 3/31back to but not including

3/27

ending on 4/8back to but not including

4/4

ending on 4/12back to but not including

4/8

ending on 4/16back to but not including

4/12

ending on 4/4back to but not including

3/31

ending on 3/27back to but not including

3/23

3/22 3/23 3/24 3/25 3/26 3/27 3/28 3/29 3/30 3/31 4/1 4/2 4/3 4/4 4/5 4/6 4/7 4/8 4/9 4/10 4/11 4/12 4/13 4/14 4/15 4/16

First data point:March 22, 2018

(start)

Last data point:April 16, 2018

(end)

past present

Start Time

3/22 3/23 3/24 3/25 3/26 3/27 3/28 3/29 3/30 3/31 4/1 4/2 4/3 4/4 4/5 4/6 4/7 4/8 4/9 4/10 4/11 4/12 4/13 4/14 4/15 4/16 4/194/184/17

starting on 4/11up to but not including

4/15

starting on 3/22up to but not including

3/26

First data point:March 22, 2018

(start)

starting on 3/30up to but not including

4/3

starting on 4/7up to but not including

4/11

starting on 4/15up to but not including

4/19

starting on 4/3up to but not including

4/7

starting on 3/26up to but not including

3/30

Last data point:April 16, 2018

(end)

past present

Start Time

3/22 3/23 3/24 3/25 3/26 3/27 3/28 3/29 3/30 3/31 4/1 4/2 4/3 4/4 4/5 4/6 4/7 4/8 4/9 4/10 4/11 4/12 4/13 4/14 4/15 4/16 4/194/184/17

starting on 4/11up to but not including

4/15

starting on 3/22up to but not including

3/26

First data point:March 22, 2018

(start)

starting on 3/30up to but not including

4/3

starting on 4/7up to but not including

4/11

starting on 4/15up to but not including

4/19

starting on 4/3up to but not including

4/7

starting on 3/26up to but not including

3/30

Last data point:April 16, 2018

(end)

Temporal Bias

Aggregation Options

fishnetgrid

hexagongrid

defined locations

Create Space Time Cube From Defined Locations

structures panel data or station data where geography does not change but attributes change over time into space-time bins

GSB002

GSB003GSB001

ID DateTime Reading

GSB001 09/14/2015 00:00:00 03.14

GSB001 09/14/2015 05:00:00 15.92

GSB001 09/14/2015 10:00:00 69.35

GSB002 09/14/2015 00:00:00 89.79

GSB002 09/14/2015 05:00:00 32.38

GSB002 09/14/2015 10:00:00 46.26

GSB003 09/14/2015 00:00:00 43.38

Data can be one set of features with a related table containing attributes over time

GSB002 GSB003GSB001

or repeating shapes contained within the same feature class

09/14/2015 00:00:00

09/14/2015 05:00:00

09/14/2015 10:00:00

or repeating shapes contained within the same feature class

Fill Missing Values

demo

Emerging Hot Spot Analysisidentifies trends in the clustering of counts or summary fields in a space time cube

significantly different from this?

is this

Entire Cube

significantly different from this?

is this

Neighborhood Time Step

significantly different from this?

is this

Neighborhood Time Step

significantly different from this?

is this

Individual Time Step

significantly different from this?

is this

Individual Time Step

New Hot Spot

New Cold Spot

Consecutive Hot Spot

Intensifying Hot Spot

Consecutive Cold Spot

Persistent Hot Spot

Intensifying Cold Spot

Diminishing Hot Spot

Persistent Cold Spot

Sporadic Hot Spot

Diminishing Cold Spot

Oscillating Hot Spot

Sporadic Cold Spot

Historical Hot Spot

Oscillating Cold Spot

No trend detected

Historical Cold Spot

New Hot Spot

New Cold Spot

Consecutive Hot Spot

Intensifying Hot Spot

Consecutive Cold Spot

Persistent Hot Spot

Intensifying Cold Spot

Diminishing Hot Spot

Persistent Cold Spot

Sporadic Hot Spot

Diminishing Cold Spot

Oscillating Hot Spot

Sporadic Cold Spot

Historical Hot Spot

Oscillating Cold Spot

No trend detected

Historical Cold Spot

New Hot Spot

New Cold Spot

Consecutive Hot Spot

Intensifying Hot Spot

Consecutive Cold Spot

Persistent Hot Spot

Intensifying Cold Spot

Diminishing Hot Spot

Persistent Cold Spot

Sporadic Hot Spot

Diminishing Cold Spot

Oscillating Hot Spot

Sporadic Cold Spot

Historical Hot Spot

Oscillating Cold Spot

No trend detected

Historical Cold Spot

New Hot Spot

New Cold Spot

Consecutive Hot Spot

Intensifying Hot Spot

Consecutive Cold Spot

Persistent Hot Spot

Intensifying Cold Spot

Diminishing Hot Spot

Persistent Cold Spot

Sporadic Hot Spot

Diminishing Cold Spot

Oscillating Hot Spot

Sporadic Cold Spot

Historical Hot Spot

Oscillating Cold Spot

No trend detected

Historical Cold Spot

demo

Local Outlier Analysisidentifies statistically significant clusters and outliers in the context of both space and time

is the neighborhood significantly different from all other neighborhoods?

is the bin significantly different from all other bins?

bin is higher than other bins, neighborhood is lowerthan other neighborhoods

bin is higher than other bins, neighborhood is higher than other neighborhoods

bin is lower than other bins, neighborhood is lower than other neighborhoods

bin is lower than other bins, neighborhood is higher than other neighborhoods

LH

HHHL

LL

Only High-High Cluster

Only High-Low Outlier

Only Low-High Outlier

Only Low-Low Cluster

Multiple Types

Never Significant

Only High-High Cluster

Only High-Low Outlier

Only Low-High Outlier

Only Low-Low Cluster

Multiple Types

Never Significant

Only High-High Cluster

Only High-Low Outlier

Only Low-High Outlier

Only Low-Low Cluster

Multiple Types

Never Significant

Only High-High Cluster

Only High-Low Outlier

Only Low-High Outlier

Only Low-Low Cluster

Multiple Types

Never Significant

demo

Want to learn more???

esriurl.com/spatialstats

TUESDAY_________________________________________

1:45p Data Visualization for Spatial Analysis 146C

3:00p Machine Learning in ArcGIS 146C

4:15p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C

WEDNESDAY______________________________________

8:30a Machine Learning in ArcGIS 146C

11a Data Visualization for Spatial Analysis 146C

1:30p From Means and Medians to Machine Learning: Spatial Statistics Basics and Innovations 146C

2:45p Spatial Data Mining: Cluster Analysis and Space Time Analysis 146C

4:00p Beyond Where: Modeling Spatial Relationships and Making Predictions 146C

5:15p The Forest for the Trees: Making Predictions Using Forest-Based Classification and Regression 146C

Please fill out a course survey!!!

lbennett@esri.comanieto@esri.comfvale@esri.com