Lecture 9 Spatial Data Mining - Fudan Universityadmis.fudan.edu.cn/member/sgzhou/courses/data... ·...

Data Mining: Tech. & Appl. Lecture 9 Spatial Data Mining Zhou Shuigeng May 27, 2007

Outline
- Spatial Databases
- Spatial Data Mining
- Spatial Data Warehousing
- Spatial Data Mining Methods
- Summary
- References

Spatial Data
- Spatial data has location or geo-referenced features
- Some of these features are:
  - Address, latitude/longitude (explicit)
  - Location-based partitions in databases (implicit)

Spatial Databases
- Spatial Database Systems (SDBS)
  - database systems supporting spatial data types in data model and implementation
  - objects with location and extension in a multi-dimensional space

Spatial Data Format
- Raster data
  - represents spatial data as rows / columns of pixels (volume representation)
  - obtained from equipment such as earth observation satellites, which measure the emitted / reflected amplitude in some frequency band
- Vector data
  - represents spatial data by their boundary (boundary representation)
  - points, lines, polygons, polyhedrons, etc.
  - often obtained from raster data using image processing methods

Spatial Queries (1)
- Spatial selection may involve specialized selection comparison operations:
  - Near
  - North, South, East, West
  - Contained in
  - Overlap/intersect
- Region (range) query: find objects that intersect a given region
- Nearest neighbor query: find the object closest to an identified object
- Distance scan: find objects within a certain distance of an identified object, where the distance is made increasingly larger
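
The three query types above can be sketched on point data as follows. This is a minimal illustration only: it scans every point linearly and uses no spatial index, and the point set and function names are hypothetical, not taken from the lecture.

```python
import math

# Hypothetical point objects: name -> (x, y)
points = {"a": (1.0, 1.0), "b": (4.0, 5.0), "c": (2.0, 2.0), "d": (9.0, 9.0)}

def region_query(pts, xmin, ymin, xmax, ymax):
    """Return names of points inside the axis-aligned query rectangle."""
    return sorted(n for n, (x, y) in pts.items()
                  if xmin <= x <= xmax and ymin <= y <= ymax)

def nearest_neighbor(pts, target):
    """Return the name of the point closest to `target` (excluding itself)."""
    tx, ty = pts[target]
    others = (n for n in pts if n != target)
    return min(others, key=lambda n: math.hypot(pts[n][0] - tx, pts[n][1] - ty))

def distance_scan(pts, target, radii):
    """Yield (radius, names within radius) for increasingly larger radii."""
    tx, ty = pts[target]
    for r in sorted(radii):
        yield r, sorted(n for n, (x, y) in pts.items()
                        if n != target and math.hypot(x - tx, y - ty) <= r)

print(region_query(points, 0, 0, 3, 3))
print(nearest_neighbor(points, "a"))
print(list(distance_scan(points, "a", [2, 6])))
```

A real spatial database would answer these queries through an index such as an R-tree rather than a full scan.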

Spatial Queries (2)

Spatial Queries (3)

Spatial Data Structures
- Data structures designed specifically to store or index spatial data
- Often based on B-tree or binary search tree
- Cluster data on disk based on geographic location
- May represent complex spatial structure by placing the spatial object in a containing structure of a specific geographic shape
- Techniques:
  - Quad Tree
  - R-Tree
  - k-D Tree

MBR
- Minimum Bounding Rectangle
- Smallest rectangle that completely contains the object
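
As a small sketch of the idea, the MBR of a polygon given by its vertices is just the minimum and maximum of the coordinates, and two MBRs can be tested for overlap cheaply. The shapes and function names below are hypothetical examples, and the rectangle is assumed to be axis-aligned.

```python
def mbr(vertices):
    """Axis-aligned minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))

def mbrs_intersect(a, b):
    """True if two MBRs share at least one point (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical objects
triangle = [(1, 1), (4, 2), (2, 5)]
square = [(3, 3), (6, 3), (6, 6), (3, 6)]

print(mbr(triangle))
print(mbrs_intersect(mbr(triangle), mbr(square)))
```

Overlapping MBRs do not guarantee that the objects themselves intersect; the MBR test only rules out pairs that certainly do not.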

MBR Examples

Quad Tree
- Hierarchical decomposition of the space into quadrants (MBRs)
- Each level in the tree represents the object as the set of quadrants which contain any portion of the object
- Each lower level is a more exact representation of the object
- The number of levels is determined by the degree of accuracy desired
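
The level-by-level refinement can be illustrated with a short recursive sketch. It assumes, for simplicity, that the object is itself an axis-aligned rectangle, and it only lists the quadrants at a given level rather than building a stored tree; the names are hypothetical.

```python
def quadrants_covering(obj, cell, depth):
    """Return the cells `depth` levels below `cell` that overlap `obj`.

    Rectangles are (xmin, ymin, xmax, ymax); overlap must have positive area.
    """
    def overlaps(a, b):
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    if not overlaps(obj, cell):
        return []
    if depth == 0:
        return [cell]
    xmin, ymin, xmax, ymax = cell
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    children = [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]
    cells = []
    for child in children:
        cells.extend(quadrants_covering(obj, child, depth - 1))
    return cells

space = (0, 0, 8, 8)   # whole indexed region
obj = (1, 1, 3, 3)     # object with area 4

for level in range(4):
    cells = quadrants_covering(obj, space, level)
    covered = sum((c[2] - c[0]) * (c[3] - c[1]) for c in cells)
    print(level, len(cells), covered)
```

The covered area shrinks toward the true area of the object as the level increases, which is the sense in which lower levels are more exact.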

Quad Tree Example

R-Tree
- As with Quad Tree, the region is divided into successively smaller rectangles (MBRs)
- Rectangles need not be of the same size or number at each level
- Rectangles may actually overlap
- Lowest level cell has only one object
- Tree maintenance algorithms similar to those for B-trees

R-Tree Example

k-D Tree
- Designed for multi-attribute data, not necessarily spatial
- Variation of binary search tree
- Each level is used to index one of the dimensions of the spatial object
- Lowest level cell has only one object
- Divisions not based on MBRs but successive divisions of the dimension range
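
A minimal sketch of the construction, assuming the common variant that splits at the median and stores one point in every node (the slide's variant keeps objects only in the lowest-level cells). The splitting dimension cycles with the depth; all names and sample points are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree as nested dicts; the split axis cycles with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid], "axis": axis,
            "left": build_kdtree(pts[:mid], depth + 1),
            "right": build_kdtree(pts[mid + 1:], depth + 1)}

def contains(node, target):
    """Exact-match search; on a tie along the split axis, check both sides."""
    if node is None:
        return False
    if node["point"] == target:
        return True
    axis = node["axis"]
    if target[axis] < node["point"][axis]:
        return contains(node["left"], target)
    if target[axis] > node["point"][axis]:
        return contains(node["right"], target)
    return contains(node["left"], target) or contains(node["right"], target)

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree["point"], tree["axis"])
print(contains(tree, (8, 1)), contains(tree, (1, 1)))
```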

k-D Tree Example

Topological Relationships
- Disjoint
- Overlaps or intersects
- Equals
- Covered by, inside, or contained in
- Covers or contains

Distance Between Objects

- Euclidean
- Manhattan
- Extensions:

Outline
- Spatial Databases
- What's Spatial Data Mining?
- Spatial Data Warehousing
- Spatial Data Mining Methods
- Summary
- References

Spatial Data Mining (SDM)
- The process of discovering interesting, useful, non-trivial patterns from large spatial datasets
- Spatial patterns
  - Spatial outliers, discontinuities
    - bad traffic sensors on highways
  - Location prediction models
    - model to identify habitat of endangered species
  - Spatial clusters
    - crime hot-spots, cancer clusters
  - Co-location patterns
    - predator-prey species, symbiosis
    - dental health and fluoride

Spatial Cluster: Example
- The 1854 Asiatic cholera outbreak in London

Spatial Outliers: Example
- Spatial outliers
  - Traffic data in Twin Cities
  - Abnormal sensor detections
  - Spatial and temporal outliers

Predictive Models: Example
- Location prediction: bird habitat prediction
  - Given training data
  - Predictive model building
  - Predict new data

Co-locations: Example
- Given: a collection of different types of spatial events
- Find: co-located subsets of event types

Data in Spatial Data Mining
- Non-spatial information
  - Same as data in traditional data mining
  - Numerical, categorical, ordinal, boolean, etc.
  - e.g., city name, city population
- Spatial information
  - Spatial attribute: geographically referenced
    - Neighborhood and extent
    - Location, e.g., longitude, latitude, elevation
  - Spatial data representations
    - Raster: gridded space
    - Vector: point, line, polygon
    - Graph: node, edge, path

Relationships on Data in Spatial Data Mining (1)

- Relationships on non-spatial data
  - Explicit
  - Arithmetic, ranking (ordering), etc.
  - Object is an instance of a class, class is a subclass of another class, object is part of another object, object is a member of a set

Relationships on Data in Spatial Data Mining (2)

- Relationships on spatial data
  - Many are implicit
  - Relationship categories
    - Set-oriented: union, intersection, membership, etc.
    - Topological: meet, within, overlap, etc.
    - Directional: North, NE, left, above, behind, etc.
    - Metric: e.g., Euclidean: distance, area, perimeter
    - Dynamic: update, create, destroy, etc.
    - Shape-based and visibility
  - Granularity

Relationships on Data in Spatial Data Mining (3)

- Granularity of spatial data
  - Examples of granularity

What's NOT Spatial Data Mining
- Simple querying of spatial data
  - Find neighbors of Canada given names and boundaries of all countries
- Testing a hypothesis via a primary data analysis
  - Female chimpanzee territories are smaller than male territories
- Uninteresting or obvious patterns in spatial data
  - Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul, given that the two cities are 10 miles apart
- Mining of non-spatial data
  - Diaper sales and beer sales are correlated in the evening

SDM Applications
- Geology
- GIS systems
- Environmental science
- Agriculture
- Medicine
- Robotics
- May involve both spatial and temporal aspects

Spatial Data Warehousing
- Spatial data warehouse: integrated, subject-oriented, time-variant, and nonvolatile spatial data repository
- Spatial data integration: a big issue
  - Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.)
  - Vendor-specific formats (ESRI, MapInfo, Intergraph, IDRISI, etc.)
  - Geo-specific formats (geographic vs. equal area projection, etc.)
- Spatial data cube: multidimensional spatial database
  - Both dimensions and measures may contain spatial components

Dimensions and Measures in Spatial Data Warehouse

- Dimensions
  - non-spatial
    - e.g. "25-30 degrees" generalizes to "hot" (both are strings)
  - spatial-to-nonspatial
    - e.g. Seattle generalizes to description "Pacific Northwest" (as a string)
  - spatial-to-spatial
    - e.g. Seattle generalizes to Pacific Northwest (as a spatial region)
- Measures
  - numerical (e.g. monthly revenue of a region)
    - distributive (e.g. count, sum)
    - algebraic (e.g. average)
    - holistic (e.g. median, rank)
  - spatial
    - collection of spatial pointers (e.g. pointers to all regions with temperature of 25-30 degrees in July)
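
The difference between the three kinds of numerical measures shows up when a region is rolled up from sub-regions. The sketch below uses made-up revenue figures for two hypothetical sub-regions: the sum and the average can be derived from small per-partition summaries, while the median cannot.

```python
import statistics

# Hypothetical monthly revenue values in two sub-regions of one region
part_a = [10, 20, 30]
part_b = [40, 100]

# Distributive: the total is computable from the per-partition sums alone
total = sum([sum(part_a), sum(part_b)])

# Algebraic: the average is derived from a fixed number of distributive
# values per partition (here sum and count)
avg = (sum(part_a) + sum(part_b)) / (len(part_a) + len(part_b))

# Holistic: the median of the whole is not derivable from per-partition medians
median_of_medians = statistics.median(
    [statistics.median(part_a), statistics.median(part_b)])
true_median = statistics.median(part_a + part_b)

print(total, avg, median_of_medians, true_median)
```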

Spatial-to-Spatial Generalization

- Generalize detailed geographic points into clustered regions, such as business, residential, industrial, or agricultural areas, according to land usage
- Requires the merging of a set of geographic areas by spatial operations
  - Dissolve
  - Merge
  - Clip
  - Intersect
  - Union

Example: British Columbia Weather Pattern Analysis

- Input
  - A map with about 3,000 weather probes scattered in B.C.
  - Daily data for temperature, precipitation, wind velocity, etc.
  - Data warehouse using star schema
- Output
  - A map that reveals patterns: merged (similar) regions
- Goals
  - Interactive analysis (drill-down, slice, dice, pivot, roll-up)
  - Fast response time
  - Minimizing storage space used
- Challenge
  - A merged region may contain hundreds of "primitive" regions (polygons)

Star Schema of the BC Weather Warehouse
- Spatial data warehouse
  - Dimensions
    - region_name
    - time
    - temperature
    - precipitation
  - Measurements
    - region_map
    - area
    - count
- Fact table
- Dimension table

Dynamic Merging of Spatial Objects

- Materializing (precomputing) all? Too much storage space
- On-line merge? Slow, expensive
- Precompute rough approximations? Accuracy trade-off
- A better way: object-based, selective (partial) materialization

Methods for Computing Spatial Data Cubes

- On-line aggregation: collect and store pointers to spatial objects in a spatial data cube
  - expensive and slow, need efficient aggregation techniques
- Precompute and store all the possible combinations
  - huge space overhead
- Precompute and store rough approximations in a spatial data cube
  - accuracy trade-off
- Selective computation: only materialize those which will be accessed frequently
  - a reasonable choice

Spatial Mining Tasks
- Spatial correlation
- Spatial regression
- Spatial association
- Spatial co-location
- Spatial classification
- Spatial clustering
- Spatial outlier detection

Spatial Auto-correlation (SA)
- First Law of Geography
  - "All things are related, but nearby things are more related than distant things" (Tobler [1970])
- Examples
  - People with similar backgrounds tend to live in the same area
  - Economies of nearby regions tend to be similar
  - Changes in temperature occur gradually over space (and time)

Spatial Correlation Measures
- Spatial autocorrelation
  - Measures
    - distance-based (e.g., K-function)
    - neighbor-based (e.g., Moran's I)
- Spatial cross-correlation
  - Measures
    - distance-based, e.g., cross K-function

Moran's I Measure
- Definition
  - $z = \{x_1 - \bar{x}, \ldots, x_n - \bar{x}\}$
  - $x_i$: data values; $\bar{x}$: mean of $x$; $n$: number of data values
  - $W$ is the row-normalized contiguity matrix
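
The defining formula itself did not survive in this transcript. The sketch below assumes the commonly used matrix form $I = zWz^{T} / zz^{T}$ with the symbols defined above; that form, the rook-contiguity helper, and the test grids are assumptions made for illustration, not content recovered from the slide.

```python
def morans_i(values, neighbors):
    """Moran's I with a row-normalized contiguity matrix.

    `neighbors[i]` lists the indices contiguous to cell i; each neighbor of i
    gets weight 1 / len(neighbors[i]). Assumed form: I = z W z^T / z z^T.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    numerator = sum(z[i] * sum(z[j] for j in neighbors[i]) / len(neighbors[i])
                    for i in range(n) if neighbors[i])
    denominator = sum(zi * zi for zi in z)
    return numerator / denominator

def rook_neighbors(rows, cols):
    """Contiguity lists for a grid where cells sharing an edge are neighbors."""
    result = []
    for r in range(rows):
        for c in range(cols):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cell.append(rr * cols + cc)
            result.append(cell)
    return result

grid = rook_neighbors(4, 4)
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
two_halves = [0 if c < 2 else 1 for r in range(4) for c in range(4)]

print(morans_i(checkerboard, grid))  # alternating pattern: -1.0
print(morans_i(two_halves, grid))    # clustered pattern: positive
```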

Moran's I Measure
- Ranges between -1 and +1
  - higher positive value ⇒ high SA, clustered, attract
  - lower negative value ⇒ interspersed, de-clustered, repel
- e.g., spatial randomness ⇒ MI = 0
- e.g., distribution of vegetation durability ⇒ MI = 0.7
- e.g., checker board ⇒ MI = -1

K-Function
- K-function definition:
  - Test against randomness for point pattern
  - $K(h) = \lambda^{-1} E[\text{number of events within distance } h \text{ of an arbitrary event}]$
    - $\lambda$ is the intensity of events
  - Models departure from randomness in a wide range of scales

K-Function: Example
- For Poisson complete spatial randomness (CSR): $K(h) = \pi h^2$
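
A naive estimate of K(h) can be computed directly from the definition by replacing the expectation with an average over the observed events and the intensity with the number of events per unit area. The sketch below does this with no edge correction, which practical estimators normally add; the function name and sample points are hypothetical.

```python
import math

def k_function(points, h, area):
    """Naive estimate of K(h): mean neighbor count within h, divided by intensity."""
    n = len(points)
    intensity = n / area  # lambda: events per unit area
    within = sum(1 for i, p in enumerate(points) for j, q in enumerate(points)
                 if i != j and math.hypot(p[0] - q[0], p[1] - q[1]) <= h)
    mean_count = within / n  # average number of other events within h
    return mean_count / intensity

# Four hypothetical events observed in a study region of area 4
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]

print(k_function(corners, 1.0, area=4.0))
print(math.pi * 1.0 ** 2)  # reference value under CSR for h = 1
```

Comparing the estimate with the CSR reference over a range of h indicates clustering (estimate above the reference) or regularity (estimate below it).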

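Cross-Correlation
- Cross K-function definition
  - $K_{ij}(h) = \lambda_j^{-1} E[\text{number of type } j \text{ events within distance } h \text{ of a randomly chosen type } i \text{ event}]$, where $\lambda_j$ is the intensity of type $j$ events
  - Cross K-function of some pair of spatial feature types
- Example
  - Which pairs are frequently co-located?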

Cross-Correlation: Example

Location Prediction
- Given
  - n spatial objects:
  - d different features / maps:
  - a dependent (target) class:
  - a family of function mappings:
- Find
  - a classifier predicting the location of objects of the given classes which maximizes classification accuracy

Location Prediction: Example
- Known nest locations
- Task: predict other nest locations using the maps below

Location Prediction: Methods
- Prediction
  - Continuous: trend, e.g., regression
    - Location aware: spatial autoregressive model (SAR)
  - Discrete: classification, e.g., Bayesian classifier
    - Location aware: Markov random fields (MRF)

Spatial Contextual Model: SAR
- Spatial Autoregressive Model (SAR)
  - y = ρWy + Xβ + ε
  - Assume that dependent values y are related to each other: yi = f(yj) for i ≠ j
  - Directly model spatial autocorrelation using W
- Geographically Weighted Regression (GWR)
  - A method of analyzing spatially varying relationships
    - parameter estimates vary locally
  - Models with Gaussian, logistic or Poisson forms can be fitted
  - Example: y = Xβ′ + ε′, where β′ and ε′ are location dependent
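
For the SAR model above, y appears on both sides of the equation. A short rearrangement, sketched here under the assumption that W is row-normalized and |ρ| < 1 so that the inverse exists, makes the dependence explicit:

```latex
\begin{aligned}
y &= \rho W y + X\beta + \varepsilon \\
(I - \rho W)\, y &= X\beta + \varepsilon \\
y &= (I - \rho W)^{-1} (X\beta + \varepsilon)
\end{aligned}
```

With ρ = 0 the last line reduces to the classical regression y = Xβ + ε, so ρ measures how strongly neighboring values, as encoded by W, influence each other.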

Spatial Contextual Model: MRF
- Markov Random Fields Gaussian Mixture Model (MRF-GMM)
  - Undirected graph to represent the interdependency relationship of random variables
  - A variable depends only on its neighbors
  - Independent of all other variables
  - $f_C(s_i)$ is independent of $f_C(s_j)$ if $W(s_i, s_j) = 0$
  - Predict $f_C(s_i)$, given feature value $X$ and neighborhood class label $C_N$
    - Assume $\Pr(c_i)$, $\Pr(X, C_N \mid c_i)$ and $\Pr(X, C_N)$ are mixtures of Gaussian distributions

Spatial Association Rules
- A spatial association rule is an association rule containing at least one spatial neighborhood relation
- Spatial association rule: A ⇒ B [s%, c%]
  - A and B are sets of spatial or non-spatial predicates
    - Topological relations: intersects, overlaps, disjoint, etc.
    - Spatial orientations: left_of, west_of, under, etc.
    - Distance information: close_to, within_distance, etc.
  - s% is the support and c% is the confidence of the rule

Spatial Association Rules Mining Methods

- Examples
  - is_a(x, large_town) ^ intersect(x, highway) => adjacent_to(x, water) [7%, 85%]
- Two approaches
  - Transaction-based approach
  - Transaction-free approach

Transaction-Based Approach
- Determine object type of interest (target object type)
- Transform spatial database into set of transactions
  - Transaction = one target object plus set of neighboring objects
  - neighborhood definition is crucial
- Apply (modified) algorithm for mining frequent itemsets
  - e.g., Apriori algorithm
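
The steps above can be sketched as follows. The object list, the distance-based neighborhood, and the function names are hypothetical; for brevity, the itemset counting is a brute-force enumeration rather than Apriori, which would prune candidates level by level.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical spatial objects: (type, x, y)
objects = [("town", 0, 0), ("road", 1, 0), ("water", 0, 1),
           ("town", 10, 10), ("road", 10, 11),
           ("town", 20, 20), ("road", 21, 20), ("water", 20, 21)]

def to_transactions(objs, target_type, radius):
    """One transaction per target object: the types of its non-target neighbors."""
    txns = []
    for t, x, y in objs:
        if t != target_type:
            continue
        near = {u for u, ux, uy in objs
                if u != target_type and math.hypot(ux - x, uy - y) <= radius}
        txns.append(frozenset(near))
    return txns

def frequent_itemsets(txns, min_support):
    """Support of every itemset that occurs in at least min_support of transactions."""
    counts = Counter()
    for txn in txns:
        for k in range(1, len(txn) + 1):
            for items in combinations(sorted(txn), k):
                counts[items] += 1
    n = len(txns)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}

transactions = to_transactions(objects, "town", radius=1.5)
print(frequent_itemsets(transactions, min_support=0.6))
```

Changing `radius` changes the transactions and therefore the rules found, which is why the neighborhood definition is crucial.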

Progressive Refinement Mining of Spatial Association Rules

- Hierarchy of spatial neighborhood relations
  - "g_close_to" may be specialized to near_by, touch, intersect, contain, etc.
- Basic idea: if two objects do not fulfill a rough relationship (such as intersect) they cannot fulfill a refined relationship (such as meet)
- Two-step procedure for spatial neighborhood relations
  - Step 1: rough spatial computation (as a filter)
    - Using MBR or R-tree for rough estimation
  - Step 2: detailed spatial algorithm (as refinement)
    - Is very expensive (e.g. intersect test)
    - Apply only to those objects which have passed the rough spatial association test (no less than min_support)
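
The geometric part of the two-step procedure can be sketched with circles standing in for arbitrary objects: the MBR test is the cheap filter and the exact circle test is the refinement. The sketch leaves out the min_support check, and the objects and names are hypothetical.

```python
import math

# Hypothetical objects: name -> circle (center x, center y, radius)
circles = {"A": (0.0, 0.0, 1.0), "B": (1.8, 1.8, 1.0),
           "C": (1.5, 0.0, 1.0), "D": (10.0, 10.0, 1.0)}

def circle_mbr(c):
    """Minimum bounding rectangle of a circle as (xmin, ymin, xmax, ymax)."""
    x, y, r = c
    return (x - r, y - r, x + r, y + r)

def mbrs_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def exact_intersect(c1, c2):
    """Exact test: circles intersect if centers are within the sum of radii."""
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1]) <= c1[2] + c2[2]

def intersecting_pairs(objs):
    names = sorted(objs)
    # Step 1 (filter): keep only pairs whose MBRs overlap
    candidates = [(p, q) for i, p in enumerate(names) for q in names[i + 1:]
                  if mbrs_overlap(circle_mbr(objs[p]), circle_mbr(objs[q]))]
    # Step 2 (refinement): run the exact test on the surviving pairs only
    results = [(p, q) for p, q in candidates
               if exact_intersect(objs[p], objs[q])]
    return candidates, results

candidates, results = intersecting_pairs(circles)
print(candidates)
print(results)
```

The pair (A, B) passes the filter but fails the refinement, while every pair involving D is discarded by the filter without any exact computation.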

Example

Transaction-Free Approach
- Transaction-based approach requires target object type, which restricts set of rules discovered
- Alternative approach: based on cliques of neighboring objects
  - R-proximity neighborhoods
- Database: set of spatial features of different types (e.g., A, B, C)
- Example of R-proximity neighborhoods

Transaction-Free Approach
- Co-location: set of feature types, e.g., {A,C} or {A,B,C}
- Participation ratio of fi in c: proportion of instances of feature (type) fi participating in co-location c
  - participation ratio of A in {A,B} = 2/3 = 0.67
  - participation ratio of B in {A,B} = 2/2 = 1.0
- Participation index: minimum participation ratio over all features fi in a co-location c
  - participation index of {A,B} = min{0.67, 1.0} = 0.67
  - Participation index is an upper bound of the cross-K function (spatial statistics)
  - Participation index is monotonically decreasing with increasing co-location size
- Goal: find all co-locations whose participation index meets a minimum threshold
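
The figure behind the numbers above is not part of this transcript, so the sketch below uses made-up coordinates chosen to give the same counts (three instances of A, two of B, two of each participating). It handles only a pair of feature types; larger co-locations require finding cliques of neighboring instances.

```python
import math

# Hypothetical instances of two feature types (coordinates are made up)
instances = {"A": [(0, 0), (5, 0), (20, 20)], "B": [(1, 0), (5, 1)]}

def participation(inst, f1, f2, radius):
    """Participation ratios of f1 and f2 in {f1, f2} and the participation index."""
    pairs = [(i, j)
             for i, p in enumerate(inst[f1]) for j, q in enumerate(inst[f2])
             if math.hypot(p[0] - q[0], p[1] - q[1]) <= radius]
    ratio1 = len({i for i, _ in pairs}) / len(inst[f1])
    ratio2 = len({j for _, j in pairs}) / len(inst[f2])
    return ratio1, ratio2, min(ratio1, ratio2)

ratio_a, ratio_b, index = participation(instances, "A", "B", radius=2)
print(round(ratio_a, 2), ratio_b, round(index, 2))
```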

The Method
- Alternatives for generating co-location candidates
  - combinatorial join, geometric join, hybrid approach
- Pruning of candidates using the participation index
- Multi-resolution pruning
  - Start with coarse resolution neighborhood definition
  - Prune if coarse resolution participation falls below threshold
    - anti-monotone because of spatial auto-correlation
  - Decrease resolution of neighborhood definition

Spatial Cluster Analysis

- Mining clusters: k-means, k-medoids, hierarchical, density-based, etc.
- Analysis of distinct features of the clusters

Constraints-Based Clustering
- Constraints on individual objects
  - Simple selection of relevant objects before clustering
- Clustering parameters as constraints
  - K-means, density-based: radius, min-# of points
- Constraints specified on clusters using SQL aggregates
  - Sum of the profits in each cluster > $1 million
- Constraints imposed by physical obstacles
  - Clustering with obstructed distance

Constraint-Based Clustering: Planning ATM Locations

[Figure: spatial data with obstacles (mountain, river, bridge) and the clusters C1 to C4 obtained by clustering without taking obstacles into consideration]

Mining Spatiotemporal Data

- Spatiotemporal data
  - Data has spatial extensions and changes with time
  - Ex: forest fires, moving objects, hurricanes and earthquakes
- Automatic anomaly detection in massive moving objects
  - Moving objects are ubiquitous: GPS, radar, etc.
  - Ex: maritime vessel surveillance
  - Problem: automatic anomaly detection

Analysis: Mining Anomaly in Moving Objects

- Raw analysis of collected data does not fully convey "anomaly" information
- More effective analysis relies on higher semantic features
- Examples:
  - A speed boat moving quickly in open water
  - A fishing boat moving slowly into the docks
  - A yacht circling slowly around a landmark during night hours

Framework: Motif-Based Feature Analysis

Motif-based representation
    A motif is a prototypical movement pattern
    View a movement path as a sequence of motif expressions
Motif-oriented feature space
    Automated motif feature extraction
    Semantic-level features
Classification
    Anomaly detection via classification
    High-dimensional classifier

Movement Motifs

Prototypical movement of an object
    Right-turn, U-turn
Can be either defined by an expert or discovered automatically from data
    Defined in our framework
Extracted in movement paths
Path becomes a set of motif expressions

Motif Expression Attributes

Each motif expression has attributes (e.g., speed, location, size)
Attributes express how a motif was expressed
Conveys semantic information useful for classification
    A tight circle at 30 mph near landmark Y
    A tight circle at 10 mph in location X

Motif-Oriented Feature Space

Attributes describe how motifs are expressed
Let there be A attributes; each path is a set of (A+1)-tuples
    $\{(m_i, v_1, v_2, \ldots, v_A), (m_j, v_1, v_2, \ldots, v_A)\}$
Naïve feature space construction
    1. Let each distinct $(m_j, v_1, v_2, \ldots, v_A)$ be a feature
    2. If a path exhibits a particular motif expression, its value is 1. Otherwise, its value is 0.
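
The naïve construction can be sketched as follows. This is a minimal illustration, assuming each path is already given as a set of motif-expression tuples; the function name, motif names, and attribute values are hypothetical.

```python
def build_naive_features(paths):
    """Map each path (a set of motif-expression tuples) to a 0/1 vector.

    Every distinct tuple (motif, v1, ..., vA) seen in any path becomes one feature.
    """
    vocabulary = sorted(set().union(*paths))
    index = {expr: i for i, expr in enumerate(vocabulary)}
    vectors = []
    for path in paths:
        row = [0] * len(vocabulary)
        for expr in path:
            row[index[expr]] = 1  # the path exhibits this motif expression
        vectors.append(row)
    return vocabulary, vectors


# Two hypothetical paths with A = 2 attributes (speed in mph, location label).
paths = [
    {("u_turn", 10, "X"), ("right_turn", 30, "Y")},
    {("u_turn", 10, "X")},
]
vocabulary, vectors = build_naive_features(paths)
```

With these inputs the vocabulary holds two features and the vectors are [1, 1] and [0, 1]. Note that changing a single attribute value, for example the speed from 10 to 11, would create an entirely new feature.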

Analyzing Naïve Feature Space

Let there be M distinct motifs and V different possible values for each of the A attributes
Size of feature space is
    $M \cdot V^{A}$
V is usually very large due to high granularity of measurements
    E.g., seconds for time or meters for location
Modest values for A and M could lead to an extremely high-dimensional feature space
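
A quick calculation shows how fast the naïve feature space grows. The numbers below are purely illustrative assumptions (they do not come from the lecture): 10 motifs, 3 attributes, and 1,000 possible values per attribute.

```python
def naive_feature_space_size(num_motifs, values_per_attribute, num_attributes):
    """Number of distinct (motif, v1, ..., vA) tuples: M * V**A."""
    return num_motifs * values_per_attribute ** num_attributes


# Illustrative values only: M = 10, V = 1000, A = 3.
size = naive_feature_space_size(10, 1000, 3)
```

Under these assumptions the space already has 10 * 1000^3 = 10,000,000,000 possible features, far more than any realistic number of training paths.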

More on Naïve Feature Space

High-dimensional feature space could make effective learning hard
More importantly, highly granular features make generalization impossible!
    $(m_j, v_1, \text{10:01am}, \ldots, v_A)$ vs $(m_j, v_1, \text{10:02am}, \ldots, v_A)$
    Learning on one feature has no effect on another feature
Intuition: should have features that describe general high-level concepts
    “Early Morning” instead of 2:03am, 2:04am, …
    “Near Location X” instead of “50m west of Location X”
Solution: clustering on the naïve feature space

Motif Feature Extraction

For each motif attribute, cluster values to form higher-level concepts
Frequency and distribution in learning data dictate the final clusters
Hierarchical micro-clustering
    Small clusters so concepts are not merged unnecessarily
    Hierarchy allows flexibility in describing objects
        For example: “afternoon” vs. “early afternoon” and “late afternoon”
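
The benefit of a hierarchy can be illustrated with a small sketch. The boundaries below are hand-picked assumptions for illustration only; in the framework described here the concepts would come from clustering the learning data, not from fixed bins. The names HIERARCHY and describe_hour are hypothetical.

```python
# (start_hour, end_hour, coarse concept, fine concept); boundaries are illustrative.
HIERARCHY = [
    (12, 15, "afternoon", "early afternoon"),
    (15, 18, "afternoon", "late afternoon"),
]


def describe_hour(hour):
    """Return (coarse, fine) concepts for an hour of day, or None if uncovered."""
    for start, end, coarse, fine in HIERARCHY:
        if start <= hour < end:
            return coarse, fine
    return None
```

An observation at 13:00 can then be described either as “afternoon” or, more precisely, as “early afternoon”, depending on which level of the hierarchy is useful for the classifier.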

Feature Clustering

Rough, fast micro-clustering method based on BIRCH (SIGMOD’96)
A micro-cluster is represented by a CF Vector: CF = (n, LS, SS), i.e., the number of points, their linear sum, and their squared sum
Centroid and radius can be calculated from CF Vector
CF Additive Theorem allows two CF Vectors to be combined quickly and losslessly
CF Tree is a hierarchy of CF Vectors
    A parent CF Vector holds information for all descendant CF Vectors
    Leaf CF Vector corresponds to a set of actual points
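
The CF Vector and its additivity can be sketched in a few lines. This is a minimal illustration, not the BIRCH implementation: the class and method names are hypothetical, SS is stored as a single scalar (the sum of squared norms), and the radius is taken as the root-mean-square distance of the points to the centroid.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    n: int      # number of points
    ls: list    # linear sum of the points, per dimension
    ss: float   # sum of squared norms of the points

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), sum(x * x for x in point))

    def merge(self, other):
        """CF additivity: component-wise sums describe the union of both clusters."""
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        c = self.centroid()
        variance = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


merged = CF.from_point((0.0, 0.0)).merge(CF.from_point((2.0, 0.0)))
```

For the two points (0, 0) and (2, 0), the merged vector is (2, [2, 0], 4), giving centroid (1, 0) and radius 1 without revisiting the original points.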

More on Feature Clustering

Build CF Tree from raw data, much like B-tree
Two parameters in clustering
    B: branching factor of CF Tree
    T: radius threshold of CF Vector
Parameters control how fine micro-clusters are constructed
Hierarchical agglomerative clustering on leaves of CF Tree
Entire process is efficient: O(N)
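
The role of the radius threshold T can be shown with a deliberately simplified sketch. It is a flat, one-dimensional stand-in and not the CF Tree itself: there is no tree, so the branching factor B is not used, and each value is compared against every existing micro-cluster. The function names are hypothetical.

```python
import math


def radius(n, ls, ss):
    """Radius of a 1-D micro-cluster summarized by (n, LS, SS)."""
    mean = ls / n
    return math.sqrt(max(ss / n - mean * mean, 0.0))


def micro_cluster(values, threshold):
    """Absorb each value into its nearest micro-cluster if the radius stays <= threshold."""
    clusters = []  # each entry is [n, ls, ss]
    for v in values:
        nearest = min(clusters, key=lambda c: abs(c[1] / c[0] - v), default=None)
        if nearest is not None:
            n, ls, ss = nearest[0] + 1, nearest[1] + v, nearest[2] + v * v
            if radius(n, ls, ss) <= threshold:
                nearest[:] = [n, ls, ss]  # update the summary in place
                continue
        clusters.append([1, v, v * v])    # otherwise start a new micro-cluster
    return [(c[0], c[1] / c[0]) for c in clusters]  # (size, centroid) pairs


summary = micro_cluster([1.0, 1.2, 0.8, 10.0, 10.4], threshold=0.5)
```

With T = 0.5 the five values collapse into two micro-clusters, one of size 3 centered near 1.0 and one of size 2 centered near 10.2. A smaller T yields more, finer micro-clusters.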

Extracted Feature Space

Leaf nodes in final clustering become the new features
More general than the original naïve feature space
Dimensionality could still be moderately high
Use Support Vector Machine for classification
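
One way to picture the extracted feature space is to replace each raw attribute value with the identifier of its nearest cluster. The sketch below assumes numeric attributes and a given list of cluster centroids per attribute; the centroids, names, and units are hypothetical, and the SVM step is not shown.

```python
def nearest_cluster(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - value))


def generalize(expression, centroids_per_attribute):
    """Map (motif, v1, ..., vA) to (motif, cluster id of v1, ..., cluster id of vA)."""
    motif, *values = expression
    ids = tuple(nearest_cluster(v, cents)
                for v, cents in zip(values, centroids_per_attribute))
    return (motif,) + ids


# Hypothetical centroids: speed in mph, and time in minutes after midnight.
centroids = [[10.0, 30.0], [120.0, 600.0]]
a = generalize(("circle", 11, 601), centroids)
b = generalize(("circle", 9, 602), centroids)
```

Both expressions map to the same feature ("circle", 0, 1), so what is learned from one path now carries over to the other, which the naïve feature space could not do.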

Experiments

Synthetic data
    Generated at motif-expression level
    Abnormal paths are injected with abnormal motif expressions
Classifiers
    SVM using naïve feature space
    SVM using extracted feature spaces of varying refinement levels

Experiment

Experiment (2)

Summary: Moving Object Anomaly Detection

Higher-level semantic analysis of moving objects yields better results
Automated feature extraction
Future work
    Automatic determination of the T parameter
    Better use of feature space hierarchy
    Other analysis, such as clustering and local outlier detection for anomaly detection
    Mining other knowledge for moving objects

Summary (1)

What’s Special About Spatial Data Mining?
    Input Data
    Statistical Foundation
    Output Patterns
    Computational Process

Summary (2)
