On Discovering Moving Clusters in Spatio-temporal Data Panos Kalnis National University of Singapore...

Post on 20-Dec-2015


On Discovering Moving Clusters in Spatio-temporal Data

Panos Kalnis, National University of Singapore

Nikos Mamoulis, University of Hong Kong

Spiridon Bakiras, Hong Kong University of Science and Technology

What is a Moving Cluster?

Dense clusters of objects that move similarly for a long time period
Not necessarily the same objects during the lifetime of the cluster
Examples:
  Migrating animals
  Convoy of cars
  Military applications
Solutions: Efficient exact and approximate algorithms

Problem Formulation

Example: Moving cluster

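$$\frac{|c_i \cap c_{i+1}|}{|c_i \cup c_{i+1}|} \geq \theta$$

$\theta = 0.5$, $c_1 c_2 c_3$:

$$\frac{|c_1 \cap c_2|}{|c_1 \cup c_2|} = \frac{3}{6}, \qquad \frac{|c_2 \cap c_3|}{|c_2 \cup c_3|} = \frac{4}{5}$$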
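The moving-cluster condition (consecutive snapshot clusters must share at least a fraction θ of their combined objects) can be sketched in a few lines of Python. This is only an illustration: the function names and the object ids are hypothetical, and the ids were picked so that the two ratios come out as 3/6 and 4/5.

```python
def jaccard(a, b):
    """Share of common objects: |a ∩ b| / |a ∪ b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_moving_cluster(clusters, theta):
    """True if every pair of consecutive snapshot clusters reaches theta."""
    return all(jaccard(c, d) >= theta for c, d in zip(clusters, clusters[1:]))


# Hypothetical object ids (not taken from the slides).
c1 = {1, 2, 3, 4}
c2 = {2, 3, 4, 5, 6}
c3 = {2, 3, 4, 5}

print(jaccard(c1, c2), jaccard(c2, c3))
print(is_moving_cluster([c1, c2, c3], theta=0.5))
```

Representing each snapshot cluster as a set of object ids keeps the test independent of the objects' coordinates: only membership matters once the per-timeslice clustering is done.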

Related Work (Static)

Partition-based clustering (k-medoids)
Hierarchical clustering (BIRCH, CURE)
Density-based clustering (DBSCAN)

[Figure: DBSCAN example with radius ε and MinPts=3]

Related Work (Moving Objects)

Grouping trajectories [Vlachos et al., ICDE 02]
  Trajectory cluster: Constant set of objects through its lifetime
  Only similar movement; no space proximity
Dense areas over time [Hadjieleftheriou et al., SSTD 03]
  Static dense regions
  No common objects between regions in sequence
Incremental DBSCAN/OPTICS [Ester et al., VLDB 98]
  Only a small percentage of objects moves
Maintaining Data Bubbles [Nassar et al., SIGMOD 04]
  Redistributes updated objects in existing bubbles

MC1: The Straightforward Approach

G: set of moving clusters
Apply clustering to next timeslice Si
Expand moving clusters in G
Add new moving clusters to G
Report ending clusters
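The MC1 loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-timeslice clustering (for example DBSCAN) has already been run, so each timeslice arrives as a list of sets of object ids, and the names `mc1`, `active` and `finished` are hypothetical.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mc1(timeslices, theta):
    """timeslices: list of lists of snapshot clusters (sets of object ids).
    Returns moving clusters as lists of consecutive snapshot clusters."""
    active = []    # G: moving clusters that may still be extended
    finished = []  # moving clusters that could not be extended (reported)
    for snapshot in timeslices:
        next_active = []
        extended = [False] * len(active)
        for c in snapshot:
            assigned = False
            # Compare c against every current moving cluster.
            for k, g in enumerate(active):
                if jaccard(g[-1], c) >= theta:
                    next_active.append(g + [c])
                    extended[k] = True
                    assigned = True
            if not assigned:
                next_active.append([c])  # c starts a new moving cluster
        finished.extend(g for k, g in enumerate(active) if not extended[k])
        active = next_active
    return finished + active


slices = [[{1, 2, 3, 4}], [{2, 3, 4, 5, 6}], [{2, 3, 4, 5}, {10, 11}]]
result = mc1(slices, theta=0.5)
print(result)
```

The nested loop over `snapshot` and `active` is the all-pairs comparison between consecutive timeslices. In this sketch a cluster seen in only one timeslice is also returned; filtering by a minimum length would be a separate design choice.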

Hash-based DBSCAN

Memory: 10M objects with 1GB RAM

[Cost expression garbled in extraction; surviving symbols: O, |S_i|, |g|, squared terms]

MC1 is inefficient!

1. Checks all possible combinations of clusters in consecutive timeslices

2. Performs clustering for every timeslice

MC2: Minimizing Redundant Checks

Clustering in every timeslice

Select a random object in c1

Search the object in S2

Repeat for remaining objects

Max: (1-θ)|ci| objects

c1c2 is a moving cluster
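The probing idea above can be sketched as follows. It relies on one observation: a successor that reaches the threshold θ must contain at least θ|ci| of the objects of ci, so at most (1-θ)|ci| objects of ci can lie outside it, and one probe beyond that bound is guaranteed to land inside. The sketch assumes the next timeslice is already clustered; the names `find_successors`, `label_of` and `members` are hypothetical.

```python
import math
import random


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_successors(c_prev, label_of, members, theta, rng=random):
    """c_prev: set of object ids of a cluster in the earlier timeslice.
    label_of: object id -> cluster label in the next timeslice.
    members: cluster label -> set of object ids in the next timeslice."""
    n = len(c_prev)
    # At most n - ceil(theta * n) objects can miss a valid successor;
    # the small tolerance guards against floating-point round-off.
    max_probes = n - math.ceil(theta * n - 1e-9) + 1
    probes = rng.sample(sorted(c_prev), min(n, max_probes))
    tested, found = set(), []
    for obj in probes:
        label = label_of.get(obj)  # None: object is noise or has disappeared
        if label is None or label in tested:
            continue
        tested.add(label)  # each candidate cluster is checked only once
        if jaccard(c_prev, members[label]) >= theta:
            found.append(label)
    return found


c_prev = {1, 2, 3, 4, 5, 6}
members = {"a": {1, 2, 3, 4}, "b": {5, 6, 7}}
label_of = {o: lab for lab, objs in members.items() for o in objs}
print(find_successors(c_prev, label_of, members, theta=0.5))
```

Only the clusters that actually contain a probed object are compared against `c_prev`, which is what removes the all-pairs check. The function returns a list because more than one successor may qualify when θ is small.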

Ambiguity Cases: θ<0.5

1/3

{c0c1, c2} or {c0c2, c1}

MC3: Approximate Moving Clusters

Intuition: Many clusters will remain the same even if objects move
Avoid performing clustering in every timeslice
For an object o:
  If o belongs to cluster c in timeslice Si
  Assume that o also belongs to c in the next timeslice (notice: objects may have moved)

Refine clusters

Hash new clusters in a grid
Legal cluster:
  Does not meet/intersect with other clusters
  It is connected (cells meet)
Objects in legal clusters are not considered further
For the rest of the objects, perform clustering
Possible inaccuracies!!!
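The legality test can be sketched as follows, under assumptions that the slides do not spell out: the grid is uniform with a hypothetical cell side `size`, two cells "meet" when they are identical or adjacent (including diagonally), and every object carries the cluster label it inherited from the previous timeslice.

```python
def cell_of(point, size):
    return (int(point[0] // size), int(point[1] // size))


def around(cell):
    """The cell itself plus its eight neighbours."""
    x, y = cell
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def legal_clusters(positions, labels, size):
    cells = {}   # cluster label -> set of grid cells it occupies
    owners = {}  # grid cell -> set of cluster labels present in it
    for obj, label in labels.items():
        cell = cell_of(positions[obj], size)
        cells.setdefault(label, set()).add(cell)
        owners.setdefault(cell, set()).add(label)

    legal = set()
    for label, own in cells.items():
        # Does any cell of this cluster share or touch a cell of another one?
        touches_other = any(
            owners.get(n, set()) - {label} for c in own for n in around(c)
        )
        # Flood fill over the cluster's own cells to test connectivity.
        start = next(iter(own))
        seen, stack = {start}, [start]
        while stack:
            for n in around(stack.pop()):
                if n in own and n not in seen:
                    seen.add(n)
                    stack.append(n)
        if not touches_other and seen == own:
            legal.add(label)
    return legal


positions = {1: (0.5, 0.5), 2: (1.5, 0.5), 3: (10.5, 10.5), 4: (2.5, 0.5),
             5: (20.5, 20.5), 6: (25.5, 25.5), 7: (10.7, 11.2)}
labels = {1: "A", 2: "A", 3: "B", 4: "C", 5: "D", 6: "D", 7: "B"}
print(legal_clusters(positions, labels, size=1.0))
```

In this toy input A and C touch, D has split into two distant parts, and only B is isolated and connected. Objects of the clusters that fail the test would then be handed to the regular clustering step.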

Minimize Error

Perform exact clustering to absorb (may not eliminate) the accumulated error

Period for exact clustering: Grows linearly, drops exponentially

Exact clustering: If more than α|G| clusters have been added/removed
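One possible reading of this schedule is sketched below. The slides give only the shape of the rule, so the concrete choices here are assumptions: the period grows by one timeslice while the change stays within α|G|, and it is halved (never below one) when the change exceeds α|G|, at which point an exact clustering is requested. The name `update_period` and its parameters are hypothetical.

```python
def update_period(period, changed, num_clusters, alpha, step=1, factor=2):
    """Return (new_period, run_exact_now).

    period: current number of timeslices between exact clusterings
    changed: clusters added or removed since the last exact clustering
    num_clusters: |G|, the current number of moving clusters
    """
    if changed > alpha * num_clusters:
        # Too much accumulated change: shrink the period multiplicatively.
        return max(1, period // factor), True
    # Stable: lengthen the period additively.
    return period + step, False


print(update_period(8, changed=3, num_clusters=20, alpha=0.1))
print(update_period(4, changed=1, num_clusters=20, alpha=0.1))
```

Growing slowly and backing off quickly keeps the approximate phase short whenever the data becomes volatile, at the cost of more exact clusterings.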

Experimental Evaluation

10K-50K objects per timeslice
50-100 timeslices, up to 5M objects
Linux, C++, 1.3GHz CPU, 1.2GB RAM
Generator: Clusters move/rotate, objects appear/disappear

$$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
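The F-measure is the harmonic mean of precision and recall. A minimal sketch of computing it for an approximate result against an exact one follows; it assumes, for illustration only, that a reported moving cluster counts as correct when it is identical to one in the exact result. The slides do not state the matching rule, and the names below are hypothetical.

```python
def f_measure(exact, approx):
    """exact, approx: sets of hashable moving-cluster descriptions."""
    correct = len(exact & approx)
    if correct == 0:
        return 0.0
    precision = correct / len(approx)  # share of reported clusters that are right
    recall = correct / len(exact)      # share of true clusters that were reported
    return 2 * precision * recall / (precision + recall)


exact = {"a", "b", "c", "d"}
approx = {"a", "b", "c", "x"}
print(f_measure(exact, approx))
```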

Varying data size (10K-50K per timeslice)

Avg: 87%

θ=0.9, α=0.1
Larger dataset: larger clusters, more interactions

Varying number of clusters (100-800 per timeslice)

5M objects, θ=0.9, α=0.1
Many clusters: Reaches error threshold fast

[Chart labels: 96%, 87%, 73%]

Varying α

5M objects, θ=0.9, 800 clusters
α small: may not recover!!!

Varying α for different agilities

Low agility: Fewer errors faster

MC3 for varying θ

5M objects, α=0.1, 800 clusters
θ large: incorrect clusters are pruned for not satisfying the θ criterion

Conclusions

Moving clusters
  Objects may move/change
  Exact and approximate solutions
Future work
  Automatic setting of parameter α
  Better error estimation
  Constraints (e.g., moving cluster must span at least k timeslices)

Questions?