Page 1: April 8, 2007

Trajectory Outlier Detection: A Partition-and-Detect Framework

April 8, 2007

Jae-Gil Lee, Jiawei Han, and Xiaolei Li

Department of Computer Science, University of Illinois at Urbana-Champaign

ICDE 2008

Page 2: April 8, 2007

Table of Contents

Motivation

Partition-and-Detect Framework

Outlier Detection Algorithm: TRAOD
• Partitioning Phase (Simple)
• Detection Phase
• Partitioning Phase (Enhanced)

Performance Evaluation

Related Work

Conclusions

Page 3: April 8, 2007

Outlier Detection

Definition: the process of detecting a data object that is grossly different from or inconsistent with the remaining set of data

Applications: the detection of credit card fraud, the monitoring of criminal activities in electronic commerce, etc.

Algorithms: distribution-based, distance-based, density-based, and deviation-based

Target data: previous research has mainly dealt with outlier detection of point data

Page 4: April 8, 2007

Analysis on Trajectory Data

Tremendous amounts of trajectory data of moving objects are being collected
• Example: vehicle positioning data, hurricane tracking data, animal movement data, etc.

Trajectory outlier detection has many important, real-world applications
• Detection of suspicious persons in video surveillance
• Analysis of unusual air-mass trajectories in meteorology
• …

A powerful outlier detection algorithm for trajectories is needed urgently

Page 5: April 8, 2007

Limitations of Existing Algorithms

Knorr et al. [5] have presented one of the very few attempts
• Define the distance between two whole trajectories using the summary information (e.g., the coordinates of the starting and ending points)
• Apply a distance-based approach to detection of trajectory outliers

Existing algorithms might not be able to detect outlying portions of trajectories
• Example: TR3 is not detected as an outlier since its overall behavior is similar to those of neighboring trajectories

[Figure: trajectories TR1 to TR5, with a portion of TR3 marked as an outlying sub-trajectory]

Page 6: April 8, 2007

Discovery of Outlying Sub-Trajectories

Discovery of outlying sub-trajectories is very useful in the real world
• Example: Sudden changes in hurricane’s path [10]

We propose the partition-and-detect framework

[Figure: hurricane tracks, with the usual trajectories and a sudden change labeled]

Page 7: April 8, 2007

The Partition-and-Detect Framework

Consists of two phases: partitioning and detection

[Figure: (1) Partition: a set of trajectories (TR1 to TR5) becomes a set of trajectory partitions. (2) Detect: outlying trajectory partitions are identified, and TR3 is reported as an outlier]

Note: A set of outlying trajectory partitions indicates an outlying sub-trajectory

Page 8: April 8, 2007

The Problem Statement

Given a set of trajectories I = {TR1, …, TRn}, our algorithm generates a set of outliers O = {O1, …, Om} with outlying trajectory partitions for each Oi

Necessary definitions:
• A trajectory is a sequence of multi-dimensional points, which is denoted as $TR_i = p_1 p_2 p_3 \ldots p_j \ldots p_{len_i}$; a trajectory partition (t-partition for short) is a line segment $p_i p_j$ (i < j), where $p_i$ and $p_j$ are the points chosen from the same trajectory

• A t-partition is outlying if it does not have a sufficient number of similar neighbors

• A trajectory is an outlier if it contains a non-negligible amount of outlying t-partitions

Page 9: April 8, 2007

The Outlier Detection Algorithm: TRAOD

Based on the partition-and-detect framework

Algorithm TRAOD (TRAjectory Outlier Detection)
Input: A set of trajectories I = {TR1, …, TRn}
Output: A set of outliers O = {O1, …, Om} with outlying t-partitions for each Oi
Algorithm:
/* Partitioning Phase */
for each TR ∈ I do
    Partition TR into a set L of line segments;
    Accumulate L into a set D;
/* Detection Phase */
for each P ∈ D do
    Mark P if it is an outlying t-partition;
for each TR ∈ I do
    Output TR if it is an outlier;
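
The two phases of TRAOD can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `traod` and the three callbacks (`partition`, `is_outlying`, `is_outlier`) are hypothetical names introduced here for illustration, and the slides do not prescribe these signatures.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
# A t-partition tagged with the index of the trajectory it came from.
Tagged = Tuple[int, Segment]


def traod(
    trajectories: Sequence[Sequence[Point]],
    partition: Callable[[Sequence[Point]], List[Segment]],
    is_outlying: Callable[[Tagged, List[Tagged]], bool],
    is_outlier: Callable[[List[Tagged], List[Tagged]], bool],
) -> List[Tuple[int, List[Tagged]]]:
    # Partitioning phase: split each trajectory and accumulate all segments.
    all_parts: List[Tagged] = []
    for idx, tr in enumerate(trajectories):
        all_parts.extend((idx, seg) for seg in partition(tr))

    # Detection phase, step 1: mark the outlying t-partitions.
    marked = [p for p in all_parts if is_outlying(p, all_parts)]

    # Detection phase, step 2: report each trajectory judged to be an
    # outlier, together with its outlying t-partitions.
    outliers: List[Tuple[int, List[Tagged]]] = []
    for idx in range(len(trajectories)):
        own = [p for p in all_parts if p[0] == idx]
        own_marked = [p for p in marked if p[0] == idx]
        if is_outlier(own, own_marked):
            outliers.append((idx, own_marked))
    return outliers
```

Passing the three steps in as callbacks keeps the skeleton independent of how partitioning and the two outlier tests are defined, so a simple or a two-level partitioning strategy could be swapped in without touching the loop structure.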

Page 10: April 8, 2007

Where We Are Now

/* Partitioning Phase */
for each TR ∈ I do
    Partition TR into a set L of line segments by a simple strategy or by a two-level partitioning strategy;
    Accumulate L into a set D;
/* Detection Phase */
for each P ∈ D do
    Mark P if it is an outlying t-partition;
for each TR ∈ I do
    Output TR if it is an outlier;

Page 11: April 8, 2007

A Simple Partitioning Strategy (1/2)

Careless partitioning (especially into long t-partitions) could miss possible outliers
• Example: Even though TRout behaves differently from its neighboring trajectories, these differences are averaged out due to careless partitioning

[Figure: a trajectory TRout and its neighboring trajectories, with a t-partition of TRout marked]

Page 12: April 8, 2007

A Simple Partitioning Strategy (2/2)

A trajectory is partitioned at a base unit: the smallest meaningful unit of a trajectory in a given application
• Example: The base unit can be every single point

Pros: high detection quality in general

Cons: poor performance due to a large number of t-partitions (remedied by a two-level partitioning strategy)

[Figure: a trajectory TRout among its neighboring trajectories, with a t-partition and an outlying t-partition marked]
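
As an illustration of the simple strategy, the sketch below partitions a trajectory at every single point, which is the example base unit given on this slide. The function name `partition_at_base_unit` and the 2-D tuple representation of points are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def partition_at_base_unit(trajectory: Sequence[Point]) -> List[Segment]:
    """Return one t-partition per pair of consecutive points."""
    return [
        (trajectory[k], trajectory[k + 1])
        for k in range(len(trajectory) - 1)
    ]
```

A trajectory with n points yields n - 1 t-partitions under this choice of base unit, which is where the large number of t-partitions, and hence the performance cost, comes from.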

Page 13: April 8, 2007

Where We Are Now

/* Partitioning Phase */
for each TR ∈ I do
    Partition TR into a set L of line segments by a simple strategy or by a two-level partitioning strategy;
    Accumulate L into a set D;
/* Detection Phase */
for each P ∈ D do
    Mark P if it is an outlying t-partition;
for each TR ∈ I do
    Output TR if it is an outlier;

Page 14: April 8, 2007

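Distance between T-Partitions

The weighted sum of three components: the perpendicular distance ($d_\perp$), parallel distance ($d_\parallel$), and angle distance ($d_\theta$)
• Adapted from similarity measures used in the domain of pattern recognition [13]

$$d_\perp = \frac{l_{\perp 1}^2 + l_{\perp 2}^2}{l_{\perp 1} + l_{\perp 2}}, \qquad d_\parallel = \mathrm{MIN}(l_{\parallel 1}, l_{\parallel 2}), \qquad d_\theta = \|L_j\| \times \sin(\theta)$$

$$dist(L_i, L_j) = w_\perp \cdot d_\perp + w_\parallel \cdot d_\parallel + w_\theta \cdot d_\theta$$

[Figure: t-partitions $L_i = s_i e_i$ and $L_j = s_j e_j$, with the points $p_s$ and $p_e$ on $L_i$ and the lengths $l_{\perp 1}$, $l_{\perp 2}$, $l_{\parallel 1}$, $l_{\parallel 2}$, and $d_\theta$ marked]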
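
The weighted three-component distance can be sketched for 2-D line segments as below. Several details are assumptions not stated on the slide: the longer segment is treated as Li, ps and pe are taken to be the perpendicular projections of the endpoints of Lj onto the line through Li, each parallel length is the distance from a projection point to the nearer endpoint of Li, and both segments have non-zero length. The name `tpartition_distance` and the default weights of 1.0 are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _project(p: Point, s: Point, e: Point) -> Point:
    """Project p onto the infinite line through s and e (s != e)."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    t = ((p[0] - s[0]) * dx + (p[1] - s[1]) * dy) / (dx * dx + dy * dy)
    return (s[0] + t * dx, s[1] + t * dy)


def tpartition_distance(a: Segment, b: Segment, w_perp: float = 1.0,
                        w_par: float = 1.0, w_theta: float = 1.0) -> float:
    # Assumption: the longer segment plays the role of Li.
    (si, ei), (sj, ej) = (a, b) if math.dist(*a) >= math.dist(*b) else (b, a)
    ps, pe = _project(sj, si, ei), _project(ej, si, ei)

    # Perpendicular distance: (l1^2 + l2^2) / (l1 + l2).
    l_perp1, l_perp2 = math.dist(sj, ps), math.dist(ej, pe)
    total = l_perp1 + l_perp2
    d_perp = 0.0 if total == 0.0 else (l_perp1 ** 2 + l_perp2 ** 2) / total

    # Parallel distance: MIN of the two parallel lengths (assumed definition).
    l_par1 = min(math.dist(ps, si), math.dist(ps, ei))
    l_par2 = min(math.dist(pe, si), math.dist(pe, ei))
    d_par = min(l_par1, l_par2)

    # Angle distance: ||Lj|| * sin(theta). Only this form appears on the
    # slide; special handling of obtuse angles is not shown there.
    vi = (ei[0] - si[0], ei[1] - si[1])
    vj = (ej[0] - sj[0], ej[1] - sj[1])
    len_i, len_j = math.hypot(*vi), math.hypot(*vj)
    cos_t = (vi[0] * vj[0] + vi[1] * vj[1]) / (len_i * len_j)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding drift
    d_theta = len_j * math.sqrt(1.0 - cos_t * cos_t)

    return w_perp * d_perp + w_par * d_par + w_theta * d_theta
```

For two parallel horizontal segments, from (0, 0) to (10, 0) and from (2, 3) to (6, 3), the components are 3 (perpendicular), 2 (parallel), and 0 (angle), so the unweighted distance is 5.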

Page 15: April 8, 2007

Trajectories Outliers Based on Distance (1/2)

Def. (a close trajectory):

[Figure: two cases for a t-partition Li of TRi, a t-partition Lj of TRj, and the distance D: TRj is close to Li; TRj is not close to Li]

Def. (an outlying t-partition):

[Figure: two cases for a t-partition Li of TRi: Li is an outlying t-partition when the share of close trajectories is ≤ 1‒p; Li is not an outlying t-partition when the share of close trajectories is > 1‒p]
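
The outlying test for a t-partition can be sketched as below. The formal definition of a close trajectory is not recoverable from this transcript, so closeness is left as a caller-supplied predicate `is_close` (a hypothetical parameter). Reading the "≤ 1‒p" label as a fraction of the trajectories passed in is also an assumption, as is the name `is_outlying_tpartition`.

```python
from typing import Callable, Sequence, TypeVar

TR = TypeVar("TR")  # whatever type represents a trajectory
LP = TypeVar("LP")  # whatever type represents a t-partition


def is_outlying_tpartition(
    li: LP,
    trajectories: Sequence[TR],
    is_close: Callable[[TR, LP], bool],
    p: float,
) -> bool:
    """Return True if too few trajectories are close to li.

    The caller decides whether li's own trajectory is included in
    `trajectories`; this sketch counts whatever it is given.
    """
    close = sum(1 for tr in trajectories if is_close(tr, li))
    # Assumed reading: outlying when the close share is at most 1 - p.
    return close <= (1.0 - p) * len(trajectories)
```

With p = 0.95 and ten trajectories, the threshold is half a trajectory, so a t-partition is outlying under this reading only when no trajectory is close to it.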

Page 16: April 8, 2007

Trajectory Outliers Based on Distance (2/2)

Def. (an outlier):
• A trajectory TRi is an outlier if

$$\frac{\text{the sum of the lengths of outlying t-partitions in } TR_i}{\text{the sum of the lengths of all t-partitions in } TR_i} \ge F$$

[Figure: TRi is an outlier, since its ratio is at least F; TRj is not an outlier, since its ratio is less than F]
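
The length-ratio test for a whole trajectory can be sketched as follows. The function name `is_outlier_trajectory`, the parallel list of boolean flags, and the choice to return False for a trajectory of zero total length are assumptions made for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def is_outlier_trajectory(
    partitions: Sequence[Segment],
    outlying_flags: Sequence[bool],
    F: float,
) -> bool:
    """Compare the outlying length share of a trajectory against F."""
    total = sum(math.dist(s, e) for s, e in partitions)
    outlying = sum(
        math.dist(s, e)
        for (s, e), flag in zip(partitions, outlying_flags)
        if flag
    )
    # Assumption: a trajectory with zero total length is never an outlier.
    return total > 0.0 and outlying / total >= F
```

For t-partitions of lengths 5, 10, and 5 where only the first is outlying, the ratio is 5 / 20 = 0.25, so the trajectory is an outlier for F = 0.2 but not for F = 0.3.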

Page 17: April 8, 2007

Incorporation of Density (1/2)

The previous definition, as it is, has the local density problem
• A t-partition in a dense region tends to have a relatively larger number of close trajectories than one in a sparse region

T-Partitions in dense regions are favored!

Page 18: April 8, 2007

Incorporation of Density (2/2)

Def. (the density of a t-partition):
• The density of a t-partition Li is the number of t-partitions within the distance σ from Li, where σ is the standard deviation of pairwise distances between t-partitions

Def. (the adjusting coefficient of a t-partition):

$$adj(L_i) = \frac{\text{the average density of all t-partitions}}{\text{the density of the t-partition } L_i}$$

Adjustment by the density
• The number of close trajectories is multiplied by the adjusting coefficient adj(Li)
• adj(Li) < 1.0 in a dense region
• adj(Li) > 1.0 in a sparse region
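
The density and the adjusting coefficient can be sketched from a precomputed matrix of pairwise distances. Two choices here are assumptions rather than statements from the slides: the population standard deviation is used for σ, and each t-partition counts itself in its own density, which also keeps the density from being zero. The name `adjusting_coefficients` is hypothetical.

```python
import statistics
from typing import List, Sequence


def adjusting_coefficients(dist: Sequence[Sequence[float]]) -> List[float]:
    """Return adj(Li) for every t-partition, given pairwise distances.

    dist[a][b] is the distance between t-partitions a and b; at least
    two t-partitions are expected.
    """
    n = len(dist)
    pairwise = [dist[a][b] for a in range(n) for b in range(a + 1, n)]
    sigma = statistics.pstdev(pairwise)  # assumed: population std. dev.

    # Density: number of t-partitions within sigma (self included).
    density = [
        sum(1 for b in range(n) if dist[a][b] <= sigma) for a in range(n)
    ]
    average = sum(density) / n
    return [average / d for d in density]
```

For four t-partitions whose positions on a line are 0, 1, 2, and 100, the three clustered ones each have density 3 and the isolated one has density 1, giving coefficients below 1.0 in the dense region and 2.5 for the sparse one.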

Page 19: April 8, 2007

Guidelines for Parameter Values

Three parameters:
• D corresponds to similar, p to sufficient, and F to non-negligible

Remark: There is no universally correct parameter value even for the same data set and application

Our guideline: Resort to user feedback

[Figure: guideline chart with three questions (Want many outliers? Have many trajectories? Are trajectories short?) and the parameters D, p, and F on a smaller-to-larger scale; the values shown are 0.90 and 0.99, and 0.20 and 0.10]

Page 20: April 8, 2007

Where We Are Now

/* Partitioning Phase */
for each TR ∈ I do
    Partition TR into a set L of line segments by a simple strategy or by a two-level partitioning strategy;
    Accumulate L into a set D;
/* Detection Phase */
for each P ∈ D do
    Mark P if it is an outlying t-partition;
for each TR ∈ I do
    Output TR if it is an outlier;

Page 21: April 8, 2007

Two-Level Trajectory Partitioning

Objective
• Achieves much higher performance than the simple strategy
• Obtains the same result as that of the simple strategy; i.e., does not lose the quality of the result

Basic idea
1. Partition a trajectory in coarse granularity first
2. Partition a coarse t-partition in fine granularity only when necessary

Main benefit
• Narrows the search space that needs to be inspected in fine granularity
  → Many portions of trajectories can be pruned early on

Page 22: April 8, 2007

Intuition behind Two-Level Trajectory Partitioning

If the distance between coarse t-partitions is very large (or small), the distances between their fine t-partitions are also very large (or small)

[Figure: trajectories TRi and TRj shown under coarse-granularity partitioning and under fine-granularity partitioning]

Given two coarse t-partitions, can we know if the distance between any two fine t-partitions is greater than (or less than) D?

Page 23: April 8, 2007

Coarse-Granularity Partitioning*

Try to maximize two rival measures
• Preciseness: the difference between a trajectory and a set of its coarse t-partitions should be as small as possible
  − Required for making the bounds tight
• Conciseness: the number of coarse t-partitions should be as small as possible
  − Required for reducing the number of comparisons

Formulate this problem using the minimum description length (MDL) principle
• A good tradeoff between the two measures is found based on information theory

* Coarse-granularity partitioning is identical to that in our earlier work on trajectory clustering [15]

Page 24: April 8, 2007

Fine-Granularity Partitioning

Identify outlying coarse t-partitions by deriving the distance bounds between two coarse t-partitions Li and Lj
• Suppose li is a fine t-partition in Li and lj is that in Lj
  − lb(Li, Lj, f): the lower bound of f(li, lj) over all li in Li and lj in Lj
  − ub(Li, Lj, f): the upper bound of f(li, lj) over all li in Li and lj in Lj
• Derive the above bounds separately for $d_\perp$, $d_\parallel$, $d_\theta$ (Lemmas 1~3) and combine them (Lemma 4)
  − Lower bounds: $lb(L_i, L_j, d_\perp)$, $lb(L_i, L_j, d_\parallel)$, $lb(L_i, L_j, d_\theta)$
  − Upper bounds: $ub(L_i, L_j, d_\perp)$, $ub(L_i, L_j, d_\parallel)$, $ub(L_i, L_j, d_\theta)$

[Figure: a coarse t-partition Li of TRi and a coarse t-partition Lj of TRj]

Page 25: April 8, 2007

Derivation of the Distance Bounds

Lemma 1. Bounds for $d_\perp$

Lemma 2. Bounds for $d_\parallel$

Lemma 3. Bounds for $d_\theta$

Combine:

Lemma 4. Bounds for dist(Li, Lj)

$$lb(L_i, L_j, dist) = w_\perp \cdot lb(d_\perp) + w_\parallel \cdot lb(d_\parallel) + w_\theta \cdot lb(d_\theta)$$

$$ub(L_i, L_j, dist) = w_\perp \cdot ub(d_\perp) + w_\parallel \cdot ub(d_\parallel) + w_\theta \cdot ub(d_\theta)$$
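
The combination step of Lemma 4 amounts to a weighted sum of the per-component bounds, which can be sketched as below. The derivations of the individual bounds (Lemmas 1 to 3) are not given in this transcript, so they are taken as inputs. The name `combine_bounds` and the tuple layout are hypothetical, and the sketch assumes non-negative weights, since a negative weight would swap the roles of the lower and upper bounds.

```python
from typing import Sequence, Tuple


def combine_bounds(
    component_bounds: Sequence[Tuple[float, float]],
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> Tuple[float, float]:
    """Combine (lower, upper) bounds of the three distance components.

    component_bounds holds one (lb, ub) pair each for the perpendicular,
    parallel, and angle distances, in that order.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights are assumed to be non-negative")
    lb = sum(w * lo for w, (lo, _) in zip(weights, component_bounds))
    ub = sum(w * hi for w, (_, hi) in zip(weights, component_bounds))
    return lb, ub
```

For component bounds (1, 2), (0, 3), and (0.5, 1) with unit weights, the combined bounds on dist(Li, Lj) are 1.5 and 6.0.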

Page 26: April 8, 2007

Pruning Rules for Fine-Granularity Partitioning

Rule 1: If lb(Li, Lj, dist) > D, fine-granularity partitioning is not required when comparing Li and Lj

Rule 2: If ub(Li, Lj, dist) ≤ D, fine-granularity partitioning is required, but the distance between the fine t-partitions in Li and Lj need not be computed

[Figure: two pairs of coarse t-partitions Li and Lj, one with lb(Li, Lj, dist) > D and one with ub(Li, Lj, dist) ≤ D]
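
The two pruning rules can be read as a three-way decision for each pair of coarse t-partitions, sketched below. The function name `pruning_decision` and the string labels are hypothetical. The third outcome, computing the fine-grained distances when neither rule applies, is the natural fallback and is an assumption about what happens outside the two rules.

```python
def pruning_decision(lb: float, ub: float, D: float) -> str:
    """Decide how to treat a pair of coarse t-partitions.

    lb and ub are the lower and upper bounds of dist over all pairs of
    fine t-partitions drawn from the two coarse t-partitions.
    """
    if lb > D:
        # Rule 1: every fine pair is farther apart than D, so no
        # fine-granularity partitioning is needed for this pair.
        return "skip"
    if ub <= D:
        # Rule 2: every fine pair is within D, so the fine t-partitions
        # are needed but their distances need not be computed.
        return "all_within_D"
    # Neither rule applies: the bounds straddle D (assumed fallback).
    return "compute_distances"
```

With D = 85, bounds of (90, 120) fall under Rule 1, bounds of (10, 80) fall under Rule 2, and bounds of (50, 100) require the fine-grained distances.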

Page 27: April 8, 2007

Performance Evaluation

Use two real trajectory data sets
• Hurricane track data set
  − Records the Atlantic hurricanes for the years 1950 through 2006
  − The entire set: 608 trajectories and 18,951 points; a small set (1990~2006): 221 trajectories and 7,270 points
• Animal movement data set
  − Records the locations of elk, deer, and cattle for the years 1993 through 1996 (the Starkey Project)
  − Elk1993: 33 trajectories and 15,422 points; Deer1995: 32 trajectories and 20,065 points; Cattle1993: 41 trajectories and 19,556 points

Validate the quality of outlier detection

Evaluate the effectiveness of the two-level partitioning strategy

Page 28: April 8, 2007

Trajectory Outliers for Hurricane Data (Small)

D = 85, p = 0.95, F = 0.2 → # of outliers = 13

Page 29: April 8, 2007

Trajectory Outliers for Elk1993

D = 55, p = 0.95, F = 0.1 → # of outliers = 3

Page 30: April 8, 2007

Trajectory Outliers for Deer1995

D = 80, p = 0.95, F = 0.1 → # of outliers = 3

Page 31: April 8, 2007

Effects of Parameter Values

(a) D = 83, p = 0.95, F = 0.2 → 19 outliers

(b) D = 87, p = 0.95, F = 0.2 → 10 outliers

Page 32: April 8, 2007

Pruning Power of Two-Level Partitioning

[Figure: bar chart of pruning power (0 to 1) for the Hurricane, Elk1993, Deer1995, and Cattle1993 data sets, with series 2L-False, 2L-Total, and Optimal; the labeled values are 0.018, 0.001, 0.001, and 0.003]

2L-Total: the ratio of the number of pairs pruned by Rule 1 to the total number of pairs of coarse t-partitions

2L-False: the proportion of pairs pruned incorrectly

Optimal: the maximum ratio of pairs that can be pruned

Achieves high pruning power (64~88%)

Page 33: April 8, 2007

Speedup Ratio of Two-Level Partitioning

[Figure: bar chart of speedup ratio (0 to 80) for the Hurricane, Elk1993, Deer1995, and Cattle1993 data sets; the labeled values are 30.9, 27.0, 73.7, and 42.8]

$$\text{Speedup Ratio} = \frac{\text{the elapsed time of the algorithm using the simple partitioning strategy}}{\text{the elapsed time of the algorithm using the two-level partitioning strategy}}$$

Shows significant performance improvement

Page 34: April 8, 2007

Related Work

Outlier detection algorithms for points
• Distribution-based [2], distance-based [3, 4, 5, 6], density-based [7, 8], deviation-based [9]

Trajectory outlier detection technique using a distance-based approach [5]
• Not clear whether this technique can detect outlying sub-trajectories from very complicated trajectories

Trajectory outlier detection algorithms based on classification [12]
• Require a good training set and depend on training

Page 35: April 8, 2007

Conclusions

Proposed a novel framework, the partition-and-detect framework, for detecting trajectory outliers

For the 1st phase, proposed a two-level trajectory partitioning strategy
• Ensures both high quality and high efficiency

For the 2nd phase, proposed a hybrid of the distance-based and density-based approaches
• Very intuitive, but does not have the local density problem

Demonstrated the effectiveness of TRAOD using various real trajectory data

Page 36: April 8, 2007

Thank You!