K-BestMatch

Mirco Nanni (e-mail: [email protected]), Roberto Trasarti (e-mail: [email protected]). KDD Lab, ISTI-CNR, Pisa, Italy. SSTDM-09, Miami, 2009


Transcript of K-BestMatch

Page 1: K-BestMatch

Mirco Nanni, e-mail: [email protected]

Roberto Trasarti, e-mail: [email protected]

KDD Lab, ISTI-CNR, Pisa, Italy

SSTDM-09, Miami, 2009

Page 2: K-BestMatch

The spread of GPS-equipped devices produces large amounts of position data for vehicles and individuals.

Applications in urban contexts require mapping this data onto a street network.

Page 3: K-BestMatch

Geometric Map Matching: Consider a set of timestamps T = {0, 1, ..., t} and a function P_r = (latitude, longitude) that describes the real position of a user at time r ∈ T. Then, given a set of georeferenced map objects O = {o_1, ..., o_n}, a Map Matching function is defined as F(P_r) → o_j, where o_j ∈ O.

There are three classes of approaches:

1. Point-to-point: The original points are mapped to nodes of the network.

2. Point-to-segment: The original points are mapped to segments of the network.

3. Segment-to-segment: Each pair of consecutive original points defines a segment, and these segments are mapped to segments of the network.
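As an illustration of the point-to-segment class, the sketch below maps a point to its nearest segment. It is a minimal example under stated assumptions: coordinates are planar (x, y) pairs rather than latitude and longitude, and the names `point_segment_distance` and `match_point_to_segment` are hypothetical, not taken from the slides.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def match_point_to_segment(p, segments):
    """Return the id of the segment closest to p; segments maps id -> (a, b)."""
    return min(segments, key=lambda sid: point_segment_distance(p, *segments[sid]))

# Toy network (assumed planar coordinates).
segments = {
    "s1": ((0.0, 0.0), (10.0, 0.0)),
    "s2": ((10.0, 0.0), (10.0, 10.0)),
}
print(match_point_to_segment((4.0, 1.0), segments))  # s1
print(match_point_to_segment((9.5, 6.0), segments))  # s2
```

Scanning every segment for every point is the simplest strategy; a spatial index would reduce the work on a large network.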

Page 4: K-BestMatch

Accuracy of the devices: Positioning errors create ambiguity during map matching.

This is usually resolved by heuristics.

Sampling rate: The sampling rate (or storing rate) of the device can produce a disconnected path.

How to complete the path?

Page 5: K-BestMatch

Best Match: Let M be a street map, composed of: a set of nodes M.nodes, a set of oriented segments M.segments ⊆ M.nodes × M.nodes, and a cost function that associates each segment with a real value, M.cost : M.segments → R.

Then, given a sequence of segments S = <s1, . . . , sn>, the match set of S over M is defined as follows:

and the best match of S over M is:

where

Page 6: K-BestMatch

From a set of disconnected segments to a connected path on the street network:
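One way to picture this step is to fill each gap between consecutive observed segments with a minimum-cost path. The sketch below does so with Dijkstra's algorithm. It is only an illustration: it assumes non-negative costs, represents a segment as a (start node, end node) pair, and treats the best match as the concatenation of independent shortest paths between consecutive segments. The function names are hypothetical.

```python
import heapq

def shortest_path(cost, source, target):
    """Dijkstra over oriented segments; cost maps (u, v) -> non-negative cost.
    Returns the list of segments from source to target, or None if unreachable."""
    adjacency = {}
    for (u, v), c in cost.items():
        adjacency.setdefault(u, []).append((v, c))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adjacency.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path, node = [], target
    while node != source:
        path.append((prev[node], node))
        node = prev[node]
    path.reverse()
    return path

def best_match(cost, segments):
    """Connect consecutive observed segments with minimum-cost paths."""
    result = [segments[0]]
    for prev_seg, next_seg in zip(segments, segments[1:]):
        gap = shortest_path(cost, prev_seg[1], next_seg[0])
        if gap is None:
            raise ValueError(f"no path from {prev_seg[1]} to {next_seg[0]}")
        result.extend(gap)
        result.append(next_seg)
    return result

# Toy map: two routes from B to D, via C (cost 2) or via E (cost 4).
cost = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1,
        ("B", "E"): 1, ("E", "D"): 3, ("D", "F"): 1}
print(best_match(cost, [("A", "B"), ("D", "F")]))
# [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'F')]
```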

Page 7: K-BestMatch

Point-to-segment: When the two nearest segments of a point have distances from it that are equal up to a given tolerance, the segment whose starting vertex is closest to the end point of the previous segment of the trajectory is chosen.

K-BestMatch: We use a more flexible approach that considers the k-optimal alternative paths between two disconnected segments of the initial set.
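To make the idea of k-optimal alternative paths concrete, the sketch below enumerates the k cheapest loopless paths between two nodes with a simple best-first search over partial paths. This is a swapped-in technique chosen for brevity, not the ranking algorithm cited later in the slides; it assumes non-negative costs and can be slow on large networks. The name `k_best_paths` is hypothetical.

```python
import heapq
from itertools import count

def k_best_paths(cost, source, target, k):
    """Return up to k loopless paths from source to target, cheapest first.
    cost maps oriented segments (u, v) -> non-negative cost.
    Each result is a (total_cost, list_of_segments) pair."""
    adjacency = {}
    for (u, v), c in cost.items():
        adjacency.setdefault(u, []).append((v, c))
    tie = count()  # tie-breaker so the heap never compares node lists
    heap = [(0.0, next(tie), [source])]
    found = []
    while heap and len(found) < k:
        total, _, nodes = heapq.heappop(heap)
        last = nodes[-1]
        if last == target:
            found.append((total, list(zip(nodes, nodes[1:]))))
            continue
        for v, c in adjacency.get(last, []):
            if v not in nodes:  # keep the path loopless
                heapq.heappush(heap, (total + c, next(tie), nodes + [v]))
    return found

# Toy gap between nodes B and D with two alternative routes.
cost = {("B", "C"): 1, ("C", "D"): 1, ("B", "E"): 1, ("E", "D"): 3}
for total, path in k_best_paths(cost, "B", "D", k=2):
    print(total, path)
```

Because costs are non-negative, a partial path never costs more than any of its extensions, so complete paths leave the queue in non-decreasing order of cost.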

Page 8: K-BestMatch

The previous representation of a path on the street network becomes a multipath.

Page 9: K-BestMatch

Item-frequency representation: Given a street map M, a sequence of segments S = <s1, ..., sn> and a positive integer k, the Item-Frequency representation IF(S, M, k) of S over M w.r.t. k is defined as a pair (I, f) such that:

The frequencies can be computed locally for each gap in S.
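The following sketch shows one way the per-gap computation could look. The exact definition of (I, f) is not reproduced here, so the code rests on assumptions: observed segments get frequency 1, a segment inside a gap gets the fraction of that gap's k alternative paths that contain it, and a segment appearing in several gaps keeps its largest value. The name `item_frequency` is hypothetical.

```python
def item_frequency(segments, gap_alternatives):
    """Build an item -> frequency map.
    segments: observed segments (each gets frequency 1).
    gap_alternatives: one entry per gap, each a list of alternative paths,
    where a path is a list of segments."""
    freq = {s: 1.0 for s in segments}
    for alternatives in gap_alternatives:
        if not alternatives:
            continue
        counts = {}
        for path in alternatives:
            for seg in set(path):  # count each segment once per path
                counts[seg] = counts.get(seg, 0) + 1
        for seg, n in counts.items():
            # Assumed rule: fraction of this gap's alternatives containing seg.
            freq[seg] = max(freq.get(seg, 0.0), n / len(alternatives))
    return freq

observed = [("A", "B"), ("D", "F")]
alternatives_for_gap = [
    [("B", "C"), ("C", "D")],
    [("B", "C"), ("C", "E"), ("E", "D")],
]
for seg, f in item_frequency(observed, [alternatives_for_gap]).items():
    print(seg, f)
```

In this toy case the segment shared by both alternatives gets frequency 1, while the segments used by only one alternative get 0.5.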

Page 10: K-BestMatch

Frequency: Red = 1, Orange ≥ 0.75, Yellow ≥ 0.5, Green ≥ 0.25, Blue ≤ 0.25

K = 1 (BestMatch)

K = 4

As expected, segments already contained in the original dataset have frequency 1 in both cases.

Page 11: K-BestMatch

Point-to-segment: compares a set of points P with all the segments of the street network M.segments; the complexity is therefore O(|P|m), where m = |M.segments|.

K-BestMatch: with G the set of gaps in the dataset, the complexity is O(|G|kn(m + n log n)), where k is the K-BestMatch parameter and n = |M.nodes| (see [1]).

The overall complexity is O(|G|kn(m + n log n) + |P|m). If we assume that the road network M is fixed, so that n and m are constants, the complexity reduces to O(|G|k + |P|).

[1] Ernesto Q. V. Martins and Marta M. B. Pascoal. A new implementation of Yen's ranking loopless paths algorithm. 4OR: A Quarterly Journal of Operations Research.

Page 12: K-BestMatch

Given the Item-Frequency representations of two trajectories, IF1 = (I1, f1) and IF2 = (I2, f2), we define the following distances between IF1 and IF2:

where:

Page 13: K-BestMatch

As a first validation of the method, we provide a visual account of the effects of the k−BestMatch reconstruction with the Jaccard distance on k-Nearest Neighbor queries (kNN):

Q: Return the 10 objects closest to a chosen pivot object.

The K-BestMatch result contains a larger core of shared segments and avoids the outlier paths that appear in the BestMatch-based result.

Figure panels: pivot object, BestMatch result, K-BestMatch result.
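A query of this kind can be sketched as follows. The distance formulas are not reproduced here, so the code assumes a weighted Jaccard distance (one minus the ratio between the sum of element-wise minima and the sum of element-wise maxima of the frequencies), which may differ from the definition used in the experiments. The names `jaccard_distance` and `knn` are hypothetical.

```python
def jaccard_distance(f1, f2):
    """Weighted Jaccard distance between two item -> frequency maps (assumed form)."""
    items = set(f1) | set(f2)
    num = sum(min(f1.get(i, 0.0), f2.get(i, 0.0)) for i in items)
    den = sum(max(f1.get(i, 0.0), f2.get(i, 0.0)) for i in items)
    return 1.0 - num / den if den > 0 else 0.0

def knn(pivot, dataset, k, distance=jaccard_distance):
    """Return the names of the k trajectories in dataset closest to pivot."""
    ranked = sorted(dataset, key=lambda name: distance(pivot, dataset[name]))
    return ranked[:k]

# Toy item-frequency maps keyed by segment id.
pivot = {"a": 1.0, "b": 0.5}
dataset = {
    "t1": {"a": 1.0, "b": 1.0},
    "t2": {"a": 1.0, "c": 1.0},
    "t3": {"d": 1.0},
}
print(knn(pivot, dataset, k=2))  # ['t1', 't2']
```

With all frequencies equal to 0 or 1, this distance reduces to the ordinary Jaccard distance between the two sets of segments.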

Page 14: K-BestMatch

In order to test the method on a clustering task, a generic agglomerative hierarchical clustering algorithm was adapted to work with the item-frequency representation of trajectories.

Figure panels: a cluster found using K-BestMatch; the same set of trajectories using BestMatch (where it does not form a cluster).
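A generic agglomerative scheme only needs a pairwise distance, which is what makes it easy to adapt to item-frequency representations. The sketch below is one possible form: it assumes average linkage and a stopping threshold, neither of which is specified in the slides, and it takes the distance as a parameter. The name `agglomerative` is hypothetical.

```python
def agglomerative(items, distance, threshold):
    """Average-linkage agglomerative clustering (assumed variant).
    Repeatedly merges the two closest clusters until the smallest
    average inter-cluster distance exceeds threshold.
    Returns clusters as lists of indices into items."""
    clusters = [[i] for i in range(len(items))]

    def linkage(c1, c2):
        total = sum(distance(items[a], items[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy data: numbers with absolute difference as the distance.
values = [0.0, 1.0, 2.0, 50.0, 51.0]
print(agglomerative(values, lambda a, b: abs(a - b), threshold=5.0))
# [[0, 1, 2], [3, 4]]
```

For trajectories, `items` would hold item-frequency maps and `distance` would be a distance defined on them.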

Page 15: K-BestMatch

In this experiment the k-BestMatch reconstruction was performed with several values of k, and the resulting clusters were compared against those obtained with the BestMatch approach (k = 1).

The comparison between clustering results is performed by the standard F-measure:
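A common form of the clustering F-measure, assumed here because the exact formula used in the experiment is not shown, scores each reference cluster by its best-matching cluster and averages the scores weighted by reference cluster size. The sketch below follows that form; the name `clustering_f_measure` is hypothetical.

```python
def clustering_f_measure(reference, clusters):
    """Clustering F-measure (assumed form).
    reference, clusters: lists of sets of item ids over the same items.
    For each reference cluster, take the best F1 against any cluster,
    then average with weights proportional to reference cluster size."""
    total = sum(len(r) for r in reference)
    if total == 0:
        return 0.0
    score = 0.0
    for ref in reference:
        best = 0.0
        for clu in clusters:
            overlap = len(ref & clu)
            if overlap == 0:
                continue
            precision = overlap / len(clu)
            recall = overlap / len(ref)
            best = max(best, 2 * precision * recall / (precision + recall))
        score += len(ref) / total * best
    return score

reference = [{1, 2, 3}, {4, 5}]   # e.g. clusters obtained with k = 1
candidate = [{1, 2}, {3, 4, 5}]   # e.g. clusters obtained with a larger k
print(round(clustering_f_measure(reference, candidate), 3))  # 0.8
```

A value of 1 means the two clusterings coincide, and lower values indicate that the choice of k changed the clusters.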

Page 16: K-BestMatch

A first exploration of the effects that a more flexible map matching approach can have on the comparison, query and mining of trajectories.

Preliminary results are encouraging, and suggest that overcoming the limits of standard best-match reconstruction strategies can benefit the subsequent analysis performed on such data.

The work also raised several open issues:

the need for refined methods to select the k-optimal alternative paths, for instance by limiting path redundancy;

the need to also consider the order in which segments are visited, thus moving from the item-frequency representation to a more complex one.

Page 17: K-BestMatch

Thank you. Questions?
