Nearest Neighbor Searching Under Uncertainty

Wuzhou Zhang

Supervised by Pankaj K. Agarwal

Department of Computer Science, Duke University


Nearest Neighbor Searching (NNS)

Applications

Pattern Recognition, Data Compression

Statistical Classification, Clustering

Databases, Information Retrieval

Computer Vision, etc. (source: http://en.wikipedia.org/wiki/Nearest_neighbor_search)
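For reference, the basic problem: given a set P of n points, preprocess P so that the point of P nearest to a query point q can be reported quickly. The simplest baseline is a linear scan. The sketch below is not from the slides; it assumes Euclidean distance, and the function name `nearest_neighbor` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def nearest_neighbor(points: List[Point], q: Point) -> int:
    """Return the index of the point closest to q (Euclidean), by linear scan."""
    if not points:
        raise ValueError("empty point set")
    # O(n) per query; NNS data structures aim to beat this after preprocessing P.
    return min(range(len(points)), key=lambda i: math.dist(points[i], q))
```

Any NNS data structure is measured against this O(n) per-query cost.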

Nearest Neighbor Searching Under Uncertainty

• Discrete pdf

• Continuous pdf

[Figure: uncertain points modeled by discrete pdfs; each possible location is labeled with its probability (0.1, 0.2, 0.3, or 0.4).]

Nearest Neighbor In Expectation

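The expected nearest neighbor of a query point q is the uncertain point $P_i$ minimizing the expected distance $\mathbb{E}[d(P_i, q)] = \sum_{j} w_{ij}\, d(p_{ij}, q)$ for a discrete pdf ($P_i$ lies at $p_{ij}$ with probability $w_{ij}$), or $\int d(x, q)\, f_i(x)\, dx$ for a continuous pdf $f_i$.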
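A brute-force sketch of the expected nearest neighbor for discrete pdfs, assuming Euclidean distance and representing each uncertain point as a list of (location, probability) pairs. The representation and the names `expected_distance` and `expected_nn` are our own, hypothetical choices. With n uncertain points of k locations each, a query costs O(nk).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
# An uncertain point as a discrete pdf: (location, probability) pairs summing to 1.
DiscretePdf = List[Tuple[Point, float]]

def expected_distance(pdf: DiscretePdf, q: Point) -> float:
    """E[d(q, P)] = sum_j w_j * ||q - p_j|| (Euclidean d assumed)."""
    return sum(w * math.dist(p, q) for p, w in pdf)

def expected_nn(points: List[DiscretePdf], q: Point) -> int:
    """Index of the uncertain point with minimum expected distance to q (O(nk) scan)."""
    return min(range(len(points)), key=lambda i: expected_distance(points[i], q))
```

For example, with A = [((0, 0), 0.5), ((10, 0), 0.5)] and B = [((4, 0), 1.0)], the query q = (0, 0) returns B: A has a possible location exactly at q, but its expected distance is 5, while B's is 4.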

Bisector In Case Of Gaussian

For Gaussian distributions, the bisector is a line!

But it is hard to get an explicit formula!

Figure: http://www.cs.utah.edu/~hal/courses/2009S_AI/Walkthrough/KalmanFilters/

For a discrete pdf, the bisector is also a line!

In both cases, we can compute the Voronoi diagram and solve the problem optimally!

However, it is not a metric!
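Assuming the distance is the squared Euclidean distance (the setting of the "Squared Distance Function" slide), the expected distance to a discrete pdf equals the squared distance to its mean c plus a constant spread v = sum_j w_j ||p_j - c||^2, because the cross term averages to zero. The sketch checks that identity numerically; the helper names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
DiscretePdf = List[Tuple[Point, float]]  # (location, probability) pairs

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def expected_sq_dist(pdf: DiscretePdf, q: Point) -> float:
    """Direct definition: sum_j w_j * ||q - p_j||^2."""
    return sum(w * sq_dist(p, q) for p, w in pdf)

def mean_and_spread(pdf: DiscretePdf) -> Tuple[Point, float]:
    """Mean c = sum_j w_j p_j and spread v = sum_j w_j ||p_j - c||^2."""
    c = (sum(w * p[0] for p, w in pdf), sum(w * p[1] for p, w in pdf))
    v = sum(w * sq_dist(p, c) for p, w in pdf)
    return c, v

def expected_sq_dist_closed_form(pdf: DiscretePdf, q: Point) -> float:
    """Same value via E||q - P||^2 = ||q - c||^2 + v (weights must sum to 1)."""
    c, v = mean_and_spread(pdf)
    return sq_dist(q, c) + v
```

The squared distance itself is not a metric: for a = (0, 0), m = (1, 0), b = (2, 0), sq_dist(a, b) = 4 exceeds sq_dist(a, m) + sq_dist(m, b) = 2, violating the triangle inequality.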

Squared Distance Function

The bisector is simple and beautiful!
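A short derivation of why this bisector is a line, assuming squared Euclidean distance and writing $c_i$ and $v_i$ for the mean and spread (weighted variance) of the discrete pdf of $P_i$ (our notation), so that $\mathbb{E}\|q - P_i\|^2 = \|q - c_i\|^2 + v_i$:

```latex
\begin{aligned}
\|q - c_1\|^2 + v_1 &= \|q - c_2\|^2 + v_2 \\
\iff\; 2\,\langle q,\, c_2 - c_1 \rangle &= \|c_2\|^2 - \|c_1\|^2 + v_2 - v_1
\end{aligned}
```

The $\|q\|^2$ terms cancel, so for $c_1 \neq c_2$ the bisector is a line perpendicular to the segment $c_1 c_2$. The resulting diagram is a power diagram of the means with weights $-v_i$, which is one standard way to compute it.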

Sampling Continuous Distributions

Working directly with continuous distributions is sometimes hard.

Lower bounds for other metrics and distributions are also possible. Let’s focus on discrete pdfs, then.
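One simple way to replace a continuous pdf by a discrete one is Monte Carlo sampling: draw k independent samples and give each probability 1/k. The slides do not specify the sampling scheme; the isotropic Gaussian, the `seed` parameter, and the name `sample_gaussian_pdf` are illustrative assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]

def sample_gaussian_pdf(mean: Point, sigma: float, k: int, seed: int = 0) -> List[Tuple[Point, float]]:
    """Approximate an isotropic 2D Gaussian by k i.i.d. samples, each with weight 1/k."""
    rng = random.Random(seed)  # seeded so the discretization is reproducible
    return [
        ((rng.gauss(mean[0], sigma), rng.gauss(mean[1], sigma)), 1.0 / k)
        for _ in range(k)
    ]
```

For a fixed query q, the expected distance to the sampled pdf is an average of k i.i.d. values of d(q, X), so it converges to the true expectation as k grows (law of large numbers), at the price of k locations per uncertain point.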

Expected Nearest Neighbor In L1 Metric (Manhattan Metric)

Expected Nearest Neighbor In L1 Metric (cont.)

Source: Range Searching on Uncertain Data [P. K. Agarwal et al. 2009]
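One possible reading of why the L1 metric is convenient (the slide content itself is not preserved in the transcript): the expected L1 distance splits into an x part and a y part, each a weighted sum of absolute values, so it is linear on every cell of the grid formed by the lines x = x_j and y = y_j through the pdf's locations. The sketch computes that local linear function; all names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
DiscretePdf = List[Tuple[Point, float]]  # (location, probability) pairs

def expected_l1(pdf: DiscretePdf, q: Point) -> float:
    """E[||q - P||_1] = sum_j w_j (|q_x - x_j| + |q_y - y_j|)."""
    return sum(w * (abs(q[0] - p[0]) + abs(q[1] - p[1])) for p, w in pdf)

def linear_piece_1d(t: float, coords: List[Tuple[float, float]]) -> Tuple[float, float]:
    """(slope, intercept) of g(s) = sum_j w_j |s - c_j| on the interval containing t."""
    # c_j <= t contributes w_j (s - c_j); c_j > t contributes w_j (c_j - s).
    slope = sum(w if c <= t else -w for c, w in coords)
    intercept = sum(-w * c if c <= t else w * c for c, w in coords)
    return slope, intercept

def local_plane(pdf: DiscretePdf, q: Point) -> Tuple[float, float, float]:
    """(a, b, d) with E||(x, y) - P||_1 = a*x + b*y + d on the grid cell containing q."""
    a, dx = linear_piece_1d(q[0], [(p[0], w) for p, w in pdf])
    b, dy = linear_piece_1d(q[1], [(p[1], w) for p, w in pdf])
    return a, b, dx + dy
```

Each uncertain point thus contributes a piecewise-linear function (one plane per grid cell), and the expected NN of q is the uncertain point whose function is lowest at q.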

Geometric Reduction

Building Block: Half-Space Intersection and Convex Hulls

Upper hulls correspond to lower envelopes: an example in 2D

Source: pp. 252–253, Computational Geometry: Algorithms and Applications, 3rd Edition [Mark de Berg et al.]
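A sketch of the duality from the cited pages of de Berg et al.: the line y = mx + c maps to the point (m, -c), and a line contributes a segment to the lower envelope exactly when its dual point is a vertex of the upper convex hull of the dual points. The code uses Andrew's monotone-chain hull; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Line = Tuple[float, float]   # (m, c) represents y = m*x + c
Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); > 0 means a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_envelope(lines: List[Line]) -> List[Line]:
    """Lines on the lower envelope, found as the upper hull of the dual points."""
    # Among parallel lines only the lowest (smallest c) can be on the envelope.
    best: Dict[float, float] = {}
    for m, c in lines:
        if m not in best or c < best[m]:
            best[m] = c
    # Duality (de Berg et al.): line y = m*x + c  <->  point (m, -c).
    dual_pts = sorted((m, -c) for m, c in best.items())
    hull: List[Point] = []
    for p in dual_pts:  # upper hull, left to right: keep only clockwise turns
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return [(m, -neg_c) for m, neg_c in hull]

def envelope_value(envelope: List[Line], x: float) -> float:
    """min_i (m_i * x + c_i) over the envelope lines (linear scan for brevity)."""
    return min(m * x + c for m, c in envelope)
```

Here `envelope_value` scans the envelope for brevity; since the envelope lines come out ordered by slope, a binary search would answer a query in O(log n).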

Segment-tree Based Data Structures for Expected-NN In L1 Metric

Segment-tree Based Data Structures for Expected-NN In L1 Metric (cont.)

Segment-tree Based Data Structures for Expected-NN In L1 Metric (cont.)
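A heavily simplified one-dimensional sketch of a segment-tree-based structure. This is our assumption of how such a structure could look; the slides' actual construction (in the plane) is not preserved in the transcript. Each expected-distance function f_i(x) = sum_j w_ij |x - p_ij| is piecewise linear with breakpoints at its locations. Each linear piece is stored at the canonical nodes of a segment tree built over all breakpoints, and a query evaluates only the pieces on one root-to-leaf path. Class and method names are hypothetical. Nodes keep plain lists here; storing a lower envelope per node would make each node visit logarithmic.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Pdf1D = List[Tuple[float, float]]  # (location, probability) pairs on the real line
Piece = Tuple[Optional[float], Optional[float], float, float]  # (left, right, slope, intercept)

def linear_pieces(pdf: Pdf1D) -> List[Piece]:
    """Pieces of f(x) = sum_j w_j |x - p_j| between consecutive locations (None = unbounded)."""
    locs = sorted(pdf)
    left_w = left_wp = 0.0                   # weight / weighted sum of locations left of x
    right_w = sum(w for _, w in locs)        # weight / weighted sum of locations right of x
    right_wp = sum(w * p for p, w in locs)
    bounds: List[Optional[float]] = [None] + [p for p, _ in locs] + [None]
    pieces: List[Piece] = []
    for k in range(len(locs) + 1):
        # On this piece: f(x) = (left_w - right_w) * x + (right_wp - left_wp)
        pieces.append((bounds[k], bounds[k + 1], left_w - right_w, right_wp - left_wp))
        if k < len(locs):
            p, w = locs[k]
            left_w, left_wp = left_w + w, left_wp + w * p
            right_w, right_wp = right_w - w, right_wp - w * p
    return pieces

class ExpectedNN1D:
    def __init__(self, points: List[Pdf1D]):
        self.xs = sorted({p for pdf in points for p, _ in pdf})
        self.last_leaf = len(self.xs)        # leaf j covers (xs[j-1], xs[j]]
        self.lines: Dict[int, List[Tuple[float, float, int]]] = {}
        for i, pdf in enumerate(points):
            for left, right, a, b in linear_pieces(pdf):
                lo = 0 if left is None else bisect.bisect_left(self.xs, left) + 1
                hi = self.last_leaf if right is None else bisect.bisect_left(self.xs, right)
                if lo <= hi:                 # skip empty pieces from repeated locations
                    self._insert(1, 0, self.last_leaf, lo, hi, (a, b, i))

    def _insert(self, node, nlo, nhi, lo, hi, line):
        if hi < nlo or nhi < lo:
            return
        if lo <= nlo and nhi <= hi:          # canonical node: store the piece here
            self.lines.setdefault(node, []).append(line)
            return
        mid = (nlo + nhi) // 2
        self._insert(2 * node, nlo, mid, lo, hi, line)
        self._insert(2 * node + 1, mid + 1, nhi, lo, hi, line)

    def query(self, x: float) -> int:
        """Index of the uncertain point with minimum expected distance to x."""
        leaf = bisect.bisect_left(self.xs, x)
        node, nlo, nhi = 1, 0, self.last_leaf
        best = (float("inf"), -1)
        while True:
            for a, b, i in self.lines.get(node, []):
                best = min(best, (a * x + b, i))
            if nlo == nhi:
                return best[1]
            mid = (nlo + nhi) // 2
            if leaf <= mid:
                node, nhi = 2 * node, mid
            else:
                node, nlo = 2 * node + 1, mid + 1
```

Every leaf lies inside exactly one piece of each f_i, so each uncertain point is evaluated exactly once along the query path.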

Summary of the result

• Size of data structure

• Preprocessing time

• Query time

Approximate L2 Metric

It’s a metric when P is centrally symmetric!

More complex!

Approximate L2 Metric (cont.)

Future Work

• Approximate the expected NN in the L2 metric

• Study the complexity of the expected Voronoi diagram

• Study the probabilistic case

Work harder in the near future!

Thanks!

Questions?

Main References:

[1] Pankaj K. Agarwal, Siu-Wing Cheng, Yufei Tao, Ke Yi: Indexing uncertain data. PODS 2009: 137-146.

[2] Pankaj K. Agarwal, Lars Arge, Jeff Erickson: Indexing Moving Points. J. Comput. Syst. Sci. 66(1): 207-243 (2003).