Nearest-Neighbor Searching Under Uncertainty Wuzhou Zhang Joint work with Pankaj K. Agarwal, Alon Efrat,…

Transcript
Page 1

Nearest-Neighbor Searching Under Uncertainty

Wuzhou Zhang

RIP Final and Masters, March 22, 2012

Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman. To appear in PODS 2012.

Page 2

Nearest-Neighbor Searching

Applications
• Pattern recognition, data compression
• Statistical classification, clustering
• Databases, information retrieval
• Computer vision, etc.

http://en.wikipedia.org/wiki/Nearest_neighbor_search

$S$: a set of points in $\mathbb{R}^d$

$q$: any query point in $\mathbb{R}^d$

Find the closest point $p^* \in S$ to $q$
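As a baseline for the problem just stated, the nearest neighbor can always be found by a linear scan over $S$. The sketch below assumes Euclidean distance and points given as coordinate tuples; `nearest_neighbor` is a hypothetical helper, not part of any library. It costs $O(|S|)$ time per query, which is what the indexes on later slides aim to beat.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]

def nearest_neighbor(S: Sequence[Point], q: Point) -> Point:
    """Hypothetical helper: return the point of S closest to q by a linear scan."""
    # math.dist (Python 3.8+) is the Euclidean distance between two points.
    return min(S, key=lambda p: math.dist(p, q))

S = [(0.0, 0.0), (3.0, 1.0), (1.0, 4.0)]
print(nearest_neighbor(S, (2.5, 0.5)))  # (3.0, 1.0)
```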

Page 3

Data Uncertainty

Location of data is imprecise: sensor databases, face recognition, mobile data, etc.

What is the “nearest neighbor” of $q$ now?

Page 4

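Our Model and Problem Statement

Uncertain point $P_i$: represented as a probability density function (pdf) $f_i$, or discretely as locations $p_{i1}, \dots, p_{ik}$ with probabilities $w_{i1}, \dots, w_{ik}$

Expected distance:
$$\mathrm{Ed}(P_i, q) = \int d(x, q)\, f_i(x)\, dx \qquad\text{or}\qquad \mathrm{Ed}(P_i, q) = \sum_{j=1}^{k} w_{ij}\, d(p_{ij}, q)$$

$w_{ij}$: probabilities/weights; $d(\cdot, \cdot)$: distance function

Let $\mathcal{P} = \{P_1, \dots, P_n\}$; find the expected nearest neighbor (ENN) of $q$:
$$\varphi(\mathcal{P}, q) = \arg\min_{P_i \in \mathcal{P}} \mathrm{Ed}(P_i, q)$$

Or an ε-ENN $P_i$:
$$\mathrm{Ed}(P_i, q) \le (1 + \varepsilon) \min_{P_j \in \mathcal{P}} \mathrm{Ed}(P_j, q)$$

[Figure: a discrete uncertain point whose locations have probabilities 0.1, 0.2, 0.3, 0.4]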
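The definitions above can be evaluated directly for discrete uncertain points. This is a minimal brute-force sketch ($O(nk)$ work per query), assuming each uncertain point is a list of (location, probability) pairs whose probabilities sum to 1; the function names are illustrative only. The example shows that the ENN need not be the uncertain point that has a location closest to $q$.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
UncertainPoint = List[Tuple[Point, float]]  # [(p_ij, w_ij), ...]; weights assumed to sum to 1
Dist = Callable[[Point, Point], float]

def expected_distance(P: UncertainPoint, q: Point, d: Dist = math.dist) -> float:
    """Ed(P, q) = sum_j w_j * d(p_j, q)."""
    return sum(w * d(p, q) for p, w in P)

def enn(points: List[UncertainPoint], q: Point, d: Dist = math.dist) -> int:
    """Index i minimizing Ed(P_i, q), found by scanning every uncertain point."""
    return min(range(len(points)), key=lambda i: expected_distance(points[i], q, d))

def is_eps_enn(points: List[UncertainPoint], i: int, q: Point, eps: float,
               d: Dist = math.dist) -> bool:
    """True if Ed(P_i, q) <= (1 + eps) * min_j Ed(P_j, q)."""
    best = min(expected_distance(P, q, d) for P in points)
    return expected_distance(points[i], q, d) <= (1 + eps) * best

points = [
    [((0.0, 0.0), 0.1), ((10.0, 0.0), 0.9)],  # P_0: has a location exactly at q
    [((1.0, 1.0), 1.0)],                      # P_1: a single certain location
]
q = (0.0, 0.0)
print(enn(points, q))  # 1, since Ed(P_0, q) = 9 but Ed(P_1, q) = sqrt(2)
```

Any distance function can be passed as `d`, for example an $L_1$ distance `lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])`.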

Page 5

Previous work and Our contribution

Previous work
• The expected $k$-NN under the $L_1$ metric: ε-approximation [Ljosa2007]
• Aggregate nearest neighbor (ANN) under the SUM function [Li2011, Sharifzadeh2010, Lian2008, etc.]
• All based on heuristics

Our contribution
First nontrivial methods for answering exact or ε-approximate ENN queries with provable performance guarantees

Page 6

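Summary of results

| Distance function | Setting | Preprocessing time | Space | Query time |
| Squared Euclidean distance | Uncertain data | | | |
| Squared Euclidean distance | Uncertain query | | | |
| Rectilinear metric | Uncertain data | | | |
| Rectilinear metric | Uncertain query | | | |
| Euclidean metric (ε-ENN) | Uncertain data | | | |
| Euclidean metric (ε-ENN) | Uncertain query | | | |

Results in $\mathbb{R}^2$; they extend to higher dimensions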

Page 7

Voronoi Diagram

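Voronoi cell: $\mathrm{Vor}(p) = \{x \in \mathbb{R}^2 : \|x - p\| \le \|x - p'\| \ \forall p' \in S\}$

Voronoi diagram $\mathrm{VD}(S)$: decomposition of $\mathbb{R}^2$ induced by the cells $\mathrm{Vor}(p)$, $p \in S$

Preprocessing time: $O(n \log n)$, where $n = |S|$

Space: $O(n)$

Query time: $O(\log n)$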
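To make the definition concrete, the sketch below labels sample points of a grid with the index of their nearest site, which gives a rasterized picture of $\mathrm{VD}(S)$. It is only an illustration: it neither constructs the diagram exactly nor meets the bounds above, which come from an exact construction (for example Fortune's sweep-line algorithm) combined with a point-location structure. `voronoi_labels` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def voronoi_labels(sites: Sequence[Point], xs: Sequence[float],
                   ys: Sequence[float]) -> List[List[int]]:
    """Hypothetical helper: label each sample (x, y) with the index of its
    nearest site. Samples with the same label lie in the same Voronoi cell,
    so the result is a rasterized picture of VD(sites), not an exact diagram."""
    return [[min(range(len(sites)), key=lambda i: math.dist(sites[i], (x, y)))
             for x in xs]
            for y in ys]

sites = [(0.0, 0.0), (4.0, 0.0)]
print(voronoi_labels(sites, xs=[0.0, 1.0, 3.0, 4.0], ys=[0.0, 2.0]))
# [[0, 0, 1, 1], [0, 0, 1, 1]]: the bisector x = 2 separates the two cells
```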

Page 8

Expected Voronoi Diagram

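Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{x \in \mathbb{R}^2 : \mathrm{Ed}(P_i, x) \le \mathrm{Ed}(P_j, x) \ \forall j\}$

Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: decomposition of $\mathbb{R}^2$ induced by the cells $\mathrm{EVor}(P_i)$

[Figure: an example of the expected Voronoi diagram]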

Page 9

Minimization diagram

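The lower envelope of $\mathrm{Ed}(P_1, \cdot), \dots, \mathrm{Ed}(P_n, \cdot)$: $\mathcal{E}(x) = \min_i \mathrm{Ed}(P_i, x)$

$\mathrm{EVD}(\mathcal{P})$: the projection of the graph of $\mathcal{E}$ onto $\mathbb{R}^2$, labeled by the minimizing index, i.e., the minimization diagram of the functions $\mathrm{Ed}(P_i, \cdot)$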
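A one-dimensional analogue (an assumption made to keep the example small; the slides work in the plane) shows what the minimization diagram records: sample the lower envelope of the functions $\mathrm{Ed}(P_i, x)$ along a line and report where the minimizing index changes. Sampling only locates breakpoints to within the sample spacing; an exact method would compute the envelope combinatorially. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

UncertainPoint1D = List[Tuple[float, float]]  # [(location, weight), ...]; weights assumed to sum to 1

def ed_1d(P: UncertainPoint1D, x: float) -> float:
    """Expected distance Ed(P, x) = sum_j w_j * |p_j - x| on the real line."""
    return sum(w * abs(p - x) for p, w in P)

def sampled_minimization_diagram(points: Sequence[UncertainPoint1D],
                                 xs: Sequence[float]) -> List[Tuple[float, int]]:
    """For sorted samples xs, return runs (first x of the run, argmin index)
    of the lower envelope min_i Ed(P_i, x). Ties go to the smaller index."""
    runs: List[Tuple[float, int]] = []
    for x in xs:
        i = min(range(len(points)), key=lambda j: ed_1d(points[j], x))
        if not runs or runs[-1][1] != i:
            runs.append((x, i))
    return runs

points = [
    [(0.0, 1.0)],              # P_0: Ed = |x|
    [(4.0, 0.5), (6.0, 0.5)],  # P_1: Ed = 5 - x for x <= 4, constant 1 on [4, 6]
]
print(sampled_minimization_diagram(points, [float(x) for x in range(8)]))
# [(0.0, 0), (3.0, 1)]: the true breakpoint is x = 2.5, where |x| = 5 - x
```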

Page 10

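Squared Euclidean distance, uncertain data

$\bar{p}_i = \sum_j w_{ij}\, p_{ij}$: the centroid of $P_i$ (for a pdf, $\bar{p}_i = \int x\, f_i(x)\, dx$)

Then,
$$\mathrm{Ed}(P_i, q) = \sum_j w_{ij}\, \|p_{ij} - q\|^2 = \|q - \bar{p}_i\|^2 + \sigma_i, \qquad \sigma_i = \sum_j w_{ij}\, \|p_{ij} - \bar{p}_i\|^2$$
since the cross term $2\langle \bar{p}_i - q, \sum_j w_{ij}(p_{ij} - \bar{p}_i)\rangle$ vanishes when $\sum_j w_{ij} = 1$ and $\sum_j w_{ij}\, p_{ij} = \bar{p}_i$.

• Replace $P_i$ by $\bar{p}_i$ with weight $\sigma_i$
• $\mathrm{EVD}(\mathcal{P})$ is the same as the power diagram WPD of the weighted centroids

Preprocessing time

Space

Query time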

Remarks: Works for any distribution
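The reduction above can be checked numerically for discrete uncertain points. This sketch assumes weights that sum to 1; it computes each centroid $\bar{p}_i$ and weight $\sigma_i$, confirms that $\|q - \bar{p}_i\|^2 + \sigma_i$ matches the direct sum, and answers an ENN query with the reduced form. A linear scan stands in for the power-diagram point-location query; building an actual power diagram is outside this sketch, and all names are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
UncertainPoint = List[Tuple[Point, float]]  # [(p_ij, w_ij), ...]; weights assumed to sum to 1

def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def ed_squared(P: UncertainPoint, q: Point) -> float:
    """Expected squared Euclidean distance, computed term by term."""
    return sum(w * sq_dist(p, q) for p, w in P)

def centroid_and_weight(P: UncertainPoint) -> Tuple[Point, float]:
    """(p_bar, sigma) such that Ed(P, q) = |q - p_bar|^2 + sigma."""
    c = (sum(w * p[0] for p, w in P), sum(w * p[1] for p, w in P))
    sigma = sum(w * sq_dist(p, c) for p, w in P)
    return c, sigma

def enn_weighted_centroids(points: List[UncertainPoint], q: Point) -> int:
    """ENN via the reduced form: minimize |q - p_bar_i|^2 + sigma_i.
    The linear scan is a stand-in for a power-diagram point-location query."""
    forms = [centroid_and_weight(P) for P in points]
    return min(range(len(forms)), key=lambda i: sq_dist(q, forms[i][0]) + forms[i][1])

def random_uncertain_point(k: int = 4) -> UncertainPoint:
    """Random test data: k locations in [0, 10]^2 with normalized weights."""
    raw = [random.random() for _ in range(k)]
    total = sum(raw)
    return [((random.uniform(0, 10), random.uniform(0, 10)), r / total) for r in raw]

random.seed(1)
points = [random_uncertain_point() for _ in range(5)]
q = (3.0, 7.0)
c, sigma = centroid_and_weight(points[0])
print(ed_squared(points[0], q), sq_dist(q, c) + sigma)  # equal up to rounding
```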

Page 11

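Rectilinear metric, uncertain data

Assume the $L_1$ metric: $d(p, q) = |x_p - x_q| + |y_p - y_q|$

Size of $\mathrm{EVD}(\mathcal{P})$:

Lower-bound construction

$\alpha(n)$: the inverse Ackermann function

Remarks: Extends to the $L_\infty$ metric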

Page 12

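Rectilinear metric, uncertain data (cont.)

A near-linear-size index exists despite the size of $\mathrm{EVD}(\mathcal{P})$!

$\mathrm{Ed}(P_i, q)$ consists of $O(k^2)$ linear pieces: the vertical and horizontal lines through the locations of $P_i$ form a grid, and around each $p_{ij}$ the distance $d(p_{ij}, q)$ is linear in $q$ within each quadrant:
• $-(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$ if $x_q \ge x_{p_{ij}}$ and $y_q \le y_{p_{ij}}$
• $-(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$ if $x_q \ge x_{p_{ij}}$ and $y_q \ge y_{p_{ij}}$
• $(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$ if $x_q \le x_{p_{ij}}$ and $y_q \le y_{p_{ij}}$
• $(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$ if $x_q \le x_{p_{ij}}$ and $y_q \ge y_{p_{ij}}$

Within each grid cell, $\mathrm{Ed}(P_i, q)$ is a weighted sum of such functions, hence linear!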
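The linear-piece argument can be checked numerically. The sketch assumes a discrete uncertain point with weights summing to 1 (helper names are hypothetical): for the grid cell containing $q$ it fixes the sign of each $x_q - x_{p_{ij}}$ and $y_q - y_{p_{ij}}$, sums the resulting linear functions into coefficients $(a, b, c)$, and compares $a x + b y + c$ with the directly computed expected $L_1$ distance. With $k$ locations the grid has $(k+1)^2$ cells, one linear function per cell.

```python
from typing import List, Tuple

Point = Tuple[float, float]
UncertainPoint = List[Tuple[Point, float]]  # [(p_ij, w_ij), ...]; weights assumed to sum to 1

def ed_l1(P: UncertainPoint, q: Point) -> float:
    """Expected L1 distance, computed term by term."""
    return sum(w * (abs(px - q[0]) + abs(py - q[1])) for (px, py), w in P)

def l1_linear_piece(P: UncertainPoint, q: Point) -> Tuple[float, float, float]:
    """Coefficients (a, b, c) such that Ed(P, (x, y)) = a*x + b*y + c on the
    grid cell containing q (cells are cut out by the vertical and horizontal
    lines through the locations of P)."""
    a = b = c = 0.0
    for (px, py), w in P:
        # On q's side of each line: |x - px| = sx * (x - px), |y - py| = sy * (y - py).
        sx = 1.0 if q[0] >= px else -1.0
        sy = 1.0 if q[1] >= py else -1.0
        a += w * sx
        b += w * sy
        c -= w * (sx * px + sy * py)
    return a, b, c

P = [((0.0, 0.0), 0.25), ((2.0, 1.0), 0.75)]
a, b, c = l1_linear_piece(P, (1.0, 0.5))  # the cell 0 < x < 2, 0 < y < 1
print(a, b, c)  # -0.5 -0.5 2.25
for q in [(1.0, 0.5), (1.5, 0.8)]:  # both queries lie in the same cell
    print(a * q[0] + b * q[1] + c, ed_l1(P, q))  # the two values agree up to rounding
```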

Page 13

Rectilinear metric, uncertain data (cont.)

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions

Page 14

Euclidean metric (ε-ENN), uncertain data

Approximate $\mathrm{Ed}(P_i, q)$ using a grid:

Outside the grid:

Inside the grid:

Total # of cells:

[Figure: the outermost square and the collection of squares forming the grid]

Remarks: Extends to any metric
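The slide distinguishes queries outside the grid from queries inside it. One standard way to see why far-away queries are easy (our assumption about the role of the outer region; the slide does not state this): for any reference point $c$ and weights summing to 1, the triangle inequality gives $|\mathrm{Ed}(P_i, q) - \|q - c\|| \le \max_j \|p_{ij} - c\| = r$, so once $\|q - c\| \ge r/\varepsilon$, the single distance $\|q - c\|$ is a $(1 \pm \varepsilon)$-approximation of $\mathrm{Ed}(P_i, q)$. The sketch checks this numerically; `far_field_check` is a hypothetical helper.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
UncertainPoint = List[Tuple[Point, float]]  # [(p_ij, w_ij), ...]; weights assumed to sum to 1

def expected_distance(P: UncertainPoint, q: Point) -> float:
    return sum(w * math.dist(p, q) for p, w in P)

def far_field_check(P: UncertainPoint, c: Point, q: Point, eps: float) -> Optional[bool]:
    """If |q - c| >= r / eps with r = max_j |p_j - c|, report whether
    (1 - eps)|q - c| <= Ed(P, q) <= (1 + eps)|q - c|; otherwise return None.
    By the triangle inequality the answer should always be True."""
    r = max(math.dist(p, c) for p, _ in P)
    D = math.dist(q, c)
    if D < r / eps:
        return None  # q is too close to P for a single-point approximation
    tol = 1e-9  # absorbs floating-point rounding
    return (1 - eps) * D - tol <= expected_distance(P, q) <= (1 + eps) * D + tol

P = [((0.0, 0.0), 0.5), ((1.0, 0.0), 0.3), ((0.0, 1.0), 0.2)]
c = (0.0, 0.0)  # any reference point works; here r = 1
print(far_field_check(P, c, (30.0, 40.0), eps=0.1))  # True  (|q - c| = 50 >= r / eps = 10)
print(far_field_check(P, c, (2.0, 0.0), eps=0.1))    # None  (|q - c| = 2 < 10)
```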

Page 15

Euclidean metric (ε-ENN), uncertain data (cont.)

…, …, and …: generated by Arya’s data structure on …

A linear-size approximate $\mathrm{EVD}(\mathcal{P})$!

Quadtree: 4-way tree

Preprocessing time

Space

Query time
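To illustrate the “4-way tree” idea, here is a minimal point-region quadtree: each node is a square, a leaf splits into four equal squares when it holds more than `capacity` points, and point location walks from the root to the leaf containing the query. This is a generic sketch assumed only for illustration; it is not the construction of Arya et al. cited on the slide and carries no ε-guarantee. In an approximate diagram each leaf would also store a representative answer, which is omitted here. `QuadNode` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class QuadNode:
    """Hypothetical node: the square [x0, x0 + size) x [y0, y0 + size)."""
    x0: float
    y0: float
    size: float
    points: List[Point] = field(default_factory=list)  # stored only at leaves
    children: Optional[List["QuadNode"]] = None         # 4 children, or None for a leaf

    def child_index(self, p: Point) -> int:
        """0: lower-left, 1: lower-right, 2: upper-left, 3: upper-right."""
        half = self.size / 2
        return (1 if p[0] >= self.x0 + half else 0) + (2 if p[1] >= self.y0 + half else 0)

    def insert(self, p: Point, capacity: int = 1, min_size: float = 1e-9) -> None:
        """Insert p, splitting an overflowing leaf into 4 equal squares."""
        if self.children is not None:
            self.children[self.child_index(p)].insert(p, capacity, min_size)
            return
        self.points.append(p)
        if len(self.points) > capacity and self.size > min_size:
            half = self.size / 2
            # Child order matches child_index: index = 2 * dy + dx.
            self.children = [QuadNode(self.x0 + dx * half, self.y0 + dy * half, half)
                             for dy in (0, 1) for dx in (0, 1)]
            pending, self.points = self.points, []
            for r in pending:
                self.children[self.child_index(r)].insert(r, capacity, min_size)

    def locate(self, q: Point) -> "QuadNode":
        """Walk from the root to the leaf square containing q (O(depth) steps)."""
        node = self
        while node.children is not None:
            node = node.children[node.child_index(q)]
        return node

root = QuadNode(0.0, 0.0, 8.0)
for p in [(1.0, 1.0), (6.0, 1.0), (6.5, 1.5)]:
    root.insert(p)
leaf = root.locate((6.2, 1.2))
print(leaf.x0, leaf.y0, leaf.size, leaf.points)  # 6.0 1.0 0.5 [(6.0, 1.0)]
```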

Page 16

Further work

Is there a linear-size index to answer the following queries in sublinear time in the worst case?

• the nearest neighbor with the highest probability
• the nearest neighbors with probability higher than a given threshold

THANKS

Page 17

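Squared Euclidean distance, uncertain query

$\bar{q}$: the centroid of the uncertain query $Q$. Then $\mathrm{Ed}(Q, p) = \|p - \bar{q}\|^2 + c_Q$, where $c_Q$ does not depend on $p$, so the ENN of $Q$ is the nearest neighbor of $\bar{q}$.

Preprocessing
• Compute the Voronoi diagram VD of the input points

Query
• Given $Q$, compute $\bar{q}$ in $O(k)$ time, then query VD with $\bar{q}$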

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions and works for any distribution
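The query procedure above can be sketched for a discrete uncertain query: compute the centroid $\bar{q}$ in $O(k)$ time and return the input point nearest to it. The sketch assumes query weights summing to 1 and uses a linear scan as a stand-in for the Voronoi-diagram point-location query; `enn_for_uncertain_query` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
UncertainQuery = List[Tuple[Point, float]]  # [(q_j, w_j), ...]; weights assumed to sum to 1

def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def enn_for_uncertain_query(S: Sequence[Point], Q: UncertainQuery) -> int:
    """Index of the input point minimizing Ed(Q, p) under squared Euclidean distance."""
    # O(k): centroid of the query's locations.
    qbar = (sum(w * q[0] for q, w in Q), sum(w * q[1] for q, w in Q))
    # Linear scan as a stand-in for querying the Voronoi diagram VD with qbar.
    return min(range(len(S)), key=lambda i: sq_dist(S[i], qbar))

S = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
Q = [((4.0, 0.0), 0.5), ((6.0, 2.0), 0.5)]  # centroid (5, 1)
print(enn_for_uncertain_query(S, Q))  # 1
```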

Page 18

Rectilinear metric, uncertain query

Similarly, $\mathrm{Ed}(Q, p)$ consists of $O(k^2)$ linear pieces

Preprocessing time

Space

Query time

Page 19

Euclidean metric (ε-ENN), uncertain query

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions