Nearest-Neighbor Searching Under Uncertainty Wuzhou Zhang Joint work with Pankaj K. Agarwal, Alon...

Nearest-Neighbor Searching Under Uncertainty Wuzhou Zhang RIP Final and Masters, March 22, 2012 Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman. To appear in PODS 2012.


Page 1

Nearest-Neighbor Searching Under Uncertainty

Wuzhou Zhang

RIP Final and Masters, March 22, 2012

Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman. To appear in PODS 2012.

Page 2

Nearest-Neighbor Searching

• $S$: a set of points in $\mathbb{R}^2$
• $q$: any query point in $\mathbb{R}^2$
• Find the closest point $p^* \in S$ to $q$

Applications
• Pattern recognition, data compression
• Statistical classification, clustering
• Databases, information retrieval
• Computer vision, etc.

http://en.wikipedia.org/wiki/Nearest_neighbor_search

Page 3

Data Uncertainty

Location of data is imprecise: sensor databases, face recognition, mobile data, etc.

What is the "nearest neighbor" of $q$ now?

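Page 4

Our Model and Problem Statement

• Uncertain point $P_i$: represented as a probability density function (pdf)
• Expected distance: $\mathrm{Ed}(P_i, q) = \sum_j w_{ij}\, d(p_{ij}, q)$ for a discrete pdf with locations $p_{ij}$, where the $w_{ij}$ are probabilities/weights and $d(\cdot,\cdot)$ is the distance function
• Let $\mathcal{P} = \{P_1, \ldots, P_n\}$; find the expected nearest neighbor (ENN) of $q$: $\arg\min_{P_i \in \mathcal{P}} \mathrm{Ed}(P_i, q)$
• Or an $\varepsilon$-ENN: any $P_i \in \mathcal{P}$ with $\mathrm{Ed}(P_i, q) \le (1 + \varepsilon) \min_{P_j \in \mathcal{P}} \mathrm{Ed}(P_j, q)$

(Figure: an uncertain point with example weights 0.1, 0.2, 0.3, 0.4.)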
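As a baseline for the definitions on this slide, here is a minimal Python sketch of the expected distance and of an ENN query answered by a linear scan. It assumes the discrete model (each uncertain point is a list of locations plus weights summing to 1). The function names and sample data are hypothetical and not from the talk, and the scan is a naive reference, not the indexed methods the talk develops.

```python
import math

def expected_distance(locations, weights, q, dist=math.dist):
    """Ed(P, q) = sum_j w_j * d(p_j, q) for a discrete uncertain point P."""
    return sum(w * dist(p, q) for p, w in zip(locations, weights))

def expected_nearest_neighbor(points, q, dist=math.dist):
    """Index of the uncertain point with the smallest expected distance to q."""
    return min(
        range(len(points)),
        key=lambda i: expected_distance(points[i][0], points[i][1], q, dist),
    )

def is_eps_enn(points, i, q, eps, dist=math.dist):
    """True if points[i] is within a (1 + eps) factor of the best expected distance."""
    best = min(expected_distance(locs, w, q, dist) for locs, w in points)
    return expected_distance(points[i][0], points[i][1], q, dist) <= (1 + eps) * best

# Hypothetical sample data: (locations, weights) pairs.
uncertain_points = [
    ([(0.0, 0.0), (4.0, 0.0)], [0.5, 0.5]),
    ([(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)], [0.1, 0.2, 0.3, 0.4]),
]
q = (0.0, 0.0)
print(expected_nearest_neighbor(uncertain_points, q))
```

Passing a different `dist` (for example a squared Euclidean or rectilinear distance) switches the distance function without changing the query logic.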

Page 5

Previous work and Our contribution

Previous work
• The expected $k$-NN under the $L_1$ metric: $\varepsilon$-approximation [Ljosa2007]
• Aggregate nearest neighbor (ANN) under the SUM function [Li2011, Sharifzadeh2010, Lian2008, etc.]
• All based on heuristics

Our contribution
• First nontrivial methods for answering exact or $\varepsilon$-approximate ENN queries with provable performance guarantees

Page 6

Summary of results

(Table with columns: distance function, settings, preprocessing time, space, query time. The bounds are not preserved in this transcript; the rows are listed below.)

• Squared Euclidean distance: uncertain data; uncertain query
• Rectilinear metric: uncertain data; uncertain query
• Euclidean metric ($\varepsilon$-ENN): uncertain data; uncertain query

Results in $\mathbb{R}^2$; extends to higher dimensions

Page 7

Voronoi Diagram

• Voronoi cell: $\mathrm{Vor}(p_i) = \{x \in \mathbb{R}^2 : d(x, p_i) \le d(x, p_j) \text{ for all } j\}$
• Voronoi diagram $\mathrm{VD}(S)$: decomposition induced by the Voronoi cells

For $n$ points in the plane:
• Preprocessing time: $O(n \log n)$
• Space: $O(n)$
• Query time: $O(\log n)$

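Page 8

Expected Voronoi Diagram

• Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{x \in \mathbb{R}^2 : \mathrm{Ed}(P_i, x) \le \mathrm{Ed}(P_j, x) \text{ for all } j\}$, where $\mathrm{Ed}$ denotes the expected distance
• Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: induced by the expected Voronoi cells

(Figure: an example of an expected Voronoi diagram; the name of the metric is not preserved in this transcript.)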

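Page 9

Minimization diagram

• The lower envelope of $F = \{f_1, \ldots, f_n\}$: $L_F(x) = \min_i f_i(x)$
• Minimization diagram $\mathcal{M}(F)$: the projection of the graph of $L_F$

With $f_i(x)$ taken as the expected distance from $x$ to the uncertain point $P_i$, the expected Voronoi diagram is the minimization diagram of the $f_i$.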

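Page 10

Squared Euclidean distance: Uncertain data

$\bar{p}_i$: the centroid of $P_i$

Then, $\mathrm{Ed}(P_i, q) = \|q - \bar{p}_i\|^2 + \sigma_i^2$, where $\sigma_i^2$ is the expected squared distance from $P_i$ to its centroid $\bar{p}_i$.

• Replace $P_i$ by $\bar{p}_i$ with weight $-\sigma_i^2$ (taking the power distance from $q$ to a point $c$ of weight $\omega$ to be $\|q - c\|^2 - \omega$)
• The expected Voronoi diagram is the same as the power diagram WPD of the weighted centroids

(Table: preprocessing time, space, and query time; the bounds are not preserved in this transcript.)

Remarks: Works for any distribution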
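The reduction to centroids rests on the identity $\sum_j w_j \|p_j - q\|^2 = \|q - c\|^2 + \sigma^2$, where $c = \sum_j w_j p_j$ is the weighted centroid and $\sigma^2 = \sum_j w_j \|p_j - c\|^2$. The following Python sketch checks it numerically for one discrete uncertain point in the plane. The helper names and the sample data are hypothetical, chosen only for illustration.

```python
def centroid_and_variance(locations, weights):
    """Weighted centroid c and variance sigma^2 = sum_j w_j * ||p_j - c||^2."""
    cx = sum(w * x for (x, _), w in zip(locations, weights))
    cy = sum(w * y for (_, y), w in zip(locations, weights))
    var = sum(
        w * ((x - cx) ** 2 + (y - cy) ** 2)
        for (x, y), w in zip(locations, weights)
    )
    return (cx, cy), var

def expected_sq_distance(locations, weights, q):
    """Expected squared Euclidean distance from q, computed term by term."""
    return sum(
        w * ((x - q[0]) ** 2 + (y - q[1]) ** 2)
        for (x, y), w in zip(locations, weights)
    )

def centroid_form(c, var, q):
    """The same quantity via the identity: ||q - c||^2 + sigma^2."""
    return (q[0] - c[0]) ** 2 + (q[1] - c[1]) ** 2 + var

# Hypothetical uncertain point with four locations.
locations = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
weights = [0.1, 0.2, 0.3, 0.4]
c, var = centroid_and_variance(locations, weights)
for q in [(0.0, 0.0), (2.5, -1.0), (10.0, 7.0)]:
    print(expected_sq_distance(locations, weights, q), centroid_form(c, var, q))
```

Since $\sigma^2$ does not depend on $q$, each uncertain point can be summarized by the pair $(c, \sigma^2)$ once, in preprocessing.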

Page 11

Rectilinear metric: Uncertain data

• Assume the $L_1$ metric: $d(p, q) = |x_p - x_q| + |y_p - y_q|$
• Size of the expected Voronoi diagram: the bound is not preserved in this transcript; it involves $\alpha(\cdot)$, the inverse Ackermann function
• Lower bound construction

Remarks: Extends to the $L_\infty$ metric

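Page 12

Rectilinear metric: Uncertain data (cont.)

A near-linear size index exists despite the size of the expected Voronoi diagram.

The expected distance to $P_i$ consists of linear pieces (their number is not preserved in this transcript).

(Figure: the horizontal and vertical lines through a location $p_{ij}$ of $P_i$ split the plane into four quadrants. In each quadrant the distance from $q$ to $p_{ij}$ is one of the following linear functions.)

• $-(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $-(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$
• $(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$

Linear!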
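To make the "linear pieces" concrete: the horizontal and vertical lines through all locations of one uncertain point form a grid, and inside any grid cell the sign of each $x_q - x_{p_{ij}}$ and $y_q - y_{p_{ij}}$ is fixed, so the expected rectilinear distance is a single linear function $a x + b y + c$ there. The Python sketch below computes those coefficients for the cell containing a given query point. It assumes a discrete uncertain point; the function names and data are hypothetical.

```python
def l1_expected_distance(locations, weights, q):
    """Expected rectilinear (L1) distance from q to a discrete uncertain point."""
    return sum(
        w * (abs(px - q[0]) + abs(py - q[1]))
        for (px, py), w in zip(locations, weights)
    )

def linear_piece(locations, weights, q):
    """Coefficients (a, b, c) such that the expected L1 distance equals
    a*x + b*y + c for every (x, y) in the grid cell that contains q."""
    a = b = c = 0.0
    for (px, py), w in zip(locations, weights):
        sx = 1.0 if q[0] >= px else -1.0  # sign of (x - px) throughout the cell
        sy = 1.0 if q[1] >= py else -1.0  # sign of (y - py) throughout the cell
        a += w * sx
        b += w * sy
        c -= w * (sx * px + sy * py)
    return a, b, c

# Hypothetical uncertain point; grid lines are x = 1, x = 3, y = 1, y = 3.
locations = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
weights = [0.1, 0.2, 0.3, 0.4]
a, b, c = linear_piece(locations, weights, (2.0, 2.0))
other = (2.5, 1.5)  # another point in the same cell (1 < x < 3, 1 < y < 3)
print(a * other[0] + b * other[1] + c, l1_expected_distance(locations, weights, other))
```

Outside the cell the coefficients change, which is why the expected distance is only piecewise linear over the whole plane.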

Page 13

Rectilinear metric: Uncertain data (cont.)

(Table: preprocessing time, space, and query time; the bounds are not preserved in this transcript.)

Remarks: Extends to higher dimensions

Page 14

Euclidean metric ($\varepsilon$-ENN): Uncertain data

Approximate the expected distance by a simpler function defined over a grid (the formulas are not preserved in this transcript):

• Outside the grid
• Inside the grid
• Total number of cells
• The grid is described by an outermost square and a collection of squares

Remarks: Extends to any metric

Page 15

Euclidean metric ($\varepsilon$-ENN): Uncertain data (cont.)

• Generated by Arya's data structure (the symbols for what is generated, and for its input, are not preserved in this transcript)
• A linear-size approximate expected Voronoi diagram!
• Quadtree: 4-way tree

(Table: preprocessing time, space, and query time; the bounds are not preserved in this transcript.)

Page 16

Further work

Is there a linear-size index to answer the following queries in sublinear time in the worst case?

• the nearest neighbor with highest probability
• the nearest neighbors with probability higher than a given threshold

THANKS

Page 17

Squared Euclidean distance: Uncertain query

$\bar{q}$: the centroid of the uncertain query $Q$

Preprocessing
• Compute the Voronoi diagram VD of the data points

Query
• Given $Q$, compute $\bar{q}$, then query VD with $\bar{q}$

(Table: preprocessing time, space, and query time; the bounds are not preserved in this transcript.)

Remarks: Extends to higher dimensions and works for any distribution
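Why querying with the centroid suffices: for any data point $p$, the expected squared distance from the uncertain query equals $\|p - \bar{q}\|^2$ plus a variance term that does not depend on $p$, so the minimizer is the ordinary nearest neighbor of $\bar{q}$. The Python sketch below compares the two computations by brute force. It assumes a discrete query distribution, and a linear scan stands in for the Voronoi-diagram lookup; all names and data are hypothetical.

```python
def centroid(locations, weights):
    """Weighted centroid of a discrete uncertain query."""
    return (
        sum(w * x for (x, _), w in zip(locations, weights)),
        sum(w * y for (_, y), w in zip(locations, weights)),
    )

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def enn_via_centroid(data, q_locations, q_weights):
    """Nearest data point to the query centroid (linear scan instead of a VD lookup)."""
    c = centroid(q_locations, q_weights)
    return min(range(len(data)), key=lambda i: sq_dist(data[i], c))

def enn_direct(data, q_locations, q_weights):
    """Minimize the expected squared distance directly, for comparison."""
    def expected(p):
        return sum(w * sq_dist(p, loc) for loc, w in zip(q_locations, q_weights))
    return min(range(len(data)), key=lambda i: expected(data[i]))

# Hypothetical certain data points and uncertain query.
data = [(0.0, 0.0), (5.0, 1.0), (2.0, 4.0)]
q_locations = [(1.0, 1.0), (4.0, 0.0), (3.0, 5.0)]
q_weights = [0.2, 0.5, 0.3]
print(enn_via_centroid(data, q_locations, q_weights), enn_direct(data, q_locations, q_weights))
```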

Page 18

Rectilinear metric: Uncertain query

Similarly, the expected distance consists of linear pieces (their number is not preserved in this transcript).

(Table: preprocessing time, space, and query time; the bounds are not preserved in this transcript.)

Page 19

Euclidean metric ($\varepsilon$-ENN): Uncertain query

(Table: preprocessing time, space, and query time; the bounds are not preserved in this transcript.)

Remarks: Extends to higher dimensions