Nearest-Neighbor Searching Under Uncertainty
Wuzhou Zhang
RIP Final and Masters, March 22, 2012
Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman. To appear in PODS 2012.
Nearest-Neighbor Searching
Applications
• Pattern Recognition, Data Compression
• Statistical Classification, Clustering
• Databases, Information Retrieval
• Computer Vision, etc.
http://en.wikipedia.org/wiki/Nearest_neighbor_search
$S$: a set of points in $\mathbb{R}^d$
$q$: any query point in $\mathbb{R}^d$
Find the closest point $p^*$ of $S$ to $q$
Data Uncertainty
Location of data is imprecise: sensor databases, face recognition, mobile data, etc.
What is the “nearest neighbor” of $q$ now?
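Our Model and Problem Statement
Uncertain point $P$: represented as a probability density function (pdf) $f_P$
Expected distance: $\mathrm{Ed}(P, q) = \int f_P(p)\, d(p, q)\, dp$; for a discrete $P$ with locations $p_1, \dots, p_k$, $\mathrm{Ed}(P, q) = \sum_{j=1}^{k} w_j\, d(p_j, q)$
$w_j$: probabilities/weights; $d$: distance function
Let $\mathcal{P} = \{P_1, \dots, P_n\}$; find the expected nearest neighbor (ENN) of $q$: $\arg\min_{P \in \mathcal{P}} \mathrm{Ed}(P, q)$
Or an $\varepsilon$-ENN $P$: $\mathrm{Ed}(P, q) \le (1 + \varepsilon) \min_{P' \in \mathcal{P}} \mathrm{Ed}(P', q)$
Figure: a discrete uncertain point with four locations, weights 0.1, 0.2, 0.3, 0.4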
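The problem statement on this slide can be illustrated with a brute-force baseline. The sketch below assumes discrete uncertain points stored as hypothetical `(locations, weights)` pairs and a certain query `q`; the function names and the sample data are illustrative only, and the linear scan is not one of the indexing methods from the talk.

```python
import math

def expected_distance(locations, weights, q, dist):
    """Weighted average of dist(p, q) over the locations of one uncertain point."""
    return sum(w * dist(p, q) for p, w in zip(locations, weights))

def expected_nearest_neighbor(uncertain_points, q, dist=math.dist):
    """Linear scan: index of the uncertain point with the smallest expected distance."""
    return min(
        range(len(uncertain_points)),
        key=lambda i: expected_distance(*uncertain_points[i], q, dist),
    )

# Hypothetical data: each uncertain point is a (locations, weights) pair.
P1 = ([(0.0, 0.0), (4.0, 0.0)], [0.5, 0.5])
P2 = ([(3.0, 0.0)], [1.0])
q = (2.0, 0.0)
best = expected_nearest_neighbor([P1, P2], q)
```

Here `P1` has expected distance 2 to `q` and `P2` has expected distance 1, so the scan selects `P2`. Any distance function with the same two-argument shape can be passed as `dist`.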
Previous work and Our contribution
Previous work
• The expected $k$-NN under the $L_1$ metric: $\varepsilon$-approximation [Ljosa2007]
• Aggregate nearest neighbor (ANN) under the SUM function [Li2011, Sharifzadeh2010, Lian2008, etc.]
• All based on heuristics
Our contribution
• First nontrivial methods for answering exact or $\varepsilon$-approximate ENN queries with provable performance guarantees
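Summary of results
Distance function | Settings | Preprocessing time | Space | Query time
Squared Euclidean distance | Uncertain data | | |
Squared Euclidean distance | Uncertain query | | |
Rectilinear metric | Uncertain data | | |
Rectilinear metric | Uncertain query | | |
Euclidean metric ($\varepsilon$-ENN) | Uncertain data | | |
Euclidean metric ($\varepsilon$-ENN) | Uncertain query | | |
Results in $\mathbb{R}^2$; extends to higher dimensions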
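Voronoi Diagram
Voronoi cell: $\mathrm{Vor}(p_i) = \{x \mid d(x, p_i) \le d(x, p_j)\ \forall j\}$
Voronoi diagram $\mathrm{VD}(S)$: decomposition induced by the Voronoi cells
Preprocessing time
Space
Query time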
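Expected Voronoi Diagram
Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{x \mid \mathrm{Ed}(P_i, x) \le \mathrm{Ed}(P_j, x)\ \forall j\}$, where $\mathrm{Ed}$ is the expected distance
Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: induced by the expected Voronoi cells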
An example in the $L_1$ metric
Minimization diagram
The lower envelope of $F = \{f_1, \dots, f_n\}$: $L_F(x) = \min_i f_i(x)$
Minimization diagram $M(F)$: the projection of the graph of $L_F$
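A one-dimensional toy version may help make the lower envelope and its minimization diagram concrete. The sketch below samples the envelope on a hypothetical grid of points and merges consecutive samples won by the same function into intervals; it is a sampled illustration with made-up names, not an exact envelope algorithm.

```python
def lower_envelope(functions, x):
    """Return (value, index) of the function attaining the minimum at x.
    Ties go to the smallest index."""
    values = [f(x) for f in functions]
    i = min(range(len(values)), key=values.__getitem__)
    return values[i], i

def minimization_diagram_1d(functions, xs):
    """Label each sample with the index of the minimizing function and
    merge runs of equal labels into (index, first_x, last_x) cells."""
    cells = []
    for x in xs:
        _, i = lower_envelope(functions, x)
        if cells and cells[-1][0] == i:
            cells[-1][2] = x          # extend the current cell
        else:
            cells.append([i, x, x])   # start a new cell
    return [tuple(c) for c in cells]

# Two hypothetical distance functions on the line, to sites at 0 and 4.
fs = [lambda x: abs(x - 0.0), lambda x: abs(x - 4.0)]
xs = [k * 0.5 for k in range(9)]      # 0.0, 0.5, ..., 4.0
cells = minimization_diagram_1d(fs, xs)
```

With the functions taken to be expected distances to uncertain points, the cells of the minimization diagram are the expected Voronoi cells.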
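Squared Euclidean distance
Uncertain data
$\bar p_i$: the centroid of $P_i$
Then, the expected distance is $\mathrm{Ed}(P_i, q) = \|q - \bar p_i\|^2 + \sigma_i^2$, where $\sigma_i^2 = \mathrm{E}_{p \sim P_i} \|p - \bar p_i\|^2$ is the variance of $P_i$
• Replace $P_i$ by $\bar p_i$ with weight $\sigma_i^2$
• The expected Voronoi diagram is the same as the power diagram WPD of the weighted centroids
Preprocessing time
Space
Query time
Remarks: Works for any distribution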
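The reduction on this slide rests on the identity "expected squared distance = squared distance to the centroid + variance". The sketch below checks that identity numerically for a discrete distribution and answers a query by a linear scan over (centroid, variance) summaries. The function names and data are hypothetical, and the scan stands in for point location in a power diagram, which is not implemented here.

```python
def centroid_and_variance(locations, weights):
    """Weighted centroid and variance (expected squared distance to the centroid)."""
    cx = sum(w * x for (x, _), w in zip(locations, weights))
    cy = sum(w * y for (_, y), w in zip(locations, weights))
    var = sum(w * ((x - cx) ** 2 + (y - cy) ** 2)
              for (x, y), w in zip(locations, weights))
    return (cx, cy), var

def expected_sq_dist(locations, weights, q):
    """Expected squared Euclidean distance, computed directly from the definition."""
    return sum(w * ((x - q[0]) ** 2 + (y - q[1]) ** 2)
               for (x, y), w in zip(locations, weights))

def enn_squared_euclidean(summaries, q):
    """Index minimizing |q - centroid|^2 + variance over (centroid, variance) pairs."""
    def power(i):
        (cx, cy), var = summaries[i]
        return (cx - q[0]) ** 2 + (cy - q[1]) ** 2 + var
    return min(range(len(summaries)), key=power)

# Hypothetical uncertain points as (locations, weights) pairs.
P1 = ([(0.0, 0.0), (4.0, 0.0)], [0.5, 0.5])   # centroid (2, 0), variance 4
P2 = ([(3.0, 1.0)], [1.0])                    # centroid (3, 1), variance 0
summaries = [centroid_and_variance(*P) for P in (P1, P2)]
q = (2.0, 0.0)
best = enn_squared_euclidean(summaries, q)
```

The query sits exactly on the centroid of `P1`, yet `P2` wins because the variance of `P1` is added to its score. After preprocessing, each uncertain point is represented by a single weighted point regardless of how many locations it has.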
Rectilinear metric
Uncertain data
Assume the $L_1$ metric
Size of the expected Voronoi diagram:
Lower bound construction
$\alpha(\cdot)$: the inverse Ackermann function
Remarks: Extends to the $L_\infty$ metric
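Rectilinear metric
Uncertain data (cont.)
A near-linear size index exists despite the size of the expected Voronoi diagram
The expected distance to $P_i$ is made of linear pieces!
In the four quadrants around a location $p_{ij}$ of $P_i$, the distance $d(p_{ij}, q) = |x_{p_{ij}} - x_q| + |y_{p_{ij}} - y_q|$ equals one of:
• $-(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $-(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$
• $(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$
Linear!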
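The "Linear!" observation can be checked numerically. The sketch below, with hypothetical names and data, assumes a discrete uncertain point and the $L_1$ metric. It derives the coefficients of the linear piece that is valid in the cell containing a query, where cells are formed by the horizontal and vertical lines through the locations, and compares the result with the direct expected distance.

```python
def expected_l1(locations, weights, q):
    """Expected L1 distance from a discrete uncertain point to q."""
    return sum(w * (abs(x - q[0]) + abs(y - q[1]))
               for (x, y), w in zip(locations, weights))

def linear_piece(locations, weights, q):
    """Coefficients (a, b, c) such that the expected L1 distance equals
    a*x + b*y + c for every point (x, y) in the same grid cell as q."""
    a = b = c = 0.0
    for (x, y), w in zip(locations, weights):
        sx = 1.0 if q[0] >= x else -1.0   # sign of (x_q - x) in this cell
        sy = 1.0 if q[1] >= y else -1.0   # sign of (y_q - y) in this cell
        a += w * sx
        b += w * sy
        c -= w * (sx * x + sy * y)
    return a, b, c

# Hypothetical uncertain point with two locations.
locations = [(0.0, 0.0), (2.0, 2.0)]
weights = [0.25, 0.75]
q1 = (1.0, 1.0)
q2 = (0.5, 1.5)                           # same grid cell as q1
a, b, c = linear_piece(locations, weights, q1)
```

Evaluating `a*x + b*y + c` at `q2` gives the same value as `expected_l1(locations, weights, q2)`, because both queries lie in the same cell and the signs of all absolute values are fixed there.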
Rectilinear metric
Uncertain data (cont.)
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions
Euclidean metric ($\varepsilon$-ENN)
Uncertain data
Approximate by
Outside the grid:
Inside the grid:
Total # of cells:
: outermost square
: the collection of squares
Remarks: Extends to any metric
Euclidean metric ($\varepsilon$-ENN)
Uncertain data (cont.)
and and : generated by Arya’s data structure on
A linear-size approximate EVD!
Quadtree: 4-way tree
Preprocessing time
Space
Query time
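For readers unfamiliar with the "4-way tree" mentioned here, a minimal region quadtree with point location can be sketched as follows. The class and method names are hypothetical, and the sketch shows only the generic data structure; it does not reproduce the cell construction or the approximation scheme of the talk.

```python
class QuadtreeNode:
    """A square cell; internal nodes have exactly four children."""

    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size   # lower-left corner, side length
        self.children = None                     # None marks a leaf

    def subdivide(self):
        """Split the square into four equal quadrants (index = 2*row + column)."""
        h = self.size / 2
        self.children = [
            QuadtreeNode(self.x + dx * h, self.y + dy * h, h)
            for dy in (0, 1) for dx in (0, 1)
        ]

    def locate(self, qx, qy):
        """Descend to the leaf whose square contains (qx, qy)."""
        node = self
        while node.children is not None:
            h = node.size / 2
            ix = 1 if qx >= node.x + h else 0
            iy = 1 if qy >= node.y + h else 0
            node = node.children[2 * iy + ix]
        return node

# Hypothetical 8x8 root, refined once in its upper-right quadrant.
root = QuadtreeNode(0, 0, 8)
root.subdivide()
root.children[3].subdivide()
leaf = root.locate(5, 7)
```

Point location costs one comparison pair per level, so a query takes time proportional to the depth of the leaf that contains it.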
Further work
Is there a linear-size index to answer the following queries in sublinear time in the worst case?
• the nearest neighbor with highest probability
• the nearest neighbors with probability higher than a given threshold
THANKS
Squared Euclidean distance
Uncertain query
$\bar q$: the centroid of the uncertain query $Q$
Preprocessing
• Compute the Voronoi diagram VD of $S$
Query
• Given $Q$, compute $\bar q$, then query VD with $\bar q$
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions and works for any distribution
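The query procedure works because, under squared Euclidean distance, the expected distance from any data point to the uncertain query exceeds the squared distance to the query's centroid by the same constant (the variance of the query). The sketch below checks this on hypothetical data; a linear scan stands in for the Voronoi diagram lookup, and all names are illustrative.

```python
def centroid(locations, weights):
    """Weighted centroid of a discrete uncertain query."""
    return (sum(w * x for (x, _), w in zip(locations, weights)),
            sum(w * y for (_, y), w in zip(locations, weights)))

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def enn_via_centroid(S, locations, weights):
    """Nearest data point to the centroid (scan in place of a Voronoi query)."""
    qbar = centroid(locations, weights)
    return min(range(len(S)), key=lambda i: sq_dist(S[i], qbar))

def enn_brute_force(S, locations, weights):
    """Minimize the expected squared distance directly from the definition."""
    return min(
        range(len(S)),
        key=lambda i: sum(w * sq_dist(S[i], q) for q, w in zip(locations, weights)),
    )

# Hypothetical certain data points and one uncertain query.
S = [(0.0, 0.0), (5.0, 0.0), (2.0, 3.0)]
Q_locations = [(1.0, 0.0), (3.0, 0.0), (2.0, 4.0)]
Q_weights = [0.25, 0.25, 0.5]
fast = enn_via_centroid(S, Q_locations, Q_weights)
slow = enn_brute_force(S, Q_locations, Q_weights)
```

Both functions return the same index. Only the centroid computation depends on the number of query locations; the lookup itself is an ordinary nearest-neighbor query.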
Rectilinear metric
Uncertain query
Similarly, linear pieces
Preprocessing time
Space
Query time
Euclidean metric ($\varepsilon$-ENN)
Uncertain query
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions