Navigating Nets: Simple algorithms for proximity search
Robert Krauthgamer (IBM Almaden)
Joint work with James R. Lee (UC Berkeley)

A classical problem
Fix a metric space (X,d):
X = set of points.
d = distance function over X.
Near-neighbor search (NNS) [Minsky-Papert]:
1. Preprocess a given n-point subset S ⊆ X.
2. Given a query point q ∈ X, quickly compute the closest point to q among S.

Variations on NNS
(1+ε)-approximate nearest neighbor search: Find a ∈ S such that d(q,a) ≤ (1+ε) · d(q,S).
Dynamic case: Allow updates to S (insertions and deletions).
Distributed case: No central index (e.g., nodes in a network). Other cost measures (e.g., communication, stretch, load).
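As a baseline for the definitions above, here is a minimal Python sketch of exact NNS by linear scan, together with a check of the (1+ε)-approximation condition. The function names and the `dist` callback (standing in for the distance oracle d) are illustrative assumptions, not part of the talk.

```python
def nearest(q, S, dist):
    """Exact NNS by linear scan: one distance evaluation per point of S."""
    return min(S, key=lambda s: dist(q, s))

def is_approx_nn(a, q, S, dist, eps):
    """True if a is a point of S with d(q, a) <= (1 + eps) * d(q, S)."""
    d_q_S = dist(q, nearest(q, S, dist))
    return a in S and dist(q, a) <= (1 + eps) * d_q_S

# Toy metric (an assumption for illustration): points on a line.
line = lambda x, y: abs(x - y)
S = [0, 4, 9, 20]
q = 6
print(nearest(q, S, line))
print(is_approx_nn(9, q, S, line, eps=0.5))
```

The linear scan costs n distance evaluations per query, which is the cost the rest of the talk tries to beat.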

General metrics
Only oracle access to the distance function d(·,·).
Models a complicated metric or on-demand measurement.
No “hashing of coordinates” or tuning for a specific metric.
Goal: efficient query (sublinear or polylog time).
Impossible, even if the data set S is a path metric:
[Figure: a path metric on n points]
What about approximate NNS?

Approximate NNS
Hard even for (near) uniform metrics: d(x,y) = 1 for all distinct x,y ∈ S.
[Figure: a uniform metric with all distances equal to 1]
But many data sets lack large uniform subsets.
Can we quantify this?

Abstract dimension
The doubling constant λ_X of a metric (X,d) is the minimum λ such that every ball can be covered by λ balls of half the radius.
The metric is doubling if λ_X = O(1).
The (abstract) dimension is dim_A(X) = log_2 λ_X.
Immediate properties:
dim_A(R^d, ||·||_2) = O(d).
dim_A(X’) ≤ dim_A(X) for all X’ ⊆ X.
dim_A(X) ≤ log |X|. (Equality for a uniform metric.)
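To make the definition concrete, the following Python sketch computes an upper estimate of the doubling constant of a small finite metric. It covers each ball greedily with half-radius balls centered at points of the ball itself; that greedy rule, the function names and the toy metrics are all assumptions for illustration. Because the greedy cover need not be minimal, the result can overshoot the true constant.

```python
import math

def greedy_cover_size(ball, radius, dist):
    """Number of balls of the given radius used by one greedy pass over `ball`."""
    centers = []
    for p in ball:
        if all(dist(p, c) > radius for c in centers):
            centers.append(p)          # p is not covered yet, so open a new ball at p
    return len(centers)

def doubling_estimate(X, dist):
    """Largest greedy cover size over all centers x and all observed radii r."""
    radii = sorted({dist(x, y) for x in X for y in X if x != y})
    worst = 1
    for x in X:
        for r in radii:
            ball = [p for p in X if dist(x, p) <= r]
            worst = max(worst, greedy_cover_size(ball, r / 2, dist))
    return worst

line = lambda x, y: abs(x - y)               # 16 points on a line
uniform = lambda x, y: 0 if x == y else 1    # uniform metric on 8 points
print(doubling_estimate(list(range(16)), line))
print(math.log2(doubling_estimate(list(range(8)), uniform)))
```

On the uniform metric every half-radius ball holds a single point, so the estimate equals the number of points, in line with the remark that dim_A(X) = log |X| there.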

Illustration
Grid with missing piece
Low-dimensional manifold (bounded curvature)
Union of curves in Euclidean space

Embedding doubling metrics
Theorem [Assouad, 1983] [Gupta, K., Lee, 2003]: Fix 0<ε<1, and let (X,d) be a doubling metric. Then (X,d^ε) can be embedded with O(1) distortion into l_2^{O(1)}.
Not true for ε=1 [Semmes, 1996].
Motivation: Embed S and then apply Euclidean NNS.

Our results
Simple data structure for maintaining S:
(1+ε)-NNS query time: (1/ε)^{O(dim(S))} · log Δ (for ε < ½), where Δ = d_max/d_min is the normalized diameter of S (typically Δ = n^{O(1)}).
Space: n · 2^{O(dim(S))}.
Dynamic maintenance of S:
Insertion / deletion time: 2^{O(dim(S))} · log Δ · loglog Δ.
Additional properties:
Best possible dependency on dim(S) (in a certain model).
Oblivious to dim(S) and robust against “bad localities”.
Matches/improves known (more specialized) results.

Nets
Definition: An r-net of X is a subset Y with
1. d(y_1,y_2) ≥ r for all distinct y_1,y_2 ∈ Y.
2. d(x,Y) < r for all x ∈ X \ Y.
(I.e., a maximal r-separated subset.)
Note: Compare vs. ε-net.
Running example – a path metric:
[Figure: an 8-net, a 4-net, and a 16-net of the path]
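The definition of an r-net can be checked mechanically. The Python sketch below builds an r-net with a single greedy pass and verifies both conditions. The greedy construction, the function names and the 32-point path used as a stand-in for the running example are assumptions for illustration.

```python
def greedy_net(X, r, dist):
    """Scan X once, keeping each point at distance >= r from all points kept so far."""
    Y = []
    for x in X:
        if all(dist(x, y) >= r for y in Y):
            Y.append(x)
    return Y

def is_r_net(Y, X, r, dist):
    """Check both conditions of the definition (Y is assumed to be a subset of X)."""
    separated = all(dist(a, b) >= r for i, a in enumerate(Y) for b in Y[i + 1:])
    covering = all(any(dist(x, y) < r for y in Y) for x in X if x not in Y)
    return separated and covering

line = lambda x, y: abs(x - y)
path = list(range(32))                 # hypothetical path metric: points 0..31
for r in (4, 8, 16):
    Y = greedy_net(path, r, line)
    print(r, Y, is_r_net(Y, path, r, line))
```

Any maximal r-separated subset satisfies the definition; the greedy pass is only one convenient way to obtain one.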

More nets
[Figure: the nets Y_r]

The data structure
For every r = 2^i, let Y_r be an r-net of S. Only O(log Δ) values of r are non-trivial.
[Figure: a 16-net, an 8-net, and a 4-net]
For every y ∈ Y_r maintain a navigation list
L_{y,r} = {z ∈ Y_{r/2} : d(y,z) ≤ 2r}
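A static version of this structure can be sketched in a few lines of Python. The sketch assumes at least two distinct points, builds each Y_r by a greedy pass over S, and recomputes the navigation lists by brute force. The names `build_nets`, `nets` and `lists` are illustrative, and the dynamic maintenance from the talk is not shown.

```python
def greedy_net(S, r, dist):
    """Greedy r-net of S: keep each point at distance >= r from all points kept so far."""
    Y = []
    for x in S:
        if all(dist(x, y) >= r for y in Y):
            Y.append(x)
    return Y

def build_nets(S, dist):
    """Return (nets, lists) with nets[r] = Y_r for r = 2^i and lists[(y, r)] = L_{y,r}."""
    gaps = [dist(a, b) for i, a in enumerate(S) for b in S[i + 1:]]
    d_min, d_max = min(gaps), max(gaps)
    r = 1.0
    while r <= d_max:                  # coarsest scale: Y_r is a single point
        r *= 2
    nets = {}
    while True:
        nets[r] = greedy_net(S, r, dist)
        if r <= d_min:                 # finest scale: Y_r is all of S
            break
        r /= 2
    lists = {(y, r): [z for z in nets[r / 2] if dist(y, z) <= 2 * r]
             for r in nets if r / 2 in nets
             for y in nets[r]}
    return nets, lists

line = lambda x, y: abs(x - y)
nets, lists = build_nets(list(range(32)), line)   # hypothetical 32-point path
for r in sorted(nets, reverse=True):
    print(r, len(nets[r]))
print(max(len(L) for L in lists.values()))
```

Only the scales between d_min and d_max are materialized, which is where the O(log Δ) count of non-trivial values of r comes from.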

More on the data structure
[Figure labels: 3r, Y_{r/2}, Y_r]

Space requirement
Lemma: |L_{y,r}| ≤ 2^{O(dim(S))} for all y ∈ Y_r, r ≥ 0.
Proof:
L_{y,r} is contained in a ball of radius 2r.
This ball can be covered by λ_S^3 balls of radius r/4.
Every point in L_{y,r} ⊆ Y_{r/2} must be covered by a distinct ball.
Hence, |L_{y,r}| ≤ λ_S^3 = 2^{3 dim(S)}.
Corollary: Total space is 2^{O(dim(S))} · n · log Δ.
We actually improve it to 2^{O(dim(S))} · n.

Back to running example
[Figure: a 16-net, an 8-net, and a 4-net]

Navigating nets
Let q denote the query point.
Initially z_16 = the only point in Y_16.
Find z_8 = closest Y_8 point to q.
Find z_4 = closest Y_4 point to q, etc.

How to find z_{r/2}?
Assume each z_r ∈ Y_r is the closest point to a (instead of to q).
Then d(z_r,z_{r/2}) ≤ r + r/2 = 3r/2.
And z_{r/2} must be in z_r’s list L_{z_r,r}.
[Figure: points q, a, z_r and z_{r/2}, with distance labels ≤ r, ≤ r/2 and ≤ r/4]
For z_r to be the closest Y_r point to q, it suffices that d(q,a) ≤ r/4.
And then z_r’s list L_{z_r,r} contains z_{r/2}.
Note: d(q,z_r) ≤ 3r/2.
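One way to fill in the distances on this slide is through the triangle inequality. The derivation assumes that a denotes the point of S closest to q, that z_r and z_{r/2} are the net points closest to q, and that d(q,a) ≤ r/4. These readings are assumptions, since the slide does not spell them out.

```latex
\begin{align*}
d(q, z_r)       &\le d(q,a) + d(a, Y_r)       \le \tfrac{r}{4} + r            = \tfrac{5r}{4} \le \tfrac{3r}{2},\\
d(q, z_{r/2})   &\le d(q,a) + d(a, Y_{r/2})   \le \tfrac{r}{4} + \tfrac{r}{2} = \tfrac{3r}{4},\\
d(z_r, z_{r/2}) &\le d(q, z_r) + d(q, z_{r/2}) \le \tfrac{5r}{4} + \tfrac{3r}{4} = 2r .
\end{align*}
```

The last bound matches the radius 2r of the navigation list, and the first gives the note d(q,z_r) ≤ 3r/2.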

Stopping point
If we find a point z_r with d(q,z_r) ≤ 3r/2,
but not a point z_{r/2} with d(q,z_{r/2}) ≤ 3r/4,
we know that d(q,S) > r/4,
yielding 6-NNS with query time 2^{O(dim(S))} · log Δ.
This can be extended to (1+ε)-NNS.
Similar principles yield insertions and deletions.
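The descent and its stopping rule can be sketched in Python as follows. This is a static illustration under stated assumptions: the nets are built greedily, the lists are recomputed by brute force, the points are integer grid points under the L1 metric so that all comparisons are exact, and all names (`build`, `query`, `l1`) are hypothetical. It implements only the 6-approximate search, not the (1+ε) extension or the updates.

```python
import random

def l1(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_net(S, r, dist):
    Y = []
    for x in S:
        if all(dist(x, y) >= r for y in Y):
            Y.append(x)
    return Y

def build(S, dist):
    """nets[r] = Y_r for r = 2^i; lists[(y, r)] = {z in Y_{r/2} : d(y, z) <= 2r}."""
    gaps = [dist(a, b) for i, a in enumerate(S) for b in S[i + 1:]]
    r = 1.0
    while r <= max(gaps):              # coarsest scale: a single net point
        r *= 2
    nets = {}
    while True:
        nets[r] = greedy_net(S, r, dist)
        if r <= min(gaps):             # finest scale: the net is all of S
            break
        r /= 2
    lists = {(y, r): [z for z in nets[r / 2] if dist(y, z) <= 2 * r]
             for r in nets if r / 2 in nets for y in nets[r]}
    return nets, lists

def query(q, nets, lists, dist):
    """Descend from the coarsest net; stop when no list point is within 3r/4 of q."""
    r = max(nets)
    z = nets[r][0]
    while (z, r) in lists:
        best = min(lists[(z, r)], key=lambda c: dist(q, c))
        if dist(q, best) > 3 * r / 4:  # stopping rule: d(q, S) > r/4
            break
        z, r = best, r / 2
    return z

rng = random.Random(0)
S = sorted({(rng.randrange(64), rng.randrange(64)) for _ in range(40)})
nets, lists = build(S, l1)
q = (10, 50)
found = query(q, nets, lists, l1)
exact = min(S, key=lambda s: l1(q, s))
print(l1(q, found), l1(q, exact))
```

Each step inspects a single navigation list, so the work per scale is bounded by the list size and the number of steps by the number of scales.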

Near-optimality
The basic idea:
Consider a uniform metric on λ points.
Let the query point be at distance 1 from all of them, except for one point whose distance is 1-ε.
Finding this point requires (in an oracle model) computing all distances to q.
Can happen at every distance scale r.
We get a lower bound of 2^{Ω(dim(S))} · log Δ.

Related work – general metrics
Let K_X be the smallest K such that
|B(x,r)| ≤ K · |B(x,r/2)| for all x ∈ X, r ≥ 0.
Define the KR-dimension as log_2 K_X.
Randomized exact NNS [Karger-Ruhl’02, Hildrum et al.’04]:
Space: n · 2^{O(dim(S))} · log Δ.
Query time: 2^{O(dim(S))} · log Δ.
If dim_KR(S) = O(1) the log Δ term is actually O(log n).
Our results extend to this setting:
1. KR-metrics are doubling: dim(X) ≤ 4 dim_KR(X).
2. Our algorithms actually give exact NNS.
Assumptions on query distribution [Clarkson’99].

Related work – Euclidean metrics
Exact NNS for R^d: O(d^5 log n) query time and O(n^{d+δ}) space [Meiser’93].
ε-NNS for R^d: O((d/ε)^d log n) query time and O(dn) space by quad-tree like decompositions [AMNSW’94]. Our algorithm achieves similar bounds.
O(d polylog(dn)) query time and (dn)^{O(1)} space is useful for higher dimensions [IM’98, KOR’98].

Concluding remarks
Our approach: A “decision tree” that is not really a tree (saves space).
In progress:
A different (static) scheme where log Δ is replaced by log n.
Bounds on the help of “ambient” space points.
Our data structure yields a spanner of the metric.
Immediate: O(1) stretch with average degree 2^{dim(S)}.
More work: O(1) stretch with maximum degree 2^{dim(S)}.
[Guibas,’04] applied the nets data structure for moving points in the plane.