Nearest Neighbor Search in High Dimensions
Seminar in Algorithms and Geometry
Mica Arie-Nachimson and Daniel Glasner
April 2009
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion

Main results: Indyk and Motwani, 1998; Gionis, Indyk and Motwani, 1999
Nearest Neighbor Problem
• Input: A set P of points in R^d (or any metric space).
• Output: Given a query point q, find the point p* in P which is closest to q.
[Figure: a query q and its nearest neighbor p*]
What is it good for? Many things! Examples:
• Optical Character Recognition
• Spell Checking
• Computer Vision
• DNA sequencing
• Data compression
[Figure: OCR — handwritten digits (2, 3, 8, 7, 4, …) in feature space, with a query digit]
[Figure: spell checking — the query "abaut" near the words about, boat, bat, abate, able, scout, shout in feature space]
And many more…
Approximate Nearest Neighbor (ε-NN)
• Input: A set P of points in R^d (or any metric space).
• Given a query point q, let:
  – p* be the point in P closest to q
  – r* be the distance ||p*-q||
• Output: Some point p' with distance at most r*(1+ε)
[Figure: query q, nearest neighbor p* at distance r*, and the ball of radius r*(1+ε) around q]
Approximate vs. Exact Nearest Neighbor
• Many applications give similar results with approximate NN
• Example from Computer Vision: Retiling
  – Exact NNS ~27 sec
  – Approximate NNS ~0.6 sec
Slide from Lihi Zelnik-Manor
Solution Method
• Input: A set P of n points in R^d.
• Method: Construct a data structure to answer nearest neighbor queries
• Complexity
  – Preprocessing: space and time to construct the data structure
  – Query: time to return answer
Solution Method
• Naïve approach:
  – Preprocessing O(nd)
  – Query time O(nd)
• Reasonable requirements:
  – Preprocessing time and space poly(nd).
  – Query time sublinear in n.
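The naïve approach above is just a linear scan; a minimal sketch in Python (the function name and toy point set are ours):

```python
import math

def nearest_neighbor(P, q):
    """Naive NN: scan all n points in R^d -- O(nd) per query, no preprocessing."""
    best, best_dist = None, math.inf
    for p in P:
        d = math.dist(p, q)  # Euclidean distance
        if d < best_dist:
            best, best_dist = p, d
    return best, best_dist

# Example: 3 points in R^2
P = [(0.0, 0.0), (5.0, 5.0), (2.0, 1.0)]
print(nearest_neighbor(P, (1.0, 1.0)))  # -> ((2.0, 1.0), 1.0)
```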
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion
Classical nearest neighbor methods
• Tree structures
  – kd-trees
• Voronoi Diagrams
  – Preprocessing poly(n), exp(d)
  – Query log(n), exp(d)
• Difficult problem in high dimensions
  – The solutions still work, but are exp(d)…
KD-tree
• d=1 (binary search tree)
[Figure: a binary search tree over the points 7, 8, 10, 12, 13, 15, 18]
KD-tree
• d=1 (binary search tree)
[Figure: the same tree with query 17 — the search descends to 18, min dist = 1]

KD-tree
• d=1 (binary search tree)
[Figure: the same tree with query 16 — the search first reaches 18 (min dist = 2), then backtracks to 15 (min dist = 1)]
KD-tree
• d>1: alternate between dimensions
• Example: d=2
[Figure: 2-d kd-tree over the points (12,5), (6,8), (17,4), (23,2), (20,10), (9,9), (1,6) — a vertical split on x separates {(12,5),(6,8),(1,6),(9,9)} from {(17,4),(23,2),(20,10)}; subsequent splits alternate between y and x]
KD-tree: complexity
• Preprocessing O(nd)
• Query
  – O(logn) if points are randomly distributed
  – worst case O(k·n^(1-1/k)), almost linear when n is close to k
• Need to search the whole tree
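The d>1 construction and the backtracking query can be sketched as follows (a toy implementation, not the course's code; median splits and the plane-distance pruning test are the standard choices):

```python
import math

def build(points, depth=0):
    """Build a kd-tree: split on dimension (depth mod d) at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def query(node, q, best=None):
    """Descend toward q, then backtrack into the far subtree only when the
    splitting plane is closer than the best distance found so far."""
    if node is None:
        return best
    d = math.dist(node["point"], q)
    if best is None or d < best[1]:
        best = (node["point"], d)
    diff = q[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = query(near, q, best)
    if abs(diff) < best[1]:   # in the worst case this prunes nothing
        best = query(far, q, best)
    return best

pts = [(12, 5), (6, 8), (17, 4), (23, 2), (20, 10), (9, 9), (1, 6)]  # the d=2 example above
print(query(build(pts), (11, 5)))  # -> ((12, 5), 1.0)
```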
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion
Sublinear solutions

Method      Query time                          Preprocessing
Bucketing   O(logn)                             n^O(1/ε²)
LSH         O(n^(1/(1+ε))) [sqrt(n) at ε=1]     O(n^(1+1/(1+ε))) [n^(3/2) at ε=1]

(All bounds are linear in d; logn factors not counted.)
Solve ε-NN by reduction

r-PLEB (Point Location in Equal Balls)
• Given n balls of radius r, for every query q, find a ball that it resides in, if one exists.
• If q doesn't reside in any ball, return NO.
[Figures: q inside the ball around p1 — return p1; q outside all balls — return NO]
Reduction from ε-NN to r-PLEB
• The two problems are connected
  – r-PLEB is like a decision problem for ε-NN
Reduction from ε-NN to r-PLEB: Naïve Approach
• Set R = the ratio between the largest and smallest distance between 2 points
• Define r = {(1+ε)^0, (1+ε)^1, …, R}
• For each r_i construct an r_i-PLEB
• Given q, find the smallest r_i which gives a YES
  – Use binary search
[Figure: nested balls of the r1-PLEB, r2-PLEB, r3-PLEB]
Reduction from ε-NN to r-PLEB: Naïve Approach
• Correctness
  – Stopped at r_i = (1+ε)^k; r_(i+1) = (1+ε)^(k+1)
  – (1+ε)^k ≤ r* ≤ (1+ε)^(k+1), so the returned point is within a (1+ε) factor of r*
Reduction from ε-NN to r-PLEB: Naïve Approach
Reduction overhead:
• Space: O(log_(1+ε) R) r-PLEB constructions
  – Size of {(1+ε)^0, (1+ε)^1, …, R} is log_(1+ε) R
• Query: O(log log_(1+ε) R) calls to r-PLEB
• Drawback: dependency on R
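The naïve reduction can be sketched with a generic oracle; here `pleb(r, q)` (a name of our choosing) answers the decision problem by returning a point in whose r-ball q lies, or None, and binary search over the radius grid finds the smallest YES:

```python
import math

def approx_nn_via_pleb(pleb, q, R, eps):
    """Binary search over the grid {(1+eps)^0, ..., >= R} for the smallest
    radius at which pleb(r, q) answers YES; YES answers are monotone in r."""
    m = math.ceil(math.log(R, 1 + eps))
    radii = [(1 + eps) ** i for i in range(m + 1)]
    lo, hi, answer = 0, len(radii) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        p = pleb(radii[mid], q)
        if p is not None:        # YES: try a smaller radius
            answer, hi = p, mid - 1
        else:                    # NO: need a larger radius
            lo = mid + 1
    return answer

# Toy oracle over a fixed point set (a real one would be an r-PLEB structure):
P = [(0.0, 0.0), (10.0, 0.0)]
pleb = lambda r, q: next((p for p in P if math.dist(p, q) <= r), None)
print(approx_nn_via_pleb(pleb, (1.2, 0.0), R=100.0, eps=0.5))  # -> (0.0, 0.0)
```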
Reduction from ε-NN to r-PLEB: Better Approach [Har-Peled 2001]
• Set r_med as the radius which gives n/2 connected components (C.C.)
• Set r_top = 4n·r_med·log(n)/ε
[Figure: balls of radius r_med around the points, and the much larger radius r_top]
Reduction from ε-NN to r-PLEB: Better Approach
• If q ∉ B(p_i, r_med) for every i but q ∈ B(p_i, r_top) for some i: set R = r_top/r_med and perform binary search on r = {(1+ε)^0, (1+ε)^1, …, R}
  – R independent of input points
• If q ∉ B(p_i, r_top) ∀ i then q is "far away"
  – Enough to choose one point from each C.C. and continue recursively with these points (accumulating error ≤ 1+ε/3)
• If q ∈ B(p_i, r_med) for some i then continue recursively on the C.C. (at most half of the points)
• Complexity overhead: how many r-PLEB queries?
  – Binary search: O(log log_(1+ε) R) = O(log(n/ε))
  – Total: O(logn)
(r,ε)-PLEB (Point Location in Equal Balls)
• Given n balls of radius r, for query q:
  – If q resides in a ball of radius r, return the ball.
  – If q doesn't reside in any ball, return NO.
  – If q resides only in the "border" of a ball (within radius r(1+ε) but not r), return either the ball or NO.
[Figures: q inside a ball — return p1; q outside all balls — return NO; q in a border — return YES or NO]
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion
Bucketing Method [Indyk and Motwani, 1998 — solves (r,ε)-PLEB]
• Apply a grid of cell size εr/sqrt(d)
• Every ball is covered by at most k cubes
  – Can show that k ≤ C^d/ε^d for some constant C < 5
• kn cubes cover all balls
• Finite number of cubes: can use a hash table
  – Key: cube, Value: a ball it covers
• Space required: O(nk)
Bucketing Method
• Given query q
• Compute the cube it resides in [O(d)]
• Find the ball this cube intersects [O(1)]
• That ball's point is an answer to the (r,ε)-PLEB query for q
[Figure: the grid of cell size εr/sqrt(d) over the balls; depending on q's cube the answer is NO, YES, or YES-or-NO]
Bucketing Method: Complexity
• Space required: O(nk) = O(n·(1/ε)^d)
• Query time: O(d)
• If d = O(logn) [or n = O(2^d)]
  – Space required: n^O(log(1/ε))
• Else use dimensionality reduction in l2 from d to O(ε^(-2)·logn) [Johnson-Lindenstrauss lemma]
  – Space: n^O(1/ε²)
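A toy version of the bucketing structure; the cell size εr/sqrt(d) and the cube→ball hash table follow the slides, everything else (names, enumerating the bounding box of each ball rather than exactly its k covering cubes) is our simplification:

```python
import math
from itertools import product

def build_buckets(P, r, eps):
    """Hash every grid cube near some ball B(p, r) to one such p.
    Cell side is eps*r/sqrt(d), so a cube has diameter eps*r."""
    d = len(P[0])
    cell = eps * r / math.sqrt(d)
    table = {}
    for p in P:
        # enumerate cubes overlapping the bounding box of B(p, r)
        ranges = [range(math.floor((x - r) / cell), math.floor((x + r) / cell) + 1)
                  for x in p]
        for cube in product(*ranges):
            table.setdefault(cube, p)
    return table, cell

def lookup(table, cell, q):
    """O(d) to locate q's cube, O(1) expected for the hash lookup."""
    cube = tuple(math.floor(x / cell) for x in q)
    return table.get(cube)  # a ball's center point, or None (NO)

P = [(0.0, 0.0), (10.0, 10.0)]
table, cell = build_buckets(P, r=1.0, eps=0.5)
print(lookup(table, cell, (0.2, 0.1)))   # inside B((0,0), 1) -> (0.0, 0.0)
print(lookup(table, cell, (5.0, 5.0)))   # far from both balls -> None
```

The cube enumeration is exponential in d, which is exactly why the slide caps d at O(logn) via Johnson-Lindenstrauss.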
Break
Talk Outline
• Nearest neighbor problem
  – Motivation
• Classical nearest neighbor methods
  – KD-trees
• Efficient search in high dimensions
  – Bucketing method
  – Locality Sensitive Hashing
• Conclusion
Locality Sensitive Hashing
• Indyk & Motwani 98; Gionis, Indyk & Motwani 99
• A solution for (r,ε)-PLEB.
• Probabilistic construction; a query succeeds with high probability.
• Use random hash functions g: X → U (some finite range).
• Preserve "separation" of "near" and "far" points with high probability.
Locality Sensitive Hashing
• If ||p-q|| ≤ r, then Pr[g(p)=g(q)] is "high"
• If ||p-q|| > (1+ε)r, then Pr[g(p)=g(q)] is "low"
[Figure: hash tables g1, g2, g3, … — points within distance r tend to share a bucket]
A locality sensitive family
• A family H of functions h: X → U is called (P1,P2,r,(1+ε)r)-sensitive for metric d_X if for any p,q:
  – if ||p-q|| < r then Pr[ h(p)=h(q) ] > P1
  – if ||p-q|| > (1+ε)r then Pr[ h(p)=h(q) ] < P2
• For this notion to be useful we require P1 > P2
Intuition
• if ||p-q|| < r then Pr[ h(p)=h(q) ] > P1
• if ||p-q|| > (1+ε)r then Pr[ h(p)=h(q) ] < P2
[Illustration from Lihi Zelnik-Manor: two hash functions h1, h2 partitioning the plane]
Claim
• If there is a (P1,P2,r,(1+ε)r)-sensitive family for d_X then there exists an algorithm for (r,ε)-PLEB in d_X with
  – Space: O(dn + n^(1+ρ))
  – Query: O(d·n^ρ)
  where ρ = log(1/P1)/log(1/P2)
• When ε = 1: space O(dn + n^(3/2)), query O(d·sqrt(n)) (up to logn factors)
Algorithm – preprocessing
• For i = 1,…,L
  – Uniformly select k functions h1,…,hk from H
  – Set gi(p) = (h1(p), h2(p), …, hk(p))
[Figure: for hi : R^d → {0,1}, gi maps one point to (0,0,…,1) and another to (1,0,…,0)]
Algorithm – preprocessing
• For i = 1,…,L
  – Uniformly select k functions from H
  – Set gi(p) = (h1(p), h2(p), …, hk(p))
  – Compute gi(p) for all p ∈ P
  – Store resulting values in a hash table
Algorithm – query
• S ← ∅, i ← 1
• While |S| ≤ 2L:
  – S ← S ∪ {points in bucket gi(q) of table i}
  – If ∃ p ∈ S s.t. ||p-q|| ≤ (1+ε)r, return p and exit.
  – i++
• Return NO.
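A compact sketch of the whole scheme for binary vectors, using coordinate projections as the hash family (the family for the Hamming cube discussed later in the talk); all names and the parameter values are illustrative, not the k, L of the analysis:

```python
import random

def build_lsh(P, k, L, d, seed=0):
    """L hash tables; g_i concatenates k randomly chosen coordinate projections."""
    rng = random.Random(seed)
    gs = [rng.sample(range(d), k) for _ in range(L)]
    tables = []
    for g in gs:
        t = {}
        for p in P:
            t.setdefault(tuple(p[j] for j in g), []).append(p)
        tables.append(t)
    return gs, tables

def lsh_query(gs, tables, q, r, eps, L):
    """Scan buckets g_i(q); accept any (1+eps)r-near candidate,
    give up after seeing 2L candidates (as on the slide)."""
    seen = 0
    for g, t in zip(gs, tables):
        for p in t.get(tuple(q[j] for j in g), []):
            seen += 1
            if sum(a != b for a, b in zip(p, q)) <= (1 + eps) * r:
                return p
            if seen > 2 * L:
                return None
    return None

P = [(0, 0, 0, 0, 0, 0), (1, 1, 1, 1, 1, 1)]
gs, tables = build_lsh(P, k=3, L=5, d=6)
print(lsh_query(gs, tables, (0, 0, 0, 0, 0, 1), r=1, eps=1, L=5))
```

The query for (0,0,0,0,0,1) succeeds only if some g_i misses the last coordinate, which illustrates why the guarantee is probabilistic.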
Correctness
• Property I: if ||q-p*|| ≤ r then gi(p*) = gi(q) for some i ∈ 1,…,L
• Property II: the number of points p ∈ P s.t. ||q-p|| ≥ (1+ε)r and gi(p) = gi(q) for some i is less than 2L
• We show that Pr[I & II hold] ≥ 1/2 - 1/e
• Choose:
  – k = log_(1/P2)(n)
  – L = n^ρ, where ρ = log(1/P1)/log(1/P2)
Complexity
• k = log_(1/P2)(n), L = n^ρ, where ρ = log(1/P1)/log(1/P2)
• Space: L·n (hash tables) + d·n (data points) = Õ(n^(1+ρ) + dn)
• Query: L hash function evaluations + O(L) distance calculations = Õ(d·n^ρ)
Significance of k and L
[Figure: Pr[g(p) = g(q)] as a function of ||p-q||]
[Figure: Pr[gi(p) = gi(q) for some i ∈ 1,…,L] as a function of ||p-q||]
Application
• Perform NNS in R^d with the l1 distance.
• Reduce the problem to NNS in H^d', the Hamming cube of dimension d'.
• H^d' = binary strings of length d'.
• dHam(s1,s2) = number of coordinates where s1 and s2 disagree.

Embedding l1^d in H^d'
• W.l.o.g. all coordinates of all points in P are positive integers < C.
• Map the integer i ∈ {1,…,C} to (1,1,…,1,0,0,…,0): i ones followed by C-i zeros.
• Map a vector by mapping each coordinate.
• Example: {(5,3,2),(2,4,1)} → {(11111,11100,11000),(11000,11110,10000)}
• Distances are preserved.
• Actual computations are performed in the original space: O(log C) overhead.
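The unary embedding is mechanical; a short sketch reproducing the slide's example (C = 5; function names are ours):

```python
def embed(v, C):
    """Map each coordinate x in {1,...,C} to x ones followed by C-x zeros;
    l1 distances become Hamming distances in the (d*C)-dimensional cube."""
    return "".join("1" * x + "0" * (C - x) for x in v)

def d_ham(s1, s2):
    """Number of coordinates where the two strings disagree."""
    return sum(a != b for a, b in zip(s1, s2))

p, q = (5, 3, 2), (2, 4, 1)
l1 = sum(abs(a - b) for a, b in zip(p, q))
print(embed(p, 5), embed(q, 5))  # 111111110011000 110001111010000
print(l1, d_ham(embed(p, 5), embed(q, 5)))  # distances agree: 5 5
```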
A sensitive family for the Hamming cube
• H_d' = {hi : hi(b1,…,bd') = bi for i = 1,…,d'} — projections onto single coordinates
  – If dHam(s1,s2) < r, what is Pr[h(s1)=h(s2)]? At least 1-r/d'
  – If dHam(s1,s2) > (1+ε)r, what is Pr[h(s1)=h(s2)]? At most 1-(1+ε)r/d'
• So H_d' is (1-r/d', 1-(1+ε)r/d', r, (1+ε)r)-sensitive.
• Question: what are these projections in the original space?
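As a sanity check (ours, not from the slides): for a uniformly random coordinate projection the collision probability is exactly 1 - dHam(s1,s2)/d', which is what the two bounds above instantiate:

```python
from fractions import Fraction

def collision_prob(s1, s2):
    """Pr over a uniformly random coordinate i that s1[i] == s2[i],
    i.e. 1 - dHam(s1, s2)/d'."""
    d = len(s1)
    agree = sum(a == b for a, b in zip(s1, s2))
    return Fraction(agree, d)

s1, s2 = "110010", "100011"  # dHam = 2, d' = 6
print(collision_prob(s1, s2))  # 1 - 2/6 = 2/3
```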
Corollary
• We can bound ρ ≤ 1/(1+ε)
• Space: O(dn + n^(1+1/(1+ε)))
• Query: O(d·n^(1/(1+ε)))
• When ε = 1: O(dn + n^(3/2)) space, O(d·sqrt(n)) query
Recent results
• In Euclidean space
  – ρ ≤ 1/(1+ε)² + O(log log n / log^(1/3) n) [Andoni & Indyk 2008]
  – ρ ≥ 0.462/(1+ε)² [Motwani, Naor & Panigrahy 2006]
• LSH family for l_s, s ∈ [0,2) [Datar, Immorlica, Indyk & Mirrokni 2004]
• And many more.
Conclusion
• NNS is an important problem with many applications.
• The problem can be efficiently solved in low dimensions.
• We saw some efficient approximate solutions in high dimensions, which are applicable to many metrics.