Beyond Locality Sensitive Hashing

Alex Andoni (Microsoft Research)

Joint with: Piotr Indyk (MIT), Huy L. Nguyen (Princeton), Ilya Razenshteyn (MIT)

Nearest Neighbor Search (NNS)

• Preprocess: a set of points P
• Query: given a query point q, report a point p ∈ P with the smallest distance to q
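
To make the definition concrete, here is a minimal brute-force baseline (illustrative, not from the talk); it answers exact NNS by a linear scan, which is the cost that sub-linear methods like LSH aim to beat:

    import numpy as np

    def nearest_neighbor(points, q):
        """Exact NNS by linear scan: return the point closest to q (Euclidean distance)."""
        dists = np.linalg.norm(points - q, axis=1)   # distance from q to every point
        return points[np.argmin(dists)]

    # Usage with hypothetical data:
    # P = np.random.rand(1000, 64); q = np.random.rand(64)
    # p_star = nearest_neighbor(P, q)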

Motivation

• Generic setup:
  • Points model objects (e.g. images)
  • Distance models (dis)similarity measure
• Application areas:
  • machine learning: k-NN rule
  • image/video/music recognition, deduplication, bioinformatics, etc.
• Distance can be: Hamming, Euclidean, …
• Primitive for other problems: find the similar pairs, clustering, …

Example: two similar bit strings (points q and p in Hamming space):
000000011100010100000100010100011111
000000001100000100000100110100111111

Approximate NNS

• c-approximate r-near neighbor: given a query point q, report a point p with dist(p, q) ≤ cr if there exists a point at distance ≤ r from q
• Randomized: such a point is returned with 90% probability
• In practice: the query time is small if there are not too many approximate near neighbors

[Figure: query q, a near neighbor p within radius r, and the reporting radius cr]
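
A brute-force sketch of the (c, r)-near neighbor contract (illustrative only; the whole point of LSH is to implement this guarantee in sub-linear time):

    import numpy as np

    def near_neighbor(points, q, r, c):
        """Return some point within distance c*r of q; may return None only when no point is within r."""
        dists = np.linalg.norm(points - q, axis=1)
        i = np.argmin(dists)
        if dists[i] <= c * r:
            return points[i]
        return None   # allowed, since then no point can be within distance r of q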

Locality-Sensitive Hashing [Indyk-Motwani'98]

• Random hash function g on R^d s.t. for any points p, q:
  • Close: if dist(p, q) ≤ r, then Pr[g(p) = g(q)] = P1 is "high"
  • Far: if dist(p, q) > cr, then Pr[g(p) = g(q)] = P2 is "small"
  • In between: "not-so-small"
• Use several hash tables: n^ρ of them, where P1 = P2^ρ, i.e., ρ = log(1/P1) / log(1/P2)

[Figure: collision probability Pr[g(p) = g(q)] as a function of dist(p, q), falling from P1 at r to P2 at cr]

Locality sensitive hash functions [Indyk-Motwani'98]

• Hash function g is actually a concatenation of "primitive" functions: g(p) = ⟨h1(p), h2(p), …, hk(p)⟩
• LSH in Hamming space:
  • h(p) = p_i, i.e., choose the bit at a random coordinate i
  • Pr[h(p) = h(q)] = 1 − Ham(p, q)/d
  • g = projection on a few random bits

[Figure: collision probability versus Ham(p, q), falling from P1 at r to P2 at cr]
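
A minimal sketch of this bit-sampling scheme (the parameters k, L and the data layout are illustrative assumptions, not the talk's): each table keys points by k random coordinates, and a query only inspects points that collide with it in some table. With P1 = 1 − r/d and P2 = 1 − cr/d, the exponent ρ = log(1/P1)/log(1/P2) is at most 1/c.

    import numpy as np

    def build_tables(points, k, L, rng):
        """Bit-sampling LSH over {0,1}^d: L hash tables, each keyed by k random coordinates."""
        n, d = points.shape
        tables = []
        for _ in range(L):
            coords = rng.choice(d, size=k, replace=False)        # g = projection on k random bits
            buckets = {}
            for idx in range(n):
                buckets.setdefault(tuple(points[idx, coords]), []).append(idx)
            tables.append((coords, buckets))
        return tables

    def query(points, tables, q, r, c):
        """Return some point within Hamming distance c*r of q, found among colliding candidates."""
        for coords, buckets in tables:
            for idx in buckets.get(tuple(q[coords]), []):
                if np.count_nonzero(points[idx] != q) <= c * r:
                    return points[idx]
        return None

    # Usage with hypothetical data:
    # rng = np.random.default_rng(0)
    # P = rng.integers(0, 2, size=(1000, 64))
    # tables = build_tables(P, k=16, L=20, rng=rng)
    # result = query(P, tables, P[0], r=4, c=2)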

Algorithms and Lower Bounds

(columns: Space | Time | Comment | Reference)

ℓ1 (Hamming):
  n^(1+ρ) | n^ρ | ρ = 1/c | [IM'98]
  — | — | LSH lower bound | [MNP'06]
  — | — | LSH lower bound: ρ ≥ 1/c − o(1) | [OWZ'11]
  — | — | lower bound on memory lookups | [PTW'08, PTW'10]

ℓ2 (Euclidean):
  n^(1+ρ) | n^ρ | ρ = 1/c² | [DIIM'04, AI'06]
  — | — | LSH lower bound | [MNP'06]
  — | — | LSH lower bound: ρ ≥ 1/c² − o(1) | [OWZ'11]
  — | — | lower bound on memory lookups | [PTW'08, PTW'10]

LSH is tight…

leave the rest to cell-probe lower bounds?

Main Result

• NNS in Hamming space (ℓ1) with query time n^ρ, and space and preprocessing n^(1+ρ), for
  ρ = 7/(8c) + O(1/c^(3/2)) + o(1)
  • cf. ρ = 1/c of [IM'98]
• NNS in Euclidean space (ℓ2) with:
  ρ = 7/(8c²) + O(1/c³) + o(1)
  • cf. ρ = 1/c² of [AI'06]

A look at LSH lower bounds

• LSH lower bounds in Hamming space
  • Fourier analytic
• [Motwani-Naor-Panigrahy'06], [O'Donnell-Wu-Zhou'11]:
  • fix any distribution over hash functions
  • Far pair: random, at distance εd
  • Close pair: random, at distance εd/c
  • Get ρ ≥ 1/c

Why not NNS lower bound?

• Suppose we try to generalize [OWZ'11] to NNS
• Pick a random instance as in [OWZ'11]
• All the "far points" are concentrated in a ball of small radius
• Easy to see at preprocessing time: the actual near neighbor is close to the center of the minimum enclosing ball

Our algorithm: intuition

• Data-dependent LSH:
  • hashing that depends on the entire given dataset!
• Two components:
  • a "nice" geometric configuration with a small exponent ρ
  • a reduction from the general case to this "nice" geometric configuration

Nice Configuration: "sparsity"

• All points are on a sphere of radius R
• Random points on the sphere are at distance ≈ R√2 from each other
• Lemma 1: in this configuration, a better exponent ρ is achievable
  • "Proof": obtained via "cap carving", similar to "ball carving" [KMS'98, AI'06]
• Lemma 1': the same bound holds for radius = …
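
One natural way to implement "cap carving" on the unit sphere (a minimal illustrative sketch; the threshold and the number of caps are assumptions chosen so the example runs, not the talk's exact parameters): sample random Gaussian directions and assign a point to the first spherical cap that contains it.

    import numpy as np

    def spherical_cap_hash(x, gaussians, threshold):
        """Assign a unit vector x to the first random cap containing it; tiny caps carve up the sphere."""
        for i, g in enumerate(gaussians):
            if np.dot(g, x) >= threshold:     # x lies in the cap around the direction g
                return i
        return len(gaussians)                 # fallback bucket if no cap caught x

    # Usage with hypothetical parameters:
    # d = 128
    # rng = np.random.default_rng(0)
    # gaussians = rng.standard_normal((10_000, d))          # candidate cap directions
    # x = rng.standard_normal(d); x /= np.linalg.norm(x)    # a point on the unit sphere
    # bucket = spherical_cap_hash(x, gaussians, threshold=d ** 0.25)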

Reduction: into spherical LSH

• Idea: apply a few rounds of "regular" LSH
  • ball carving [AI'06]
• Intuitively:
  • far points are unlikely to collide
  • this partitions the data into buckets of small diameter
  • find the minimum enclosing ball of each bucket
  • finally apply spherical LSH on this ball!

Two-level algorithm

• n^ρ hash tables, each with:
  • a hash function g(p) = ⟨h1(p), …, hl(p), s1(p), …, sm(p)⟩
  • the h's are "ball carving LSH" (data independent)
  • the s's are "spherical LSH" (data dependent)
• The final exponent ρ is an "average" of the exponents from levels 1 and 2
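
A structural sketch of this two-level hash (the names and the grid-based stand-in for ball carving are illustrative assumptions; in the actual algorithm the spherical hashes are built per bucket from the data that landed there):

    import numpy as np

    def two_level_hash(p, ball_carving_hashes, spherical_hashes):
        """g(p) = (h1(p), ..., hl(p), s1(p), ..., sm(p)): data-independent level 1, data-dependent level 2."""
        level1 = tuple(h(p) for h in ball_carving_hashes)   # coarse partition, independent of the data
        level2 = tuple(s(p) for s in spherical_hashes)      # refinement fitted to the bucket's points
        return level1 + level2

    def make_grid_hash(d, cell, rng):
        """A crude data-independent stand-in for one ball-carving primitive: a randomly shifted grid."""
        shift = rng.uniform(0, cell, size=d)
        return lambda p: tuple(np.floor((np.asarray(p) + shift) / cell).astype(int))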

Details

• Inside a bucket, we need to ensure the "sparse" case:
  • 1) drop all "far pairs"
  • 2) find the minimum enclosing ball (MEB)
    • use Jung's theorem: diameter Δ implies MEB radius ≤ Δ/√2
  • 3) partition by "sparsity" (distance from the center) into shells
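
A sketch of steps 2) and 3) (the iteration count and shell width are illustrative; the talk does not specify them): approximate the MEB center in the style of Badoiu-Clarkson by repeatedly stepping toward the farthest point, then bucket points into shells by their distance from that center.

    import numpy as np

    def approx_meb_center(points, iters=100):
        """Approximate minimum-enclosing-ball center: step toward the current farthest point."""
        c = points.mean(axis=0)
        for t in range(1, iters + 1):
            far = points[np.argmax(np.linalg.norm(points - c, axis=1))]
            c = c + (far - c) / (t + 1)             # shrinking step sizes converge to the MEB center
        return c

    def shells(points, center, width):
        """Partition a bucket into spherical shells ("sparsity" levels) by distance from the center."""
        dist = np.linalg.norm(points - center, axis=1)
        keys = np.floor(dist / width).astype(int)   # shell index for each point
        return {k: points[keys == k] for k in np.unique(keys)}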

Practice

• Practice uses data-dependent partitions!
  • "wherever theoreticians suggest to use random dimensionality reduction, use PCA" (a toy PCA split is sketched below)
• Lots of variants
  • trees: kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees, …
  • no guarantees: e.g., they are deterministic
• Is there a better way to do partitions in practice?
• Why do PCA-trees work?
  • [Abdullah-A-Kannan-Krauthgamer]: if the data have more structure
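
As a toy illustration of such a data-dependent split (not from the talk), one PCA-tree node projects the bucket onto its top principal direction and splits at the median:

    import numpy as np

    def pca_split(points):
        """One PCA-tree node: split the data at the median of its top principal component."""
        centered = points - points.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)   # rows of vt are principal directions
        proj = centered @ vt[0]
        thresh = np.median(proj)
        return points[proj <= thresh], points[proj > thresh]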

Finale

• NNS with query time n^ρ:
  • ρ = 7/(8c) + O(1/c^(3/2)) + o(1) for Hamming space (ℓ1)
  • ρ = 7/(8c²) + O(1/c³) + o(1) for Euclidean space (ℓ2)
• Below the lower bounds for LSH / space partitions!
• Idea: data-dependent space partitions
  • Better upper bound?
  • Multi-level improves a bit, but not too much
  • Can the exponent be improved further?
• Best partitions in theory and practice?