Locality sensitive hashing
Uploaded by yasanka-sameera-horawalavithana
Transcript of Locality sensitive hashing
![Page 1: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/1.jpg)
Locality Sensitive Hashing
Randomized Algorithm
![Page 2: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/2.jpg)
Problem Statement
• Given a query point q, find the items closest to q with high probability
• Iterative methods?
  • Large volume of data
  • Curse of dimensionality
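For contrast, an exhaustive (linear-scan) search compares q against every point. For N points in D dimensions, each query costs N distance computations of O(D) each. That cost is what LSH tries to avoid on large, high-dimensional data. The sketch below is a minimal illustration; the function name `linear_scan` is hypothetical and not from the slides.

```python
import heapq
import math

def linear_scan(points, q, k=1):
    """Exact k nearest neighbors of q, found by checking every point.

    Performs N distance computations of O(D) each for N points in D dimensions.
    """
    return heapq.nsmallest(k, range(len(points)), key=lambda i: math.dist(points[i], q))

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
print(linear_scan(points, (1.0, 1.1), k=2))  # → [1, 3]
```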
![Page 3: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/3.jpg)
Taxonomy – Near Neighbor Query (NN)
• Trees: K-d Tree, Range Tree, B-Tree, Cover Tree
• Grid: Voronoi Diagram
• Hash: Approximate LSH
![Page 4: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/4.jpg)
Approximate LSH
• Simple idea: if two points are close together, then after a “projection” operation these two points will remain close together
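The idea can be checked numerically. For a unit vector u, Cauchy-Schwarz gives |p·u − q·u| = |(p − q)·u| ≤ ‖p − q‖, so close points always stay close after projection. Distant points, however, can also land close together by chance, which is why LSH is probabilistic. In this sketch, drawing u from normalized Gaussian entries is an assumption, and the helper names are hypothetical.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(p, u):
    """Scalar projection of p onto the unit vector u (a dot product)."""
    return sum(a * b for a, b in zip(p, u))

rng = random.Random(42)
dim = 50
u = random_unit_vector(dim, rng)
p = [rng.uniform(-1, 1) for _ in range(dim)]
q = [x + rng.uniform(-0.01, 0.01) for x in p]   # q is very close to p
far = [rng.uniform(-1, 1) for _ in range(dim)]  # an unrelated point

near_gap = abs(project(p, u) - project(q, u))
far_gap = abs(project(p, u) - project(far, u))
print(near_gap, math.dist(p, q))    # projected gap never exceeds the true distance
print(far_gap, math.dist(p, far))
```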
![Page 5: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/5.jpg)
LSH Requirement
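• For any given points p, q:
  • if $d(p, q) \le d_1$, then $P[h(p) = h(q)] \ge p_1$
  • if $d(p, q) \ge d_2$, then $P[h(p) = h(q)] \le p_2$
• Then hash function h is $(d_1, d_2, p_1, p_2)$-sensitive. Ideally we need:
  • $p_1$ to be large
  • $p_2$ to be small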
![Page 6: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/6.jpg)
[Figure: query point q and points P(1), P(2), P(3), …, P(c); radii d, 2d, c·d]
![Page 7: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/7.jpg)
Probability vs. Distance on candidate pairs
![Page 8: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/8.jpg)
Hash Function (Random)
• Locality-preserving
• Independent
• Deterministic
• Families of hash functions for various distance measures:
  • Euclidean
  • Jaccard
  • Cosine similarity
  • Hamming
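The sketch below shows three standard hash families, one per listed distance measure other than Euclidean:

* **Jaccard:** MinHash.
* **Cosine similarity:** the sign of a random Gaussian projection, i.e. random hyperplanes.
* **Hamming:** sampling one random bit.

Each drawn function is deterministic, while the draw itself is random. The Monte Carlo rate approximates the family's collision probability: the Jaccard similarity for MinHash, 1 − θ/π for random hyperplanes, and 1 − (Hamming distance)/(length) for bit sampling. The class names are hypothetical.

```python
import random

class MinHash:
    """Jaccard family: h(S) = min over x in S of r(x), r a random function."""
    def __init__(self, rng):
        self.rng, self.r = rng, {}
    def __call__(self, s):
        # Each element keeps the uniform value it was first assigned.
        return min(self.r.setdefault(x, self.rng.random()) for x in s)

class SignOfRandomProjection:
    """Cosine family: h(v) = 1 if v . r >= 0 else 0, r a random Gaussian vector."""
    def __init__(self, dim, rng):
        self.r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    def __call__(self, v):
        return int(sum(a * b for a, b in zip(v, self.r)) >= 0)

class BitSample:
    """Hamming family: h(x) = x[i] for a randomly chosen coordinate i."""
    def __init__(self, dim, rng):
        self.i = rng.randrange(dim)
    def __call__(self, bits):
        return bits[self.i]

def collision_rate(make_hash, x, y, trials=4000, seed=1):
    """Fraction of randomly drawn hash functions under which x and y collide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_hash(rng)
        hits += h(x) == h(y)
    return hits / trials

A, B = set(range(0, 100)), set(range(50, 150))   # Jaccard similarity 1/3
x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1, 0, 1, 0, 0, 0, 1, 1]                     # Hamming distance 2 of 8
print(collision_rate(MinHash, A, B))
print(collision_rate(lambda rng: BitSample(len(x), rng), x, y))
print(collision_rate(lambda rng: SignOfRandomProjection(2, rng), [1.0, 0.0], [1.0, 1.0]))
```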
![Page 9: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/9.jpg)
LSH Family for Euclidean distance (2d)
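• When the distance d between two points is small compared with the bucket width a:
  • the two points have a good chance of colliding
  • but it is not certain
• But we can guarantee:
  • If $d \le a/2$, then $P[h(p) = h(q)] \ge 1/2$
  • If $d \ge 2a$, then $P[h(p) = h(q)] \le 1/3$, since a collision needs $d\cos\theta \le a$, i.e. $60^\circ \le \theta \le 90^\circ$ (θ is the angle between the random line and the line through the two points)
• So this LSH family is $(a/2,\ 2a,\ 1/2,\ 1/3)$-sensitive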
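The two bounds can be checked with a Monte Carlo estimate in 2D. Each hash projects a point onto a random line and cuts the line into buckets of width a. The random offset b that shifts the bucket boundaries is an assumption of this sketch. With that offset, the estimate for d = a/2 should come out well above 1/2 (the exact value under this model is 1 − 1/π ≈ 0.68). The estimate for d = 2a should come out well below 1/3. The function names are hypothetical.

```python
import math
import random

def random_line_hash(a, rng):
    """One hash h(p) = floor((p . u + b) / a) for 2-D points.

    u: random unit vector (the random line); a: bucket width;
    b: random offset in [0, a) (an assumption of this sketch).
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    u = (math.cos(theta), math.sin(theta))
    b = rng.uniform(0.0, a)
    def h(p):
        return math.floor((p[0] * u[0] + p[1] * u[1] + b) / a)
    return h

def collision_rate(d, a, trials=20000, seed=0):
    """Estimate P[h(p) = h(q)] for two points at distance d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = random_line_hash(a, rng)
        # Place p anywhere and q at distance d in a random direction.
        p = (rng.uniform(-10, 10), rng.uniform(-10, 10))
        phi = rng.uniform(0.0, 2.0 * math.pi)
        q = (p[0] + d * math.cos(phi), p[1] + d * math.sin(phi))
        hits += h(p) == h(q)
    return hits / trials

a = 1.0
p_close = collision_rate(a / 2, a)   # d <= a/2: expect at least 1/2
p_far = collision_rate(2 * a, a)     # d >= 2a: expect at most 1/3
print(f"d = a/2: {p_close:.3f}, d = 2a: {p_far:.3f}")
```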
![Page 10: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/10.jpg)
How to define the projection?
• Scalar projection (Dot product)
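One common way to turn the scalar projection into a hash is to quantize it into bins of width w: h(v) = ⌊(r·v + b)/w⌋. This sketch assumes r has i.i.d. Gaussian entries and b is a random offset in [0, w), as in the p-stable LSH scheme; the slides do not specify these choices. The name `make_projection_hash` is hypothetical. Close points usually share a bin, while distant points rarely do.

```python
import math
import random

def make_projection_hash(dim, w, rng):
    """One hash h(v) = floor((r . v + b) / w).

    r: random direction with i.i.d. N(0, 1) entries (assumed);
    b: random offset in [0, w); w: bin width.
    """
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    def h(v):
        dot = sum(ri * vi for ri, vi in zip(r, v))  # dot product with r
        return math.floor((dot + b) / w)            # quantize into a bin
    return h

rng = random.Random(7)
h = make_projection_hash(dim=3, w=4.0, rng=rng)
print(h([1.0, 2.0, 3.0]), h([1.0, 2.0, 3.1]), h([-9.0, 4.0, 0.0]))
```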
![Page 11: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/11.jpg)
How to define the projection?
• k dot products: reduce the chance that points at different separations will fall into the same quantization bin
• Perform k independent dot products
• Achieve success if the query and the nearest neighbor are in the same bin in all k dot products
• Success probability $= p^k$ (p: collision probability of a single dot product), which decreases as we include more dot products
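Concatenating k independent quantized dot products gives a bucket key g(v) = (h_1(v), …, h_k(v)). Two points share a key only if they share every bin, so the success rate should be close to p^k. The sketch assumes Gaussian projections with random offsets, places two points at distance 2 with bin width 4, and uses hypothetical helper names. The estimated rates shrink roughly geometrically as k grows.

```python
import math
import random

def make_projection_hash(dim, w, rng):
    """h(v) = floor((r . v + b) / w) with Gaussian r and offset b in [0, w)."""
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    return lambda v: math.floor((sum(x * y for x, y in zip(r, v)) + b) / w)

def make_key_function(dim, k, w, rng):
    """g(v) = (h_1(v), ..., h_k(v)): k independent quantized dot products."""
    hs = [make_projection_hash(dim, w, rng) for _ in range(k)]
    return lambda v: tuple(h(v) for h in hs)

def success_rate(p, q, k, w=4.0, trials=3000, seed=3):
    """Fraction of random key functions under which p and q share all k bins."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g = make_key_function(len(p), k, w, rng)
        hits += g(p) == g(q)
    return hits / trials

p, q = [0.0, 0.0, 0.0], [2.0, 0.0, 0.0]   # distance 2, bin width 4
rates = {k: success_rate(p, q, k) for k in (1, 2, 4, 8)}
print(rates)
```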
![Page 12: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/12.jpg)
Multiple projections
• Use L independent projections: the true near neighbor is unlikely to be unlucky in all of them
• By increasing L, we can find the true nearest neighbor with arbitrarily high probability
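Combining the pieces, an index keeps L hash tables. Table j files each point under its own k-dot-product key, and a query collects every point that collides with it in any table, then ranks those candidates by exact distance. This is a minimal sketch rather than the deck's implementation. It assumes Gaussian projections, random offsets and a fixed bin width w, and the class name `EuclideanLSH` is hypothetical.

```python
import math
import random
from collections import defaultdict

class EuclideanLSH:
    """L hash tables; table j keys each point by k quantized dot products."""

    def __init__(self, dim, k, L, w, seed=0):
        rng = random.Random(seed)
        # For each table: k (random Gaussian direction, random offset) pairs.
        self.proj = [[([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, w))
                      for _ in range(k)] for _ in range(L)]
        self.w = w
        self.tables = [defaultdict(list) for _ in range(L)]
        self.points = []

    def _key(self, j, v):
        """Bucket key of v in table j: tuple of k bin indices."""
        return tuple(math.floor((sum(a * x for a, x in zip(r, v)) + b) / self.w)
                     for r, b in self.proj[j])

    def add(self, v):
        idx = len(self.points)
        self.points.append(v)
        for j, table in enumerate(self.tables):
            table[self._key(j, v)].append(idx)
        return idx

    def query(self, q):
        """Return the index of the closest colliding point, or None."""
        candidates = set()
        for j, table in enumerate(self.tables):
            candidates.update(table.get(self._key(j, q), []))
        if not candidates:
            return None
        return min(candidates, key=lambda i: math.dist(self.points[i], q))

rng = random.Random(1)
index = EuclideanLSH(dim=20, k=4, L=10, w=4.0, seed=2)
data = [[rng.uniform(-10, 10) for _ in range(20)] for _ in range(500)]
for v in data:
    index.add(v)
query = [x + 0.05 for x in data[123]]   # a slightly perturbed copy of point 123
print(index.query(query))
```

Re-ranking only the colliding candidates keeps the exact-distance work proportional to the number of collisions, not to the dataset size.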
![Page 13: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/13.jpg)
Accuracy
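• Two close points p and q, separated by distance $u = \lVert p - q \rVert$
• Probability of collision for bin width w:
  $P_H(u) = \int_0^{w} \frac{1}{u} f_H\!\left(\frac{t}{u}\right)\left(1 - \frac{t}{w}\right) dt$
  where $f_H$ is the probability density function of H
• As the distance u increases, $P_H(u)$ decreases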
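The integral can be evaluated directly. The sketch assumes Gaussian projections, so $f_H(s) = 2\varphi(s)$ is the density of |N(0, 1)|. Under that assumption the integral also has a closed form: with r = w/u, $P_H(u) = 1 - 2\Phi(-r) - \frac{2}{\sqrt{2\pi}\,r}\left(1 - e^{-r^2/2}\right)$. Simpson's rule should agree with the closed form, and both should decrease as u grows. The function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def collision_prob_numeric(u, w, n=2000):
    """Simpson's rule for the integral from 0 to w of (1/u) f_H(t/u) (1 - t/w) dt,
    with f_H(s) = 2 * phi(s) (Gaussian projections assumed); n must be even."""
    def f(t):
        return (1.0 / u) * 2.0 * phi(t / u) * (1.0 - t / w)
    h = w / n
    total = f(0.0) + f(w)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def collision_prob_closed(u, w):
    """Closed form of the same integral for the Gaussian case."""
    r = w / u
    return 1.0 - 2.0 * Phi(-r) - 2.0 / (math.sqrt(2.0 * math.pi) * r) * (1.0 - math.exp(-r * r / 2.0))

w = 4.0
for u in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(u, round(collision_prob_closed(u, w), 4), round(collision_prob_numeric(u, w), 4))
```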
![Page 14: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/14.jpg)
Time complexity
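• For a query point q, the time to find the near neighbor is $T_g + T_c$
  • $T_g = O(DkL)$: calculate and hash the projections
  • $T_c = O(DLN_c)$: search the buckets for collisions. Here D is the dimension and L the number of projections. $N_c = \sum_{q'} p^k(\lVert q - q' \rVert)$, summed over the points $q'$ in the dataset, is the expected number of collisions for a single projection
• Analysis:
  • $T_g$ increases as k and L increase
  • $T_c$ decreases as k increases, since $p^k < p$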
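The trade-off can be made concrete with operation-count proxies $T_g = DkL$ and $T_c = DLN_c$, where $N_c = \sum p(u_i)^k$. This sketch assumes the Gaussian collision probability $p(u)$ and a hypothetical set of query-to-point distances; the numbers are illustrative only. Raising k makes hashing more expensive while sharply reducing the expected number of collisions to check.

```python
import math

def p_collision(u, w):
    """Gaussian-projection collision probability for distance u, bin width w."""
    r = w / u
    phi_neg_r = 0.5 * (1.0 + math.erf(-r / math.sqrt(2.0)))  # Phi(-r)
    return 1.0 - 2.0 * phi_neg_r - 2.0 / (math.sqrt(2.0 * math.pi) * r) * (1.0 - math.exp(-r * r / 2.0))

def query_cost(distances, D, k, L, w):
    """Proxies T_g = D*k*L and T_c = D*L*N_c, with N_c = sum of p(u_i)**k,
    the expected number of collisions in a single projection."""
    n_c = sum(p_collision(u, w) ** k for u in distances)
    return D * k * L, D * L * n_c

distances = [1.0, 5.0, 10.0, 20.0] * 250   # hypothetical distances from q to 1000 points
for k in (2, 4, 8):
    t_g, t_c = query_cost(distances, D=128, k=k, L=10, w=4.0)
    print(k, t_g, round(t_c))
```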
![Page 15: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/15.jpg)
How many projections(L)?
• For a query point p and its neighbor q:
  • For a single projection (k dot products), success probability of collision: $p^k$
  • For L projections, failure probability of collision: $(1 - p^k)^L$
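Requiring the failure probability to be at most a target δ gives $(1 - p^k)^L \le \delta$. Because $\log(1 - p^k) < 0$, this rearranges to $L \ge \log\delta / \log(1 - p^k)$. The sketch takes the smallest such integer. The values p = 0.8, k = 10 and δ = 0.05 are hypothetical inputs, and so is the function name.

```python
import math

def projections_needed(p, k, delta):
    """Smallest L with failure probability (1 - p**k)**L <= delta."""
    success = p ** k   # one projection: all k bins match
    return math.ceil(math.log(delta) / math.log(1.0 - success))

p, k, delta = 0.8, 10, 0.05
L = projections_needed(p, k, delta)
print(L, (1 - p ** k) ** L)
```

Note how fast L grows with k: each extra dot product lowers $p^k$, and more tables are needed to compensate.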
![Page 16: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/16.jpg)
LSH in MAXDIVREL Diversity
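| | dot product #1 | dot product #2 | dot product #3 | … | dot product #k |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | … | 1 |
| 2 | 0 | 1 | 1 | … | 1 |
| w | 0 | 0 | 1 | … | 0 |

| | dot product #1 | dot product #2 | dot product #3 | … | dot product #k |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | … | 1 |
| 2 | 1 | 0 | 1 | … | 1 |
| w | 0 | 1 | 1 | … | 0 |

| | dot product #1 | dot product #2 | dot product #3 | … | dot product #k |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | … | 0 |
| 2 | 0 | 0 | 1 | … | 0 |
| w | 0 | 1 | 0 | … | 0 |

| | dot product #1 | dot product #2 | dot product #3 | … | dot product #k |
|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | … | 1 |
| 2 | 0 | 1 | 1 | … | 1 |
| w | 0 | 0 | 1 | … | 0 |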
![Page 17: Locality sensitive hashing](https://reader033.fdocuments.us/reader033/viewer/2022061523/55c7258abb61eb064d8b45fa/html5/thumbnails/17.jpg)
REFERENCES
[1] A. Rajaraman and J. Ullman, “Chapter Three of ‘Mining of Massive Datasets’,” pp. 72–130.
[2] M. Slaney and M. Casey, “Lecture Note: LSH,” 2008.
[3] N. Sundaram, A. Turmukhametova, N. Satish, T. Mostak, P. Indyk, S. Madden, and P. Dubey, “Streaming similarity search over one billion tweets using parallel locality-sensitive hashing,” Proc. VLDB Endow., vol. 6, no. 14, pp. 1930–1941, Sep. 2013.