k-Nearest Neighbors Search in High Dimensions
Tomer Peled
Dan Kushnir
Tell me who your neighbors are, and I'll know who you are.
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Nearest Neighbor Search – Problem definition
• Given a set P of n points in R^d, over some distance metric
• Find the nearest neighbor p of q in P

Applications
• Classification
• Clustering
• Segmentation
• Indexing
• Dimension reduction (e.g. LLE)
[Figure: a query point q among data points plotted by color vs. weight.]
Naïve solution
• No preprocessing
• Given a query point q: go over all n points, do a comparison in R^d
• Query time = O(nd)
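For reference, a minimal sketch of this naive scan (Python with NumPy; the function name is ours):

```python
import numpy as np

def nn_brute_force(P, q):
    """Naive nearest neighbor: scan all n points, O(n*d) per query.
    P: (n, d) array of data points, q: (d,) query point."""
    dists = np.linalg.norm(P - q, axis=1)  # n distance computations in R^d
    i = int(np.argmin(dists))
    return i, dists[i]
```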
Keep in mind
Common solution
• Use a data structure for acceleration
• Scalability with n and with d is important
When to use nearest neighbors?

High-level algorithms
Assuming no prior knowledge about the underlying probability structure:
[Diagram: parametric methods → probability-distribution estimation; non-parametric methods → density estimation and nearest neighbors. Nearest neighbors suit complex models, sparse data, and high dimensions.]
Nearest Neighbor
min_{p_i ∈ P} dist(q, p_i)
[Figure: the point closest to q.]
r - Nearest Neighbor
dist(q, p1) ≤ r
dist(q, p2) ≥ (1 + ε) r
r2 = (1 + ε) r1
[Figure: circles of radius r and (1 + ε) r around q.]
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
The simplest solution
• Lion in the desert
Quadtree
• Split the first dimension into 2
• Repeat iteratively on each half
• Stop when each cell has no more than 1 data point
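A minimal 2-D sketch of this recursive splitting, assuming distinct points that lie inside a half-open bounding box (all names are ours):

```python
def build_quadtree(points, box):
    """Recursively split each dimension in two; stop when a cell
    holds at most one data point. Assumes distinct points lying in
    the half-open box [x0, x1) x [y0, y1)."""
    if len(points) <= 1:
        return {"leaf": points, "box": box}
    (x0, y0), (x1, y1) = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2   # the split point (X1, Y1)
    children = []
    for lo_x, hi_x in ((x0, xm), (xm, x1)):
        for lo_y, hi_y in ((y0, ym), (ym, y1)):
            cell = [(x, y) for (x, y) in points
                    if lo_x <= x < hi_x and lo_y <= y < hi_y]
            children.append(build_quadtree(cell, ((lo_x, lo_y), (hi_x, hi_y))))
    return {"children": children, "box": box}
```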
Quadtree - structure
[Figure: the plane is split at (X1, Y1) into four cells: {P ≥ X1, P ≥ Y1}, {P < X1, P ≥ Y1}, {P ≥ X1, P < Y1}, {P < X1, P < Y1}.]
Quadtree - Query
In many cases it works.
[Figure: the query descends into the cell containing q, e.g. {P < X1, P ≥ Y1}.]
Quadtree – Pitfall 1
In some cases it doesn't.
[Figure: the nearest neighbor lies in a neighboring cell, so the search must cross cell boundaries.]
Quadtree – Pitfall 1
In some cases nothing works.
[Figure.]
Quadtree – pitfall 2
[Figure: a query may need to inspect O(2^d) neighboring cells.]
Query time can be exponential in the dimension.
Space partition based algorithms
"Multidimensional access methods", Volker Gaede, O. Günther
Could be improved.
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Curse of dimensionality
• Query time or space: O(n^d)
• For d > 10..20, worse than a sequential scan
  – for most geometric distributions
• Techniques specific to high dimensions are needed
• Proven in theory and in practice by Barkol & Rabani [2000] and Beame & Vee [2002]
Naive: O( min(n·d, n^d) )
Curse of dimensionality – some intuition
[Figure: the number of cells doubles with every added dimension: 2, 2^2, 2^3, …, 2^d.]
Outline
• Problem definition and flavors
• Algorithms overview – low dimensions
• Curse of dimensionality (d > 10..20)
• Enchanting the curse: Locality Sensitive Hashing (high-dimension approximate solutions)
• l2 extension
• Applications (Dan)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hash function
A hash function maps a Data_Item to a key that addresses a bin (bucket) in the data structure.

Example: h(X) = X modulo 3, where X is a number in the range 0..n; the result, in 0..2, is the storage address.

Usually we would like related data items to be stored in the same bin.
Recall r - Nearest Neighbor
dist(q, p1) ≤ r
dist(q, p2) ≥ (1 + ε) r
r2 = (1 + ε) r1
[Figure: circles of radius r and (1 + ε) r around q.]
Locality sensitive hashing
A family of hash functions is (r1, r2, P1, P2)-sensitive if
• Pr[I(p) = I(q)] ≥ P1 when p is "close" to q (dist(p, q) ≤ r1 = r)
• Pr[I(p) = I(q)] ≤ P2 when p is "far" from q (dist(p, q) ≥ r2 = (1 + ε) r1)
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l1 & l2
Hamming Space
• Hamming space = the 2^N binary strings of length N
• Hamming distance = the number of digits in which two strings differ (a.k.a. signal distance; after Richard Hamming)
Hamming Space (N = 12)
010100001111
010010000011   → Distance = 4
• Hamming distance = SUM(X1 XOR X2)
L1 to Hamming space embedding
Each coordinate of p, an integer in 0..C, is written in unary: x ones followed by (C − x) zeros. For p = (8, 2) with C = 11 this gives 11111111000 and 11000000000, concatenated: 1111111100011000000000. The embedded dimension is d′ = C·d, and L1 distance becomes Hamming distance.
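A small sketch of this unary embedding; running it on the slide's example p = (8, 2), C = 11 reproduces the concatenated string above:

```python
def l1_to_hamming(p, C):
    """Embed a point with integer coordinates in 0..C into Hamming space.
    Each coordinate x becomes x ones followed by (C - x) zeros, so the
    L1 distance of the originals equals the Hamming distance of the
    embeddings, and d' = C * d."""
    bits = []
    for x in p:
        bits.extend([1] * x + [0] * (C - x))
    return bits

# Slide example: p = (8, 2), C = 11 -> 11111111000 ++ 11000000000
print("".join(map(str, l1_to_hamming((8, 2), 11))))
# -> 1111111100011000000000
```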
Hash function
p ∈ H^d′. For j = 1..L, the j-th hash function samples k bits of p:
G_j(p) = p|I_j   (the bits of p at the k positions in I_j; here k = 3 digits)
Store p into bucket p|I_j; there are 2^k buckets.
[Example: the sampled bits give the key 101 for strings such as 11000000000, 111111110000, 111000000000, 111111110001.]
Construction
Each point p is stored in the bucket it hashes to in each of the tables 1, 2, …, L.

Query
q is hashed into tables 1, 2, …, L; the points found in its buckets are the candidates.
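A compact sketch of these construction and query steps, assuming points are 0/1 sequences of length d′ (`build_lsh` and `query_lsh` are our names):

```python
import random

def build_lsh(points, k, L, dprime):
    """Bit-sampling LSH over Hamming space {0,1}^d'.
    Each of the L tables uses its own random set I_j of k bit positions;
    g_j(p) = p restricted to I_j is the bucket key."""
    tables = []
    for _ in range(L):
        I = random.sample(range(dprime), k)   # k sampled bit positions
        buckets = {}
        for idx, p in enumerate(points):
            key = tuple(p[i] for i in I)
            buckets.setdefault(key, []).append(idx)
        tables.append((I, buckets))
    return tables

def query_lsh(tables, q):
    """Union of the L buckets q falls into: the candidate set
    that is then verified with exact distance computations."""
    candidates = set()
    for I, buckets in tables:
        candidates.update(buckets.get(tuple(q[i] for i in I), []))
    return candidates
```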
Alternative intuition: random projections
Each k-bit sample G_j can be read as a random projection of the Hamming cube onto k of its axes: p and its near neighbors tend to land on the same corner. With k = 3 the possible keys are the 2^3 buckets (000, 100, 110, 001, 101, 111, …); p falls, e.g., into bucket 101.

k samplings, repeated L times.
Secondary hashing
The 2^k buckets are hashed again (simple hashing) into M buckets of size B, which supports volume tuning: dataset size vs. storage volume, with M·B = αn, α = 2.
The above hashing is locality-sensitive
Probability that p and q land in the same bucket:
Pr = (1 − Distance(p, q) / d′)^k
[Figure: Pr vs. Distance(q, p_i) for k = 1 and k = 2; larger k makes the drop-off sharper. Adopted from Piotr Indyk's slides.]
Preview
• General solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• A new hashing function
• Still based on sampling
• Uses a mathematical trick: a p-stable distribution for the Lp distance; the Gaussian distribution for the L2 distance
Central limit theorem
v1 + v2 + … + vn = …
A (weighted) sum of Gaussians is again a Gaussian.
Central limit theorem
v1, …, vn = real numbers
X1, …, Xn = independent, identically distributed (i.i.d.) random variables
v1·X1 + v2·X2 + … + vn·Xn = ?
Central limit theorem
For i.i.d. Gaussian X_i:
Σ_i v_i X_i  ~  ||v||_2 · X,   X ~ N(0, 1)
so the dot product with a random Gaussian vector encodes the norm.

Norm → Distance
Σ_i u_i X_i − Σ_i v_i X_i = Σ_i (u_i − v_i) X_i  ~  ||u − v||_2 · X
so the difference of the two dot products (features vector 1 vs. features vector 2) encodes their L2 distance.
The full hashing
h_{a,b}(v) = ⌊(a · v + b) / w⌋
• v – the features vector (e.g. [34, 82, 21])
• a – d random numbers, i.i.d. from a p-stable distribution
• b – a random phase in [0, w]
• w – the discretization step
[Example: a · v + b = 7944 with w = 100 falls in the cell [7900, 8000), on a grid with lines at 7800, 7900, 8000, 8100, 8200.]
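A minimal sketch of one such hash for L2, assuming NumPy (the class name is ours):

```python
import numpy as np

class PStableHash:
    """One L2 hash h_{a,b}(v) = floor((a . v + b) / w): a is a vector of
    d i.i.d. N(0,1) draws (the 2-stable distribution), b a random phase
    in [0, w), and w the discretization step."""
    def __init__(self, d, w, rng=None):
        rng = rng or np.random.default_rng()
        self.a = rng.normal(size=d)   # for L1 one would draw Cauchy instead
        self.b = rng.uniform(0, w)
        self.w = w

    def __call__(self, v):
        return int(np.floor((self.a @ np.asarray(v) + self.b) / self.w))
```

As in the Hamming case, k such hashes are concatenated into one bucket key and the whole scheme is repeated over L tables.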
Generalization: P-stable distribution
• Lp, 0 < p ≤ 2: the generalized central limit theorem gives a p-stable distribution (e.g. Cauchy for L1)
• L2: the central limit theorem gives the Gaussian (normal) distribution
P-Stable summary
• Works for Lp norms; generalizes to 0 < p ≤ 2
• Improves the query time for the r-nearest-neighbor problem: from O(d·n^{1/(1+ε)}·log n) to O(d·n^{1/(1+ε)²}·log n)
  (latest results reported by e-mail by Alexander Andoni)
Parameters selection
• Target: 90% success probability with the best query-time performance
For Euclidean space.
Parameters selection…
For Euclidean space:
• A single projection hits an r-nearest neighbor with Pr = p1
• k projections hit an r-nearest neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g. 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ,  i.e.  L ≥ log δ / log(1 − p1^k)
[Figure: the choice separates "accept neighbors" from "reject non-neighbors".]
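A small helper implementing the bound above (names are ours):

```python
import math

def tables_needed(p1, k, delta=0.1):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta,
    i.e. L >= log(delta) / log(1 - p1**k)."""
    return math.ceil(math.log(delta) / math.log(1 - p1**k))

# e.g. p1 = 0.9, k = 10: a true r-neighbor collides in one table with
# probability 0.9**10 ~ 0.35; for >= 90% overall recall (delta = 0.1):
print(tables_needed(0.9, 10))   # -> 6
```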
…Parameters selection
[Figure: query time as a function of k, split into candidate extraction and candidate verification; the optimum balances the two.]
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
(From Piotr Indyk's slides)
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test it on your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases ("Hierarchical, Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash-function construction and parameter tuning
Outline
"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell
• Finding sensitive hash functions

"Mean Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?

"Fast Pose Estimation with Parameter Sensitive Hashing", G. Shakhnarovich, P. Viola, and T. Darrell
Ingredients
• Input: a query image with unknown angles (parameters)
• A database of human poses with known angles
• An image feature extractor – an edge detector
• A distance metric in feature space, d_x
• A distance metric in angle space:
  d_θ(θ1, θ2) = Σ_{i=1}^{m} (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
[Input: query → find the KNN in the database of examples → output: the average angles of the KNN]
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
Image features are multi-scale edge histograms.
[Figure: edge-direction histograms computed over image sub-windows at several scales.]
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: the query q mapped between the parameter space (angles) and the feature space.]
Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.

The selection loop (see the sketch below):
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
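A hedged sketch of this selection loop in spirit, not the paper's exact procedure; `d_theta`, the gray-zone width `eps`, and the candidate hashes `h` are assumed given:

```python
def pair_label(theta_i, theta_j, d_theta, r=0.25, eps=1.0):
    """+1 for pairs with similar angles, -1 for clearly dissimilar ones,
    0 (ignored) in the gray zone between r and (1 + eps) * r."""
    dist = d_theta(theta_i, theta_j)
    if dist <= r:
        return +1
    if dist > (1 + eps) * r:
        return -1
    return 0

def hash_agreement(h, labeled_pairs):
    """Fraction of labeled pairs whose label h predicts correctly,
    with the prediction y_hat = +1 iff h(x_i) == h(x_j)."""
    correct = total = 0
    for x_i, x_j, y in labeled_pairs:
        if y == 0:
            continue
        y_hat = +1 if h(x_i) == h(x_j) else -1
        correct += int(y_hat == y)
        total += 1
    return correct / max(total, 1)

# selection: keep the candidate h's with the highest agreement
```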
PSH as a classification problem
A pair of examples (x_i, x_j) is labeled (with r = 0.25):
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r

A binary hash function on the features:
h_T(x) = +1 if the selected feature of x is ≥ T, −1 otherwise

Predict the labels:
ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling within the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
[Figure: choosing the threshold T along a single feature axis.]
Local Weighted Regression (LWR)
• Given a query image x0, PSH returns its KNNs
• LWR uses the KNNs to compute a weighted average of the estimated angles of the query: each neighbor x_i ∈ N(x0) contributes with a weight K(d_x(x_i, x0)) given by a kernel of its feature-space distance to the query,
  θ̂(x0) = argmin_θ Σ_{x_i ∈ N(x0)} K(d_x(x_i, x0)) · d_θ(g(x_i), θ)
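A simplified stand-in for this step: a zeroth-order, kernel-weighted average rather than full locally weighted regression; the names and the Gaussian kernel are our choices:

```python
import numpy as np

def lwr_estimate(query_feat, neighbors, d_x, bandwidth):
    """Kernel-weighted average of the neighbors' known angle vectors,
    with weights decaying in feature-space distance to the query.
    neighbors: list of (feature_vector, theta_vector) pairs from PSH."""
    thetas = np.array([theta for _, theta in neighbors])
    w = np.array([np.exp(-d_x(f, query_feat) ** 2 / bandwidth ** 2)
                  for f, _ in neighbors])
    return (w[:, None] * thetas).sum(axis=0) / w.sum()
```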
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, facial expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (L)
• Tested on 1,000 synthetic examples; PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, and B the maximum number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched

Results – real data
Interesting mismatches.
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q
[Figure: a query q inside the sphere of radius ri around pi.]
(Courtesy of Mohamad Hegaze)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

"Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example", B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding the optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: each mean-shift step moves a point toward the weighted mean of its neighbors within a bandwidth window.]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. It is based on the kth nearest neighbor of the point: the bandwidth is the distance from the point to its kth nearest neighbor.
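A brute-force sketch of this bandwidth rule; note it is exactly the KNN computation that the paper later accelerates with LSH:

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """Per-point bandwidth = distance to the k-th nearest neighbor,
    so dense regions get small bandwidths and sparse regions large ones.
    X: (n, d) array; requires k < n."""
    n = len(X)
    h = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        h[i] = np.sort(d)[k]   # index k skips the zero self-distance
    return h
```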
Adaptive mean-shift vs. non-adaptive
[Figure: comparison of the clustering results.]
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
[Figure: original, filtered, and segmented images; filtering assigns each pixel the value of its nearest mode; mean-shift trajectories shown in 3D.]
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Filtering examples
[Figure: original and filtered squirrel; original and filtered baboon.]

Segmentation examples
[Figure: segmented images.]
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with a variable bandwidth
LSH-based data structure (sketched below)
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x we check whether x_{d_k} < v_k; the K Boolean answers form its key
• This partitions the data into cells
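A minimal sketch of such a partition structure, with the data-driven cut choice (suggested later in the talk) as an option; all names are ours:

```python
import random

def build_partitions(X, K, L, data_driven=True):
    """L random partitions; each is K (d_k, v_k) pairs testing x[d_k] < v_k.
    The K Boolean answers form the cell (bucket) key of a point."""
    n, d = len(X), len(X[0])
    partitions = []
    for _ in range(L):
        cuts = []
        for _ in range(K):
            dk = random.randrange(d)
            if data_driven:
                # data-driven cut: a coordinate of a random data point
                vk = X[random.randrange(n)][dk]
            else:
                # original LSH: uniform over the range of the data
                vk = random.uniform(min(x[dk] for x in X),
                                    max(x[dk] for x in X))
            cuts.append((dk, vk))
        buckets = {}
        for i, x in enumerate(X):
            key = tuple(x[dk] < vk for dk, vk in cuts)
            buckets.setdefault(key, []).append(i)
        partitions.append((cuts, buckets))
    return partitions
```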
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets
• A large K gives a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union of cells around a query grows: fewer misses, but more extra points; together, K and L determine the resolution of the data structure
Choosing optimal K and L
• Determine accurately the KNN for m randomly selected data points
• d_k = the distance (bandwidth) to the kth neighbor
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate distance is within (1 + ε) of the true one
Choosing optimal K and L (see the sketch below)
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked.]
Data-driven partitions
• In the original LSH, cut values are drawn at random from the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: bucket distribution under uniform vs. data-driven cuts; data-driven cuts balance the buckets.]
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
[Figure: low dimension vs. high dimension.]

A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 – cookies…
Summary
• LSH suggests a compromise: trading some accuracy for a gain in complexity
• Applications that involve massive data in high dimensions require the fast performance of LSH
• Extensions of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – E-mail Alex Andoni (andoni@mit.edu)
  – Test it on your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
![Page 2: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/2.jpg)
Outline
bullProblem definition and flavorsProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
bull Given a set P of n points in Rd
Over some metric
bull find the nearest neighbor p of q in P
Nearest Neighbor SearchProblem definition
Distance metric
Applications
bullClassification bullClustering
bullSegmentation
q
bullIndexingbullDimension reduction
(eg lle)
color
Weight
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 3: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/3.jpg)
bull Given a set P of n points in Rd
Over some metric
bull find the nearest neighbor p of q in P
Nearest Neighbor SearchProblem definition
Distance metric
Applications
bullClassification bullClustering
bullSegmentation
q
bullIndexingbullDimension reduction
(eg lle)
color
Weight
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni, and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a window of a given bandwidth around a point shifts toward the local mean of the points inside it]
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
• Based on the k-th nearest neighbor of the point: the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to its k-th neighbor x_{i,k}.
(A sketch of the adaptive step follows.)
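A minimal sketch of adaptive mean-shift, assuming a flat (uniform) kernel, Euclidean distances, and a brute-force k-th-neighbor computation; the per-point bandwidth is held fixed along a trajectory for simplicity:

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """h_i = distance from x_i to its k-th nearest neighbor (brute force)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    D.sort(axis=1)                  # row-wise; column 0 is the point itself
    return D[:, k]

def mean_shift_step(x, X, h):
    """Shift x to the mean of the points inside its bandwidth window."""
    inside = np.linalg.norm(X - x, axis=1) <= h
    return X[inside].mean(axis=0)

def mean_shift_mode(x, X, h, tol=1e-3, max_iter=100):
    """Iterate the shift until the trajectory converges to a mode."""
    for _ in range(max_iter):
        x_new = mean_shift_step(x, X, h)
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```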
Adaptive mean-shift vs. non-adaptive
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
[Figure: original, filtered, and segmented images; mean-shift trajectories in feature space]
Filtering: each pixel takes the value of its nearest mode.
Filtering examples
[Figure: original vs. filtered squirrel image; original vs. filtered baboon image]
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth.
LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point x we check, for each of the K pairs, whether x_{d_k} ≤ v_k; the K Boolean results identify the point's cell.
• This partitions the data into cells (a sketch follows).
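A minimal sketch of this structure, assuming the K Boolean tests are packed into a tuple that keys a hash table; the function names and the uniform cut sampling are illustrative:

```python
import random
from collections import defaultdict

def make_partition(X, K):
    """One partition: K random (dimension, cut value) pairs, with cut
    values uniform in the data range of the chosen dimension."""
    d = len(X[0])
    pairs = []
    for _ in range(K):
        dk = random.randrange(d)
        lo = min(x[dk] for x in X)
        hi = max(x[dk] for x in X)
        pairs.append((dk, random.uniform(lo, hi)))
    return pairs

def cell_key(x, partition):
    """Boolean vector of the K tests x[d_k] <= v_k; identifies x's cell."""
    return tuple(x[dk] <= vk for dk, vk in partition)

def build_structure(X, K, L):
    """L tables, one per partition; each maps a cell key to point indices."""
    partitions = [make_partition(X, K) for _ in range(L)]
    tables = []
    for part in partitions:
        t = defaultdict(list)
        for i, x in enumerate(X):
            t[cell_key(x, part)].append(i)
        tables.append(t)
    return partitions, tables

def query(q, partitions, tables):
    """Union of the query's buckets over the L partitions."""
    out = set()
    for part, t in zip(partitions, tables):
        out.update(t.get(cell_key(q, part), []))
    return out
```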
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• Large K – smaller number of points in a cell C.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
As L increases, the union of cells ∪C_l increases, but the intersection C_∩ decreases; C_∩ determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; this gives each point's true k-th neighbor distance (its bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate distance returned by the structure stays within (1 + ε) of the true distance.
Choosing optimal K and L
• For each K, estimate the error for every L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)); the sketch below mirrors this loop.
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] and its minimum]
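A schematic sketch of that tuning loop, reusing build_structure and query from the sketch above; the sample-based error check against a brute-force reference and the search ranges are illustrative assumptions:

```python
import time
import numpy as np

def kth_dist(q, P, k):
    """Distance from q to its k-th nearest neighbor among the rows of P."""
    d = np.sort(np.linalg.norm(P - q, axis=1))
    return d[min(k, len(d) - 1)]

def tune_K_L(X, sample, k, eps, K_range, L_range):
    """For each K, find the minimal L meeting the error constraint (L(K));
    return the (K, L(K)) pair with the fastest query time on the sample."""
    true_d = [kth_dist(q, X, k) for q in sample]
    best = None
    for K in K_range:
        for L in sorted(L_range):
            parts, tables = build_structure(X.tolist(), K, L)
            t0 = time.perf_counter()
            ok = True
            for q, td in zip(sample, true_d):
                idx = list(query(q.tolist(), parts, tables))
                approx = kth_dist(q, X[idx], k) if idx else np.inf
                if approx > (1 + eps) * td:     # violates the (1+eps) bound
                    ok = False
            elapsed = time.perf_counter() - t0
            if ok:                              # minimal passing L is L(K)
                if best is None or elapsed < best[0]:
                    best = (elapsed, K, L)
                break
    return best
```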
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (sketch below).
[Figure: points-per-bucket distribution, uniform cuts vs. data-driven cuts]
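A one-function sketch of the suggested change, reusing the structure above: the cut value becomes a coordinate of a randomly selected data point, so the buckets follow the data density (the names are the same illustrative ones as before):

```python
import random

def make_partition_data_driven(X, K):
    """K (dimension, cut value) pairs; each cut value is the corresponding
    coordinate of a randomly selected data point."""
    d = len(X[0])
    return [(dk, random.choice(X)[dk])
            for dk in (random.randrange(d) for _ in range(K))]
```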
Additional speedup
• Assume that all points in C_∩ will converge to the same mode (C_∩ acts like a type of aggregate); the mode can then be computed once per cell and reused, as in the sketch below.
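A minimal sketch of how the assumption speeds things up, reusing cell_key and mean_shift_mode from the earlier sketches; caching one converged mode per intersection cell is my illustrative reading of the slide:

```python
import numpy as np

def modes_with_cell_cache(X, partitions, h):
    """Run mean-shift once per intersection cell C and reuse the result
    for every point whose cell signature matches (mode caching)."""
    cache = {}
    modes = []
    for x in X:
        key = tuple(cell_key(x, p) for p in partitions)   # C signature
        if key not in cache:
            cache[key] = mean_shift_mode(np.array(x), np.asarray(X), h)
        modes.append(cache[key])
    return modes
```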
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality, or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.
15:30: cookies…
Summary
• LSH trades some accuracy for a large gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• The LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• … but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 4: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/4.jpg)
Applications
bullClassification bullClustering
bullSegmentation
q
bullIndexingbullDimension reduction
(eg lle)
color
Weight
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 5: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/5.jpg)
Naiumlve solution
bullNo preprocess
bullGiven a query point qndashGo over all n pointsndashDo comparison in Rd
bullquery time = O(nd)
Keep in mind
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 6: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/6.jpg)
Common solution
bullUse a data structure for acceleration
bullScale-ability with n amp with d is important
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH – Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever k-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem (Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola, T. Darrell)
Given an image x, what are the parameters θ in this image, i.e. the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angle space:
d_θ(θ⁽¹⁾, θ⁽²⁾) = Σ_{i=1..m} (1 − cos(θ⁽¹⁾_i − θ⁽²⁾_i))
Example-based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.
Input query → find KNN in the database of examples → output: average angles of the KNN.
The algorithm flow
Input query → feature extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output match.
The image features
Image features are multi-scale edge histograms, computed over image sub-windows (A, B, …).
[Figure: edge-histogram features at several scales; pipeline stage: Feature Extraction → PSH → LWR]
PSH: the basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q mapped between the parameter space (angles) and the feature space. Is this magic?]
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings.
• If the labeling by h is good, accept h; else change h.
PSH as a classification problem
Labels (with r = 0.25): a pair of examples (x_i, x_j) is labeled
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε)·r
A binary hash function on the features (a decision stump on one feature value):
h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
ŷ_ij(h) = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best threshold T that predicts the true labeling, subject to the probability constraints: h_T should place both examples of a positive pair in the same bin, and separate negative pairs.
[Plot: examples along one feature axis with a candidate threshold T(x)]
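A toy Python sketch of this selection step (the data, pair labels, and helper names below are hypothetical; the decision-stump form follows the reconstruction above): score each candidate threshold T on one feature by how often same-bucket/different-bucket predictions agree with the ±1 pair labels, and keep the best.

    import numpy as np

    def best_threshold(feature_vals, pairs, labels, candidates):
        """pairs: (i, j) index pairs; labels: +1 (similar angles) / -1 (dissimilar)."""
        best_T, best_acc = None, -1.0
        for T in candidates:
            h = np.where(feature_vals >= T, 1, -1)       # stump hash per example
            pred = np.array([1 if h[i] == h[j] else -1 for i, j in pairs])
            acc = np.mean(pred == labels)                # agreement with true labels
            if acc > best_acc:
                best_T, best_acc = T, acc
        return best_T, best_acc

    vals = np.array([0.1, 0.4, 0.35, 0.9, 0.85])         # one feature per example
    pairs = [(0, 1), (1, 2), (3, 4), (0, 3)]
    labels = np.array([-1, 1, 1, -1])                    # which pairs have similar angles
    print(best_threshold(vals, pairs, labels, candidates=np.linspace(0, 1, 21)))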
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
θ̂_0 = g(x_0), where g minimizes Σ_{x_i ∈ N(x_0)} d_θ(g(x_i), θ_i) · K(d_x(x_i, x_0))
(the d_θ term is the distance being fitted; the kernel K of the feature-space distance supplies the weight).
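A zeroth-order (weighted-average) sketch of this step in Python, assuming the K nearest neighbors have already been retrieved by PSH (the names, kernel, and bandwidth h are illustrative, and plain averaging of angle vectors is a simplification of the local regression):

    import numpy as np

    def lwr_angles(query_feat, nn_feats, nn_angles, h=1.0):
        """Weighted average of neighbor angle vectors; weights come from a
        Gaussian kernel K on the feature-space distance d_x(query, neighbor)."""
        d = np.linalg.norm(nn_feats - query_feat, axis=1)
        w = np.exp(-(d / h) ** 2)
        return (w[:, None] * nn_angles).sum(axis=0) / w.sum()

    q = np.array([0.2, 0.7])
    feats = np.array([[0.1, 0.6], [0.3, 0.8], [0.9, 0.1]])
    angles = np.array([[10.0, 45.0], [14.0, 50.0], [80.0, 5.0]])  # degrees per joint
    print(lwr_angles(q, feats, angles))   # dominated by the two nearby examples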
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples: PSH searched only 3.4% of the data per query.
• Without feature selection, 40 bits and 1,000 hash tables were needed.
(Recall: p1 is the probability of a positive hash, p2 the probability of a bad hash, and B the maximum number of points in a bucket.)
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Results – real data: interesting mismatches.
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in R^d, centered at P = {p1, …, pn}, with radii r1, …, rn.
• Goal: given a query q, preprocess the points in P to find a point pi whose sphere 'covers' the query q.
[Figure: query q inside the sphere of radius ri around pi]
(Courtesy of Mohamad Hegaze)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in a feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a bandwidth window around a point]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density, small bandwidth; low density, large bandwidth. It is based on the kth nearest neighbor of the point: the bandwidth is taken as the distance from the point to that kth neighbor.
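A compact Python sketch of the adaptive rule plus one mean-shift update (brute-force kth-neighbor distances and a flat kernel inside each point's bandwidth are simplifying assumptions, not the paper's exact kernel):

    import numpy as np

    def adaptive_bandwidths(X, k):
        """h_i = distance from x_i to its k-th nearest neighbor."""
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        return np.sort(D, axis=1)[:, k]       # column 0 is the point itself

    def mean_shift_step(x, X, h):
        """One mean-shift update: average of the points within bandwidth h of x."""
        near = X[np.linalg.norm(X - x, axis=1) <= h]
        return near.mean(axis=0)

    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
    h = adaptive_bandwidths(X, k=10)
    print(mean_shift_step(X[0], X, h[0]))     # x_0 moves toward its local mode

Iterating the update until the point stops moving yields the mode; the range query inside mean_shift_step is exactly the operation LSH is brought in to accelerate.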
Adaptive mean-shift vs. non-adaptive
[Comparison figure]
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Image segmentation algorithm
[Figures: original → filtered → segmented; mean-shift trajectories]
Filtering: each pixel takes the value of its nearest mode.
Filtering examples
[Figures: original squirrel → filtered; original baboon → filtered]
Segmentation examples
(Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth.
LSH-based data structure
• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point we check the K inequalities x_{d_k} ≤ v_k; the K outcomes determine the point's cell.
• This partitions the data into cells.
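A Python sketch of this structure (illustrative names and sizes; each cell key is the K-bit vector of inequality outcomes, with one table per partition):

    import numpy as np
    from collections import defaultdict

    def make_partition(X, K, rng):
        """K pairs (d_k, v_k): a coordinate index and a cut value on that coordinate."""
        dims = rng.integers(0, X.shape[1], size=K)
        cuts = rng.uniform(X.min(axis=0)[dims], X.max(axis=0)[dims])
        return dims, cuts

    def cell_key(x, dims, cuts):
        """The K boolean tests x[d_k] <= v_k identify the cell containing x."""
        return tuple(x[dims] <= cuts)

    rng = np.random.default_rng(2)
    X = rng.random((1000, 5))
    tables = []
    for _ in range(10):                       # L = 10 partitions
        dims, cuts = make_partition(X, K=4, rng=rng)
        table = defaultdict(list)
        for i, x in enumerate(X):
            table[cell_key(x, dims, cuts)].append(i)
        tables.append((dims, cuts, table))

    q = X[0]
    candidates = set()
    for dims, cuts, table in tables:
        candidates.update(table[cell_key(q, dims, cuts)])
    print(len(candidates))   # union of q's cells = its candidate neighborhood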
Choosing the optimal K and L
• Goal: for a query q, compute the smallest possible number of distances to the points in its buckets.
• Large K: a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, the union C̄ of the query's cells might include extra points.
• As L increases, the candidate set C̄ grows; together, K and L determine the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, recording each point's kth-neighbor distance (the bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate (LSH-based) distance stays within ε of the true bandwidth distance.
Choosing optimal K and L
• For each K, estimate the error for every L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Plots: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], whose minimum gives the chosen pair]
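The tuning loop, as a Python sketch (estimate_error and run_time stand in for the measurements on the m sampled points described above; they are hypothetical helpers, not functions from the paper's code):

    def tune(K_values, L_values, eps, estimate_error, run_time):
        """For each K, find the minimal L whose approximation error is <= eps,
        then pick the (K, L(K)) pair with the smallest measured running time."""
        best = None
        for K in K_values:
            L_ok = next((L for L in L_values if estimate_error(K, L) <= eps), None)
            if L_ok is None:
                continue                      # no L meets the constraint for this K
            t = run_time(K, L_ok)
            if best is None or t < best[2]:
                best = (K, L_ok, t)
        return best   # (K, L(K), t[K, L(K)])

    # e.g., with measurements gathered on m sampled queries:
    # print(tune(range(2, 20), range(1, 60), eps=0.05,
    #            estimate_error=..., run_time=...))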
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: bucket distribution, uniform cuts vs. data-driven cuts]
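The suggested change is a one-line swap in the partition constructor sketched earlier (again illustrative):

    def make_partition_data_driven(X, K, rng):
        """Cut at coordinates of randomly chosen data points instead of uniform
        values, so bucket occupancy follows the data distribution."""
        dims = rng.integers(0, X.shape[1], size=K)
        rows = rng.integers(0, X.shape[0], size=K)
        cuts = X[rows, dims]                  # a real point's coordinate per cut
        return dims, cuts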
Additional speedup
• Assume that all points in C̄ will converge to the same mode (C̄ acts like a type of aggregate).
Speedup results
[Table: 65,536 points; 1,638 points sampled; k = 100]
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.
15:30 – cookies…
Summary
• LSH trades a little accuracy for a large gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• The LSH framework extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 7: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/7.jpg)
When to use nearest neighbor
High level algorithms
Assuming no prior knowledge about the underlying probability structure
complex models Sparse data High dimensions
Parametric Non-parametric
Density estimation
Probability distribution estimation
Nearest neighbors
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 8: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/8.jpg)
Nearest Neighbor
min pi P dist(qpi)
Closestqq
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 9: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/9.jpg)
r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensionsAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
The simplest solution
bullLion in the desert
Quadtree
Split the first dimension into 2
Repeat iteratively
Stop when each cell has no more than 1 data point
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition: random projections
Each sampled bit cuts the unary-embedded space along a random axis-parallel boundary; the k sampled bits together map p into one of the 2^3 = 8 buckets 000, 100, 110, 001, 101, 111, … (for k = 3).
k samplings, repeated L times.
Secondary hashing
The 2^k sparse buckets (e.g. 011, size = B) are mapped by a simple hash into M buckets, supporting volume tuning (dataset size vs. storage volume): M·B = α·n, α = 2.
The above hashing is locality-sensitive
• Probability(p, q in same bucket) = (1 − Distance(q,p)/d')^k
The collision probability decays with Distance(q, pi); a larger k (k = 1 vs. k = 2) makes the decay sharper.
Adopted from Piotr Indyk's slides.
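A quick check of this formula (a sketch; it assumes each of the k sampled bits independently avoids the differing positions with probability 1 − Distance/d'):

```python
def collision_prob(dist, d_prime, k):
    # Pr[q and p share a bucket] = (1 - dist/d')^k
    return (1.0 - dist / d_prime) ** k

for k in (1, 2):
    print([round(collision_prob(dist, 100, k), 3) for dist in (5, 20, 50)])
# larger k makes the probability fall off faster with distance
```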
Preview
• General Solution – Locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick
• P-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
v1·X1 + v2·X2 + … + vn·Xn ~ Gaussian,
where v1…vn are real numbers and X1…Xn are independent identically distributed (i.i.d.) Gaussians: a weighted sum of Gaussians is a Gaussian.
Central limit theorem
Σi vi·Xi ~ ||v||2 · X, with X ~ N(0,1)
The dot product of v with an i.i.d. Gaussian vector is distributed as the L2 norm of v times a standard Gaussian.
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
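A numerical illustration of this 2-stability (our sketch, NumPy assumed): the Gaussian projection of v has standard deviation ||v||2, and differences of projections behave like distances:

```python
import numpy as np

rng = np.random.default_rng(1)
v = np.array([3.0, 4.0])                 # ||v||_2 = 5
X = rng.standard_normal((100_000, v.size))
proj = X @ v                             # sum_i v_i * X_i, one sample per row
print(proj.std())                        # ~5.0: distributed as ||v||_2 * N(0,1)
u = np.array([6.0, 8.0])
print((X @ u - X @ v).std())             # ~||u - v||_2 = 5.0
```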
The full Hashing
h_{a,b}(v) = ⌊(a·v + b) / w⌋
• v – the features vector (d numbers)
• a – d random numbers, i.i.d. from a p-stable distribution
• b – a random phase in [0, w]
• w – the discretization step
(Figure: the projection a·v lands at 7944 on the number line 7800…8200; with phase b = 34 and step w = 100 it falls into bucket ⌊7978/100⌋ = 79.)
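A minimal sketch of one such hash function for L2 (our names; the Gaussian is used for a since it is 2-stable):

```python
import numpy as np

rng = np.random.default_rng(2)

def make_hash(d, w):
    """One h_{a,b}(v) = floor((a . v + b) / w), a ~ N(0, I_d), b ~ U[0, w]."""
    a = rng.standard_normal(d)   # i.i.d. from the 2-stable (Gaussian) distribution
    b = rng.uniform(0, w)        # random phase
    return lambda v: int(np.floor((a @ v + b) / w))

h = make_hash(d=3, w=100.0)
v = np.array([3.4, 8.2, 2.1])
print(h(v))          # bucket index on the discretized projection line
print(h(v + 0.01))   # a tiny perturbation almost always lands in the same bucket
```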
Generalization: P-Stable distribution
• L2: Central Limit Theorem → Gaussian (normal) distribution (2-stable)
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution (e.g. the Cauchy distribution, which is 1-stable, for L1)
P-Stable summary
• Works for r - Nearest Neighbor; generalizes to 0 < p ≤ 2
• Improves query time: from O(d·n^(1/(1+ε))·log n) to O(d·n^(1/(1+ε)²)·log n)
(Latest results reported in an email by Alexander Andoni.)
Parameters selection
• Target: ≥ 90% success probability with the best query time performance (for Euclidean space)
Parameters selection…
For Euclidean space:
• A single projection hits an r - Nearest Neighbor with Pr = p1
• k projections hit an r - Nearest Neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision with probability ≥ 1 − δ (e.g. 1 − δ ≥ 90%):
1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
A large k rejects non-neighbors; a large L accepts neighbors (see the sketch below).
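The bound turns directly into a recipe for L (a sketch; p1 for a single projection is assumed to be known, e.g. measured on sample data):

```python
import math

def tables_needed(p1, k, delta=0.1):
    """Smallest L with 1 - (1 - p1**k)**L >= 1 - delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

print(tables_needed(p1=0.9, k=18, delta=0.1))   # = 15 tables for >=90% success
```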
…Parameters selection
Query time = candidates extraction (grows with k) + candidates verification (shrinks with k); choose k at the minimum of the total.
Pros & Cons
+ Better query time than spatial data structures
+ Scales well to higher dimensions and larger data size (sub-linear dependence)
+ Predictable running time
− Extra storage overhead
− Inefficient for data with distances concentrated around the average
− Works best for Hamming distance (although it can be generalized to Euclidean space)
− In secondary storage, a linear scan is pretty much all we can do (for high dim.)
− Requires the radius r to be fixed in advance
From Piotr Indyk's slides.
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• The LSH data structure is used for algorithm speedups
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell
Given an image x, what are the parameters θ in this image, i.e. the angles of joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angles space: d_θ(θ1, θ2) = Σ_{i=1..m} (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN
The algorithm flow
Input Query → Features extraction → Processed query → PSH (LSH), against the database of examples → LWR (Regression) → Output: Match
The image features
Image features are multi-scale edge histograms, counted over image sub-windows (regions A, B, … at several scales).
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angles space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space
• But the global structure may be complicated: curved
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
(Figure: parameters space (angles) vs. feature space, with a query q.) Is this magic?
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ.
The hash functions are applied in the feature space, but the KNN are valid in the angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar / non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
A pair of examples (x_i, x_j) is labeled (r = 0.25):
y_ij = +1 if d_θ(θ_i, θ_j) ≤ r
y_ij = −1 if d_θ(θ_i, θ_j) ≥ (1 + ε)·r
A binary hash function on the features:
h_T(x) = +1 if x ≥ T, −1 otherwise
Predict the labels:
ŷ_ij = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling, with the probability constraints: h_T(x) will place both examples of a pair in the same bin, or separate them.
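A sketch of this selection step (toy, with hypothetical helper names): score each candidate threshold by how often its same-bucket prediction agrees with the angle-space labels, and keep the best:

```python
import numpy as np

def best_threshold(feature, pairs, labels, candidates):
    """feature: one scalar feature per example; pairs: (i, j) index pairs;
    labels: +1 similar angles / -1 dissimilar; candidates: thresholds to try."""
    def score(T):
        h = np.where(feature >= T, 1, -1)                  # h_T on every example
        pred = [1 if h[i] == h[j] else -1 for i, j in pairs]
        return float(np.mean(np.array(pred) == np.array(labels)))
    return max(candidates, key=score)

# toy usage: 4 examples, two similar pairs, one dissimilar pair
feat = np.array([0.1, 0.2, 0.8, 0.9])
pairs, labels = [(0, 1), (2, 3), (0, 3)], [1, 1, -1]
print(best_threshold(feat, pairs, labels, np.linspace(0, 1, 21)))  # ~0.25
```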
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
θ_0 = argmin_g Σ_{x_i ∈ N(x)} K(d_x(x_i, x)) · (g(x_i) − θ_i)², with K a distance-weighting kernel.
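A simplified sketch of this step (the paper fits a local model g; here a kernel-weighted average of the neighbors' angle vectors stands in for it, and note the naive averaging ignores angle wrap-around):

```python
import numpy as np

def lwr_estimate(query_feat, knn_feats, knn_angles, bandwidth=1.0):
    """Weighted average of the KNN angle vectors; weight = K(d_x(x_i, x))."""
    d = np.linalg.norm(knn_feats - query_feat, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)          # a Gaussian kernel as K
    w /= w.sum()
    return w @ knn_angles                      # angle estimate for the query

knn_feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
knn_angles = np.array([[10.0, 40.0], [12.0, 42.0], [30.0, 60.0]])
print(lwr_estimate(np.zeros(2), knn_feats, knn_angles))
```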
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k = 18), 150 hash tables (L = 150)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without the selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the probability of a positive hash, P2 the probability of a bad hash, B the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Some interesting mismatches occur.
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general: some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = p1…pn, with radii r1…rn
• Goal: given a query q, preprocess the points in P so as to find a point pi whose sphere 'covers' the query q: ||q − pi|| ≤ ri
Courtesy of Mohamad Hegaze.
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Motivation
• Clustering high dimensional data by using local density measurements (e.g. in a feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• The LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
1. Finding optimal LSH parameters
2. Data-driven partitions into buckets
3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
Each point is repeatedly shifted to the (kernel-weighted) mean of the points inside its bandwidth window, until it converges to a mode of the density. (Figure: a point and its bandwidth.)
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth.
The bandwidth is based on the kth nearest neighbor of the point: the distance to that neighbor.
Adaptive mean-shift vs. non-adaptive (figure); see the sketch below.
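A compact sketch of the adaptive variant (flat kernel, O(n²) distances; an illustration under our assumptions, not the paper's optimized code): each point carries its own bandwidth, the distance to its kth nearest neighbor:

```python
import numpy as np

def adaptive_mean_shift(X, k=10, n_iter=30):
    """Shift every point to the mean of its neighbors within its own
    bandwidth h_i = distance to the point's kth nearest neighbor."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    h = np.sort(D, axis=1)[:, k]                 # adaptive bandwidth per point
    modes = X.copy()
    for _ in range(n_iter):
        for i in range(n):
            mask = np.linalg.norm(X - modes[i], axis=1) <= h[i]
            if mask.any():                       # guard against an empty window
                modes[i] = X[mask].mean(axis=0)  # flat-kernel mean-shift step
    return modes                                 # points converge to density modes

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, .3, (50, 2)), rng.normal(3, .3, (50, 2))])
print(np.round(adaptive_mean_shift(X), 1)[:3])   # early rows near the first mode
```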
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial) and hr (color)
3. Apply filtering: each pixel takes the value of the nearest mode
(Figure: original → filtered → segmented.)
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI 02'
Mean-shift trajectories (figure).
Filtering examples: original squirrel → filtered; original baboon → filtered.
Segmentation examples.
Mean-shift: A Robust Approach Towards Feature Space Analysis, D. Comaniciu et al., TPAMI 02'
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled by a variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point we check the K inequalities x_{d_k} ≤ v_k
• This partitions the data into cells (see the sketch below)
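A sketch of this structure (our names): each of the L partitions holds K random (coordinate, cut value) pairs, and a point's cell key is the K-vector of inequality outcomes:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(4)

def build_partitions(X, K, L):
    """X: (n, d). Each partition j holds K pairs (d_k, v_k); a point's cell in
    partition j is the tuple of booleans [x[d_k] <= v_k]."""
    n, d = X.shape
    cuts = [(rng.integers(0, d, K), rng.uniform(X.min(), X.max(), K))
            for _ in range(L)]
    cells = [defaultdict(list) for _ in range(L)]
    for i, x in enumerate(X):
        for (dims, vals), table in zip(cuts, cells):
            table[tuple(x[dims] <= vals)].append(i)
    return cuts, cells

def neighborhood(q, cuts, cells):
    """Union of q's cells over the L partitions = approximate neighborhood of q."""
    out = set()
    for (dims, vals), table in zip(cuts, cells):
        out.update(table.get(tuple(q[dims] <= vals), []))
    return out

X = rng.normal(size=(500, 8))
cuts, cells = build_partitions(X, K=6, L=10)
print(len(neighborhood(X[0], cuts, cells)))
```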
Choosing the optimal K and L
• For a query q, we want to compute the smallest number of distances to points in its buckets
• A large K gives a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, the union of cells C_∪ might include extra points
• As L increases, C_∪ increases but the intersection C_∩ decreases; together K and L determine the resolution of the data structure
Choosing optimal K and L
Determine accurately the KNN for m randomly-selected data points, giving the exact distance (bandwidth) per point, and choose an error threshold ε on the approximate distance.
The optimal K and L should satisfy the ε-constraint (a tuning sketch follows):
• For each K, estimate the error
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K)) and take the minimum
(Plots: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)].)
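A sketch of that tuning loop under the stated scheme (exact KNN on m sample points as ground truth; it reuses build_partitions and neighborhood from the earlier sketch, and the timing and grids are ours):

```python
import numpy as np

def tune(X, m, k, K_grid, L_grid, eps=0.05, rng=np.random.default_rng(5)):
    """For each K, find the minimal L whose approximate kth-NN distance is
    within (1 + eps) of the exact one on m samples; pick the fastest (K, L)."""
    import time
    sample = rng.choice(len(X), m, replace=False)
    exact = {i: np.sort(np.linalg.norm(X - X[i], axis=1))[k] for i in sample}
    best = None
    for K in K_grid:
        for L in L_grid:                        # increasing L: error only shrinks
            cuts, cells = build_partitions(X, K, L)
            t0, ok = time.perf_counter(), 0
            for i in sample:
                cand = list(neighborhood(X[i], cuts, cells))
                d = np.sort(np.linalg.norm(X[cand] - X[i], axis=1))
                ok += len(d) > k and d[k] <= (1 + eps) * exact[i]
            t = time.perf_counter() - t0
            if ok == m:                         # L(K): minimal L meeting the constraint
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return best
```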
Data driven partitions
• In the original LSH, the cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (see the snippet below)
(Figure: bucket distribution, uniform vs. data-driven points.)
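The suggested change is a one-line swap in the partition construction sketched above (hedged; data-driven cut values instead of uniform ones):

```python
def data_driven_cuts(X, K, rng):
    """K (dimension, value) pairs; each value is a random data point's
    coordinate on that dimension, so cuts concentrate where the data does."""
    n, d = X.shape
    dims = rng.integers(0, d, K)
    vals = X[rng.integers(0, n, K), dims]   # coordinate of a random data point
    return dims, vals
```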
Additional speedup
Assume that all the points in the intersection cell C_∩ will converge to the same mode (C_∩ is like a type of aggregate), so the mode needs to be computed only once per cell.
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
(Figure: low dimension vs. high dimension.)
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30: cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity
• Applications that involve massive data in high dimension require the fast LSH performance
• Extension of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 13: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/13.jpg)
Quadtree - structure
X
Y
X1Y1 PgeX1PgeY1
PltX1PltY1
PgeX1PltY1
PltX1PgeY1
X1Y1
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 14: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/14.jpg)
Quadtree - Query
X
Y
In many cases works
X1Y1PltX1PltY1 PltX1
PgeY1
X1Y1
PgeX1PgeY1
PgeX1PltY1
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 15: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/15.jpg)
Quadtree ndash Pitfall1
X
Y
In some cases doesnrsquot
X1Y1PgeX1PgeY1
PltX1
PltX1PltY1 PgeX1
PltY1PltX1PgeY1
X1Y1
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 16: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/16.jpg)
Quadtree ndash Pitfall1
X
Y
In some cases nothing works
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 17: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/17.jpg)
Quadtree ndash pitfall 2X
Y
O(2d)
Could result in Query time Exponential in dimensions
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree - Pitfall 1
- Slide 16
- Quadtree - pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality: Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros & Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image? (i.e., angles of joints, orientation of the body, etc.)
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results - real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 18: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/18.jpg)
Space partition based algorithms
Multidimensional access methods Volker Gaede O Gunther
Could be improved
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 19: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/19.jpg)
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)Curse of dimensionality (dgt1020)bullEnchanting the curse
Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 20: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/20.jpg)
Curse of dimensionality
bullQuery time or spaceO(nd)bullDgt1020 worst than sequential scan
ndashFor most geometric distributionsbullTechniques specific to high dimensions are needed
bullProoved in theory and in practice by Barkol amp Rabani 2000 amp Beame-Vee 2002
O( min(nd nd) )Naive
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
• LSH suggests a compromise: some accuracy is given up for a large gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned per application.
Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni: andoni@mit.edu
– Test it over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 21: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/21.jpg)
Curse of dimensionalitySome intuition
2
22
23
2d
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 22: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/22.jpg)
Outline
bullProblem definition and flavorsbullAlgorithms overview - low dimensions bullCurse of dimensionality (dgt1020)bullEnchanting the curse Enchanting the curse
Locality Sensitive Hashing Locality Sensitive Hashing (high dimension approximate solutions)
bulll2 extensionbullApplications (Dan)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 23: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/23.jpg)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hash function
Hash function
Hash function
Data_Item
Key
BinBucket
Hash function
X modulo 3
X=Number in the range 0n
02
Storage Address
Data structure
0
Usually we would like related Data-items to be stored at the same bin
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
Recall r - Nearest Neighbor
r
(1 + ) r
dist(qp1) r
dist(qp2) (1 + ) r r2=(1 + ) r1
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition: random projections
p = (8, 2), C = 11, unary embedding 1111111100011000000000, d' = C·d
Each sampled bit acts as an axis-parallel cut of the space, so the k sampled bits
(here k = 3) partition the space into 2^3 = 8 buckets: 000, 100, 110, 001, 101, 111, …
p falls into bucket 101
k samplings, repeated L times
Secondary hashing
The 2^k buckets (e.g. 011, size = B) are mapped by a simple secondary hash into M buckets,
tuning the supported dataset size vs. storage volume: M·B = α·n, α = 2
The above hashing is locality-sensitive
• Probability that p and q fall in the same bucket:
  Pr = (1 − Distance(p,q)/dimensions)^k
The curve of Pr vs. Distance(q,p_i) gets sharper as k grows (plots for k = 1 and k = 2).
Adopted from Piotr Indyk's slides
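A quick numeric check of this formula (illustrative values only):

d_prime = 22
for dist in (1, 5, 11):
    for k in (1, 2, 8):
        pr = (1 - dist / d_prime) ** k
        print(f"Ham={dist:2d}, k={k}: Pr[same bucket] = {pr:.3f}")

Raising k sharpens the gap between close and far points, which is exactly what the k = 1 vs. k = 2 plots show.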
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Using a mathematical trick:
  p-stable distribution for Lp distance; Gaussian distribution for L2 distance
Central limit theorem
(Weighted Gaussians) = Weighted Gaussian
v1, …, vn = real numbers
X1, …, Xn = independent, identically distributed (i.i.d.)
v1·X1 + v2·X2 + … + vn·Xn = ?
For i.i.d. Gaussian Xi:  Σᵢ vᵢ·Xᵢ  ~  ‖v‖₂·X   (dot product → norm)
Norm / Distance
For two feature vectors u and v:
Σᵢ uᵢ·Xᵢ − Σᵢ vᵢ·Xᵢ = Σᵢ (uᵢ − vᵢ)·Xᵢ  ~  ‖u − v‖₂·X
so the difference of the two dot products is distributed like the distance ‖u − v‖₂ times a Gaussian.
The full Hashing
h_{a,b}(v) = ⌊(a·v + b)/w⌋
v = features vector, e.g. [34, 82, 21]
a = d random numbers, i.i.d. from a p-stable distribution; e.g. a·v = 7944
b = random phase in [0, w], e.g. b = 34
w = discretization step, e.g. w = 100: a·v + b = 7944 + 34 falls into the bin [7900, 8000)
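A minimal sketch of this hash family for L2 (my variable names; a is Gaussian because the normal distribution is 2-stable):

import math, random

def make_hash(d, w, seed=0):
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]  # d i.i.d. N(0,1) numbers
    b = rng.uniform(0.0, w)                      # random phase in [0, w]
    def h(v):
        # h_{a,b}(v) = floor((a.v + b) / w)
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

h = make_hash(d=3, w=100.0)
print(h([34, 82, 21]), h([35, 81, 20]))  # nearby vectors usually share a bin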
Generalization: P-Stable distribution
• L2: Central Limit Theorem → Gaussian (normal) distribution (2-stable)
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → p-stable distribution
  (e.g. Cauchy, which is 1-stable, for L1)
P-Stable summary
• Works for r - Nearest Neighbor; generalizes to 0 < p ≤ 2
• Improves query time:
  O(d·n^(1/(1+ε))·log n)  →  O(d·n^(1/(1+ε)²)·log n)
  (latest results, reported in e-mail by Alexander Andoni)
Parameters selection
• 90% probability ⇒ best query time performance
For Euclidean space
Parameters selection…
For Euclidean space:
• A single projection hits an ε-Nearest Neighbor with Pr = p1
• k projections hit an ε-Nearest Neighbor with Pr = p1^k
• All L hashings fail to collide with Pr = (1 − p1^k)^L
• To ensure a collision (e.g. 1 − δ ≥ 90%):
  1 − (1 − p1^k)^L ≥ 1 − δ  ⇒  L ≥ log(δ) / log(1 − p1^k)
Reject non-neighbors / accept neighbors
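The bound translates directly into code (a sketch; p1 must come from the chosen hash family and radius):

import math

def tables_needed(p1, k, delta=0.1):
    # smallest L with 1 - (1 - p1**k)**L >= 1 - delta
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

print(tables_needed(p1=0.9, k=10, delta=0.1))  # -> 6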
…Parameters selection
(figure: query time vs. k – candidate-extraction time grows with k while candidate-verification time falls, so an intermediate k minimizes the total)
Pros & Cons
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data size (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dim.)
• Requires the radius r to be fixed in advance
From Piotr Indyk's slides
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
LSH - Applications
• Searching video clips in databases ("Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification", Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification ("Discriminant Adaptive Nearest Neighbor Classification", T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines ("LSH Forest: Self-Tuning Indexes for Similarity Search", M. Bawa, T. Condie, P. Ganesan)
• Genomics ("Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing", J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation
• KNN search is a computational bottleneck
• LSH provides a fast approximate solution to the problem
• LSH requires hash function construction and parameter tuning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing – G. Shakhnarovich, P. Viola and T. Darrell
• Finding sensitive hash functions
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example – B. Georgescu, I. Shimshoni and P. Meer
• Tuning LSH parameters
• LSH data structure is used for algorithm speedups
The Problem
Given an image x, what are the parameters θ in this image?
i.e. angles of joints, orientation of the body, etc.
Fast Pose Estimation with Parameter Sensitive Hashing
G. Shakhnarovich, P. Viola and T. Darrell
Ingredients
• Input: query image with unknown angles (parameters)
• Database of human poses with known angles
• Image feature extractor – edge detector
• Distance metric in feature space: d_x
• Distance metric in angles space:
  d_θ(θ1, θ2) = Σᵢ₌₁..m (1 − cos(θ1,i − θ2,i))
Example based learning
• Construct a database of example images with their known angles
• Given a query image, run your favorite feature extractor
• Compute the KNN from the database
• Use these KNNs to compute the average angles of the query
Input: query → find KNN in the database of examples → output: average angles of the KNN

The algorithm flow:
Input Query → Features extraction → Processed query → PSH (LSH), against the database of examples → LWR (Regression) → Output: Match
The image features
Image features are multi-scale edge histograms, computed over image sub-windows (A, B, …).
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ).
We want similarity to be measured in the angles space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example: lines are 1D manifolds, planes are 2D manifolds, etc.
(figure: parameters space (angles) vs. feature space, with a query q – is this magic?)
Parameter Sensitive Hashing (PSH)
The trick:
Estimate the performance of different hash functions on examples, and select those sensitive to d_θ:
the hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles
• Define hash functions h on the feature space
• Predict the labeling of similar/non-similar examples by using h
• Compare the labelings
• If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25):
A pair of examples (x_i, x_j) is labeled
  y_ij = +1  if d_θ(θ_i, θ_j) ≤ r
  y_ij = −1  if d_θ(θ_i, θ_j) ≥ (1 + ε)·r
A binary hash function on a feature (a threshold):
  h_T(x) = +1 if x > T, −1 otherwise
Predict the labels:
  ŷ_ij = +1 if h_T(x_i) = h_T(x_j), −1 otherwise
Find the best T that predicts the true labeling, with the probability constraints:
h_T will place both examples in the same bin, or separate them.
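A sketch of how one candidate threshold h_T could be scored against the labeled pairs (the pair format and names are my assumptions, not the paper's code):

def stump_accuracy(feature_idx, T, pairs):
    # pairs: (x_i, x_j, y) with y = +1 for similar angles, -1 otherwise
    correct = 0
    for xi, xj, y in pairs:
        hi = 1 if xi[feature_idx] > T else -1
        hj = 1 if xj[feature_idx] > T else -1
        y_hat = 1 if hi == hj else -1        # same bin -> predicted similar
        correct += (y_hat == y)
    return correct / len(pairs)

# keep the (feature, T) stumps whose accuracies satisfy the sensitivity constraints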
Local Weighted Regression (LWR)
• Given a query image x₀, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query; schematically:
  θ₀ = argmin_θ Σ_{x_i ∈ N(x₀)} K(d_x(x_i, x₀)) · ‖g(x_i; θ) − θ_i‖²   (dist → weight)
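A simplified, zeroth-order version of this step: kernel-weight each neighbor's known angles by its feature-space distance and average (the paper fits a local regression; this constant fit is only a sketch):

import math

def lwr_estimate(query_feat, neighbors, bandwidth):
    # neighbors: list of (features, angles) returned by PSH
    num, den = None, 0.0
    for feat, angles in neighbors:
        d = math.dist(query_feat, feat)          # d_x in feature space
        w = math.exp(-(d / bandwidth) ** 2)      # kernel K: dist -> weight
        den += w
        if num is None:
            num = [w * a for a in angles]
        else:
            num = [s + w * a for s, a in zip(num, angles)]
    return [s / den for s in num]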
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables were needed
Recall: P1 is the prob. of a positive hash, P2 is the prob. of a bad hash, B is the max number of points in a bucket
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched
Interesting mismatches (figure)
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN + smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general: some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = p1,…,pn, with radii r1,…,rn
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere 'covers' the query q
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
(figure: each point is iteratively shifted to the weighted mean of the points inside its bandwidth window)

KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region:
high density – small bandwidth; low density – large bandwidth.
Based on the kth nearest neighbor of the point, the bandwidth is the distance to that neighbor: h_i = ‖x_i − x_{i,k}‖.

Adaptive mean-shift vs. non-adaptive (figure)
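A brute-force sketch of the adaptive bandwidth rule (this O(n²) scan is exactly the KNN computation that LSH later replaces):

import math

def adaptive_bandwidths(points, k):
    # h_i = distance from x_i to its kth nearest neighbor:
    # small in dense regions, large in sparse ones
    hs = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        hs.append(dists[k - 1])
    return hs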
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths: hs (spatial), hr (color)
3. Apply filtering
Filtering: each pixel gets the value of its nearest mode
(figures: original / filtered / segmented; mean-shift trajectories in 3D)
Filtering examples: original squirrel – filtered; original baboon – filtered
Segmentation examples
[Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02]
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point x, evaluate the K inequalities x_{d_k} ≤ v_k; the K boolean results index its cell
• This partitions the data into cells
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to the points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union of cells C∪ increases, but the resolution it gives decreases; C∪ determines the resolution of the data structure
Choosing optimal K and L
Determine accurately the KNN for m randomly-selected data points; record for each its kth-neighbor distance (bandwidth) and the approximate distance returned by the LSH structure.
Choose an error threshold ε; the optimal K and L should satisfy the error constraint on the approximate distance.
• For each K, estimate the error
• In one run, for all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K)) → minimum
(figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)])
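The selection procedure, sketched as a loop (knn_error and query_time stand for the error estimate on the m sample points and a timing run; both are assumed, not given here):

def choose_parameters(K_range, L_range, eps, knn_error, query_time):
    best = None
    for K in K_range:
        for L in L_range:                  # L_range in increasing order
            if knn_error(K, L) <= eps:     # minimal L meeting the constraint: L(K)
                t = query_time(K, L)
                if best is None or t < best[0]:
                    best = (t, K, L)
                break                      # larger L only adds query time
    return best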
Data driven partitions
• In the original LSH, cut values are chosen at random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(figure: bucket distribution – uniform cut points vs. data-driven cut points)
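A sketch of the data-driven cut selection (names are mine):

import random

def data_driven_cuts(points, K, seed=0):
    # each cut value is a coordinate of a randomly chosen data point,
    # so the cuts concentrate where the data actually lie
    rng = random.Random(seed)
    d = len(points[0])
    cuts = []
    for _ in range(K):
        dim = rng.randrange(d)
        cuts.append((dim, rng.choice(points)[dim]))
    return cuts

Used in place of uniform cut values when building the L partitions.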
Additional speedup
Assume that all points in C∪ will converge to the same mode (C∪ acts like a type of an aggregate);
then the mode computed for one point can be assigned to all the points of its C∪.
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
(figure: low dimension vs. high dimension)
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30: cookies…
Summary
• LSH trades some accuracy for a large gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extensions of LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 28: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/28.jpg)
Locality sensitive hashing
r(1 + ) r
(r p1p2 )Sensitiveequiv Pr[I(p)=I(q)] is ldquohighrdquo if p is ldquocloserdquo to qequiv Pr[I(p)=I(q)] is ldquolowrdquo if p isrdquofarrdquo from q
r2=(1 + ) r1
P1P2
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 29: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/29.jpg)
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l1 amp l2
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 30: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/30.jpg)
Hamming Space
bullHamming space = 2N binary strings
bullHamming distance = changed digits
aka Signal distanceRichard Hamming
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell) – finding sensitive hash functions.
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer) – tuning the LSH parameters; the LSH data structure is used for algorithm speedups.
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola, T. Darrell)
Given an image x, what are the parameters θ in this image, i.e., the angles of the joints, the orientation of the body, etc.?
Ingredients
• Input: query image with unknown angles (parameters).
• A database of human poses with known angles.
• An image feature extractor – edge detector.
• A distance metric in feature space: d_x.
• A distance metric in angle space:
 d_θ(θ¹, θ²) = Σᵢ₌₁…ₘ (1 − cos(θ¹ᵢ − θ²ᵢ))
Example based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.
Input: query → find the KNN in the database of examples → output: the average angles of the KNN.
The algorithm flow
Input query → features extraction → processed query → PSH (LSH) against the database of examples → LWR (regression) → output: match.
The image features
Image features are multi-scale edge histograms, computed over sub-windows (A, B, …) of the image.
Pipeline: Feature Extraction → PSH → LWR.
PSH: The basic assumption
There are two metric spaces here: the feature space (d_x) and the parameter space (d_θ). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: the parameter space (angles) and the feature space, with a query q mapped between them. Is this magic?]
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings: if the labeling by h is good, accept h; else change h.
PSH as a classification problem
A pair of examples (xᵢ, xⱼ) is labeled (here with r = 0.25):
 yᵢⱼ = +1 if d_θ(θᵢ, θⱼ) < r
 yᵢⱼ = −1 if d_θ(θᵢ, θⱼ) > (1 + ε)·r
A binary hash function on the features:
 h_T(x) = +1 if the feature value is at least T, −1 otherwise
Predict the labels:
 ŷ_h(xᵢ, xⱼ) = +1 if h_T(xᵢ) = h_T(xⱼ), −1 otherwise
Find the best T that predicts the true labeling, subject to the probability constraints: h_T will place both examples of a pair in the same bin, or separate them.
[Figure: distribution of the feature value over example pairs, with the threshold T(x) separating accepted from rejected pairs.]
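A minimal sketch of this selection step (illustrative Python; the feature values, pair labels, and the exhaustive threshold scan are stand-ins for the paper's exact training procedure):

```python
import numpy as np

def best_threshold(feat, pairs, labels):
    """Pick the threshold T on one feature that best predicts pair labels.

    feat:   (n,) feature value per example
    pairs:  (m, 2) indices of example pairs
    labels: (m,) +1 for similar-angle pairs, -1 for dissimilar ones
    """
    best_T, best_acc = None, -1.0
    for T in np.unique(feat):                     # candidate cut values
        h = np.where(feat >= T, 1, -1)            # h_T over all examples
        y_hat = np.where(h[pairs[:, 0]] == h[pairs[:, 1]], 1, -1)
        acc = np.mean(y_hat == labels)            # agreement with true labels
        if acc > best_acc:
            best_T, best_acc = T, acc
    return best_T, best_acc
```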
Local Weighted Regression (LWR)
• Given a query image x₀, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query: fit a local model g by
 ĝ = argmin_g Σ_{xᵢ ∈ N(x₀)} d_θ(g(xᵢ), θᵢ) · K(d_x(xᵢ, x₀))
where the kernel K turns each neighbor's feature-space distance into a weight, and the estimate is θ̂ = g(x₀).
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, face expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?).
• 18-bit hash functions (k), 150 hash tables (l).
• Tested on 1,000 synthetic examples: PSH searched only 3.4% of the data per query.
• Without the feature selection, 40 bits and 1,000 hash tables would have been needed.
(Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.)
Results – real data
• 800 images.
• Processed by a segmentation algorithm.
• 1.3% of the data was searched.
Interesting mismatches
[Figure: query images alongside mismatched nearest neighbors.]
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN smart averaging.
Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter?
• In general: some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in Rᵈ, centered at P = {p₁, …, pₙ}, with radii r₁, …, rₙ.
• Goal: given a query q, preprocess the points in P so as to find a point pᵢ whose sphere covers the query q.
(Courtesy of Mohamad Hegaze.)
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni, P. Meer)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.
Outline
• Mean-shift in a nutshell + examples.
Our scope:
• Mean-shift in high dimensions – using LSH.
• Speedups:
 1. Finding optimal LSH parameters.
 2. Data-driven partitions into buckets.
 3. Additional speedup by using the LSH data structure.
Mean-Shift in a Nutshell
[Figure: a window of a given bandwidth around a point shifts toward the local mean until it converges on a mode.]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. It is based on the kth nearest neighbor of the point: the bandwidth of a point is its distance to its kth nearest neighbor.
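A sketch of this adaptive-bandwidth rule (Python; brute-force distances stand in for the LSH-accelerated neighbor search discussed below):

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """Per-point bandwidth = distance to the point's k-th nearest neighbor."""
    # Brute-force pairwise L2 distances; LSH replaces this step in high dim.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)                 # row i: sorted distances from point i
    return D[:, k]                 # index 0 is the point itself (distance 0)

X = np.random.default_rng(0).normal(size=(200, 5))
h = adaptive_bandwidths(X, k=10)   # dense regions get small h, sparse get large
```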
Adaptive mean-shift vs. non-adaptive
[Figure: comparison of the two variants.]
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution is controlled by the bandwidths: h_s (spatial), h_r (color).
3. Apply filtering: each pixel gets the value of its nearest mode.
[Figures: original / filtered / segmented images; 3D mean-shift trajectories; filtering examples (squirrel, baboon); segmentation examples.]
(From "Mean-shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02.)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries – implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data – variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (dₖ, vₖ).
• For each point x we check, for k = 1…K, whether x_{dₖ} < vₖ; the K boolean answers form the point's key.
• This partitions the data into cells.
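A compact sketch of this structure (illustrative Python; uniform random cut values here — the data-driven variant below replaces them):

```python
import numpy as np
from collections import defaultdict

class RandomPartitionLSH:
    """L partitions; each partition = K (dimension, cut-value) pairs."""
    def __init__(self, X, K, L, rng):
        n, d = X.shape
        self.tables = []
        for _ in range(L):
            dims = rng.integers(0, d, size=K)                    # d_k
            cuts = rng.uniform(X.min(0)[dims], X.max(0)[dims])   # v_k
            buckets = defaultdict(list)
            for i, x in enumerate(X):
                key = tuple(x[dims] < cuts)     # K booleans = cell id
                buckets[key].append(i)
            self.tables.append((dims, cuts, buckets))

    def query(self, q):
        """Union of the query's cells over all L partitions."""
        out = set()
        for dims, cuts, buckets in self.tables:
            out.update(buckets.get(tuple(q[dims] < cuts), []))
        return out
```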
Choosing the optimal K and L
• For a query q, we want to compute the smallest possible number of distances to points in its buckets.
• Large K: a smaller number of points in each cell C.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of cells ∪C grows while each cell C shrinks; together, K and L determine the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; this gives each point's true distance (bandwidth).
• Choose an error threshold ε: the optimal K and L should satisfy that the approximate distance is within (1 + ε) of the true one.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K).
• Minimize the running time t(K, L(K)).
[Figures: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked.]
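In outline, that tuning loop might look as follows (a sketch under the stated assumptions; exact_knn_dist, lsh_knn_dist, and run_time are hypothetical helpers for the brute-force distances, the (K, L)-structure's approximate distances, and a timing probe):

```python
def tune_parameters(X, sample_idx, K_grid, L_grid, eps,
                    exact_knn_dist, lsh_knn_dist, run_time):
    """Pick (K, L): smallest L per K meeting the error bound, then fastest."""
    best = None
    d_true = [exact_knn_dist(X, i) for i in sample_idx]   # true bandwidths
    for K in K_grid:
        for L in sorted(L_grid):
            d_hat = [lsh_knn_dist(X, i, K, L) for i in sample_idx]
            ok = all(dh <= (1 + eps) * dt for dh, dt in zip(d_hat, d_true))
            if ok:                                        # minimal valid L(K)
                t = run_time(X, K, L)
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return best  # (time, K, L) or None
```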
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: points-per-bucket distribution, uniform cuts vs. data-driven cuts.]
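The change to the sketch above is a couple of lines (again an assumption-level sketch):

```python
import numpy as np

def data_driven_cuts(X, dims, rng):
    """Cut values taken from coordinates of randomly selected data points."""
    idx = rng.integers(0, len(X), size=len(dims))  # one random point per cut
    return X[idx, dims]                            # v_k = that point's d_k-th coordinate
```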
Additional speedup
Assume that all points in a cell C will converge to the same mode (C behaves like a type of aggregate), so the mean-shift iteration can be run once per cell and its mode assigned to every point in the cell.
Speedup results
[Table: 65,536 points, 1,638 points sampled, k = 100.]
Food for thought
[Figure: bucket behavior in low dimension vs. high dimension.]
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30: cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity.
• Applications that involve massive data in high dimension require the LSH's fast performance.
• Extension of the LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
 – Visit http://web.mit.edu/andoni/www/LSH/index.html
 – E-mail Alex Andoni (andoni@mit.edu)
 – Test over your own data (C code, under Red Hat Linux).
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 31: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/31.jpg)
Hamming SpaceN
010100001111
010100001111
010010000011Distance = 4
bullHamming space
bullHamming distance
SUM(X1 XOR X2)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 32: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/32.jpg)
L1 to Hamming Space Embedding
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 33: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/33.jpg)
Hash function
Lj Hash function
p Hdrsquoisin
Gj(p)=p|Ij
j=1L k=3 digits
Bits sampling from p
Store p into bucket p|Ij 2k buckets101
11000000000 111111110000 111000000000 111111110001
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshell
[Figure: a point is shifted within a bandwidth window toward the local mode]
KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth.
Based on the k-th nearest neighbor of the point, the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to its k-th neighbor.
Adaptive mean-shift vs. non-adaptive [Figure]
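Putting the two ideas together, here is a minimal sketch of adaptive mean-shift, assuming a flat kernel, Euclidean distances, and brute-force neighbor search (the paper replaces that search with LSH; names are illustrative):

```python
import numpy as np

def adaptive_mean_shift(X, k=100, n_iter=30, tol=1e-5):
    """Adaptive mean-shift: each point's bandwidth h_i is its distance
    to its k-th nearest neighbor, so dense regions get small windows."""
    n = len(X)
    # Pairwise distances: fine for small n; use LSH for large n.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    h = np.sort(D, axis=1)[:, k]          # h_i = dist to k-th neighbor
    modes = X.copy()
    for _ in range(n_iter):
        moved = 0.0
        for i in range(n):
            d = np.linalg.norm(X - modes[i], axis=1)
            w = d <= h[i]                 # flat kernel over the window
            if not w.any():
                continue
            new = X[w].mean(axis=0)       # shift to the window mean
            moved = max(moved, np.linalg.norm(new - modes[i]))
            modes[i] = new
        if moved < tol:                   # all points (approximately) converged
            break
    return modes                          # points sharing a mode = one cluster
```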
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering
[Figure: the feature space in 3D]
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm
[Figure: original, filtered, and segmented images; mean-shift trajectories]
Filtering: each pixel takes the value of its nearest mode.
Filtering examples
[Figure: squirrel and baboon images, original vs. filtered]
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x we check whether x_{d_k} < v_k for k = 1, …, K; the K boolean results index the point's cell.
• This partitions the data into cells.
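A minimal sketch of such a structure, assuming numeric data in a NumPy array (class and method names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

class LSHPartitions:
    """L random partitions; each uses K (dimension, value) cuts.
    A point's key within a partition is the K-bit pattern of x[d_k] < v_k."""

    def __init__(self, X, K, L):
        self.tables = []
        for _ in range(L):
            dims = rng.integers(0, X.shape[1], size=K)
            # Data-driven cuts: coordinates of randomly chosen data points
            # (see "Data-driven partitions" below).
            vals = X[rng.integers(0, len(X), size=K), dims]
            table = {}
            for i, x in enumerate(X):
                table.setdefault(self._key(x, dims, vals), []).append(i)
            self.tables.append((dims, vals, table))

    @staticmethod
    def _key(x, dims, vals):
        return tuple(x[dims] < vals)

    def query(self, q):
        """Candidate neighbors: the union of q's cells over all L partitions."""
        cand = set()
        for dims, vals, table in self.tables:
            cand.update(table.get(self._key(q, dims, vals), []))
        return cand
```

Only the points in the union of the query's L cells are ever touched; they are then verified with exact distances.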
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• Large K → a smaller number of points in each cell C_l.
• If L is too small, neighbor points might be missed; but if L is too big, the union of cells might include extra points.
• As L increases, the union of the query's cells increases but their intersection decreases; the intersection determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly selected data points, and record the k-th neighbor distance (the bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate distance returned by the LSH structure is within ε of the true distance.
Choosing optimal K and L
• For each K, estimate the error for every L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], which attains a minimum]
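A naive sketch of this tuning loop (the paper evaluates all L's in a single pass over one structure; here `approx_knn_dist` is a hypothetical wrapper around an LSH structure such as the one sketched above):

```python
import time
import numpy as np

def tune_kl(queries, true_dists, approx_knn_dist, Ks, Ls, eps=0.05):
    """For each K, take the smallest L whose mean relative error in the
    k-NN distance is below eps; among those pairs, keep the fastest.
    approx_knn_dist(K, L, q) returns the approximate k-NN distance."""
    best = None                            # (query_time, K, L)
    for K in Ks:
        for L in sorted(Ls):               # smallest acceptable L first
            errs = [abs(approx_knn_dist(K, L, q) - d) / d
                    for q, d in zip(queries, true_dists)]
            if np.mean(errs) <= eps:
                t0 = time.perf_counter()
                for q in queries:
                    approx_knn_dist(K, L, q)
                t = time.perf_counter() - t0
                if best is None or t < best[0]:
                    best = (t, K, L)
                break                      # larger L only adds cost here
    return best
```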
Data-driven partitions
• In the original LSH, cut values are chosen uniformly at random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value, so that buckets follow the data density.
[Figure: bucket occupancy for uniform vs. data-driven cut points]
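The difference as a two-function sketch (illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_cut(X, dim):
    """Original LSH: a cut value drawn uniformly over the data range."""
    lo, hi = X[:, dim].min(), X[:, dim].max()
    return rng.uniform(lo, hi)

def data_driven_cut(X, dim):
    """Suggested: the coordinate of a randomly chosen data point,
    so cuts land where the data actually is."""
    return X[rng.integers(len(X)), dim]
```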
Additional speedup
• Assume that all points in the intersection cell C converge to the same mode (C acts like a type of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality, or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30: cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• The LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 34: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/34.jpg)
Construction
1 2 L
p
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 35: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/35.jpg)
Query
1 2 L
q
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 36: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/36.jpg)
Alternative intuition random projections
p
8
C=11
1111111100011000000000
2
1111111100011000000000
drsquo=Cd
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Alternative intuition: random projections

[Figure: a point p = (8, 2) with C = 11 is embedded as the concatenated unary code 11111111000 11000000000; sampling k = 3 random bit positions maps each point to a 3-bit code (000 … 111), i.e., into 2³ buckets.]
k samplings
Repeating L times

Secondary hashing
The 2^k sparse buckets are mapped by a simple secondary hash into M buckets of size B, with M·B = αn (α ≈ 2): a support-volume tuning of dataset size vs. storage volume.
The above hashing is locality-sensitive
• Probability(p, q in the same bucket) = $\left(1 - \frac{\mathrm{Distance}(q,p)}{\text{dimensions}}\right)^{k}$
[Figure: collision probability vs. Distance(q, pᵢ) for k = 1 and k = 2; a larger k makes the probability decay faster. Adapted from Piotr Indyk's slides.]
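To make the construction concrete, here is a minimal sketch (not the original implementation) of bit-sampling LSH for Hamming space: each of L tables keys a binary vector by k randomly sampled bit positions, so two vectors collide in a single table with probability (1 − Ham(p,q)/d)^k. The class and variable names are illustrative.

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """Bit-sampling LSH for Hamming space: L tables, k sampled bits each."""
    def __init__(self, dim, k, L, seed=0):
        rng = random.Random(seed)
        self.tables = [defaultdict(list) for _ in range(L)]
        # Each table samples its own k random bit positions out of dim.
        self.samples = [rng.sample(range(dim), k) for _ in range(L)]

    def _key(self, v, t):
        return tuple(v[i] for i in self.samples[t])

    def insert(self, v, label):
        for t, table in enumerate(self.tables):
            table[self._key(v, t)].append(label)

    def query(self, v):
        # Candidates are the union over all L tables.
        out = set()
        for t, table in enumerate(self.tables):
            out.update(table.get(self._key(v, t), []))
        return out

# Usage on the unary code of p = (8, 2) with C = 11 (22 bits in total):
lsh = BitSamplingLSH(dim=22, k=3, L=5)
code = (1,) * 8 + (0,) * 3 + (1,) * 2 + (0,) * 9
lsh.insert(code, "p")
print(lsh.query(code))  # {'p'}
```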
Preview
• General solution – locality sensitive hashing
• Implementation for Hamming space
• Generalization to l2
Direct L2 solution
• New hashing function
• Still based on sampling
• Uses a mathematical trick: a p-stable distribution for the Lp distance, the Gaussian distribution for the L2 distance
Central limit theorem
$v_1 X_1 + v_2 X_2 + \dots + v_n X_n$: a weighted sum of Gaussians is a Gaussian.
$v_1, \dots, v_n$ = real numbers; $X_1, \dots, X_n$ = independent, identically distributed (i.i.d.) standard normals.

Dot product → norm:
$\langle v, X \rangle = \sum_i v_i X_i \;\sim\; \|v\|_2 \, X$, with $X \sim N(0,1)$.

Norm → distance:
$\langle u, X \rangle - \langle v, X \rangle = \langle u - v, X \rangle \;\sim\; \|u - v\|_2 \, X$
so the difference between the projections of two feature vectors is a Gaussian whose spread is exactly their L2 distance (dot product → distance).
The full hashing
$h_{a,b}(v) = \left\lfloor \frac{a \cdot v + b}{w} \right\rfloor$
• $v$ – the features vector (d entries)
• $a$ – d random numbers, i.i.d. from a p-stable distribution (Gaussian for L2)
• $b$ – a random phase in [0, w]
• $w$ – the discretization step
Example: v = [3.4, 8.2, 2.1], a = [2.2, 7.7, 4.2] ⇒ a·v = 79.44; with phase b = 0.34 and step w = 1.00, the shifted projection 79.78 falls in the bin [79.00, 80.00).
[Figure: the real line discretized into bins of width w = 1.00 at 78.00, 79.00, …, 82.00.]
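A minimal sketch of this hash, assuming Gaussian (2-stable) projections for L2; the step w and the example vector are illustrative:

```python
import numpy as np

def make_hash(dim, w, rng):
    a = rng.standard_normal(dim)  # i.i.d. N(0,1): 2-stable, so a.v ~ ||v||_2 * N(0,1)
    b = rng.uniform(0, w)         # random phase in [0, w]
    return lambda v: int(np.floor((np.dot(a, v) + b) / w))

rng = np.random.default_rng(0)
h = make_hash(3, w=1.0, rng=rng)
print(h(np.array([3.4, 8.2, 2.1])))  # bucket index of the projected, shifted point
```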
Generalization: p-stable distributions
• L2: Central Limit Theorem → the Gaussian (normal) distribution is 2-stable.
• Lp, 0 < p ≤ 2: Generalized Central Limit Theorem → a p-stable distribution (e.g., Cauchy for L1).

P-stable summary
• Works for L2 and generalizes to 0 < p ≤ 2.
• Improves the r-nearest-neighbor query time from $O(d\, n^{1/(1+\epsilon)} \log n)$ to $O(d\, n^{1/(1+\epsilon)^2} \log n)$ (latest results reported by e-mail by Alexander Andoni).
Parameters selection
• For Euclidean space, choose k and L for e.g. 90% success probability with the best query-time performance.

Parameters selection…
• A single projection hits an ε-nearest neighbor with probability $p_1$.
• k projections (one hash table) hit it with probability $p_1^k$.
• All L hash tables fail to collide with probability $(1 - p_1^k)^L$.
• To ensure a collision (e.g., with probability $1 - \delta \ge 90\%$):
$1 - (1 - p_1^k)^L \ge 1 - \delta \;\Longrightarrow\; L \ge \frac{\log \delta}{\log(1 - p_1^k)}$
[Figure: a larger k rejects more non-neighbors but requires a larger L to still accept true neighbors.]

…Parameters selection
[Figure: running time vs. k; candidate-extraction time grows with k while candidate-verification time shrinks, and the optimal k balances the two.]
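The bound above turns directly into a rule for picking the number of tables. A small sketch, with p1, k and δ as illustrative inputs:

```python
import math

def tables_needed(p1, k, delta=0.1):
    # 1 - (1 - p1**k)**L >= 1 - delta  =>  L >= log(delta) / log(1 - p1**k)
    return math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))

print(tables_needed(p1=0.9, k=10))  # e.g. 6 tables for 90% success probability
```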
Pros & Cons (from Piotr Indyk's slides)
Pros:
• Better query time than spatial data structures
• Scales well to higher dimensions and larger data sizes (sub-linear dependence)
• Predictable running time
Cons:
• Extra storage overhead
• Inefficient for data with distances concentrated around the average
• Works best for Hamming distance (although it can be generalized to Euclidean space)
• In secondary storage, a linear scan is pretty much all we can do (for high dimensions)
• Requires the radius r to be fixed in advance
Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)
LSH – Applications
• Searching video clips in databases (“Hierarchical Non-Uniform Locality Sensitive Hashing and Its Application to Video Identification”, Yang, Ooi, Sun)
• Searching image databases (see the following)
• Image segmentation (see the following)
• Image classification (“Discriminant Adaptive Nearest Neighbor Classification”, T. Hastie, R. Tibshirani)
• Texture classification (see the following)
• Clustering (see the following)
• Embedding and manifold learning (LLE and many others)
• Compression – vector quantization
• Search engines (“LSH Forest: Self-Tuning Indexes for Similarity Search”, M. Bawa, T. Condie, P. Ganesan)
• Genomics (“Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing”, J. Buhler)
• In short: whenever K-Nearest Neighbors (KNN) are needed.
Motivation
• A variety of procedures in learning require KNN computation.
• KNN search is a computational bottleneck.
• LSH provides a fast approximate solution to the problem.
• LSH requires hash-function construction and parameter tuning.
Outline
• Fast Pose Estimation with Parameter Sensitive Hashing (G. Shakhnarovich, P. Viola and T. Darrell) – finding sensitive hash functions.
• Mean Shift Based Clustering in High Dimensions: A Texture Classification Example (B. Georgescu, I. Shimshoni and P. Meer) – tuning the LSH parameters; the LSH data structure is used for algorithm speedups.

The Problem
Given an image x, what are the parameters θ in this image, i.e., angles of joints, orientation of the body, etc.?
(Fast Pose Estimation with Parameter Sensitive Hashing, G. Shakhnarovich, P. Viola and T. Darrell)
Ingredients
• Input: a query image with unknown angles (parameters).
• A database of human poses with known angles.
• An image feature extractor – an edge detector.
• A distance metric in feature space, $d_x$.
• A distance metric in angle space:
$d_\theta(\theta_1, \theta_2) = \sum_{i=1}^{m} \left(1 - \cos(\theta_{1,i} - \theta_{2,i})\right)$
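As a quick illustration of the angle-space metric, a sketch (the pose vectors below are made up):

```python
import math

def d_theta(t1, t2):
    """Angle-space distance: sum_i (1 - cos(theta1_i - theta2_i))."""
    return sum(1.0 - math.cos(a - b) for a, b in zip(t1, t2))

print(d_theta([0.0, 1.0], [0.1, 1.2]))  # small value for two similar poses
```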
Example based learning
• Construct a database of example images with their known angles.
• Given a query image, run your favorite feature extractor.
• Compute the KNN from the database.
• Use these KNNs to compute the average angles of the query.
Input: query → find the KNN in the database of examples → output: the average angles of the KNN.

The algorithm flow
Input query → features extraction → processed query → PSH (LSH) over the database of examples → LWR (regression) → output: match.
The image features
Image features are multi-scale edge histograms.
[Figure: edge-direction histograms accumulated over image sub-windows at several scales.]
(Feature Extraction → PSH → LWR)
PSH: the basic assumption
There are two metric spaces here: feature space ($d_x$) and parameter space ($d_\theta$). We want similarity to be measured in the angle space, whereas LSH works on the feature space.
• Assumption: the feature space is closely related to the parameter space.
(Feature Extraction → PSH → LWR)

Insight: Manifolds
• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.
[Figure: a query q and its neighbors lie on corresponding manifolds in the parameter space (angles) and the feature space. Is this magic?]
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to $d_\theta$. The hash functions are applied in feature space, but the KNN are valid in angle space.
• Label pairs of examples with similar angles.
• Define hash functions h on the feature space.
• Predict the labeling of similar/non-similar examples by using h.
• Compare the labelings: if the labeling by h is good, accept h; else change h.
PSH as a classification problem
Labels (r = 0.25): a pair of examples $(x_i, x_j)$ is labeled
$y_{ij} = +1$ if $d_\theta(\theta_i, \theta_j) < r$, and $y_{ij} = -1$ if $d_\theta(\theta_i, \theta_j) > (1+\epsilon)\, r$.

A binary hash function on the features:
$h_T(x) = +1$ if $\phi(x) > T$, and $-1$ otherwise.

Predict the labels:
$\hat{y}_{ij} = +1$ if $h_T(x_i) = h_T(x_j)$, and $-1$ otherwise.

Find the best T that predicts the true labeling subject to the probability constraints: such an $h_T$ places both examples of a similar pair in the same bin and separates dissimilar pairs. A sketch of this selection follows.
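A toy sketch of the selection for a single feature φ: scan candidate thresholds T and keep the one whose induced hash best reproduces the ±1 pair labels. All feature values and pair labels below are illustrative.

```python
def pair_error(phi, pairs, T):
    """Fraction of labeled pairs that the hash h_T(x) = sign(phi[x] - T) gets wrong.
    pairs: list of (i, j, y) with y=+1 for similar angles, y=-1 for dissimilar."""
    errs = 0
    for i, j, y in pairs:
        same_bin = (phi[i] > T) == (phi[j] > T)
        errs += (y == +1) != same_bin
    return errs / len(pairs)

phi = [0.2, 0.3, 0.9, 1.1]                    # one feature value per example
pairs = [(0, 1, +1), (2, 3, +1), (0, 2, -1)]  # labeled example pairs
best_T = min((pair_error(phi, pairs, T), T) for T in phi)[1]
print(best_T)  # threshold with the lowest pair-labeling error (0.3 here)
```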
Local Weighted Regression (LWR)
• Given a query image $x_0$, PSH returns its KNNs.
• LWR uses the KNN to compute a distance-weighted estimate of the angles of the query:
$\hat{\theta}_0 = g(x_0; \beta)$, with $\beta = \arg\min_\beta \sum_{x_i \in N(x_0)} \left(g(x_i; \beta) - \theta_i\right)^2 K\!\left(d_x(x_i, x_0)\right)$
where K is a distance-based weight kernel.
(Feature Extraction → PSH → LWR)
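A minimal sketch of the LWR step in its simplest form, a constant model g, which reduces to a kernel-weighted average of the neighbors' angle vectors; the Gaussian kernel and its width h are assumptions, not the paper's exact choice:

```python
import numpy as np

def lwr_estimate(neighbors_theta, dists, h=1.0):
    """Weighted average of the KNN angle vectors; nearer neighbors weigh more."""
    w = np.exp(-(np.asarray(dists) / h) ** 2)  # kernel K(d_x(x_i, x_0))
    return (w[:, None] * np.asarray(neighbors_theta)).sum(0) / w.sum()

print(lwr_estimate([[0.1, 1.0], [0.3, 1.2]], dists=[0.5, 2.0]))
```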
Results
Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, face expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?): 18-bit hash functions (k), 150 hash tables (l).
• Tested on 1,000 synthetic examples: PSH searched only 3.4% of the data per query.
• Without the feature selection, 40 bits and 1,000 hash tables were needed.
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the maximal number of points in a bucket.
Results – real data
• 800 images.
• Processed by a segmentation algorithm.
• 1.3% of the data were searched.
[Figures: real-data query results; some interesting mismatches.]
Fast pose estimation – summary
• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN smart averaging.

Food for Thought
• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general: some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)
• Given n spheres in $R^d$, centered at $P = \{p_1, \dots, p_n\}$, with radii $r_1, \dots, r_n$.
• Goal: given a query q, preprocess the points in P to find a point $p_i$ whose sphere covers the query q.
(Courtesy of Mohamad Hegaze.)
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• The LSH parameters should be adjusted for optimal performance.
(Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni and P. Meer)
Outline
• Mean-shift in a nutshell + examples.
Our scope:
• Mean-shift in high dimensions – using LSH.
• Speedups:
1. Finding optimal LSH parameters.
2. Data-driven partitions into buckets.
3. Additional speedup by using the LSH data structure.
Mean-Shift in a Nutshell
[Figure: a point and the kernel window of radius "bandwidth" around it; the mean-shift vector moves the window toward the local density maximum.]

KNN in mean-shift
The bandwidth should be inversely proportional to the density in the region: high density – small bandwidth; low density – large bandwidth. Based on the kth nearest neighbor $x_{i,k}$ of the point, the bandwidth is $h_i = \|x_i - x_{i,k}\|$ (a sketch follows).
[Figure: adaptive mean-shift vs. non-adaptive.]
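A brute-force sketch of this adaptive-bandwidth rule, here with the L1 distance (an assumption that matches the LSH cell structure; in practice the kth neighbor is what the LSH structure approximates):

```python
import numpy as np

def adaptive_bandwidths(X, k):
    """h_i = distance from x_i to its k-th nearest neighbor (brute force)."""
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(-1)  # pairwise L1 distances
    return np.sort(D, axis=1)[:, k]                    # column 0 is the point itself

X = np.random.default_rng(0).random((100, 5))
print(adaptive_bandwidths(X, k=10)[:3])  # per-point bandwidths h_i
```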
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution is controlled by the bandwidths: $h_s$ (spatial) and $h_r$ (color).
3. Apply filtering: each pixel receives the value of the nearest mode.
[Figures: original / filtered / segmented images; mean-shift trajectories; squirrel and baboon filtering examples; segmentation examples.]
(Mean-shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02)
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries – implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data – handled with a variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition includes K pairs $(d_k, v_k)$.
• For each point we check whether $x_{d_k} < v_k$, for k = 1…K; the K Boolean results define the point's cell.
• It partitions the data into cells (a sketch follows).
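A sketch of this structure under the stated reading: K (coordinate, cut-value) pairs per partition, and a cell keyed by the K comparison bits. Names and the uniform choice of cut values are illustrative.

```python
import numpy as np
from collections import defaultdict

def build_partitions(X, K, L, rng):
    tables = []
    for _ in range(L):
        dims = rng.integers(0, X.shape[1], size=K)    # coordinates d_k
        cuts = rng.uniform(X.min(), X.max(), size=K)  # cut values v_k
        cells = defaultdict(list)
        for i, x in enumerate(X):
            key = tuple(x[dims] < cuts)               # K boolean tests -> cell id
            cells[key].append(i)
        tables.append((dims, cuts, cells))
    return tables

X = np.random.default_rng(1).random((1000, 8))
tables = build_partitions(X, K=6, L=4, rng=np.random.default_rng(2))
```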
Choosing the optimal K and L
• For a query q, we want to compute the smallest number of distances to points in its buckets.
• A large K ⇒ a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, the union of cells $C_\cup$ might include extra points.
• As L increases, $N_{C_\cup}$ increases but the number of missed neighbors decreases; the union is bounded by $N_{C_\cup} \le L \cdot N_C$.
• K determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; $d_{KNN}$ is the distance (bandwidth).
• Choose an error threshold $\epsilon$; the optimal K and L should satisfy $d_{approx} \le (1+\epsilon)\, d_{KNN}$ for the approximate distance.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)) and take the K at the minimum (a sketch of this loop follows).
[Figure: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)].]
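A sketch of that tuning loop; `approx_d` and `time_of` stand in for the LSH-estimated KNN distance and the measured query time, which are assumed to be supplied by the surrounding system rather than defined here:

```python
def tune(Ks, Ls, eps, exact_d, approx_d, time_of):
    """exact_d[i]: true KNN distance for sample i;
    approx_d(K, L, i): LSH-estimated KNN distance; time_of(K, L): query time."""
    best = None
    for K in Ks:
        # minimal L whose approximate distances stay within the (1+eps) bound
        L_K = next((L for L in Ls
                    if all(approx_d(K, L, i) <= (1 + eps) * exact_d[i]
                           for i in range(len(exact_d)))), None)
        if L_K is not None:
            t = time_of(K, L_K)
            if best is None or t < best[0]:
                best = (t, K, L_K)
    return best  # (time, K*, L(K*))
```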
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (see the sketch below).
[Figure: bucket-occupancy distribution, uniform cuts vs. data-driven points.]
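The change is one line in the partition construction; a sketch, assuming cut values are drawn from randomly chosen data points:

```python
import numpy as np

def data_driven_cuts(X, K, rng):
    # Pick K coordinates; for each, take the cut value from a random data point,
    # so dense regions get more cuts and buckets are more evenly populated.
    dims = rng.integers(0, X.shape[1], size=K)
    rows = rng.integers(0, X.shape[0], size=K)
    return dims, X[rows, dims]

X = np.random.default_rng(3).random((1000, 8))
dims, cuts = data_driven_cuts(X, K=6, rng=np.random.default_rng(4))
```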
Additional speedup
• Assume that all points in a cell C will converge to the same mode (C acts like a type of an aggregate).

Speedup results
[Figure: 65,536 points, 1,638 points sampled, k = 100.]
Food for thought
[Figure: low dimension vs. high dimension.]

A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30: cookies…
Summary
• LSH suggests a compromise on accuracy for a gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• Extension of the LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.

Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– E-mail Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux)

Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
![Page 38: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/38.jpg)
Alternative intuition random projections
8
C=11
1111111100011000000000
2
1111111100011000000000
p
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 39: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/39.jpg)
Alternative intuition random projections
101
11000000000 111111110000 111000000000 111111110001
000
100
110
001
101
111
2233 BucketsBucketsp
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 40: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/40.jpg)
k samplings
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle space.
Training loop:
1. Label pairs of examples with similar angles
2. Define hash functions h on the feature space
3. Predict the labeling of similar/non-similar examples by using h
4. Compare the labelings
5. If the labeling by h is good, accept h; else change h
PSH as a classification problem
Labels (r = 0.25): a pair of examples (x_i, x_j) is labeled
 y_ij = +1 if d_θ(θ_i, θ_j) < r
 y_ij = −1 if d_θ(θ_i, θ_j) > (1 + ε) r
A binary hash function on features:
 h_T(x) = +1 if x > T; −1 otherwise
Predict the labels:
 ŷ_h(x_i, x_j) = +1 if h_T(x_i) = h_T(x_j); −1 otherwise
Find the best T that predicts the true labeling, subject to probability constraints: h_T will place both examples of a pair in the same bin, or separate them. (See the selection sketch below.)
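A sketch of this selection loop, assuming the candidate hashes are axis-aligned thresholds h(x) = sign(x[dim] − T), and using plain pair-prediction accuracy as a simplified stand-in for the paper's probabilistic constraints:

```python
import numpy as np

def pair_accuracy(dim, T, pairs, labels, X):
    """Fraction of labeled pairs whose hash agreement predicts the label."""
    h = np.where(X[:, dim] > T, 1, -1)                       # h_T over all points
    pred = np.where(h[pairs[:, 0]] == h[pairs[:, 1]], 1, -1)  # predicted y-hat
    return np.mean(pred == labels)

def select_hashes(X, pairs, labels, n_keep=18, n_candidates=500, rng=None):
    rng = rng or np.random.default_rng(0)
    cands = []
    for _ in range(n_candidates):
        dim = rng.integers(X.shape[1])
        T = rng.uniform(X[:, dim].min(), X[:, dim].max())
        cands.append((pair_accuracy(dim, T, pairs, labels, X), dim, T))
    cands.sort(reverse=True)              # keep the most parameter-sensitive ones
    return [(dim, T) for _, dim, T in cands[:n_keep]]
```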
Local Weighted Regression (LWR)
• Given a query image, PSH returns KNNs
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:
 θ̂(x₀) = argmin_g Σ_{x_i ∈ N(x₀)} K(d_x(x_i, x₀)) · (g(x_i) − θ_i)²  (distance → weight)
(A sketch follows below.)
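A minimal LWR sketch, assuming the simplest (order-0) local model: a kernel-weighted average of the neighbors' angles, with a Gaussian kernel turning feature-space distance into weight:

```python
import numpy as np

def lwr_estimate(query_features, nn_features, nn_angles, bandwidth=1.0):
    d = np.abs(nn_features - query_features).sum(axis=1)  # feature-space distance
    w = np.exp(-(d / bandwidth) ** 2)                     # kernel: distance -> weight
    return (w[:, None] * nn_angles).sum(axis=0) / w.sum()
```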
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the max number of points in a bucket.
Results - real data
• 800 images
• Processed by a segmentation algorithm
• 13% of the data were searched
[Figures: real-data results and interesting mismatches]
Fast pose estimation - summary
• A fast way to compute the angles of the human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general: some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in R^d, centered at P = {p1, ..., pn}, with radii r1, ..., rn
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere (of radius r_i) 'covers' the query q
[Figure: a query q covered by the sphere of radius r_i around p_i]
Courtesy of Mohamad Hegaze.
Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Motivation
• Clustering high-dimensional data by using local density measurements (e.g., in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions - using LSH
• Speedups:
 1. Finding optimal LSH parameters
 2. Data-driven partitions into buckets
 3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a point is iteratively shifted toward the mean of the data points inside its bandwidth window, converging to a mode of the density; a sketch follows below]
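A minimal sketch of the iteration, assuming a flat (uniform) kernel and a starting point drawn from the data, so that its window is never empty:

```python
import numpy as np

def mean_shift_point(x, data, h, n_iter=100, tol=1e-5):
    """Shift x to the mean of the points within bandwidth h, until convergence."""
    for _ in range(n_iter):
        in_window = np.linalg.norm(data - x, axis=1) < h
        new_x = data[in_window].mean(axis=0)   # mean of the window contents
        if np.linalg.norm(new_x - x) < tol:    # converged to a mode
            break
        x = new_x
    return x
```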
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region: high density → small bandwidth; low density → large bandwidth.
Based on the kth nearest neighbor of the point, the bandwidth is h_i = ‖x_i − x_{i,k}‖ (the distance from x_i to its kth neighbor).
Adaptive mean-shift vs. non-adaptive
[Figure: comparison of adaptive and non-adaptive mean-shift; a bandwidth-computation sketch follows below]
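A sketch of this bandwidth rule, using a brute-force O(n²) distance matrix for clarity; this is exactly the KNN query that the LSH structure below is meant to accelerate:

```python
import numpy as np

def knn_bandwidths(data, k=100):
    """h_i = distance from x_i to its k-th nearest neighbor."""
    # Brute force: n x n distance matrix (fine for a sketch, not for large n).
    d = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    d.sort(axis=1)
    return d[:, k]   # column 0 is the self-distance (0), so column k is the k-th NN
```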
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering: each pixel gets the value of its nearest mode
[Figures: original, filtered, and segmented images; mean-shift trajectories]
("Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02)
[Figures: filtering examples (squirrel and baboon, original vs. filtered) and segmentation examples, from "Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02]
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k)
• For each point we check whether x_{d_k} ≤ v_k; the resulting K-bit vector of outcomes determines its cell
• This partitions the data into cells (see the sketch below)
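A sketch of this structure, assuming uniform random cut values in the range of the data (the data-driven variant appears later):

```python
import numpy as np
from collections import defaultdict

def build_lsh(data, K, L, rng=None):
    """L partitions; each partition hashes a point by K (dimension, cut) tests."""
    rng = rng or np.random.default_rng(0)
    tables = []
    for _ in range(L):
        dims = rng.integers(0, data.shape[1], size=K)
        lo = data[:, dims].min(axis=0)
        hi = data[:, dims].max(axis=0)
        cuts = rng.uniform(lo, hi)            # uniform cuts in the data range
        buckets = defaultdict(list)
        for i, x in enumerate(data):
            key = tuple(x[dims] <= cuts)      # K-bit cell key
            buckets[key].append(i)
        tables.append((dims, cuts, buckets))
    return tables

def query_lsh(q, tables):
    """Union of the buckets q falls into, one bucket per partition."""
    out = set()
    for dims, cuts, buckets in tables:
        out.update(buckets[tuple(q[dims] <= cuts)])
    return out
```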
Choosing the optimal K and L
• For a query q, we want to compute the smallest number of distances to points in its buckets
• Large K → a smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• The candidate set is the union of the L cells containing q, C = ∪_l C_l
• As L increases, the candidate set C̄ increases but its variance decreases; K determines the resolution of the data structure
Choosing optimal K and L
• Determine accurately the true KNN distance (bandwidth) h_i for m randomly-selected data points
• Choose an error threshold ε
• The optimal K and L should satisfy: the approximate (LSH-based) distance stays within (1 + ε) of h_i on these points
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
[Figures: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum]
(A tuning sketch follows below.)
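A sketch of the tuning loop. The three callables are assumptions standing in for the real pipeline: `exact_dist(p)` returns the true bandwidth distance of a probe point, `approx_dist(p, K, L)` the LSH-based one for a given (K, L), and `query_time(K, L)` the measured cost t(K, L):

```python
def tune_parameters(Ks, Ls, probes, exact_dist, approx_dist, query_time, eps=0.05):
    best = None
    for K in Ks:
        for L in sorted(Ls):               # minimal L meeting the error constraint
            ok = all(approx_dist(p, K, L) <= (1 + eps) * exact_dist(p)
                     for p in probes)
            if ok:
                t = query_time(K, L)       # then minimize t(K, L(K)) over K
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return best                            # (time, K, L)
```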
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
[Figure: points-per-bucket distribution, uniform vs. data-driven cuts; see the sketch below]
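A sketch of the data-driven variant, written as a drop-in replacement for the uniform `cuts` line in the earlier `build_lsh` sketch:

```python
import numpy as np

def data_driven_cuts(data, dims, rng):
    # One random data point's coordinate per chosen dimension, so the
    # cut values follow the data density rather than a uniform range.
    idx = rng.integers(0, len(data), size=len(dims))
    return data[idx, dims]
```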
Additional speedup
• Assume that all points in the cell C will converge to the same mode (C acts like a type of aggregate), so one convergence run can be reused for the whole cell
Speedup results: 65,536 points; 1,638 points sampled; k = 100.
Food for thought: low dimension vs. high dimension.
A thought for food...
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning requires KNN
15:30 - cookies...
Summary
• LSH suggests a compromise: give up some accuracy for a gain in complexity
• Applications that involve massive data in high dimension require the fast performance of LSH
• Extension of the LSH to different spaces (PSH)
• Learning the LSH parameters and hash functions for different applications
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 41: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/41.jpg)
Repeating
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 42: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/42.jpg)
Repeating L times
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 43: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/43.jpg)
Repeating L times
Secondary hashing
Support volume tuning
dataset-size vs storage volume
2k buckets
011
Size=B
M Buckets
Simple Hashing
MB=αn α=2
Skip
The above hashing is locality-sensitive
bullProbability (pq in same bucket)=
k=1 k=2
Distance (qpi) Distance (qpi)
Pro
babi
lity Pr
Adopted from Piotr Indykrsquos slides
kqp
dimensions
)(Distance1
Preview
bullGeneral Solution ndash Locality sensitive hashing
bullImplementation for Hamming space
bullGeneralization to l2
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
Direct L2 solution
bullNew hashing function
bullStill based on sampling
bullUsing mathematical trick
bullP-stable distribution for Lp distance bullGaussian distribution for L2 distance
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick: estimate the performance of different hash functions on examples, and select those sensitive to $d_\theta$.
The hash functions are applied in feature space, but the KNN are valid in angle space.
1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples by using h.
4. Compare the labelings.
5. If the labeling by h is good, accept h; else change h.
PSH as a classification problem
[Figure: example pairs labeled +1, +1, -1, -1 (r = 0.25)]
Labels: a pair of examples $(x_i, \theta_i), (x_j, \theta_j)$ is labeled

$y_{ij} = \begin{cases} +1 & \text{if } d_\theta(\theta_i, \theta_j) \le r \\ -1 & \text{if } d_\theta(\theta_i, \theta_j) \ge (1+\varepsilon)\, r \end{cases}$
A binary hash function on features:

$h_T(x) = \begin{cases} +1 & \text{if } \phi(x) \ge T \\ -1 & \text{otherwise} \end{cases}$

where $\phi(x)$ is a single feature value. Predict the labels:

$\hat{y}_h(x_i, x_j) = \begin{cases} +1 & \text{if } h_T(x_i) = h_T(x_j) \\ -1 & \text{otherwise} \end{cases}$
[Figure: distribution of a feature value with threshold T(x)]
Find the best T that predicts the true labeling, subject to the probability constraints; $h_T$ either places both examples of a pair in the same bin or separates them. A selection sketch follows.
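A sketch of this selection step, reusing `d_theta` from the earlier snippet. The grid search over candidate thresholds and the plain-accuracy score are simplifications of the paper's probabilistic constraints:

```python
import numpy as np

def pair_label(theta_i, theta_j, r, eps):
    # +1 for similar-angle pairs, -1 for clearly dissimilar ones,
    # 0 (ignored) for the gray zone between r and (1+eps)*r.
    d = d_theta(theta_i, theta_j)
    return +1 if d <= r else (-1 if d >= (1 + eps) * r else 0)

def best_threshold(feature_values, labeled_pairs):
    # feature_values: 1-D array, the value of one feature per example.
    # labeled_pairs: list of ((i, j), y) with y in {+1, -1}.
    # A pair is predicted +1 iff both examples fall on the same side of T.
    best_T, best_acc = None, -1.0
    for T in np.quantile(feature_values, np.linspace(0.05, 0.95, 19)):
        side = feature_values >= T
        hits = sum((+1 if side[i] == side[j] else -1) == y
                   for (i, j), y in labeled_pairs)
        acc = hits / len(labeled_pairs)
        if acc > best_acc:
            best_T, best_acc = T, acc
    return best_T, best_acc
```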
Local Weighted Regression (LWR)
• Given a query image, PSH returns its KNNs.
• LWR uses the KNNs to compute a distance-weighted average of the estimated angles of the query:

$\hat{\beta} = \arg\min_\beta \sum_{x_i \in N(x_0)} d_\theta\big(g(x_i; \beta), \theta_i\big)\, K\big(d_x(x_i, x_0)\big)$

where $K(\cdot)$ turns feature-space distance into a weight and $g(\cdot\,; \beta)$ is the local model whose value at the query gives $\theta_0$.
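A zeroth-order (constant-model) reading of this step, with a Gaussian kernel on the feature-space distance; the paper fits a local model $g(x; \beta)$, so this is only the simplest special case:

```python
import numpy as np

def lwr_estimate(query_f, neigh_f, neigh_theta, bandwidth):
    # Kernel-weighted average of the KNNs' angles: neighbors close to the
    # query in feature space get large weights, distant ones small weights.
    d = np.abs(neigh_f - query_f).sum(axis=1)      # d_x, L1 again
    w = np.exp(-(d / bandwidth) ** 2)              # Gaussian kernel weights
    return (w[:, None] * np.asarray(neigh_theta)).sum(axis=0) / w.sum()
```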
Results
Synthetic data were generated:
• 13 angles: 1 for rotation of the torso, 12 for joints
• 150,000 images
• Nuisance parameters added: clothing, illumination, face expression
• 1,775,000 example pairs
• Selected 137 out of 5,123 meaningful features (how?)
• 18-bit hash functions (k), 150 hash tables (l)
• Test on 1,000 synthetic examples
• PSH searched only 3.4% of the data per query
• Without selection, 40 bits and 1,000 hash tables would have been needed
Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, B is the max number of points in a bucket.
Results – real data
• 800 images
• Processed by a segmentation algorithm
• 1.3% of the data were searched

Results – real data
Interesting mismatches
Fast pose estimation - summary
• A fast way to compute the angles of a human body figure
• Moving from one representation space to another
• Training a sensitive hash function
• KNN smart averaging
Food for Thought
• The basic assumption may be problematic (distance metric, representations)
• The training set should be dense
• Texture and clutter
• In general, some features are more important than others and should be weighted
Food for Thought: Point Location in Different Spheres (PLDS)
• Given: n spheres in $R^d$, centered at $P = \{p_1, \dots, p_n\}$, with radii $r_1, \dots, r_n$
• Goal: given a query q, preprocess the points in P so as to find a point $p_i$ whose sphere 'covers' the query q, i.e. $\|q - p_i\| \le r_i$
Courtesy of Mohamad Hegaze
Motivation
• Clustering high-dimensional data by using local density measurements (e.g. in feature space)
• Statistical curse of dimensionality: sparseness of the data
• Computational curse of dimensionality: expensive range queries
• LSH parameters should be adjusted for optimal performance

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example
B. Georgescu, I. Shimshoni and P. Meer
Outline
• Mean-shift in a nutshell + examples
Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell
[Figure: a point shifted toward the local mean of the points inside its bandwidth window]

KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density → small bandwidth, low density → large bandwidth.
• Based on the kth nearest neighbor of the point, the bandwidth is $h_i = \|x_i - x_{i,k}\|$, the distance from $x_i$ to that neighbor (a sketch follows).
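A small sketch of the adaptive rule, assuming an (n, d) NumPy array of points; the brute-force distance matrix is exactly the cost the LSH machinery below is meant to avoid:

```python
import numpy as np

def adaptive_bandwidths(points, k):
    # h_i = distance from x_i to its k-th nearest neighbor, so dense
    # regions get small bandwidths and sparse regions large ones.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, k]    # column 0 is the distance to itself

def mean_shift_step(x, points, h):
    # One mean-shift step with a flat kernel: move x to the mean of the
    # points inside its bandwidth window.
    inside = np.linalg.norm(points - x, axis=1) <= h
    return points[inside].mean(axis=0)
```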
Adaptive mean-shift vs. non-adaptive
[Figure: comparison of adaptive and non-adaptive mean-shift results]
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths $h_s$ (spatial) and $h_r$ (color)
3. Apply filtering
[Figure: 3D view of the feature space]
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02

Image segmentation algorithm
[Figure: original → filtered → segmented]
Filtering: pixel value of the nearest mode
Mean-shift trajectories
[Figure: mean-shift trajectories in feature space]

Filtering examples
[Figure: original squirrel → filtered; original baboon → filtered]
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Segmentation examples
"Mean Shift: A Robust Approach Toward Feature Space Analysis", D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs $(d_k, v_k)$.
• For each point, check whether $x_{d_k} \le v_k$ for each of the K pairs; the resulting K bits identify its cell.
• This partitions the data into cells, as in the sketch below.
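A minimal sketch of this structure; dimensions and cut values are drawn uniformly here (the data-driven variant appears two slides down):

```python
import random
from collections import defaultdict

def build_partition(points, K, rng):
    # One partition: K pairs (d_k, v_k) over an (n, d) NumPy array.
    # A point's cell key is the K-bit pattern of tests x[d_k] <= v_k.
    n, d = points.shape
    cuts = []
    for _ in range(K):
        dim = rng.randrange(d)
        cuts.append((dim, rng.uniform(points[:, dim].min(), points[:, dim].max())))
    table = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(p[dim] <= v for dim, v in cuts)
        table[key].append(i)
    return cuts, table

def query_candidates(q, partitions):
    # Union of the query's cells over the L partitions.
    out = set()
    for cuts, table in partitions:
        key = tuple(q[dim] <= v for dim, v in cuts)
        out.update(table.get(key, ()))
    return out
```

Building the structure is then `partitions = [build_partition(X, K, random.Random(s)) for s in range(L)]`, and a query's candidate set is `query_candidates(q, partitions)`.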
Choosing the optimal K and L
• Goal: for a query q, compute as few distances as possible to the points in its buckets.
• If L is too small, points might be missed; but if L is too big, the union of the query's cells might include extra points.
• Large K → a smaller number of points in a cell C.
• As L increases, the union $\cup C$ grows (fewer misses) but more extra points are included; together, K and L determine the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; let $d^k$ be the distance to the kth neighbor (the bandwidth).
• Choose an error threshold $\epsilon$.
• The optimal K and L should satisfy: the approximate distance $\hat{d}^k$ is within a factor $(1+\epsilon)$ of the true distance $d^k$.
Choosing optimal K and L
• For each K, estimate the error of the approximate KNN distances.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)); a tuning sketch follows.
[Figure: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)] with its minimum marked]
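The same recipe as a hedged higher-order sketch; `approx_dist`, `exact_dist`, and `time_of` are caller-supplied callables standing in for the measurements the paper makes on the m sampled points:

```python
def tune_parameters(sample_idx, Ks, Ls, eps, approx_dist, exact_dist, time_of):
    # approx_dist(i, K, L): approximate k-NN distance for sample point i;
    # exact_dist(i): its true k-NN distance; time_of(K, L): measured query time.
    best = None
    for K in Ks:
        for L in sorted(Ls):
            if all(approx_dist(i, K, L) <= (1 + eps) * exact_dist(i)
                   for i in sample_idx):
                t = time_of(K, L)             # running time at (K, L(K))
                if best is None or t < best[0]:
                    best = (t, K, L)
                break                         # minimal L for this K found
    return best                               # (time, K, L), or None
```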
Data driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (see the sketch below).
[Figure: bucket occupancy distribution, uniform cuts vs. data-driven cuts]
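The suggested change is one line in the partition builder above: draw the cut value from the data instead of from a uniform range (a sketch, same `rng` convention):

```python
def data_driven_cuts(points, K, rng):
    # Each cut value is a coordinate of a randomly chosen data point, so
    # cuts land where the data are and buckets come out more evenly filled.
    n, d = points.shape
    return [(dim, points[rng.randrange(n), dim])
            for dim in (rng.randrange(d) for _ in range(K))]
```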
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts as a kind of aggregate).
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
[Figure: low dimension vs. high dimension]

A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.
15:30 cookies…
Summary
• LSH suggests a compromise: some accuracy is given up for a large gain in complexity (query time).
• Applications that involve massive data in high dimension require LSH's fast performance.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• ...but at the end, everything depends on your data set
• Try it at home
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data
  (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 48: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/48.jpg)
Central limit theorem
v1 +v2 hellip+vn =+hellip
(Weighted Gaussians) = Weighted Gaussian
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 49: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/49.jpg)
Central limit theorem
v1vn = Real Numbers
X1Xn = Independent Identically Distributed(iid)
+v2 X2 hellip+vn Xn =+hellipv1 X1
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 50: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/50.jpg)
Central limit theorem
XvXvi
ii
ii
21
2||
Dot Product Norm
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Features vector 1
Features vector 2 Distance
Norm Distance
XvuXvXui
iii
iii
ii
21
2||
Dot Product
Dot Product Distance
The full Hashing
w
bvavh ba )(
[34 82 21]1
227742
d
d random numbers
+b
phaseRandom[0w]
wDiscretization step
Features vector
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 51: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/51.jpg)
The full Hashing
w
bvavh ba )(
+34
100
7944
7900 8000 8100 82007800
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and L
• For each K, estimate the error for increasing L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
[Figures: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], minimum marked.]
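A hedged sketch of that tuning loop; `build` and `approx_knn_dist` stand in for the structure above and for an LSH-based kNN distance query (both names hypothetical):

```python
import time

def tune_K_L(X, sample_idx, d_exact, K_grid, L_max, eps,
             build, approx_knn_dist):
    """For each K, find the minimal L whose mean relative error on
    the m sample points is <= eps, then keep the fastest (K, L(K))."""
    best = None                                   # (time, K, L)
    for K in K_grid:
        for L in range(1, L_max + 1):
            tables = build(X, K, L)
            t0 = time.perf_counter()
            errs = [(approx_knn_dist(X[i], tables) - d) / d
                    for i, d in zip(sample_idx, d_exact)]
            elapsed = time.perf_counter() - t0
            if sum(errs) / len(errs) <= eps:      # constraint met: L(K)
                if best is None or elapsed < best[0]:
                    best = (elapsed, K, L)
                break                             # try the next K
    return best
```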
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
[Figure: points-per-bucket distribution, uniform cuts vs. data-driven cuts.]
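In the sketch above this changes one line: draw each threshold from an actual data point rather than uniformly over the range (hypothetical helper):

```python
import numpy as np

def data_driven_cuts(X, dks, rng):
    """Threshold v_k = coordinate d_k of a randomly chosen data point,
    so cuts follow the data distribution and buckets stay balanced."""
    rows = rng.integers(0, len(X), size=len(dks))
    return X[rows, dks]
```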
Additional speedup
• Assume that all points in a cell C̄ will converge to the same mode (the cell acts as a type of aggregate), so the mean-shift procedure needs to run only once per cell.
[Figure: a cell C̄ and its mode.]
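A minimal sketch of how that assumption turns into a speedup (names hypothetical): run mean-shift once per cell and reuse the resulting mode for every point of the cell.

```python
def modes_per_cell(X, cells, converge):
    """cells: mapping cell_key -> indices of the points in that cell.
    converge: runs mean-shift from a start point to its mode.
    Assumes every point of a cell reaches the same mode."""
    mode_of = {}
    for key, idxs in cells.items():
        mode = converge(X[idxs[0]])     # one mean-shift run per cell
        for i in idxs:
            mode_of[i] = mode           # shared by all cell members
    return mode_of
```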
Speedup results
[Table: 65,536 points; 1,638 points sampled; k = 100.]
Food for thought
[Figure: low dimension vs. high dimension.]
A thought for food…
• Choose K and L by sample learning, or take the traditional values.
• Can one estimate K and L without sampling?
• Does it help to know the data dimensionality, or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN.
15:30: cookies…
Summary
• LSH trades a controlled loss of accuracy for a large gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• …but in the end, everything depends on your data set.
• Try it at home:
- Visit http://web.mit.edu/andoni/www/LSH/index.html
- Email Alex Andoni (andoni@mit.edu)
- Test over your own data (C code, runs under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 55: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/55.jpg)
The full Hashing
w
bvavh ba )(
+34
phaseRandom[0w]
100Discretization step
7944
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 56: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/56.jpg)
The full Hashing
w
bvavh ba )(
a1 v d
iid from p-stable distribution
+b
phaseRandom[0w]
wDiscretization step
Features vector
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 57: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/57.jpg)
Generalization P-Stable distribution
bullLp p=eps2
bullGeneralized Central Limit Theorem
bullP-stable distributionCauchy for L2
bullL2
bullCentral Limit Theorem
bullGaussian (normal) distribution
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 58: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/58.jpg)
P-Stable summary
bullWorks for bullGeneralizes to 0ltplt=2
bullImproves query time
Query time = O (dn1(1+)log(n) ) O (dn1(1+)^2log(n) )
r - Nearest Neighbor
Latest resultsReported in Email by
Alexander Andoni
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
• Mean-shift in a nutshell + examples.
Our scope:
• Mean-shift in high dimensions, using LSH.
• Speedups:
1. Finding optimal LSH parameters.
2. Data-driven partitions into buckets.
3. Additional speedup by using the LSH data structure.
Mean-Shift in a Nutshell
(Figure: a window of a given bandwidth around a point is repeatedly shifted to the mean of the points inside it, until it converges to a mode.)
KNN in mean-shift
• The bandwidth should be inversely proportional to the density in the region: high density - small bandwidth; low density - large bandwidth.
• Base it on the kth nearest neighbor of the point: the bandwidth h_i is the distance from x_i to its kth nearest neighbor.
Adaptive mean-shift vs. non-adaptive (a sketch follows below).
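A compact sketch of the adaptive variant, under stated assumptions: a flat kernel, Euclidean distances, and brute-force neighbor computation standing in for the LSH queries introduced below. The function name is hypothetical.

```python
import numpy as np

def adaptive_mean_shift(X, k=100, n_iter=30):
    """Flat-kernel mean-shift where each point's bandwidth h_i is its distance
    to its k-th nearest neighbor (brute force; LSH replaces this in high dim)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    h = np.sort(D, axis=1)[:, k]               # adaptive per-point bandwidth
    modes = X.astype(float).copy()
    for _ in range(n_iter):
        for i in range(len(X)):
            inside = np.linalg.norm(X - modes[i], axis=1) <= h[i]
            if inside.any():
                modes[i] = X[inside].mean(axis=0)   # shift window to its mean
    return modes                                # points collapsed near modes
```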
Image segmentation algorithm
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution controlled by the bandwidths hs (spatial) and hr (color).
3. Apply filtering.
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
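The joint representation in step 1 can be written down directly. A small sketch, assuming an RGB image stored as a numpy array; scaling the spatial part by hs and the color part by hr lets a single window radius serve both domains.

```python
import numpy as np

def joint_features(img, hs=8.0, hr=6.5):
    """Build the 5D joint domain: (x, y) scaled by hs, (R, G, B) scaled by hr,
    so one bandwidth works in the combined space."""
    H, W, _ = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    spatial = np.stack([xs, ys], axis=-1).reshape(-1, 2) / hs   # scaled x, y
    colors = img.reshape(-1, 3).astype(float) / hr              # scaled R, G, B
    return np.concatenate([spatial, colors], axis=1)            # (H*W, 5) points
```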
Image segmentation algorithm
(Figure: original, filtered, and segmented versions of an image; mean-shift trajectories.)
Filtering: each pixel takes the value of its nearest mode.
Filtering examples
(Figures: squirrel and baboon images, original vs. filtered.)
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth.
LSH-based data structure
• Choose L random partitions; each partition consists of K pairs (d_k, v_k) of a coordinate and a cut value.
• For each point x we check, for every pair, whether x_{d_k} ≤ v_k; the K boolean results form the point's key.
• This partitions the data into cells (a sketch follows below).
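A minimal sketch of this structure; the helper names are hypothetical, and the cuts are uniform-random here (the data-driven variant is shown two slides below).

```python
import numpy as np
from collections import defaultdict

def build_partitions(X, K=10, L=20, rng=np.random.default_rng(0)):
    """L partitions, each defined by K (dimension, cut value) pairs; a point's
    cell key is the K-bit vector of tests x[d_k] <= v_k."""
    tables = []
    for _ in range(L):
        dims = rng.integers(0, X.shape[1], size=K)
        cuts = np.array([rng.uniform(X[:, j].min(), X[:, j].max()) for j in dims])
        keys = (X[:, dims] <= cuts)                 # (n, K) boolean cell keys
        buckets = defaultdict(list)
        for i, key in enumerate(keys):
            buckets[key.tobytes()].append(i)
        tables.append((dims, cuts, buckets))
    return tables

def candidates(tables, q):
    """Union of the query's cells over the L partitions: the candidate set."""
    out = set()
    for dims, cuts, buckets in tables:
        out.update(buckets.get((q[dims] <= cuts).tobytes(), []))
    return out
```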
Choosing the optimal K and L
• For a query q, compute the smallest number of distances to points in its buckets.
• Large K - a smaller number of points N_C in a cell; the union over the L partitions contains N_{∪C} ≤ L · N_C points.
• If L is too small, neighbor points might be missed; but if L is too big, extra points might be included.
• As L increases, N_{∪C} increases but the chance of missing a neighbor decreases; N_{∪C} determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, giving each its true kth-neighbor distance (bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate distance returned by the LSH structure stays within the ε threshold of the true one.
• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint, L(K); then minimize the running time t(K, L(K)).
(Figure: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], with the chosen point at its minimum.)
A sketch of this tuning loop follows below.
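A sketch of the tuning loop, reusing the hypothetical helpers above; the error measure here (mean ratio of approximate to exact kth-NN distance) is a stand-in for the paper's exact criterion.

```python
import time
import numpy as np

def tune_K_L(X, k=100, m=50, eps=0.05, K_grid=(6, 8, 10, 12), L_max=50):
    """For each K, find the minimal L whose approximate k-NN distance is within
    (1 + eps) of the exact one on m sample queries; keep the fastest (K, L)."""
    rng = np.random.default_rng(0)
    sample = rng.choice(len(X), size=m, replace=False)
    exact = np.array([np.sort(np.linalg.norm(X - X[s], axis=1))[k] for s in sample])
    best = None
    for K in K_grid:
        for L in range(1, L_max + 1):
            tables = build_partitions(X, K, L)
            t0 = time.perf_counter()
            approx = []
            for s in sample:
                cand = np.fromiter(candidates(tables, X[s]), dtype=int)
                d = np.sort(np.linalg.norm(X[cand] - X[s], axis=1))
                approx.append(d[k] if d.size > k else np.inf)
            t = time.perf_counter() - t0
            if np.mean(np.array(approx) / exact) <= 1 + eps:   # error constraint
                if best is None or t < best[0]:
                    best = (t, K, L)
                break                       # minimal L for this K found: L(K)
    return best                             # (time, K, L) minimizing t(K, L(K))
```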
Data driven partitions
• In the original LSH, cut values are drawn uniformly at random over the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(Figure: distribution of points per bucket, uniform cuts vs. data-driven cuts.)
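In the partition sketch above, this is a one-line change to how the cuts are drawn:

```python
# Data-driven cut: one coordinate of a randomly drawn data point,
# instead of a uniform draw over the range (drop-in change to build_partitions):
cuts = np.array([X[rng.integers(len(X)), j] for j in dims])
```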
Additional speedup
Assume that all points in a cell C will converge to the same mode (C acts like a kind of aggregate): then the mean-shift procedure needs to run only once per cell, as sketched below.
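Under that assumption the speedup is immediate: one mean-shift run per occupied cell rather than per point. A sketch, where shift_to_mode is a hypothetical single-start mean-shift iterator.

```python
import numpy as np

def modes_per_cell(X, tables, shift_to_mode):
    """Run mean-shift once per occupied cell and broadcast the resulting mode
    to all points of the cell, assuming they would all converge to it."""
    dims, cuts, buckets = tables[0]            # cells of one partition
    modes = np.empty_like(X, dtype=float)
    for idx in buckets.values():
        mode = shift_to_mode(X, X[idx].mean(axis=0))   # start at cell centroid
        modes[idx] = mode
    return modes
```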
Speedup results
65,536 points; 1,638 points sampled; k = 100.
Food for thought
(Figure: the neighborhood structure in low dimension vs. high dimension.)
A thought for food…
• Choose K, L by sample learning, or take the traditional values.
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30: cookies…
Summary
• LSH trades a controlled loss of accuracy for a large gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• …but at the end, everything depends on your data set.
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni (andoni@mit.edu)
– Test over your own data (C code, under Red Hat Linux).
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 59: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/59.jpg)
Parameters selection
bull90 Probability Best quarry time performance
For Euclidean Space
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 60: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/60.jpg)
Parameters selectionhellip
For Euclidean Space
bullSingle projection hit an - Nearest Neighbor with Pr=p1
bullk projections hits an - Nearest Neighbor with Pr=p1k
bullL hashings fail to collide with Pr=(1-p1k)L
bullTo ensure Collision (eg 1-δge90)
bull1( -1-p1k)Lge 1-δ)1log(
)log(
1kp
L
L
Reject Non-NeighborsAccept Neighbors
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 61: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/61.jpg)
hellipParameters selection
K
k
time Candidates verification Candidates extraction
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 62: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/62.jpg)
Better Query Time than Spatial Data Structures
Scales well to higher dimensions and larger data size ( Sub-linear dependence )
Predictable running time
Extra storage over-head
Inefficient for data with distances concentrated around average
works best for Hamming distance (although can be generalized to Euclidean space)
In secondary storage linear scan is pretty much all we can do (for high dim)
requires radius r to be fixed in advance
Pros amp Cons
From Pioter Indyk slides
Conclusion
bullbut at the endeverything depends on your data set
bullTry it at homendashVisit
httpwebmiteduandoniwwwLSHindexhtml
ndashEmail Alex AndoniAndonimitedundashTest over your own data
(C code under Red Hat Linux )
LSH - Applicationsbull Searching video clips in databases (Hierarchical Non-Uniform Locality Sensitive
Hashing and Its Application to Video Identificationldquo Yang Ooi Sun)
bull Searching image databases (see the following)
bull Image segmentation (see the following)
bull Image classification (ldquoDiscriminant adaptive Nearest Neighbor Classificationrdquo T Hastie R Tibshirani)
bull Texture classification (see the following)
bull Clustering (see the following)
bull Embedding and manifold learning (LLE and many others)
bull Compression ndash vector quantizationbull Search engines (ldquoLSH Forest SelfTuning Indexes for Similarity Searchrdquo M Bawa T Condie P Ganesanrdquo)
bull Genomics (ldquoEfficient Large-Scale Sequence Comparison by Locality-Sensitive Hashingrdquo J Buhler)
bull In short whenever K-Nearest Neighbors (KNN) are needed
Motivation
bull A variety of procedures in learning require KNN computation
bull KNN search is a computational bottleneck
bull LSH provides a fast approximate solution to the problem
bull LSH requires hash function construction and parameter tunning
Outline
Fast Pose Estimation with Parameter Sensitive Hashing G Shakhnarovich P Viola and T Darrell
bull Finding sensitive hash functions
Mean Shift Based Clustering in HighDimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
bull Tuning LSH parametersbull LSH data structure is used for algorithm
speedups
Given an image x what are the parameters θ in this image
ie angles of joints orientation of the body etc1048698
The Problem
Fast Pose Estimation with Parameter Sensitive Hashing
G Shakhnarovich P Viola and T Darrell
i
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline

• Mean-shift in a nutshell + examples

Our scope:
• Mean-shift in high dimensions – using LSH
• Speedups:
  1. Finding optimal LSH parameters
  2. Data-driven partitions into buckets
  3. Additional speedup by using the LSH data structure
Mean-Shift in a Nutshell

(Figure: a window of a given bandwidth around a point is shifted iteratively to the mean of the points inside it.)
KNN in mean-shift

The bandwidth should be inversely proportional to the density in the region:
high density → small bandwidth; low density → large bandwidth.

Based on the k-th nearest neighbor of the point, the bandwidth is taken as h_i = ||x_i − x_{i,k}||, the distance from x_i to its k-th nearest neighbor x_{i,k}.
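A brute-force sketch of adaptive mean-shift under these definitions (Gaussian weights are an assumption; in high dimensions the KNN/bandwidth step is exactly what the LSH structure below accelerates):

```python
import numpy as np

def adaptive_mean_shift(points, k=10, n_iter=30):
    """Each data point x_j gets bandwidth h_j = distance to its k-th NN;
    every point is then shifted repeatedly to the weighted mean of the
    data under those per-point bandwidths (sample-point estimator)."""
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    h = np.sort(dist, axis=1)[:, k]           # index 0 is the point itself
    modes = pts.copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            w = np.exp(-np.sum((pts - modes[i]) ** 2, axis=1) / h ** 2)
            modes[i] = (w[:, None] * pts).sum(axis=0) / w.sum()
    return modes   # points whose modes coincide form one cluster
```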
Adaptive mean-shift vs. non-adaptive.
Image segmentation algorithm:
1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y)
2. Resolution controlled by the bandwidths h_s (spatial) and h_r (color)
3. Apply filtering

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Image segmentation algorithm

(Figures: original, filtered, and segmented versions of an image; mean-shift trajectories.)

Filtering: pixel value of the nearest mode.
Filtering examples

(Figures: squirrel, original vs. filtered; baboon, original vs. filtered.)

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries → implemented with LSH
• Statistical curse of dimensionality: sparseness of the data → variable bandwidth
LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point x we check, for each of the K pairs, whether x_{d_k} ≤ v_k; the K boolean results determine the point's cell.
• This partitions the data into cells.
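A small sketch of this structure (names are illustrative; the `data_driven` flag anticipates the partition suggestion a few slides below):

```python
import random
from collections import defaultdict

def build_lsh_partitions(data, K, L, data_driven=True):
    """Build L random partitions; each partition is K (dimension, cut-value)
    pairs. A point's cell in one partition is the K-bit key of comparisons
    x[d_k] <= v_k. Data-driven cuts take v_k from a random data point."""
    d = len(data[0])
    tables = []
    for _ in range(L):
        cuts = []
        for _ in range(K):
            dim = random.randrange(d)
            if data_driven:                  # cut at a coordinate of a random point
                val = random.choice(data)[dim]
            else:                            # uniform cut in the range of the data
                lo = min(p[dim] for p in data)
                hi = max(p[dim] for p in data)
                val = random.uniform(lo, hi)
            cuts.append((dim, val))
        buckets = defaultdict(list)
        for p in data:
            key = tuple(p[dim] <= val for dim, val in cuts)
            buckets[key].append(p)
        tables.append((cuts, buckets))
    return tables

def query_union(tables, q):
    """Return the union C_u of the L cells containing query q."""
    out = []
    for cuts, buckets in tables:
        key = tuple(q[dim] <= val for dim, val in cuts)
        out.extend(buckets[key])
    return out
```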
Choosing the optimal K and L

• For a query q, compute the smallest number of distances to points in its buckets.
• Large K → a smaller number of points in a cell.
• If L is too small, points might be missed; but if L is too big, extra points might be included.
• As L increases, the union of the query's cells C_∪ increases but their intersection C_∩ decreases; C_∩ determines the resolution of the data structure.
Choosing optimal K and L

• Determine accurately the KNN distance (bandwidth) for m randomly-selected data points.
• Choose an error threshold ε.
• The optimal K and L should satisfy: the approximate KNN distance stays within a factor (1 + ε) of the true one.
Choosing optimal K and L

• For each K, estimate the error for the candidate L's.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).

(Figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)], whose minimum gives the chosen pair.)
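A sketch of that tuning loop, reusing `build_lsh_partitions` from the sketch above; `exact_knn_dist`, `lsh_knn_dist`, and `query_time` are hypothetical helpers standing in for the measurements just described:

```python
def tune_parameters(data, sample, k, eps, K_range, L_max):
    """Pick (K, L) as on the slides: for each K, find the smallest L whose
    approximate k-NN distance stays within (1 + eps) of the true one on a
    small sample, then keep the (K, L(K)) pair with the lowest query time.
    exact_knn_dist / lsh_knn_dist / query_time are assumed helpers."""
    best = None
    for K in K_range:
        true_d = [exact_knn_dist(data, q, k) for q in sample]
        for L in range(1, L_max + 1):
            tables = build_lsh_partitions(data, K, L)
            approx_d = [lsh_knn_dist(tables, q, k) for q in sample]
            err = sum(a / t for a, t in zip(approx_d, true_d)) / len(sample)
            if err <= 1 + eps:                # constraint satisfied: L(K) found
                t = query_time(tables, sample)
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return (best[1], best[2]) if best else None
```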
Data driven partitions

• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (the `data_driven` flag in the sketch above).

(Figure: bucket-occupancy distributions, uniform vs. data-driven cuts.)
Additional speedup

Assume that all points in C_∩ will converge to the same mode (C_∩ is like a type of aggregate), so the mean-shift iteration can be run once per intersection cell instead of once per point.
Speedup results

(Figure: speedups for 65,536 points, 1,638 points sampled, k = 100.)
Food for thought

(Figures: behavior in low dimension vs. high dimension.)
A thought for food…

• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.

15:30: cookies…
Summary

• LSH suggests a compromise: accuracy is traded for a gain in complexity.
• Applications that involve massive data in high dimension require the LSH's fast performance.
• Extension of the LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion

• …but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 68: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/68.jpg)
Ingredients
bull Input query image with unknown angles (parameters)
bull Database of human poses with known anglesbull Image feature extractor ndash edge detector
bull Distance metric in feature space dx
bull Distance metric in angles space
m
i
iid1
2121 )cos(1)(
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 69: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/69.jpg)
Example based learning
bull Construct a database of example images with their known angles
bull Given a query image run your favorite feature extractorbull Compute KNN from databasebull Use these KNNs to compute the average angles of the
query
Input queryFind KNN in database of examples
Output Average angles of KNN
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 70: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/70.jpg)
Input Query
Features extraction
Processed query
PSH (LSH)
Database of examples
The algorithm flow
LWR (Regression)
Output Match
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 71: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/71.jpg)
The image features
B A
Axx 4107 )(
4
3
2
4 0
Image features are multi-scale edge histograms
Feature Extraction PSH LWR
PSH The basic assumption
There are two metric spaces here feature space ( )
and parameter space ( )
We want similarity to be measured in the angles
space whereas LSH works on the feature space
bull Assumption The feature space is closely related to the parameter space
xd
d
Feature Extraction PSH LWR
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
PSH: The basic assumption

There are two metric spaces here: the feature space (X, d_X) and the parameter space (Θ, d_θ).

We want similarity to be measured in the angles (parameter) space, whereas LSH works on the feature space.

• Assumption: the feature space is closely related to the parameter space.

(Pipeline: Feature Extraction → PSH → LWR)
Insight: Manifolds

• A manifold is a space in which every point has a neighborhood resembling a Euclidean space.
• But the global structure may be complicated: curved.
• For example, lines are 1D manifolds, planes are 2D manifolds, etc.

[Figure: a query q mapped between the feature space and the parameters space (angles). Is this magic?]
Parameter Sensitive Hashing (PSH)

The trick: estimate the performance of different hash functions on examples, and select those sensitive to d_θ. The hash functions are applied in feature space, but the KNN are valid in angle (parameter) space.

1. Label pairs of examples with similar angles.
2. Define hash functions h on the feature space.
3. Predict the labeling of similar/non-similar examples by using h.
4. Compare the predicted labeling to the true one.
5. If the labeling by h is good, accept h; else change h.
PSH as a classification problem

Labels (here r = 0.25): a pair of examples (x_i, θ_i), (x_j, θ_j) is labeled

    y_ij = +1  if d_θ(θ_i, θ_j) ≤ r
    y_ij = −1  if d_θ(θ_i, θ_j) ≥ (1 + ε) r

[Figure: four example pairs labeled +1, +1, −1, −1]

A binary hash function on the features, thresholding a single feature at T:

    h_T(x) = +1  if x ≥ T
    h_T(x) = −1  otherwise

Predict the labels: h_T will place both examples in the same bin or separate them, so

    ŷ_h(x_i, x_j) = +1  if h_T(x_i) = h_T(x_j)
    ŷ_h(x_i, x_j) = −1  otherwise

[Figure: distribution of one feature's values with a candidate threshold T]

Find the best T that predicts the true labeling within the probability constraints (P1, P2).
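Read as pseudocode, the selection step amounts to scanning candidate (feature, threshold) stumps and keeping those that classify the labeled pairs well. Below is a minimal Python sketch under that reading; the array layout, the quantile grid, and the min_tpr/max_fpr cutoffs standing in for the (P1, P2) constraints are all illustrative, not the paper's actual training code.

```python
import numpy as np

def select_stumps(features, pair_idx, pair_labels,
                  n_thresholds=32, min_tpr=0.8, max_fpr=0.5):
    """Scan candidate (feature, threshold) stumps h_T and keep those whose
    same-bin / separate-bin decisions agree with the pair labels.
    min_tpr / max_fpr stand in for the (P1, P2) probability constraints."""
    i, j = pair_idx[:, 0], pair_idx[:, 1]
    pos = pair_labels == +1
    neg = ~pos
    accepted = []
    for f in range(features.shape[1]):
        col = features[:, f]
        for T in np.quantile(col, np.linspace(0.05, 0.95, n_thresholds)):
            same_bin = (col[i] >= T) == (col[j] >= T)   # predicted label +1
            tpr = same_bin[pos].mean()    # similar pairs kept together: good
            fpr = same_bin[neg].mean()    # dissimilar pairs kept together: bad
            if tpr >= min_tpr and fpr <= max_fpr:
                accepted.append((f, T))
    return accepted
```

The accepted stumps are then grouped into k-bit hash functions for the L tables (18 bits and 150 tables in the results below).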
Local Weighted Regression (LWR)

• Given a query image x_0, PSH returns its KNNs.
• LWR uses the KNN to compute a weighted average of the estimated angles of the query:

    θ̂_0 = argmin_g  Σ_{x_i ∈ N(x_0)}  d_θ(g(x_i), θ_i) · K(d(x_i, x_0))

where the kernel K converts the feature-space distance into a weight.
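In its simplest (0th-order) form this reduces to a kernel-weighted average of the neighbors' stored angles. A small sketch, assuming a Gaussian kernel and a free bandwidth parameter:

```python
import numpy as np

def lwr_estimate(query_feat, knn_feats, knn_angles, bandwidth=1.0):
    """Kernel-weighted average of the PSH neighbors' stored angles.
    knn_feats: (k, d) features; knn_angles: (k, 13) pose parameters.
    The Gaussian kernel and bandwidth are illustrative choices."""
    d = np.linalg.norm(knn_feats - query_feat, axis=1)   # feature-space distance
    w = np.exp(-(d / bandwidth) ** 2)                    # distance -> weight
    return (w[:, None] * knn_angles).sum(axis=0) / w.sum()
```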
Results

Synthetic data were generated:
• 13 angles: 1 for the rotation of the torso, 12 for the joints.
• 150,000 images.
• Nuisance parameters added: clothing, illumination, face expression.
• 1,775,000 example pairs.
• Selected 137 out of 5,123 meaningful features (how?).
• 18-bit hash functions (k = 18), 150 hash tables (l = 150).
• Test on 1,000 synthetic examples: PSH searched only 3.4% of the data per query.
• Without the feature selection, 40 bits and 1,000 hash tables would have been needed.

Recall: P1 is the probability of a positive hash, P2 is the probability of a bad hash, and B is the max number of points in a bucket.
Results – real data

• 800 images.
• Processed by a segmentation algorithm.
• 1.3% of the data were searched.

[Figure: interesting mismatches]
Fast pose estimation – summary

• A fast way to compute the angles of a human body figure.
• Moving from one representation space to another.
• Training a sensitive hash function.
• KNN + smart averaging.

Food for Thought

• The basic assumption may be problematic (distance metric, representations).
• The training set should be dense.
• Texture and clutter.
• In general, some features are more important than others and should be weighted.
Food for Thought: Point Location in Different Spheres (PLDS)

• Given n spheres in R^d, centered at P = {p_1, …, p_n}, with radii r_1, …, r_n.
• Goal: given a query q, preprocess the points in P so as to find a point p_i whose sphere 'covers' the query q.

[Figure: a query q covered by the sphere of radius r_i around p_i]

Courtesy of Mohamad Hegaze.
Motivation

• Clustering high-dimensional data by using local density measurements (e.g., in feature space).
• Statistical curse of dimensionality: sparseness of the data.
• Computational curse of dimensionality: expensive range queries.
• LSH parameters should be adjusted for optimal performance.

Mean-Shift Based Clustering in High Dimensions: A Texture Classification Example, B. Georgescu, I. Shimshoni, and P. Meer.
Outline

• Mean-shift in a nutshell + examples.

Our scope:
• Mean-shift in high dimensions – using LSH.
• Speedups:
  1. Finding optimal LSH parameters.
  2. Data-driven partitions into buckets.
  3. Additional speedup by using the LSH data structure.

Mean-Shift in a Nutshell

[Figure: a window of bandwidth h around a point, repeatedly shifted toward the local mean until it converges to a mode]
KNN in mean-shift

Bandwidth should be inversely proportional to the density in the region: high density - small bandwidth; low density - large bandwidth.

Based on the kth nearest neighbor of the point, the bandwidth is h_i = ||x_i − x_{i,k}||, the distance from x_i to that kth neighbor.

Adaptive mean-shift vs. non-adaptive

[Figure: clustering comparison of adaptive and non-adaptive mean-shift; a sketch of the adaptive iteration follows]
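A toy NumPy sketch of the adaptive iteration, assuming brute-force KNN for the bandwidths (which is exactly the step the LSH machinery below makes affordable); k, n_iter, and tol are illustrative:

```python
import numpy as np

def adaptive_mean_shift(X, k=100, n_iter=50, tol=1e-5):
    """Adaptive mean-shift: each point's bandwidth h_i is its distance
    to its kth nearest neighbor (brute force here; the LSH structure
    below replaces this and the per-iteration range queries)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    h = np.sort(D, axis=1)[:, k]               # h_i = ||x_i - x_{i,k}||
    modes = X.astype(float).copy()
    for _ in range(n_iter):
        shifted = np.empty_like(modes)
        for i, y in enumerate(modes):
            # sample-point estimator: every x_j votes with its own bandwidth h_j
            w = np.exp(-np.sum(((X - y) / h[:, None]) ** 2, axis=1))
            shifted[i] = (w[:, None] * X).sum(axis=0) / w.sum()
        done = np.abs(shifted - modes).max() < tol
        modes = shifted
        if done:
            break
    return modes   # points whose modes coincide form one cluster
```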
Image segmentation algorithm

1. Input: data in 5D (3 color + 2 x,y) or 3D (1 gray + 2 x,y).
2. Resolution is controlled by the bandwidths h_s (spatial) and h_r (color).
3. Apply filtering.

[Figure: original, filtered, and segmented versions of a 3D example]

Filtering: each pixel takes the value of its nearest mode.

[Figure: mean-shift trajectories]

Filtering examples

[Figures: original squirrel / filtered; original baboon / filtered]

Segmentation examples

Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02.
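One way to read step 1 is that the spatial and color coordinates are joined into a single vector after each part is normalized by its own bandwidth, so a single unit-bandwidth mean-shift runs on the joint domain. A sketch under that assumption (the Luv conversion is taken as already done):

```python
import numpy as np

def joint_features(luv_img, hs, hr):
    """Build the 5D joint domain (x/hs, y/hs, L/hr, u/hr, v/hr) so one
    unit-bandwidth mean-shift can run on the concatenated vector.
    luv_img: (H, W, 3) image, assumed already converted to Luv."""
    H, W, _ = luv_img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    spatial = np.stack([xs, ys], axis=-1).reshape(-1, 2) / hs   # 2 spatial coords
    rng_part = luv_img.reshape(-1, 3) / hr                      # 3 color coords
    return np.concatenate([spatial, rng_part], axis=1)          # (H*W, 5)
```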
Mean-shift in high dimensions

• Computational curse of dimensionality: expensive range queries, implemented with LSH.
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth.
LSH-based data structure

• Choose L random partitions. Each partition includes K pairs (d_k, v_k).
• For each point we check whether x_{d_k} ≤ v_k; the K yes/no answers form the point's cell label in that partition.
• This partitions the data into cells.
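A compact sketch of this structure, assuming uniformly random cut values over the data range; the class name and interface are made up for illustration:

```python
import numpy as np
from collections import defaultdict

class CoordinateCutLSH:
    """L random partitions; each partition holds K (dimension, cut) pairs,
    and a point's key in a partition is the K-bit pattern of x[d_k] <= v_k
    tests. Cuts are uniform over the data range here (the 'original LSH'
    variant; see the data-driven version below)."""
    def __init__(self, X, K, L, seed=0):
        rng = np.random.default_rng(seed)
        n, dim = X.shape
        lo, hi = X.min(axis=0), X.max(axis=0)
        self.dims = rng.integers(0, dim, size=(L, K))          # d_k per partition
        self.cuts = lo[self.dims] + rng.random((L, K)) * (hi - lo)[self.dims]
        self.tables = []
        for l in range(L):
            table = defaultdict(list)
            for i, x in enumerate(X):
                table[self._key(l, x)].append(i)
            self.tables.append(table)

    def _key(self, l, x):
        return tuple(x[self.dims[l]] <= self.cuts[l])

    def query(self, q):
        """Union of q's buckets over the L partitions: the cell C-union."""
        out = set()
        for l, table in enumerate(self.tables):
            out.update(table[self._key(l, q)])
        return out
```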
Choosing the optimal K and L

• For a query q, distances are computed only to the points in its buckets; we want that number to be as small as possible.
• Large K: a smaller number of points falls in each cell C_l.
• If L is too small, neighbors might be missed; but if L is too big, the union of cells C∪ might include extra points.
• As L increases, C∪ increases but the chance of missing a neighbor decreases; C∪ determines the resolution of the data structure.
Choosing the optimal K and L

• Determine accurately the KNN, and hence the bandwidth distance, for m randomly-selected data points.
• Choose an error threshold ε on the approximate distance.
• The optimal K and L should keep the approximate KNN distance within that threshold.

• For each K, estimate the error; in one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)); a sketch of this search follows.

[Figures: approximation error for (K, L); L(K) for ε = 0.05; running time t[K, L(K)], whose minimum marks the chosen pair]
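Spelled out as a grid search, the scheme looks roughly like this; build and knn_dist are hypothetical stand-ins for constructing the structure above and querying it, and the K/L ranges are illustrative:

```python
import time
import numpy as np

def choose_K_L(X, queries, true_dist, build, eps=0.05,
               Ks=range(4, 25, 2), Ls=range(1, 101)):
    """For each K, take the minimal L whose approximate KNN distances stay
    within (1 + eps) of the true ones on the m sample queries, then keep
    the (K, L(K)) pair with the smallest measured query time."""
    best = None
    for K in Ks:
        for L in Ls:                      # ascending, so first hit is minimal
            index = build(X, K, L)        # hypothetical constructor
            approx = np.array([index.knn_dist(q) for q in queries])
            if np.all(approx <= (1 + eps) * true_dist):
                t0 = time.perf_counter()
                for q in queries:
                    index.knn_dist(q)     # hypothetical query call
                t = time.perf_counter() - t0
                if best is None or t < best[0]:
                    best = (t, K, L)
                break
    return best                           # (time, K, L(K)) at the minimum
```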
Data-driven partitions

• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value (a one-line change, sketched below).

[Figure: points-per-bucket distribution, uniform vs. data-driven cuts]
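Relative to the CoordinateCutLSH sketch above, only the way self.cuts is drawn changes; an illustrative version:

```python
import numpy as np

def data_driven_cuts(X, dims, seed=0):
    """Draw each cut value from the data itself: pick a random point and
    use its coordinate on dimension d_k. dims has shape (L, K), as in
    CoordinateCutLSH above."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, X.shape[0], size=dims.shape)   # one point per cut
    return X[idx, dims]                                  # its d_k-th coordinate
```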
Additional speedup

Assume that all points in C∪ will converge to the same mode (C∪ is like a type of aggregate), so a mode computed once can be reused for the whole cell.
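Under that assumption, the trajectory of one point can label its entire bucket union. A speculative sketch, with lsh.query and mean_shift_from standing in for the structures sketched earlier:

```python
def cluster_with_mode_reuse(X, lsh, mean_shift_from):
    """When a point's trajectory converges to a mode, assign that mode to
    every still-unlabeled point in its bucket union C-union."""
    modes = {}
    for i, x in enumerate(X):
        if i in modes:
            continue                      # already labeled via a neighbor's cell
        mode = mean_shift_from(x)         # run the full mean-shift trajectory
        for j in lsh.query(x):
            modes.setdefault(j, mode)     # reuse the mode for the whole cell
        modes[i] = mode
    return modes
```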
Speedup results

[Figure: 65,536 points, 1,638 points sampled, k = 100]

Food for thought

[Figure: low dimension vs. high dimension]

A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality, or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning itself requires KNN…

15:30: cookies…
Summary

• LSH compromises on accuracy in exchange for a large gain in complexity.
• Applications that involve massive data in high dimensions require LSH's fast performance.
• Extension of LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.

Conclusion

• …but in the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni: andoni@mit.edu
  – Test it over your own data (C code, under Red Hat Linux).

Thanks

• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 73: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/73.jpg)
Insight Manifolds
bull Manifold is a space in which every point has a neighborhood resembling a Euclid space
bull But global structure may be complicated curved
bull For example lines are 1D manifolds planes are 2D manifolds etc
Feature Extraction PSH LWR
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 74: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/74.jpg)
Parameters Space (angles)
Feature Space
q
Is this Magic
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 75: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/75.jpg)
Parameter Sensitive Hashing (PSH)
The trick
Estimate performance of different hash functions on examples and select those sensitive to
The hash functions are applied in feature space but the KNN are valid in angle space
d
Feature Extraction PSH LWR
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 76: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/76.jpg)
Label pairs of examples with similar angles
Define hash functions h on feature space
Feature Extraction PSH LWR
Predict labeling of similarnon-similar examples by using h
Compare labeling
If labeling by h is goodaccept h else change h
PSH as a classification problem
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 77: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/77.jpg)
+1 +1 -1 -1
(r=025)
Labels
)1()( if 1
)( if 1y
labeled is
)x()(x examples ofpair A
ij
ji
rd
rd
ji
ji
ji
Feature Extraction PSH LWR
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
(Figure: original, filtered, and segmented images; mean-shift trajectories in feature space.)
Filtering: each pixel takes the value of the nearest mode.
Filtering examples
(Figures: squirrel and baboon images, original vs. filtered.)
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Segmentation examples
Mean Shift: A Robust Approach Toward Feature Space Analysis, D. Comaniciu et al., TPAMI '02
Mean-shift in high dimensions
• Computational curse of dimensionality: expensive range queries, implemented with LSH
• Statistical curse of dimensionality: sparseness of the data, handled with variable bandwidth
LSH-based data structure
• Choose L random partitions; each partition includes K pairs (d_k, v_k).
• For each point x we check the K inequalities x_{d_k} ≤ v_k; the resulting bit vector selects its cell.
• This partitions the data into cells.
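A minimal sketch of this structure, with illustrative names and uniform random cut values as in the original LSH:

```python
import numpy as np

def build_partitions(data, K, L, rng):
    # L random partitions; each picks K (dimension d_k, cut value v_k) pairs.
    n, dim = data.shape
    lo, hi = data.min(axis=0), data.max(axis=0)
    tables = []
    for _ in range(L):
        dims = rng.integers(0, dim, size=K)      # the d_k
        cuts = rng.uniform(lo[dims], hi[dims])   # v_k, uniform in the data range
        buckets = {}
        for i, x in enumerate(data):
            key = tuple(x[dims] <= cuts)         # K-bit cell id
            buckets.setdefault(key, []).append(i)
        tables.append((dims, cuts, buckets))
    return tables

def candidates(tables, q):
    # Union of the L buckets the query falls into: the candidate neighbors.
    out = set()
    for dims, cuts, buckets in tables:
        out.update(buckets.get(tuple(q[dims] <= cuts), []))
    return out
```

Usage: `tables = build_partitions(data, K=10, L=30, rng=np.random.default_rng(0))`, then `candidates(tables, q)` returns the indices whose distances need to be computed.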
Choosing the optimal K and L
• Goal: for a query q, compute the smallest number of distances to points in its buckets while still retrieving the true neighbors.
If L is too small, points might be missed; but if L is too big, it might include extra points.
A large K means a smaller number of points in a cell C.
(Equations on the slide: the expected number of points in a cell and in the union of the L cells a query falls into.)
As L increases, the number of retrieved points increases but the chance of missing a neighbor decreases.
The cell size determines the resolution of the data structure.
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points, and record each KNN distance (the bandwidth).
• Choose an error threshold ε.
• The optimal K and L should satisfy that the approximate distance returned by the LSH structure stays within the threshold of the true distance.
Choosing optimal K and L
• For each K, estimate the error for every L.
• In one run over all L's, find the minimal L satisfying the constraint: L(K).
• Minimize the running time t(K, L(K)).
(Figures: approximation error for K, L; L(K) for ε = 0.05; running time t[K, L(K)] and its minimum.)
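A compact sketch of that selection loop; `approx_error(K, L)` and `query_time(K, L)` are assumed, user-supplied callables (e.g., built on the structure sketched above):

```python
def choose_k_l(K_grid, L_max, eps, approx_error, query_time):
    best = None
    for K in K_grid:
        for L in range(1, L_max + 1):
            if approx_error(K, L) <= eps:   # minimal L meeting the constraint: L(K)
                t = query_time(K, L)        # then compare running times t(K, L(K))
                if best is None or t < best[0]:
                    best = (t, K, L)
                break                       # larger L only adds work for this K
    return best                             # (time, K, L(K))
```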
Data-driven partitions
• In the original LSH, cut values are random in the range of the data.
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value.
(Figure: bucket distribution for uniform vs. data-driven cut points.)
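The suggested change is a one-liner; a sketch contrasting the two cut-value choices (names illustrative):

```python
import numpy as np

def uniform_cut(data, dim, rng):
    # Original LSH: a value uniform in the range of the data along `dim`.
    return rng.uniform(data[:, dim].min(), data[:, dim].max())

def data_driven_cut(data, dim, rng):
    # Suggested variant: reuse the coordinate of a randomly selected data
    # point, so cuts concentrate where the data mass is and buckets balance.
    return data[rng.integers(len(data)), dim]
```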
Additional speedup
Assume that all points in a cell C will converge to the same mode (C is like a type of aggregate), so the mean-shift procedure needs to run only once per cell.
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
(Figure: low dimension vs. high dimension.)
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold?
• Intuitively, dimensionality implies the number of hash functions needed.
• The catch: efficient dimensionality learning requires KNN.
15:30 cookies…
Summary
• LSH trades a controlled loss in accuracy for a large gain in complexity (speed).
• Applications that involve massive data in high dimensions require the fast performance of LSH.
• Extension of the LSH to different spaces (PSH).
• Learning the LSH parameters and hash functions for different applications.
Conclusion
• But at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test it over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree – Pitfall 1
- Slide 16
- Quadtree – pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e., angles of joints, orientation of the body, etc.
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results – real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for food…
- Summary
- Slide 110
- Thanks
![Page 78: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/78.jpg)
otherwise 1-
T(x) if 1)(
xh T
A binary hash functionfeatures
otherwise 1
if 1ˆ
labels ePredict th
)(xh)(xh)x(xy
jTiTjih
Feature Extraction PSH LWR
Feature
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 79: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/79.jpg)
Feature Extraction PSH LWR
sconstraint iesprobabilit with thelabeling true thepredicts that Tbest theFind
themseparateor bin
same in the examplesboth place willTh
)(xT
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 80: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/80.jpg)
Local Weighted Regression (LWR)bull Given a query image PSH returns
KNNs
bull LWR uses the KNN to compute a weighted average of the estimated angles of the query
weightdist
iXiixNx
xxdKxgdi
0)(
))(())((minarg0
Feature Extraction PSH LWR
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 81: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/81.jpg)
Results
Synthetic data were generated
bull 13 angles 1 for rotation of the torso 12 for joints
bull 150000 images
bull Nuisance parameters added clothing illumination face expression
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 82: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/82.jpg)
bull 1775000 example pairs
bull Selected 137 out of 5123 meaningful features (how)
18 bit hash functions (k) 150 hash tables (l)
bull Test on 1000 synthetic examplesbull PSH searched only 34 of the data per query
bull Without selection needed 40 bits and
1000 hash tables
Recall P1 is prob of positive hashP2 is prob of bad hashB is the max number of pts in a bucket
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 83: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/83.jpg)
Results ndash real data
bull 800 images
bull Processed by a segmentation algorithm
bull 13 of the data were searched
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
• Assume that all points in C∪ will converge to the same mode (C∪ acts like a type of aggregate).
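One way to cash in on that assumption (an illustrative scheme, not necessarily the paper's exact bookkeeping): cache one converged mode per distinct cell union and reuse it for every point that maps there. query is from the LSH sketch above, and shift_fn is a hypothetical function that iterates a single point to its mode:

```python
import numpy as np

def mean_shift_with_cache(X, tables, shift_fn):
    """Run the mean-shift iteration once per distinct C-union; reuse its mode."""
    mode_of = {}
    modes = np.empty_like(X)
    for i, x in enumerate(X):
        key = tuple(sorted(query(tables, x)))  # identifies the union of cells C∪
        if key not in mode_of:
            mode_of[key] = shift_fn(x)         # converge only once per aggregate
        modes[i] = mode_of[key]
    return modes
```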
Speedup results
[Figure: 65,536 points; 1,638 points sampled; k = 100]
Food for thought
[Figure: low dimension vs. high dimension]
A thought for food…
• Choose K, L by sample learning, or take the traditional values?
• Can one estimate K, L without sampling?
• Does it help to know the data dimensionality or the data manifold? Intuitively, the dimensionality implies the number of hash functions needed. The catch: efficient dimensionality learning itself requires KNN.
15:30: cookies…
Summary
• LSH offers a compromise: a small loss of accuracy for a large gain in complexity.
• Applications that involve massive data in high dimension require the fast performance of LSH.
• LSH extends to different spaces (PSH).
• The LSH parameters and hash functions can be learned for different applications.
Conclusion
• ...but at the end, everything depends on your data set.
• Try it at home:
  – Visit http://web.mit.edu/andoni/www/LSH/index.html
  – Email Alex Andoni (andoni@mit.edu)
  – Test over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 84: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/84.jpg)
Results ndash real data
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 85: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/85.jpg)
Interesting mismatches
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 86: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/86.jpg)
Fast pose estimation - summary
bull Fast way to compute the angles of human body figure
bull Moving from one representation space to another
bull Training a sensitive hash function
bull KNN smart averaging
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 87: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/87.jpg)
Food for Thought
bull The basic assumption may be problematic (distance metric representations)
bull The training set should be dense
bull Texture and clutter
bull General some features are more important than others and should be weighted
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 88: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/88.jpg)
Food for Thought Point Location in Different Spheres (PLDS)
bull Given n spheres in Rd centered at P=p1hellippn
with radii r1helliprn
bull Goal given a query q preprocess the points in P to find point pi that its sphere lsquocoverrsquo the query q
qpi
ri
Courtesy of Mohamad Hegaze
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 89: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/89.jpg)
Motivationbull Clustering high dimensional data by using local
density measurements (eg feature space)bull Statistical curse of dimensionality
sparseness of the databull Computational curse of dimensionality
expensive range queriesbull LSH parameters should be adjusted for optimal
performance
Mean-Shift Based Clustering in High Dimensions A Texture Classification Example
B Georgescu I Shimshoni and P Meer
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 90: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/90.jpg)
Outline
bull Mean-shift in a nutshell + examples
Our scope
bull Mean-shift in high dimensions ndash using LSH
bull Speedups1 Finding optimal LSH parameters
2 Data-driven partitions into buckets
3 Additional speedup by using LSH data structure
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
![Page 91: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/91.jpg)
Mean-Shift in a Nutshellbandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
point
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 92: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/92.jpg)
KNN in mean-shift
Bandwidth should be inversely proportional to the density in the region
high density - small bandwidth low density - large bandwidth
Based on kth nearest neighbor of the point
The bandwidth is
Adaptive mean-shift vs non-adaptive
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 93: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/93.jpg)
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 94: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/94.jpg)
Image segmentation algorithm1 Input Data in 5D (3 color + 2 xy) or 3D (1 gray +2 xy)2 Resolution controlled by the bandwidth hs (spatial) hr (color)3 Apply filtering
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
3D
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 95: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/95.jpg)
Image segmentation algorithm
original segmented
filtered
Filtering pixel value of the nearest mode
Mean-shift trajectories
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 96: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/96.jpg)
original squirrel filtered
original baboon filtered
Filtering examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 97: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/97.jpg)
Segmentation examples
Mean-shift A Robust Approach Towards Feature Space Analysis D Comaniciu et al TPAMI 02rsquo
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 98: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/98.jpg)
Mean-shift in high dimensions
Computational curse of dimensionality
Statistical curse of dimensionality
Expensive range queries implemented with LSH
Sparseness of the data variable bandwidth
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Data driven partitions
bull In original LSH cut values are random in the range of the databull Suggestion Randomly select a point from the data and use one of its coordinates as the cut value
uniform data driven pointsbucket distribution
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Additional speedup
aggregate)an of typea like is (C mode same
the toconverge willCin points all that Assume
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
C
C
Speedup results
65536 points 1638 points sampled k=100
Food for thought
Low dimension High dimension
A thought for foodhellipbull Choose K L by sample learning or take the traditionalbull Can one estimate K L without samplingbull A thought for food does it help to know the data
dimensionality or the data manifoldbull Intuitively dimensionality implies the number of hash
functions neededbull The catch efficient dimensionality learning requires
KNN
1530 cookieshellip
Summary
bull LSH suggests a compromise on accuracy for the gain of complexity
bull Applications that involve massive data in high dimension require the LSH fast performance
bull Extension of the LSH to different spaces (PSH)
bull Learning the LSH parameters and hash functions for different applications
Conclusion
bull but at the endeverything depends on your data set
bull Try it at homendash Visit
httpwebmiteduandoniwwwLSHindexhtmlndash Email Alex Andoni Andonimitedundash Test over your own data
(C code under Red Hat Linux )
Thanks
bull Ilan Shimshoni (Haifa)
bull Mohamad Hegaze (Weizmann)
bull Alex Andoni (MIT)
bull Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naiumlve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree ndash Pitfall1
- Slide 16
- Quadtree ndash pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection hellip
- hellip Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x what are the parameters θ in this image ie angles of joints orientation of the body etc1048698
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results ndash real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks
-
![Page 99: K-Nearest Neighbors Search in High Dimensions Tomer Peled Dan Kushnir Tell me who your neighbors are, and I'll know who you are.](https://reader036.fdocuments.us/reader036/viewer/2022062421/56649c765503460f94929f53/html5/thumbnails/99.jpg)
LSH-based data structure
bull Choose L random partitionsEach partition includes K pairs
(dkvk)bull For each point we check
kdi vxK
It Partitions the data into cells
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing the optimal K and L
bull For a query q compute smallest number of distances to points in its buckets
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
points extra includemight big toois L ifbut
missed bemight points small toois L If
cell ain points ofnumber smaller k Large
C
l
l
CC
dC
LNN
dKnN
)1(
C
C
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
structure data theof resolution thedetermines
decreases but increases increases L As
C
CC
Choosing optimal K and LDetermine accurately the KNN for m randomly-selected data points
distance (bandwidth)
Choose error threshold
The optimal K and L should satisfy
the approximate distance
Mean-shift LSH optimal kl LSH data partition
LSH LSH data struct
Choosing optimal K and Lbull For each K estimate the error forbull In one run for all Lrsquos find the minimal L satisfying the constraint L(K)bull Minimize time t(KL(K))
minimum
Approximationerror for KL
L(K) for =005 Running timet[KL(K)]
Choosing the optimal K and L
• Goal: for a query q, compute distances to as few points as possible in its buckets
• Large K ⇒ smaller number of points in a cell
• If L is too small, points might be missed; but if L is too big, extra points might be included
• As L increases, the union of cells ∪C = C_1 ∪ … ∪ C_L increases but d_C decreases; d_C determines the resolution of the data structure
(Equation residue: the slide relates the point counts N_C and N_∪C to n, K and L.)
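To make this K-L trade-off concrete, here is a minimal illustrative sketch (not from the slides): a toy coordinate-cut LSH index on synthetic data, reporting how the average number of candidate points per query shrinks as K grows and recovers as L grows. All names and parameter values are made up for illustration.

```python
import random
from collections import defaultdict

random.seed(0)
d, n = 16, 5000
data = [[random.random() for _ in range(d)] for _ in range(n)]

def build_tables(points, K, L):
    """One table per l; key = K bits, each bit = (x[dim] > cut)."""
    tables = []
    for _ in range(L):
        cuts = [(random.randrange(d), random.random()) for _ in range(K)]
        buckets = defaultdict(list)
        for idx, x in enumerate(points):
            key = tuple(x[dim] > cut for dim, cut in cuts)
            buckets[key].append(idx)
        tables.append((cuts, buckets))
    return tables

def candidates(tables, q):
    """Union of the query's buckets over the L tables (the cell ∪C)."""
    cand = set()
    for cuts, buckets in tables:
        key = tuple(q[dim] > cut for dim, cut in cuts)
        cand.update(buckets.get(key, ()))
    return cand

queries = data[:100]
for K in (4, 8, 12):
    for L in (1, 4, 16):
        tables = build_tables(data, K, L)
        avg = sum(len(candidates(tables, q)) for q in queries) / len(queries)
        print(f"K={K:2d} L={L:2d} avg candidates per query: {avg:7.1f}")
```

Larger K empties each cell C; larger L enlarges the union ∪C, which is exactly the resolution trade-off described above.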
Choosing optimal K and L
• Determine accurately the KNN for m randomly-selected data points; take the resulting distance d_KNN as the bandwidth
• Choose an error threshold ε
• The optimal K and L should satisfy: on the m samples, the approximate distance (computed over the LSH buckets) deviates from d_KNN by at most ε
Choosing optimal K and L
• For each K, estimate the error for every L
• In one run over all L's, find the minimal L satisfying the constraint: L(K)
• Minimize the running time t(K, L(K))
(Plots: approximation error for each K, L; L(K) for ε = 0.05; running time t[K, L(K)], with the minimum marked.)
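The selection procedure in the two slides above can be read as a loop: brute-force the exact kNN distance for m sample points, then for each K take the smallest L whose mean relative distance error stays within ε, and keep the (K, L(K)) pair with the lowest query time. A self-contained toy version follows (same coordinate-cut hashing as the previous sketch; every constant here is illustrative, not the paper's):

```python
import random, time
from collections import defaultdict

random.seed(1)
d, n, k, m, eps = 16, 2000, 100, 20, 0.05
data = [[random.random() for _ in range(d)] for _ in range(n)]
samples = random.sample(range(n), m)

def dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def knn_dist(q, idxs):
    """Distance from q to its k-th nearest neighbor among idxs."""
    ds = sorted(dist(q, data[i]) for i in idxs)
    return ds[min(k, len(ds)) - 1] if ds else float("inf")

exact = {i: knn_dist(data[i], range(n)) for i in samples}  # ground truth

def build_tables(K, L):
    tables = []
    for _ in range(L):
        cuts = [(random.randrange(d), random.random()) for _ in range(K)]
        buckets = defaultdict(list)
        for idx, x in enumerate(data):
            key = tuple(x[dim] > c for dim, c in cuts)
            buckets[key].append(idx)
        tables.append((cuts, buckets))
    return tables

def candidates(tables, q):
    cand = set()
    for cuts, buckets in tables:
        key = tuple(q[dim] > c for dim, c in cuts)
        cand.update(buckets.get(key, ()))
    return cand

best = None
for K in (4, 6, 8, 10):
    for L in (1, 2, 4, 8, 16, 32):
        tables = build_tables(K, L)
        t0 = time.perf_counter()
        errs = []
        for i in samples:
            approx = knn_dist(data[i], candidates(tables, data[i]))
            errs.append(abs(approx - exact[i]) / exact[i])
        elapsed = time.perf_counter() - t0
        if sum(errs) / m <= eps:        # constraint met: this is L(K)
            if best is None or elapsed < best[0]:
                best = (elapsed, K, L)  # minimize t(K, L(K))
            break
print("chosen (K, L):", best[1:] if best else "no pair met eps")
```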
Data driven partitions
• In the original LSH, cut values are random in the range of the data
• Suggestion: randomly select a point from the data and use one of its coordinates as the cut value
(Figure: points-per-bucket distribution, uniform vs. data-driven cuts.)
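A quick way to see why data-driven cuts help: on skewed data, uniform cuts waste thresholds on empty regions, while cuts taken from data coordinates land where the mass is. A small sketch under these assumptions (1-D exponentially distributed data, K threshold bits; everything here is illustrative):

```python
import random
from collections import Counter

random.seed(2)
# Skewed 1-D data: most mass near 0, a long tail.
data = [random.expovariate(1.0) * 10 for _ in range(10000)]
lo, hi = min(data), max(data)
K = 8

def bucket_counts(cuts):
    """Count points per bucket; key = K bits (x > cut)."""
    return Counter(tuple(x > c for c in cuts) for x in data)

uniform_cuts = [random.uniform(lo, hi) for _ in range(K)]
driven_cuts  = [random.choice(data) for _ in range(K)]  # coordinate of a data point

for name, cuts in (("uniform", uniform_cuts), ("data-driven", driven_cuts)):
    sizes = bucket_counts(cuts)
    print(f"{name:12s} occupied buckets: {len(sizes):3d}  "
          f"largest bucket: {max(sizes.values())}")
```

With data-driven cuts the points spread over more, better-balanced buckets, which is the point of the histogram comparison on the slide.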
Additional speedup
• Assume that all points in C will converge to the same mode (C is like a type of an aggregate)
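One possible reading of this speedup, sketched below under strong simplifying assumptions (1-D data, a single set of data-driven cuts, flat-kernel mean shift): run the mean-shift iteration once per occupied cell C and let every point of C inherit that mode, instead of iterating from every point. This illustrates the aggregation idea only; it is not the paper's exact procedure.

```python
import random
from collections import defaultdict

random.seed(3)
# Two 1-D clusters.
data = [random.gauss(0, 0.3) for _ in range(300)] + \
       [random.gauss(5, 0.3) for _ in range(300)]
h = 1.0  # bandwidth

def mean_shift(x, iters=20):
    """Flat-kernel mean-shift trajectory of a single point."""
    for _ in range(iters):
        window = [p for p in data if abs(p - x) <= h]
        if not window:
            break
        x = sum(window) / len(window)
    return x

# Cell C: points sharing the same bit pattern over K data-driven cuts.
cuts = sorted(random.choice(data) for _ in range(8))
cells = defaultdict(list)
for i, x in enumerate(data):
    cells[tuple(x > c for c in cuts)].append(i)

modes = {}
for key, members in cells.items():
    rep = data[members[0]]           # one representative per cell C
    mode = round(mean_shift(rep), 2)
    for i in members:                # all of C inherits the same mode
        modes[i] = mode

print("cells:", len(cells), "distinct modes:", sorted(set(modes.values())))
```

Only one trajectory is computed per cell instead of one per point, which is where the speedup comes from when cells are large.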
Speedup results
65,536 points; 1,638 points sampled; k = 100
Food for thought
(Figure: low dimension vs. high dimension.)
A thought for food…
• Choose K, L by sample learning, or take the traditional values
• Can one estimate K, L without sampling?
• A thought for food: does it help to know the data dimensionality or the data manifold?
• Intuitively, the dimensionality implies the number of hash functions needed
• The catch: efficient dimensionality learning itself requires KNN
15:30: cookies…
Summary
• LSH suggests a compromise: accuracy is traded for a gain in complexity
• Applications that involve massive data in high dimension require the LSH's fast performance
• The LSH extends to different spaces (PSH)
• The LSH parameters and hash functions can be learned for different applications
Conclusion
• …but at the end, everything depends on your data set
• Try it at home:
– Visit http://web.mit.edu/andoni/www/LSH/index.html
– Email Alex Andoni: andoni@mit.edu
– Test it over your own data (C code, under Red Hat Linux)
Thanks
• Ilan Shimshoni (Haifa)
• Mohamad Hegaze (Weizmann)
• Alex Andoni (MIT)
• Mica and Denis
- k-Nearest Neighbors Search in High Dimensions
- Outline
- Nearest Neighbor Search Problem definition
- Applications
- Naïve solution
- Common solution
- When to use nearest neighbor
- Nearest Neighbor
- r - Nearest Neighbor
- Slide 10
- The simplest solution
- Quadtree
- Quadtree - structure
- Quadtree - Query
- Quadtree - Pitfall 1
- Slide 16
- Quadtree - pitfall 2
- Space partition based algorithms
- Slide 19
- Curse of dimensionality
- Curse of dimensionality Some intuition
- Slide 22
- Preview
- Hash function
- Slide 25
- Slide 26
- Recall r - Nearest Neighbor
- Locality sensitive hashing
- Slide 29
- Hamming Space
- Slide 31
- L1 to Hamming Space Embedding
- Slide 33
- Construction
- Query
- Alternative intuition random projections
- Slide 37
- Slide 38
- Slide 39
- k samplings
- Repeating
- Repeating L times
- Slide 43
- Secondary hashing
- The above hashing is locality-sensitive
- Slide 46
- Direct L2 solution
- Central limit theorem
- Slide 49
- Slide 50
- Norm Distance
- Slide 52
- The full Hashing
- Slide 54
- Slide 55
- Slide 56
- Generalization P-Stable distribution
- P-Stable summary
- Parameters selection
- Parameters selection …
- … Parameters selection
- Pros amp Cons
- Conclusion
- LSH - Applications
- Motivation
- Slide 66
- Given an image x, what are the parameters θ in this image, i.e. angles of joints, orientation of the body, etc.?
- Ingredients
- Example based learning
- Slide 70
- The image features
- PSH The basic assumption
- Insight Manifolds
- Slide 74
- Parameter Sensitive Hashing (PSH)
- Slide 76
- Slide 77
- Slide 78
- Slide 79
- Local Weighted Regression (LWR)
- Results
- Slide 82
- Results - real data
- Slide 84
- Slide 85
- Fast pose estimation - summary
- Food for Thought
- Food for Thought Point Location in Different Spheres (PLDS)
- Motivation
- Slide 90
- Slide 91
- Slide 92
- Slide 93
- Image segmentation algorithm
- Slide 95
- Filtering examples
- Segmentation examples
- Mean-shift in high dimensions
- LSH-based data structure
- Choosing the optimal K and L
- Slide 101
- Choosing optimal K and L
- Slide 103
- Data driven partitions
- Additional speedup
- Speedup results
- Food for thought
- A thought for foodhellip
- Summary
- Slide 110
- Thanks