Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree...
Transcript of Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree...
![Page 1: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/1.jpg)
Analysis of hierarchical metric-treeindexing schemes
for similarity search in high-dimensional datasets
Vladimir Pestov
http://aix1.uottawa.ca/˜vpest283
Department of Mathematics and Statistics
University of Ottawa
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.1/25
![Page 2: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/2.jpg)
General setting
Workload: W = (Ω, X,Q), where:
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.2/25
![Page 3: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/3.jpg)
General setting
Workload: W = (Ω, X,Q), where:
Ω is the domain,
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.2/25
![Page 4: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/4.jpg)
General setting
Workload: W = (Ω, X,Q), where:
Ω is the domain,
X ⊂ Ω finite subset (dataset, or instance), and
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.2/25
![Page 5: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/5.jpg)
General setting
Workload: W = (Ω, X,Q), where:
Ω is the domain,
X ⊂ Ω finite subset (dataset, or instance), and
Q ⊆ 2Ω is the set of queries.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.2/25
![Page 6: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/6.jpg)
General setting
Workload: W = (Ω, X,Q), where:
Ω is the domain,
X ⊂ Ω finite subset (dataset, or instance), and
Q ⊆ 2Ω is the set of queries.
Answering a query Q ∈ Q
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.2/25
![Page 7: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/7.jpg)
General setting
Workload: W = (Ω, X,Q), where:
Ω is the domain,
X ⊂ Ω finite subset (dataset, or instance), and
Q ⊆ 2Ω is the set of queries.
Answering a query Q ∈ Q is listing all x ∈ X ∩ Q.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.2/25
![Page 8: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/8.jpg)
General setting
Workload: W = (Ω, X,Q), where:
Ω is the domain,
X ⊂ Ω finite subset (dataset, or instance), and
Q ⊆ 2Ω is the set of queries.
Answering a query Q ∈ Q is listing all x ∈ X ∩ Q.
A (dis)similarity measure s : Ω × Ω → R,
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.2/25
![Page 9: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/9.jpg)
General setting
Workload: W = (Ω, X,Q), where:
Ω is the domain,
X ⊂ Ω finite subset (dataset, or instance), and
Q ⊆ 2Ω is the set of queries.
Answering a query Q ∈ Q is listing all x ∈ X ∩ Q.
A (dis)similarity measure s : Ω × Ω → R,e.g. a metric, or a pseudometric.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.2/25
![Page 10: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/10.jpg)
General setting
Workload: W = (Ω, X,Q), where:
Ω is the domain,
X ⊂ Ω finite subset (dataset, or instance), and
Q ⊆ 2Ω is the set of queries.
Answering a query Q ∈ Q is listing all x ∈ X ∩ Q.
A (dis)similarity measure s : Ω × Ω → R,e.g. a metric, or a pseudometric.
A range similarity query centred at ω ∈ Ω:
Q = x ∈ Ω: s(ω, x) < ε
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.2/25
![Page 11: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/11.jpg)
Similarity workloads
ε
Ω
W = (Ω, d,X, Bε(x))
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.3/25
![Page 12: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/12.jpg)
Similarity workloads
ε
Ω
W = (Ω, d,X, Bε(x))
k-nearest neighbours (k-NN) query centred at x∗ ∈ Ω,where k ∈ N.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.3/25
![Page 13: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/13.jpg)
Example
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.4/25
![Page 14: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/14.jpg)
Example
Ω = strings of length m = 10 from the alphabet Σ of 20standard amino acids: Ω = Σ10.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.4/25
![Page 15: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/15.jpg)
Example
Ω = strings of length m = 10 from the alphabet Σ of 20standard amino acids: Ω = Σ10.
X = all peptide fragments of length 10 in the SwissProtdatabase (as of 19-Oct-2002). |X| = 23, 817, 598.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.4/25
![Page 16: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/16.jpg)
Example
Ω = strings of length m = 10 from the alphabet Σ of 20standard amino acids: Ω = Σ10.
X = all peptide fragments of length 10 in the SwissProtdatabase (as of 19-Oct-2002). |X| = 23, 817, 598.
Similarity measure given by the most common scoringmatrix in sequence comparison, BLOSUM62, bys(a, b) =
∑mi=1 s(ai, bi) (the ungapped score).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.4/25
![Page 17: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/17.jpg)
Example
Ω = strings of length m = 10 from the alphabet Σ of 20standard amino acids: Ω = Σ10.
X = all peptide fragments of length 10 in the SwissProtdatabase (as of 19-Oct-2002). |X| = 23, 817, 598.
Similarity measure given by the most common scoringmatrix in sequence comparison, BLOSUM62, bys(a, b) =
∑mi=1 s(ai, bi) (the ungapped score).
Converted into quasi-metric d(a, b) = s(a, a) − s(a, b),generating the same set of queries (range and k-NN).
(joint with A. Stojmirovic)
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.4/25
![Page 18: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/18.jpg)
Inner vs outer
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.5/25
![Page 19: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/19.jpg)
Inner vs outer
Inner workload if X = Ω,
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.5/25
![Page 20: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/20.jpg)
Inner vs outer
Inner workload if X = Ω,
Outer workload if |X| ≪ |Ω|.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.5/25
![Page 21: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/21.jpg)
Inner vs outer
Inner workload if X = Ω,
Outer workload if |X| ≪ |Ω|.
Fragment example: outer,
|X|/|Ω| = 23, 817, 598/2010 ≈ 0.0000023
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.5/25
![Page 22: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/22.jpg)
Inner vs outer
Inner workload if X = Ω,
Outer workload if |X| ≪ |Ω|.
Fragment example: outer,
|X|/|Ω| = 23, 817, 598/2010 ≈ 0.0000023
Most points ω ∈ Ω have NN x ∈ X within ε = 25 (high biological relevance).
0 5 10 15 20 25 30 35
0.0
0.2
0.4
0.6
0.8
1.0
DISTANCE
µ(N
ε(X))
Quasi−metricMetric
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.5/25
![Page 23: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/23.jpg)
Hierarchical tree index structures
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.6/25
![Page 24: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/24.jpg)
Hierarchical tree index structures
A sequence of refining partitions of the domain:
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.6/25
![Page 25: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/25.jpg)
Hierarchical tree index structures
A sequence of refining partitions of the domain:
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.6/25
![Page 26: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/26.jpg)
Hierarchical tree index structures
A sequence of refining partitions of the domain:
Space O(n).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.6/25
![Page 27: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/27.jpg)
Hierarchical tree index structures
A sequence of refining partitions of the domain:
Space O(n).To process a range query Bε(ω), we traverse the tree all theway down to the leaf level.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.6/25
![Page 28: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/28.jpg)
Hierarchical tree index structures
A sequence of refining partitions of the domain:
Space O(n).To process a range query Bε(ω), we traverse the tree all theway down to the leaf level.
What happens in each node?
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.6/25
![Page 29: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/29.jpg)
Pruning
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.7/25
![Page 30: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/30.jpg)
Pruning
• If Bε(ω) ∩ B = ∅, the sub-tree descending from the node Bcan be pruned:
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.7/25
![Page 31: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/31.jpg)
Pruning
• If Bε(ω) ∩ B = ∅, the sub-tree descending from the node Bcan be pruned:
B
ω
A B
A ε ε
ε
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.7/25
![Page 32: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/32.jpg)
Pruning
• If Bε(ω) ∩ B = ∅, the sub-tree descending from the node Bcan be pruned:
B
ω
A B
A ε ε
ε
that is, if it can be certified thatω /∈ Bε = x ∈ Ω: d(x,B) < ε.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.7/25
![Page 33: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/33.jpg)
Pruning
• If Bε(ω) ∩ B = ∅, the sub-tree descending from the node Bcan be pruned:
B
ω
A B
A ε ε
ε
A
ω
B
A B
ε ε
ε
that is, if it can be certified thatω /∈ Bε = x ∈ Ω: d(x,B) < ε.
• Otherwise the search branches out.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.7/25
![Page 34: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/34.jpg)
Pruning
• If Bε(ω) ∩ B = ∅, the sub-tree descending from the node Bcan be pruned:
B
ω
A B
A ε ε
ε
A
ω
B
A B
ε ε
ε
that is, if it can be certified thatω /∈ Bε = x ∈ Ω: d(x,B) < ε.
• Otherwise the search branches out.
How to “certify” that Bε(ω) ∩ B = ∅?
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.7/25
![Page 35: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/35.jpg)
Decision functions
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.8/25
![Page 36: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/36.jpg)
Decision functions
Let f : Ω → R be a 1-Lipschitz function,
|f(x) − f(y)| ≤ d(x, y) ∀x, y ∈ Ω,
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.8/25
![Page 37: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/37.jpg)
Decision functions
Let f : Ω → R be a 1-Lipschitz function,
|f(x) − f(y)| ≤ d(x, y) ∀x, y ∈ Ω,
such that f B ≤ 0.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.8/25
![Page 38: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/38.jpg)
Decision functions
Let f : Ω → R be a 1-Lipschitz function,
|f(x) − f(y)| ≤ d(x, y) ∀x, y ∈ Ω,
such that f B ≤ 0. Then f Bε < ε,
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.8/25
![Page 39: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/39.jpg)
Decision functions
Let f : Ω → R be a 1-Lipschitz function,
|f(x) − f(y)| ≤ d(x, y) ∀x, y ∈ Ω,
such that f B ≤ 0. Then f Bε < ε,
x
f
B 0
f(x)
ε
y
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.8/25
![Page 40: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/40.jpg)
Decision functions
Let f : Ω → R be a 1-Lipschitz function,
|f(x) − f(y)| ≤ d(x, y) ∀x, y ∈ Ω,
such that f B ≤ 0. Then f Bε < ε,
x
f
B 0
f(x)
ε
y
that is, f(ω) ≥ ε
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.8/25
![Page 41: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/41.jpg)
Decision functions
Let f : Ω → R be a 1-Lipschitz function,
|f(x) − f(y)| ≤ d(x, y) ∀x, y ∈ Ω,
such that f B ≤ 0. Then f Bε < ε,
x
f
B 0
f(x)
ε
y
that is, f(ω) ≥ ε is a certificate that Bε(ω) ∩ B = ∅
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.8/25
![Page 42: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/42.jpg)
Metric trees
A metric tree for a metric similarity workload (Ω, ρ,X):
a binary rooted tree T ,
a collection of partially defined 1-Lipschitz functionsft : Bt → R for every inner node t (decision functions),
a collection of bins Bt ⊆ Ω for every leaf node t,containing pointers to elements X ∩ Bt,
such that
Broot(T ) = Ω,
∀ inner node t and child nodes t−, t+, Bt ⊆ Bt− ∪ Bt+.
When processing a range query Bε(ω),
t− [ t+ ] is accessed ⇐⇒ ft(ω) < ε [resp. ft(ω) > −ε].
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.9/25
![Page 43: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/43.jpg)
What happens in practice?
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.10/25
![Page 44: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/44.jpg)
What happens in practice?
The best indexing schemes for exact similarity search inhigh-dimensional outer datasets are often (not always!)outperformed by linear scan.
∗ ∗ ∗
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.10/25
![Page 45: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/45.jpg)
What happens in practice?
The best indexing schemes for exact similarity search inhigh-dimensional outer datasets are often (not always!)outperformed by linear scan.
∗ ∗ ∗
The emphasis has shifted towards approximate similaritysearch:
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.10/25
![Page 46: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/46.jpg)
What happens in practice?
The best indexing schemes for exact similarity search inhigh-dimensional outer datasets are often (not always!)outperformed by linear scan.
∗ ∗ ∗
The emphasis has shifted towards approximate similaritysearch:
given ε > 0 and ω ∈ Ω, return a point that is [with highprobability] at a distance < (1 + ε)dNN (ω) from ω.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.10/25
![Page 47: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/47.jpg)
The curse of dimensionality conjecture
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.11/25
![Page 48: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/48.jpg)
The curse of dimensionality conjecture
Conjecture.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.11/25
![Page 49: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/49.jpg)
The curse of dimensionality conjecture
Conjecture. Let X ⊆ 0, 1d be a dataset with n points,where the Hamming cube is equipped with the Hamming(ℓ1) distance:
d(x, y) = ♯i : xi 6= yi.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.11/25
![Page 50: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/50.jpg)
The curse of dimensionality conjecture
Conjecture. Let X ⊆ 0, 1d be a dataset with n points,where the Hamming cube is equipped with the Hamming(ℓ1) distance:
d(x, y) = ♯i : xi 6= yi.
Suppose d = no(1), but d = ω(log n).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.11/25
![Page 51: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/51.jpg)
The curse of dimensionality conjecture
Conjecture. Let X ⊆ 0, 1d be a dataset with n points,where the Hamming cube is equipped with the Hamming(ℓ1) distance:
d(x, y) = ♯i : xi 6= yi.
Suppose d = no(1), but d = ω(log n). Any data structure forexact nearest neighbour search in X,
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.11/25
![Page 52: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/52.jpg)
The curse of dimensionality conjecture
Conjecture. Let X ⊆ 0, 1d be a dataset with n points,where the Hamming cube is equipped with the Hamming(ℓ1) distance:
d(x, y) = ♯i : xi 6= yi.
Suppose d = no(1), but d = ω(log n). Any data structure forexact nearest neighbour search in X, with dO(1) querytime,
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.11/25
![Page 53: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/53.jpg)
The curse of dimensionality conjecture
Conjecture. Let X ⊆ 0, 1d be a dataset with n points,where the Hamming cube is equipped with the Hamming(ℓ1) distance:
d(x, y) = ♯i : xi 6= yi.
Suppose d = no(1), but d = ω(log n). Any data structure forexact nearest neighbour search in X, with dO(1) querytime, must use nω(1) space.
∗ ∗ ∗
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.11/25
![Page 54: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/54.jpg)
The curse of dimensionality conjecture
Conjecture. Let X ⊆ 0, 1d be a dataset with n points,where the Hamming cube is equipped with the Hamming(ℓ1) distance:
d(x, y) = ♯i : xi 6= yi.
Suppose d = no(1), but d = ω(log n). Any data structure forexact nearest neighbour search in X, with dO(1) querytime, must use nω(1) space.
∗ ∗ ∗
The cell probe model: Ω(d/ log n) lower bound(Barkol–Rabani, 2000).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.11/25
![Page 55: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/55.jpg)
Concentration of measure
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.12/25
![Page 56: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/56.jpg)
Concentration of measure
The phenomenon of concentration of measure on high-
dimensional structures (“Geometric LLN”):
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.12/25
![Page 57: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/57.jpg)
Concentration of measure
The phenomenon of concentration of measure onhigh-dimensional structures (“Geometric LLN”):
for a typical “high-dimensional” structure Ω, if A is asubset containing at least half of all points, then themeasure of the ε-neighbourhood Aε of A isoverwhelmingly close to 1 already for small ε > 0.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.12/25
![Page 58: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/58.jpg)
Concentration of measure
The phenomenon of concentration of measure onhigh-dimensional structures (“Geometric LLN”):
for a typical “high-dimensional” structure Ω, if A is asubset containing at least half of all points, then themeasure of the ε-neighbourhood Aε of A isoverwhelmingly close to 1 already for small ε > 0.
)
Aε
ε
A
at least half ofall points
bounds from above
containsΑ Ω
α(Ω,ε)µ(Ω
Ω \ A ε
\ A ε
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.12/25
![Page 59: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/59.jpg)
Concentration function
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.13/25
![Page 60: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/60.jpg)
Concentration function
Let Ω = (Ω, d, µ) be a metric space with measure.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.13/25
![Page 61: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/61.jpg)
Concentration function
Let Ω = (Ω, d, µ) be a metric space with measure.The concentration function of Ω:
α(ε) =
12 , if ε = 0,1 − min
µ♯ (Aε) : A ⊆ Ω, µ♯(A) ≥ 12
, if ε > 0.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.13/25
![Page 62: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/62.jpg)
Concentration function
Let Ω = (Ω, d, µ) be a metric space with measure.The concentration function of Ω:
α(ε) =
12 , if ε = 0,1 − min
µ♯ (Aε) : A ⊆ Ω, µ♯(A) ≥ 12
, if ε > 0.
For Ω = Σn, the Hamming cube (normalized distance +unif. measure):
αΣn(ε) ≤ e−2ε2n.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.13/25
![Page 63: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/63.jpg)
Concentration function
Let Ω = (Ω, d, µ) be a metric space with measure.The concentration function of Ω:
α(ε) =
12 , if ε = 0,1 − min
µ♯ (Aε) : A ⊆ Ω, µ♯(A) ≥ 12
, if ε > 0.
For Ω = Σn, the Hamming cube (normalized distance +unif. measure):
αΣn(ε) ≤ e−2ε2n.
Gaussian estimates are typical
(Euclidean spheres Sn, cubes I
n, ...)
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.13/25
![Page 64: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/64.jpg)
Example: the Hamming cube
0
0.2
0.4
0.6
0.8
1
0 0.05 0.1 0.15 0.2
Concentration function versus Chernoff’s bound, n = 101
Concentration functionChernoff bound
Concentration function α(Σ101, ε) versus Chernoff bound
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.14/25
![Page 65: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/65.jpg)
Effects of concentration on branching
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.15/25
![Page 66: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/66.jpg)
Effects of concentration on branching
A
A B
< α (C, ε)< α (C, ε)C
ω
Bε ε
ε
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.15/25
![Page 67: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/67.jpg)
Effects of concentration on branching
A
A B
< α (C, ε)< α (C, ε)C
ω
Bε ε
ε
For all query points ω ∈ C except a set of measure
≤ 2α(C, ε),
the search algorithm branches out at the node C.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.15/25
![Page 68: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/68.jpg)
Search radius
εNN (ω) is a 1-Lipschitz function, so concentrates nearthe median value, εM ;
εM → Eµ⊗µd(x, y) = O(1).
Example: 1000 pts ∼ [0, 1]10, the ℓ2-εNN :
εM = 0.69419 Ed(x, y) = 1.2765.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.16/25
![Page 69: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/69.jpg)
A naive average O(n) lower bound
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.17/25
![Page 70: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/70.jpg)
A naive average O(n) lower bound
Suppose datapoints are distributed according to µ ∈ P (Ω)...
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.17/25
![Page 71: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/71.jpg)
A naive average O(n) lower bound
Suppose datapoints are distributed according to µ ∈ P (Ω)......as well as query points.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.17/25
![Page 72: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/72.jpg)
A naive average O(n) lower bound
Suppose datapoints are distributed according to µ ∈ P (Ω)......as well as query points.A balanced metric tree of depth O(log n), with O(n) bins ofroughly equal size (µ-measure).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.17/25
![Page 73: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/73.jpg)
A naive average O(n) lower bound
Suppose datapoints are distributed according to µ ∈ P (Ω)......as well as query points.A balanced metric tree of depth O(log n), with O(n) bins ofroughly equal size (µ-measure).in 1/2 the cases, εNN ≥ εM = O(1), the median NN dist.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.17/25
![Page 74: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/74.jpg)
A naive average O(n) lower bound
Suppose datapoints are distributed according to µ ∈ P (Ω)......as well as query points.A balanced metric tree of depth O(log n), with O(n) bins ofroughly equal size (µ-measure).in 1/2 the cases, εNN ≥ εM = O(1), the median NN dist.For every element A of level t partition,
α(A, εM ) ≤ 2µ(A)−1α(Ω, εM/2) = O(2t)e−O(1)ε2Md.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.17/25
![Page 75: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/75.jpg)
A naive average O(n) lower bound
Suppose datapoints are distributed according to µ ∈ P (Ω)......as well as query points.A balanced metric tree of depth O(log n), with O(n) bins ofroughly equal size (µ-measure).in 1/2 the cases, εNN ≥ εM = O(1), the median NN dist.For every element A of level t partition,
α(A, εM ) ≤ 2µ(A)−1α(Ω, εM/2) = O(2t)e−O(1)ε2Md.
branching at every node occurs for all ω except
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.17/25
![Page 76: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/76.jpg)
A naive average O(n) lower bound
Suppose datapoints are distributed according to µ ∈ P (Ω)......as well as query points.A balanced metric tree of depth O(log n), with O(n) bins ofroughly equal size (µ-measure).in 1/2 the cases, εNN ≥ εM = O(1), the median NN dist.For every element A of level t partition,
α(A, εM ) ≤ 2µ(A)−1α(Ω, εM/2) = O(2t)e−O(1)ε2Md.
branching at every node occurs for all ω except
♯(nodes) × 2 supA
α(A, ε) = O(n2)e−O(1)d = o(1),
because d = ω(log n), e−O(1)d is superpoly(n).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.17/25
![Page 77: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/77.jpg)
What’s wrong?
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.18/25
![Page 78: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/78.jpg)
What’s wrong?
A dataset X is modeled by a sequence of i.i.d. r.v. Xi ∼ µ.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.18/25
![Page 79: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/79.jpg)
What’s wrong?
A dataset X is modeled by a sequence of i.i.d. r.v. Xi ∼ µ.
Implicit assumption: empirical measure µn(A) = |A|n ≈ µ(A).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.18/25
![Page 80: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/80.jpg)
What’s wrong?
A dataset X is modeled by a sequence of i.i.d. r.v. Xi ∼ µ.
Implicit assumption: empirical measure µn(A) = |A|n ≈ µ(A).
But the scheme is chosen after seeing an instance X!
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.18/25
![Page 81: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/81.jpg)
What’s wrong?
A dataset X is modeled by a sequence of i.i.d. r.v. Xi ∼ µ.
Implicit assumption: empirical measure µn(A) = |A|n ≈ µ(A).
But the scheme is chosen after seeing an instance X!
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
How much can be said of concentration in (Ω, µn)?
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.18/25
![Page 82: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/82.jpg)
VC dimension
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.19/25
![Page 83: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/83.jpg)
VC dimension
Let A be a family of subsets of Ω (a concept class).B ⊆ Ω is shattered by A if for each C ⊆ B there is A ∈ A
such thatA ∩ B = C.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.19/25
![Page 84: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/84.jpg)
VC dimension
Let A be a family of subsets of Ω (a concept class).B ⊆ Ω is shattered by A if for each C ⊆ B there is A ∈ A
such thatA ∩ B = C.
A
Ω
B
C
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.19/25
![Page 85: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/85.jpg)
VC dimension
Let A be a family of subsets of Ω (a concept class).B ⊆ Ω is shattered by A if for each C ⊆ B there is A ∈ A
such thatA ∩ B = C.
A
Ω
B
C
The Vapnik–Chervonenkis dimension VC-dim (A ) of A is
the largest cardinality of a set B ⊆ Ω shattered by A .Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.19/25
![Page 86: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/86.jpg)
Statistical learning bounds
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.20/25
![Page 87: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/87.jpg)
Statistical learning bounds
Let A ⊆ 2Ω be a concept class of finite VC dimension, d.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.20/25
![Page 88: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/88.jpg)
Statistical learning bounds
Let A ⊆ 2Ω be a concept class of finite VC dimension, d.Then for all ǫ, δ > 0 and every probability measure µ on Ω,
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.20/25
![Page 89: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/89.jpg)
Statistical learning bounds
Let A ⊆ 2Ω be a concept class of finite VC dimension, d.Then for all ǫ, δ > 0 and every probability measure µ on Ω,if n datapoints in X are drawn randomly and independentlyacoording to µ, then with confidence 1 − δ
∀A ∈ A ,
∣
∣
∣
∣
µ(A) −X ∩ A
n
∣
∣
∣
∣
< ǫ,
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.20/25
![Page 90: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/90.jpg)
Statistical learning bounds
Let A ⊆ 2Ω be a concept class of finite VC dimension, d.Then for all ǫ, δ > 0 and every probability measure µ on Ω,if n datapoints in X are drawn randomly and independentlyacoording to µ, then with confidence 1 − δ
∀A ∈ A ,
∣
∣
∣
∣
µ(A) −X ∩ A
n
∣
∣
∣
∣
< ǫ,
provided n is large enough:
n ≥128
ε2
(
d log
(
2e2
εlog
2e
ε
)
+ log8
δ
)
.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.20/25
![Page 91: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/91.jpg)
Bin access lemma
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.21/25
![Page 92: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/92.jpg)
Bin access lemma
Let δ > 0, and let γ be a collection of subsets A ⊆ Ω ofmeasure µ(A) ≤ α(δ) ≤ 1
4 each, satisfying µ(∪γ) ≥ 1/2.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.21/25
![Page 93: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/93.jpg)
Bin access lemma
Let δ > 0, and let γ be a collection of subsets A ⊆ Ω ofmeasure µ(A) ≤ α(δ) ≤ 1
4 each, satisfying µ(∪γ) ≥ 1/2.Then the 2δ-neighbourhood of every point ω ∈ Ω, apart froma set of measure at most 1
2α(δ)1
2 , meets at least ⌈12α(δ)−
1
2 ⌉
elements of γ.∗ ∗ ∗
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.21/25
![Page 94: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/94.jpg)
Bin access lemma
Let δ > 0, and let γ be a collection of subsets A ⊆ Ω ofmeasure µ(A) ≤ α(δ) ≤ 1
4 each, satisfying µ(∪γ) ≥ 1/2.Then the 2δ-neighbourhood of every point ω ∈ Ω, apart froma set of measure at most 1
2α(δ)1
2 , meets at least ⌈12α(δ)−
1
2 ⌉
elements of γ.∗ ∗ ∗
If we can now guarantee that the bins are not too large, we
get a lower bound on the number of bin accesses.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.21/25
![Page 95: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/95.jpg)
Bin complexity estimates
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.22/25
![Page 96: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/96.jpg)
Bin complexity estimates
Let F be a class of 1-Lipschitz functions used forconstructing a metric tree of a particular type.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.22/25
![Page 97: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/97.jpg)
Bin complexity estimates
Let F be a class of 1-Lipschitz functions used forconstructing a metric tree of a particular type.Let A be the concept class of all solution sets toinequalities
f R a, f ∈ F , a ∈ R.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.22/25
![Page 98: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/98.jpg)
Bin complexity estimates
Let F be a class of 1-Lipschitz functions used forconstructing a metric tree of a particular type.Let A be the concept class of all solution sets toinequalities
f R a, f ∈ F , a ∈ R.
Supposep = VC-dim (A ) < ∞
(pseudodimension of F in the sense of Vapnik).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.22/25
![Page 99: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/99.jpg)
Bin complexity estimates
Let F be a class of 1-Lipschitz functions used forconstructing a metric tree of a particular type.Let A be the concept class of all solution sets toinequalities
f R a, f ∈ F , a ∈ R.
Supposep = VC-dim (A ) < ∞
(pseudodimension of F in the sense of Vapnik).Denote B the class of all bins of all possible metric trees ofdepth ≤ h built using F . Then
VC-dim (B) ≤ 2hp log(hp) = O(hp).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.22/25
![Page 100: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/100.jpg)
Rigorous lower bounds
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.23/25
![Page 101: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/101.jpg)
Rigorous lower bounds
thm. Let F be a class of 1-Lipschitz functions on 0, 1d
with VC dimension of the class of sets given by inequalitiesf R a being poly(d).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.23/25
![Page 102: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/102.jpg)
Rigorous lower bounds
thm. Let F be a class of 1-Lipschitz functions on 0, 1d
with VC dimension of the class of sets given by inequalitiesf R a being poly(d).With probability approaching 1, every metric tree indexingscheme for a random sample X of 0, 1d containing n
points, where d = no(1) and d = ω(log n),
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.23/25
![Page 103: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/103.jpg)
Rigorous lower bounds
thm. Let F be a class of 1-Lipschitz functions on 0, 1d
with VC dimension of the class of sets given by inequalitiesf R a being poly(d).With probability approaching 1, every metric tree indexingscheme for a random sample X of 0, 1d containing n
points, where d = no(1) and d = ω(log n), will have theworst-case performance dω(1).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.23/25
![Page 104: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/104.jpg)
Rigorous lower bounds
thm. Let F be a class of 1-Lipschitz functions on 0, 1d
with VC dimension of the class of sets given by inequalitiesf R a being poly(d).With probability approaching 1, every metric tree indexingscheme for a random sample X of 0, 1d containing n
points, where d = no(1) and d = ω(log n), will have theworst-case performance dω(1).
⊳ Can suppose every bin contains poly(d) datapoints, and
the tree depth is poly(d). The VC-dim of all possible bins
is poly(d) = o(n). If ǫ = n1/2−γ, by learning estimates the
measure of each bin of the scheme is O(n−1/2+γ), so there
will be Ω(n1/4−γ) = dω(1) bin accesses. ⊲Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.23/25
![Page 105: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/105.jpg)
Example: vp-tree
The vp-tree (Yianilos) uses decision functions of the form
ft(ω) = (1/2)(ρ(xt+, ω) − ρ(xt−, ω)),
where
t± are two children of t and
xt± are the vantage points for the node t.
If Ω = Rd, VC dimension is d + 1.
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.24/25
![Page 106: Analysis of hierarchical metric-tree indexing schemescris/AofA2008/slides/pestov.pdf · Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA](https://reader034.fdocuments.us/reader034/viewer/2022042117/5e9607baa9c68568ae6e3a95/html5/thumbnails/106.jpg)
Example: M -tree
The M-tree (Ciaccia, Patella, Zezula) employs decisionfunctions
ft(ω) = ρ(xt, ω) − supτ∈Bt
ρ(xt, τ),
where
Bt is a block corresponding to the node t,
xt is a datapoint chosen for each node t, and
suprema on the r.h.s. are precomputed and stored.
If Ω = Rd, VC-dim is d + 1; for Ω = 0, 1d, it is O(d).
Metric tree indexing schemes for similarity search Vladimir Pestov, University of Ottawa AofA 2008, Maresias, SP, Brazil – p.25/25