Mehlhorn-Tsakalidis Revisited
Christos Zaroliagis
[Joint work with A. Kaporis, C. Makris, S. Sioutas, A. Tsakalidis, & K. Tsichlas]
1 / 24
Mehlhorn-Tsakalidis - once upon a time ...
2 / 24
Mehlhorn-Tsakalidis - once upon a time ...
2 / 24
Dynamic Dictionary Search - Problem
Given set of keys S = {X1, . . . , Xn}, Xi ∈ [a, b] ⊂ IR
Maintain (under key insertions and deletions) a non-decreasing ordering
P = {X(1), . . . , X(n)} of S such that:
given a query element y
find largest X(j) ∈ P : X(j) ≤ y
3 / 24
Dynamic Dictionary Search - Classical methods
Use arbitrary rule to select splitting element X(k) ∈ P that splits P into two
subsets, and recurse
e.g., in binary search, splitting element = middle element
Balanced search trees (AVL-trees, red-black trees, (a, b)-trees, etc):
search & update time: O(log n)
Bounds
• optimal for Pointer Machine
• RAM: Θ(√
log n
log log n
)[Andersson & Thorup, 2001]
4 / 24
Dynamic Dictionary Search - Classical methods
Use arbitrary rule to select splitting element X(k) ∈ P that splits P into two
subsets, and recurse
e.g., in binary search, splitting element = middle element
Balanced search trees (AVL-trees, red-black trees, (a, b)-trees, etc):
search & update time: O(log n)
Bounds
• optimal for Pointer Machine
• RAM: Θ(√
log n
log log n
)[Andersson & Thorup, 2001]
4 / 24
Dynamic Dictionary Search - Classical methods
Use arbitrary rule to select splitting element X(k) ∈ P that splits P into two
subsets, and recurse
e.g., in binary search, splitting element = middle element
Balanced search trees (AVL-trees, red-black trees, (a, b)-trees, etc):
search & update time: O(log n)
Bounds
• optimal for Pointer Machine
• RAM: Θ(√
log n
log log n
)[Andersson & Thorup, 2001]
4 / 24
Surpassing the Bounds - Expected Complexities
Interpolation Search [Peterson,57]
Select splitting elements by taking advantage of the statistical properties of
the keys
Splitting elements are spread closer to query key y
Expected search time (static problem):
Θ(log log n), uniform distribution
[Yao & Yao,76], [Gonnet,77], [Perl & Reingold,77], [Perl, Itai & Avni,78],
[Gonnet, Rogers & George, 80]
Θ(log log n), regular (non-uniform) distribution
[Willard,85]
5 / 24
Surpassing the Bounds - Expected Complexities
Interpolation Search [Peterson,57]
Select splitting elements by taking advantage of the statistical properties of
the keys
Splitting elements are spread closer to query key y
Expected search time (static problem):
Θ(log log n), uniform distribution
[Yao & Yao,76], [Gonnet,77], [Perl & Reingold,77], [Perl, Itai & Avni,78],
[Gonnet, Rogers & George, 80]
Θ(log log n), regular (non-uniform) distribution
[Willard,85]
5 / 24
Surpassing the Bounds - Expected Complexities
Interpolation Search [Peterson,57]
Select splitting elements by taking advantage of the statistical properties of
the keys
Splitting elements are spread closer to query key y
Expected search time (static problem):
Θ(log log n), uniform distribution
[Yao & Yao,76], [Gonnet,77], [Perl & Reingold,77], [Perl, Itai & Avni,78],
[Gonnet, Rogers & George, 80]
Θ(log log n), regular (non-uniform) distribution
[Willard,85]
5 / 24
Surpassing the Bounds - Expected Complexities
Interpolation Search [Peterson,57]
Select splitting elements by taking advantage of the statistical properties of
the keys
Splitting elements are spread closer to query key y
Expected search time (static problem):
Θ(log log n), uniform distribution
[Yao & Yao,76], [Gonnet,77], [Perl & Reingold,77], [Perl, Itai & Avni,78],
[Gonnet, Rogers & George, 80]
Θ(log log n), regular (non-uniform) distribution
[Willard,85]
5 / 24
Dynamic Interpolation Search
Scenario: µ-random insertions, random deletions
µ uniform [Frederickson,83], [Itai,Konheim & Rodeh, 81]
search: O(log log n) expected time
update: O(nε) time, 0 < ε < 1
6 / 24
Dynamic Interpolation Search
µ (f1, f2)-smooth (peak-less or peak-adjusting) [Mehlhorn & Tsakalidis,85]
Pr
[c2 −
c3 − c1
f1(n)≤ X ≤ c2 | c1 ≤ X ≤ c3
]≤ βf2(n)
n
smooth ⊃ {uniform, regular, bounded, other non-uniform}→ any probability distribution is (f1,Θ(n))-smooth
search: O(log log n) expected time
update:
{O(log log n) expected [Mehlhorn & Tsakalidis,85]
O(1) time (position given) [Andersson & Mattson,93]
7 / 24
Dynamic Interpolation Search
µ (f1, f2)-smooth (peak-less or peak-adjusting) [Mehlhorn & Tsakalidis,85]
Pr
[c2 −
c3 − c1
f1(n)≤ X ≤ c2 | c1 ≤ X ≤ c3
]≤ βf2(n)
n
smooth ⊃ {uniform, regular, bounded, other non-uniform}→ any probability distribution is (f1,Θ(n))-smooth
search: O(log log n) expected time
update:
{O(log log n) expected [Mehlhorn & Tsakalidis,85]
O(1) time (position given) [Andersson & Mattson,93]
7 / 24
Dynamic Interpolation Search
µ (f1, f2)-smooth (peak-less or peak-adjusting) [Mehlhorn & Tsakalidis,85]
Pr
[c2 −
c3 − c1
f1(n)≤ X ≤ c2 | c1 ≤ X ≤ c3
]≤ βf2(n)
n
smooth ⊃ {uniform, regular, bounded, other non-uniform}→ any probability distribution is (f1,Θ(n))-smooth
search: O(log log n) expected time
update:
{O(log log n) expected [Mehlhorn & Tsakalidis,85]
O(1) time (position given) [Andersson & Mattson,93]
7 / 24
Static/Dynamic Interpolation Search
Key AssumptionConditional distribution on the subinterval dictated by an arbitrary interpolation
step remains unaffected (i.e., µ-random)
8 / 24
This Work (Revisiting Mehl-Tsak DIS) - Result 1
Key assumption is valid
• only when elements are distinct (indeed assumed in all previous work)
V produced under some continuous (or discrete) distribution with zero probability
of collision
• otherwise, it fails
Probabilistic analyses of previous Interpolation Search structures are
inapplicable to sequences of non-distinct elements
V produced by discrete probability distributions with measurable (non-zero)
probability of key collision
9 / 24
This Work (Revisiting Mehl-Tsak DIS) - Result 1
Key assumption is valid
• only when elements are distinct (indeed assumed in all previous work)
V produced under some continuous (or discrete) distribution with zero probability
of collision
• otherwise, it fails
Probabilistic analyses of previous Interpolation Search structures are
inapplicable to sequences of non-distinct elements
V produced by discrete probability distributions with measurable (non-zero)
probability of key collision
9 / 24
Interpolation Search - Lack of Generalization
∃ applications where we must store duplicates
Secondary indices in databases
Searching tables with alphabetic keys (names, dictionary entries, etc)
B keys follow a non-uniform, discrete distribution and collisions do occur
B Empirical results showed that Interpolation Search has a very poor
performance on such data
[Perl & Reingold,77], [Burton & Lewis,80], [Santorno & Sidney,85], [Perl & Gabriel,92]
B Heuristics; no rigorous performance analysis
Storing duplicates once: statistical properties are destroyed
10 / 24
This Work (Revisiting Mehl-Tsak DIS) - Result 2
New dynamic interpolation search data structure
search: O(log log n) expected time
update: O(1) time (position given)
Bounds hold with high probability
Analysis
− always valid irrespectively of the distinctness or not of the elements
− applies to
(n
(log log n)1+ε , nδ)
-smooth input distributions, δ ∈ (0, 1), ε > 0
Robust (no apriori knowledge of the input distribution is required)
11 / 24
This Work (Revisiting Mehl-Tsak DIS) - Other Results
Modifications to our data structure yield
O(1) expected search time w.h.p. for a (largest so far) subclass of smooth
distributions
O(log log n) expected search time w.h.p. for Power-law and Binomialdistributions
B Power-law and Binomial: @ proper choices of f1, f2 to achieve O(log log n)expected search time
12 / 24
Interpolation Search [Mehlhorn & Tsakalidis,85], [Andersson & Mattson,93]
eFirst Interpolation Step
Children of the root node:
Root���������
""""
XXXXXXXXXe1
e2
. . . e
. . . v . . .
. . . eR(n)
Root’s ID array
ID[1] ID[2] ID[j] ID[I(n)]
Root’s REP array
a
a
b
b
pppppppppppp
pppppppppppp
pppppppppppp
ppppppppp
pppppppppppp
pppppppppppp pppppppp
ppppa+ 1 b−aI(n)
rREP[1]
rREP[2]
rREP[3]
r. . .
. . .
. . .REP[t] . . .
. . .
rREP[R(n)]
rREP[t+ l]
a+ 2 b−aI(n) a+ (j − 1) b−a
I(n) a+ j b−aI(n) a+ (f1(n)− 1) b−a
I(n)
µ is (f1(n), f2(n)) = (I(n), n/R(n))-smooth
root has R(n) children; a root-child has R(n/R(n)) children, etc.
ID array: partitions [a, b] into I(n) equal-length parts
REP array: partitions P into R(n) equal-sized subsets, each of size n/R(n)REP[i]↔ Pi = {X ∈ P | X((i−1) n
R(n)) < X ≤ X(i n
R(n))}, i = 1, . . . , R(n)
13 / 24
Interpolation Search [Mehlhorn & Tsakalidis,85], [Andersson & Mattson,93]
eFirst Interpolation Step
Children of the root node:
Root���������
""""
XXXXXXXXXe1
e2
. . . e
. . . v . . .
. . . eR(n)
Root’s ID array
ID[1] ID[2] ID[j] ID[I(n)]
Root’s REP array
a
a
b
b
pppppppppppp
pppppppppppp
pppppppppppp
ppppppppp
pppppppppppp
pppppppppppp pppppppp
ppppa+ 1 b−aI(n)
rREP[1]
rREP[2]
rREP[3]
r. . .
. . .
. . .REP[t] . . .
. . .
rREP[R(n)]
rREP[t+ l]
a+ 2 b−aI(n) a+ (j − 1) b−a
I(n) a+ j b−aI(n) a+ (f1(n)− 1) b−a
I(n)
Query element y V
j =⌊
y−a
b−aI(n)⌋
+ 1 V Ij =[a + (j − 1) b−a
I(n) , a + jb−a
I(n)
]
Search for appropriate REP within Ij
− O(1) expected number of REPs
− distribution of elements between consecutive REPs: µ-random (smooth)
14 / 24
Analysis and the Subtle Case
[a, b] µ-random and smooth, (a′, b′] ⊆ [a, b]
Pr[X = λ | a′ < X ≤ b′] =
Pr[X = λ]
Pr[a′ < X < b′], a
′ < λ < b′ (1)
Consider X1, X2, X3 ∈ [a, b], and their ordering X(1) ≤ X(2) ≤ X(3)
Pr[X = λ | X(1) = a′∩X(3) = b
′] =Pr[X = λ ∩ X(1) = a′ ∩ X(3) = b′]
Pr[X(1) = a′ ∩ X(3) = b′](2)
Is (2) µ-random and smooth ??
15 / 24
Analysis and the Subtle Case
[a, b] µ-random and smooth, (a′, b′] ⊆ [a, b]
Pr[X = λ | a′ < X ≤ b′] =
Pr[X = λ]
Pr[a′ < X < b′], a
′ < λ < b′ (1)
Consider X1, X2, X3 ∈ [a, b], and their ordering X(1) ≤ X(2) ≤ X(3)
Pr[X = λ | X(1) = a′∩X(3) = b
′] =Pr[X = λ ∩ X(1) = a′ ∩ X(3) = b′]
Pr[X(1) = a′ ∩ X(3) = b′](2)
Is (2) µ-random and smooth ??
15 / 24
Analysis and the Subtle Case
[a, b] µ-random and smooth, (a′, b′] ⊆ [a, b]
Pr[X = λ | a′ < X ≤ b′] =
Pr[X = λ]
Pr[a′ < X < b′], a
′ < λ < b′ (1)
Consider X1, X2, X3 ∈ [a, b], and their ordering X(1) ≤ X(2) ≤ X(3)
Pr[X = λ | X(1) = a′∩X(3) = b
′] =Pr[X = λ ∩ X(1) = a′ ∩ X(3) = b′]
Pr[X(1) = a′ ∩ X(3) = b′](2)
Is (2) µ-random and smooth ??
15 / 24
Analysis and the Subtle Case
Distinct keys (Pr[key collision] = 0)
Event E1 = {X = λ ∩ X(1) = a′ ∩ X(3) = b′} occurs if ≥ 1 occurs
{X2 = λ, X3 = a′, X1 = b
′}, {X1 = λ, X2 = a′, X3 = b
′},{X3 = λ, X1 = a
′, X2 = b′}, {X1 = λ, X3 = a
′, X2 = b′},
{X3 = λ, X2 = a′, X1 = b
′}, {X2 = λ, X1 = a′, X3 = b
′}
Event E2 = {X(1) = a′ ∩ X(3) = b′} occurs if ≥ 1 occurs
{X1 = a′, X2 = b
′, a′ < X3 < b
′}, {X2 = a′, X1 = b
′, a′ < X3 < b
′},{X1 = a
′, X3 = b′, a
′ < X2 < b′}, {X3 = a
′, X1 = b′, a
′ < X2 < b′},
{X2 = a′, X3 = b
′, a′ < X1 < b′}, {X3 = a
′, X2 = b′, a
′ < X1 < b′}
(2) = Pr[E1]Pr[E2]
= 6Pr[X=λ] Pr[X=a′] Pr[X=b′]6Pr[X=a′] Pr[X=b′] Pr[a′<X<b′] = Pr[X=λ]
Pr[a′<X<b′] = (1)
16 / 24
Analysis and the Subtle Case
Non-distinct keys (Pr[key collision] 6= 0)
Event E1 = {X = λ ∩ X(1) = a′ ∩ X(3) = b′} occurs if ≥ 1 occurs
{X2 = λ, X3 = a′, X1 = b
′}, {X1 = λ, X2 = a′, X3 = b
′},{X3 = λ, X1 = a
′, X2 = b′}, {X1 = λ, X3 = a
′, X2 = b′},
{X3 = λ, X2 = a′, X1 = b
′}, {X2 = λ, X1 = a′, X3 = b
′}
Event E2 = {X(1) = a′ ∩ X(3) = b′} occurs if ≥ 1 occurs
{X1 = a′, X2 = b
′, a′ < X3 < b
′}, {X2 = a′, X1 = b
′, a′ < X3 < b
′},{X1 = a
′, X3 = b′, a
′ < X2 < b′}, {X3 = a
′, X1 = b′, a
′ < X2 < b′},
{X2 = a′, X3 = b
′, a′ < X1 < b′}, {X3 = a
′, X2 = b′, a
′ < X1 < b′}
{X1,2 = a′, X3 = b
′}, {X1,2 = b′, X3 = a
′}, {X1,3 = a′, X2 = b
′},{X1,3 = b
′, X2 = a′}, {X2,3 = a
′, X1 = b′}, {X2,3 = b
′, X1 = a′}
(2) = Pr[E1]Pr[E2]
= Pr[X=λ]Pr[X=a′]+Pr[X=b′]
2+Pr[a′<X<b′]
6= (1) !!
17 / 24
The New Data Structure
n elements drawn µ-randomly from [a, b]µ is (f1, f2) = (nα, nδ)-smooth, α, δ ∈ (0, 1)
Key idea
Forget about REPs
Partition [a, b] into f1(n) intervals, each of size (b − a)/f1(n)
Recurse on each interval (BIN) until |BIN| = O(poly log n)
18 / 24
The New Data Structure
1st Layer of f1(n) bins.
Each bin receives a ball with probability O(f2(n)n
)= O
(nδ
n
).
BIN(1) BIN(2) BIN(j1) BIN(f1(n))
a b
r r r . . .
n1 balls r r . . .
n2 balls r r r . . .
nj1 balls r r r r . . .
nf1(n) balls r. . .
. . .
. . .
. . .
a+ 1 b−af1(n)
a+ 2 b−af1(n)
a+ (j1 − 1) b−af1(n)
a+ j1b−af1(n)
a+ (f1(n)− 1) b−af1(n)
✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥✥
✦✦✦✦✦✦✦✦✦✦✦✦✦✦✦✦✦
✄✄✄✄✄
PPPPPPPPPPPPPPPP
BIN(j1)’s 2nd Layer of f1(nj1) bins.
Each bin receives a ball with probability O(f2(nj1
)
nj1
)= O
(nδ2
nδ
).
BIN(j1, 1) BIN(j1, 2) BIN(j1, j2) BIN(j1, f1(nj1))
aj1 bj1
r . . .
nj1,1 balls r r r r . . .
nj1,2 balls r r r . . .
nj1,j2 balls r r . . .
nj1,f1(nj1) ballsr
. . .
. . .
. . .aj1 + 1bj1−aj1f1(nj1
) aj1 + 2bj1−aj1f1(nj1
) aj1 + (j2 − 1)bj1−aj1f1(nj1
) aj1 + j2bj1−aj1f1(nj1
)
Leaf BIN: q∗-heap [Willard,93]⇒ O(1) search & update time
19 / 24
The New Data Structure
Lem. 1 The elements of each BIN (subinterval) are µ-randomly distributed
Lem. 2 |BIN| = f2(|parent-BIN|) w.h.p.
⇓− 1st layer: |BIN| = O(nδ)
.
.
.
− k-th layer: |BIN| = O(nδk
)⇓
I Number of layers: O(log log n)
20 / 24
O(1) Search Time
First layer of f1(n) BINS.
Each BIN receives a ball with probability O(f2(n)n
)= O
(lnO(1) n
n
).
BIN(11) BIN(21) BIN(j1) BIN(f1(n)1)
a b
r r r . . .
n11 balls r r r r . . .
n21 balls r r r r . . .
nj1 balls r r r r . . .
nf1(n)1 balls r. . .
. . .
. . .
. . .
a+ 1 b−af1(n)
a+ 2 b−af1(n)
a+ (j1 − 1) b−af1(n)
a+ j1b−af1(n)
a+ (f1(n)− 1) b−af1(n)
µ: (f1(n), f2(n)) =(
n
g, lnO(1)
n
)-smooth, g: constant
|BIN| = O(lnO(1)n) w.h.p.
Implement each bin as q∗-heap V O(1) search time
21 / 24
Power-law Distribution
I1 I21 bnα
I1 is dense V van Emde Boas structure, O(log log |I1|) search time
I2 is sparse V distribution is (f1(n), poly log n)-smooth
Similar idea for Binomial
22 / 24
Revisiting Mehl-Tsak DIS - Conclusions
New dynamic interpolation search data structure
Supports non-distinct (duplicate) elements
Expected search time O(log log n) w.h.p. for smooth and other
distributions (e.g., power law)
O(1) expected search time for a subclass (largest so far) of smooth
distributions
23 / 24
Thanks
Thank you Thanassi for
introducing me (and others) to the intricacies of elementary data structures
your contributions to CS&E and for establishing it within Greece
your huge contribution in elevating the Dept of Computer Engineering &
Informatics in all respects (infrastructure, procedures, etc) within and out of
our University
the artistic touch you gave to CEID
... your support and friendship
Wish you all the best for the future
24 / 24
Thanks
Thank you Thanassi for
introducing me (and others) to the intricacies of elementary data structures
your contributions to CS&E and for establishing it within Greece
your huge contribution in elevating the Dept of Computer Engineering &
Informatics in all respects (infrastructure, procedures, etc) within and out of
our University
the artistic touch you gave to CEID
... your support and friendship
Wish you all the best for the future
24 / 24
Thanks
Thank you Thanassi for
introducing me (and others) to the intricacies of elementary data structures
your contributions to CS&E and for establishing it within Greece
your huge contribution in elevating the Dept of Computer Engineering &
Informatics in all respects (infrastructure, procedures, etc) within and out of
our University
the artistic touch you gave to CEID
... your support and friendship
Wish you all the best for the future
24 / 24
Thanks
Thank you Thanassi for
introducing me (and others) to the intricacies of elementary data structures
your contributions to CS&E and for establishing it within Greece
your huge contribution in elevating the Dept of Computer Engineering &
Informatics in all respects (infrastructure, procedures, etc) within and out of
our University
the artistic touch you gave to CEID
... your support and friendship
Wish you all the best for the future
24 / 24
Thanks
Thank you Thanassi for
introducing me (and others) to the intricacies of elementary data structures
your contributions to CS&E and for establishing it within Greece
your huge contribution in elevating the Dept of Computer Engineering &
Informatics in all respects (infrastructure, procedures, etc) within and out of
our University
the artistic touch you gave to CEID
... your support and friendship
Wish you all the best for the future
24 / 24
Thanks
Thank you Thanassi for
introducing me (and others) to the intricacies of elementary data structures
your contributions to CS&E and for establishing it within Greece
your huge contribution in elevating the Dept of Computer Engineering &
Informatics in all respects (infrastructure, procedures, etc) within and out of
our University
the artistic touch you gave to CEID
... your support and friendship
Wish you all the best for the future
24 / 24
Thanks
Thank you Thanassi for
introducing me (and others) to the intricacies of elementary data structures
your contributions to CS&E and for establishing it within Greece
your huge contribution in elevating the Dept of Computer Engineering &
Informatics in all respects (infrastructure, procedures, etc) within and out of
our University
the artistic touch you gave to CEID
... your support and friendship
Wish you all the best for the future
24 / 24
Thanks
Thank you Thanassi for
introducing me (and others) to the intricacies of elementary data structures
your contributions to CS&E and for establishing it within Greece
your huge contribution in elevating the Dept of Computer Engineering &
Informatics in all respects (infrastructure, procedures, etc) within and out of
our University
the artistic touch you gave to CEID
... your support and friendship
Wish you all the best for the future
24 / 24
Thanks
Thank you Thanassi for
introducing me (and others) to the intricacies of elementary data structures
your contributions to CS&E and for establishing it within Greece
your huge contribution in elevating the Dept of Computer Engineering &
Informatics in all respects (infrastructure, procedures, etc) within and out of
our University
the artistic touch you gave to CEID
... your support and friendship
Wish you all the best for the future
24 / 24
Top Related