
ECE750-TXB Lecture 8: Treaps, Tries, and Hash Tables

Todd L. Veldhuizen

Electrical & Computer Engineering, University of Waterloo, Canada

February 1, 2007


Review: Treaps

- Recall that a binary search tree has keys drawn from a totally ordered structure 〈K, ≤〉.

- An inorder traversal of the tree recovers the keys in ascending order.

        d
      /   \
     b     h
    / \   / \
   a   c f   i


- Recall that a heap has priorities drawn from a totally ordered structure 〈P, ≤〉.

- The priority of a parent is ≥ that of its children (for a max heap).

- The largest priority is at the root.

        23
      /    \
    11      14
   /  \    /  \
  7    1  6    13


- In a treap, nodes contain a pair (k, p), where k ∈ K is a key and p ∈ P is a priority.

- A treap is a mixture of a binary search tree and a heap:
  - a binary search tree with respect to keys;
  - a heap with respect to priorities.

            (d,23)
           /      \
      (b,11)      (h,14)
      /    \      /    \
  (a,7)  (c,1) (f,6)  (i,13)


Review: Unique Representation

- If the keys and priorities are unique, then treaps have the unique representation property: given a set of (k, p) pairs, there is only one way to build the tree.

- For the heap property to be satisfied, there is only one (k, p) pair that can be the root: the one with the highest priority.

- The left subtree of the root will contain all keys < k, and the right subtree of the root will contain all keys > k.

- Of the keys < k, the one with the highest priority must occupy the left child of the root. This then splits constructing the left subtree into two subproblems.

- And so on, recursively.


- Example: to build a treap from {(i,13), (c,1), (d,23), (b,11), (h,14), (a,7), (f,6)}, the unique choice of root is (d,23):

                      (d,23)
                     /      \
  {(c,1), (b,11), (a,7)}  {(i,13), (h,14), (f,6)}

- To build the left subtree, pick out the highest-priority element: (b,11). And so forth.

              (d,23)
             /      \
        (b,11)      {(i,13), (h,14), (f,6)}
        /    \
    (a,7)  (c,1)
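
The recursive argument above can be sketched directly in code. This is a minimal illustration rather than an efficient construction (it takes quadratic time in the worst case); the names `Node` and `build` are chosen here for the example and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: str
    prio: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(pairs):
    """Build the unique treap for a collection of (key, priority) pairs."""
    if not pairs:
        return None
    # Heap property: the pair with the highest priority must be the root.
    root_key, root_prio = max(pairs, key=lambda kp: kp[1])
    # BST property: smaller keys go left, larger keys go right.
    left = [kp for kp in pairs if kp[0] < root_key]
    right = [kp for kp in pairs if kp[0] > root_key]
    return Node(root_key, root_prio, build(left), build(right))


pairs = [("i", 13), ("c", 1), ("d", 23), ("b", 11), ("h", 14), ("a", 7), ("f", 6)]
treap = build(pairs)
```

Because the result depends only on the set of pairs, building from the same pairs in any other order gives a structurally equal tree.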


- Data structures with the unique representation property can be checked for equality in O(1) time by using caching (also known as memoization):
  - Implement the data structure in a purely functional style (a node's fields are never altered after construction; any change requires creating a new node).
  - Maintain a map from (key, priority, lchild, rchild) tuples to already constructed nodes.
  - Before constructing a node, check the cache to see if it already exists; if so, return the pointer to that node. Otherwise, construct the node and add it to the cache.

- If two treaps contain the same (k, p) pairs, their root pointers will be equal, which can be checked in O(1) time.

- Checking and maintaining the cache requires additional time overhead.
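
A small sketch of the caching scheme, assuming a single global dictionary as the cache and using Python object identity in place of pointers. The names `make_node`, `build`, and `_cache` are hypothetical and only serve the illustration.

```python
_cache = {}


class Node:
    __slots__ = ("key", "prio", "left", "right")

    def __init__(self, key, prio, left, right):
        self.key, self.prio, self.left, self.right = key, prio, left, right


def make_node(key, prio, left=None, right=None):
    """Return the unique node for this (key, priority, lchild, rchild) tuple."""
    # Children are themselves unique, so their identities stand in for them.
    sig = (key, prio, id(left), id(right))
    node = _cache.get(sig)
    if node is None:
        node = Node(key, prio, left, right)
        _cache[sig] = node  # the cache keeps the node alive, so ids stay valid
    return node


def build(pairs):
    """Build a treap from (key, priority) pairs, always going through the cache."""
    if not pairs:
        return None
    k, p = max(pairs, key=lambda kp: kp[1])
    return make_node(
        k, p,
        build([kp for kp in pairs if kp[0] < k]),
        build([kp for kp in pairs if kp[0] > k]),
    )


t1 = build([("d", 23), ("b", 11), ("h", 14)])
t2 = build([("h", 14), ("d", 23), ("b", 11)])
same = t1 is t2  # a single pointer comparison
```

The equality test itself is constant time; the cost has moved into the dictionary lookups performed at construction.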


Review: Balance of treaps

- Treaps are balanced (in expectation) if the priorities are chosen randomly.

- Recall that building a binary search tree with a random insertion order results in a tree of expected height c log n, with c ≈ 4.311 (for the natural logarithm).

- A treap with random priorities assigned to keys has exactly the same structure as a binary search tree created by inserting keys in descending order of priority.
  - Descending order of priority is a random order;
  - therefore treaps have expected height c log n with c ≈ 4.311.


Insertion into treaps

- Insertion for treaps is much simpler than that for red-black trees.

  1. Insert the (k, p) pair as for a binary search tree, by key alone: the new node will be placed somewhere at the bottom of the tree.

  2. Perform rotations along the path from the new leaf to the root to restore the heap invariant:
     - If there is a node x whose right child has a higher priority, rotate left at x.
     - If there is a node x whose left child has a higher priority, rotate right at x.


- Example: the treap below has just had (e,19) inserted as a new leaf. Rotations have not yet been performed.

            (d,23)
           /      \
      (b,11)      (h,14)
      /    \      /    \
  (a,7)  (c,1) (f,6)  (i,13)
               /
           (e,19)

- f has a left child with greater priority: rotate right at f.


- After rotating right at f:

            (d,23)
           /      \
      (b,11)      (h,14)
      /    \      /    \
  (a,7)  (c,1) (e,19) (i,13)
                  \
                 (f,6)

- h has a left child with greater priority: rotate right at h.


- After rotating right at h:

            (d,23)
           /      \
      (b,11)      (e,19)
      /    \           \
  (a,7)  (c,1)        (h,14)
                      /    \
                  (f,6)  (i,13)

- The heap invariant is satisfied: all done.
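
The insertion procedure and the worked example can be reproduced with a short in-place sketch. The recursive formulation (rotating as the recursion unwinds) is one common way to walk back up the path; the names `insert`, `rotate_left`, `rotate_right`, and `shape` are illustrative assumptions, not the lecture's code.

```python
class Node:
    def __init__(self, key, prio, left=None, right=None):
        self.key, self.prio, self.left, self.right = key, prio, left, right


def rotate_right(x):
    """Lift x's left child above x; returns the new subtree root."""
    y = x.left
    x.left, y.right = y.right, x
    return y


def rotate_left(x):
    """Lift x's right child above x; returns the new subtree root."""
    y = x.right
    x.right, y.left = y.left, x
    return y


def insert(node, key, prio):
    """Insert by key as in a BST, then rotate on the way back up."""
    if node is None:
        return Node(key, prio)
    if key < node.key:
        node.left = insert(node.left, key, prio)
        if node.left.prio > node.prio:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key, prio)
        if node.right.prio > node.prio:
            node = rotate_left(node)
    return node


def shape(node):
    """Nested-tuple view (key, left, right), handy for inspection."""
    return None if node is None else (node.key, shape(node.left), shape(node.right))


root = None
for k, p in [("i", 13), ("c", 1), ("d", 23), ("b", 11), ("h", 14), ("a", 7), ("f", 6)]:
    root = insert(root, k, p)
root = insert(root, "e", 19)
```

After the last call, `shape(root)` has e as the right child of d, with h below e and f, i below h, matching the final diagram.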


- Treaps are easily made persistent (retaining previous versions) by implementing them in a purely functional style. Insertion requires duplicating at most a sequence of nodes from the root to a leaf: an O(log n) space overhead. The remaining parts of the tree are shared.

- E.g., the previous insert done in a purely functional style:

  [Figure: Version 1 and Version 2 each have their own root (d,23). The Version 2 root has right subtree (e,19) → (h,14) → {(f,6), (i,13)}. The left subtree (b,11) with children (a,7) and (c,1) is shared by both versions.]
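
A purely functional variant of insertion can be sketched with immutable nodes: every node on the insertion path is rebuilt, and everything else is reused. This is an illustrative sketch assuming `typing.NamedTuple` nodes; the function name `insert` is a choice made for the example.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: str
    prio: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, prio):
    """Return the root of a new version; the old version is left untouched."""
    if node is None:
        return Node(key, prio)
    if key < node.key:
        child = insert(node.left, key, prio)
        if child.prio > node.prio:  # rotate right, building new nodes
            return child._replace(right=node._replace(left=child.right))
        return node._replace(left=child)
    if key > node.key:
        child = insert(node.right, key, prio)
        if child.prio > node.prio:  # rotate left, building new nodes
            return child._replace(left=node._replace(right=child.left))
        return node._replace(right=child)
    return node  # key already present: nothing to do


v1 = None
for k, p in [("d", 23), ("b", 11), ("h", 14), ("a", 7), ("c", 1), ("f", 6), ("i", 13)]:
    v1 = insert(v1, k, p)
v2 = insert(v1, "e", 19)
```

Here `v1` still describes the treap before the insertion, while `v2.left` is the very same object as `v1.left`, so the (b,11) subtree is stored once.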


Strings

- A string is a sequence of characters drawn from some alphabet Σ. We will often use Σ = {0, 1}: binary strings.

- We write Σ∗ to mean all finite strings¹ composed of characters from Σ. (∗ is the Kleene closure.)

- Σ∗ contains the empty string ε.

- If w, v ∈ Σ∗ are strings, we write w · v or just wv to mean the concatenation of w and v.

- Example: given w = 010 and v = 11, w · v = 01011.

〈Σ∗, ·, ε〉 is an example of a monoid: a set (Σ∗) together with an associative binary operator (·) and an identity element (ε). For any strings u, v, w ∈ Σ∗,

    u · (v · w) = (u · v) · w
    v · ε = ε · v = v

¹ Infinite strings are also very useful: if we write a real number x ∈ [0, 1] as a binary number, e.g. 0.101100101000···, this is a representation of x by an infinite string from Σ^ω.


Tries

- Recall that we may label the left and right links of a binary tree with 0 (for left) and 1 (for right):

        ○
     0 / \ 1
      x   ○
       0 / \ 1
        y   z

- To describe a path in the tree, one can list the sequence of left/right branches to take from the root. E.g., 10 gives y, 11 gives z.

- The set of all paths from the root to leaves is P◦ = {0, 10, 11}.

- The set of all paths from the root to leaves or internal nodes is P• = {ε, 0, 1, 10, 11}, where ε is the empty string indicating the path starting and ending at the root.


- The set P◦ is prefix-free: no string is an initial segment of any other string. Otherwise, there would be a path to a leaf passing through another leaf!

- The set P• is prefix-closed: if wv ∈ P•, then w ∈ P• also; i.e., P• contains all prefixes of all strings in P•.²

² We can define • as an operator by A• ≡ {w : wv ∈ A for some v ∈ Σ∗}. • is a closure operator. A useful fact: every closure operator has as its range a complete lattice, where meet and join are given by (X ⊓ Y)• = X• ∩ Y• and (X ⊔ Y)• = (X• ∪ Y•)•. Applying this fact to the representation of binary trees by strings, • induces a lattice of binary trees.
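
Both properties are easy to check mechanically. The sketch below represents strings over {0, 1} as Python `str` values; the function names `prefix_closure` and `is_prefix_free` are made up for this illustration.

```python
def prefix_closure(strings):
    """A•: every prefix (including the empty string) of every string in A."""
    return {s[:i] for s in strings for i in range(len(s) + 1)}


def is_prefix_free(strings):
    """True if no string is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in strings for b in strings)


leaves = {"0", "10", "11"}          # the set of leaf paths from the example tree
closed = prefix_closure(leaves)     # all paths, leaves and internal nodes alike
```

Applying `prefix_closure` a second time changes nothing, which is the idempotence one expects of a closure operator.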


- Given a binary tree, we can produce a set of strings P• or P◦ that describes all paths (resp. all paths to leaves).

- The converse is also true: given a set P• or P◦, we can reproduce the tree.³

- Example: the set {100, 11, 001, 01} is prefix-free, and the corresponding tree can be built by simply adding the paths one by one to an initially empty tree:

                 ○
            0 /     \ 1
             ○       ○
          0 / \ 1 0 / \ 1
           ○   ○   ○   ○
          1 \     / 0
             ○   ○

³ Formally, there is a bijection (a 1-1 correspondence) between binary trees and prefix-closed (resp. prefix-free) sets.
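
Adding paths one by one to an initially empty tree is a few lines of code if each node is a dictionary from branch label to child. This representation, and the names `build_trie` and `leaf_paths`, are assumptions made for the sketch.

```python
def build_trie(paths):
    """Nested dicts: each node maps a branch label ('0' or '1') to a child."""
    root = {}
    for path in paths:
        node = root
        for bit in path:
            node = node.setdefault(bit, {})  # create the child if it is missing
    return root


def leaf_paths(node, prefix=""):
    """Recover the set of root-to-leaf paths from the tree."""
    if not node:  # no children: this node is a leaf
        return {prefix}
    return set().union(
        *(leaf_paths(child, prefix + bit) for bit, child in node.items())
    )


trie = build_trie({"100", "11", "001", "01"})
```

Reading the leaf paths back out of `trie` returns the original set, which is the round trip the bijection promises.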


- A tree constructed in this way, by interpreting a set of strings as paths of the tree, is called a trie. (The term comes from reTRIEval; pronounced either "tree" or "try" depending on taste. Tries were invented by de la Briandais, and independently by Fredkin [5].)

- The most common use of a trie is to implement a Dictionary〈K, V〉, i.e., maintaining a map f : K ⇀ V by associating each k ∈ K with a path through the trie to a node where f(k) is stored.⁴

- Tries find applications in bioinformatics, coding and compression, sorting, SAT solving, routing, natural language processing, very large databases (VLDBs), data mining, etc.

- Binary Decision Diagrams (BDDs) are essentially tries with caching and sharing of subtrees.

- Recent survey by Flajolet [4].

⁴ The notation K ⇀ V indicates a partial function from K to V: a function that might not be defined for some keys.


Trie example: word list

- Example: build a trie to store English words: trie, larch, saxophone, tried, saxifrage, squeak, try, squeaky, squeakily, squeakier.

- Common implementation variants of a trie:
  - Associate internal nodes with entries also, if one occurs there. (Can use 1 bit on internal nodes to indicate whether a key terminates there.)
  - When a node has only one descendant, end the trie there, rather than including a possibly long chain of nodes with single children.

- Use the trie to store keys only; implicitly the values we are storing are V = {0, 1}. The function the trie represents is a map χ : K → {0, 1}, where χ is the characteristic function of the set: χ(k) = 1 if and only if k is in the set.

- Use the alphabet {a, b, ..., z}.

- Instead of having a 26-way branch in each node, put a little BST at each node with up to 26 elements in it (a "ternary search trie" [1]).
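
As a rough sketch of the first variant, the code below stores the word list as a set, with a flag on each node marking whether a key terminates there. It uses a dictionary per node instead of a 26-way array or a per-node BST, and it does not collapse single-child chains; `TrieNode`, `insert`, and `contains` are hypothetical names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_key = False   # the "1 bit": does a key terminate at this node?


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_key = True


def contains(root, word):
    """The characteristic function of the stored set."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_key


root = TrieNode()
for w in ["trie", "larch", "saxophone", "tried", "saxifrage", "squeak",
          "try", "squeaky", "squeakily", "squeakier"]:
    insert(root, w)
```

The flag is what distinguishes "trie" (a stored key that is also a prefix of "tried") from "tri" (a prefix that is not a key).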


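[Figure: trie for the word list. Edges are labelled with letters; a word is shown where a key terminates.]

  l: larch
  s
    a → x
      i: saxifrage
      o: saxophone
    q → u → e → a → k: squeak
      y: squeaky
      i
        e: squeakier
        l: squeakily
  t → r
    i → e: trie
      d: tried
    y: try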


Trie example: coding

- Suppose we want to transmit (or compress) data.

- At the receiving (or decoding) end, we will have a long string of bits to decode.

- A simple but effective strategy is to build a codebook that maps binary codewords to plaintext. The incoming transmission is then just a sequence of codewords that we will replace, one by one, with their corresponding plaintext.⁵

- A code that can be described by a trie, with outputs only at the leaves, is an example of a uniquely decodable code: there is only one way an encoded message can be decoded. Specifically, such codes are called prefix codes or instantaneous codes.

⁵ This strategy is asymptotically optimal (achieves a bitrate ≤ H + ε for any ε > 0) for stationary ergodic random processes, with an appropriate choice of codebook.


- Example: to encode English, we might assign codewords to sequences of three letters, giving the most frequent sequences shorter codes:

  Three-letter combination   Codeword
  the                        000
  and                        001
  for                        010
  are                        011
  but                        100
  not                        1010
  you                        1011
  all                        1100
  ...                        ...
  etc                        11101101
  ...                        ...
  qxw                        1111011001101001

- These codewords are chosen to be a prefix-free set.


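- For decoding messages we build a trie:

  [Figure: binary trie with edges labelled 0 and 1 and codebook entries at the leaves. Path 000 leads to "the", 001 to "and", 010 to "for", 011 to "are", 100 to "but", 1010 to "not", 1011 to "you", 1100 to "all"; deeper subtrees are not shown.]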


Trie example: decoding

- Incoming message: 100101001010111100

- To decode: start at the root of the trie and follow the path given by the bits. When a leaf is reached, output the word there, and return to the root.

    100 → but,  1010 → not,  010 → for,  1011 → you,  1100 → all

- This requires substantially fewer bits than transmitting as ASCII text (24 bits per 3-letter sequence).

- A good code assigns short codewords to frequently occurring strings; if a string occurs with probability p_i, one wants the codeword to have length about −log₂ p_i.

- Later in the course we shall see how such codes can be constructed optimally using a greedy algorithm.
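
The decoding loop can be sketched as follows, using only the eight codebook entries listed explicitly in the table (the elided entries are not filled in, so bit strings outside this partial codebook raise a `KeyError`). The dictionary-based trie and the names `CODEBOOK`, `build_decoding_trie`, and `decode` are assumptions for the example.

```python
CODEBOOK = {"000": "the", "001": "and", "010": "for", "011": "are",
            "100": "but", "1010": "not", "1011": "you", "1100": "all"}


def build_decoding_trie(codebook):
    """Internal nodes are dicts keyed by '0'/'1'; leaves are plaintext strings."""
    root = {}
    for codeword, text in codebook.items():
        node = root
        for bit in codeword[:-1]:
            node = node.setdefault(bit, {})
        node[codeword[-1]] = text
    return root


def decode(bits, root):
    out, node = [], root
    for bit in bits:
        node = node[bit]              # follow the branch for this bit
        if isinstance(node, str):     # reached a leaf: emit and restart
            out.append(node)
            node = root
    return out


words = decode("100101001010111100", build_decoding_trie(CODEBOOK))
```

The 18-bit message decodes to five three-letter words, which would have taken 120 bits as ASCII.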


Tries: Kraft’s inequality

- Kraft's inequality is a simple constraint on the lengths of codewords in a prefix code (equivalently, leaf depths in a binary tree).

Theorem (Kraft)

Let $(d_1, d_2, \ldots)$ be a sequence of code lengths of a code. There is a prefix code with code lengths $d_1, d_2, \ldots$ (equivalently, a binary tree with leaves at depths $d_1, d_2, \ldots$) if and only if

$$\sum_{i=1}^{n} 2^{-d_i} \le 1 \tag{1}$$


- Positive example: the codeword lengths 3, 3, 2, 2 satisfy Kraft's inequality: 1/8 + 1/8 + 1/4 + 1/4 = 3/4. Possible trie realization:

              ○
          0 /   \ 1
           ○     ○
        0 / \ 1 / 0
         ○   ○ ○
      0 / \ 1
       ○   ○

- Negative example: the codeword lengths 3, 3, 3, 2, 2, 2 violate Kraft's inequality: the sum is 9/8.

- Kraft's inequality becomes an equality for trees in which every internal node has two children.
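
Both examples can be checked with exact rational arithmetic. The sketch also builds a prefix code for lengths that pass the test, using the standard "canonical code" construction (assign codewords in order of increasing length from a running counter); that construction is a technique swapped in here for illustration and is not taken from the lecture. The names `kraft_sum` and `prefix_code` are hypothetical.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality, computed exactly."""
    return sum(Fraction(1, 2 ** d) for d in lengths)


def prefix_code(lengths):
    """Return codewords with the given lengths, or None if the inequality fails."""
    if kraft_sum(lengths) > 1:
        return None
    codes, value, prev = [], 0, 0
    for d in sorted(lengths):
        value <<= d - prev                        # extend the counter to d bits
        codes.append(format(value, "b").zfill(d))
        value += 1                                # next unused codeword
        prev = d
    return codes


ok = prefix_code([3, 3, 2, 2])
bad = prefix_code([3, 3, 3, 2, 2, 2])
```

The codewords come out sorted by length, so the realization differs from the tree drawn above but has the same leaf depths.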


Two ways to prove Kraft's inequality:

- Put each node of a binary tree in correspondence with a subinterval of [0, 1] on the real line: the root is [0, 1], and its children get [0, 1/2] and [1/2, 1]. Each node at depth d receives an interval of length $2^{-d}$ and splits it in half for its children. The union of the intervals at the leaves is ⊆ [0, 1], and the intervals at the leaves are pairwise disjoint (apart from shared endpoints), so the sum of their lengths is ≤ 1.

- Kraft's inequality can also be proved with a simple induction argument. The list of valid codeword length sequences can be generated from the initial sequence 〈1, 1〉 (codewords {0, 1}) by the rewrite rules k → k + 1, k + 1 (expand a node into two children) and k → k + 1 (expand a node to have a single child). Base case: with 〈1, 1〉, obviously $2^{-1} + 2^{-1} = 1$. Induction step: if the sum is ≤ 1, consider expanding a single element of the sequence. We have either the rewrite k → k + 1, k + 1, with $2^{-k} \ge 2^{-(k+1)} + 2^{-(k+1)}$; or the rewrite k → k + 1, with $2^{-k} \ge 2^{-(k+1)}$. So rewrites never increase the "weight" contributed by a node.


It is occasionally useful to have an infinite set of codewords handy, in case we do not know in advance how many different objects we might need to code. For an infinite set of codewords (or infinite binary tree), Kraft's inequality implies

$$d_k \ge c + \log^{+} k + \log \log^{+} \log^{*} k \quad \text{infinitely often} \tag{2}$$

where

$$\log^{+} x \equiv \log x + \log\log x + \log\log\log x + \cdots$$

with the sum taken only over the positive terms, and $\log^{*} x$ is the "iterated logarithm":

$$\log^{*} x = \begin{cases} 0 & \text{if } x \le 1 \\ 1 + \log^{*}(\log x) & \text{otherwise} \end{cases}$$
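
The two auxiliary functions are simple to compute. The sketch below assumes base-2 logarithms (the slides do not state the base) and uses the hypothetical names `log_star` and `log_plus`.

```python
import math


def log_star(x):
    """Iterated logarithm: how many times log must be applied to reach <= 1."""
    return 0 if x <= 1 else 1 + log_star(math.log2(x))


def log_plus(x):
    """log x + log log x + ..., keeping only the positive terms (x > 0)."""
    total = 0.0
    x = math.log2(x)
    while x > 0:
        total += x
        x = math.log2(x)
    return total
```

For example, starting from 16 the repeated logarithms are 4, 2, 1, so `log_star(16)` counts three steps and `log_plus(16)` adds 4 + 2 + 1.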


See e.g. [2, 9].

Where does this bound come from? A necessary condition for

$$\sum_{k=0}^{\infty} 2^{-d_k} \le 1$$

to hold is that the series $\sum_{k=0}^{\infty} 2^{-d_k}$ converges. For example, if $d_k = \log k$, then $2^{-d_k} = \frac{1}{k}$, the Harmonic series. The Harmonic series diverges, so Kraft's inequality can't hold.

We can parlay this into an inequality by remembering the "comparison test" for convergence of series: if $a_k$, $b_k$ are two positive series, and $a_k \le b_k$ for all k, then $\sum a_k \le \sum b_k$. If we stick the Harmonic series in for $a_k$ and $2^{-d_k}$ for $b_k$, we get:

    If $\frac{1}{k} \le 2^{-d_k}$ for all k, then $\infty \le \sum 2^{-d_k}$.

The premise of this test must be false if $\sum 2^{-d_k}$ does not diverge to infinity. Therefore $2^{-d_k}$ must be $< \frac{1}{k}$ for at least some k. If $2^{-d_k} < \frac{1}{k}$ for only some finite number of choices of k, the series would still diverge. So, a necessary condition for $\sum 2^{-d_k}$ to converge is that $2^{-d_k} < \frac{1}{k}$ for infinitely many terms. Taking logarithms and multiplying through by −1, we get $d_k > \log k$ for infinitely many k.

We can generalize this by saying that if g ∈ ω(1) is any diverging function, then $d_k > -\log g'(k)$ for infinitely many k. (The Harmonic series bound follows from choosing g(x) = log x.) Unfortunately there is no "slowest growing function" g(x) from which we could obtain a tightest possible bound.

Eqn. (2) is from [2]; Bentley credits the result to Ronald Graham and Fan Chung, apparently unpublished.


Tries: Variations on a theme I

There are many useful variants of tries [4]:

- Multiway branching: instead of choosing Σ = {0, 1}, one can choose any finite alphabet, and allow each node to have |Σ| children.

- Paged trie: each node is required to have a minimal number of leaves descended from it; when this threshold is not met, the subtree is converted into a compact form (e.g., an array of keys and values) suitable for secondary storage. This technique can also be used to increase performance in main memory [6].

- Patricia tries [7] ("Practical Algorithm To Retrieve Information Coded in Alphanumeric"⁶): introduce skip pointers to avoid long sequences of single-branch nodes like

    ○ -0-> ○ -1-> ○ -1-> ○ -0-> ○


- LC-trie: the first few levels of a big trie tend to be almost a complete binary tree of some depth, which can be collapsed into an array of pointers to tries [8].

- Ternary search tries (TSTs): a blend of a trie and a BST; can require substantially less space than a trie. For a large |Σ|, replace a |Σ|-way branch at each internal node with a BST of depth ≤ log |Σ|.

⁶ Almost better than my all-time favourite strained CS acronym, PERIDOT: "Programming by Example for Real-time Interface Design Obviating Typing." Great project, despite the acronym.


Hash Tables

- Suppose we wanted to represent the following set:

    M = {35, 139, 395, 1691, 1760, 1795, 3632, 3789, 4657}

  Given some x ∈ ℕ, we want to quickly test whether x ∈ M.

- Binary search trees: require following a path through a tree, perhaps not fast enough for our problem.

- Super fast way: allocate an array of 4657 bytes. Set

  $$A[i] = \begin{cases} 0 & \text{if } i \notin M \\ 1 & \text{if } i \in M \end{cases}$$

  Then, on a RAM, we can test whether x ∈ M with a single memory access to A[x] (a constant amount of time). However, the space required by this strategy is O(sup M).


- Obviously the array A would contain mostly empty space. Can we somehow "compress" the array but still support fast access?

- Yes: allocate a much smaller table B of length k. Define a function h : [1, 4657] → [1, k] that maps indices of A to indices of B, can be computed quickly, and ensures that if x, y ∈ M and x ≠ y, then h(x) ≠ h(y); i.e., no two elements of M have the same index in B.

- Then, x ∈ M if and only if B[h(x)] = x.


- For our example, h(x) = x mod 17 does the trick. Here is the array B:

     j   B[j]
     0   0
     1   35
     2   0
     3   139
     4   395
     5   0
     6   0
     7   0
     8   1691
     9   1760
    10   1795
    11   3632
    12   0
    13   0
    14   0
    15   3789
    16   4657

- E.g., x = 1691: h(x) = 8, and B[8] = 1691, so x ∈ M.

- E.g., x = 1692: h(x) = 9, and B[9] = 1760 ≠ 1692, so x ∉ M.

- This is a hash table. h(x) = x mod 17 is called a hash function.
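
The table and the two membership tests can be reproduced in a few lines. This sketch assumes 0 marks an empty slot, which works here only because 0 is not a valid key; the names `h`, `B`, and `member` follow the notation of the slides, with `member` added for the example.

```python
M = [35, 139, 395, 1691, 1760, 1795, 3632, 3789, 4657]


def h(x):
    return x % 17


B = [0] * 17                 # 0 marks an empty slot (keys are assumed >= 1)
for x in M:
    assert B[h(x)] == 0      # h is collision-free on M
    B[h(x)] = x


def member(x):
    """Test x in M with one hash computation and one table access."""
    return B[h(x)] == x
```

The table has 17 entries instead of 4657, at the cost of computing `x % 17` before each access.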


Hash Functions

- A hash function is a map h : K → H from some (usually large) key space K to some (usually small) set of hash values H. In our example, we were mapping from K = [1, 4657] to H = [0, 16], the possible values of x mod 17.

- If the set M ⊆ K is chosen uniformly at random, keys are uniformly distributed (i.e., each k ∈ K has the same probability of appearing in a set to represent). In this case the hash function should distribute the keys evenly amongst elements of H, i.e., we want |h⁻¹(y)| ≈ |h⁻¹(z)| for y, z ∈ H.⁷

- For a nonuniform distribution on keys, one just wants to choose h so that the distribution induced on H is close to uniform.

⁷ Recall that for a function f : R → S, f⁻¹(s) ≡ {r : f(r) = s}.


- We will describe some hash functions where K = ℕ (keys are nonnegative integers). These are easily adapted to other kinds of keys (e.g., strings) by interpreting the binary representation of the key as an integer.

Some commonly used hash functions are the following:

1. Division: use h(k) = k mod m, where m = |H| is usually chosen to be a prime number far away from any power of 2.⁸
   - For long bit strings, use Horner's rule for evaluating polynomials in ℤ/mℤ (will explain).

2. Multiplication: use h(k) = ⌊m{kφ}⌋, where 0 < φ < 1 is an irrational number and {x} ≡ x − ⌊x⌋. A popular choice of φ is φ = (√5 − 1)/2.

⁸ A particularly terrible choice would be m = 256, which would hash objects based only on their lowest 8 bits; e.g., the hash of a string would depend only on its last character.
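
One plausible reading of the Horner's-rule remark is sketched below: treat a byte string as the digits of a base-256 number and reduce modulo m after every step, so intermediate values stay small. The function name `horner_hash` and the modulus 701 are choices made for this illustration.

```python
def horner_hash(data: bytes, m: int) -> int:
    """Hash a byte string as a base-256 integer modulo m, via Horner's rule."""
    h = 0
    for byte in data:
        # ((...(b0*256 + b1)*256 + b2)...) mod m, reduced at each step
        h = (h * 256 + byte) % m
    return h


good = horner_hash(b"treap", 701)   # 701: a prime not close to a power of 2
bad1 = horner_hash(b"abc", 256)     # m = 256 keeps only the last byte
bad2 = horner_hash(b"xyc", 256)
```

With m = 256 the two different strings collide because they end in the same character, which is the failure described in the footnote.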


Multiplication hash functions: Example

Example of a multiplication hash function using φ = (√5 − 1)/2 and a hash table with m = 100 slots:

    key   {kφ}       ⌊m{kφ}⌋
     1    0.618034   61
     2    0.236068   23
     3    0.854102   85
     4    0.472136   47
     5    0.090170    9
     6    0.708204   70
     7    0.326238   32
     8    0.944272   94
     9    0.562306   56
    10    0.180340   18
    11    0.798374   79
    12    0.416408   41
    13    0.034442    3
    14    0.652476   65
    15    0.270510   27
    16    0.888544   88
    17    0.506578   50

The idea is that the third column (the hash slots) 'looks like' a random sequence.
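
The table can be regenerated with floating-point arithmetic, which is accurate enough for small keys; for very large keys the fractional part loses precision, and a production implementation would typically use fixed-point integer arithmetic instead. The names `PHI` and `mult_hash` are assumptions of this sketch.

```python
import math

PHI = (math.sqrt(5) - 1) / 2


def mult_hash(k: int, m: int) -> int:
    frac = (k * PHI) % 1.0        # {k*phi}: the fractional part
    return math.floor(m * frac)


slots = [mult_hash(k, 100) for k in range(1, 18)]
```

The list `slots` reproduces the third column of the table for keys 1 through 17.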


Multiplication hash functions

- The reason why h(k) = ⌊m{kφ}⌋ is a reasonable hash function is interesting.

- The short answer is that the sequence {kφ} for k = 1, 2, 3, ... 'kind of behaves like' a random real drawn from (0, 1). So, h(k) = ⌊m{kφ}⌋ 'looks like' a randomly chosen hash function.

A less sketchy explanation:

1. {kφ} is uniformly distributed on (0, 1): asymptotically, the proportion of {kφ} falling in an interval (α, β), where (α, β) ⊆ (0, 1), is (β − α). Just like a uniform distribution on (0, 1)!

2. {kφ} satisfies an ergodic theorem: if we sample a suitably well-behaved⁹ function f at points {kφ} and average, this converges to the integral:

   $$\frac{1}{m} \sum_{k=1}^{m} f(\{k\phi\}) \to \int_0^1 f(x)\,dx$$

   Just like a uniform distribution on (0, 1)!

See [3]. Variously called Weyl's ergodic principle or Weyl's equidistribution theorem.

However, {kφ} is emphatically not a random sequence.

⁹ Continuously differentiable and periodic with period 1.


Hash Functions

- To evaluate whether a hash function is a good choice for a set of data S ⊆ K, one can see how the observed distribution of keys into hash table slots compares to a uniform distribution.

- Suppose there are n keys and m hash slots. Compute the observed distribution of the keys:

  $$\hat{p}_i = \frac{|\{k \in S : h(k) = i\}|}{n}$$

- To measure how far from uniform, compute

  $$D(\hat{P} \,\|\, U) = \log_2 m + \sum_{i=1}^{m} \hat{p}_i \log_2 \hat{p}_i$$

  Convention: $0 \log_2 0 = 0$.

- This is the Kullback-Leibler divergence of the observed distribution P̂ from the uniform distribution U. It may be thought of as the "distance" from P̂ to U.

- The smaller D(P̂‖U), the better the hash function.
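
The measure is straightforward to compute from slot counts. In the sketch below, `divergence_from_uniform` is a hypothetical helper; it takes the keys, a hash function `h`, and the number of slots `m`, and returns the divergence in bits.

```python
import math
from collections import Counter


def divergence_from_uniform(keys, h, m):
    """D(P^ || U) in bits for a hash function h mapping keys into m slots."""
    counts = Counter(h(k) for k in keys)
    n = len(keys)
    # Empty slots contribute 0 by the convention 0 log 0 = 0, so only
    # occupied slots appear in the sum.
    return math.log2(m) + sum((c / n) * math.log2(c / n) for c in counts.values())


even = divergence_from_uniform(range(16), lambda k: k % 4, 4)   # perfectly even
worst = divergence_from_uniform(range(16), lambda k: 0, 4)      # all in one slot
```

A perfectly even spread gives 0, while sending every key to a single slot gives the maximum value of log₂ m, here 2 bits.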


Bibliography

[1] Jon L. Bentley and Robert Sedgewick. Fast algorithms for sorting and searching strings. In SODA '97: Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms, pages 360–369, Philadelphia, PA, USA, 1997. Society for Industrial and Applied Mathematics.

[2] Jon Louis Bentley and Andrew Chi Chih Yao. An almost optimal algorithm for unbounded searching. Information Processing Lett., 5(3):82–87, 1976.

[3] Bernard Chazelle. The Discrepancy Method: Randomness and Complexity. Cambridge University Press, Cambridge, 2000.


[4] Philippe Flajolet. The ubiquitous digital tree. In Bruno Durand and Wolfgang Thomas, editors, STACS, volume 3884 of Lecture Notes in Computer Science, pages 1–22. Springer, 2006.

[5] Edward Fredkin. Trie memory. Commun. ACM, 3(9):490–499, 1960.

[6] Steffen Heinz, Justin Zobel, and Hugh E. Williams. Burst tries: a fast, efficient data structure for string keys. ACM Trans. Inf. Syst., 20(2):192–223, 2002.


[7] Donald R. Morrison. PATRICIA: practical algorithm to retrieve information coded in alphanumeric. J. ACM, 15(4):514–534, 1968.

[8] Stefan Nilsson and Gunnar Karlsson. IP-address lookup using LC-tries. IEEE Journal on Selected Areas in Communications, 17:1083–1092, June 1999.

[9] Jorma Rissanen. Stochastic Complexity in Statistical Inquiry, volume 15 of Series in Computer Science. World Scientific, 1989.