History-Independent Cuckoo Hashing
Moni Naor, Gil Segev
Weizmann Institute, Israel
Udi Wieder
Microsoft Research Silicon Valley
2
Election Day
Elections for class president
  Each student whispers in Mr. Drew’s ear
  Mr. Drew writes down the votes
[Figure: Mr. Drew’s notebook, listing the votes in the order they were cast]
Problem: Mr. Drew’s notebook leaks sensitive information
  First student voted for Carol
  Second student voted for Alice
  …
  May compromise the privacy of the elections
3
Election Day
A simple solution:
  Lexicographically sorted list of candidates
  Unary counters
[Figure: the votes Carol, Alice, Alice, Bob stored as Alice 11, Bob 1, Carol 1]
What about more involved applications?
  Write-in candidates
  Votes which are subsets or rankings
  …
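The sorted-list idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `Tally` and the helper `run` are hypothetical, and Python lists stand in for the notebook's memory. The point it shows is that two different vote orders with the same totals leave exactly the same representation.

```python
from bisect import bisect_left

class Tally:
    """Hypothetical notebook: candidates kept sorted, one unary counter each."""

    def __init__(self):
        self.names = []   # candidates in lexicographic order
        self.counts = []  # unary counters, e.g. "11" means two votes

    def vote(self, name):
        # Binary search for the candidate's position in the sorted list.
        i = bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            self.counts[i] += "1"          # existing candidate: extend counter
        else:
            self.names.insert(i, name)     # new candidate: keep the list sorted
            self.counts.insert(i, "1")

    def memory(self):
        # What an observer of the notebook would see.
        return list(zip(self.names, self.counts))

def run(votes):
    t = Tally()
    for v in votes:
        t.vote(v)
    return t.memory()
```

For example, `run(["Carol", "Alice", "Alice", "Bob"])` and `run(["Bob", "Alice", "Carol", "Alice"])` return the same list, so the notebook reveals the totals but not who voted first.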
4
Learning From History
A simple example: sorted list
  Canonical memory representation
  Not really efficient...
The two levels of a data structure
  “Legitimate” interface
  Memory representation
History independence
  The memory representation should not reveal information that cannot be obtained using the legitimate interface
[Figure: sorted list Alice, Bob, Carol]
5
Typical Applications
Incremental cryptography [BGG94, Mic97]
Voting [MKSW06, MNS07]
Set comparison & reconciliation [MNS08]
Computational Geometry [BGV08]
...
6
Our Contribution
The first HI dictionary that simultaneously achieves the following:
  Efficiency:
    Lookup time – O(1) worst case
    Update time – O(1) expected amortized
    Memory utilization 50% (25% with deletions)
  Strongest notion of history independence
  Simple and fast
7
Notions of History Independence
Micciancio (1997): oblivious trees
  Motivated by incremental cryptography
  Only considered the shape of the trees and not their memory representation
Naor and Teague (2001)
  Memory representation
  Weak & strong history independence
8
Notions of History Independence
Weak history independence
  Memory revealed at the end of an activity period
  Any two sequences of operations S1 and S2 that lead to the same content induce the same distribution on the memory representation
Strong history independence
  Memory revealed several times during an activity period
  Any two sets of breakpoints along S1 and S2 with the same content at each breakpoint induce the same distributions on the memory representation at all these points
  Completely randomizing memory after each operation is not good enough
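A toy experiment can make the last point concrete. The sketch below is an assumption-laden illustration, not a construction from the talk: `ShuffledSet`, `two_dumps` and `fraction_identical` are hypothetical names, and the "memory" is a Python list that is re-shuffled after every operation. A single dump is uniformly distributed whatever the history, so the structure is weakly history independent. Two dumps taken at breakpoints with the same content, however, are always identical when nothing happened in between, and usually differ when an item was inserted and deleted in between.

```python
import random

class ShuffledSet:
    """Toy set whose cell order is re-randomized after every operation."""

    def __init__(self, rng):
        self.rng = rng
        self.cells = []

    def insert(self, x):
        if x not in self.cells:
            self.cells.append(x)
        self.rng.shuffle(self.cells)

    def delete(self, x):
        if x in self.cells:
            self.cells.remove(x)
        self.rng.shuffle(self.cells)

def two_dumps(extra_ops, rng):
    """Dump memory at two breakpoints that have the same content {1, 2, 3}."""
    s = ShuffledSet(rng)
    for x in (1, 2, 3):
        s.insert(x)
    first = tuple(s.cells)       # breakpoint 1
    if extra_ops:                # hidden history: insert 4, then delete it
        s.insert(4)
        s.delete(4)
    second = tuple(s.cells)      # breakpoint 2
    return first, second

def fraction_identical(extra_ops, trials=2000, seed=0):
    """Fraction of trials in which both dumps show the same cell order."""
    rng = random.Random(seed)
    same = 0
    for _ in range(trials):
        first, second = two_dumps(extra_ops, rng)
        same += (first == second)
    return same / trials
```

Comparing `fraction_identical(False)` with `fraction_identical(True)` shows that an observer who sees both dumps can tell the two histories apart, which is exactly what strong history independence rules out.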
9
Notions of History Independence
Weak & strong are not equivalent
  WHI for reversible data structures is possible without a canonical representation
  Provable efficiency gaps [BP06] (in restricted models)
We consider strong history independence
  Canonical representation (up to initial randomness) implies SHI
  Other direction shown to hold for reversible data structures [HHMPR05]
10
SHI Dictionaries

                     Naor & Teague ‘01   Blelloch & Golovin ‘07   Blelloch & Golovin ‘07   This work
Memory utilization   99%                 99%                      < 9%                     > 25% (> 50%)
Update time          O(1) expected       O(1) expected            O(1) expected            O(1) expected
Lookup time          O(1) worst case     O(1) expected            O(1) worst case          O(1) worst case

[Rows “Deletions” and “Practical?”: the per-scheme marks are not recoverable from the slide; the surviving entries of “Practical?” are “?” and two “(mem. util. < 50%)” annotations]
11
Our Approach
Cuckoo hashing [PR01]:
  A simple & practical scheme with worst case constant lookup time
Force a canonical representation on cuckoo hashing
  No significant loss in efficiency
Avoid rehashing by using a small stash
  What happens when hash functions fail?
  Rehashing is highly problematic in SHI data structures
    All hash functions need to be sampled in advance
    When an item is deleted, may need to roll back to previous functions
  We use a secondary storage to reduce the failure probability exponentially [KMW08]
12
Cuckoo Hashing
Tables T1 and T2 with hash functions h1 and h2
Store x in one of T1[h1(x)] and T2[h2(x)]
Insert(x):
  Greedily insert in T1 or T2
  If both are full, insert in T1
  Repeat in other table with the previous occupant (if any)
[Figure: two snapshots of tables T1 and T2 with items X, Y, Z, V and W. Caption: Successful insertion]
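The insertion procedure on this slide can be sketched as follows. This is plain (not yet history-independent) cuckoo hashing, written under assumptions: the class name `CuckooTable`, the `max_kicks` bound and the use of `None` for an empty cell are choices made for the example, and the hash functions are supplied by the caller.

```python
class CuckooTable:
    """Minimal sketch of standard cuckoo hashing with two tables."""

    def __init__(self, r, h1, h2, max_kicks=32):
        self.t = [[None] * r, [None] * r]  # t[0] is T1, t[1] is T2
        self.h = [h1, h2]                  # h[0] indexes T1, h[1] indexes T2
        self.max_kicks = max_kicks         # assumed cut-off before giving up

    def lookup(self, x):
        # Worst-case constant time: only two cells are ever inspected.
        return self.t[0][self.h[0](x)] == x or self.t[1][self.h[1](x)] == x

    def insert(self, x):
        """Return True on success, False if a rehash would be required."""
        if self.lookup(x):
            return True
        # Greedy step: take a free cell in T1, otherwise a free cell in T2.
        for side in (0, 1):
            i = self.h[side](x)
            if self.t[side][i] is None:
                self.t[side][i] = x
                return True
        # Both cells are full: put x in T1 and move the previous occupant
        # to its cell in the other table, repeating as long as needed.
        side = 0
        for _ in range(self.max_kicks):
            i = self.h[side](x)
            x, self.t[side][i] = self.t[side][i], x
            if x is None:
                return True
            side = 1 - side
        # The item now held in x has no home; a real table would rehash
        # or fall back to secondary storage at this point.
        return False
```

With `h1 = lambda x: x % 4` and `h2 = lambda x: (x // 4) % 4`, inserting 0, 1, 4 and then 5 triggers one eviction and succeeds, while inserting 0, 4, 16 and then 20 fails because four keys compete for three cells.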
13
Cuckoo Hashing
Tables T1 and T2 with hash functions h1 and h2
Store x in one of T1[h1(x)] and T2[h2(x)]
Insert(x):
  Greedily insert in T1 or T2
  If both are full, insert in T1
  Repeat in other table with the previous occupant (if any)
[Figure: tables T1 and T2 with items X, Y, Z, V and U. Caption: Failure, rehash required]
14
The Cuckoo Graph
Set S ⊆ U containing n keys
h1, h2 : U → {1,...,r}
Bipartite graph with sets of size r
Edge (h1(x), h2(x)) for every x ∈ S
S is successfully stored if and only if every connected component has at most one cycle
Main theorem:
  If r ≥ (1 + ε)n and h1, h2 are log(n)-wise independent, then the failure probability is Θ(1/n)
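The storability condition can be checked directly on the cuckoo graph. The sketch below is an illustration under assumptions: the function name `storable` is hypothetical, slots are numbered from 0 rather than 1, and a union-find structure tracks, for each component, its number of vertices and edges. A connected component has at most one cycle exactly when it has no more edges than vertices.

```python
def storable(keys, h1, h2, r):
    """True iff every component of the cuckoo graph has at most one cycle."""
    # Vertices 0..r-1 are the cells of T1, vertices r..2r-1 the cells of T2.
    parent = list(range(2 * r))
    size = [1] * (2 * r)    # vertices per component (valid at the root)
    edges = [0] * (2 * r)   # edges per component (valid at the root)

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for x in keys:
        a, b = find(h1(x)), find(r + h2(x))
        if a != b:
            # The new edge joins two components: merge their counters.
            parent[b] = a
            size[a] += size[b]
            edges[a] += edges[b]
        edges[a] += 1
        if edges[a] > size[a]:
            return False    # more keys than cells in this component
    return True
```

With `h1 = lambda x: x % 4`, `h2 = lambda x: (x // 4) % 4` and `r = 4`, the keys 0, 1, 4, 5 form a single cycle and can be stored, while 0, 4, 16, 20 put four edges on three vertices and cannot.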
15
The Canonical Representation
Assume that S can be stored using h1 and h2
We force a canonical representation on the cuckoo graph
  Suffices to consider a single connected component
Assume that S forms a tree in the cuckoo graph (the typical case)
  One location must be empty
  The choice of the empty location uniquely determines the location of all elements
[Figure: tree component with elements a, b, c, d, e]
Rule: h1(minimal element) is empty
16
The Canonical Representation
Assume that S can be stored using h1 and h2
We force a canonical representation on the cuckoo graph
  Suffices to consider a single connected component
Assume that S has one cycle
  Two ways to assign elements in the cycle
  Each choice uniquely determines the location of all elements
[Figure: component with one cycle, elements a, b, c, d, e]
Rule: minimal element in cycle lies in T1
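The two rules can be turned into a small routine that recomputes the canonical placement of one connected component from scratch. This is a sketch under assumptions rather than the talk's update procedure (which maintains the layout incrementally): the name `canonical_layout` and the slot labels `("T1", i)` / `("T2", j)` are hypothetical, and the input is assumed to be exactly one connected component that is a tree or has one cycle.

```python
from collections import defaultdict, deque

def canonical_layout(elements, h1, h2):
    """Map each slot ("T1", i) or ("T2", j) of one component to its element."""
    ends = {x: (("T1", h1(x)), ("T2", h2(x))) for x in elements}
    adj = defaultdict(set)               # slot -> elements that may live there
    for x, (u, v) in ends.items():
        adj[u].add(x)
        adj[v].add(x)

    def other(x, u):
        # The endpoint of element x that is not slot u.
        return ends[x][1] if ends[x][0] == u else ends[x][0]

    layout = {}
    if len(elements) == len(adj) - 1:
        # Tree: the T1 cell of the minimal element stays empty; every other
        # slot holds the element on the edge leading towards that empty cell.
        root = ("T1", h1(min(elements)))
        seen, queue = {root}, deque([root])
        while queue:
            u = queue.popleft()
            for x in adj[u]:
                w = other(x, u)
                if w not in seen:
                    seen.add(w)
                    layout[w] = x
                    queue.append(w)
        return layout
    if len(elements) != len(adj):
        raise ValueError("component has more than one cycle")
    # One cycle: slots hanging off the cycle are forced, so peel leaves first.
    leaves = deque(u for u in adj if len(adj[u]) == 1)
    while leaves:
        u = leaves.popleft()
        if len(adj[u]) != 1:
            continue
        (x,) = adj[u]
        layout[u] = x
        w = other(x, u)
        adj[u].clear()
        adj[w].discard(x)
        if len(adj[w]) == 1:
            leaves.append(w)
    # What remains is the cycle: its minimal element goes to T1, and that
    # choice fixes the direction in which the cycle is filled.
    cycle = {x for xs in adj.values() for x in xs}
    x = min(cycle)
    u = ends[x][0]
    for _ in range(len(cycle)):
        layout[u] = x
        w = other(x, u)
        (x,) = adj[w] - {x}
        u = w
    return layout
```

Because the result depends only on the set of elements and the two hash functions, any insertion order of the same set yields the same layout, which is the property the canonical representation is meant to provide.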
17
The Canonical Representation
Updates efficiently maintain the canonical representation
Insertions:
  New leaf: check if new element is smaller than current min
  New cycle:
  Same component…
  Merging two components…
  All cases straightforward
  Update time < size of component = expected (small) constant
Deletions:
  Find the new min, split component,…
  Requires connecting all elements in the component with a cyclic list
    Memory utilization drops to 25%
  All cases straightforward
18
Rehashing
What if S cannot be stored using h1 and h2?
  Happens with probability 1/n
Can we simply pick new functions?
  Canonical memory implies we need to sample all hash functions in advance
  Whenever an item is deleted, need to check whether we can roll back to previous hash functions
  A bad item which is repeatedly inserted and deleted would cause a rehash every operation!
19
Using a Stash
Whenever an insert fails, put a ‘bad’ item in a secondary data structure
  Bad item: smallest item that belongs to a cycle
  Secondary data structure must be SHI in itself
Theorem [KMW08]: Pr[|stash| > s] < n^(-s)
In practice keeping the stash as a sorted list is probably the best solution
  Effectively the query time is constant with (very) high probability
In theory the stash could be any SHI data structure with constant lookup time
  A deterministic hashing scheme, where the elements are rehashed whenever the content changes [AN96, HMP01]
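The sorted-list stash suggested for practice might look like the sketch below. The function names `stash_insert`, `stash_delete` and `lookup` are hypothetical, the tables are plain Python lists, and the rule for choosing which item goes to the stash is left out. A sorted list is canonical, so the stash is strongly history independent by itself, and a lookup touches it only after both table cells have been checked.

```python
from bisect import bisect_left

def stash_insert(stash, x):
    """Insert x into the sorted stash, keeping it duplicate-free."""
    i = bisect_left(stash, x)
    if i == len(stash) or stash[i] != x:
        stash.insert(i, x)

def stash_delete(stash, x):
    """Remove x from the sorted stash if it is present."""
    i = bisect_left(stash, x)
    if i < len(stash) and stash[i] == x:
        del stash[i]

def lookup(x, T1, T2, stash, h1, h2):
    """Check the two table cells first, then binary-search the stash."""
    if T1[h1(x)] == x or T2[h2(x)] == x:
        return True
    i = bisect_left(stash, x)
    return i < len(stash) and stash[i] == x
```

Since the stash is expected to hold very few items, the binary search adds little to the two table probes.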
20
Conclusions and Problems
Cuckoo hashing is a robust and flexible hashing scheme
  Easily ‘molded’ into a history independent data structure
We don’t know how to analyze variants with more than 2 hash functions and/or more than 1 element per bucket
  Expected size of connected component is not constant
Full performance analysis