Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or...

43
Hash Tables:

Transcript of Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or...

Page 1: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Hash Tables:

Page 2: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Page 3: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Page 4: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Analysis of hashing with chaining:

• Given a hash table with m slots and n keys, define load factor = n/m : average number of keys per slot.

• Assume each key is equally likely to be hashed into any slot: simple uniform hashing (SUH).

Page 5: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Thm: In a hash table in which collisions are resolved by chaining, an unsuccessful search takes expected time Θ(1+ ) under SUH.

Proof:

Under the assumption of SUH, any un-stored key is equally likely to hash to any of the m slots.

The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which is exactly .

Thus, the total time required is Θ(1+ ). □

Page 6: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Thm: In a hash table in which collisions are resolved by chaining, a successful search takes time Θ(1+ ), on the average under SUH.

Proof:Let the element being searched for equally likely

to be any of the n elements stored in the table.The expected time to search successfully for a

key.

Elements before x in the list were inserted after x was inserted.

We want to find the expected number of elements added to x’s list after x was added to the list.

x.... ....

Page 7: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Let xi denote the ith element into the table, for i =1 to n, and let ki=key[xi].

Define Xij = I{ h(ki)=h(kj) }. Under SUH, we have Pr{ h(ki)=h(kj) } = 1/m = E[Xij ].

1 1 1 1

1 1 1

2

(

1 1E[ (1 )] (1 E[ ])

1 1 1(1 ) 1 ( )

1 ( 1)

2 ) (1 ).

11 ( ) 1 1

2 2 2

2 2

2

n n n n

ij iji j i i j i

n n n

i j i i

X Xn n

n in m mn

n n nn

mn m n

n

Page 8: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

The multiplication method:0<A<1

14

32

32

Knuth suggests that ( 5 1) / 2 0.6180339887...

Take 123456, 14, 2 16384 and 32.

2 2654

1761

435769

327706022297664 (76300 2 )

The 14 most significant bits of is

12864

176112 !67864

A

k p m w

s A

k s

Page 9: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Universal Hashing

• H={ h: U→{0,…,m-1} }, which is a finite collection of hash functions.

• H is called “universal” if for each pair of distinct keys k, U, the number of hash functions h H for ∈ ∈which h(k)=h( ) is at most |H|/m

• Define ni = the length of list T[i]

• Thm: suppose h is randomly selected from H, using chaining to resolve collisions. If k is not in the table, then E[nh(k)]

≤ α. If k is in the table, then E[nh(k)] ≤ 1+α

Page 10: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Proof:– For each pair k and of distinct keys,

define Xk =I{h(k)=h( )}.

– By definition, Prh{h(k)=h( )} ≤ 1/m, and so E[Xk ] ≤ 1/m.

– Define Yk to be the number of keys other than k that hash to the same slot as k, so that

1[ ] [ ]

k k

Tk

k k

T Tk k

Y X

E Y E Xm

Page 11: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

– If k T, then because k appears in T[h(k)] and the count ∈Yk does not include k, we have nh(k) = Yk + 1

and

( )

( )

, |{ : , } |

[ ] [ ]

h k k

h k k

If k T then n Y and T k n

nthus E n E Y

m

( )

|{ : , } | 1

1 1[ ] [ ] 1 1 1 1h k k

T k n

nThus E n E Y

m m

Page 12: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Designing a universal class of hash functions:

p:prime

For any and , define

, ha,b:Zp→Zm

1,,1,0 pZ p 1,,2,1 pZ p

pZb pZa

mpbakkh ba mod)mod)(()(,

ppbamp ZbandZah *,, :

Page 13: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Theorem:

Hp,m is universal.

Pf: Let k, be two distinct keys in Zp.

Given ha,b, Let r=(ak+b) mod p , and

s=(a +b) mod p.

Then r-s≡a(k- ) mod p

Page 14: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

For any ha,b∈Hp,m, distinct inputs k and map to distinct r and s modulo p.

Each possible p(p-1) choices for the pair (a,b) with a≠0 yields a different resulting pair (r,s) with r≠s, since we can solve for a and b given r and s:

a=((r-s)((k- )-1 mod p)) mod p b=(r-ak) mod p

Page 15: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• There are p(p-1) possible pairs (r,s) with r≠s, there is a 1-1 correspondence between pairs (a,b) with a≠0 and (r,s), r≠s.

Page 16: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• For any given pair of inputs k and , if we pick (a,b) uniformly at random from

, the resulting pair (r,s) is equally likely to be any pair of distinct values modulo p.

pp ZZ

Page 17: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Pr[ k and collide]=Prr,s[r≡s mod m]

• Given r, the number of s such that s≠r and s≡r (mod m) is at most

⌈p/m⌉-1≤((p+m-1)/m)-1 =(p-1)/m ∵ s, s+m, s+2m,…., ≤p

Page 18: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Thus,

Prr,s[r≡s mod m] ≤((p-1)/m)/(p-1)

=1/mTherefore, for any pair of distinct k, ∈Zp,

Pr[ha,b(k)=ha,b( )] ≤1/m,

so that Hp,m is universal.

Page 19: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Open addressing:– There is no list and no element stored

outside the table.– Advantage: avoid pointers, potentially

yield fewer collisions and faster retrieval.

Page 20: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

– – For every k, the probe sequence

is a permutation of .– Deletion from an open-address hash

table is difficult.– Thus chaining is more common when

keys must be deleted.

: 0,1, , 1 0,1, , 1h U m m

,0 , ,1 , , , 1h k h k h k m

0,1, , 1m

Page 21: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Page 22: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Page 23: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Linear probing:– ~ an ordinary hash

function (auxiliary hash function).– .

: 0,1, , 1h U m

, mod h k i h k i m

Page 24: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Quadratic probing:– ,where h’ is an

auxiliary hash function, c1 and c2≠0 and are constants.

21 2, mod h k i h k c i c i m

Page 25: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Double hashing:– ,where h1 and h2

are auxiliary hash functions.– probe sequences; Linear and

Quadratic have probe sequences.

1 2, mod h k i h k ih k m

2m m

Page 26: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Page 27: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Analysis of open-addressing hashing

: load factor,

with n elements and m slots.

n

m

Page 28: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Thm:

Given an open-address hash table with load factor , the expected number of probes in an unsuccessful search is at most , assuming uniform hashing.

1n m

1 1

Page 29: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Pf:– Define the r.v. X to be the number of probes

made in an unsuccessful search.– Define Ai: the event there is an ith probe and it

is to an occupied slot.– Event . 1 2 1iX i A A A

Page 30: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

– The prob. that there is a jth probe and it is to an occupied slot, given that the first j-1 probes were to occupied slots is (n-j+1)/(m-j+1). Why?

1 2 1

1 2 1 3 1 2

1 1 2 2

Pr Pr

Pr Pr | Pr |

Pr |

i

i i

X i A A A

A A A A A A

A A A A

1Prn

Am

Page 31: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

– ∵n<m, (n-j)/(m-j) ≤ n/m for all 0 ≤ j<m.–

1 1

1

1 1 0

1 2Pr[ ]

1 2

( )

1[ ] Pr[ ]

1

i i

i i

i i i

n n n iX i

m m m in

m

E X X i

Page 32: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Cor: Inserting an element into an open-addressing hash table with load factor α requires at most 1/(1- α) probes on average, assuming uniform hashing.

Page 33: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Thm: Given an open-address hash table with load factor α<1, the expected number of probes in a successful search is at most , assuming uniform hashing and that each key in the table is equally likely to be searched for.

1

1ln

1

Page 34: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Pf: Suppose we search for a key k.

If k was the (i+1)st key inserted into the hash table, the expected number of probes made in a search for k is at most

1/(1-i/m)=m/(m-i).

Page 35: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Averaging over all n keys in the hash table gives us the average number of probes in a successful search:

1

1ln

1ln

1111

)(111

1

1

0

1

0

nm

m

x

dx

k

HHimn

m

im

m

nm

nm

m

nmk

n

inmm

n

i

Page 36: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Perfect Hashing:

Page 37: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Perfect hashing :good for when the keys are static; i.e. , once stored, the keys never change, e.g. CD-ROM, the set of reserved word in programming language.

• Thm :If we store n keys in a hash table of size m=n2 using a hash function h randomly chosen from a universal class of hash functions, then the probability of there being any collisions < ½ .

Page 38: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Proof:Let h be chosen from an universal family. Then each pair collides with probability 1/m , and there are pairs of keys.Let X be a r.v. that counts the number of collisions. When m=n2,

2

n

2

2

1 1 1[ ]

2 2 2

' , Pr[ ] [ ] / ,

1.

n n nE X

m n

By Markov s inequality X t E X t

and take t

Page 39: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Thm: If we store n keys in a hash table of size m=n using a hash function h randomly chosen from universal class of hash functions, then , where nj is the number of keys hashing to slot j.

nnEm

j j 2][1

0

2

Page 40: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Pf:– It is clear for any nonnegative integer a,

222 a

aa

]2

[2][

]2

2[][

1

0

1

0

1

0

1

0

2

m

j

jm

jj

m

j

jj

m

jj

nEnE

nnEnE

Page 41: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

]2

[2]2

[2][1

0

1

0

m

j

jm

j

j nEn

nEnE

total number of collisions

.2122

12][

. since ,2

1

2

)1(1

21

0

2 nnn

nnE

nmn

m

nn

m

n

m

jj

Page 42: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Cor: If store n keys in a hash table of size m=n using a hash function h randomly chosen from a universal class of hash functions and we set the size of each secondary hash table to mj=nj

2 for j=0,…,m-1, then the expected amount of storage required for all secondary hash tables in a perfect hashing scheme is < 2n.

Page 43: Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

• Cor: Same as the above, Pr{total storage 4n} < 1/2

• Pf:– By Markov’s inequality, Pr{ X t } E[X]/t.–

.2

1

4

2

4

][

}4Pr{

:4 and Take

1

01

0

1

0

n

n

n

mE

nm

ntmX

m

jjm

jj

m

jj