Design and Analysis of Algorithms Hash Tables

Design and Analysis of Algorithms Hash Tables Haidong Xue Summer 2012, at GSU


Transcript of Design and Analysis of Algorithms Hash Tables

Page 1: Design  and Analysis of  Algorithms Hash Tables

Design and Analysis of Algorithms: Hash Tables

Haidong Xue, Summer 2012, at GSU

Page 2: Design  and Analysis of  Algorithms Hash Tables

Dictionary operations

“A hash table is an effective data structure for implementing dictionaries” – textbook page 253

Operation   Very likely   Worst case
INSERT      O(1)          Θ(1)
DELETE      O(1)          O(1)
SEARCH      O(1)          Θ(n)

Page 3: Design  and Analysis of  Algorithms Hash Tables

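Direct-address tables

Direct addressing: use keys as addresses

Direct-address table (slots 1 to 10) holding the keys 2, 3, 6, 1, 7, 5:

Slot:  1  2  3  4  5  6  7  8  9  10
Key:   1  2  3  -  5  6  7  -  -  -

SEARCH(S, 6)   O(1)
INSERT(S, 4)   O(1)
DELETE(S, 7)   O(1)

What’s the problem here?

Storage requirement = |U|, where U is the universe of keys

When the range of the elements is [1, 30000], the table needs 30000 slots, however few keys are actually stored.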
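A minimal Python sketch of direct addressing, replaying the three operations from this slide. The class name DirectAddressTable and its method names are hypothetical, chosen only for illustration; the key is used directly as the array index, which is why every operation is O(1) and why the array must cover the whole universe of keys.

```python
class DirectAddressTable:
    """Direct addressing: the key itself is the index into the array."""

    def __init__(self, universe_size):
        # One slot per possible key 1..universe_size; index 0 is unused.
        self.slots = [None] * (universe_size + 1)

    def insert(self, key):
        self.slots[key] = key          # O(1): a single array write

    def delete(self, key):
        self.slots[key] = None         # O(1): a single array write

    def search(self, key):
        return self.slots[key] is not None   # O(1): a single array read


table = DirectAddressTable(10)
for key in (2, 3, 6, 1, 7, 5):
    table.insert(key)

found_6 = table.search(6)   # SEARCH(S, 6)
table.insert(4)             # INSERT(S, 4)
table.delete(7)             # DELETE(S, 7)
found_7 = table.search(7)
```

The array length depends only on the size of the universe, not on how many keys are stored.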

Page 4: Design  and Analysis of  Algorithms Hash Tables

Hash tables

• Can we have O(1) INSERT, DELETE and SEARCH with less storage? Yes!

Keys: 2, 3, 6, 1, 7, 5

Hash table with slots 0, 1, 2

Hash function: h(x) = x mod 3

h(2) = 2 mod 3 = 2
h(3) = 3 mod 3 = 0
h(6) = 6 mod 3 = 0
h(1) = 1 mod 3 = 1
h(7) = 7 mod 3 = 1
h(5) = 5 mod 3 = 2

Collision: multiple elements in one slot (for example, 3 and 6 both hash to slot 0)
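The hash values on this slide can be recomputed with a few lines of Python. This is only an illustrative sketch: the names h, slots and collisions are chosen here, and the slot count 3 comes from the slide's hash function.

```python
from collections import defaultdict


def h(x, m=3):
    """Hash function from the slide: h(x) = x mod 3."""
    return x % m


keys = [2, 3, 6, 1, 7, 5]

# Group the keys by the slot they hash to.
slots = defaultdict(list)
for key in keys:
    slots[h(key)].append(key)

# A slot that holds more than one key is a collision.
collisions = {slot: ks for slot, ks in slots.items() if len(ks) > 1}
```

With six keys and only three slots, every slot ends up holding two keys, so all three slots collide.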

Page 5: Design  and Analysis of  Algorithms Hash Tables

Hash tables

A common method is to put colliding elements into a linked list, i.e. chaining

Hash table:
Slot 0: 3, 6
Slot 1: 1, 7
Slot 2: 2, 5

SEARCH(S, 6): h(6) = 6 mod 3 = 0, SEARCH in 0-linked-list: O(1) + 2 (2 is the length of the linked list)
INSERT(S, 4): h(4) = 4 mod 3 = 1, INSERT in 1-linked-list: O(1) + O(1) = O(1)
DELETE(S, 7): h(7) = 7 mod 3 = 1, DELETE in 1-linked-list: O(1) + O(1) = O(1)

What is the upper bound length? What is the average length?
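A minimal sketch of chaining in Python, replaying the operations on this slide. It assumes Python lists as stand-ins for the linked lists; the class ChainedHashTable and its method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of keys."""

    def __init__(self, m=3):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        return key % self.m            # h(x) = x mod m, as on the slide

    def insert(self, key):
        self.slots[self._h(key)].append(key)

    def search(self, key):
        # Scans one chain only, so the cost grows with that chain's length.
        return key in self.slots[self._h(key)]

    def delete(self, key):
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)


table = ChainedHashTable(3)
for key in (2, 3, 6, 1, 7, 5):
    table.insert(key)

found_6 = table.search(6)   # looks only in the chain of slot 0
table.insert(4)             # goes to the chain of slot 1
table.delete(7)             # removed from the chain of slot 1
```

Note that delete in this sketch first scans the chain to find the key. The O(1) DELETE on the slide presumably assumes a reference to the list node is already in hand, as in a doubly linked list.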

Page 6: Design  and Analysis of  Algorithms Hash Tables

Analysis of hash tables

Hash table with m slots (0, 1, 2, 3, 4, …, m-1) storing n elements in total

Load factor: α = n/m

Uniform hashing: “each key is equally likely to hash to any of the m slots”
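The load factor n/m is also the average chain length, whatever the hash function does; uniform hashing is what keeps the individual chains close to that average. The sketch below checks this on random keys. The values n = 1000 and m = 50, the seed and the key range are arbitrary assumptions made for the illustration.

```python
import random


def chain_lengths(keys, m):
    """Return the number of keys that land in each of the m slots."""
    lengths = [0] * m
    for key in keys:
        lengths[key % m] += 1
    return lengths


random.seed(0)                      # fixed seed so the run is repeatable
n, m = 1000, 50
keys = random.sample(range(1_000_000), n)

lengths = chain_lengths(keys, m)
load_factor = n / m                 # n elements spread over m slots
average_length = sum(lengths) / m   # equals the load factor by construction
longest = max(lengths)              # the longest chain can exceed the average
```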

Page 7: Design  and Analysis of  Algorithms Hash Tables

Analysis of hash tables

Hash table with slots 0, 1, 2, 3, 4, …, m-1 and load factor α

With the assumption of uniform hashing:

Theorem 11.1 Unsuccessful search: average-case time Θ(1 + α)

Theorem 11.2 Successful search: average-case time Θ(1 + α)

α = n/m, T(n) = Θ(1 + n/m)

If n = O(m), T(n) = Θ(1 + O(m)/m) = O(1)

How to get uniform hashing?

Page 8: Design  and Analysis of  Algorithms Hash Tables

Hash functions

How to get uniform hashing?

Uniform hashing: “each key is equally likely to hash to any of the m slots”

To achieve this goal, many hashing methods have been proposed:

• Division hashing
• Multiplication hashing
• Universal hashing

Page 9: Design  and Analysis of  Algorithms Hash Tables

Hash functions – division hashing

• h(k) = k mod m, where k is the value of the key and m is the number of slots
• E.g.:
  – Final grades of all my students with a hash table of 10 slots
  – Items in grocery stores with a hash table of 10 slots
    • 99 cents, large soda
    • $1.99, ground beef
    • $6.99, lamb

What’s the problem here? What if we still use 10 slots? (With prices in cents, 99, 199 and 699 all hash to slot 9.)

Page 10: Design  and Analysis of  Algorithms Hash Tables

Hash functions – division hashing

• h(k) = k mod m
• Choose m as a prime number
• 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, …
• It is sometimes not very convenient to implement

What’s the problem here?

e.g.:
99 mod 7 = 1
199 mod 7 = 3
699 mod 7 = 6
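A short sketch comparing the two table sizes for the grocery prices, taken in cents. The function name division_hash is hypothetical; the prices and the slot counts 10 and 7 come from the slides.

```python
def division_hash(k, m):
    """Division hashing: h(k) = k mod m."""
    return k % m


prices_in_cents = [99, 199, 699]

# With 10 slots only the last digit matters, so all three prices collide.
with_10_slots = [division_hash(k, 10) for k in prices_in_cents]

# With a prime number of slots the same keys spread out.
with_7_slots = [division_hash(k, 7) for k in prices_in_cents]
```

Here with_10_slots is [9, 9, 9], while with_7_slots is [1, 3, 6], matching the slide.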

Page 11: Design  and Analysis of  Algorithms Hash Tables

Hash functions – multiplication hashing

• h(k) = floor(m(kA mod 1)), where m is the number of slots and A is a constant number in (0, 1)
• E.g.: A = 0.123, m = 10
  – 99 * 0.123 = 12.177
  – 199 * 0.123 = 24.477
  – 699 * 0.123 = 85.977

h(99) = floor(10 * 0.177) = 1
h(199) = floor(10 * 0.477) = 4
h(699) = floor(10 * 0.977) = 9
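The worked example can be checked with a small Python sketch. It represents A = 0.123 as an exact fraction so that floating-point rounding cannot affect the result; this is a choice made here for the illustration, not something the slide prescribes. The function name multiplication_hash is hypothetical.

```python
from fractions import Fraction
from math import floor


def multiplication_hash(k, m, A):
    """Multiplication hashing: h(k) = floor(m * (k*A mod 1))."""
    fractional_part = (k * A) % 1      # keep only the part after the point
    return floor(m * fractional_part)


A = Fraction(123, 1000)                # A = 0.123, stored exactly
hashes = {k: multiplication_hash(k, 10, A) for k in (99, 199, 699)}
```

The resulting mapping is 99 to 1, 199 to 4 and 699 to 9, as on the slide.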

Page 12: Design  and Analysis of  Algorithms Hash Tables

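Hash functions – universal hashing

• H is a set of hash functions
• At the beginning of each execution, randomly choose a hash function h from H
• Universal: Pr[h(k) = h(l)] ≤ 1/m for h chosen at random from H, where k and l are distinct keys and m is the number of slots
• Theorem 11.3 (with chaining), where n_h(k) is the length of the list that key k hashes to:
  – If k is not in the table, E[n_h(k)] ≤ α
  – If k is in the table, E[n_h(k)] ≤ 1 + α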
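The slide does not name a concrete family, so the sketch below assumes the standard textbook construction h(k) = ((a*k + b) mod p) mod m, with p a prime larger than every key, a drawn from 1..p-1 and b from 0..p-1. The prime p = 701, the seed and the function names are assumptions made for this illustration. The second function counts, by brute force, how many functions in the family make two given keys collide.

```python
import random


def make_universal_hash(m, p, rng):
    """Pick one function h(k) = ((a*k + b) mod p) mod m at random."""
    a = rng.randrange(1, p)            # a in 1..p-1
    b = rng.randrange(0, p)            # b in 0..p-1

    def h(k):
        return ((a * k + b) % p) % m

    return h


def count_colliding_functions(k, l, m, p):
    """Count the pairs (a, b) whose function sends k and l to the same slot."""
    count = 0
    for a in range(1, p):
        for b in range(p):
            if ((a * k + b) % p) % m == ((a * l + b) % p) % m:
                count += 1
    return count


rng = random.Random(2012)              # fixed seed so the run is repeatable
m, p = 10, 701                         # 701 is prime and exceeds every key used
h = make_universal_hash(m, p, rng)
slots = [h(k) for k in (99, 199, 699)]

family_size = p * (p - 1)              # number of (a, b) pairs in the family
colliding = count_colliding_functions(99, 199, m, p)
```

For a universal family, colliding divided by family_size is at most 1/m, which is the condition stated on the slide.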

Page 13: Design  and Analysis of  Algorithms Hash Tables

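Another method to deal with collisions: Open Addressing

• No linked list
• Hash functions include the probe number i: h(k, i)
  • Linear probing: h(k, i) = (h′(k) + i) mod m
  • Quadratic probing: h(k, i) = (h′(k) + c1·i + c2·i²) mod m
  • Double hashing: h(k, i) = (h1(k) + i·h2(k)) mod m
• When slot h(k, i) does not work, use h(k, i + 1)

With load factor α < 1 and the assumption of uniform hashing:

Expected number of probes for unsuccessful search is at most 1/(1 - α)

Expected number of probes for successful search is at most (1/α) ln(1/(1 - α))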
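To get a feel for these bounds, the sketch below evaluates the standard textbook expressions for open addressing under uniform hashing: 1/(1 - α) for an unsuccessful search and (1/α) ln(1/(1 - α)) for a successful one. That the slide intends exactly these expressions is an assumption, and the function names and sample load factors are chosen here for illustration.

```python
from math import log


def unsuccessful_bound(alpha):
    """Upper bound on expected probes in an unsuccessful search."""
    return 1 / (1 - alpha)


def successful_bound(alpha):
    """Upper bound on expected probes in a successful search."""
    return (1 / alpha) * log(1 / (1 - alpha))


# Evaluate both bounds for a half-full and a 90%-full table.
bounds = {
    alpha: (unsuccessful_bound(alpha), successful_bound(alpha))
    for alpha in (0.5, 0.9)
}
```

At α = 0.5 the bounds are 2 probes and about 1.39 probes; at α = 0.9 they grow to roughly 10 and 2.56, which shows how quickly a nearly full table degrades.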

Page 14: Design  and Analysis of  Algorithms Hash Tables

Another method to deal with collisions: Open Addressing

Open addressing table (slots 0 to 9) after inserting 2, 3, 6, 1:

Slot:  0  1  2  3  4  5  6  7  8  9
Key:   3  6  2  1  -  -  -  -  -  -

h′(k) = k mod 3

h(k, i) = (h′(k) + i) mod 10

h(2, 0) = ((2 mod 3) + 0) mod 10 = 2

h(3, 0) = ((3 mod 3) + 0) mod 10 = 0

h(6, 0) = ((6 mod 3) + 0) mod 10 = 0
h(6, 1) = ((6 mod 3) + 1) mod 10 = 1

h(1, 0) = ((1 mod 3) + 0) mod 10 = 1
h(1, 1) = ((1 mod 3) + 1) mod 10 = 2
h(1, 2) = ((1 mod 3) + 2) mod 10 = 3
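The probe sequences above can be replayed with a minimal linear-probing sketch in Python. The function names h_prime, h and insert are hypothetical; the auxiliary hash k mod 3 and the table size 10 come from the slide.

```python
def h_prime(k):
    """Auxiliary hash function from the slide: h'(k) = k mod 3."""
    return k % 3


def h(k, i, m=10):
    """Linear probing: h(k, i) = (h'(k) + i) mod m."""
    return (h_prime(k) + i) % m


def insert(table, k):
    """Try probes i = 0, 1, 2, ... until an empty slot is found."""
    m = len(table)
    for i in range(m):
        slot = h(k, i, m)
        if table[slot] is None:
            table[slot] = k
            return slot
    raise RuntimeError("hash table overflow")


table = [None] * 10
placed = {k: insert(table, k) for k in (2, 3, 6, 1)}
```

Key 6 needs two probes and key 1 needs three, ending in slots 1 and 3, as in the calculations above.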