NCUE CSIE Wireless Communications and Networking Laboratory

CHAPTER 8: Hashing


Hashing

Definition: In hashing, the dictionary pairs are stored in a table, ht, called the hash table. The hash table is partitioned into b buckets, ht[0], …, ht[b-1]. The address or location of a pair is determined by a hash function, h, which maps keys into buckets. Thus, for any key k, h(k) is an integer in the range 0 through b-1.


Hash Table

[Figure: a hash table with buckets indexed 0, 1, 2, …, b-1, each bucket holding slots (slot 0, slot 1); a key x is mapped to bucket H(x).]

Terminologies

Identifier density: The identifier density of a hash table is the ratio n/T, where n is the number of identifiers in the table and T is the number of distinct possible identifier values.

Loading density: The loading density or loading factor of a hash table is α = n/(sb), where s is the number of slots per bucket and b is the number of buckets.

A higher loading density means better utilization of the table but more collisions.

Terminologies

Collision: A collision occurs when the home bucket for the new pair is not empty at the time of insertion.

Overflow: An overflow occurs when the home bucket for the new pair is full at the time of insertion. Since many keys typically have the same home bucket, this can happen whenever we wish to insert a pair into the dictionary.

A collision does not necessarily cause an overflow, but an overflow necessarily implies a collision.

Hashing Function

Hashing Function Design:

(1) Easy to compute

(2) Minimize the number of collisions

(3) Uniform hash function

Hashing Function

Hashing functions:

Mid-Square

Division (Modulus)

Folding

Digit Analysis

Mid-square

The mid-square hash function determines the home bucket for a key by squaring the key and then using an appropriate number of bits from the middle of the square to obtain the bucket address.

ex) We assume the key = 8125, and the hash table has 1000 buckets.

(8125)^2 = 66015625

so the address is "156" or "015", the three middle digits of the square.
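The example above can be sketched in a few lines. This is only an illustration under stated assumptions: it works on decimal digits (as the slide example does) rather than bits, and the function name `mid_square_hash` and the `prefer_right` flag are hypothetical names introduced here to choose between the two equally central windows.

```python
def mid_square_hash(key: int, num_digits: int = 3, prefer_right: bool = False) -> int:
    """Square the key and return num_digits decimal digits from the middle."""
    # Pad with leading zeros so very small squares still have enough digits.
    digits = str(key * key).zfill(num_digits)
    surplus = len(digits) - num_digits
    start = surplus // 2
    # When the surplus is odd there are two central windows; optionally take the right one.
    if prefer_right and surplus % 2 == 1:
        start += 1
    return int(digits[start:start + num_digits])


left_address = mid_square_hash(8125)                      # window "015"
right_address = mid_square_hash(8125, prefer_right=True)  # window "156"
print(left_address, right_address)
```

With three digits the result always falls in the range 0 through 999, which matches a table of 1000 buckets.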


Division

The home bucket is obtained by using the modulo (%) operator. The key x is divided by some number M, and the remainder is used as the home bucket for x.

f(x) = x mod M

M is best chosen to be a prime number.
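A minimal sketch of the division method follows. The sample keys and the two values of M are hypothetical, picked only to show how a modulus that shares a factor with the keys can send them all to one bucket while a prime modulus spreads them out.

```python
def division_hash(key: int, m: int) -> int:
    """Return the home bucket of key for a table of m buckets: f(x) = x mod M."""
    return key % m


sample_keys = [10, 20, 30, 40, 50, 60]  # hypothetical keys, all multiples of 10

buckets_mod_10 = [division_hash(k, 10) for k in sample_keys]  # M shares a factor with every key
buckets_mod_7 = [division_hash(k, 7) for k in sample_keys]    # M is prime

print(buckets_mod_10)
print(buckets_mod_7)
```

With M = 10 every sample key lands in bucket 0, while with M = 7 the six keys occupy six different buckets.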


Folding

In this method the key k is partitioned into several parts, all but possibly the last being of the same length. These partitions are then added together to obtain the hash address for k.

There are two ways of carrying out this addition: (1) Shift, where the parts are added as they are, and (2) Boundary, where every other part is reversed before adding.

ex1) We assume the key = 12320324111220, and the hash table has 1000 buckets.

123|203|241|112|20
(1) 123+203+241+112+020 = 699
(2) 123+302+241+211+020 = 897
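The two sums in ex1 can be reproduced with a short sketch. It assumes the key is handled as a string of decimal digits and that a short last part is padded with a leading zero, as the slide does with 020; the name `fold_hash` and its parameters are hypothetical.

```python
def fold_hash(key: str, part_len: int = 3, boundary: bool = False) -> int:
    """Split key into parts of part_len digits and add them together.

    With boundary=True every other part (the 2nd, 4th, ...) is reversed first.
    """
    parts = [key[i:i + part_len] for i in range(0, len(key), part_len)]
    total = 0
    for index, part in enumerate(parts):
        part = part.zfill(part_len)  # pad a short last part: "20" -> "020"
        if boundary and index % 2 == 1:
            part = part[::-1]        # reverse the digits of every other part
        total += int(part)
    return total


shift_address = fold_hash("12320324111220")
boundary_address = fold_hash("12320324111220", boundary=True)
print(shift_address, boundary_address)
```

For other keys the sum may exceed the largest bucket index; reducing it modulo the number of buckets is one possible remedy, though the slide does not specify one.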

Folding

ex2) [Figure: a second folding example showing (1) Shift and (2) Boundary.]

Digit Analysis

All the keys in the table are known in advance. Each key is interpreted as a number using some radix r. The same radix is used for all the keys in the table. Using this radix, the digits of each key are examined. Digits with the most skewed distributions are deleted until the remaining digits are few enough to give an address in the range of the hash table.

ex) [Figure: phone numbers mapped to addresses.]

Overflow Handling

Linear Open Addressing (Linear Probing)

Quadratic Probing

Rehashing

Chaining


Linear Probing

When an overflow occurs, we search the hash table buckets in the order (H(x)+1) % b, (H(x)+2) % b, …, until we reach the first unfilled bucket or find that the hash table is full.

ex) [Figure: three snapshots of a hash table with buckets 0 through 6: initially holding 10, 75, and 43; after inserting 55; and after inserting 25.]
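Linear probing can be sketched as follows. The sketch assumes one slot per bucket and the hash function H(x) = x mod b; it uses the keys and table size of 11 from the linear probing exercise later in this chapter, because the hash function of the figure above is not given. The name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key with linear probing and return the bucket it ends up in."""
    b = len(table)
    home = key % b  # assumed hash function H(x) = x mod b
    for step in range(b):
        bucket = (home + step) % b  # H(x), H(x)+1, H(x)+2, ... with wraparound
        if table[bucket] is None:   # first unfilled bucket
            table[bucket] = key
            return bucket
    raise OverflowError("hash table is full")


probe_table = [None] * 11
for key in [26, 17, 20, 9, 34, 32, 15, 21]:
    linear_probe_insert(probe_table, key)
print(probe_table)
```

Keys 9, 32, 15, and 21 all find their home bucket taken and move to the next free one, which is how clusters start to form.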


Linear Probing

Advantages: Simple and easy to implement.

Disadvantages: When clustering occurs, the search time increases rapidly.

Quadratic Probing

When an overflow occurs, we search the hash table buckets H(x), (H(x) ± 1^2) % b, (H(x) ± 2^2) % b, ….

ex) Key k, hash function H

1st search: H(k)
2nd search: (H(k) + 1^2) % b
3rd search: (H(k) - 1^2) % b
4th search: (H(k) + 2^2) % b
5th search: (H(k) - 2^2) % b
…
Nth search: (H(k) ± ((b-1)/2)^2) % b
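The search order above can be sketched as a generator plus an insert routine. This is a sketch under assumptions: one slot per bucket, H(x) = x mod b, and the inputs of the quadratic probing exercise in this chapter (11 buckets). The function names are hypothetical.

```python
def quadratic_probe_sequence(home: int, b: int):
    """Yield H(k), then (H(k) + i^2) % b and (H(k) - i^2) % b for i = 1 .. (b-1)/2."""
    yield home
    for i in range(1, (b - 1) // 2 + 1):
        yield (home + i * i) % b
        yield (home - i * i) % b  # Python's % keeps the result in 0 .. b-1


def quadratic_probe_insert(table: list, key: int) -> int:
    """Insert key into the first unfilled bucket along its probe sequence."""
    b = len(table)
    for bucket in quadratic_probe_sequence(key % b, b):  # assumed H(x) = x mod b
        if table[bucket] is None:
            table[bucket] = key
            return bucket
    raise OverflowError("no unfilled bucket found along the probe sequence")


quad_table = [None] * 11
for key in [10, 100, 32, 45, 126, 3, 24, 200, 53]:
    quadratic_probe_insert(quad_table, key)
print(quad_table)
```

For example, 200 has home bucket 2 and is probed through buckets 2, 3, 1, 6, and 9 before it finds bucket 9 free.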


Rehashing

The rehashing method uses a series of hash functions h1, h2, …, hm. Buckets hi(k), 1 ≤ i ≤ m, are examined in that order.
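A minimal sketch of rehashing follows. The slide does not say which hash functions to use, so the three functions below, the table size of 11, and the sample keys are all hypothetical and serve only to show the order in which buckets are examined.

```python
def rehash_insert(table: list, key: int, hash_functions: list) -> int:
    """Try h1(k), h2(k), ..., hm(k) in order and use the first unfilled bucket."""
    for h in hash_functions:
        bucket = h(key) % len(table)
        if table[bucket] is None:
            table[bucket] = key
            return bucket
    raise OverflowError("every hash function led to a filled bucket")


# Hypothetical series h1, h2, h3 for a table of 11 buckets.
hash_functions = [
    lambda k: k % 11,
    lambda k: (k // 11) % 11,
    lambda k: (3 * k + 1) % 11,
]

rehash_table = [None] * 11
positions = [rehash_insert(rehash_table, k, hash_functions) for k in [10, 32, 120]]
print(positions)
```

Here 10 is placed by h1, 32 overflows at h1 and is placed by h2, and 120 overflows at both h1 and h2 before h3 places it.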


Chaining

Many of the comparisons can be saved if we maintain lists of keys, one list per bucket, each list containing all the synonyms for that bucket.

[Figure: snapshots of a chained hash table with buckets 0 through 6: initially holding 10, 75, and 43; after inserting 55; and after inserting 25.]
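Chaining can be sketched with one Python list per bucket. The sketch assumes H(x) = x mod b and appends each new key to the end of its bucket's list; it uses the keys and table size of 11 from the chaining exercise later in this chapter. The name `chain_insert` is hypothetical.

```python
def chain_insert(table: list, key: int) -> int:
    """Append key to the list of its home bucket and return that bucket."""
    bucket = key % len(table)  # assumed hash function H(x) = x mod b
    table[bucket].append(key)  # synonyms share one list, so no probing is needed
    return bucket


chain_table = [[] for _ in range(11)]
for key in [26, 17, 20, 9, 34, 32, 15, 21]:
    chain_insert(chain_table, key)

for index, chain in enumerate(chain_table):
    print(index, chain)
```

A search for a key then only compares against the keys in one list, namely the list of the key's home bucket.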

Question:

Assume that a hash function has the following characteristics:

keys 257 and 567 hash to 3
keys 987 and 313 hash to 6
keys 734, 189 and 575 hash to 5
keys 122 and 391 hash to 8

Assume that insertions are done in the order 257, 987, 122, 575, 189, 734, 567, 313, 391.

(1) Indicate the position of the data if open addressing with probing is used to resolve collisions.
(2) Indicate the position of the data if chaining with separate lists is used to resolve collisions.

Hash Table: buckets 0 1 2 3 4 5 6 7 8 9 10

Question:

If H(x) = x mod 7 and separate chaining resolves collisions, what does the hash table look like after the following insertions occur: 8, 10, 24, 15, 32, 17? Assume that each table item contains only a search key.

Question:

Suppose the hashing function f(x) = x mod 11 is used to hash a list of input values (in the given order) into a hash table implemented by the array bucket[0], bucket[1], …, bucket[10]. The inputs are 10, 100, 32, 45, 126, 3, 24, 200, and 53. Each bucket can hold only one number. Overflow is resolved by quadratic probing, which examines buckets f(x), (f(x) + i^2) mod 11, and (f(x) - i^2) mod 11, i = 1 to 5. Show the final contents in bucket[0] to bucket[10].

Ans:

bucket  value
0       32
1       100
2       45
3       3
4
5       126
6       24
7
8       53
9       200
10      10

Question:

For each hash table below, show the result of inserting the following sequence of key values, in the given order, into an initially empty hash table of that type: 26, 17, 20, 9, 34, 32, 15, 21. In both cases, assume a hash table size of 11 and a hash function h(x) = x mod 11.

(1) Static hash table that uses chaining
(2) Hash table that uses linear probing

(1) Chaining

0
1   → 34
2
3
4   → 26 → 15
5
6   → 17
7
8
9   → 20 → 9
10  → 32 → 21

(2) Linear probing

0   32
1   34
2   21
3
4   26
5   15
6   17
7
8
9   20
10  9
