Part II
Chapter 8 Hashing
Introduction
Consider the insertion, searching, and deletion operations on a dictionary (symbol table).

Operation   Array (sorted / not sorted)   Linked list   Tree (unbalanced)   Tree (balanced)
Insertion   O(n) / O(1)                   O(h)          O(h)                O(log_k n)
Searching   O(log n) / O(1)               O(h)          O(h)                O(log_k n)
Deletion    O(n) / O(1)                   O(h)          O(h)                O(log_k n)

Is it possible to perform these operations in O(1)?
Introduction
If we find a mapping from a key to an index, then we can locate a record quickly according to its key and perform random access.
[Figure: records S1, S2, S3, … stored at indices 0, 1, 2, …]
Introduction
This mapping can be illustrated as follows:
[Figure: a Key is passed through h to produce the index i]
Hashing: define a function h so that h(Key) = i, where h is called a hash function.
Two kinds:
Static hashing
Dynamic hashing
8.2 Static Hashing
Definition
In static hashing, identifiers/keys are stored in a table of fixed size, called the hash table.
Bucket: each bucket has its own address and is capable of holding a key.
[Figure: the hash function h maps an identifier x to a bucket address h(x); the hash table consists of Bucket 0, Bucket 1, Bucket 2, …, Bucket n, each with slot 1 and slot 2]
Definition
Slot: each bucket may consist of s slots to hold synonyms. Identifiers i1 and i2 are synonyms if h(i1) = h(i2).
Distinct synonyms enter the same bucket as long as the bucket has slots available.
Example
Number of buckets: 26 (Bucket 0 to Bucket 25)
Number of slots for each bucket: 2
Define the hash function f(x):
f(x) = {i | i is the order of the initial of x}.
A and A2 are synonyms.
GA and GB are synonyms.
If “Doll” enters, it will be put in bucket _______ (according to the hash function).
[Figure: hash table with slot 1 and slot 2 per bucket; Bucket 0 holds A and A2, Bucket 3 holds D, GA and GB share one bucket, and the last bucket is Bucket 25]
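The first-letter hash function of this example can be sketched as follows. This is only an illustration: the name first_letter_hash and the convention that A maps to bucket 0 (read off the figure, where A and A2 sit in Bucket 0) are assumptions, not part of the slides.

```python
import string

def first_letter_hash(identifier: str) -> int:
    """Bucket address = alphabetical order of the initial (A -> 0, ..., Z -> 25)."""
    initial = identifier[0]
    if initial not in string.ascii_letters:
        raise ValueError("identifier must start with a letter A-Z")
    return ord(initial.upper()) - ord("A")

# Synonyms are identifiers with the same hash value.
for name in ("A", "A2", "D", "GA", "GB"):
    print(name, "->", first_letter_hash(name))
```

Because only the initial is inspected, every identifier starting with the same letter becomes a synonym of the others.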
Overflow and Collision
Overflow occurs when a new identifier is mapped into a full bucket.
Collision occurs when two non-identical identifiers are hashed into the same bucket.
If the number of slots per bucket is 1, then overflow and collision occur simultaneously.
[Figure: Bucket 0 holds A in slot 1 and A2 in slot 2]
If A3 enters bucket 0, A3 collides with A and A2. The bucket overflows as well.
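The difference between the two terms can be made concrete with a small sketch. It assumes s = 2 slots per bucket as in the figure; the function name insert and the dictionary-based table are illustrative choices only.

```python
SLOTS_PER_BUCKET = 2  # s = 2, as in the figure (assumption for this sketch)

def insert(table, bucket, identifier):
    """Try to place identifier in table[bucket]; return (collision, overflow)."""
    slots = table.setdefault(bucket, [])
    # Collision: a different identifier already lives in this bucket.
    collision = any(other != identifier for other in slots)
    # Overflow: the bucket has no free slot left.
    overflow = len(slots) >= SLOTS_PER_BUCKET
    if not overflow:
        slots.append(identifier)
    return collision, overflow

table = {}
print(insert(table, 0, "A"))   # (False, False)
print(insert(table, 0, "A2"))  # (True, False): collision, but a slot is free
print(insert(table, 0, "A3"))  # (True, True): collision and overflow
```

With SLOTS_PER_BUCKET set to 1, the second insertion would already report both a collision and an overflow.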
8.2.2 Hash Functions
Ideally, we expect to find a hash function that is one-to-one and easy to compute.
Consider the hash function f(x), where f(x) = {i | i is the order of the initial of x}.
This hash function can result in a lot of collisions because it only considers the initial character.
Key point: use as many characters of the identifier as possible.
Common Approaches
Division
Mid-square
Folding
Digit analysis
Division
The most widely used hash function.
The key k is divided by some number D, and the remainder is used as the bucket address:
h(k) = k % D
Since the bucket addresses run from 0 to b-1 when there are b buckets, D is usually chosen to be the number of buckets.
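A minimal sketch of the division method, assuming non-negative integer keys; the name division_hash and the sample divisor 17 are chosen here for illustration.

```python
def division_hash(key: int, divisor: int) -> int:
    """h(k) = k % D; the result is always in the range 0 .. D-1."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return key % divisor

b = 17  # number of buckets, used as the divisor D
for key in (6, 12, 34, 29):
    print(key, "->", division_hash(key, b))
```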
Selecting the Divisor
When the divisor is an even number, odd integers hash into odd home buckets and even integers into even home buckets.
20%14 = 6, 30%14 = 2, 8%14 = 8
15%14 = 1, 3%14 = 3, 23%14 = 9
When the divisor is an odd number, odd (even) integers may hash into any home bucket.
20%15 = 5, 30%15 = 0, 8%15 = 8
15%15 = 0, 3%15 = 3, 23%15 = 8
The bias in the keys does not result in a bias toward either the odd or even home buckets.
Better chance of uniformly distributed home buckets.
So do not use an even divisor.
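The parity effect can be checked directly with the keys used above. The helper name home_bucket_parities is an assumption made for this sketch.

```python
def home_bucket_parities(keys, divisor):
    """Return the set of parities (0 = even, 1 = odd) of the home buckets."""
    return {(key % divisor) % 2 for key in keys}

even_keys = [20, 30, 8]
odd_keys = [15, 3, 23]

# Even divisor 14: the parity of the key carries over to the bucket.
print(home_bucket_parities(even_keys, 14))  # {0}
print(home_bucket_parities(odd_keys, 14))   # {1}

# Odd divisor 15: both parities appear for both groups of keys.
print(home_bucket_parities(even_keys, 15))  # {0, 1}
print(home_bucket_parities(odd_keys, 15))   # {0, 1}
```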
Selecting the Divisor
A similar biased distribution of home buckets is seen, in practice, when the divisor is a multiple of a small prime number such as 3, 5, 7, …
The effect of each prime divisor p of b decreases as p gets larger.
Ideally, choose b so that it is a prime number.
Alternatively, choose b so that it has no prime factor smaller than 20.
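One way to test a candidate b against this rule is sketched below. The function names and the bound parameter are illustrative; the rule itself (prime, or no prime factor smaller than 20) is the one stated above.

```python
def smallest_prime_factor(n: int) -> int:
    """Smallest prime factor of n (n itself when n is prime); requires n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factor = 2
    while factor * factor <= n:
        if n % factor == 0:
            return factor
        factor += 1
    return n

def is_acceptable_divisor(b: int, bound: int = 20) -> bool:
    """True if b is prime or has no prime factor smaller than bound."""
    spf = smallest_prime_factor(b)
    return spf == b or spf >= bound

for b in (14, 15, 17, 529, 667):
    print(b, is_acceptable_divisor(b))
```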
Mid-square
Square the key and then use an appropriate number of bits from the middle of the square as the bucket address.
Example:
Suppose a character is represented in 6 bits and the number of buckets is 2^r.
The key A1 is represented by the integer 92.
92 x 92 = 8464 (binary 10000100010000); the middle r bits of the square form the bucket address.
Mid-square
Example
Key = 113586, m = 10000, where 9999 is the largest bucket address.
Squaring the key gives
12901779396
Taking the middle four digits gives h(x) = 1779.
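The decimal example can be reproduced with a short sketch. It works on decimal digits rather than bits, and when the middle is not uniquely defined it drops floor((n - digits) / 2) low-order digits, which is an assumption chosen to match the example; the name mid_square_hash is illustrative.

```python
def mid_square_hash(key: int, digits: int) -> int:
    """Return `digits` decimal digits taken from the middle of key squared."""
    square = key * key
    total = len(str(square))
    # Number of low-order digits to discard before taking the middle block.
    drop = max((total - digits) // 2, 0)
    return (square // 10**drop) % 10**digits

print(113586 * 113586)             # 12901779396
print(mid_square_hash(113586, 4))  # 1779
```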
Folding
The key k is partitioned into several parts, all of the same length except possibly the last. These parts are then added together to obtain the hash address of k.
Two schemes:
Shift folding
Folding at the boundaries

Example: k = 12320324111220
P1 = 123, P2 = 203, P3 = 241, P4 = 112, P5 = 20

Shift folding: the parts are added as they are.
123 + 203 + 241 + 112 + 20 = 699

Folding at the boundaries: the digits of every other part (P2 and P4) are reversed before adding.
123 + 302 + 241 + 211 + 20 = 897
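Both folding schemes can be sketched on the key from the figure. The key is handled as a digit string and the part width of 3 is taken from the example; the function names are assumptions for illustration.

```python
def split_parts(key: str, width: int):
    """Cut the digit string into parts of `width` digits (the last may be shorter)."""
    return [key[i:i + width] for i in range(0, len(key), width)]

def shift_folding(key: str, width: int) -> int:
    """Add the parts as they are."""
    return sum(int(part) for part in split_parts(key, width))

def boundary_folding(key: str, width: int) -> int:
    """Reverse every other part (P2, P4, ...) before adding."""
    total = 0
    for index, part in enumerate(split_parts(key, width)):
        if index % 2 == 1:
            part = part[::-1]
        total += int(part)
    return total

key = "12320324111220"
print(shift_folding(key, 3))     # 699
print(boundary_folding(key, 3))  # 897
```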
Overflow Handling
An overflow occurs when the home bucket for a new pair (key, element) is full.
We may handle overflows by:
Searching the hash table in some systematic fashion for a bucket that is not full.
    Linear probing (linear open addressing).
    Quadratic probing.
    Rehashing.
Eliminating overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket.
    Array linear list.
    Chain.
Linear Probing
Also called linear open addressing.
Search the buckets one by one until an empty slot is found.
Procedure: suppose b denotes the number of buckets.
1. Compute h(k).
2. Examine the hash table buckets in the order ht[h(k)], ht[(h(k)+1)%b], …, ht[(h(k)+j)%b] until one of the following happens:
    ht[(h(k)+j)%b] has a pair whose key is k; k is found.
    ht[(h(k)+j)%b] is empty; k is not in the table.
    The search returns to ht[h(k)]; the table is full.
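The procedure can be sketched as a small class. It assumes one slot per bucket, integer keys, h(k) = k % b, and no deletions; the class and method names are illustrative, and ht mirrors the array name used in the procedure.

```python
class LinearProbingTable:
    def __init__(self, b: int):
        self.b = b               # number of buckets
        self.ht = [None] * b     # None marks an empty bucket

    def _probe(self, key: int):
        """Yield h(k), (h(k)+1) % b, ..., visiting every bucket exactly once."""
        home = key % self.b
        for j in range(self.b):
            yield (home + j) % self.b

    def search(self, key: int):
        """Return the address of key, or None if key is not in the table."""
        for address in self._probe(key):
            if self.ht[address] is None:
                return None      # empty bucket: key is not in the table
            if self.ht[address] == key:
                return address   # key found
        return None              # back at the home bucket: table is full

    def insert(self, key: int) -> int:
        """Store key in the first empty bucket of its probe sequence."""
        for address in self._probe(key):
            if self.ht[address] is None or self.ht[address] == key:
                self.ht[address] = key
                return address
        raise OverflowError("hash table is full")

table = LinearProbingTable(17)
for key in (6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45):
    table.insert(key)
print(table.ht)
```

The key sequence is the one used in the example with 17 buckets, so the printed array can be compared against that figure.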
Linear Probing
divisor = b (number of buckets) = 17.
Bucket address = key % 17.
• Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45.
[Figure: the resulting hash table, buckets 0 to 16; "-" marks an empty bucket]
Bucket: 0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
Key:    34  0   45  -   -   -   6   23  7   -   -   28  12  29  11  30  33
Consider: when 51 enters, how many comparisons are required?
Linear open addressing tends to create clusters. These clusters become larger as more synonyms enter.
Quadratic Probing
Suppose i is used as the increment.
When overflow occurs, the search is carried out by examining h(x), (h(x)+i^2)%b, and (h(x)-i^2)%b for 1 ≤ i ≤ (b-1)/2, where b is a prime number of the form 4j+3.
For example, b = 3, 7, 11, …, 43, 59, …
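The probe order can be sketched as a generator. The name quadratic_probe_sequence is an assumption; the code relies on Python's % returning a non-negative result for a positive modulus, so (home - i*i) % b stays a valid bucket address.

```python
def quadratic_probe_sequence(home: int, b: int):
    """Yield h(x), (h(x)+i^2) % b, (h(x)-i^2) % b for i = 1 .. (b-1)/2."""
    yield home % b
    for i in range(1, (b - 1) // 2 + 1):
        yield (home + i * i) % b
        yield (home - i * i) % b

# With b = 7 (a prime of the form 4j+3) the sequence covers all 7 buckets.
print(list(quadratic_probe_sequence(3, 7)))  # [3, 4, 2, 0, 6, 5, 1]
```

For a prime b of the form 4j+3 the sequence visits every bucket exactly once, which is why such values of b are recommended.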
Rehashing
Use a series of hash functions h_1, h_2, …, h_m to find an empty bucket.
If overflow occurs at h_i(x), then try h_{i+1}(x).
[Figure: x is passed through h_1, h_2, …, h_m in turn, ending at h_m(x)]
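A rough sketch of rehashing follows. The slides do not specify the hash functions, so the three functions below (division, quotient modulo b, and square modulo b) are purely hypothetical stand-ins, as are the names rehash_insert and HASH_FUNCTIONS.

```python
B = 17  # number of buckets (assumed for this sketch)

# Hypothetical series h1, h2, h3; any family of hash functions could be used.
HASH_FUNCTIONS = [
    lambda k: k % B,          # h1: division
    lambda k: (k // B) % B,   # h2: quotient, reduced modulo B
    lambda k: (k * k) % B,    # h3: square, reduced modulo B
]

def rehash_insert(ht, key, hash_functions=HASH_FUNCTIONS):
    """Try h1(key), h2(key), ... in turn and store key in the first free bucket."""
    for h in hash_functions:
        address = h(key)
        if ht[address] is None or ht[address] == key:
            ht[address] = key
            return address
    raise OverflowError("all hash functions led to occupied buckets")

ht = [None] * B
print(rehash_insert(ht, 6))   # 6: h1(6) = 6 is free
print(rehash_insert(ht, 23))  # 1: h1(23) = 6 is occupied, h2(23) = 1 is free
```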
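Chaining
[Figure: hash table with buckets [0] to [16]; each non-empty bucket points to a chain of the keys that hash to it]
Keys in each chain (b = 17, bucket address = key % 17):
[0]: 0, 34
[6]: 6, 23
[7]: 7
[11]: 11, 28, 45
[12]: 12, 29
[13]: 30
[16]: 33
Disadvantage of linear probing
Comparison of identifiers with different hash values.
Use linked lists to connect the identifiers with the same hash value and to increase the capacity of a bucket.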
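Chaining can be sketched as follows. For brevity each chain is a Python list rather than a hand-built linked list, which is a substitution made for this sketch; the class name, h(k) = k % b, and appending new keys at the end of a chain are likewise assumptions.

```python
class ChainedHashTable:
    def __init__(self, b: int):
        self.b = b
        self.chains = [[] for _ in range(b)]  # one chain per bucket

    def insert(self, key: int) -> None:
        """Append key to the chain of its home bucket (duplicates are ignored)."""
        chain = self.chains[key % self.b]
        if key not in chain:
            chain.append(key)

    def search(self, key: int) -> bool:
        """Only keys with the same hash value are ever compared."""
        return key in self.chains[key % self.b]

table = ChainedHashTable(17)
for key in (6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45):
    table.insert(key)
for address, chain in enumerate(table.chains):
    if chain:
        print(address, chain)
```

A search for a key such as 51 inspects only the chain of bucket 0, whereas linear probing would also compare it with keys whose home bucket differs.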