File Structure
Chapter 11. Hashing
- 2 -File Structures - Chapter 11 -File Structures - Chapter 11 -
Contents
Introduction
A Simple Hashing Algorithm
Hashing Functions and Record Distributions
How Much Extra Memory Should Be Used?
Collision Resolution by Progressive Overflow
Storing More Than One Record per Address: Buckets
Making Deletions
Other Collision Resolution Techniques
Patterns of Record Access
1. Introduction
O-notation
  O(1)
  O(N) : sequential searching
  O(log2 N)
  O(logk N) : B-Tree (k : leaf node size)
What is Hashing?
  a = h(K)
  h (hash function), K (key), a (home address)
Example
  K = BASS
  h = (first char * second char) mod 1000
  a = h(K) = (66 * 65) mod 1000 = 4,290 mod 1000 = 290
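The BASS example can be checked with a few lines of Python. This is only a minimal sketch: the function name `simple_hash` and the `address_space` parameter are illustrative and do not come from the slides, and keys are assumed to be ASCII strings of at least two characters.

```python
def simple_hash(key: str, address_space: int = 1000) -> int:
    """Multiply the ASCII codes of the first two characters, then take the remainder."""
    if len(key) < 2:
        raise ValueError("key needs at least two characters")
    return (ord(key[0]) * ord(key[1])) % address_space


# 'B' is 66 and 'A' is 65, so this evaluates (66 * 65) mod 1000.
print(simple_hash("BASS"))
```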
Collision
Example
  key : LOWELL  => a = (76 * 79) mod 1000 = 6,004 mod 1000 = 4
  key : OLIVIER => a = (79 * 76) mod 1000 = 6,004 mod 1000 = 4
Several ways to reduce the number of collisions
  1. Spread out the records: good hashing algorithms
  2. Use extra memory
  3. Put more than one record at a single address: buckets
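One way to see collisions in a set of keys is to group the keys by home address and keep the groups with more than one member (the synonyms). The sketch below does this for the first-two-characters hash used in the example; `first_two_hash` and `find_synonyms` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def first_two_hash(key: str, address_space: int = 1000) -> int:
    # Toy hash from the example: product of the first two ASCII codes, mod the address space.
    return (ord(key[0]) * ord(key[1])) % address_space


def find_synonyms(keys):
    """Group keys by home address and keep only the addresses shared by several keys."""
    by_address = defaultdict(list)
    for key in keys:
        by_address[first_two_hash(key)].append(key)
    return {addr: group for addr, group in by_address.items() if len(group) > 1}


print(find_synonyms(["LOWELL", "OLIVIER", "BASS"]))
```

LOWELL and OLIVIER collide because multiplication is commutative, so swapping the first two letters cannot change the address. A better hash function would make the result depend on character position.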
2. A Simple Hashing Algorithm
3 Steps
  1. Represent the key in numerical form
  2. Fold and add
  3. Divide by a prime number and use the remainder as the address
Example
Step 1. Represent the Key in Numerical Form
  LOWELL = 76 79 87 69 76 76 32 32 32 32 32 32
           L  O  W  E  L  L  (blanks)
Example (continued)
Step 2. Fold and Add
  76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32
  7679 + 8769 + 7676 + 3232 + 3232 = 30588
  (30588 + 3232 = 33820 exceeds the 2-byte maximum value of 32767, so each intermediate sum is reduced mod 19937)
  7679 + 8769 = 16448   => 16448 mod 19937 = 16448
  16448 + 7676 = 24124  => 24124 mod 19937 = 4187
  4187 + 3232 = 7419    => 7419 mod 19937 = 7419
  7419 + 3232 = 10651   => 10651 mod 19937 = 10651
  10651 + 3232 = 13883  => 13883 mod 19937 = 13883
Step 3. Divide by the Size of the Address Space
  a = s mod n (n : # of addresses in file)
  a = 13883 mod 100 = 83
  a = 13883 mod 101 = 46
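The three steps can be sketched in Python as follows. The names `fold_and_add` and `hash_address`, and the parameters `width` and `limit`, are illustrative choices rather than anything defined on the slides. The sketch assumes keys of at most 12 characters whose ASCII codes are below 100 (uppercase letters and blanks), so that pairing two codes as `first * 100 + second` matches the decimal folding shown above.

```python
def fold_and_add(key: str, width: int = 12, limit: int = 19937) -> int:
    """Steps 1 and 2: pad with blanks, fold into two-character pieces, add mod `limit`."""
    padded = key.ljust(width)[:width]      # fixed-length key, blank padded (width must be even)
    total = 0
    for i in range(0, width, 2):
        piece = ord(padded[i]) * 100 + ord(padded[i + 1])   # 'L', 'O' -> 7679
        total = (total + piece) % limit    # keeps every intermediate sum below 32767
    return total


def hash_address(key: str, n_addresses: int) -> int:
    """Step 3: the remainder after dividing by the address-space size is the home address."""
    return fold_and_add(key) % n_addresses


print(fold_and_add("LOWELL"), hash_address("LOWELL", 100), hash_address("LOWELL", 101))
```

Dividing by a prime such as 101 rather than 100 is the variant named in step 3; both divisors are shown because the slide gives both results.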
3. Hashing Functions and Record Distributions
Distributing Records among Addresses
[Figure: three panels mapping records A-G to addresses 1-10: (a) Best, (b) Worst, (c) Acceptable]
<Figure 11.3> Different distributions. (a) Uniform distribution (Best) (b) Worst case (c) Random distribution (Acceptable)
Some Other Hashing Methods
  Better than random
    Examine keys for a pattern (e.g. resident registration numbers)
    Divide the key by a prime number
  Random
    Square the key and take the middle: 453^2 = 205209 (middle digits: 52)
    Radix transformation
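The square-and-take-the-middle method can be sketched as below. The function name `mid_square_hash` and the `digits` parameter are assumptions made for illustration. When the square has an odd surplus of digits the slice leans one position to the left, which is one arbitrary convention among several.

```python
def mid_square_hash(key: int, digits: int = 2) -> int:
    """Square the key and return the middle `digits` digits of the result."""
    squared = str(key * key)
    if len(squared) <= digits:
        return int(squared)                 # nothing to cut away
    start = (len(squared) - digits) // 2    # index where the middle slice begins
    return int(squared[start:start + digits])


# 453 squared is 205209; the two middle digits form the address.
print(mid_square_hash(453))
```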
4. How Much Extra Memory Should Be Used?
Packing Density
  packing density = (# of records) / (# of spaces) = r / N
Example
  r = 75 records, N = 100 addresses
  packing density = 75 / 100 = 0.75 = 75%
Predicting Collisions for Different Packing Densities
  Packing density (%)   Synonyms (%)
  10                     4.8
  40                    17.6
  70                    28.1
  90                    34.1
  100                   36.8
<Table 11.2> Effect of packing density on the proportion of records not stored at their home addresses
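The slides give the table without its derivation. The figures are consistent with the usual Poisson approximation for a random hash function with one record per address, and the sketch below assumes that model. The function name `overflow_fraction` is illustrative.

```python
import math


def overflow_fraction(packing_density: float) -> float:
    """Expected fraction of records not stored at their home address (one record per address)."""
    d = packing_density                     # d = r / N
    # Under a Poisson model, a fraction 1 - e^(-d) of the N addresses receives at least
    # one record, so N * (1 - e^(-d)) records sit at home; the rest of the r = d * N overflow.
    return 1.0 - (1.0 - math.exp(-d)) / d


for percent in (10, 40, 70, 90, 100):
    print(percent, round(100 * overflow_fraction(percent / 100), 1))
```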
5. Collision Resolution by Progressive Overflow
Progressive Overflow
  Open addressing
  Linear probing
  Address   Record
  0
  1
  2         Rosen     <- Novak’s home address
  3         Jasper    <- York’s home address
  4         York
  York  : h(K) = 3
  Novak : h(K) = 2
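A minimal sketch of progressive overflow (linear probing) follows. Because the slides give only home addresses and no hash function, the home address is passed in explicitly. The class name, the file size of 6, and the home addresses assumed for Rosen (2) and Jasper (3) are illustrative assumptions.

```python
class ProgressiveOverflowFile:
    """Fixed-size file of slots with collision resolution by linear probing."""

    def __init__(self, size: int):
        self.slots = [None] * size

    def insert(self, key: str, home: int) -> int:
        """Store key at its home address or the next free slot; return the actual address."""
        size = len(self.slots)
        for step in range(size):
            addr = (home + step) % size      # wrap around at the end of the file
            if self.slots[addr] is None:
                self.slots[addr] = key
                return addr
        raise RuntimeError("file is full")

    def search(self, key: str, home: int) -> int:
        """Return the address holding key, or -1 once an empty slot is reached."""
        size = len(self.slots)
        for step in range(size):
            addr = (home + step) % size
            if self.slots[addr] is None:
                return -1
            if self.slots[addr] == key:
                return addr
        return -1


f = ProgressiveOverflowFile(6)
for key, home in [("Rosen", 2), ("Jasper", 3), ("York", 3), ("Novak", 2)]:
    print(key, "stored at", f.insert(key, home))
```

York lands one slot past its home address. Novak has to step over Rosen, Jasper, and York before finding a free slot.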
Search Length
  Key     Home Address   # of Accesses (Search Length)
  Adams   0              1
  Bates   1              1
  Cole    1              2
  Dean    2              2
  Evans   0              5
File contents (address : record)
  0 : Adams
  1 : Bates
  2 : Cole
  3 : Dean
  4 : Evans
  5 : (empty)
Search Length (continued)
Example
  Average Search Length = (total search length) / (total number of records)
  Average Search Length = (1 + 1 + 2 + 2 + 5) / 5 = 2.2
<Figure 11.7> Average search length versus packing density in a hashed file
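The search lengths in the Adams-to-Evans example can be reproduced by loading the records in order with progressive overflow and counting the probes each one needs. This is a sketch under stated assumptions: the function name `search_lengths` and the file size of 6 are illustrative, and the probe count at insertion equals the later search length only if no records are deleted.

```python
def search_lengths(records, size):
    """Load (key, home) pairs with progressive overflow; return {key: search length}."""
    slots = [None] * size
    lengths = {}
    for key, home in records:
        for step in range(size):
            addr = (home + step) % size
            if slots[addr] is None:
                slots[addr] = key
                lengths[key] = step + 1     # accesses needed to reach this record later
                break
        else:
            raise RuntimeError("file is full")
    return lengths


records = [("Adams", 0), ("Bates", 1), ("Cole", 1), ("Dean", 2), ("Evans", 0)]
lengths = search_lengths(records, size=6)
average = sum(lengths.values()) / len(lengths)
print(lengths, average)
```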
6. Storing More Than One Record per Address : Buckets
Buckets
  Key     Home Address
  Green   0
  Hall    0
  Jenks   2
  King    3
  Land    3
  Marx    3
  Nutt    3
File contents (bucket address : records)
  0 : Green, Hall
  1 : (empty)
  2 : Jenks
  3 : King, Land, Marx
  4 : Nutt (overflow from bucket 3)
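The bucket example can be sketched as follows. The bucket size of 3 is inferred from the figure, where bucket 3 holds three records and Nutt spills into bucket 4. The function name `load_buckets` and the choice of five buckets are illustrative assumptions.

```python
def load_buckets(records, n_buckets, bucket_size):
    """Place each (key, home) pair in its home bucket, or the next bucket with room."""
    buckets = [[] for _ in range(n_buckets)]
    for key, home in records:
        for step in range(n_buckets):
            addr = (home + step) % n_buckets
            if len(buckets[addr]) < bucket_size:   # only a full bucket causes overflow
                buckets[addr].append(key)
                break
        else:
            raise RuntimeError("all buckets are full")
    return buckets


records = [("Green", 0), ("Hall", 0), ("Jenks", 2), ("King", 3),
           ("Land", 3), ("Marx", 3), ("Nutt", 3)]
for addr, bucket in enumerate(load_buckets(records, n_buckets=5, bucket_size=3)):
    print(addr, bucket)
```

Green and Hall are synonyms but do not collide in the costly sense, since both fit in bucket 0. Only Nutt, the fourth record hashing to bucket 3, becomes an overflow record.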
Effects of Buckets on Performance
  packing density = r / (bN)
  r : # of records
  N : # of addresses
  b : # of records in a bucket

                                  File without buckets   File with buckets
  # of records                    r = 750                r = 750
  # of addresses                  N = 1000               N = 500
  Bucket size                     b = 1                  b = 2
  Packing density                 0.75                   0.75
  Ratio of records to addresses   r/N = 0.75             r/N = 1.5
<Table 11.4> Synonyms causing collisions as a percent of records for different packing densities and different bucket sizes
  Packing density   Bucket size
                    1      2      5      10
  20 %              9.4    2.2    0.1    0.0
  50 %              21.3   10.4   2.5    0.4
  80 %              31.2   20.4   10.3   5.3
  100 %             36.8   27.1   17.6   12.5
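Values of this kind can be estimated with a Poisson model in which each bucket address receives on average r/N = (packing density) x b records. The slides do not state the model, so the sketch below is an assumption, and `poisson` and `bucket_overflow_fraction` are illustrative names. Results should land close to the table, though the last digit can differ by rounding.

```python
import math


def poisson(x: int, mean: float) -> float:
    """Probability that exactly x records hash to one address."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def bucket_overflow_fraction(packing_density: float, bucket_size: int) -> float:
    """Expected fraction of records that do not fit in their home bucket."""
    b = bucket_size
    mean = packing_density * b               # r / N, expected records per bucket address
    # Expected overflow per address is the sum over x > b of (x - b) * p(x).
    # It is rewritten here as mean - b + sum over x <= b of (b - x) * p(x),
    # which needs only a finite sum.
    shortfall = sum((b - x) * poisson(x, mean) for x in range(b + 1))
    return (mean - b + shortfall) / mean


for b in (1, 2, 5, 10):
    print(b, [round(100 * bucket_overflow_fraction(d, b), 1) for d in (0.2, 0.5, 0.8, 1.0)])
```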
7. Making Deletions
Initial state
  Key      Home Address   Actual Address
  Adams    0              0
  Jones    1              1
  Morris   1              2
  Smith    0              3
File contents (address : record)
  0 : Adams
  1 : Jones
  2 : Morris
  3 : Smith
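(1) Tombstones for Handling Deletions
Before deletion (address : record)
  0 : Adams
  1 : Jones
  2 : Morris
  3 : Smith
* Deletion of Morris
  0 : Adams
  1 : Jones
  2 : ###
  3 : Smith
If address 2 were simply left empty, a search starting at Smith’s home address would stop there: “Smith cannot be found”.
### : tombstone. This mark indicates that a record once lived there but no longer does.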
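The difference between emptying a slot and leaving a tombstone can be sketched as follows. The marker `"###"` is taken from the slide. The function names, the extra empty slot at address 4, and the assumption that no real key equals the marker are illustrative.

```python
TOMBSTONE = "###"   # assumed never to occur as a real key


def search(slots, key, home):
    """Linear probe from home; stop at a truly empty slot, but step over tombstones."""
    size = len(slots)
    for step in range(size):
        addr = (home + step) % size
        if slots[addr] is None:       # never-used slot: the key cannot be further on
            return -1
        if slots[addr] == key:        # a tombstone never matches, so probing continues
            return addr
    return -1


def delete(slots, key, home):
    """Replace the record with a tombstone; return its former address or -1."""
    addr = search(slots, key, home)
    if addr != -1:
        slots[addr] = TOMBSTONE
    return addr


with_tombstone = ["Adams", "Jones", "Morris", "Smith", None]
delete(with_tombstone, "Morris", home=1)
print(search(with_tombstone, "Smith", home=0))    # Smith is still reachable

naive = ["Adams", "Jones", None, "Smith", None]   # Morris simply erased
print(search(naive, "Smith", home=0))             # the search stops at the empty slot
```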
(2) Implications of Tombstones for Insertions
  Inserting “Smith”
(3) Effects of Deletions and Additions on Performance
  Solution to problem of deteriorating average search length: reorganization
8. Other Collision Resolution Techniques
(1) Double Hashing
  Second hashing function
  Adding an increment (c)
  Seek time overhead
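A sketch of double hashing follows: on a collision, the increment c produced by a second hash function is added repeatedly to the home address. The second hash function shown here is purely hypothetical, since the slides do not specify one. The file size is assumed to be prime, so that any c between 1 and size - 1 visits every address.

```python
def second_hash(key: str, size: int) -> int:
    # Hypothetical second hash function; it must never return 0, or probing would not move.
    return 1 + sum(ord(ch) for ch in key) % (size - 1)


def probe_sequence(home: int, c: int, size: int):
    """Yield home, home + c, home + 2c, ... (mod size)."""
    for step in range(size):
        yield (home + step * c) % size


def insert(slots, key: str, home: int) -> int:
    """Store key at the first free address along its probe sequence."""
    c = second_hash(key, len(slots))
    for addr in probe_sequence(home, c, len(slots)):
        if slots[addr] is None:
            slots[addr] = key
            return addr
    raise RuntimeError("file is full")


print(list(probe_sequence(home=3, c=4, size=11)))
```

Because successive probes jump by c instead of 1, overflow records are scattered across the file. That spreads out clusters but is also the source of the seek time overhead noted above.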
(2) Chained Progressive Overflow
  Key     Home address   Actual address   Search length (1)   Search length (2)
  Adams   0              0                1                   1
  Bates   1              1                1                   1
  Cole    0              2                3                   2
  Dean    1              3                3                   2
  Evans   4              4                1                   1
  Flint   0              5                6                   3
  (1) progressive overflow without chaining, (2) with chaining
File contents (address : record, link to next synonym; -1 marks the end of a chain)
  0 : Adams, 2
  1 : Bates, 3
  2 : Cole, 5
  3 : Dean, -1
  4 : Evans, -1
  5 : Flint, -1
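The link fields and the chained search lengths in this example can be reproduced with the sketch below. It is a simplification: it assumes that an occupied home address always holds a synonym of the incoming key, which is true for these six records but not in general. The function names and the file size of 6 are illustrative.

```python
def load_chained(records, size):
    """Load (key, home) pairs by linear probing and link each overflow record to its chain."""
    keys = [None] * size
    links = [-1] * size
    for key, home in records:
        addr, probes = home, 0
        while keys[addr] is not None:        # find the next free slot
            addr = (addr + 1) % size
            probes += 1
            if probes >= size:
                raise RuntimeError("file is full")
        keys[addr] = key
        if addr != home:                     # overflow record: append to the home chain
            tail = home
            while links[tail] != -1:
                tail = links[tail]
            links[tail] = addr
    return keys, links


def chained_search_length(keys, links, key, home):
    """Count accesses when following links from the home address; None if not found."""
    addr, accesses = home, 1
    while keys[addr] != key:
        addr = links[addr]
        if addr == -1:
            return None
        accesses += 1
    return accesses


records = [("Adams", 0), ("Bates", 1), ("Cole", 0), ("Dean", 1), ("Evans", 4), ("Flint", 0)]
keys, links = load_chained(records, size=6)
print(links)
print([chained_search_length(keys, links, k, h) for k, h in records])
```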
(3) Chaining with a Separate Overflow Area
Primary data area (home address : record, link into overflow area)
  0 : Adams, 0
  1 : Bates, 1
  2 : (empty)
  3 : (empty)
  4 : Evans, -1
Overflow area (address : record, link)
  0 : Cole, 2
  1 : Dean, -1
  2 : Flint, -1
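A sketch of chaining with a separate overflow area follows. Records whose home address is taken go into a second area and are linked from the home record. The function name, the list-based record layout, and the five-slot primary area are illustrative assumptions, and the home addresses are those of the chained progressive overflow example.

```python
def load_with_overflow_area(records, n_home):
    """Return (primary, overflow); every stored entry is [key, link], with -1 ending a chain."""
    primary = [None] * n_home
    overflow = []
    for key, home in records:
        if primary[home] is None:
            primary[home] = [key, -1]        # home slot free: store the record there
            continue
        overflow.append([key, -1])           # otherwise store it in the overflow area
        new_index = len(overflow) - 1
        node = primary[home]
        while node[1] != -1:                 # walk to the end of the synonym chain
            node = overflow[node[1]]
        node[1] = new_index
    return primary, overflow


records = [("Adams", 0), ("Bates", 1), ("Cole", 0), ("Dean", 1), ("Evans", 4), ("Flint", 0)]
primary, overflow = load_with_overflow_area(records, n_home=5)
print(primary)
print(overflow)
```

Since overflow records never occupy primary slots, a home address is only ever held by a record that hashes to it. This avoids the complication that chained progressive overflow has when a non-synonym sits at a record's home address.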
(4) Scatter Tables: Indexing Revisited
[Figure: a hash index with entries 0-4 points into a data file holding the records Adams, Cole, Dean, Bates, Flint, and Evans; link fields chain synonyms together and -1 marks the end of a chain]
9. Patterns of Record Access
A small percentage of the records in a file account for a large percentage of the accesses
80 / 20 Rule: 80% of the accesses are performed on 20% of the records