Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that...
-
Upload
tatiana-hammell -
Category
Documents
-
view
213 -
download
0
Transcript of Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that...
Department of Computer Science and Engineering, HKUST Slide 1
Dynamic HashingDynamic Hashing
• Good for database that grows and shrinks in size
• Allows the hash function to be modified dynamically– When the hash function takes modulo 10 in the previous example,
the number of buckets is fixed to 10
– The trick is how to change the hash function so that the number of buckets can change
– And at the same time, without the need of rehashing the existing records!
– Imagine if you change modulo 10 to modulo 13, then every existing record has to be rehashed – not a good idea
Department of Computer Science and Engineering, HKUST Slide 2
Extendable HashingExtendable Hashing
• Extendable hashing - one form of dynamic hashing
– Hashing function generates values over a large range - typically b-bit integers, with b = 32.
• At any time, use only a prefix of the b-bit integers to index into a table of bucket addresses. Let the length of the prefix be i bits, 0 < i < 32
• Initially i = 1, meaning that it can index at most 2 buckets
• When the 2 buckets are full, we can use 2 bits (i = 2), meaning that we can now index at most 4 buckets, and so on and so forth….
• i grows and shrinks as the size of the database grows and shrinks.
• Actual number of buckets is < 2i, which may change due to bucket merging and splitting
b-1 0
i bits
Department of Computer Science and Engineering, HKUST Slide 3
hashprefix i = 1
0
1
hash table
i0 = 0 Bucket 0
Extendable Hash Structure General IdeasExtendable Hash Structure General Ideas
• Initially, i = 1, use 1 bit in the hash key, resulting in two entries in the hash address table
• Suppose we start with only 1 or 2 records, we need only 1 bucket initially
• Both entries in the hash address table point to the same bucket
• i0 = 0 means no bit had been used to separate records in the bucket (I.e., records are all hashed into bucket 0 irrespective of the any bit setting in the hash key values)
New record
Department of Computer Science and Engineering, HKUST Slide 4
hashprefix i = 1
0
1
hash table
i0 = 0 Bucket 0
Extendable Hash Structure Bucket Expansion
Extendable Hash Structure Bucket Expansion
New record
• Suppose bucket 0 is full and a new record arrives
• Create a new bucket, rehash the three records (two existing ones and the new record) into buckets 0 and, according to the last bit of their hash keys
i1 = 1 Bucket 1
Department of Computer Science and Engineering, HKUST Slide 5
hashprefix i = 1
0
1
hash address table
i0 = 1 Bucket 0
General Extendable Hash StructureGeneral Extendable Hash Structure
• Note: why do we need to keep i, i0 and i1?
• i is the maximum number of bits used in hashing so far; i0 and i1 are the number of bits used for these particular buckets
i1 = 1 Bucket 1 Record originallyin bucket 0
New record
Department of Computer Science and Engineering, HKUST Slide 6
hashprefix i = 1
0
1
hash address table
i0 = 1 Bucket 0
General Extendable Hash StructureGeneral Extendable Hash Structure
i1 = 1 Bucket 1
A new record
1) Upon inserting of a new (red) record, bucket 0 is full again2) Bucket 2 is created, and the three records (two existing ones and the new
one) are rehashed among buckets 0 and 2 based on the second bit
i2 = 2 Bucket 2
Department of Computer Science and Engineering, HKUST Slide 7
General Extendable Hash structureGeneral Extendable Hash structure
hashprefix i = 2
00
01
10
11
hash address table
i0 = 2 Bucket 0
Bucket 1 notchanged
i1 = 1 Bucket 1
Bucket 2 is new
use the first 2 bits from the hash key to address
the 4 entries in the table.
2 bits from the hash key had been use to hash the
recordsuse the 3rd bit in next
split
i2 = 2 Bucket 2
1 bit from the hash key had been use to hash the
recordsuse the 2nd bit in next
split
Department of Computer Science and Engineering, HKUST Slide 8
Extendable Hash Structure – PropertiesExtendable Hash Structure – Properties
• Every expansion doubles the number of entries in the table• Multiple entries in the bucket address table may point to the same
bucket. It means that the bucket hasn’t been expanded while other buckets had been expanded multiple times
• Each bucket j stores a value ij; entries in the same bucket have the same values on the first ij bits of the hash keys
• To locate the bucket containing search-key Kj:1. Compute h(Kj) = X2. Use the first i high order bits of X to look up the hash address table, and follow the pointer to appropriate bucket
• To insert a record with search-key value Kj, look up the bucket where it should belong, say j. If there is room in bucket j insert record in the bucket, else the bucket must be split and insertion re-attempted.
Department of Computer Science and Engineering, HKUST Slide 9
Split in Extendable hash StructureSplit in Extendable hash Structure
To split a bucket j when inserting record with search-key value Kj;
• If i > ij (more than one pointer to bucket j)
– allocate a new bucket z, and set ij and iz to the old ij+1.
– make the second half of the bucket address table entries pointing to j to point to z
– remove and reinsert each record in bucket j.
– recompute new bucket for Kj and insert record in the bucket (further splitting is required if the bucket is still full)
1
2
2
2
could be any number > 1
Department of Computer Science and Engineering, HKUST Slide 10
Split in Extendable hash StructureSplit in Extendable hash Structure
To split a bucket j when inserting record with search-key value Kj;• If i = ij (only one pointer to bucket j)
– increment i and double the size of the bucket address table.– Replace each entry in the table by two entries that point to the same
bucket.– Re-compute new bucket address table entry for Kj, now i > ij, so use the
first case above. hash
prefix i = 200
01
10
11
hash address table
i0=2
i1=2
i2=2
i3=2
new record
Department of Computer Science and Engineering, HKUST Slide 11
Example: Use of Extendable Hash Structure
Branch-name h(branch-name)Brighton 0010 1101 1111 1011 0010 1100 0011 0000Downtown 1010 0011 1010 0000 1100 0110 1001 1111Mianus 1100 0111 1110 1101 1011 1111 0011 1010Perryridge 1111 0001 0010 0100 1001 0011 0110 1101Redwood 0011 0101 1010 0110 1100 1001 1110 1011Round hill 1101 1000 0011 1111 1001 1100 0000 0001
Initial Hash structure, Bucket size=2
Bucket 0
hash address table
hash prefix 00
Department of Computer Science and Engineering, HKUST Slide 12
Example
Insert:
0010 1101 1111 1011 0010 1100 0011 0000no bit is needed from the hash value (i=0)
Brighton, A-217, 750
hash prefix 0Brighton, A-217, 750
0
Department of Computer Science and Engineering, HKUST Slide 13
Example
Insert:
1010 0011 1010 0000 1100 0110 1001 1111no bit is needed from the hash value (i=0)
Downtown, A-101, 500
hash prefix 0Brighton, A-217, 750
0
Downtown, A-101, 500
Insert:bucket full, split records according to first bit (i=1)
Downtown, A-101, 600
Department of Computer Science and Engineering, HKUST Slide 14
1
hash prefix 101
1
Example
Brighton 0010 1101 1111 1011 0010 1100 0011 0000Downtown 1010 0011 1010 0000 1100 0110 1001 1111
Brighton, A-217, 750
Downtown, A-101, 500Downtown, A-101, 600
Insert:
1100 0111 1110 1101 1011 1111 0011 1010
Hash into bucket 1, which is full
Mianus, A-215, 700
Department of Computer Science and Engineering, HKUST Slide 15
hash prefix 200011011
1
2
Brighton, A-217, 750
2
Example
Downtown, A-101, 500Downtown, A-101, 600
Mianus, A-215, 700Note: directory size is doubled
1 bit had been usedto allocate records
2 bits had been usedto allocate records
Mianus 1100 0111 1110 1101 1011 1111 0011 1010Downtown 1010 0011 1010 0000 1100 0110 1001 1111
Department of Computer Science and Engineering, HKUST Slide 16
hash prefix 200011011
1
2
Brighton, A-217, 750
2
Example
Downtown, A-101, 500Downtown, A-101, 600
Mianus, A-215, 700
Insert:
1111 0001 0010 0100 1001 0011 0110 1101
Perryridge, A-102, 400
Perryridge, A-102, 400
Insert:
1111 0001 0010 0100 1001 0011 0110 1101
Perryridge, A-201, 900 Bucket 2 overflowsagain!
Department of Computer Science and Engineering, HKUST Slide 17
hash prefix 200011011
1
2
Brighton, A-217, 750
2
Downtown, A-101, 500Downtown, A-101, 600
Mianus, A-215, 700Perryridge, A-102, 400
Example
• Expand hash address table, renumber the entries by adding one more bit to the left
• Bucket 10 becomes bucket 10x, where x could be 0 or 1
• Bucket 11 becomes bucket 110 and 111, because it is split; new bucket is added; local level is updated
00011011
00001111
3
3
Perryridge, A-201, 900
Redistribute overflow andnew record
Department of Computer Science and Engineering, HKUST Slide 18
Example
Insert:
1111 0001 0010 0100 1001 0011 0110 1101
Perryridge, A-218, 700 Bucket 3 overflowsagain!
hash prefix 3000001010011100101110111
1
2
Brighton, A-217, 750
3
Downtown, A-101, 500
Downtown, A-101, 600
3
Mianus, A-215, 700
Perryridge, A-102, 400
Perryridge, A-201, 900
Perryridge, A-218, 700
Department of Computer Science and Engineering, HKUST Slide 19
Example
hash prefix 3000001010011100101110111
1
2
Brighton, A-217, 750
3
Downtown, A-101, 500
Downtown, A-101, 600
3
Mianus, A-215, 700
Perryridge, A-102, 400
Perryridge, A-201, 900
Perryridge, A-218, 700
Round Hill, A-305, 350
Redwood, A-222, 700
Done!
Department of Computer Science and Engineering, HKUST Slide 20
Updates in Extendable Hash StructureUpdates in Extendable Hash Structure
• When inserting a value, if the bucket is full after several splits (that is, i reaches some limits b) create an overflow bucket instead of splitting bucket entry table further.
• To delete a key value, locate it in its bucket and remove it. The bucket itself can be removed if it because empty (with appropriate updates to the bucket address table). Coalescing of buckets and decreasing bucket address table size is also possible.
Department of Computer Science and Engineering, HKUST Slide 21
Extendible Hashing is not Pure HashingExtendible Hashing is not Pure Hashing
• Pure hashing maps a key value directly to the bucket where the record containing the key value can be found.
• Extendible hashing maps a key value to the entry in the hash prefix table which contains a pointer to the bucket where the record containing the key value can be found.
• The hash-prefix table can be considered as a complete binary tree; that is, extendible hashing is a combination of tree and hashing!
hash prefix 200011011 00 01 10 11