Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that...

21
Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that grows and shrinks in size Allows the hash function to be modified dynamically When the hash function takes modulo 10 in the previous example, the number of buckets is fixed to 10 The trick is how to change the hash function so that the number of buckets can change And at the same time, without the need of rehashing the existing records! Imagine if you change modulo 10 to modulo 13, then every existing record has to be rehashed – not a good idea

Transcript of Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that...

Department of Computer Science and Engineering, HKUST Slide 1

Dynamic HashingDynamic Hashing

• Good for database that grows and shrinks in size

• Allows the hash function to be modified dynamically– When the hash function takes modulo 10 in the previous example,

the number of buckets is fixed to 10

– The trick is how to change the hash function so that the number of buckets can change

– And at the same time, without the need of rehashing the existing records!

– Imagine if you change modulo 10 to modulo 13, then every existing record has to be rehashed – not a good idea

Department of Computer Science and Engineering, HKUST Slide 2

Extendable HashingExtendable Hashing

• Extendable hashing - one form of dynamic hashing

– Hashing function generates values over a large range - typically b-bit integers, with b = 32.

• At any time, use only a prefix of the b-bit integers to index into a table of bucket addresses. Let the length of the prefix be i bits, 0 < i < 32

• Initially i = 1, meaning that it can index at most 2 buckets

• When the 2 buckets are full, we can use 2 bits (i = 2), meaning that we can now index at most 4 buckets, and so on and so forth….

• i grows and shrinks as the size of the database grows and shrinks.

• Actual number of buckets is < 2i, which may change due to bucket merging and splitting

b-1 0

i bits

Department of Computer Science and Engineering, HKUST Slide 3

hashprefix i = 1

0

1

hash table

i0 = 0 Bucket 0

Extendable Hash Structure General IdeasExtendable Hash Structure General Ideas

• Initially, i = 1, use 1 bit in the hash key, resulting in two entries in the hash address table

• Suppose we start with only 1 or 2 records, we need only 1 bucket initially

• Both entries in the hash address table point to the same bucket

• i0 = 0 means no bit had been used to separate records in the bucket (I.e., records are all hashed into bucket 0 irrespective of the any bit setting in the hash key values)

New record

Department of Computer Science and Engineering, HKUST Slide 4

hashprefix i = 1

0

1

hash table

i0 = 0 Bucket 0

Extendable Hash Structure Bucket Expansion

Extendable Hash Structure Bucket Expansion

New record

• Suppose bucket 0 is full and a new record arrives

• Create a new bucket, rehash the three records (two existing ones and the new record) into buckets 0 and, according to the last bit of their hash keys

i1 = 1 Bucket 1

Department of Computer Science and Engineering, HKUST Slide 5

hashprefix i = 1

0

1

hash address table

i0 = 1 Bucket 0

General Extendable Hash StructureGeneral Extendable Hash Structure

• Note: why do we need to keep i, i0 and i1?

• i is the maximum number of bits used in hashing so far; i0 and i1 are the number of bits used for these particular buckets

i1 = 1 Bucket 1 Record originallyin bucket 0

New record

Department of Computer Science and Engineering, HKUST Slide 6

hashprefix i = 1

0

1

hash address table

i0 = 1 Bucket 0

General Extendable Hash StructureGeneral Extendable Hash Structure

i1 = 1 Bucket 1

A new record

1) Upon inserting of a new (red) record, bucket 0 is full again2) Bucket 2 is created, and the three records (two existing ones and the new

one) are rehashed among buckets 0 and 2 based on the second bit

i2 = 2 Bucket 2

Department of Computer Science and Engineering, HKUST Slide 7

General Extendable Hash structureGeneral Extendable Hash structure

hashprefix i = 2

00

01

10

11

hash address table

i0 = 2 Bucket 0

Bucket 1 notchanged

i1 = 1 Bucket 1

Bucket 2 is new

use the first 2 bits from the hash key to address

the 4 entries in the table.

2 bits from the hash key had been use to hash the

recordsuse the 3rd bit in next

split

i2 = 2 Bucket 2

1 bit from the hash key had been use to hash the

recordsuse the 2nd bit in next

split

Department of Computer Science and Engineering, HKUST Slide 8

Extendable Hash Structure – PropertiesExtendable Hash Structure – Properties

• Every expansion doubles the number of entries in the table• Multiple entries in the bucket address table may point to the same

bucket. It means that the bucket hasn’t been expanded while other buckets had been expanded multiple times

• Each bucket j stores a value ij; entries in the same bucket have the same values on the first ij bits of the hash keys

• To locate the bucket containing search-key Kj:1. Compute h(Kj) = X2. Use the first i high order bits of X to look up the hash address table, and follow the pointer to appropriate bucket

• To insert a record with search-key value Kj, look up the bucket where it should belong, say j. If there is room in bucket j insert record in the bucket, else the bucket must be split and insertion re-attempted.

Department of Computer Science and Engineering, HKUST Slide 9

Split in Extendable hash StructureSplit in Extendable hash Structure

To split a bucket j when inserting record with search-key value Kj;

• If i > ij (more than one pointer to bucket j)

– allocate a new bucket z, and set ij and iz to the old ij+1.

– make the second half of the bucket address table entries pointing to j to point to z

– remove and reinsert each record in bucket j.

– recompute new bucket for Kj and insert record in the bucket (further splitting is required if the bucket is still full)

1

2

2

2

could be any number > 1

Department of Computer Science and Engineering, HKUST Slide 10

Split in Extendable hash StructureSplit in Extendable hash Structure

To split a bucket j when inserting record with search-key value Kj;• If i = ij (only one pointer to bucket j)

– increment i and double the size of the bucket address table.– Replace each entry in the table by two entries that point to the same

bucket.– Re-compute new bucket address table entry for Kj, now i > ij, so use the

first case above. hash

prefix i = 200

01

10

11

hash address table

i0=2

i1=2

i2=2

i3=2

new record

Department of Computer Science and Engineering, HKUST Slide 11

Example: Use of Extendable Hash Structure

Branch-name h(branch-name)Brighton 0010 1101 1111 1011 0010 1100 0011 0000Downtown 1010 0011 1010 0000 1100 0110 1001 1111Mianus 1100 0111 1110 1101 1011 1111 0011 1010Perryridge 1111 0001 0010 0100 1001 0011 0110 1101Redwood 0011 0101 1010 0110 1100 1001 1110 1011Round hill 1101 1000 0011 1111 1001 1100 0000 0001

Initial Hash structure, Bucket size=2

Bucket 0

hash address table

hash prefix 00

Department of Computer Science and Engineering, HKUST Slide 12

Example

Insert:

0010 1101 1111 1011 0010 1100 0011 0000no bit is needed from the hash value (i=0)

Brighton, A-217, 750

hash prefix 0Brighton, A-217, 750

0

Department of Computer Science and Engineering, HKUST Slide 13

Example

Insert:

1010 0011 1010 0000 1100 0110 1001 1111no bit is needed from the hash value (i=0)

Downtown, A-101, 500

hash prefix 0Brighton, A-217, 750

0

Downtown, A-101, 500

Insert:bucket full, split records according to first bit (i=1)

Downtown, A-101, 600

Department of Computer Science and Engineering, HKUST Slide 14

1

hash prefix 101

1

Example

Brighton 0010 1101 1111 1011 0010 1100 0011 0000Downtown 1010 0011 1010 0000 1100 0110 1001 1111

Brighton, A-217, 750

Downtown, A-101, 500Downtown, A-101, 600

Insert:

1100 0111 1110 1101 1011 1111 0011 1010

Hash into bucket 1, which is full

Mianus, A-215, 700

Department of Computer Science and Engineering, HKUST Slide 15

hash prefix 200011011

1

2

Brighton, A-217, 750

2

Example

Downtown, A-101, 500Downtown, A-101, 600

Mianus, A-215, 700Note: directory size is doubled

1 bit had been usedto allocate records

2 bits had been usedto allocate records

Mianus 1100 0111 1110 1101 1011 1111 0011 1010Downtown 1010 0011 1010 0000 1100 0110 1001 1111

Department of Computer Science and Engineering, HKUST Slide 16

hash prefix 200011011

1

2

Brighton, A-217, 750

2

Example

Downtown, A-101, 500Downtown, A-101, 600

Mianus, A-215, 700

Insert:

1111 0001 0010 0100 1001 0011 0110 1101

Perryridge, A-102, 400

Perryridge, A-102, 400

Insert:

1111 0001 0010 0100 1001 0011 0110 1101

Perryridge, A-201, 900 Bucket 2 overflowsagain!

Department of Computer Science and Engineering, HKUST Slide 17

hash prefix 200011011

1

2

Brighton, A-217, 750

2

Downtown, A-101, 500Downtown, A-101, 600

Mianus, A-215, 700Perryridge, A-102, 400

Example

• Expand hash address table, renumber the entries by adding one more bit to the left

• Bucket 10 becomes bucket 10x, where x could be 0 or 1

• Bucket 11 becomes bucket 110 and 111, because it is split; new bucket is added; local level is updated

00011011

00001111

3

3

Perryridge, A-201, 900

Redistribute overflow andnew record

Department of Computer Science and Engineering, HKUST Slide 18

Example

Insert:

1111 0001 0010 0100 1001 0011 0110 1101

Perryridge, A-218, 700 Bucket 3 overflowsagain!

hash prefix 3000001010011100101110111

1

2

Brighton, A-217, 750

3

Downtown, A-101, 500

Downtown, A-101, 600

3

Mianus, A-215, 700

Perryridge, A-102, 400

Perryridge, A-201, 900

Perryridge, A-218, 700

Department of Computer Science and Engineering, HKUST Slide 19

Example

hash prefix 3000001010011100101110111

1

2

Brighton, A-217, 750

3

Downtown, A-101, 500

Downtown, A-101, 600

3

Mianus, A-215, 700

Perryridge, A-102, 400

Perryridge, A-201, 900

Perryridge, A-218, 700

Round Hill, A-305, 350

Redwood, A-222, 700

Done!

Department of Computer Science and Engineering, HKUST Slide 20

Updates in Extendable Hash StructureUpdates in Extendable Hash Structure

• When inserting a value, if the bucket is full after several splits (that is, i reaches some limits b) create an overflow bucket instead of splitting bucket entry table further.

• To delete a key value, locate it in its bucket and remove it. The bucket itself can be removed if it because empty (with appropriate updates to the bucket address table). Coalescing of buckets and decreasing bucket address table size is also possible.

Department of Computer Science and Engineering, HKUST Slide 21

Extendible Hashing is not Pure HashingExtendible Hashing is not Pure Hashing

• Pure hashing maps a key value directly to the bucket where the record containing the key value can be found.

• Extendible hashing maps a key value to the entry in the hash prefix table which contains a pointer to the bucket where the record containing the key value can be found.

• The hash-prefix table can be considered as a complete binary tree; that is, extendible hashing is a combination of tree and hashing!

hash prefix 200011011 00 01 10 11