CAS CS 460 [Fall 2020] - https://bu-disc.github.io/CS460/ - Manos Athanassoulis
CS460: Intro to Database Systems
Class 11: Hash Indexing
Instructor: Manos Athanassoulis
https://bu-disc.github.io/CS460/
Last time: B+ Trees
[Figure: example B+ tree. Root: 17. Internal nodes: (5, 13) and (21, 24). Leaf entries: 2*, 3*, 5*, 7*, 8*, 14*, 16*, 19*, 20*, 21*, 22*, 23*, 24*, 27*, 29*]
“It could be said that the world’s information is at our fingertips because of B-trees”
Other forms of indexing?
Units
Hash Indexing
Static Hashing
Extendible Hashing
Linear Hashing
Reminder: Alternatives of Data Entries
1. Actual data record (with key value k)
2. <k, rid of matching data record>
3. <k, list of rids of matching data records>
Choice is orthogonal to the indexing technique
Hash-based indexes → equality selections
Cannot support range searches
Static and dynamic hashing techniques exist
Hash function
a function that maps a search key to an index between [0 .. M-1]
where M is the number of buckets (pages) available to our index
• ideally a hash function maps the search keys uniformly in [0, …, M-1]
• in practice simple hash functions are used (fast to compute)
• different keys might be mapped to the same bucket
Static Hashing
# of primary bucket pages fixed, allocated sequentially, never de-allocated; overflow pages if needed
h(k) mod M = bucket to insert data entry with key k (M: # of buckets)
[Figure: key → h → h(key) mod M selects one of the primary bucket pages 0, 1, …, M-1; each primary bucket page may have a chain of overflow pages]
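A minimal Python sketch of the layout described above, under stated assumptions: keys are integers, h is the identity (so the bucket is simply key mod M), and each page holds two entries. The class and method names are hypothetical, not part of the lecture.

```python
class StaticHashIndex:
    """Static hashing: M primary pages, each with a chain of overflow pages."""

    def __init__(self, num_buckets, page_capacity):
        self.M = num_buckets
        self.capacity = page_capacity
        # buckets[i] is a list of pages: page 0 is the primary page,
        # any further pages are overflow pages chained behind it.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _bucket(self, key):
        return key % self.M  # assumption: h(k) = k, so this is h(k) mod M

    def insert(self, key):
        pages = self.buckets[self._bucket(key)]
        if len(pages[-1]) == self.capacity:
            pages.append([])  # last page is full: chain a new overflow page
        pages[-1].append(key)

    def search(self, key):
        # An equality search scans the primary page and its whole chain.
        return any(key in page for page in self.buckets[self._bucket(key)])

    def chain_length(self, key):
        """Number of overflow pages behind the bucket that key maps to."""
        return len(self.buckets[self._bucket(key)]) - 1


index = StaticHashIndex(num_buckets=4, page_capacity=2)
for k in [4, 8, 12, 16, 20, 1]:
    index.insert(k)

print(index.chain_length(4))  # keys 4, 8, 12, 16, 20 all land in bucket 0
print(index.chain_length(1))  # bucket 1 holds a single entry
```

Because the number of primary pages never changes, every key that collides on the same bucket lengthens that bucket's chain, and a search has to read every page in it.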
Static Hashing (Contd.)
Buckets contain data entries
Hash function on search key field of record r
Must distribute values over range 0 ... M-1
What is a good hash function?
h(key) = (a * key + b) usually works well
a and b are constants; lots known about how to tune h
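To illustrate why the constants matter, here is a small Python sketch that counts how many of the keys 0..99 fall into each of M = 10 buckets for two choices of a. The helper name make_hash and the specific values of a, b and M are assumptions chosen for the illustration.

```python
from collections import Counter


def make_hash(a, b, M):
    """Return a function computing (a * key + b) mod M for integer keys."""
    def h(key):
        return (a * key + b) % M
    return h


M = 10
keys = range(100)

# a = 3 shares no factor with M, so every bucket receives the same share.
good = Counter(make_hash(3, 1, M)(k) for k in keys)

# a = 5 shares a factor with M, so only two buckets are ever used.
bad = Counter(make_hash(5, 1, M)(k) for k in keys)

print(sorted(good.items()))
print(sorted(bad.items()))
```

With a = 3 each of the ten buckets receives ten keys; with a = 5 all hundred keys pile into buckets 1 and 6.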
Static Hashing (Problems!)
Long overflow chains can develop and degrade performance
Ways to solve?
– Reorganization (re-hashing) is expensive and may block queries
– Extendible and Linear Hashing: Dynamic techniques to fix this problem
Units
Hash Indexing
Static Hashing
Extendible Hashing
Linear Hashing
Let’s start from Static Hashing
h(k) mod M = bucket to insert data entry with key k (M: # of buckets)
[Figure: key → h → h(key) mod M selects one of the primary bucket pages 0, 1, …, M-1; each primary bucket page may have a chain of overflow pages]
What else can we do instead of adding an overflow page?
Extendible Hashing
Why not double the number of buckets? Note that reading and writing all pages is expensive!
Idea: Use a directory of pointers to buckets
On overflow, double the directory (not the # of buckets)
Why does this help?
– Directory is much smaller than the entire index file
– Only one page of data entries is split
– No overflow page! (caveat: duplicates w.r.t. the hash function)
Trick lies in how the hash function is adjusted!
Extendible Hashing
Directory: an array
Search for k:
– Apply hash function h(k)
– Take the last (global depth) # of bits of h(k)
Insert:
– If the bucket has space, insert, done
– If the bucket is full, split it, re-distribute
– If necessary, double the directory
[Figure: directory with global depth 2 and entries 00, 01, 10, 11, each pointing to a data page with local depth 2: 00 → {4*, 12*}, 01 → {1*, 13*}, 10 → {10*}, 11 → {15*, 7*}]
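The search steps can be sketched in Python against the directory shown in the figure. This assumes the hash value of a key is simply its binary representation (h(k) = k), which is what the examples on the following slides use; the function names are hypothetical.

```python
GLOBAL_DEPTH = 2

# Data pages from the figure (each number is a data entry k*).
page_00 = [4, 12]
page_01 = [1, 13]
page_10 = [10]
page_11 = [15, 7]

# The directory is an array indexed by the last GLOBAL_DEPTH bits of h(k).
directory = [page_00, page_01, page_10, page_11]


def h(key):
    return key  # assumption: the hash value is the key itself


def directory_slot(key, global_depth):
    """Keep only the last global_depth bits of h(key)."""
    return h(key) & ((1 << global_depth) - 1)


def search(key):
    return key in directory[directory_slot(key, GLOBAL_DEPTH)]


print(format(directory_slot(13, GLOBAL_DEPTH), "02b"))  # 13 = 1101
print(search(13), search(6))
```

Masking with (1 << global_depth) - 1 is one common way to take the last bits; doubling the directory then only means using one more bit of the same hash value.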
Example
13* = 1101
[Figure: directory with global depth 2 and entries 00, 01, 10, 11, each pointing to a data page with local depth 2: 00 → {4*, 12*}, 01 → {1*, 13*}, 10 → {10*}, 11 → {15*, 7*}]
what is the hash function?
Example: Insert 6
6* = 0110
[Figure: directory with global depth 2 (entries 00, 01, 10, 11). The last two bits of 0110 are 10 and that bucket has space, so 6* is inserted there. Data pages after the insert (all with local depth 2): 00 → {4*, 12*}, 01 → {1*, 13*}, 10 → {10*, 6*}, 11 → {15*, 7*}]
Example 2: Insert 9
9* = 1001
[Figure: directory with global depth 2. The last two bits of 1001 are 01, and bucket 01 = {1*, 13*} is full.]
now what??
(1) double the directory (global depth becomes 3; entries 000 … 111)
(2) re-distribute the split bucket
(3) connect corresponding buckets
[Figure: directory with global depth 3 after inserting 9*. Data pages (local depth in parentheses): 000 and 100 → {4*, 12*} (2), 001 → {1*, 9*} (3), 101 → {13*} (3), 010 and 110 → {10*, 6*} (2), 011 and 111 → {15*, 7*} (2)]
do we have to re-distribute all?
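Step (2) touches only the entries of the bucket being split. A minimal Python sketch, assuming h(k) = k as in the example; split_bucket is a hypothetical helper that separates the entries on the one extra bit that the new local depth takes into account.

```python
def split_bucket(entries, local_depth):
    """Split one bucket on the next bit (bit number local_depth, counted
    from the right starting at 0). Entries with that bit clear stay in the
    old bucket; entries with it set move to the split image."""
    bit = 1 << local_depth
    stays = [k for k in entries if not k & bit]
    moves = [k for k in entries if k & bit]
    return stays, moves, local_depth + 1


# Bucket 01 held 1* and 13* with local depth 2; 9* is the entry being inserted.
stays, moves, new_depth = split_bucket([1, 13, 9], local_depth=2)
print(stays, moves, new_depth)
```

Here 1 (0001) and 9 (1001) have the third bit clear and stay under 001, while 13 (1101) moves to the new bucket under 101. All other buckets are left untouched; their directory entries simply appear twice.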
Example 3: Insert 5
5* = 0101
[Figure: directory with global depth 3. The last three bits of 0101 are 101, so 5* joins 13* in bucket 101. Data pages after the insert (local depth in parentheses): 000 and 100 → {4*, 12*} (2), 001 → {1*, 9*} (3), 101 → {13*, 5*} (3), 010 and 110 → {10*, 6*} (2), 011 and 111 → {15*, 7*} (2)]
what happens if we want to insert 17?
[17 → 10001] so, double the dir again!
do we have to re-distribute all?
do we have to double the directory every time we split a bucket?
Example 3: Insert 14
14* = 1110
[Figure: directory with global depth 3. The last three bits of 1110 are 110, which points to the full bucket {10*, 6*} with local depth 2. Because its local depth is smaller than the global depth, the bucket is split without doubling the directory. Data pages after the insert (local depth in parentheses): 000 and 100 → {4*, 12*} (2), 001 → {1*, 9*} (3), 101 → {13*, 5*} (3), 010 → {10*} (3), 110 → {6*, 14*} (3), 011 and 111 → {15*, 7*} (2)]
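The whole sequence of examples can be replayed with a short Python sketch. It is an illustration under stated assumptions, not the lecture's implementation: h(k) = k, each bucket holds two entries (as the examples suggest), and the class and attribute names are hypothetical. More than two entries with an identical hash value would make the re-insert loop split forever, which is the duplicates caveat mentioned earlier.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, global_depth=2, capacity=2):
        self.global_depth = global_depth
        self.capacity = capacity
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _slot(self, key):
        return key & ((1 << self.global_depth) - 1)  # last global_depth bits

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # (1) Double the directory: slot i and slot i + old_size
            # start out pointing to the same bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # (2) Split only the full bucket, on one more bit.
        bit = 1 << bucket.local_depth
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        pending = bucket.keys + [key]
        bucket.keys = []
        # (3) Re-point the directory slots that now belong to the split image.
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & bit:
                self.directory[slot] = image
        for k in pending:
            self.insert(k)  # re-distribute; splits again if still overfull


index = ExtendibleHash()
for k in [4, 12, 1, 13, 10, 15, 7]:  # initial contents of the figure
    index.insert(k)
for k in [6, 9, 5, 14]:              # the inserts from the examples
    index.insert(k)

layout = {format(s, "03b"): b.keys for s, b in enumerate(index.directory)}
print(index.global_depth)
print(layout)
```

Inserting 9 doubles the directory because the full bucket's local depth equals the global depth; inserting 14 splits a bucket whose local depth is smaller, so the directory stays at eight entries.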
Notes on Extendible Hashing
How many disk accesses for equality search?
– One if directory fits in memory, else two
Directory grows in spurts, and, if the distribution of hash values is skewed, can grow large
Notes on Extendible Hashing
Do we ever need overflow pages?
– Multiple entries with same hash value cause problems!
Delete: Reverse of inserts
– Can merge with split image
– Can shrink the directory by half. When?
  Each directory element points to same bucket as its split image
– Is shrinking/merging a good idea?
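The shrinking condition can be checked with a few lines of Python. This sketch assumes the directory is a plain list indexed by the last bits of the hash value, so the split image of slot i sits at slot i + half; can_halve is a hypothetical helper name.

```python
def can_halve(directory):
    """True if every slot points to the same bucket as its split image."""
    half = len(directory) // 2
    return half >= 1 and all(
        directory[i] is directory[i + half] for i in range(half)
    )


# Five distinct bucket objects standing in for data pages.
a, b, c, d, e = ([] for _ in range(5))

just_doubled = [a, b, c, d, a, b, c, d]  # no bucket has been split yet
after_split = [a, b, c, d, a, e, c, d]   # slot 101 now has its own bucket

print(can_halve(just_doubled), can_halve(after_split))
```

When the check succeeds, the second half of the directory carries no extra information and can be dropped, reducing the global depth by one.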
Units
Hash Indexing
Static Hashing
Extendible Hashing
Linear Hashing
Linear Hashing
another dynamic hashing scheme
LH handles overflow chains without a directory
Idea: Use overflow pages, and split pages in a round-robin fashion
Example
[Figure: four buckets, addressed by h0 (last 2 bits) and, once split, by h1 (last 3 bits): 00/000 → {4*, 8*}, 01/001 → {1*, 13*}, 10/010 → {10*}, 11/011 → {15*, 7*}. The h0/h1 columns are shown for information only; they are not really kept. “Next bucket to split” points to bucket 00.]
what happens when we insert 5?
(1) 5 goes to an overflow page
(2) we split the “next” page
(3) we move the “next” pointer
[Figure: after inserting 5*: 000 → {8*}, 01 → {1*, 13*} with overflow page {5*}, 10 → {10*}, 11 → {15*, 7*}, new bucket 100 → {4*}. “Next bucket to split” now points to bucket 01.]
Example: Insert 2
[Figure: 2* is placed in bucket 10, which has space, so nothing is split: 000 → {8*}, 01 → {1*, 13*} with overflow page {5*}, 10 → {10*, 2*}, 11 → {15*, 7*}, 100 → {4*}. “Next bucket to split” still points to bucket 01.]
Example: Insert 3
what happens when we insert 3?
(1) 3 goes to an overflow page
(2) we split the “next” page
(3) we move the “next” pointer
[Figure: after inserting 3*: 000 → {8*}, 001 → {1*}, 10 → {10*, 2*}, 11 → {15*, 7*} with overflow page {3*}, 100 → {4*}, new bucket 101 → {13*, 5*}. “Next bucket to split” now points to bucket 10.]
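The three inserts can be replayed with a compact Python sketch. It rests on several assumptions that are not stated in the slides: h0 and h1 take the last 2 and 3 bits of the key (key mod 4 and key mod 8), a primary page holds two entries, entries beyond that capacity stand in for the overflow page, and a split is triggered whenever an insert lands in a bucket that is already full. All names are hypothetical.

```python
class LinearHash:
    def __init__(self, initial_buckets=4, capacity=2):
        self.n0 = initial_buckets  # number of buckets at the start of the round
        self.capacity = capacity   # entries per primary page
        self.next = 0              # next bucket to split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key):
        b = key % self.n0                # h0
        if b < self.next:                # this bucket was already split
            b = key % (2 * self.n0)      # h1 picks the bucket or its image
        return b

    def search(self, key):
        return key in self.buckets[self._address(key)]

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        overflow = len(bucket) >= self.capacity
        bucket.append(key)  # (1) past capacity, this models the overflow page
        if overflow:
            self._split_next()

    def _split_next(self):
        # (2) Split the "next" bucket with h1, whichever bucket overflowed.
        old = self.buckets[self.next]
        h1 = 2 * self.n0
        self.buckets[self.next] = [k for k in old if k % h1 == self.next]
        self.buckets.append([k for k in old if k % h1 != self.next])
        # (3) Move the "next" pointer; after a full round h1 becomes h0.
        self.next += 1
        if self.next == self.n0:
            self.n0 *= 2
            self.next = 0


lh = LinearHash()
for k in [4, 8, 1, 13, 10, 15, 7]:  # initial contents of the figure
    lh.insert(k)
for k in [5, 2, 3]:                 # the inserts from the examples
    lh.insert(k)

print(lh.next)
print(lh.buckets)
```

Note that the bucket that overflows (01, then 11) is not the one that gets split (00, then 01): the overflow only triggers a split at the round-robin position, and the overflowing bucket is relieved later when the pointer reaches it.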
Linear Hashing
h0, h1, h2 … can be more general hash functions
when h0 hits a bucket that has already been split, we employ h1 to decide which of the two buckets (the original or its split image) to look in
if the second is also split we use h2 and so on
Benefit: buckets are split round-robin → no long chains
Hash Indexing
Hash indexes: best for equality searches
Static Hashing can lead to long overflow chains
Extendible Hashing
– avoids overflow pages by splitting a bucket when full
– directory to keep track of buckets
– dir. can get too large (> memory) when data is skewed
Linear Hashing
– avoids directory by splitting buckets round-robin
– uses overflow pages
– overflow chains not likely to be long