Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if...
![Page 1: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/1.jpg)
Modern Database Systems, Lecture 2
Aristides Gionis, Michael Mathioudakis
Spring 2017
![Page 2: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/2.jpg)
in this lecture...
b+ trees and hash-based indexing
external sorting
![Page 3: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/3.jpg)
b+ trees
![Page 4: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/4.jpg)
b+ trees
leaf nodes
contain data entries
sequentially linked
each node stored in one page
data entries can be any one of the three alternative types
type 1: data records; type 2: (k, rid); type 3: (k, rids)
non-leaf nodes
contain index entries
used to direct search
all nodes have between d and 2d key entries
d is the order of the tree
i.e., at least 50% capacity - except for root!
in the examples that follow... alternative 2 is used
![Page 5: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/5.jpg)
b+ trees
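closer look at non-leaf nodes
non-leaf nodes contain index entries, used to direct search
an index node holds m search key values and m + 1 pointers:
P0 | K1 | P1 | K2 | P2 | ... | Km | Pm
P0 leads to entries with k* < K1
P1 leads to entries with K1 ≤ k* < K2
Pm leads to entries with Km ≤ k*
leaf nodes contain data entries and are sequentially linked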
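The pointer rule above can be sketched in a few lines of Python. This is only an illustration, assuming the keys of one index node are held in a sorted list; the function name `child_index` is hypothetical and not from the lecture.

```python
from bisect import bisect_right

def child_index(keys, k):
    """Return i such that pointer P_i should be followed for search key k.

    keys is the sorted list [K_1, ..., K_m] of one non-leaf node;
    the node has m + 1 pointers P_0 ... P_m.
    """
    # bisect_right counts the keys that are <= k, which is the index of
    # the pointer whose range K_i <= k < K_{i+1} contains k.
    return bisect_right(keys, k)

keys = [13, 17, 24, 30]          # root keys of the example tree used in this lecture
print(child_index(keys, 5))      # 0: 5 < 13, follow P_0
print(child_index(keys, 24))     # 3: 24 <= 24 < 30, follow P_3
print(child_index(keys, 39))     # 4: 30 <= 39, follow P_4
```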
![Page 6: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/6.jpg)
b+ trees
most widely used index
search and updates at $\log_F N$ cost (cost = number of page I/Os)
F = fanout (number of pointers per index node); N = number of leaf pages
efficient equality and range queries
![Page 7: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/7.jpg)
example b+ tree - search
search begins at root, and key comparisons direct it to a leaf
e.g., search for 5*; search for all data entries >= 24*
root: 13 17 24 30
leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
![Page 8: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/8.jpg)
inserting a data entry
1. find correct leaf L
2. place data entry onto L
   a. if L has enough space, done!
   b. else must split L into L and L2
      • redistribute entries evenly
      • copy up the middle key to parent of L
      • insert entry pointing to L2 to parent of L
the above happens recursively
when index nodes are split, push up middle key
splits grow the tree
root split increases height
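The difference between "copy up" (leaf split) and "push up" (index node split) can be sketched as follows. This is a simplified illustration that handles keys only and leaves out child pointers; the function names are hypothetical, and d = 2 is assumed to match the example tree in this lecture.

```python
def split_leaf(entries, d):
    """Split an overfull leaf holding 2d + 1 sorted entries.

    Returns (left, right, separator). The separator is COPIED up:
    it is also the first entry of the right leaf.
    """
    assert len(entries) == 2 * d + 1
    left, right = entries[:d], entries[d:]
    return left, right, right[0]

def split_index_node(keys, d):
    """Split an overfull index node holding 2d + 1 sorted keys.

    Returns (left, right, separator). The separator is PUSHED up:
    it appears in neither half.
    """
    assert len(keys) == 2 * d + 1
    return keys[:d], keys[d + 1:], keys[d]

# Inserting 8* into the leaf [2* 3* 5* 7*] of the example tree:
print(split_leaf([2, 3, 5, 7, 8], d=2))            # ([2, 3], [5, 7, 8], 5)
# The parent then holds 5 13 17 24 30 and must split as well:
print(split_index_node([5, 13, 17, 24, 30], d=2))  # ([5, 13], [24, 30], 17)
```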
![Page 9: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/9.jpg)
example b+ tree
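insert 8*
before: root: 13 17 24 30
leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
leaf L = [2* 3* 5* 7*] overflows with 8*: split into L = [2* 3*] and L2 = [5* 7* 8*]
middle key 5 is copied up (and continues to appear in the leaf): entries < 5 in L, entries ≥ 5 in L2
parent would hold 5 13 17 24 30: split parent node!
middle key 17 is pushed up: keys < 17 (5 13) go left, keys ≥ 17 (24 30) go right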
![Page 10: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/10.jpg)
example b+ tree
insert 8*
before: root: 13 17 24 30
leaves: [2* 3* 5* 7*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
after: root: 17
index nodes: [5 13] [24 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
![Page 11: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/11.jpg)
deleting a data entry
inverse of insertion
deletion vs insertion:
remove data entry vs add data entry
when nodes are less than half-full vs when nodes overflow
re-distribute entries & (maybe) merge nodes vs split nodes & re-distribute entries
![Page 12: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/12.jpg)
b+ trees in practice
typical order d = 100, fill-factor = 67%
average fan-out 133
typical capacities:
for height 4: $133^4$ = 312,900,721 records
for height 3: $133^3$ = 2,352,637 records
can often hold top levels in main memory
level 1: 1 page = 8 KBytes
level 2: 133 pages ≈ 1 MByte
level 3: 17,689 pages ≈ 138 MBytes
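The capacity figures can be recomputed with a short script. This is a back-of-the-envelope sketch that uses only the fan-out of 133 and the 8 KByte page size stated above; the helper names are made up for illustration, and 1 MByte is taken as 1024 KBytes.

```python
def capacity(fanout, height):
    """Number of data entries reachable through `height` levels of fan-out."""
    return fanout ** height

def pages_on_level(fanout, level):
    """Number of index pages on a level (level 1 is the root)."""
    return fanout ** (level - 1)

FANOUT = 133   # average fan-out from the slide
PAGE_KB = 8    # page size from the slide

for height in (3, 4):
    print(f"height {height}: {capacity(FANOUT, height):,} records")

for level in (1, 2, 3):
    pages = pages_on_level(FANOUT, level)
    print(f"level {level}: {pages:,} pages = {pages * PAGE_KB / 1024:.1f} MBytes")
```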
![Page 13: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/13.jpg)
hash-based indexes
![Page 14: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/14.jpg)
hash-based index
the index supports equality queries
does not support range queries
static and dynamic variants exist
data entries organized in M buckets
bucket = a collection of pages
the data entry for record with search key value key is assigned to bucket given by hashing function
h(key) mod M
e.g., h(key) = α key + β
[figure: key → h → h(key) mod M → buckets 0, 1, 2, ..., M-1]
![Page 15: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/15.jpg)
static hashing
number of buckets is fixed
start with one page per bucket
allocated sequentially, never de-allocated
can use overflow pages
[figure: key → h → h(key) mod M → buckets 0, 1, 2, ..., M-1]
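A toy in-memory model of static hashing with overflow pages might look like the sketch below. The class name, the `page_capacity` parameter and the default α = 1, β = 0 are assumptions made for illustration; a real index would store pages on disk.

```python
class StaticHashIndex:
    def __init__(self, num_buckets, page_capacity, alpha=1, beta=0):
        self.M = num_buckets
        self.page_capacity = page_capacity
        self.alpha, self.beta = alpha, beta
        # Each bucket is a chain of pages; page 0 is the primary page.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _bucket(self, key):
        # h(key) mod M with h(key) = alpha * key + beta
        return (self.alpha * key + self.beta) % self.M

    def insert(self, key):
        pages = self.buckets[self._bucket(key)]
        if len(pages[-1]) == self.page_capacity:
            pages.append([])          # chain a new overflow page
        pages[-1].append(key)

    def lookup(self, key):
        """Return (found, number of pages read)."""
        pages_read = 0
        for page in self.buckets[self._bucket(key)]:
            pages_read += 1
            if key in page:
                return True, pages_read
        return False, pages_read

index = StaticHashIndex(num_buckets=4, page_capacity=2)
for key in (0, 4, 8, 12, 16):         # all keys land in bucket 0
    index.insert(key)
print(index.lookup(16))               # (True, 3): two overflow pages were chained
print(index.lookup(1))                # (False, 1)
```

The number of pages read grows with the length of the chain, which is why a fixed M becomes a problem once the data outgrows it.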
![Page 16: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/16.jpg)
static hashing
drawback
long overflow chains can degrade performance
dynamic hashing techniques
adapt index to data size
extendible and linear hashing
![Page 17: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/17.jpg)
extendible hashing
problem: bucket becomes full
one solution
double the number of buckets...
...and redistribute data entries
however…
reading and re-writing all buckets is expensive
better idea:
use directory of pointers to buckets
double number of ‘logical’ buckets…
but split ‘physically’ only the overflowed bucket
directory much smaller than data entry pages - good!
no overflow pages - good!
![Page 18: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/18.jpg)
example
global depth = 2
directory: 00, 01, 10, 11
bucket A (local depth 2), pointed to by 00: 4* 12* 32* 16*
bucket B (local depth 2), pointed to by 01: 1* 5* 21* 13*
bucket C (local depth 2), pointed to by 10: 10*
bucket D (local depth 2), pointed to by 11: 15* 7* 19*
directory is array of size M = 4 = $2^2$
to find bucket for r, take the last 2 bits of h(r), i.e., h(r) mod $2^2$
e.g., if h(r) = 5 = binary 101, it is in bucket pointed to by 01
global depth = 2 = min. bits enough to enumerate buckets
local depth = min. bits to identify individual bucket = 2
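Taking the last "global depth" bits of a hash value is a single bit-mask operation. A minimal sketch (the function name `directory_slot` is hypothetical):

```python
def directory_slot(hash_value, global_depth):
    """Index of the directory entry: the last `global_depth` bits of h(r)."""
    return hash_value & ((1 << global_depth) - 1)

print(format(directory_slot(5, 2), "02b"))    # 01: h(r) = 5 = binary 101
print(format(directory_slot(20, 2), "02b"))   # 00: h(r) = 20 = binary 10100
print(format(directory_slot(20, 3), "03b"))   # 100: one more bit is used
```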
![Page 19: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/19.jpg)
insertion
[figure: same directory (global depth 2) and buckets A-D (local depth 2) as in the previous example]
try to insert entry to corresponding bucket
if bucket is full, increase local depth by 1 and split bucket (allocate new bucket, re-distribute)
if necessary, double the directory, i.e., when for the split bucket local depth > global depth
when directory doubles, increase global depth by 1
![Page 20: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/20.jpg)
example
insert record with h(r) = 20 = binary 10100 → bucket A
split bucket A!
allocate new page, redistribute according to modulo 2M = 8 = $2^3$ (3 least significant bits)
we’ll have more than 4 buckets now, so double the directory!
[figure: directory 00, 01, 10, 11 with global depth 2; buckets A: 4* 12* 32* 16*, B: 1* 5* 21* 13*, C: 10*, D: 15* 7* 19*, all with local depth 2]
![Page 21: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/21.jpg)
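example
global depth: 2 → 3
directory: 000, 001, 010, 011, 100, 101, 110, 111
bucket A (local depth 2 → 3), pointed to by 000: 32* 16*
bucket B (local depth 2), pointed to by 001 and 101: 1* 5* 21* 13*
bucket C (local depth 2), pointed to by 010 and 110: 10*
bucket D (local depth 2), pointed to by 011 and 111: 15* 7* 19*
bucket A2 (local depth 3, split image of A), pointed to by 100: 4* 12* 20*
split bucket A and redistribute entries
update local depth
double the directory
update global depth
update pointers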
![Page 22: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/22.jpg)
notes
20 = binary 10100
last 2 bits (00) tell us r belongs in A or A2
last 3 bits needed to tell which
global depth of directory
number of bits enough to determine which bucket any entry belongs to
local depth of a bucket
number of bits enough to determine if an entry belongs to this bucket
when does bucket split cause directory doubling?
before insert, local depth of bucket = global depth
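The insertion steps and the doubling condition can be put together in a small in-memory sketch. It is an illustration under simplifying assumptions, not the lecture's implementation: entries are the hash values themselves, buckets hold 4 entries as in the example, and more than 4 entries with identical hash values would recurse forever because there are no overflow pages. The class and attribute names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []

class ExtendibleHash:
    def __init__(self, capacity=4, global_depth=2):
        self.capacity = capacity
        self.global_depth = global_depth
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _slot(self, h):
        return h & ((1 << self.global_depth) - 1)   # last global_depth bits

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(h)
            return
        # Bucket is full. Doubling is needed only if local depth = global depth.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory   # copy the pointers
            self.global_depth += 1
        # Split: one more bit separates the bucket from its split image.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        old_entries = bucket.entries
        bucket.entries = [e for e in old_entries if not e & new_bit]
        image.entries = [e for e in old_entries if e & new_bit]
        # Directory slots whose new bit is 1 now point to the split image.
        for i, b in enumerate(self.directory):
            if b is bucket and i & new_bit:
                self.directory[i] = image
        self.insert(h)   # retry; splits again if the target is still full

index = ExtendibleHash()
for h in (4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19):
    index.insert(h)
index.insert(20)                          # bucket A is full and has local depth 2
print(index.global_depth)                 # 3: the directory doubled
print(index.directory[0b000].entries)     # [32, 16]
print(index.directory[0b100].entries)     # [4, 12, 20]
index.insert(17)                          # bucket B splits, no doubling needed
print(index.directory[0b001].entries)     # [1, 17]
print(index.directory[0b101].entries)     # [5, 21, 13]
```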
![Page 23: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/23.jpg)
example
insert h(r) = 17
split bucket B
before: global depth 3; directory 000, 001, 010, 011, 100, 101, 110, 111
bucket A (local depth 3): 32* 16*
bucket B (local depth 2): 1* 5* 21* 13*
bucket C (local depth 2): 10*
bucket D (local depth 2): 15* 7* 19*
bucket A2 (local depth 3): 4* 12* 20*
after: global depth stays 3; directory not doubled
bucket B (local depth 3): 1* 17*
bucket B2 (local depth 3): 5* 21* 13*
other buckets not shown
![Page 24: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/24.jpg)
comments on extendible hashing
if directory fits in memory, equality query answered in one disk access
answered = retrieve rid
directory grows in spurts
if hash values are skewed, it might grow large
delete: reverse algorithm
empty bucket can be merged with its ‘split image’
when can the directory be halved?
when all directory elements point to same bucket as their ‘split image’
![Page 25: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/25.jpg)
indexes in SQL
![Page 26: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/26.jpg)
create index
CREATE INDEX indexb ON students (age, grade) USING BTREE;
CREATE INDEX indexh ON students (age, grade) USING HASH;
DROP INDEX indexh ON students;
![Page 27: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/27.jpg)
external sorting
![Page 28: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/28.jpg)
the sorting problem
setting
a relation R, stored over N disk pages
3 ≤ B < N pages available in memory (buffer pages)
task
sort records of R and store result on disk
sort by a function of record field values f(r)
why
applications need records ordered
part of join implementation (in upcoming lecture...)
![Page 29: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/29.jpg)
sorting with 3 buffer pages
2 phases
[figure: input relation R (pages 1, 2, ..., N stored on disk) → buffer (memory used by dbms) → output: R sorted by f(r) (N pages stored on disk)]
![Page 30: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/30.jpg)
sorting with 3 buffer pages - first phase
pass 0: output N runs
run: sorted sub-file
after first phase: one run is one page
how: load one page at a time, sort it in-memory, output to disk
only 1 buffer page needed for first phase
[figure: input relation R (pages 1, 2, ..., N stored on disk) → output: N runs (run #1, run #2, ..., run #N)]
![Page 31: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/31.jpg)
sorting with 3 buffer pages - second phase
pass 1, 2, ...: halve the number of runs
how: scan pairs of runs, each in own page, merge in-memory into a new run, output to disk
[figure: runs #1, #2, ..., #(N-1), #N on disk → input page 1, input page 2 → output page → half as many runs (run #1, ..., run #N/2), N pages stored on disk]
![Page 32: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/32.jpg)
merge?
merge the two sorted input pages into the output page, maintaining sorted order
compare the next smallest value from each page, move smallest to output page
[figure: values f(r); input page 1: 8 4 3 1; input page 2: 7 6 5 2; output page: empty]
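The merge step can be sketched in Python as below. It is a simplified in-memory model in which a run is a list of sorted pages and "writing to disk" is appending a page to a list; the function name and the `page_size` parameter are assumptions for illustration, and record values are assumed never to be `None`.

```python
def merge_runs(run_a, run_b, page_size):
    """Merge two sorted runs (lists of sorted pages) into one sorted run."""
    def records(run):
        # Iterating page by page models "load the next page of the run
        # when the input page is empty".
        for page in run:
            yield from page

    a, b = records(run_a), records(run_b)
    x, y = next(a, None), next(b, None)
    output_page, merged_run = [], []
    while x is not None or y is not None:
        # Move the smaller of the two next values to the output page.
        if y is None or (x is not None and x <= y):
            output_page.append(x)
            x = next(a, None)
        else:
            output_page.append(y)
            y = next(b, None)
        if len(output_page) == page_size:
            merged_run.append(output_page)   # output page is full: write it out
            output_page = []
    if output_page:
        merged_run.append(output_page)       # flush the last, partially filled page
    return merged_run

# The two input pages from the slide, written in ascending order:
print(merge_runs([[1, 3, 4, 8]], [[2, 5, 6, 7]], page_size=4))
# [[1, 2, 3, 4], [5, 6, 7, 8]]
```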
![Page 33: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/33.jpg)
merge? (step by step)
each step: input page 1 | input page 2 | output page
8 4 3 | 7 6 5 2 | 1
8 4 3 | 7 6 5 | 2 1
8 4 | 7 6 5 | 3 2 1
8 | 7 6 5 | 4 3 2 1
output page is full, what do we do? write it to disk!
8 | 7 6 5 | (empty)
8 | 7 6 | 5
8 | 7 | 6 5
8 | (empty) | 7 6 5
input page is empty! what do we do? if the input run has more pages, load next one
(empty) | (empty) | 8 7 6 5
(empty) | (empty) | (empty)
![Page 43: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/43.jpg)
sorting with 3 buffer pages - second phase
pass 1, 2, ...: halve the number of runs
after $\lceil \log_2 N \rceil$ merge passes... we are done!
[figure: runs #1, #2, ..., #(N-1), #N on disk → input page 1, input page 2 → output page → output: R sorted by f(r), N pages stored on disk]
![Page 44: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/44.jpg)
sorting with B buffer pages
same approach
[figure: input relation R (pages 1, 2, ..., N stored on disk) → buffer pages #1, #2, ..., #(B-1), #B → output: R sorted by f(r) (N pages stored on disk)]
![Page 45: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/45.jpg)
sorting with B buffer pages - first phase
pass 0: output $\lceil N/B \rceil$ runs
how: load R to memory in chunks of B pages, sort in-memory, output to disk
[figure: input relation R (pages 1, 2, ..., N stored on disk) → buffer pages #1, #2, ..., #(B-1), #B → runs #1, #2, ..., #$\lceil N/B \rceil$ on disk]
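Pass 0 with B buffer pages can be sketched as follows. The model is simplified: pages are Python lists held in memory, and the function name and the `page_size` parameter are assumptions for illustration.

```python
def make_initial_runs(pages, B, page_size):
    """Pass 0: sort chunks of B pages in memory; return one run per chunk."""
    runs = []
    for start in range(0, len(pages), B):
        # Load up to B pages into the buffer and sort all their records.
        records = sorted(r for page in pages[start:start + B] for r in page)
        # Cut the sorted records back into pages and "write" the run out.
        run = [records[i:i + page_size] for i in range(0, len(records), page_size)]
        runs.append(run)
    return runs

pages = [[9, 4], [7, 1], [8, 2], [6, 3], [5, 0]]   # N = 5 pages, 2 records each
runs = make_initial_runs(pages, B=2, page_size=2)
print(len(runs))   # 3 runs, i.e. ceil(5 / 2)
print(runs[0])     # [[1, 4], [7, 9]]
```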
![Page 46: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/46.jpg)
sorting with B buffer pages - second phase
pass 1, 2, ...: merge runs in groups of B-1
[figure: runs #1, #2, ..., #$\lceil N/B \rceil$ on disk → input pages #1, #2, ..., #(B-1) → output page → output: R sorted by f(r) (N pages stored on disk)]
![Page 47: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/47.jpg)
sorting with B buffer pages
how many passes in total?
let $N_1 = \lceil N/B \rceil$
total number of passes = $1 + \lceil \log_{B-1} N_1 \rceil$
(1 pass for phase 1, $\lceil \log_{B-1} N_1 \rceil$ passes for phase 2)
[figure: input relation R (N pages stored on disk) → input pages #1, #2, ..., #(B-1) → output page → output: R sorted by f(r) (N pages stored on disk)]
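The pass count can also be obtained by simulating how the number of runs shrinks, which avoids floating-point logarithms. A small sketch (the function name is hypothetical; B ≥ 3 is assumed, as in the problem setting):

```python
def num_passes(N, B):
    """Total passes of external merge sort: 1 + ceil(log_{B-1}(ceil(N / B)))."""
    assert B >= 3 and N >= 1
    runs = -(-N // B)                  # pass 0 leaves ceil(N / B) runs
    passes = 1
    while runs > 1:
        runs = -(-runs // (B - 1))     # each merge pass turns B - 1 runs into one
        passes += 1
    return passes

print(num_passes(8, 3))      # 3: one sorting pass, then 3 -> 2 -> 1 runs
print(num_passes(108, 5))    # 4: one sorting pass, then 22 -> 6 -> 2 -> 1 runs
```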
![Page 48: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/48.jpg)
sorting with B buffer pages
how many pages I/O per pass?
2N: N input, N output
[figure: input relation R (N pages stored on disk) → input pages #1, #2, ..., #(B-1) → output page → output: R sorted by f(r) (N pages stored on disk)]
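Combining the per-pass cost with the number of passes from the previous slide gives the total I/O cost. The values N = 108 and B = 5 below are an illustrative assumption, not taken from the slides.

```latex
\[
\text{total I/O}
  = \underbrace{2N}_{\text{per pass}} \cdot
    \underbrace{\left(1 + \left\lceil \log_{B-1} \left\lceil N/B \right\rceil \right\rceil\right)}_{\text{number of passes}}
\]
\[
N = 108,\; B = 5:\quad
\lceil 108/5 \rceil = 22,\quad
\lceil \log_{4} 22 \rceil = 3,\quad
\text{total I/O} = 2 \cdot 108 \cdot (1 + 3) = 864
\]
```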
![Page 49: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/49.jpg)
summary
![Page 50: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/50.jpg)
summary
• commonly used indexes
  • B+ tree
    • most commonly used
    • supports efficient equality and range queries
  • hash-based indexes
    • extendible hashing uses directory, not overflow pages
• external sorting
![Page 51: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/51.jpg)
references
● “cowbook”: Database Management Systems, by Ramakrishnan and Gehrke
● “elmasri”: Fundamentals of Database Systems, by Elmasri and Navathe
● other database textbooks
credits
some slides based on material from Database Management Systems, by Ramakrishnan and Gehrke
![Page 52: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/52.jpg)
backup slides
![Page 53: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/53.jpg)
linear hashing
![Page 54: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/54.jpg)
linear hashing
dynamic hashing
uses overflow pages; no directory
splits a bucket in round-robin fashion when an overflow occurs
M = $2^{level}$: number of buckets at beginning of round
pointer next ∈ [0, M) points at next bucket to split
buckets 0 to next - 1 are already split, so next ‘split-image’ buckets are appended to the original M
![Page 55: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/55.jpg)
linear hashing
to allocate entries, use
H0(key) = h(key) mod M, or
H1(key) = h(key) mod 2M
i.e., level or level+1 least significant bits of h(key)
to allocate bucket for key
first use H0(key)
if H0(key) is less than next, then it refers to a split bucket
use H1(key) to determine if it refers to original or its split image
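The address computation above can be written as a short function. This is a sketch only; the function and parameter names are hypothetical, and `hash_value` stands for h(key).

```python
def bucket_for(hash_value, level, next_to_split):
    """Bucket number of a key in a linear hashing file.

    M = 2**level buckets existed at the start of the round; buckets
    0 .. next_to_split - 1 have already been split in this round.
    """
    M = 1 << level
    bucket = hash_value % M                # H0
    if bucket < next_to_split:             # refers to a bucket that was split
        bucket = hash_value % (2 * M)      # H1: original or its split image
    return bucket

# level = 2 (M = 4), next = 1: only bucket 0 has been split so far
print(bucket_for(43, 2, 1))   # 3: H0 = 3 is not below next, so H0 decides
print(bucket_for(32, 2, 1))   # 0: H0 = 0 < next, H1 = 0 is the original bucket
print(bucket_for(44, 2, 1))   # 4: H0 = 0 < next, H1 = 4 is the split image
```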
![Page 56: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/56.jpg)
linear hashing
in the middle of a round...
[figure: the bucket file, from first to last bucket]
buckets already split in this round: if H0(key) is in this range, then must use H1(key) to decide if entry is in split image bucket
bucket to be split next (pointed to by next)
together with the remaining original buckets, these are the M buckets that existed at the beginning of this round; this is the range of H0
split image buckets: created through splitting of other buckets in this round
> is a directory necessary?
![Page 57: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/57.jpg)
linear hashing inserts
insert
find bucket by applying H0 / H1 and insert if there is space
if bucket to insert into is full: add overflow page, insert data entry, split next bucket and increment next
since buckets are split round-robin, long overflow chains don’t develop!
![Page 58: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/58.jpg)
example - insert h(r) = 43
on split, H1 is used to redistribute entries
the H1 and H0 columns are for illustration only! the primary and overflow pages are the actual contents of the linear hashing file
before: M = 4, next = 0
H1 | H0 | primary pages
000 | 00 | 44* 36* 32*
001 | 01 | 25* 9* 5*
010 | 10 | 14* 18* 10* 30*
011 | 11 | 31* 35* 11* 7*
after: next = 1
H1 | H0 | primary pages | overflow pages
000 | 00 | 32* |
001 | 01 | 25* 9* 5* |
010 | 10 | 14* 18* 10* 30* |
011 | 11 | 31* 35* 11* 7* | 43*
100 | 00 | 44* 36* |
![Page 59: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/59.jpg)
b+ tree - deletion
![Page 60: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/60.jpg)
deleting a data entry
1. start at root, find leaf L of entry
2. remove the entry, if it exists
  ○ if L is at least half-full, done!
  ○ else
    ■ try to re-distribute, borrowing from sibling
      ● adjacent node with same parent as L
    ■ if that fails, merge L into sibling
  ○ if merge occurred, must delete L from parent of L
merge could propagate to root
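The deletion steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a definitive implementation: the names `Node`, `D`, `delete_entry` and `example_tree` are hypothetical, data entries are plain integers, leaf sibling links are omitted, redistribution moves a single entry, and a merge always keeps the left node of the pair. The tree built by `example_tree` is the order d=2 tree used in the deletion example of this lecture.

```python
from bisect import bisect_right

D = 2  # order of the tree: non-root nodes hold between D and 2*D keys


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # data entries in a leaf, separators otherwise
        self.children = children  # None marks a leaf

    @property
    def is_leaf(self):
        return self.children is None


def _delete(node, key):
    """Delete key below node; return True if node is left under-full."""
    if node.is_leaf:
        if key in node.keys:
            node.keys.remove(key)
    else:
        i = bisect_right(node.keys, key)    # child whose range covers key
        if _delete(node.children[i], key):
            _fix(node, i)
    return len(node.keys) < D


def _fix(parent, i):
    """Child i of parent is under-full: redistribute if possible, else merge."""
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) > D:       # borrow from left sibling
        if child.is_leaf:
            child.keys.insert(0, left.keys.pop())
            parent.keys[i - 1] = child.keys[0]        # copy up new separator
        else:
            child.keys.insert(0, parent.keys[i - 1])  # rotate through parent
            parent.keys[i - 1] = left.keys.pop()
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > D:   # borrow from right sibling
        if child.is_leaf:
            child.keys.append(right.keys.pop(0))
            parent.keys[i] = right.keys[0]            # copy up new separator
        else:
            child.keys.append(parent.keys[i])
            parent.keys[i] = right.keys.pop(0)
            child.children.append(right.children.pop(0))
    else:                                             # merge children j and j+1
        j = i - 1 if left is not None else i
        a, b = parent.children[j], parent.children[j + 1]
        separator = parent.keys.pop(j)                # delete from parent
        if not a.is_leaf:
            a.keys.append(separator)                  # pull separator down
            a.children.extend(b.children)
        a.keys.extend(b.keys)
        parent.children.pop(j + 1)


def delete_entry(root, key):
    """Delete key and return the (possibly new) root."""
    _delete(root, key)
    if not root.is_leaf and not root.keys:  # merge emptied the root
        return root.children[0]
    return root


def leaves(node):
    """Leaf contents from left to right."""
    if node.is_leaf:
        return [node.keys]
    return [page for c in node.children for page in leaves(c)]


def example_tree():
    return Node([17], [
        Node([5, 13], [Node([2, 3]), Node([5, 7, 8]), Node([14, 16])]),
        Node([24, 30], [Node([19, 20, 22]), Node([24, 27, 29]),
                        Node([33, 34, 38, 39])]),
    ])


root = example_tree()
for k in (19, 20, 24):
    root = delete_entry(root, k)
print(root.keys, leaves(root))
```

Deleting 19*, 20* and 24* in that order exercises all three cases: a plain removal, a redistribution from the right sibling, and a merge that propagates up to the root.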
![Page 61: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/61.jpg)
example b+ tree
delete 19*

before
root: [17]
index: [5 13] [24 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [19* 20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

after
root: [17]
index: [5 13] [24 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [20* 22*] [24* 27* 29*] [33* 34* 38* 39*]
![Page 62: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/62.jpg)
example b+ tree
delete 20*

before
root: [17]
index: [5 13] [24 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

after
root: [17]
index: [5 13] [24 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22*] [24* 27* 29*] [33* 34* 38* 39*]
occupancy below 50%, redistribute!
![Page 63: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/63.jpg)
example b+ tree
delete 20*

before
root: [17]
index: [5 13] [24 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [20* 22*] [24* 27* 29*] [33* 34* 38* 39*]

after
root: [17]
index: [5 13] [27 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]
occupancy below 50%, redistribute!
middle key is copied up!
![Page 64: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/64.jpg)
example b+ tree
delete 24*

before
root: [17]
index: [5 13] [27 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

after
root: [17]
index: [5 13] [27 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22*] [27* 29*] [33* 34* 38* 39*]
occupancy below 50%, merge!
![Page 65: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/65.jpg)
example b+ tree
delete 24*

before
root: [17]
index: [5 13] [27 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

after
root: [17]
index: [5 13] [27 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22*] [27* 29*] [33* 34* 38* 39*]
occupancy below 50%, merge!
delete from parent!
reverse of copying up
![Page 66: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/66.jpg)
example b+ tree
delete 24*

before
root: [17]
index: [5 13] [27 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

after
root: [17]
index: [5 13] [30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
delete from parent!
reverse of copying up
![Page 68: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/68.jpg)
example b+ tree
delete 24*

before
root: [17]
index: [5 13] [27 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

after
root: [17]
index: [5 13] [30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
merge children of root!
reverse of pushing up
![Page 69: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/69.jpg)
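example b+ tree
delete 24*

before
root: [17]
index: [5 13] [27 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

after
index: [5 13] [17 30] (17 pulled down from the root)
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]
merge children of root!
reverse of pushing up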
![Page 70: Modern Database Systems Lecture 2 - Aalto · 2017. 2. 1. · comments on extendible hashing if directoryfits in memory, equality queryanswered in one disk access answered = retrieve](https://reader033.fdocuments.us/reader033/viewer/2022060523/6053295c2e2cbe5b195b82e4/html5/thumbnails/70.jpg)
example b+ tree
delete 24*

before
root: [17]
index: [5 13] [27 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 24*] [27* 29*] [33* 34* 38* 39*]

after
root: [5 13 17 30]
leaves: [2* 3*] [5* 7* 8*] [14* 16*] [22* 27* 29*] [33* 34* 38* 39*]