Secondary Storage and Indexing
Transcript of Secondary Storage and Indexing
CSCI 4380 Database Systems
Friday, April 20, 12
Disk access
• Databases are generally large data stores, much larger than available memory.
• Data is stored on disk, brought to memory on demand.
• A disk page or block is the smallest unit of access for reads and writes.
• A disk page is typically 1KB - 8KB.
Disk organization
• A disk contains
• multiple platters (usually 2 surfaces per platter)
• usually, the disk contains read/write heads that allow us to read/write from all surfaces simultaneously
Disk organization
• A disk surface contains
• multiple concentric tracks
• the same track on different surfaces can be read by different heads at the same time; this unit is called a cylinder
Disk organization
• A track is broken down into sectors; sectors are separated from each other by blank spaces
• A sector is the smallest unit of operation (read/write) possible on a disk
• A disk block is usually composed of a number of consecutive sectors (determined by the operating system)
• Data are read/written in units of a disk block/page
• A disk block is the same size as a memory block or page.
Reading a disk page
• Reading a page from disk requires the disk to be spinning
• Disk arm has to be moved to the correct track of the disk -> seek operation
• The disk head must wait until the right location on the track is found -> rotational latency
• Then, the disk page can be read from disk and copied to memory -> transfer time.
Reading a disk page
• The cost of reading a disk page:
• seek time + rotational latency time + transfer time
• Multiple pages on the same track/cylinder can be read with a single seek/latency. Reading M pages on the same track/cylinder:
• seek time + rotational latency time + transfer time * (percentage of disk circumference to be scanned)
A high end disk example
• Consider a disk with 16 surfaces, 2^16 tracks per surface (approx. 65K), 2^8 = 256 sectors per track and 2^12 bytes per sector.
• Each track has 2^12 * 2^8 = 2^20 bytes (1 MB)
• Each surface has 2^20 * 2^16 = 2^36 bytes
• The disk has 2^4 * 2^36 = 2^40 bytes = 1 TB
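The capacity arithmetic above can be checked with a few lines (illustrative Python, not from the slides):

```python
# Disk geometry from the example slide, all exact powers of two.
bytes_per_sector = 2**12
sectors_per_track = 2**8
tracks_per_surface = 2**16
surfaces = 2**4                                         # 16 surfaces

bytes_per_track = bytes_per_sector * sectors_per_track  # 2**20 = 1 MB
bytes_per_surface = bytes_per_track * tracks_per_surface  # 2**36
bytes_total = bytes_per_surface * surfaces              # 2**40 = 1 TB
```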
Reading a page
• Typical times:
• 7200 rpm means one rotation takes 8.33 ms (on average, 1/2 of a rotation is needed before the correct location is found: 4.17 ms)
• seek time between 0 and 17.38 ms (on average, 1/3 of the disk surface is scanned: 6.46 ms)
• transfer time for one sector: 8.33/256 = 0.03 ms
Reading a page
• Reading a page of 8K (2 sectors):
• 1 seek + 1 rotational latency + 2 sector transfer times
• 6.46 + 4.17 + 0.03 * 2 = 10.69 ms
• Reading 100 consecutive pages on the same track:
• 6.46 + 4.17 + 0.03 * 100 = 13.63 ms
• The lesson: Put blocks that are accessed together on the same track/cylinder as much as possible
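The cost model above can be sketched as a small function (illustrative; the constants are the typical times from the previous slide, and the transfer term counts sectors):

```python
# Assumed timing parameters from the "typical times" slide.
SEEK_MS = 6.46        # average seek
LATENCY_MS = 4.17     # average rotational latency at 7200 rpm
SECTOR_MS = 0.03      # transfer time for one sector

def read_time_ms(sectors):
    """One seek + one rotational latency + transfer time for `sectors`
    consecutive sectors on the same track/cylinder."""
    return SEEK_MS + LATENCY_MS + SECTOR_MS * sectors
```

With one 8K page being 2 sectors, `read_time_ms(2)` reproduces the 10.69 ms figure, and reading many consecutive sectors amortizes the seek and latency, which is the point of the lesson above.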
Disk scheduling
• The disk controller can order the requests to minimize seeks
• When the controller is moving from low tracks to high tracks, serve the next request in the direction of the movement and queue the rest
• This method is called the elevator algorithm
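A toy sketch of the elevator algorithm described above (illustrative; it serves pending track requests in the current direction of head movement, then reverses):

```python
def elevator_order(head, requests):
    """Order in which pending track requests are served, starting at
    track `head` and moving toward higher tracks first."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down
```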
Checksums
• For each sector, store a number of error-checking bits called checksums.
• The checksum is 1 if the number of 1's in the given sector is odd, and 0 if the number of 1's is even.
• When reading a sector, check that the checksum is correct.
• This catches all 1-bit errors.
• For errors of more than 1 bit, the checksum will catch the error only 50% of the time.
• For better error detection, use multiple bits (8 bits, where bit i stores the parity of the ith bit of each byte).
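The two parity schemes above, sketched on a Python `bytes` sector (illustrative, not the deck's notation):

```python
def parity_bit(sector: bytes) -> int:
    """Single-bit checksum: 1 if the total number of 1-bits is odd."""
    return sum(bin(b).count("1") for b in sector) % 2

def parity_byte(sector: bytes) -> int:
    """8-bit checksum: bit i is the parity of bit i across all bytes,
    which is simply the XOR of all the bytes."""
    p = 0
    for b in sector:
        p ^= b
    return p
```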
Stable storage
• When we are writing a sector, if the write fails, then we lose the data on that sector.
• Use two sectors for each sector X: XL and XR.
• First write XL and check the checksum. If XL is written correctly, then write XR.
• If XL is written incorrectly, then the old version of X is still stored in XR.
• If XR is written incorrectly, then the new version of X is stored in XL.
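An in-memory sketch of the protocol above: XL is written and verified before XR is touched, so one of the two sectors always holds a good copy of X. The dict and the failure flag are illustrative stand-ins for real disk I/O:

```python
disk = {"XL": b"old", "XR": b"old"}

def stable_write(data, fail_xl=False):
    disk["XL"] = b"<garbage>" if fail_xl else data   # simulated sector write
    if disk["XL"] != data:       # stands in for the checksum check on XL
        return False             # old version of X is still safe in XR
    disk["XR"] = data
    return True
```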
Multiple disks
• RAID (redundant array of inexpensive disks) is a series of methods for improving access time and reducing the possibility of data loss by using multiple disks.
RAID-0
• RAID-0, striping
• Distribute the data into multiple disks
• Example with 4 disks:
• Disk 1 has pages 1,5,9
• Disk 2 has pages 2,6,10
• Disk 3 has pages 3,7,11
• Disk 4 has pages 4,8,12
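The striping layout above follows a simple rule, sketched here (illustrative): page p (1-based) lives on disk ((p - 1) mod number of disks) + 1.

```python
def raid0_disk(page, num_disks=4):
    """Disk (1-based) holding a given page under round-robin striping."""
    return (page - 1) % num_disks + 1
```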
RAID-0
• RAID-0, striping
• Reads are faster (read from all disks simultaneously)
• Writes are the same
• No redundancy in case a disk fails
RAID-1
• RAID-1, mirroring
• Mirror each disk onto another disk
• Reads are twice as fast; read from any disk available
• Writes are slow; each write requires writing to two disks
• If one of the disks fails, the other one contains all the data (no data loss)
RAID-4
• One disk contains the parity of the remaining disks
• Block i on the parity disk contains the parity of the ith block on all the remaining disks
• Reads are unchanged
• Writes are slower; each write requires a write to the parity disk as well
• If a disk fails, the lost data can be reconstructed from the remaining disks
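The reconstruction above works because parity is bitwise XOR: a lost block is the XOR of the surviving blocks and the parity block. A sketch with illustrative one-byte "blocks":

```python
def xor_blocks(blocks):
    """Bitwise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

data = [b"\x0f", b"\xf0", b"\x33"]          # block i on data disks 1-3
parity = xor_blocks(data)                   # stored on the parity disk
# Suppose disk 2 fails; its block is rebuilt from the rest plus parity:
recovered = xor_blocks([data[0], data[2], parity])
```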
RAID-5
• Similar to RAID-4, but the parity blocks are distributed across all the disks
• Example: Given 5 disks (4 regular and 1 parity):
• Use disk 1 for the parity of block 1
• Use disk 2 for the parity of block 2
• etc.
• Reads are the same
• Writes are faster as the parity disk is no longer a bottleneck
Tuple organization
• A disk page typically stores multiple tuples. Many different organizations exist.
• The number of tuples that can fit in a page is determined by the number of attributes and the types of attributes the relation has.
[Page layout figure: header info, a row directory with slots 1, 2, ..., N, free space in the middle, and data rows Row N, Row N-1, ..., Row 1 stored from the end of the page]
Tuple addressing
• Tuples have a physical address, which contains the relevant subset of:
• Host name / Disk number / Surface no / Track no / Sector no
• Physical addresses tend to be long
• Tuples are also given a logical address in the relation
• A map table stored on disk contains the mapping from logical addresses to physical addresses
Tuple addressing
• When tuples are brought from disk to memory, their current address becomes a memory address
• Pointer swizzling is the act of changing the physical address to the memory address in the map table for pages in memory
Indexing
• An index is a lookup structure built on a search key
• the search key can consist of multiple attributes
• the index contains pointers to tuples (logical address)
• The index itself is also packed into pages and stored on disk.
Dense vs. sparse
• The index is called dense if it contains an entry for each tuple in the relation.
• An index is called sparse if it does not contain an entry for each tuple.
• A sparse index is possible if the addressed relation is sorted with respect to the index key.
Dense Index Example

Index entries (search key, tuple pointer): (1, t1), (2, t3), (4, t5), (5, t6), (8, t7), (9, t10), (10, t12).
[Figure: each entry points to its tuple in the indexed relation (t1, t7, t12, t5, t6, t3, t10), which need not be sorted]
Sparse Index Example

Index entries: (1, t1) and (8, t7).
[Figure: the indexed relation is sorted: t1, t3, t5, t6, t7, t10, t12]
(1, t1) points to all values between 1 and 5; (8, t7) points to all values greater than 5.
Index types
• An index can be
• primary, i.e. determines where the tuples are stored
• secondary, i.e. points to the tuples
• There can be many secondary indices.
• An index can be multi-level, i.e. a tree index, where each level is an index on the level below.
B-trees
• B-trees (called B+ trees in some books) are constructed on a list of attributes (also called the index key)
• Each node of a B-tree is mapped to a disk page
• Leaf nodes:
• A leaf node can contain at most n tuples (key values and pointers) and 1 additional pointer to the sibling node.
• A leaf node must contain at least floor((n+1)/2) tuples (plus one additional pointer to the next sibling node).
B-trees
• Internal nodes:
• An internal node can contain at most n + 1 pointers and n key values.
• An internal node must contain at least floor((n+1)/2) pointers (and one less key value), except the root which can contain a single key value and 2 pointers.
B-tree example
• Suppose n = 3
• Each leaf node will have at least 2 and at most 3 tuples.
• Each internal node will point to at least 2 and at most 4 nodes below (and hence will have between 1 and 3 key values).
• Suppose n = 99
• Each leaf node will have at least 50 and at most 99 tuples.
• Each internal node will point to at least 50 and at most 100 nodes below (and hence will have between 49 and 99 key values).
• The root can have as few as 2 pointers and 1 key value.
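The occupancy bounds above can be written down as two small functions (illustrative sketch of the rules from the previous slides):

```python
def leaf_bounds(n):
    """(min, max) number of (key, pointer) entries in a leaf node,
    per the rule: at least floor((n+1)/2), at most n."""
    return ((n + 1) // 2, n)

def internal_bounds(n):
    """(min, max) number of child pointers in a non-root internal node:
    at least floor((n+1)/2), at most n + 1."""
    return ((n + 1) // 2, n + 1)
```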
Sibling nodes
• Leaf nodes point to the next leaf node, called a sibling node.
B-trees
• Leaf nodes contain pairs of
• key values
• pointers to the tuple
• If the B-tree is a secondary index, then there is an entry in the leaf level for each tuple in the relation.
• The leaf nodes also contain a pointer to the next (sibling) leaf node.
B-trees
• Internal nodes contain n key values and n+1 pointers
• The pointers point to the nodes at the level below
[Figure: an internal node with keys 10, 25, 32; its four pointers lead to values < 10, values >= 10 and < 25, values >= 25 and < 32, and values >= 32]
Example B-tree

Assume at most 4 key values per node.
[Figure: root (53); internal nodes (11 30) and (66 78); leaves (2 7), (11 15 22), (30 41), (53 54 63), (66 69 71 76), (78 84 93), each leaf holding pointers to tuples]
B-trees with duplicate values
• If the B-tree is built on a key value that may contain duplicates, build the index in an identical way, except:
• The non-leaf node pointing to a leaf node contains the first key value in that leaf that does not repeat from the previous sibling
• If there is no such key, then a null value is stored at this location.
Example B-tree with duplicates

Assume at most 4 key values per node.
[Figure: the leaf level holds the keys 2 7 11 15 15 15 18 18 22 41 41 41 41 41 55 63; the internal level holds 11, 18, a null entry (-) for the leaf whose keys all repeat from the previous sibling, and 55; the root holds 22]
B-tree equality search
• Given select * from R where A = x and an index on R.A (assume no duplicate values for R.A):
• While not at the leaf level:
• Starting from the root, find the address of the node below that may contain this value (the pointer to the left of the first key value that is greater than x, or the last pointer if no such value exists)
• Read the node from disk
• If the leaf level contains a tuple with the searched value, read the matching tuples from disk and return
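The descent step above can be sketched with a binary search inside each node. The node layout here is an assumption for illustration: internal nodes are (keys, children) pairs and leaves are dicts from key to tuple address.

```python
from bisect import bisect_right

def btree_lookup(node, x):
    """Descend to the leaf level, then look up x there."""
    while isinstance(node, tuple):            # internal node: (keys, children)
        keys, children = node
        # Child to the left of the first key greater than x
        # (or the last child if no such key exists).
        node = children[bisect_right(keys, x)]
    return node.get(x)                        # leaf: key -> tuple address

# A tiny two-level example tree:
root = ((11,), [{2: "t2", 7: "t7"}, {11: "t11", 15: "t15"}])
```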
B-tree equality search
• Given select * from R where A = x and an index on R.A (assume R.A may contain duplicate values):
• While not at the leaf level:
• Starting from the root, find the address of the node below that may contain this value (the pointer to the left of the first key value that is greater than x, or the last pointer if no such value exists)
• Read the node from disk
• If the leaf level contains a tuple with the searched value, scan along the sibling pointers until a value different from x is found. Read the matching tuples from disk and return
B-tree range search
• Given select * from R where A < y and A > x and an index on R.A:
• Using the same algorithm from before, find the first leaf node containing a value > x
• Traverse the sibling pointers from left to right until all tuples in the range are read
• Read all the matching tuples from the disk
Index only search
• Given select A from R where A < 120 and A > 10 and an index on R.A:
• Scan the index for matching tuples as before and return the found A values (no need to read the tuples from disk)
Index partial match
• Given an index on R.A, R.B (the index is sorted on A first and then on B):
• select * from R where A > 10 and A < 100 and B = 2
• Scan the index for the range A > 10 and A < 100; for each matching entry check the B value, then read the matched tuples from disk
• select * from R where B > 10 and B < 100
• Scan the leaf level of the index completely to find the matching B values, then read the matched tuples from disk
Insertion
1. Given a new entry A to be inserted
1.1. Search the tree for the new entry
1.2. If the leaf node X has space for the new entry, insert.
1.3. Otherwise
1.3.1. Create a new leaf node Y and distribute the entries of X and the entry A between X and the new node
1.3.2. Create a new entry B with the address of Y and the lowest entry in Y
1.3.3. Insert B into the parent of X recursively (go to step 1.2)
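Step 1.3.1 can be sketched as follows (illustrative; `entries` stands for the sorted list of (key, tuple pointer) pairs that no longer fits in one leaf):

```python
def split_leaf(entries):
    """Split an overflowing leaf; return (left, right, separator_key),
    where separator_key is the lowest key in the new node Y."""
    mid = (len(entries) + 1) // 2
    left, right = entries[:mid], entries[mid:]
    return left, right, right[0][0]
```

On the later example in this deck (inserting 65 into the leaf 53 54 57 63), this split produces the leaves (53 54 57) and (63 65) with separator 63.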
Insert Example

Insert record with key 57 (at most 4 key values).
[Figure: root (53); internal nodes (11 30) and (66 78); leaves (2 7), (11 15 22), (30 41), (53 54 63), (66 69 71 76), (78 84 93)]
Insert Example

Insert record with key 57 (at most 4 key values).
[Figure: key 57 goes into the leaf (53 54 63), making it (53 54 57 63); the rest of the tree is unchanged]
We are done! No rebalancing necessary
Another Insert Example

Insert 65.
[Figure: the tree from the previous slide; 65 belongs in the leaf (53 54 57 63), which is already full]
Another Insert Example

The overflowing node is split.
[Figure: the leaf (53 54 57 63 65) splits into (53 54 57) and (63 65); the separator 63 is inserted into the parent, which becomes (63 66 78)]
Another Insert Example

Insert 70 and 94: one more node split.
[Figure: root (53); internal nodes (11 30) and (63 66 71 76); leaves (2 7), (11 15 22), (30 41), (53 54 57), (63 65), (66 69 70), (71 76), (78 84 93 94)]
Another Insert Example

Finally, insert 90 (which will cause the parent to split).
[Figure: the same tree as on the previous slide, before the insertion]
Another Insert Example

Finally, insert 90 (which will cause the parent to split).
[Figure: the overflowing leaf splits into (78 84 90) and (93 94), and the parent splits too. Final tree: root (53 71); internal nodes (11 30), (63 66), (78 93); leaves (2 7), (11 15 22), (30 41), (53 54 57), (63 65), (66 69 70), (71 76), (78 84 90), (93 94)]
Deletion
• Suppose we would like to delete entry A
• Locate leaf node X containing entry A and delete A
• If X still has n/2 or more pointers, then we are done, except that the parent entry pointing to this node may need to be adjusted recursively (if we deleted the smallest entry in the node)
Deletion
• Otherwise, the node has too few pointers.
• If a sibling node with the same parent has more than n/2 pointers, then redistribute entries with the sibling and adjust the parent pointers
• Else
• delete A
• move all the remaining entries of X to a sibling B
• adjust the parent entry for B
• delete the entry in the parent for X recursively (go to the first step of this algorithm)
Deletion Example

Delete key 30.
[Figure: root (53); internal nodes (11 22) and (78); leaves (2 7), (11 15 17), (22 30), (53 54), (78 84 93)]
Deletion Example

Delete key 30: borrow from a neighbor; adjust the internal node.
[Figure: after deleting 30, the leaf (22) borrows 17 from its neighbor (11 15 17); the internal node becomes (11 17); leaves (2 7), (11 15), (17 22), (53 54), (78 84 93)]
Deletion Example

Delete key 30: borrow from a neighbor.
Redistribute between the second and third leaf nodes; adjust the internal node.
[Figure: root (53); internal nodes (11 17) and (78); leaves (2 7), (11 15), (17 22), (53 54), (78 84 93)]
Another Deletion Example

Delete key 7: cannot borrow from a neighbor, so merge with a neighbor.
[Figure: root (53); internal nodes (11 17) and (78); leaves (2 7), (11 15), (17 22), (53 54), (78 84 93)]
Another Deletion Example

Delete the corresponding pointer in the parent.
[Figure: after merging, root (53); internal nodes (17) and (78); leaves (2 11 15), (17 22), (53 54), (78 84 93)]
Another Deletion Example

Delete 53: must merge with a sibling.
[Figure: root (53); internal nodes (17) and (78); leaves (2 11 15), (17 22), (53 54), (78 84)]
Another Deletion Example

The node (54) is too empty: it cannot borrow from a sibling and must merge with a sibling.
[Figure: root (53); internal nodes (17) and (78); leaves (2 11 15), (17 22), (54), (78 84)]
Another Deletion Example

[Figure: the leaf (54) merges into (54 78 84); the internal nodes and the old root (53) collapse into a single node (17 54)]
Another Deletion Example

The final tree.
[Figure: root (17 54); leaves (2 11 15), (17 22), (54 78 84)]
A B-Tree Example

Given:
• a disk page has a capacity of 4K bytes
• each tuple address takes 6 bytes and each key value takes 2 bytes
• each node is 70% full
• we need to store 1 million tuples
A B+-Tree Example

Leaf node capacity:
• each (key value, tuple address) pair takes 8 bytes
• disk page capacity is 4K, so (4*1024)/8 = 512 (key value, rowid) pairs per leaf page
• in reality there are extra headers and pointers, which we will ignore
• Hence, the maximum number of pointers for the tree is about 256 (and 255 key values)
Example Continued
• If all pages are 70% full, each page has about 512 * 0.7 = 359 pointers
• To store 1 million tuples requires:
• 1,000,000 / 359 = 2786 pages at the leaf level
• 2786 / 359 = 8 pages at the next level up
• 1 root page pointing to those 8 pages
• Hence, we have a B-tree with 3 levels
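The sizing arithmetic above, reproduced as a sketch (illustrative; page headers are ignored, as in the slides):

```python
import math

entries_per_page = (4 * 1024) // (6 + 2)            # 512 pairs per page
per_page_70 = math.ceil(entries_per_page * 0.7)     # ~359 at 70% fill
leaf_pages = math.ceil(1_000_000 / per_page_70)     # leaf level
second_level = math.ceil(leaf_pages / per_page_70)  # next level up
levels = 3                                          # leaves + one level + root
```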
Hashing
• Given a hash table of K buckets
• Allocate a number of disk blocks M to each bucket
• For each tuple t, apply the hash function. Suppose we hash on attribute A: if h(t.A) = x, then store t in the blocks allocated for bucket x.
• Search on attribute A (select * from r where r.a = c)
• Cost: M/2 (on average, search half the pages for that bucket)
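A toy in-memory sketch of hashed storage as described above (illustrative; `h` is a stand-in hash function and the lists stand in for disk blocks):

```python
K = 8
buckets = [[] for _ in range(K)]

def h(value):
    return value % K          # stand-in for a real hash function

def insert(t, key_attr):
    """Route tuple t to the bucket chosen by hashing t[key_attr]."""
    buckets[h(t[key_attr])].append(t)

def lookup(key_attr, c):
    """Equality search on the hashed attribute: scan one bucket only."""
    return [t for t in buckets[h(c)] if t[key_attr] == c]

insert({"a": 5, "b": "x"}, "a")
```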
Hashing
• Search on another attribute
• Cost: N (all N pages of the relation must be scanned)
• Insertion cost: 1 read and 1 write (find the last page in the appropriate bucket and store)
• Deletion/Update cost: M/2 (search cost) + 1 to update
Hashing - collisions
• If a bucket has too many tuples, then the allocated M pages may not be sufficient
• Allocate additional overflow area
• If the overflow area is large, the benefit of the hash is lost
Extensible hashing
• The address space of the hash (K) can be adjusted to the number of tuples in the relation
• Use a hash function h
• But use only the first z bits of the hashed value to address the tuples
• If a bucket overflows, split the hash directory and use z+1 bits to address
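The z-bit addressing above can be sketched as follows (illustrative; a 32-bit hash value is assumed):

```python
HASH_BITS = 32

def directory_index(hashed, z):
    """Directory slot for a hash value: its first (most significant) z bits."""
    return hashed >> (HASH_BITS - z)
```

Going from z to z+1 bits doubles the directory while keeping each old slot as a prefix of two new slots, which is why only the overflowing bucket's tuples need to be redistributed.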
Extensible hashing
• Using a single bit (z = 1) to address tuples
[Figure: a one-bit directory with entries 0 and 1 pointing to Page 0 and Page 1; a new point arrives and Page 0 overflows]
Extensible hashing
• Double the directory
[Figure: Page 0 has overflowed; its contents are distributed between addresses 00 and 10]
Extensible hashing
• Double the directory
[Figure: the doubled directory with entries 00, 01, 10, 11 pointing to Pages 0-3; make a copy of the directory]
Extensible hashing
[Figure: the directory entries 00, 01, 10, 11 pointing to Pages 0-3; update the link for the new node]
Extensible hashing
[Figure: the directory (00, 01, 10, 11) pointing to Pages 0-3, with a bit count stored next to each page: 2, 1, 2]
How do we know which nodes can be split without splitting the directory? Each page records how many bits it is addressed by; a page addressed by fewer bits than the directory uses can be split without doubling the directory.
Linear hashing
• The addressing is the same, but we allow overflows
• We decide to split based on a global rule
• e.g., if (number of tuples) / (number of pages) exceeds a threshold k%
• Split one bucket at a time
[Figure: two buckets, addressed 0 and 1]
Linear hashing
[Figure: buckets 0 and 1. A new point arrives and we decide to split. Bucket 0 is split: its contents are distributed into buckets 00 and 10. Bucket 1 still contains all entries addressed by 01 and 11.]
Linear hashing
• The bucket split is the next one in sequence
• it may not be the one that has overflow pages
• eventually all buckets will be split