Chapter 12: Indexing and Hashing
description
Transcript of Chapter 12: Indexing and Hashing
![Page 1: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/1.jpg)
1
Chapter 12: Indexing and Chapter 12: Indexing and HashingHashing
IndexingIndexing Basic ConceptsBasic Concepts Ordered Indices Ordered Indices B+-Tree Index FilesB+-Tree Index Files
HashingHashing StaticStatic Dynamic HashingDynamic Hashing
![Page 2: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/2.jpg)
2
Basic ConceptsBasic Concepts
ValueValue
Search KeySearch Key - set of attributes used to - set of attributes used to look up records in a file.look up records in a file.
value
record
search key pointer
![Page 3: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/3.jpg)
3
Index Evaluation MetricsIndex Evaluation Metrics
Access types supported efficiently. E.g., Access types supported efficiently. E.g., Point query: find “Tom”Point query: find “Tom” Range query: find students whose age is Range query: find students whose age is
between 20-40between 20-40 Access timeAccess time Update timeUpdate time Space overheadSpace overhead
![Page 4: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/4.jpg)
4
Ordered IndicesOrdered Indices In an In an ordered indexordered index, , index entries are index entries are
stored sorted on the search key value. stored sorted on the search key value. E.g., author catalog in library.E.g., author catalog in library.
![Page 5: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/5.jpg)
5
2010
4030
6050
8070
10090
10305070
90110130150
170190210230
Primary index
Also called clustering index
•The search key of a primary index is usually but not necessarily the primary key.
same order
Search key
![Page 6: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/6.jpg)
6
Search key
5030
7020
4080
10100
6090
10203040
506070...
Secondary index: non-clustering index.
different order
![Page 7: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/7.jpg)
7
Sequential File
2010
4030
6050
8070
10090
Dense Index
10203040
50607080
90100110120
Dense Index: contains index records for every search-key values.
![Page 8: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/8.jpg)
8
Sequential File
2010
4030
6050
8070
10090
Sparse Index
10305070
90110130150
170190210230
Sparse Index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on search-key
![Page 9: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/9.jpg)
9
Secondary indexesSecondary indexes Sequencefield
5030
7020
4080
10100
6090
• Sparse index
302080
100
90...
does not make sense!
![Page 10: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/10.jpg)
10
Sequential File
2010
4030
6050
8070
10090
Sparse 2nd level
10305070
90110130150
170190210230
1090
170250
330410490570
Multilevel IndexMultilevel Index
![Page 11: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/11.jpg)
11
Secondary indexesSecondary indexes Sequencefield
5030
7020
4080
10100
6090
10203040
506070...
105090...
sparsehighlevel
Lowest level is denseLowest level is dense Other levels are sparseOther levels are sparse
Multilevel IndexMultilevel Index
![Page 12: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/12.jpg)
12
Conventional indexesConventional indexes
Advantage:Advantage:
- Simple- Simple- Index is sequential file good - Index is sequential file good
for for scansscansDisadvantage:Disadvantage:
- Inserts expensive- Inserts expensive
![Page 13: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/13.jpg)
13
OutlineOutline
Conventional indexesConventional indexes B+-Tree B+-Tree NEXT NEXT
![Page 14: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/14.jpg)
14
NEXT: Another type of indexNEXT: Another type of index Give up on sequentiality of indexGive up on sequentiality of index Try to get “balance”Try to get “balance”
![Page 15: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/15.jpg)
15
RootRoot
B+Tree Example n=4
100
120
150
180
30
3 5 11
30
35
100
101
110
120
130
150
156
179
180
200
![Page 16: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/16.jpg)
16
Sample non-leafSample non-leaf
57
81
95
Key is moved (not copied) from lower level non-leaf node to upper level non-leaf node
to keys to keys to keys to keys
< 57 57 k<81 81k<95 95
![Page 17: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/17.jpg)
17
Sample leaf node:Sample leaf node:
From non-leaf nodeFrom non-leaf node
to next leafto next leaf
in sequencein sequence
57
81
95
To r
eco
rd
wit
h k
ey 5
7
To r
eco
rd
wit
h k
ey 8
1
To r
eco
rd
wit
h k
ey 8
5
Key is copied (not moved) from leaf node to non-leaf node
![Page 18: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/18.jpg)
18
n=4n=4
Leaf:Leaf:
Non-leaf:Non-leaf:
30
35
30
30 35
30
![Page 19: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/19.jpg)
19
Size of nodes: Size of nodes:
n pointersn pointers
n-1 keysn-1 keys
![Page 20: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/20.jpg)
20
Don’t want nodes to be too Don’t want nodes to be too emptyempty
Use at leastUse at least
Root : 2 pointersRoot : 2 pointers
Non-leaf: Non-leaf: n/2n/2 pointers pointers
Leaf : Leaf : (n-1)/2(n-1)/2 keys keys
![Page 21: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/21.jpg)
21
Full nodeFull node min. nodemin. node
Non-leafNon-leaf
LeafLeaf
n=4
12
01
50
18
0
30
3 5 11
30
35
counts
even if
null
![Page 22: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/22.jpg)
22
B+tree rulesB+tree rulestree of order tree of order nn
(1) All leaves at same lowest level(1) All leaves at same lowest level(balanced tree)(balanced tree)
(2) Pointers in leaves point to records(2) Pointers in leaves point to records except for “sequence pointer”except for “sequence pointer”
![Page 23: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/23.jpg)
23
(3) Number of pointers/keys for B+tree(3) Number of pointers/keys for B+tree
Non-leaf(non-root) n n-1 n/2 n/2- 1
Leaf(non-root) n n-1
Root n n-1 2 1
Max Max Min Min ptrs keys ptrsdata keys
(n-1)/2 (n-1)/2
![Page 24: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/24.jpg)
24
Insert into B+treeInsert into B+tree
(a) simple case(a) simple case space available in leafspace available in leaf
(b) leaf overflow(b) leaf overflow
(c) non-leaf overflow(c) non-leaf overflow
(d) new root(d) new root
![Page 25: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/25.jpg)
25
(a) Insert key = 32(a) Insert key = 32 n=43 5 11
30
31
30
100
32
![Page 26: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/26.jpg)
26
(b) Insert key = 7(b) Insert key = 7 n=4
3 5 11
30
31
30
100
3 5
7
7
![Page 27: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/27.jpg)
27
(c) Insert key = 160(c) Insert key = 160 n=4
10
0
120
150
180
150
156
179
180
200
160
18
0
160
179
![Page 28: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/28.jpg)
28
(d) New root, insert 45(d) New root, insert 45 n=4
10
20
30
1 2 3 10
12
20
25
30
32
40
40
45
40
30new root
![Page 29: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/29.jpg)
29
(a) Simple case - (a) Simple case - no exampleno example
(b) Coalesce with neighbor (b) Coalesce with neighbor (sibling)(sibling)
(c) Re-distribute keys(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf(d) Cases (b) or (c) at non-leaf
Deletion from B+treeDeletion from B+tree
![Page 30: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/30.jpg)
30
(b) Coalesce with sibling(b) Coalesce with sibling Delete 50Delete 50
10
40
100
10
20
30
40
50
n=5
40
![Page 31: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/31.jpg)
31
(c) Redistribute keys(c) Redistribute keys Delete 50Delete 50
10
40
100
10
20
30
35
40
50
n=5
35
35
![Page 32: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/32.jpg)
32
40
45
30
37
25
26
20
22
10
141 3
10
20
30
40
(d) Non-leaf coalese(d) Non-leaf coalese Delete 37Delete 37
n=5
40
30
25
25
new root
![Page 33: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/33.jpg)
33
B+tree deletions in B+tree deletions in practicepractice
– Often, coalescing is Often, coalescing is notnot implemented implemented Too hard and not worth it!Too hard and not worth it!
![Page 34: Chapter 12: Indexing and Hashing](https://reader035.fdocuments.us/reader035/viewer/2022062423/56814471550346895db10498/html5/thumbnails/34.jpg)
34
Index Definition in SQLIndex Definition in SQL
Create an indexCreate an indexcreate indexcreate index <index-name> <index-name>
onon <relation-name> (<attribute-list>) <relation-name> (<attribute-list>)
E.g.: create index gindex on country(gdp);E.g.: create index gindex on country(gdp);
To drop an index To drop an index drop index drop index <index-name><index-name>
E.g.: drop index gindex;E.g.: drop index gindex;