1 CS 430 / INFO 430 Information Retrieval Lecture 9 Latent Semantic Indexing.
CS 277: Database System Implementation Notes 4: Indexing
description
Transcript of CS 277: Database System Implementation Notes 4: Indexing
![Page 1: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/1.jpg)
CS 277 – Spring 2002
Notes 4 1
CS 277: Database System Implementation
Notes 4: Indexing
Arthur Keller
![Page 2: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/2.jpg)
CS 277 – Spring 2002
Notes 4 2
Indexing & Hashing
value
Chapter 4
value
record
![Page 3: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/3.jpg)
CS 277 – Spring 2002
Notes 4 3
Topics
• Conventional indexes• B-trees• Hashing schemes
![Page 4: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/4.jpg)
CS 277 – Spring 2002
Notes 4 4
Sequential File
2010
4030
6050
8070
10090
![Page 5: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/5.jpg)
CS 277 – Spring 2002
Notes 4 5
Sequential File
2010
4030
6050
8070
10090
Dense Index
10203040
50607080
90100110120
![Page 6: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/6.jpg)
CS 277 – Spring 2002
Notes 4 6
Sequential File
2010
4030
6050
8070
10090
Sparse Index
10305070
90110130150
170190210230
![Page 7: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/7.jpg)
CS 277 – Spring 2002
Notes 4 7
Sequential File
2010
4030
6050
8070
10090
Sparse 2nd level
10305070
90110130150
170190210230
1090
170250
330410490570
![Page 8: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/8.jpg)
CS 277 – Spring 2002
Notes 4 8
• Comment:{FILE,INDEX} may be contiguous
or not (blocks chained)
![Page 9: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/9.jpg)
CS 277 – Spring 2002
Notes 4 9
Question:
• Can we build a dense, 2nd level index for a dense index?
![Page 10: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/10.jpg)
CS 277 – Spring 2002
Notes 4 10
Notes on pointers:
(1) Block pointer (sparse index) can be smaller than record pointer
BP
RP
![Page 11: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/11.jpg)
CS 277 – Spring 2002
Notes 4 11
(2) If file is contiguous, then we can omitpointers (i.e., compute them)
Notes on pointers:
![Page 12: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/12.jpg)
CS 277 – Spring 2002
Notes 4 12
K1
K3
K4
K2
R1
R2
R3
R4
say:1024 Bper block
• if we want K3 block: get it at offset (3-1)1024 = 2048 bytes
![Page 13: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/13.jpg)
CS 277 – Spring 2002
Notes 4 13
Sparse vs. Dense Tradeoff
• Sparse: Less index space per record can keep more of
index in memory• Dense: Can tell if any record exists
without accessing file
(Later: – sparse better for insertions– dense needed for secondary indexes)
![Page 14: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/14.jpg)
CS 277 – Spring 2002
Notes 4 14
Terms
• Index sequential file• Search key ( primary key)• Primary index (on Sequencing field)• Secondary index• Dense index (all Search Key values in)• Sparse index• Multi-level index
![Page 15: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/15.jpg)
CS 277 – Spring 2002
Notes 4 15
Next:
• Duplicate keys
• Deletion/Insertion
• Secondary indexes
![Page 16: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/16.jpg)
CS 277 – Spring 2002
Notes 4 16
Duplicate keys
1010
2010
3020
3030
4540
![Page 17: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/17.jpg)
CS 277 – Spring 2002
Notes 4 17
1010
2010
3020
3030
4540
10101020
20303030
1010
2010
3020
3030
4540
10101020
20303030
Dense index, one way to implement?
Duplicate keys
![Page 18: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/18.jpg)
CS 277 – Spring 2002
Notes 4 18
1010
2010
3020
3030
4540
10203040
Dense index, better way?
Duplicate keys
![Page 19: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/19.jpg)
CS 277 – Spring 2002
Notes 4 19
1010
2010
3020
3030
4540
10102030
Sparse index, one way?
Duplicate keys
care
ful if lookin
gfo
r 2
0 o
r 3
0!
![Page 20: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/20.jpg)
CS 277 – Spring 2002
Notes 4 20
1010
2010
3020
3030
4540
10203030
Sparse index, another way?
Duplicate keys
– place first new key from block
shouldthis be40?
![Page 21: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/21.jpg)
CS 277 – Spring 2002
Notes 4 21
Duplicate values, primary index
• Index may point to first instance ofeach value only
File Index
Summary
aaa
b
![Page 22: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/22.jpg)
CS 277 – Spring 2002
Notes 4 22
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
![Page 23: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/23.jpg)
CS 277 – Spring 2002
Notes 4 23
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete record 40
![Page 24: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/24.jpg)
CS 277 – Spring 2002
Notes 4 24
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete record 30
4040
![Page 25: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/25.jpg)
CS 277 – Spring 2002
Notes 4 25
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete records 30 & 40
5070
![Page 26: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/26.jpg)
CS 277 – Spring 2002
Notes 4 26
Deletion from dense index
2010
4030
6050
8070
10203040
50607080
![Page 27: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/27.jpg)
CS 277 – Spring 2002
Notes 4 27
Deletion from dense index
2010
4030
6050
8070
10203040
50607080
– delete record 30
4040
![Page 28: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/28.jpg)
CS 277 – Spring 2002
Notes 4 28
Insertion, sparse index case
2010
30
5040
60
10304060
![Page 29: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/29.jpg)
CS 277 – Spring 2002
Notes 4 29
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 34
34
• our lucky day! we have free space where we need it!
![Page 30: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/30.jpg)
CS 277 – Spring 2002
Notes 4 30
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 15
15
2030
20
• Illustrated: Immediate reorganization• Variation:
– insert new block (chained file)– update index
![Page 31: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/31.jpg)
CS 277 – Spring 2002
Notes 4 31
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 25
25
overflow blocks(reorganize later...)
![Page 32: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/32.jpg)
CS 277 – Spring 2002
Notes 4 32
Insertion, dense index case
• Similar
• Often more expensive . . .
![Page 33: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/33.jpg)
CS 277 – Spring 2002
Notes 4 33
Secondary indexesSequencefield
5030
7020
4080
10100
6090
![Page 34: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/34.jpg)
CS 277 – Spring 2002
Notes 4 34
Secondary indexesSequencefield
5030
7020
4080
10100
6090
• Sparse index
302080
100
90...
does not make sense!
![Page 35: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/35.jpg)
CS 277 – Spring 2002
Notes 4 35
Secondary indexesSequencefield
5030
7020
4080
10100
6090
• Dense index10203040
506070...
105090...
sparsehighlevel
![Page 36: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/36.jpg)
CS 277 – Spring 2002
Notes 4 36
With secondary indexes:
• Lowest level is dense• Other levels are sparse
Also: Pointers are record pointers
(not block pointers; not computed)
![Page 37: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/37.jpg)
CS 277 – Spring 2002
Notes 4 37
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
![Page 38: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/38.jpg)
CS 277 – Spring 2002
Notes 4 38
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10101020
20304040
4040...
one option...
Problem:excess overhead!
• disk space• search time
![Page 39: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/39.jpg)
CS 277 – Spring 2002
Notes 4 39
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10
another option...
4030
20Problem:variable sizerecords inindex!
![Page 40: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/40.jpg)
CS 277 – Spring 2002
Notes 4 40
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10203040
5060...
Another idea (suggested in class):Chain records with same key?
Problems:• Need to add fields to records• Need to follow chain to know records
![Page 41: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/41.jpg)
CS 277 – Spring 2002
Notes 4 41
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10203040
5060...
buckets
![Page 42: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/42.jpg)
CS 277 – Spring 2002
Notes 4 42
Why “bucket” idea is useful
Indexes RecordsName: primary EMP
(name,dept,floor,...)
Dept: secondaryFloor: secondary
![Page 43: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/43.jpg)
CS 277 – Spring 2002
Notes 4 43
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index EMP Floor index
Toy 2nd
Intersect toy bucket and 2nd Floor
bucket to get set of matching EMP’s
![Page 44: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/44.jpg)
CS 277 – Spring 2002
Notes 4 44
This idea used in text information retrieval
Documents
...the cat is fat ...
...was raining cats and dogs...
...Fido the dog ...
Inverted lists
cat
dog
![Page 45: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/45.jpg)
CS 277 – Spring 2002
Notes 4 45
IR QUERIES
• Find articles with “cat” and “dog”• Find articles with “cat” or “dog”• Find articles with “cat” and not “dog”
• Find articles with “cat” in title• Find articles with “cat” and “dog”
within 5 words
![Page 46: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/46.jpg)
CS 277 – Spring 2002
Notes 4 46
Common technique: more info in inverted
list
cat Title 5
Title 100
Author 10
Abstract 57
Title 12
d3d2
d1
dog
typeposit
ion
locatio
n
![Page 47: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/47.jpg)
CS 277 – Spring 2002
Notes 4 47
Posting: an entry in inverted list.Represents occurrence
ofterm in article
Size of a list: 1 Rare words or (in postings) miss-spellings
106 Common words
Size of a posting: 10-15 bits (compressed)
![Page 48: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/48.jpg)
CS 277 – Spring 2002
Notes 4 48
IR DISCUSSION
• Stop words• Truncation• Thesaurus• Full text vs. Abstracts• Vector model
![Page 49: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/49.jpg)
CS 277 – Spring 2002
Notes 4 49
Vector space model
w1 w2 w3 w4 w5 w6 w7 …
DOC = <1 0 0 1 1 0 0 …>
Query= <0 0 1 1 0 0 0 …>
PRODUCT = 1 + ……. = score
![Page 50: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/50.jpg)
CS 277 – Spring 2002
Notes 4 50
• Tricks to weigh scores + normalize
e.g.: Match on common word not asuseful as match on rare
words...
![Page 51: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/51.jpg)
CS 277 – Spring 2002
Notes 4 51
• How to process VS. Queries?
w1 w2 w3 w4 w5 w6 …Q = < 0 0 0 1 1 0 … >
![Page 52: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/52.jpg)
CS 277 – Spring 2002
Notes 4 52
• Try Altavista, Excite, Infoseek, Lycos...
![Page 53: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/53.jpg)
CS 277 – Spring 2002
Notes 4 53
Summary so far
• Conventional index– Basic Ideas: sparse, dense, multi-
level…– Duplicate Keys– Deletion/Insertion– Secondary indexes
– Buckets of Postings List
![Page 54: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/54.jpg)
CS 277 – Spring 2002
Notes 4 54
Conventional indexes
Advantage:- Simple- Index is sequential file
good for scans
Disadvantage:- Inserts expensive, and/or- Lose sequentiality & balance
![Page 55: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/55.jpg)
CS 277 – Spring 2002
Notes 4 55
Example Index (sequential)
continuous
free space
102030
405060
708090
39313536
323834
33
overflow area(not sequential)
![Page 56: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/56.jpg)
CS 277 – Spring 2002
Notes 4 56
Outline:
• Conventional indexes• B-Trees NEXT• Hashing schemes
![Page 57: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/57.jpg)
CS 277 – Spring 2002
Notes 4 57
• NEXT: Another type of index– Give up on sequentiality of index– Try to get “balance”
![Page 58: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/58.jpg)
CS 277 – Spring 2002
Notes 4 58
Root
B+Tree Example n=3
100
120
150
180
30
3 5 11
30
35
100
101
110
120
130
150
156
179
180
200
![Page 59: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/59.jpg)
CS 277 – Spring 2002
Notes 4 59
Sample non-leaf
to keys to keys to keys to keys
< 57 57 k<81 81k<95 95
57
81
95
![Page 60: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/60.jpg)
CS 277 – Spring 2002
Notes 4 60
Sample leaf node:
From non-leaf node
to next leafin
sequence5
7
81
95
To r
eco
rd
wit
h k
ey 5
7
To r
eco
rd
wit
h k
ey 8
1
To r
eco
rd
wit
h k
ey 8
5
![Page 61: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/61.jpg)
CS 277 – Spring 2002
Notes 4 61
In textbook’s notationn=3
Leaf:
Non-leaf:
30
35
30
30 35
30
![Page 62: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/62.jpg)
CS 277 – Spring 2002
Notes 4 62
Size of nodes: n+1 pointersn keys
(fixed)
![Page 63: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/63.jpg)
CS 277 – Spring 2002
Notes 4 63
Don’t want nodes to be too empty
• Use at least
Non-leaf: (n+1)/2pointers
Leaf: (n+1)/2 pointers to data
![Page 64: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/64.jpg)
CS 277 – Spring 2002
Notes 4 64
Full nodemin. node
Non-leaf
Leaf
n=3
12
01
50
18
0
30
3 5 11
30
35
counts
even if
null
![Page 65: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/65.jpg)
CS 277 – Spring 2002
Notes 4 65
B+tree rules tree of order n
(1) All leaves at same lowest level(balanced tree)
(2) Pointers in leaves point to records except for “sequence pointer”
![Page 66: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/66.jpg)
CS 277 – Spring 2002
Notes 4 66
(3) Number of pointers/keys for B+tree
Non-leaf(non-root) n+1 n (n+1)/2 (n+1)/2- 1
Leaf(non-root) n+1 n
Root n+1 n 1 1
Max Max Min Min ptrs keys ptrsdata keys
(n+1)/2 (n+1)/2
![Page 67: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/67.jpg)
CS 277 – Spring 2002
Notes 4 67
Insert into B+tree
(a) simple case– space available in leaf
(b) leaf overflow(c) non-leaf overflow(d) new root
![Page 68: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/68.jpg)
CS 277 – Spring 2002
Notes 4 68
(a) Insert key = 32 n=33 5 11
30
31
30
100
32
![Page 69: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/69.jpg)
CS 277 – Spring 2002
Notes 4 69
(a) Insert key = 7 n=3
3 5 11
30
31
30
100
3 5
7
7
![Page 70: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/70.jpg)
CS 277 – Spring 2002
Notes 4 70
(c) Insert key = 160 n=3
10
0
120
150
180
150
156
179
180
200
160
18
0
160
179
![Page 71: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/71.jpg)
CS 277 – Spring 2002
Notes 4 71
(d) New root, insert 45 n=3
10
20
30
1 2 3 10
12
20
25
30
32
40
40
45
40
30new root
![Page 72: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/72.jpg)
CS 277 – Spring 2002
Notes 4 72
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys(d) Cases (b) or (c) at non-leaf
Deletion from B+tree
![Page 73: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/73.jpg)
CS 277 – Spring 2002
Notes 4 73
(b) Coalesce with sibling– Delete 50
10
40
100
10
20
30
40
50
n=4
40
![Page 74: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/74.jpg)
CS 277 – Spring 2002
Notes 4 74
(c) Redistribute keys– Delete 50
10
40
100
10
20
30
35
40
50
n=4
35
35
![Page 75: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/75.jpg)
CS 277 – Spring 2002
Notes 4 75
40
45
30
37
25
26
20
22
10
141 3
10
20
30
40
(d) Non-leaf coalese– Delete 37
n=4
40
30
25
25
new root
![Page 76: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/76.jpg)
CS 277 – Spring 2002
Notes 4 76
B+tree deletions in practice
– Often, coalescing is not implemented– Too hard and not worth it!
![Page 77: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/77.jpg)
CS 277 – Spring 2002
Notes 4 77
Comparison: B-trees vs. static indexed sequential
fileRef #1: Held & Stonebraker
“B-Trees Re-examined”CACM, Feb. 1978
![Page 78: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/78.jpg)
CS 277 – Spring 2002
Notes 4 78
Ref # 1 claims:- Concurrency control harder in B-Trees
- B-tree consumes more space
For their comparison:block = 512 byteskey = pointer = 4 bytes4 data records per block
![Page 79: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/79.jpg)
CS 277 – Spring 2002
Notes 4 79
Example: 1 block static index
127 keys
(127+1)4 = 512 Bytes-> pointers in index implicit! up to 127
blocks
k1
k2
k3
k1
k2
k3
1 datablock
![Page 80: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/80.jpg)
CS 277 – Spring 2002
Notes 4 80
Example: 1 block B-tree
63 keys
63x(4+4)+8 = 512 Bytes-> pointers needed in B-tree up to 63
blocks because index is blocksnot contiguous
k1
k2
...
k63
k1
k2
k3
1 datablock
next
-
![Page 81: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/81.jpg)
CS 277 – Spring 2002
Notes 4 81
Size comparison Ref. #1Size comparison Ref. #1
Static Index B-tree# data # datablocks height blocks
height
2 -> 127 2 2 -> 63 2128 -> 16,129 3 64 -> 3968 316,130 -> 2,048,383 4 3969 -> 250,047 4
250,048 -> 15,752,961 5
![Page 82: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/82.jpg)
CS 277 – Spring 2002
Notes 4 82
Ref. #1 analysis claims
• For an 8,000 block file,after 32,000 inserts
after 16,000 lookups Static index saves enough accesses
to allow for reorganization
Ref. #1 conclusion Static index better!!
![Page 83: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/83.jpg)
CS 277 – Spring 2002
Notes 4 83
Ref #2: M. Stonebraker, “Retrospective on a database
system,” TODS, June 1980
Ref. #2 conclusion B-trees better!!
![Page 84: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/84.jpg)
CS 277 – Spring 2002
Notes 4 84
• DBA does not know when to reorganize• DBA does not know how full to load
pages of new index
Ref. #2 conclusion B-trees better!!
![Page 85: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/85.jpg)
CS 277 – Spring 2002
Notes 4 85
• Buffering– B-tree: has fixed buffer requirements– Static index: must read several
overflow blocks to be efficient (large & variable size buffers needed for this)
Ref. #2 conclusion B-trees better!!
![Page 86: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/86.jpg)
CS 277 – Spring 2002
Notes 4 86
• Speaking of buffering… Is LRU a good policy for B+tree
buffers? Of course not! Should try to keep root in memory
at all times(and perhaps some nodes from second level)
![Page 87: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/87.jpg)
CS 277 – Spring 2002
Notes 4 87
Interesting problem:
For B+tree, how large should n be?
…
n is number of keys / node
![Page 88: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/88.jpg)
CS 277 – Spring 2002
Notes 4 88
Sample assumptions:(1) Time to read node from disk is
(70+0.05n) msec.(2) Once block in memory, use binary
search to locate key:(a + b LOG2 n) msec.
For some constants a,b; Assume a << 70
(3) Assume B+tree is full, i.e., # nodes to examine is LOGn N
where N = # records
![Page 89: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/89.jpg)
CS 277 – Spring 2002
Notes 4 89
Can get: f(n) = time to find a record
f(n)
nopt n
![Page 90: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/90.jpg)
CS 277 – Spring 2002
Notes 4 90
FIND nopt by f’(n) = 0
Answer is nopt = “few hundred”
(see homework for details)
What happens to nopt as
• Disk gets faster?• CPU get faster?
![Page 91: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/91.jpg)
CS 277 – Spring 2002
Notes 4 91
Variation on B+tree: B-tree (no +)• Idea:
– Avoid duplicate keys– Have record pointers in non-leaf nodes
![Page 92: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/92.jpg)
CS 277 – Spring 2002
Notes 4 92
to record to record to record with K1 with K2 with K3
to keys to keys to keys to keys < K1 K1<x<K2 K2<x<k3 >k3
K1 P1 K2 P2 K3 P3
![Page 93: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/93.jpg)
CS 277 – Spring 2002
Notes 4 93
B-tree example n=2
65
125
145
165
85
105
25
45
10
20
30
40
110
120
90
100
70
80
170
180
50
60
130
140
150
160
• sequence pointers not useful now! (but keep space for simplicity)
![Page 94: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/94.jpg)
CS 277 – Spring 2002
Notes 4 94
Note on inserts
• Say we insert record with key = 25
10
20
30 n=3
leaf
10
– 20 –
25
30
• Afterwards:
![Page 95: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/95.jpg)
CS 277 – Spring 2002
Notes 4 95
So, for B-trees:
MAX MINTree Rec Keys Tree Rec KeysPtrs Ptrs Ptrs Ptrs
Non-leafnon-root n+1 n n (n+1)/2 (n+1)/2-1
(n+1)/2-1Leafnon-root 1 n n 1 (n+1)/2
(n+1)/2
Rootnon-leafn+1 n n 2 1 1
RootLeaf 1 n n 1 1 1
![Page 96: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/96.jpg)
CS 277 – Spring 2002
Notes 4 96
Tradeoffs:
B-trees have faster lookup than B+trees
in B-tree, non-leaf & leaf different sizes in B-tree, deletion more complicated
B+trees preferred!
![Page 97: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/97.jpg)
CS 277 – Spring 2002
Notes 4 97
But note:
• If blocks are fixed size(due to disk and buffering restrictions)
Then lookup for B+tree isactually better!!
![Page 98: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/98.jpg)
CS 277 – Spring 2002
Notes 4 98
Example:- Pointers 4 bytes- Keys 4 bytes- Blocks 100 bytes (just example)- Look at full 2 level tree
![Page 99: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/99.jpg)
CS 277 – Spring 2002
Notes 4 99
Root has 8 keys + 8 record pointers+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
B-tree:
Each of 9 sons: 12 rec. pointers (+12 keys)= 12x(4+4) + 4 = 100 bytes
2-level B-tree, Max # records =12x9 + 8 = 116
![Page 100: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/100.jpg)
CS 277 – Spring 2002
Notes 4 100
Root has 12 keys + 13 son pointers= 12x4 + 13x4 = 100 bytes
B+tree:
Each of 13 sons: 12 rec. ptrs (+12 keys)= 12x(4 +4) + 4 = 100 bytes
2-level B+tree, Max # records= 13x12 = 156
![Page 101: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/101.jpg)
CS 277 – Spring 2002
Notes 4 101
So...
ooooooooooooo ooooooooo 156 records 108 records
Total = 116
B+ B
8 records
• Conclusion:– For fixed block size,– B+ tree is better because it is bushier
![Page 102: CS 277: Database System Implementation Notes 4: Indexing](https://reader035.fdocuments.us/reader035/viewer/2022062518/56814473550346895db1056d/html5/thumbnails/102.jpg)
CS 277 – Spring 2002
Notes 4 102
Outline/summary
• Conventional Indexes• Sparse vs. dense• Primary vs. secondary
• B trees• B+trees vs. B-trees• B+trees vs. indexed sequential
• Hashing schemes --> Next