CS4432: Database Systems II

36
CS 4432 lecture #10 - b+ tree indexing 1 CS4432: Database Systems II Lecture #10 Professor Elke A. Rundensteiner

description

CS4432: Database Systems II. Lecture #10. Professor Elke A. Rundensteiner. 39. 32. 38. 31. 33. 35. 34. 36. overflow area (not sequential). 10. 20. 30. Example Sequential Index continuous free space. 40. 50. 60. 70. 80. 90. Problems … Problems … Problems …. - PowerPoint PPT Presentation

Transcript of CS4432: Database Systems II

Page 1: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

1

CS4432: Database Systems IILecture #10

Professor Elke A. Rundensteiner

Page 2: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

2

Example Sequential Index

continuous

free space

102030

405060

708090

39313536

323834

33

overflow area(not sequential)

Page 3: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

3

• Without re-organization we get unpredictable performance

• Too much/often re-organization brings too much overhead

• DBA does not know when to reorganize

• DBA does not know how full to loadpages of new index

Problems … Problems … Problems …

Page 4: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

4

Another type of index

• Give up “sequentiality” of index• Predictable performance under

updates• Achieve always balance of “tree” • Automate restructuring under

updates

Page 5: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

5

Root

B+Tree Example n=3

100

120

150

180

30

3 5 11

30

35

100

101

110

120

130

150

156

179

180

200

Page 6: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

6

Sample non-leaf

to keys to keys to keys to keys

< 57 57 k<81 81k<95 95

57

81

95

Page 7: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

7

B+ Trees in Practice

• Typical order: 100. Typical fill-factor: 67%.– average fanout = 133

• Typical capacities:– Height 4: 1334 = 312,900,700 records– Height 3: 1333 = 2,352,637 records

• Can often hold top levels in buffer pool:– Level 1 = 1 page = 8 Kbytes– Level 2 = 133 pages = 1 Mbyte– Level 3 = 17,689 pages = 133 MBytes

Page 8: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

8

Sample non-leaf

to keys to keys to keys to keys

< 57 57 k<81 81k<95 95

57

81

95

Page 9: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

9

Sample leaf node:

From non-leaf node

to next leafin

sequence5

7

81

95

To r

eco

rd

wit

h k

ey 5

7

To r

eco

rd

wit

h k

ey 8

1

To r

eco

rd

wit

h k

ey 8

5

Page 10: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

10

In textbook’s notationn=3

Leaf:

Non-leaf:

30

35

30

30 35

30

Page 11: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

11

Size of nodes: n+1 pointersn keys

(fixed)

Page 12: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

13

Full nodemin. node

Non-leaf

Leaf

n=3

12

01

50

18

0

30

3 5 11

30

35

counts

even if

null

Non-leaf: (n+1)/2 pointers

Leaf: (n+1)/2 pointers to data

Page 13: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

14

B+tree rules tree of order n

(1) All leaves at same lowest level(balanced tree)

(2) Pointers in leaves point to records except for “sequence pointer”

Page 14: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

16

Root

B+Tree Example : Searches

100

120

150

180

30

3 5 11

30

35

100

101

110

120

130

150

156

179

180

200

Page 15: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

17

Insert into B+tree

(a) simple case– space available in leaf

(b) leaf overflow(c) non-leaf overflow(d) new root

Page 16: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

18

(a) Insert key = 32 n=33 5 11

30

31

30

100

32

Page 17: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

19

(a) Insert key = 7 n=3

3 5 11

30

31

30

100

3 5

7

7

Page 18: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

20

(c) Insert key = 160 n=3

10

0

120

150

180

150

156

179

180

200

160

18

0

160

179

Page 19: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

21

(d) New root, insert 45 n=3

10

20

30

1 2 3 10

12

20

25

30

32

40

40

45

40

30new root

Page 20: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

22

Recap: Insert Data into B+ Tree

• Find correct leaf L. • Put data entry onto L.

– If L has enough space, done!– Else, must split L (into L and a new node L2)

• Redistribute entries evenly, copy up middle key.• Insert index entry pointing to L2 into parent of L.

• This can happen recursively– To split index node, redistribute entries evenly, but

push up middle key. (Contrast with leaf splits.)

• Splits “grow” tree; root split increases height. – Tree growth: gets wider or one level taller at top.

Page 21: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

23

(a) Simple case (b) Coalesce with neighbor (sibling)

(c) Re-distribute keys(d) Cases (b) or (c) at non-leaf

Deletion from B+tree

Page 22: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

24

(a) Delete key = 11 n=33 5 11

30

31

30

100

Page 23: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

25

(b) Coalesce with sibling– Delete 50

10

40

100

10

20

30

40

50

n=4

40

Page 24: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

26

(c) Redistribute keys– Delete 50

10

40

100

10

20

30

35

40

50

n=4

35

35

Page 25: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

27

40

45

30

37

25

26

20

22

10

141 3

10

20

30

40

(d) Coalese and Non-leaf coalese– Delete 37

n=4

40

30

25

25

new root

Page 26: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

28

Delete Data from B+ Tree

• Start at root, find leaf L where entry belongs.• Remove the entry.

– If L is at least half-full, done! – If L has only d-1 entries,

• Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).

• If re-distribution fails, merge L and sibling.

• If merge occurred, must delete entry (pointing to L or sibling) from parent of L.

• Merge could propagate to root, decreasing height.

Page 27: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

29

• Concurrency control harder in B-Trees• B-tree consumes more space• B-tree automatically decides :

– when to reorganize– how full to load pages of new index

• Buffering– B-tree: has fixed buffer requirements– Static index: must read several overflow blocks to be efficient (variable size buffers)

Discussion of B-trees (vs. static indexed sequential files)

Page 28: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

30

ComparisonB-tree vs. indexed seq.

file• Less space, so

lookup faster• Inserts managed

by overflow area• Requires

temporary restructuring

• Unpredictable performance

• Consumes more space, so lookup slower

•Each insert/delete potentially restructures

•Build-in restructuring

• Predictable performance

Page 29: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

31

• Speaking of buffering… Is LRU a good policy for B+tree

buffers?Of course not!

Should try to keep root in memory at all times

(and perhaps some nodes from second level)

Page 30: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

33

Interesting problem:

For B+tree, how large should n be?

n is number of keys / node

Page 31: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

34

assumptions: n children per node and N records in database

(1) Time to read B-Tree node from disk is (tseek + tread*n) msec.(2) Once in main memory, use binary search to locate key, (a + b log_2 n) msec(3) Need to search (read) log_n (N) tree nodes

(4) t-search = (tseek + tread*n + (a + b*log_2(n)) * log n (N)

Page 32: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

35

Can get: f(n) = time to find a record

f(n)

nopt n

FIND nopt by f’(n) = 0

What happens to nopt as:•Disk gets faster? CPU get faster? …

Page 33: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

36

Bulk Loading of B+ Tree

• For large collection of records, create B+ tree.• Method 1: Repeatedly insert records slow.• Method 2: Bulk Loading more efficient.

Page 34: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

37

Bulk Loading of B+ Tree

• Initialization: – Sort all data entries – Insert pointer to first (leaf) page in new (root) page.

3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*

Sorted pages of data entries; not yet in B+ treeRoot

Page 35: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

38

Bulk Loading (Contd.)

• Index entries for leaf pages always entered into right-most index page

• When this fills up, it splits.

Split may go up right-most path to root.

3* 4* 6* 9* 10*11* 12*13* 20*22* 23* 31* 35*36* 38*41* 44*

Root

Data entry pages

not yet in B+ tree3523126

10 20

3* 4* 6* 9* 10* 11* 12*13* 20*22* 23* 31* 35*36* 38*41* 44*

6

Root

10

12 23

20

35

38

not yet in B+ treeData entry pages

Page 36: CS4432: Database Systems II

CS 4432 lecture #10 - b+ tree indexing

39

Summary of Bulk Loading

• Method 1: multiple inserts.– Slow.– Does not give sequential storage of leaves.

• Method 2: Bulk Loading – Has advantages for concurrency control.– Fewer I/Os during build.– Leaves will be stored sequentially (and

linked) – Can control “fill factor” on pages.