B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree...

23
B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree
  • date post

    20-Dec-2015
  • Category

    Documents

  • view

    224
  • download

    0

Transcript of B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree...

Page 1: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

B-Trees

• Disk Storage

• What is a multiway tree?

• What is a B-tree?

• Why B-trees?

• Insertion in a B-tree

• Deleting from a B-tree

Page 2: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Disk Storage• In this lecture you will study two external data storage structures:

M-way and B-trees.• Data is stored on disk (i.e., secondary memory) in blocks.• A block is the smallest amount of data that can be accessed on a

disk.• Each block has a fixed number of bytes. • Each block may hold many data records.

999000

data

999004

data

empty

empty

empty

empty

Record 1

Record 2

Record 3

Record 4

999000

999004

Page 3: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

What is a Multiway (M-way) search tree?

• A multiway (or M-way) search tree of order m is a finite set of keys and references. Either the set is empty or:

1. Each non-leaf node consists of distinct keys ki and of at least 2 and at most m non-empty M-way subtrees T0, T1, . . . , Tm-1. The number of keys is one less than the number of non-empty subtrees.

km-2. . .k3k2k1

T0 T1 T2 Tm-2 Tm-1key < k1 k1 < key < k2 k2 < key < k3 km-2 < key < km-1 key > km-1

2. The keys and subtrees of a non-leaf node are ordered as:

T0, k1, T1, k2, T2, . . . , km-1, Tm-1 such that:

• All keys in subtree T0 are less than k1.

• All keys in subtree Ti , 1 <= i <= m - 2, are greater than ki but less than ki+1.

• All keys in subtree Tm-1 are greater than km-1

km-1

Page 4: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

What is an (M-way) search tree?(cont’d)

3. Each leaf node consists of distinct keys pi and at least 2 and at most m non-null references to data blocks B0, B1, B2, . . . , Bm-1. The number of keys is one less than the number of non-null references.

4. The keys in a non-leaf node and the referenced blocks are ordered as: B0, p1, B1, p2, B2, . . . , pm-1, Bm-1 such that:

• All data in block B0 correspond to keys that are less than p1.

• All data in block Bi , 1 <= i <= m - 2, correspond to pi and to keys that are greater than pi but less than pi+1.

• All data in block Bm - 1 correspond to pm-1 and to keys that are greater than pm-1.

Page 5: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Multiway (M-way) search tree Examples

• A multiway search tree of order 3:

10 26

4 7 14 20 32

1

2

4 7

8

9

10 14

17

19

20

24

26

28

32

Page 6: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Multiway search tree Examples (cont’d)

• Example:A multiway search tree of order 3:

• Example:A multiway search tree of order 5:

Note: The leaf nodes in an M-way tree are not necessarily at the same level.

Page 7: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

What is a B-Tree?• A B-tree of order m (or branching factor m), where m > 2, is either an

empty tree or a multiway search tree with the following properties: The root is either a leaf or it has at least two non-empty subtrees and

at most m non-empty subtrees. Each non-leaf node, other than the root, has at least m / 2 non-

empty subtrees and at most m non-empty subtrees. (Note: x is the lowest integer > x ).

The number of keys in each non-leaf node is one less than the number of non-empty subtrees for that node.

Each leaf node has at least m / 2 and at most m references to data blocks.

The number of keys in each leaf node is one less than the number of non-null references in that node.

All leaf nodes are at the same level; that is the tree is perfectly balanced.

A data block referenced by by a leaf node may store at least L / 2 and at most L data records, where L is a fixed constant, L 1.

Page 8: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

B-Tree ExamplesExample: A B-tree with m = 4 and L = 3

Example:

Page 9: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Choosing M and L for a B-Tree

• A good value for L , the number of records in a block, is the largest integer satisfying the inequality:

L <= BlockSize / RecordSize• A good value for the branching factor, m, is the largest integer

satisfying the inequality:

ReferenceSize * m + KeySize * (m - 1) <= BlockSize• Example: A B-tree stores student records. Each student record is

1024 bytes. The key for a record is a student ID that is 36 bytes. If a block holds 4096 bytes and a reference requires 32 bytes. Find suitable values for m and L.

L <= 4096 / 1024 => L = 4

32m + 36(m - 1) <= 4096 => m <= 59.7 => m = 58

Page 10: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Why B-Trees?

• Accessing data in secondary memory is much slower than accessing data in primary memory

• Thus, when data is too large to fit in primary memory, minimizing the number of disk accesses is important.

• Many algorithms and data structures that are efficient for manipulating data in primary memory are not efficient for manipulating large data in secondary memory because they do not minimize the number of disk accesses.

• For example, AVL trees are not suitable for representing huge tables residing in secondary memory.

• The height of an AVL tree increases, and hence the number of disk accesses required to access a particular record increases, as the number of records increases.

Page 11: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Why B-Trees? (cont’d)

• B-trees are suitable for representing huge tables residing in secondary memory because:

1. With a large branching factor m, the height of a B-tree is low resulting in fewer disk accesses.

2. The branching factor can be chosen such that a node reference corresponds to a block of secondary memory.

3. B-trees usually keep related records on the same block. Again this results in fewer disk accesses.

Page 12: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Why B-Trees? (comparison with AVL Trees)

• The height h of a B-tree of order m, with a total of n keys, satisfies the inequality:

h <= 1 + log m / 2 ((n + 1) / 2)

• If m = 300 and n = 16,000,000 then h <= 3.31 .• Thus, in the worst case finding a key in such a B-tree requires 3 disk

accesses.• The average number of comparisons for an AVL tree with n keys is

log n + 0.25 where n is large.• If n = 16,000,000 the average number of comparisons is 17.• Thus, in the average case, finding a key in such an AVL tree

requires 17 disk accesses.

Page 13: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Insertion in B-Trees

• In insertion or deletion, a B-tree undergoes changes that must maintain: Its height balance. Its leaves to be at the same level. Each of its nodes, except the root, to be at least half full (i.e., to

contain a minimum of m / 2 - 1 keys, where m is the order of the tree).

• A node of a B-tree of order m is said to underflow if, after deletion, it contains m / 2 - 2 keys.

• There are 2 insertion cases:1. The leaf in which the insertion is to be done does not overflow.2. The leaf overflows:

(a) with an odd number of keys,(b) with an even number of keys.

Page 14: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Insertion: case 1

• The leaf does not overflow:

Insert 511

B-tree of order 5

Page 15: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Insertion: case 2a• The leaf overflows with an odd numbers of keys.

Insert 490

B-tree of order 5

Page 16: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Insertion: case 2a (cont’d)

Page 17: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Insertion: case 2b

Left bias insertion of 102

Right bias insertion of 102

The leaf overflows with an even number of keys.

B-tree of order 4

Page 18: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Deletion

• If the key to be deleted is not in a leaf, swap it with either its successor or predecessor (each will be in a leaf).

• Delete the key from the leaf.• There are four cases:

1. The leaf does not underflow.

Delete 140

B-tree of order 4

Page 19: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Deletion (cont’d)

2. The leaf underflows and the adjacent right sibling has at least m / 2 keys.

Delete 113

B-tree of order 5

Page 20: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Deletion (cont’d)

3. The leaf underflows and the adjacent left sibling has at least m / 2 keys.

Delete 135

B-tree of order 5

Page 21: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Deletion (cont’d)

4 .The leaf underflows and each adjacent sibling has m / 2 - 1 keys.

merge node, sibling and the separating key x

If the parent of the merged node underflows, the merging process propagates upward. In the limit, a root with one key is deleted and the height decreases by one.

Page 22: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Deletion (cont’d)

Delete 412

The parent of the merged node does not underflow. The merging process does not propagate upward.

B-tree of order 5

Page 23: B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree Deleting from a B-tree.

Deletion (cont’d)

Delete D

B-tree of order 5