Group Project B- Tree Student: Yongsheng Ma

Group Project B-Tree Student: Yongsheng Ma CS632 – Algorithm Professor: G. Gibson


Transcript of Group Project B- Tree Student: Yongsheng Ma

Page 1: Group Project B- Tree Student: Yongsheng Ma

Group Project

B-Tree

Student: Yongsheng Ma

CS632 – Algorithm

Professor: G. Gibson

Page 2: Group Project B- Tree Student: Yongsheng Ma

B-Tree

Introduction

Operations

Complexities

Applications

Summary

Page 3: Group Project B- Tree Student: Yongsheng Ma

B-Tree Properties

An m-way search tree.

The root node may have as few as two children, or none if the tree is empty.

The root may be a leaf.

Internal nodes have at least ceiling(m/2) and at most m non-null subtrees.

Page 4: Group Project B- Tree Student: Yongsheng Ma

B-Tree Properties

All leaf nodes are at the same level; that is, the tree is perfectly balanced.

A leaf node has at least ceiling(m/2)-1 entries (keys) and at most m-1 entries (keys).
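The occupancy limits above follow directly from the order m. A minimal sketch, assuming the order-m convention used on these slides; the function name `order_bounds` is made up for illustration:

```python
import math

def order_bounds(m):
    """Occupancy limits for a non-root node of a B-tree of order m."""
    min_children = math.ceil(m / 2)       # internal nodes other than the root
    return {
        "max_children": m,
        "min_children": min_children,
        "max_keys": m - 1,                # one key fewer than children
        "min_keys": min_children - 1,
    }

print(order_bounds(5))
```

For m = 5 this gives between 3 and 5 children and between 2 and 4 keys per node.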

Page 5: Group Project B- Tree Student: Yongsheng Ma

B-Tree Properties

The "branching factor" can be quite large.

Each node may have many children, from a handful to thousands.

The keys in each node are in non-decreasing order.

Page 6: Group Project B- Tree Student: Yongsheng Ma

Operations

Searching a key

Inserting a key

Splitting a node

Deleting a key

Page 7: Group Project B- Tree Student: Yongsheng Ma

Searching a key

Much like searching a binary search tree.

Make a multi-way branching decision at each node.

The nodes encountered form a path downward from the root.
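The search described above can be sketched as follows. This is an illustration rather than the presenter's code: the `Node` class and the small example tree are assumptions, and the keys inside a node are located with binary search (`bisect_left`) instead of a linear scan.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted list of keys
        self.children = children or []    # empty list for a leaf

def search(node, key):
    """Return (node, index) where key is stored, or None if it is absent."""
    while True:
        i = bisect_left(node.keys, key)   # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        if not node.children:             # reached a leaf without a match
            return None
        node = node.children[i]           # multi-way branch into subtree i

def leaf(letters):
    return Node(list(letters))

# Hypothetical example tree: root M, internal nodes D H and Q T X.
root = Node(["M"], [
    Node(["D", "H"], [leaf("BC"), leaf("FG"), leaf("JKL")]),
    Node(["Q", "T", "X"], [leaf("NP"), leaf("RS"), leaf("VW"), leaf("YZ")]),
])

print(search(root, "R") is not None)   # R is in the tree
print(search(root, "A"))               # A is not in the tree
```

Searching for R visits the root, then the node Q T X, then the leaf R S: one node per level.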

Page 8: Group Project B- Tree Student: Yongsheng Ma

Searching a key

The number of disk pages accessed is $O(h) = O(\log_t n)$, where h is the height of the tree and n is the number of keys.

CPU time is $O(th) = O(t \log_t n)$.

Note: t is the minimum degree of the B-tree, so each node has at most 2t children and at most 2t-1 entries (keys).
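The logarithmic height bound can be derived by counting the fewest keys a tree of height h can hold. The derivation below assumes the usual minimum-degree convention: the root is at depth 0 and holds at least one key, and every other node holds at least t-1 keys, so there are at least $2t^{i-1}$ nodes at depth i.

```latex
n \;\ge\; 1 + (t-1)\sum_{i=1}^{h} 2t^{\,i-1}
  \;=\; 1 + 2(t-1)\,\frac{t^{h}-1}{t-1}
  \;=\; 2t^{h} - 1
\quad\Longrightarrow\quad
h \;\le\; \log_t \frac{n+1}{2}
```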

Page 9: Group Project B- Tree Student: Yongsheng Ma

Searching a key

Root: M

Internal nodes: D H | Q T X

Leaves: B C | F G | J K L | N P | R S | V W | Y Z

Page 10: Group Project B- Tree Student: Yongsheng Ma

Creating an empty tree

We can assume that no disk read is needed.

One disk page is allocated in O(1) time to be used as the new root node.
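A minimal in-memory sketch of creation, assuming a `Node` object stands in for one disk page; the class names and fields are illustrative only:

```python
class Node:
    def __init__(self, leaf=True):
        self.leaf = leaf        # True when the node has no children
        self.keys = []          # at most 2t-1 keys
        self.children = []      # at most 2t children

class BTree:
    def __init__(self, t):
        if t < 2:
            raise ValueError("minimum degree t must be at least 2")
        self.t = t
        # Stands in for allocating a single disk page: an empty leaf root.
        self.root = Node(leaf=True)

tree = BTree(3)
print(tree.root.leaf, tree.root.keys)
```

Nothing is read from disk here; the only work is allocating one empty node.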

Page 11: Group Project B- Tree Student: Yongsheng Ma

Splitting a node

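A fundamental operation used during insertion.

A full node (2t-1 keys) is split around its median key into two nodes of t-1 keys each.

The median key moves up into the parent node, which must be non-full.

If the node has no parent (it is the root), the tree grows in height by one.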
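The split step can be sketched as below. The `Node` class and the function names `split_child` and `split_root` are assumptions made for illustration; t is the minimum degree, so a full node holds 2t-1 keys.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty for a leaf

def split_child(parent, i, t):
    """Split the full child parent.children[i] around its median key."""
    full = parent.children[i]
    assert len(full.keys) == 2 * t - 1, "child must be full"
    assert len(parent.keys) < 2 * t - 1, "parent must be non-full"
    median = full.keys[t - 1]
    right = Node(full.keys[t:], full.children[t:])   # upper t-1 keys
    full.keys = full.keys[:t - 1]                    # lower t-1 keys stay
    full.children = full.children[:t]
    parent.keys.insert(i, median)                    # median moves up
    parent.children.insert(i + 1, right)

def split_root(root, t):
    """Split a full root; the tree grows in height by one."""
    new_root = Node([], [root])
    split_child(new_root, 0, t)
    return new_root

# t = 4: the full child P Q R S T U V sits between N and W in its parent.
parent = Node(["N", "W"],
              [Node(list("JKL")), Node(list("PQRSTUV")), Node(list("XYZ"))])
split_child(parent, 1, 4)
print(parent.keys, [c.keys for c in parent.children])
```

After the call the parent holds N S W and the split child has become the two nodes P Q R and T U V.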

Page 12: Group Project B- Tree Student: Yongsheng Ma

Splitting a node

Before: parent … N W …, full child P Q R S T U V

After: parent … N S W …, children P Q R | T U V

t=4

Page 13: Group Project B- Tree Student: Yongsheng Ma

Splitting a node

Before: full root A D F H L N P

After: new root H, children A D F | L N P

t=4

Page 14: Group Project B- Tree Student: Yongsheng Ma

Inserting a key

Requires O(h) disk accesses.

CPU time is $O(th) = O(t \log_t n)$.

Page 15: Group Project B- Tree Student: Yongsheng Ma

Inserting a key

Splitting the root is the only way to increase the height of a B-tree.

Unlike a binary tree, a B-tree increases in height at the top instead of the bottom.
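A single-pass insertion can be sketched as follows, assuming the common approach of splitting every full node met on the way down so that a parent is never full when a median is pushed into it. The class and method names are illustrative, and the starting tree is built by hand with t = 3.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty for a leaf

class BTree:
    def __init__(self, t, root=None):
        self.t = t
        self.root = root or Node()

    def _split_child(self, parent, i):
        t, full = self.t, parent.children[i]
        right = Node(full.keys[t:], full.children[t:])
        parent.keys.insert(i, full.keys[t - 1])      # median moves up
        parent.children.insert(i + 1, right)
        full.keys = full.keys[:t - 1]
        full.children = full.children[:t]

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:    # full root: grow at the top
            self.root = Node([], [self.root])
            self._split_child(self.root, 0)
        node = self.root
        while node.children:                         # descend to a leaf
            i = bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:               # choose the correct half
                    i += 1
            node = node.children[i]
        insort(node.keys, key)                       # leaf is guaranteed non-full

def leaf(letters):
    return Node(list(letters))

tree = BTree(3, Node(list("GMPX"), [leaf("ACDE"), leaf("JK"), leaf("NO"),
                                    leaf("RSTUV"), leaf("YZ")]))
for k in "BQLF":
    tree.insert(k)
print(tree.root.keys, [c.keys for c in tree.root.children])
```

Inserting L is the step that finds the root full, so the root is split and the height grows by one.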

Page 16: Group Project B- Tree Student: Yongsheng Ma

Inserting a key

(a) initial tree

Root: G M P X

Leaves: A C D E | J K | N O | R S T U V | Y Z

t=3

Page 17: Group Project B- Tree Student: Yongsheng Ma

Inserting a key

(b) B inserted

Root: G M P X

Leaves: A B C D E | J K | N O | R S T U V | Y Z

t=3

Page 18: Group Project B- Tree Student: Yongsheng Ma

Inserting a key

(c) Q inserted

Root: G M P T X

Leaves: A B C D E | J K | N O | Q R S | U V | Y Z

t=3

Page 19: Group Project B- Tree Student: Yongsheng Ma

Inserting a key

(d) L inserted

Root: P

Internal nodes: G M | T X

Leaves: A B C D E | J K L | N O | Q R S | U V | Y Z

t=3

Page 20: Group Project B- Tree Student: Yongsheng Ma

Inserting a key

(e) F inserted

Root: P

Internal nodes: C G M | T X

Leaves: A B | D E F | J K L | N O | Q R S | U V | Y Z

t=3

Page 21: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

Deletion is analogous to insertion but a little more complicated.

There are several cases to handle when deleting a key from a B-tree.

Page 22: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

Which case applies depends on whether the key is in a leaf or an internal node, and on how many keys the affected nodes hold.

In practice, deletion operations are most often used to delete keys from leaves.

Page 23: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

When deleting a key from an internal node, however, the procedure makes a downward pass through the tree but may have to return to the node from which the key was deleted to replace the key with its predecessor or successor.

Page 24: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

Although this procedure seems complicated, it involves only O(h) disk operations for a B-tree with height h.

The CPU time required is $O(th) = O(t \log_t n)$.
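The deletion cases can be sketched in one downward pass. This is an illustrative implementation, assuming the textbook case numbering (1, 2a, 2b, 2c, 3a, 3b); the names `Node`, `delete` and `delete_key` are made up here, and the example tree is built by hand with t = 3.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or []    # empty for a leaf

def _merge(node, i):
    """Merge children i and i+1 of node around the separator node.keys[i]."""
    left, right = node.children[i], node.children.pop(i + 1)
    left.keys += [node.keys.pop(i)] + right.keys
    left.children += right.children
    return left

def delete(node, key, t):
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                       # case 1: key is in a leaf
        if found:
            node.keys.pop(i)
        return
    if found:                                   # case 2: key is in an internal node
        left, right = node.children[i], node.children[i + 1]
        if len(left.keys) >= t:                 # 2a: replace with predecessor
            pred = left
            while pred.children:
                pred = pred.children[-1]
            node.keys[i] = pred.keys[-1]
            delete(left, node.keys[i], t)
        elif len(right.keys) >= t:              # 2b: replace with successor
            succ = right
            while succ.children:
                succ = succ.children[0]
            node.keys[i] = succ.keys[0]
            delete(right, node.keys[i], t)
        else:                                   # 2c: merge, then delete below
            delete(_merge(node, i), key, t)
        return
    child = node.children[i]                    # case 3: key is further down
    if len(child.keys) == t - 1:                # child must have >= t keys first
        if i > 0 and len(node.children[i - 1].keys) >= t:
            sib = node.children[i - 1]          # 3a: borrow from left sibling
            child.keys.insert(0, node.keys[i - 1])
            node.keys[i - 1] = sib.keys.pop()
            if sib.children:
                child.children.insert(0, sib.children.pop())
        elif i < len(node.keys) and len(node.children[i + 1].keys) >= t:
            sib = node.children[i + 1]          # 3a: borrow from right sibling
            child.keys.append(node.keys[i])
            node.keys[i] = sib.keys.pop(0)
            if sib.children:
                child.children.append(sib.children.pop(0))
        elif i > 0:                             # 3b: merge with left sibling
            child = _merge(node, i - 1)
        else:                                   # 3b: merge with right sibling
            child = _merge(node, i)
    delete(child, key, t)

def delete_key(root, key, t):
    """Delete key and return the (possibly new) root."""
    delete(root, key, t)
    if not root.keys and root.children:         # emptied root: tree shrinks
        root = root.children[0]
    return root

root = Node("P", [
    Node("CGM", [Node("AB"), Node("DEF"), Node("JKL"), Node("NO")]),
    Node("TX", [Node("QRS"), Node("UV"), Node("YZ")]),
])
for k in "FMGDB":
    root = delete_key(root, k, 3)
print(root.keys, [c.keys for c in root.children])
```

Deleting F, M, G, D and B in that order exercises cases 1, 2a, 2c, 3b and 3a respectively; the deletion of D is the step where the root empties and the height drops by one.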

Page 25: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

(a) Initial tree

Root: P

Internal nodes: C G M | T X

Leaves: A B | D E F | J K L | N O | Q R S | U V | Y Z

t=3

Page 26: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

(b) F deleted: case 1

Root: P

Internal nodes: C G M | T X

Leaves: A B | D E | J K L | N O | Q R S | U V | Y Z

t=3

Page 27: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

(c) M deleted: case 2a

Root: P

Internal nodes: C G L | T X

Leaves: A B | D E | J K | N O | Q R S | U V | Y Z

t=3

Page 28: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

(d) G deleted: case 2c

Root: P

Internal nodes: C L | T X

Leaves: A B | D E J K | N O | Q R S | U V | Y Z

t=3

Page 29: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

(e) D deleted: case 3b

Merged node: C L P T X

Leaves: A B | E J K | N O | Q R S | U V | Y Z

t=3

Page 30: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

(e’) tree shrinks in height

Root: C L P T X

Leaves: A B | E J K | N O | Q R S | U V | Y Z

t=3

Page 31: Group Project B- Tree Student: Yongsheng Ma

Deleting a key

(f) B deleted: case 3a

Root: E L P T X

Leaves: A C | J K | N O | Q R S | U V | Y Z

t=3

Page 32: Group Project B- Tree Student: Yongsheng Ma

Complexities

A large Branching Factor reduces the number of disk accesses required to find a key.

When the root node resides in memory, a search in a tree of height h reads at most h pages from disk. A tree of height 2, for example, requires at most 2 disk accesses to find any key, which is a small constant number.

Page 33: Group Project B- Tree Student: Yongsheng Ma

Complexities

Running time consists of the number of disk accesses and the CPU time.

During a disk read or write, an entire page of information is accessed.

The number of disk accesses is measured in terms of the pages that have to be read from or written to the disk.

Page 34: Group Project B- Tree Student: Yongsheng Ma

Complexities

The number of disk pages accessed is $O(h) = O(\log_t n)$.

The CPU time to scan the keys within each node is $O(t)$.

The total CPU time is $O(th) = O(t \log_t n)$, which is $O(\log n)$ when t is treated as a constant.

It is the same for every basic operation.
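The logarithmic bound does not depend on treating t as a constant if the keys inside a node are located by binary search instead of a linear scan (an assumption beyond what the slides state). Each node then costs $O(\log t)$ CPU time, and the total is:

```latex
O(\log t \cdot h)
  \;=\; O\!\left(\log t \cdot \log_t n\right)
  \;=\; O\!\left(\log t \cdot \frac{\log n}{\log t}\right)
  \;=\; O(\log n)
```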

Page 35: Group Project B- Tree Student: Yongsheng Ma

Applications

Databases cannot typically be maintained entirely in memory.

Secondary storage is usually used.

A B-tree is often used to index the data and to provide fast access.

Page 36: Group Project B- Tree Student: Yongsheng Ma

Applications

Searching an unindexed, unsorted database containing n key values has a worst-case running time of O(n).

Indexed with a B-tree, the same search operation runs in O(log n).

Page 37: Group Project B- Tree Student: Yongsheng Ma

Applications – an example

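To search for a single key in a set of one million keys (1,000,000), a linear search requires at most 1,000,000 comparisons.

If the same data is indexed with a B-tree of minimum degree t = 10, the height is at most $\log_{10}(500{,}000.5) \approx 5.7$, that is, at most 5. A search visits at most 6 nodes holding at most 19 keys each, so at most 6 × 19 = 114 comparisons are required in the worst case, even with a linear scan inside each node.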
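The worst case can also be estimated from n and the minimum degree alone, rather than by fixing the height by hand. A rough sketch, assuming the bound n >= 2t^h - 1 for a tree of height h with the root at depth 0; the function names are illustrative:

```python
import math

def max_height(n, t):
    """Largest height h (root at depth 0) for which 2*t**h - 1 <= n."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h

def worst_case_comparisons(n, t):
    """Return (nodes visited, comparisons with linear scan, with binary search)."""
    nodes_visited = max_height(n, t) + 1
    linear = nodes_visited * (2 * t - 1)                      # scan up to 2t-1 keys per node
    binary = nodes_visited * math.ceil(math.log2(2 * t))      # binary search over 2t-1 keys
    return nodes_visited, linear, binary

print(worst_case_comparisons(1_000_000, 10))
```

Either way the count stays in the tens or low hundreds, against up to 1,000,000 comparisons for a linear search.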

Page 38: Group Project B- Tree Student: Yongsheng Ma

Summary

B-Tree is a balanced, multi-way file organization.

Search, Insert, and Delete operations retain desirable logarithmic costs.

Every B-tree node other than the root is at least about half full, which guarantees roughly 50% storage utilization.

Page 39: Group Project B- Tree Student: Yongsheng Ma

Extra

B-tree variants: B+ tree and B* tree

Branching factors are improved.

Page 40: Group Project B- Tree Student: Yongsheng Ma

Extra

B+ tree

Combines features of ISAM and the B-tree.

Contains index pages and data pages.

Data pages always appear as leaf nodes.

Root and intermediate nodes are index pages.
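The split between index pages and data pages can be sketched as below. The class names, the record layout and the rule that keys equal to a separator go to the right are assumptions made for illustration, not part of the slides.

```python
from bisect import bisect_right

class DataPage:
    """Leaf node: holds the actual records."""
    def __init__(self, records):
        self.records = dict(records)      # key -> record

class IndexPage:
    """Root or intermediate node: separator keys and child pointers only."""
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children

def lookup(page, key):
    """Follow index pages down to a data page, then look the key up there."""
    while isinstance(page, IndexPage):
        page = page.children[bisect_right(page.keys, key)]
    return page.records.get(key)

root = IndexPage([20, 40], [
    DataPage({5: "a", 10: "b"}),
    DataPage({20: "c", 30: "d"}),
    DataPage({40: "e", 50: "f"}),
])
print(lookup(root, 30), lookup(root, 25))
```

Every lookup ends at a data page, even when the key also appears as a separator in an index page.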

Page 41: Group Project B- Tree Student: Yongsheng Ma

Extra

B+ tree

Saves more space (but who cares)

Non-leaf and leaf nodes contain different numbers of entries

Deletion is more complicated

Faster lookup for B-trees because the height of the tree is smaller (because items are stored more compactly)