Group Project B-Tree Student: Yongsheng Ma
CS632 – Algorithm
Professor: G. Gibson
B-Tree
Introduction
Operations
Complexities
Applications
Summary
B-Tree Properties
An m-way search tree.
The root node may have as few as two children, or none if the tree is empty.
The root may be a leaf.
Internal nodes have at least ceiling(m/2) and at most m non-null subtrees.
B-Tree Properties
All leaf nodes are at the same level; that is, the tree is perfectly balanced.
A leaf node has at least ceiling(m/2)-1 entries (keys) and at most m-1 entries (keys).
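The occupancy bounds stated above can be checked with a small helper. This is a minimal sketch: the function name `occupancy_bounds` and the dictionary keys are illustrative choices, not part of the slides, and the bounds apply to non-root nodes only.

```python
def occupancy_bounds(m):
    """Child and key bounds for a non-root node of an m-way B-tree.

    Follows the slide's properties: between ceiling(m/2) and m subtrees,
    and between ceiling(m/2)-1 and m-1 keys.
    """
    min_children = (m + 1) // 2  # integer form of ceiling(m/2)
    return {
        "min_children": min_children,
        "max_children": m,
        "min_keys": min_children - 1,
        "max_keys": m - 1,
    }


if __name__ == "__main__":
    for order in (3, 4, 5, 100):
        print(order, occupancy_bounds(order))
```

For m = 5, for instance, a non-root node holds between 2 and 4 keys, so it is always at least half full.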
B-Tree Properties
The “branching factor” can be quite large.
Each node may have many children, from a handful to thousands.
The keys in each node are in non-decreasing order.
Operations
Searching a key
Inserting a key
Splitting a node
Deleting a key
Searching a key
Much like searching a binary tree.
Make a multi-way branching decision at each node.
The nodes encountered form a path downward from the root.
Searching a key
The number of pages accessed is O(h) = O(log_t n), in which h is the height and n is the number of keys.
CPU time is O(th) = O(t log_t n).
Note: t is the minimum degree of the B-tree, so each node has at most 2t children and at most 2t-1 entries (keys).
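The search procedure can be sketched as follows. This is an illustrative sketch rather than the presenter's code: the `Node` class and the `search` function are hypothetical names, nodes live in memory instead of on disk pages, and the `pages` counter only stands in for disk reads. The keys follow the example tree with root M used on the search slides.

```python
class Node:
    """In-memory stand-in for one disk page of a B-tree (hypothetical)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)                 # keys in non-decreasing order
        self.children = list(children or [])   # empty list for a leaf


def search(root, key):
    """Return (node, index, pages) if key is found, else (None, -1, pages).

    pages counts the nodes visited, which models the O(h) page accesses.
    The inner loop is a linear scan, which models the O(t) work per node.
    """
    node, pages = root, 1
    while True:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return node, i, pages
        if not node.children:        # reached a leaf without finding the key
            return None, -1, pages
        node = node.children[i]      # multi-way branching decision
        pages += 1


# Example tree: root M, two internal nodes, seven leaves.
root = Node("M", [
    Node("DH", [Node("BC"), Node("FG"), Node("JKL")]),
    Node("QTX", [Node("NP"), Node("RS"), Node("VW"), Node("YZ")]),
])

if __name__ == "__main__":
    node, index, pages = search(root, "K")
    print(node.keys, index, pages)
```

Every lookup in this tree touches at most three nodes, one per level, which is the path from the root described above.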
Searching a key
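Figure: example B-tree for searching (nodes separated by |, children listed left to right)
Root: M
Internal nodes: D H | Q T X
Leaves: B C | F G | J K L | N P | R S | V W | Y Z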
Creating an empty tree
We can assume that no disk read is needed.
One disk page is allocated for the new node in O(1) time.
Splitting a node
A fundamental operation used during insertion
The median key moves up into its parent node, which must be non-full.
If the node has no parent (it is the root), a new root is created and the tree grows in height by one.
Splitting a node
Figure: splitting a full child, t=4
Before: parent … N W …; full child P Q R S T U V
After: parent … N S W …; children P Q R | T U V
Splitting a node
Figure: splitting the root, t=4
Before: root A D F H L N P
After: new root H; children A D F | L N P
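The two splitting situations described above (a full child under a non-full parent, and a full root) can be sketched in a few lines. This is a minimal in-memory sketch: `Node`, `split_child` and `split_root` are hypothetical names, and the sibling nodes A B C and X Y Z in the usage example are placeholders invented for illustration.

```python
class Node:
    """In-memory stand-in for a B-tree node (hypothetical)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def split_child(parent, i, t):
    """Split the full child parent.children[i]; its median key moves up."""
    full = parent.children[i]
    assert len(full.keys) == 2 * t - 1, "child must be full"
    assert len(parent.keys) < 2 * t - 1, "parent must be non-full"
    median = full.keys[t - 1]
    # The right half goes to a new node; the left half stays in place.
    right = Node(full.keys[t:], full.children[t:])
    full.keys = full.keys[:t - 1]
    full.children = full.children[:t]
    parent.keys.insert(i, median)
    parent.children.insert(i + 1, right)


def split_root(root, t):
    """Split a full root by first placing an empty new root above it."""
    new_root = Node([], [root])
    split_child(new_root, 0, t)
    return new_root      # the tree is now one level taller


if __name__ == "__main__":
    parent = Node("NW", [Node("ABC"), Node("PQRSTUV"), Node("XYZ")])
    split_child(parent, 1, t=4)
    print(parent.keys, [c.keys for c in parent.children])

    new_root = split_root(Node("ADFHLNP"), t=4)
    print(new_root.keys, [c.keys for c in new_root.children])
```

With t = 4, splitting P Q R S T U V moves S into the parent between N and W, and splitting the root A D F H L N P leaves H alone in a new root.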
Inserting a key
Insertion requires O(h) disk accesses.
CPU time is O(th) = O(t log_t n).
Inserting a key
Splitting the root is the only way to increase the height of a B-tree.
Unlike a binary tree, a B-tree increases in height at the top instead of the bottom.
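A single-pass insertion that splits full nodes on the way down can be sketched as follows. The slides do not show code, so this is one common way to do it under stated assumptions: `Node`, `BTree` and their methods are hypothetical names, nodes are kept in memory, and keys are assumed to be distinct.

```python
class Node:
    def __init__(self, leaf=True):
        self.keys, self.children, self.leaf = [], [], leaf


class BTree:
    """Minimal in-memory B-tree with minimum degree t (illustrative sketch)."""

    def __init__(self, t):
        self.t, self.root = t, Node()

    def _split_child(self, parent, i):
        t, full = self.t, parent.children[i]
        right = Node(leaf=full.leaf)
        parent.keys.insert(i, full.keys[t - 1])       # median moves up
        parent.children.insert(i + 1, right)
        right.keys, full.keys = full.keys[t:], full.keys[:t - 1]
        if not full.leaf:
            right.children, full.children = full.children[t:], full.children[:t]

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: the only place where the tree gains height.
            old_root, self.root = self.root, Node(leaf=False)
            self.root.children.append(old_root)
            self._split_child(self.root, 0)
        node = self.root
        while not node.leaf:
            i = sum(k < key for k in node.keys)       # child to descend into
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)            # never descend into a full node
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        node.keys.insert(sum(k < key for k in node.keys), key)

    def height(self):
        h, node = 0, self.root
        while not node.leaf:
            node, h = node.children[0], h + 1
        return h

    def inorder(self, node=None):
        node = node or self.root
        if node.leaf:
            return list(node.keys)
        out = []
        for i, child in enumerate(node.children):
            out += self.inorder(child)
            if i < len(node.keys):
                out.append(node.keys[i])
        return out


if __name__ == "__main__":
    tree = BTree(t=3)
    for k in "ACDEG":
        tree.insert(k)
    print(tree.height(), tree.root.keys)   # root is a full leaf
    tree.insert("J")
    print(tree.height(), tree.root.keys)   # root was split, height grew at the top
```

Because every full node on the path is split before the descent continues, the parent of any node being split is guaranteed to be non-full, and only a root split changes the height.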
Inserting a key
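Figure (a), initial tree, t=3
Root: G M P X
Leaves: A C D E | J K | N O | R S T U V | Y Z
Inserting a key
Figure (b), B inserted, t=3
Root: G M P X
Leaves: A B C D E | J K | N O | R S T U V | Y Z
Inserting a key
Figure (c), Q inserted, t=3
Root: G M P T X
Leaves: A B C D E | J K | N O | Q R S | U V | Y Z
Inserting a key
Figure (d), L inserted, t=3
Root: P
Internal nodes: G M | T X
Leaves: A B C D E | J K L | N O | Q R S | U V | Y Z
Inserting a key
Figure (e), F inserted, t=3
Root: P
Internal nodes: C G M | T X
Leaves: A B | D E F | J K L | N O | Q R S | U V | Y Z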
Deleting a key
Deletion is analogous to insertion but is a little more complicated.
There are various cases for deleting keys from a B-tree.
Deleting a key
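Different conditions lead to different behaviors: the case depends on whether the key is in a leaf or an internal node, and on how many keys the affected nodes hold.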
In practice, deletion operations are most often used to delete keys from leaves.
Deleting a key
When deleting a key from an internal node, however, the procedure makes a downward pass through the tree but may have to return to the node from which the key was deleted to replace the key with its predecessor or successor.
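Finding the predecessor or successor that replaces a key deleted from an internal node can be sketched as below. This covers only that lookup step, not a full deletion procedure. `Node`, `predecessor` and `successor` are hypothetical names, and the example node C G M with its four leaves is modelled on the t=3 deletion example in the slides.

```python
class Node:
    """In-memory stand-in for a B-tree node (hypothetical)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])   # empty list for a leaf


def predecessor(node, i):
    """Largest key in the subtree to the left of node.keys[i]."""
    cur = node.children[i]
    while cur.children:          # keep following the rightmost child
        cur = cur.children[-1]
    return cur.keys[-1]


def successor(node, i):
    """Smallest key in the subtree to the right of node.keys[i]."""
    cur = node.children[i + 1]
    while cur.children:          # keep following the leftmost child
        cur = cur.children[0]
    return cur.keys[0]


if __name__ == "__main__":
    internal = Node("CGM", [Node("AB"), Node("DE"), Node("JKL"), Node("NO")])
    print(predecessor(internal, 2), successor(internal, 2))
```

In this example the predecessor of M is L. Its leaf J K L holds t = 3 keys, so it can give up L without dropping below the minimum of t-1 keys.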
Deleting a key
Although this procedure seems complicated, it involves only O(h) disk operations for a B-tree with height h.
The CPU time required is O(th) = O(t log_t n).
Deleting a key
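Figure (a), initial tree, t=3
Root: P
Internal nodes: C G M | T X
Leaves: A B | D E F | J K L | N O | Q R S | U V | Y Z
Deleting a key
Figure (b), F deleted: case 1, t=3
Root: P
Internal nodes: C G M | T X
Leaves: A B | D E | J K L | N O | Q R S | U V | Y Z
Deleting a key
Figure (c), M deleted: case 2a, t=3
Root: P
Internal nodes: C G L | T X
Leaves: A B | D E | J K | N O | Q R S | U V | Y Z
Deleting a key
Figure (d), G deleted: case 2c, t=3
Root: P
Internal nodes: C L | T X
Leaves: A B | D E J K | N O | Q R S | U V | Y Z
Deleting a key
Figure (e), D deleted: case 3b, t=3
Internal node: C L P T X
Leaves: A B | E J K | N O | Q R S | U V | Y Z
Deleting a key
Figure (e’), tree shrinks in height, t=3
Root: C L P T X
Leaves: A B | E J K | N O | Q R S | U V | Y Z
Deleting a key
Figure (f), B deleted: case 3a, t=3
Root: E L P T X
Leaves: A C | J K | N O | Q R S | U V | Y Z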
Complexities
A large Branching Factor reduces the number of disk accesses required to find a key.
When the root node resides in memory, finding any key in a tree of height h requires at most h disk accesses (at most 2 for a tree of height 2). For a fixed height this is constant time, O(1).
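To see how much a large branching factor helps, one can count the keys a completely full tree can hold. This is a back-of-the-envelope sketch: the function name `max_keys` is hypothetical, the branching factor of 1001 is only an illustrative value, and the root is taken to be at height 0.

```python
def max_keys(branching_factor, height):
    """Keys in a tree where every node has `branching_factor` children.

    Each node then holds branching_factor - 1 keys, and level d
    (root at level 0) contains branching_factor ** d nodes.
    """
    nodes = sum(branching_factor ** level for level in range(height + 1))
    return nodes * (branching_factor - 1)


if __name__ == "__main__":
    for h in range(3):
        print(h, max_keys(1001, h))
```

With a branching factor of 1001, a tree of height 2 can already hold more than a billion keys, so very few page reads are needed below the root.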
Complexities
Running time comprises the number of disk accesses and the CPU time.
During a disk read or write, an entire page of information is accessed.
The number of disk accesses is measured in terms of pages that have to be read from or written to the disk.
Complexities
The number of disk pages accessed is O(h) = O(log_t n).
The CPU time to traverse within each node is O(t).
The total time is O(th) = O(t log_t n), or approximately O(log n) when t is treated as a constant.
It is the same for every basic operation.
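The cost model above can be made concrete with a short calculation. The sketch relies on a standard textbook fact that the slides do not state: a B-tree of minimum degree t and height h (root at height 0) holds at least 2t^h - 1 keys. The names `max_height` and `search_cost`, and the choice of n = 10^9, are illustrative assumptions.

```python
def max_height(n, t):
    """Largest height h for which 2 * t**h - 1 <= n, using exact integers."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


def search_cost(n, t):
    """Worst-case (height, pages read, key comparisons) for one search.

    Assumes one page per level, root included, and a linear scan
    over at most 2t - 1 keys in each node.
    """
    h = max_height(n, t)
    pages = h + 1
    comparisons = pages * (2 * t - 1)
    return h, pages, comparisons


if __name__ == "__main__":
    for t in (2, 10, 100, 1000):
        print(t, search_cost(10 ** 9, t))
```

Raising t lowers the number of pages read while increasing the work done inside each node, which is a reasonable trade when a page read costs far more than a comparison.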
Applications
Databases cannot typically be maintained entirely in memory.
Secondary storage is usually used.
A B-tree is often used to index the data and to provide fast access.
Applications
Searching an unindexed, unsorted database containing n key values has a worst-case running time of O(n).
Indexed with a B-tree, the same search operation runs in O(log n).
Applications – an example
To perform a search for a single key on a set of one million keys (1,000,000), a linear search will require at most 1,000,000 comparisons.
If the same data is indexed with a B-tree of
minimum order 10 and height 9, 81 comparisons will be required in the worst case.
Summary
A B-tree is a balanced, multi-way file organization.
Search, Insert, and Delete operations retain desirable logarithmic costs.
B-tree schemes guarantee at least 50% storage utilization, since every node except the root is at least half full.
Extra
B-tree variants: B+ tree and B* tree
Branching factors are improved
Extra
B+ tree
Combines features of ISAM and the B-tree
Contains index pages and data pages
Data pages always appear as leaf nodes
Root and intermediate nodes are index pages
Extra
B+ tree
Saves more space (but who cares)
Non-leaf and leaf nodes contain different numbers of entries
Deletion is more complicated
Faster lookup than B-trees, because the height of the tree is smaller (items are stored more compactly)