CS102-07 Indexed Sequential_1
8/7/2019 CS102-07 Indexed Sequential_1
CJD
Perfect Binary Trees
Perfect binary tree:
a perfect binary tree has all nodes with 2 children except the leaf nodes, which are all at the same (last) level.
a perfect binary tree has $2^{l-1}$ nodes at level $l$.
a perfect binary tree of height $h$ has $2^h - 1$ nodes.
the height $h$ of a perfect binary tree with $n$ nodes is $\log_2(n+1)$.
Binary trees have height at least $\log_2(n+1)$.
a tree is balanced if the height is the minimum for the number of nodes. (Other non-equivalent and approximate definitions of "balanced" exist.)
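The counting formulas above can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative, and levels and heights are counted from 1, as on the slide.

```python
def nodes_at_level(level: int) -> int:
    """Nodes on one level of a perfect binary tree (the root is level 1)."""
    return 2 ** (level - 1)


def perfect_tree_size(height: int) -> int:
    """Total nodes in a perfect binary tree of the given height."""
    return 2 ** height - 1


def min_height(n: int) -> int:
    """Smallest height any binary tree with n nodes can have.

    This equals ceil(log2(n + 1)); int.bit_length() computes it without
    floating-point rounding.
    """
    return n.bit_length()


# A perfect tree of height 4 has 1 + 2 + 4 + 8 = 15 nodes.
total = sum(nodes_at_level(level) for level in range(1, 5))
print(total, perfect_tree_size(4), min_height(15), min_height(16))
```

A 16th node no longer fits in height 4, so the minimum height jumps to 5.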
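
Example Perfect and Balanced Trees
[Figure: example perfect and balanced binary trees with nodes labeled A through O; the tree layout is not recoverable from this transcript.]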

Binary Search Trees
Binary search tree (BST):
an organized binary tree such that the key value of every node is greater than every key value in its left subtree and less than every key value in its right subtree.
if the tree is balanced, searching a BST can be as efficient as binary search.
balanced binary search trees are used to organize data for quick access.
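A minimal sketch of BST insertion and search follows, assuming integer keys and ignoring duplicates. The names `Node`, `insert` and `search` are illustrative and no balancing is done. `search` also reports how many nodes it visited, which is the quantity the balance argument is about.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the subtree rooted at root and return the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # equal keys are ignored


def search(root: Optional[Node], key: int) -> Tuple[bool, int]:
    """Return (found, number of nodes visited)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited


root = None
for k in (50, 30, 70, 20, 40, 60, 80):  # this order yields a perfect tree
    root = insert(root, k)

print(search(root, 60))  # found after visiting 50, 70, 60
print(search(root, 65))  # not found after visiting 50, 70, 60
```

With 7 keys in a perfect tree, no search visits more than 3 nodes, which matches $\log_2(7+1) = 3$.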

AVL Trees
Created by Adelson-Velskii and Landis in 1962.
height-balanced binary search trees
AVL tree definition:
  the heights of the left and right subtrees of every node of T do not differ by more than 1
Random retrieval in an AVL tree:
  maximum height $h \le 1.4404 \log_2(n+2) - 0.328$, or roughly $1.5 \log_2 n$
  for n = 1,000,000, h is at most 29.
  proven by Adelson-Velskii and Landis themselves.
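The height bound can be cross-checked by counting the fewest nodes an AVL tree of a given height can have. The sketch below assumes the height of a single node is 1 (as in the insertion rule h(x) = 1); the recurrence N(h) = N(h-1) + N(h-2) + 1 is the standard sparsest-AVL-tree argument, and the function names are illustrative.

```python
def min_nodes(height: int) -> int:
    """Fewest nodes in an AVL tree of the given height (one node has height 1)."""
    if height == 0:
        return 0
    prev, curr = 0, 1  # N(0), N(1)
    for _ in range(height - 1):
        # Sparsest tree: a root plus sparsest subtrees of heights h-1 and h-2.
        prev, curr = curr, curr + prev + 1
    return curr


def max_height(n: int) -> int:
    """Largest height an AVL tree with n nodes can reach."""
    h = 0
    while min_nodes(h + 1) <= n:
        h += 1
    return h


print(min_nodes(4))           # 7
print(max_height(1_000_000))  # 28
```

For one million nodes this gives a worst case of 28, which sits within the bound of 29 quoted above.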

AVL Tree Insertion
Use binary search to determine the insertion point of the new key, k. Insert a new node x containing k.
Update the subtree heights, h, starting with x, h(x) = 1, and moving upwards through all its ancestors (each height either stays the same or increases by 1).
While updating subtree heights, if there is an imbalance at node a, i.e. $|h(a.left) - h(a.right)| > 1$:
  Let p be a.parent; p is null if a is the root.
  Let node b be the child of a on the path from x.
  Let node c be the child of b on the path from x.
  Fix the imbalance using the appropriate rotations.
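
AVL Tree Rotations (1)
To maintain the AVL tree after insertions and deletions, local rotations are performed.
[Figure: initial BST with keys 10 and 20. Inserting 30 gives the right-leaning chain 10, 20, 30 (a = 10, b = 20, c = 30). An RR-rotation with L2 null makes 20 the subtree root with children 10 and 30. The diagram also carries the subtree labels R1 and R2.]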

RR Rotation
a.right ← b.left
b.left ← a
if p = NULL
  then root ← b
else if p.left = a
  then p.left ← b
  else p.right ← b
h(a) = h(a) - 1
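The RR rotation can be sketched in Python as follows. The `Node` class, the `parent` field and the helper names are assumptions for illustration. Unlike the pseudocode, this sketch also repairs parent pointers and recomputes h(a) and h(b) from their children instead of applying a fixed offset, which avoids depending on whether h(a) had already been updated when the imbalance was found.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None
        self.height = 1  # a single node has height 1


def height(node):
    return node.height if node else 0


def refresh(node):
    """Recompute a node's height from its children."""
    node.height = 1 + max(height(node.left), height(node.right))


def rr_rotate(root, a):
    """Rotate b = a.right up into a's place; return the (possibly new) tree root."""
    b, p = a.right, a.parent
    a.right = b.left              # a.right <- b.left (subtree L2, may be null)
    if b.left is not None:
        b.left.parent = a
    b.left = a                    # b.left <- a
    a.parent = b
    b.parent = p
    if p is None:
        root = b
    elif p.left is a:
        p.left = b
    else:
        p.right = b
    refresh(a)                    # a first: it is now below b
    refresh(b)
    return root


# The chain 10 -> 20 -> 30 from the first rotation example.
n10, n20, n30 = Node(10), Node(20), Node(30)
n10.right, n20.parent = n20, n10
n20.right, n30.parent = n30, n20
n20.height, n10.height = 2, 3

root = rr_rotate(n10, n10)
print(root.key, root.left.key, root.right.key)  # 20 10 30
```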
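
AVL Tree Rotations (2)
[Figure: starting from the tree with root 20 and children 10 and 30, insert 40 and then 50. The chain 30, 40, 50 is unbalanced at 30 (a = 30, b = 40, c = 50). An RR-rotation with L2 null makes 40 the right child of 20, with children 30 and 50.]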
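
AVL Tree Rotations (3)
[Figure: insert 60 below 50. The tree is now unbalanced at the root (a = 20, b = 40, c = 50). An RR-rotation with L2 not null (L2 is the subtree rooted at 30) makes 40 the root, with left child 20 (children 10 and 30) and right child 50 (right child 60).]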
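
AVL Tree Rotations (4)
Without 60, insert 35 or 25.
RL-Rotation:
  40's left ← 30's right
  20's right ← 30's left
  30's left ← 20
  30's right ← 40
Note: LL- and LR-Rotations are symmetric to RR- and RL-Rotations.
[Figure: the tree with root 20, children 10 and 40, and 40's children 30 and 50, after inserting 25 or 35 below 30 (a = 20, b = 40, c = 30). The RL-rotation makes 30 the root, with left child 20 and right child 40.]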

RL Rotation
a.right ← c.left
b.left ← c.right
c.left ← a
c.right ← b
if p = NULL
  then root ← c
else if p.left = a
  then p.left ← c
  else p.right ← c
h(a) = h(a) - 2
h(b) = h(b) - 1
h(c) = h(c) + 1
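A minimal Python sketch of the RL rotation, using the example tree with 35 inserted. The `Node` class and the helpers `link` and `refresh` are assumptions for illustration; heights are recomputed from the children rather than adjusted by fixed offsets.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None
        self.height = 1


def height(node):
    return node.height if node else 0


def refresh(node):
    node.height = 1 + max(height(node.left), height(node.right))


def link(parent, side, child):
    """Attach child as parent's 'left' or 'right' and set the back pointer."""
    setattr(parent, side, child)
    child.parent = parent


def rl_rotate(root, a):
    """Lift c = a.right.left above a and b; return the (possibly new) tree root."""
    b = a.right
    c = b.left
    p = a.parent
    a.right = c.left                  # a.right <- c.left
    if c.left is not None:
        c.left.parent = a
    b.left = c.right                  # b.left <- c.right
    if c.right is not None:
        c.right.parent = b
    c.left, a.parent = a, c           # c.left <- a
    c.right, b.parent = b, c          # c.right <- b
    c.parent = p
    if p is None:
        root = c
    elif p.left is a:
        p.left = c
    else:
        p.right = c
    for node in (a, b, c):            # children first, then the new subtree root
        refresh(node)
    return root


n = {k: Node(k) for k in (10, 20, 30, 35, 40, 50)}
link(n[20], "left", n[10])
link(n[20], "right", n[40])
link(n[40], "left", n[30])
link(n[40], "right", n[50])
link(n[30], "right", n[35])           # the newly inserted key
for k in (35, 30, 50, 40, 10, 20):    # bottom-up height update
    refresh(n[k])

root = rl_rotate(n[20], n[20])
print(root.key, root.left.key, root.right.key)  # 30 20 40
```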

AVL Updates and Deletions
What about key updates?
  Delete the record, and insert a record with the new key.
How do you handle deletions?
  Swap with the successor (next in linear ordering). In an AVL tree the successor is a leaf node or has only a single leaf child.
  Delete this node after the swap.
  Handle rotations from there:
    update subtree heights towards the root.
    do rotations if the tree becomes imbalanced.
Details left to the student.

AVL Trees Analysis
If the AVL tree is created in main memory, there may not be enough space for all the key values.
If the AVL tree is created on disk, the number of disk accesses required to access a record is at most proportional to the height of the tree.
  for 1,000,000 records, no more than 29 disk accesses are required to search for any record in the file.
This beats sequential access.

m-Way Search Trees
Definition:
All nodes have m or fewer children.
Each node contains the following information:
  t = the number of key values in the node, $1 \le t < m$
  $S_0$ = pointer to a subtree containing key values less than $K_1$
  $(K_1, A_1, S_1), \ldots, (K_t, A_t, S_t)$ = t triples described below

m-Way Search Trees
Definition (continued):
$K_i$ ($1 \le i \le t$) are key values where $K_i < K_{i+1}$ for $1 \le i < t$
$S_i$ ($1 \le i < t$) are pointers to subtrees containing key values between $K_i$ and $K_{i+1}$ exclusive
$S_t$ is a pointer to a subtree containing key values strictly greater than $K_t$
$S_i$ ($0 \le i \le t$) are pointers to balanced m-way search subtrees
$A_i$ ($1 \le i \le t$) are addresses in the file of the record with key $K_i$

Access
Accessing the m-way search tree:
  Each node of the tree is stored in a block of storage.
  All computations within a node are done in main memory.
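
Illustration
Node format: $t, S_0, (K_1, A_1, S_1), \ldots, (K_t, A_t, S_t)$; a pointer value of 0 means no subtree.

Node  Format
a     2, b, (30, addr[30], c), (70, addr[70], d)
b     2, 0, (10, addr[10], 0), (20, addr[20], 0)
c     2, 0, (40, addr[40], 0), (50, addr[50], e)
d     2, 0, (80, addr[80], 0), (90, addr[90], f)
e     1, 0, (60, addr[60], 0)
f     1, 0, (100, addr[100], 0)

[Figure: the same tree drawn as nodes. Root a = [30 70] has children b = [10 20], c = [40 50] and d = [80 90]; e = [60] hangs below c and f = [100] hangs below d.]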
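The node table above can be searched with a short routine. This is a sketch under stated assumptions: a Python dict stands in for the blocks on disk, `None` plays the role of the pointer value 0, and the name `search` is illustrative. Each dict lookup models one block read, and the scan over the triples models the in-memory work within a node.

```python
# node id -> (S0, [(K1, A1, S1), ..., (Kt, At, St)])
TREE = {
    "a": ("b", [(30, "addr[30]", "c"), (70, "addr[70]", "d")]),
    "b": (None, [(10, "addr[10]", None), (20, "addr[20]", None)]),
    "c": (None, [(40, "addr[40]", None), (50, "addr[50]", "e")]),
    "d": (None, [(80, "addr[80]", None), (90, "addr[90]", "f")]),
    "e": (None, [(60, "addr[60]", None)]),
    "f": (None, [(100, "addr[100]", None)]),
}


def search(tree, root_id, key):
    """Return (record address or None, number of nodes read)."""
    node_id, reads = root_id, 0
    while node_id is not None:
        s0, triples = tree[node_id]   # one block read
        reads += 1
        next_id = s0                  # subtree for keys below K1
        for k, addr, s in triples:
            if key == k:
                return addr, reads
            if key < k:
                break                 # descend into the subtree just left of k
            next_id = s               # key > k: subtree right of k is the candidate
        node_id = next_id
    return None, reads


print(search(TREE, "a", 60))   # ('addr[60]', 3)
print(search(TREE, "a", 65))   # (None, 3)
```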

m-Way Analysis (1)
Since there are more key values per node, there are fewer nodes.
Since a node can have more than 2 children, the height is generally smaller than that of a binary search tree with the same number of nodes.
If balanced, the height of an m-way search tree with n nodes is about $\log_m n$, which is smaller than for a balanced binary search tree when m > 2.
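To get a feel for the $\log_m n$ estimate, the sketch below computes the smallest height that can hold a given number of keys. It assumes every node is completely full (m - 1 keys and m children), so a tree of height h holds $m^h - 1$ keys; the function name is illustrative.

```python
def min_height(n_keys: int, m: int) -> int:
    """Smallest height h of an m-way search tree that can hold n_keys keys.

    A completely full tree of height h has (m**h - 1) / (m - 1) nodes with
    m - 1 keys each, i.e. m**h - 1 keys in total.
    """
    h = 0
    while m ** h - 1 < n_keys:
        h += 1
    return h


for m in (2, 10, 100):
    print(m, min_height(1_000_000, m))
# 2 20
# 10 7
# 100 4
```

With one node per disk block, going from m = 2 to m = 100 cuts the best-case number of block reads for a million keys from 20 to 4.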
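
Example
[Figure: tree during the partial insertion of 87 and 85. The node [80 87 95] has overflowed and must be adjusted; the node above it is [50 70]. The full layout is not recoverable from this transcript.]

Example
[Figure: partial insertion of 87 and 85, next step. 87 has moved up, so [50 70 87] is now the overflow node to adjust, and 80 and 95 are in separate nodes.]

Example
[Figure: complete insertion of 87 and 85. 70 has moved up a level, leaving 50 and 87 in separate nodes.]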

Deletion (1)
search for the key value and delete it from the containing node
if the node is a leaf and does not underflow, you're done
if the node is a leaf and underflows,
  search for an adjacent sibling with more than the minimum number of keys
  if such a sibling is found, adjust all nodes from this sibling to the current node and the parent, to transfer an excess key to the current node using simple rotations

Deletion (2)
  if no such sibling is found,
    merge the node with a sibling node. Demote the intervening key value in the parent node to the merged node.
    handle the deletion of a key value from the parent as if the parent is a leaf (i.e. repeat this step for the parent)
if the node is a non-leaf,
  replace the deleted key by its predecessor key in the tree
  handle the deletion of the predecessor key from the containing leaf node
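The two leaf-level repairs (rotate through the parent, or merge) can be sketched on a single parent whose children are all leaves. This is a simplified illustration, not a full B-tree: it assumes order m = 5 (so a non-root node needs at least 2 keys), looks only at the immediately adjacent siblings, and leaves any resulting underflow of the parent to the caller. The name `fix_underflow` and the list-based layout are hypothetical; the keys come from the m = 5 deletion example in this deck.

```python
MIN_KEYS = 2  # ceil(5 / 2) - 1 for a B-tree of order m = 5


def fix_underflow(parent_keys, leaves, i):
    """Repair leaf i in place; return 'ok', 'rotate' or 'merge'.

    parent_keys[j] is the key separating leaves[j] and leaves[j + 1].
    """
    if len(leaves[i]) >= MIN_KEYS:
        return "ok"
    # Rotate a key from a left sibling that has one to spare.
    if i > 0 and len(leaves[i - 1]) > MIN_KEYS:
        leaves[i].insert(0, parent_keys[i - 1])      # separator moves down
        parent_keys[i - 1] = leaves[i - 1].pop()     # sibling's largest moves up
        return "rotate"
    # Otherwise try the right sibling.
    if i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_KEYS:
        leaves[i].append(parent_keys[i])             # separator moves down
        parent_keys[i] = leaves[i + 1].pop(0)        # sibling's smallest moves up
        return "rotate"
    # Both neighbours are minimal: merge with one and demote the separator.
    j = i - 1 if i > 0 else i                        # left node of the merged pair
    leaves[j:j + 2] = [leaves[j] + [parent_keys[j]] + leaves[j + 1]]
    del parent_keys[j]
    return "merge"


# Deleting 90: its predecessor 80 replaces it, leaving leaf [75] too small.
parent = [70, 80]
leaves = [[10, 30, 40, 60], [75], [100, 110]]
print(fix_underflow(parent, leaves, 1), parent, leaves)
```

After a merge the parent has lost a key, so the same check has to be repeated one level up, as the last merge step on the slide says.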

Example: m = 5
B-Trees where m = 5:
  The root node has 2 to 5 children.
  All non-terminal nodes other than the root have at least $\lceil 5/2 \rceil = 3$ children: 3, 4 or 5.
  All non-terminal nodes other than the root have 2, 3 or 4 keys.
  All terminal nodes are on the same level.

Example
B-Tree of order 5:

root:      [90]
internal:  [30 70]  [120 160 190 220]
leaves:    [10 20] [40 60] [75 80]  |  [100 110] [130 140 150] [170 180] [200 210] [230 240]

Let's delete 20:
  it's in a leaf node;
  results in underflow in [10];
  adjacent sibling(s) minimal;
  merge with adjacent sibling.

Example
Partial deletion of 20:

root:      [90]
internal:  [70]  [120 160 190 220]
leaves:    [10 30 40 60] [75 80]  |  [100 110] [130 140 150] [170 180] [200 210] [230 240]

Adjust the underflow node [70]: rotate with the non-minimal sibling.

Example
20 deleted:

root:      [120]
internal:  [70 90]  [160 190 220]
leaves:    [10 30 40 60] [75 80] [100 110]  |  [130 140 150] [170 180] [200 210] [230 240]

Let's delete 160:
  it's a non-leaf;
  replace it with its predecessor 150.

Example
160 deleted:

root:      [120]
internal:  [70 90]  [150 190 220]
leaves:    [10 30 40 60] [75 80] [100 110]  |  [130 140] [170 180] [200 210] [230 240]

Let's delete 90:
  replace it with its predecessor 80;
  results in underflow in [75];
  rotate [75] with the non-minimal sibling.

Example
90 deleted:

root:      [120]
internal:  [60 80]  [150 190 220]
leaves:    [10 30 40] [70 75] [100 110]  |  [130 140] [170 180] [200 210] [230 240]

Let's delete 120:
  replace it with its predecessor 110;
  results in underflow in [100];
  adjacent sibling is minimal; merge!

Example
Partial deletion of 120:

root:      [110]
internal:  [60]  [150 190 220]
leaves:    [10 30 40] [70 75 80 100]  |  [130 140] [170 180] [200 210] [230 240]

Adjust the underflow node [60]: rotate with the non-minimal sibling.

Example
120 deleted:

root:      [150]
internal:  [60 110]  [190 220]
leaves:    [10 30 40] [70 75 80 100] [130 140]  |  [170 180] [200 210] [230 240]

Let's delete 190:
  replace it with its predecessor 180;
  results in underflow in [170];
  adjacent sibling minimal; merge!

Example
Partial deletion of 190:

root:      [150]
internal:  [60 110]  [220]
leaves:    [10 30 40] [70 75 80 100] [130 140]  |  [170 180 200 210] [230 240]

Adjust the underflow node [220]: adjacent sibling minimal; merge!

Example
190 deleted:

root:      [60 110 150 220]
leaves:    [10 30 40] [70 75 80 100] [130 140] [170 180 200 210] [230 240]

Analysis
Some nodes may not be full (i.e. not have the maximum number of children).
  This increases the number of nodes but does not substantially increase the height.
  This allows easy insertions because there may be space available in the nodes.
Each node is at least half-full, so the height is less than about $\log_{m/2} n$.
Very efficient to use as an indexing structure of files and compares very well to a perfectly balanced tree.
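The half-full argument can be made concrete by counting the fewest keys a B-tree of a given height can hold. The sketch assumes the usual minimum occupancy: the root has 1 key and 2 children, and every other node has $\lceil m/2 \rceil$ children. That gives at least $2\lceil m/2 \rceil^{h-1} - 1$ keys at height h, which is where a bound close to $\log_{m/2} n$ comes from. Function names are illustrative.

```python
def min_keys(height: int, m: int) -> int:
    """Fewest keys in a B-tree of order m with the given height (levels of nodes)."""
    d = -(-m // 2)  # ceil(m / 2): minimum children of a non-root internal node
    return 2 * d ** (height - 1) - 1


def max_height(n_keys: int, m: int) -> int:
    """Tallest B-tree of order m that n_keys keys can form."""
    h = 0
    while min_keys(h + 1, m) <= n_keys:
        h += 1
    return h


print(min_keys(3, 5))               # 17
print(max_height(1_000_000, 5))     # 12
print(max_height(1_000_000, 100))   # 4
```

Even in the sparsest legal arrangement, a million keys at m = 100 need no more than 4 levels.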

B*-Trees
Description:
This is a modification of B-Trees using an overflow handling technique.
Insertion into a full node causes rotations that overflow excess keys to sibling nodes.
Nodes are at least 2/3 full instead of at least half-full.

B*-Trees
B*-Trees definition:
An empty tree is a special m-way search tree with no nodes.
All non-empty subtrees are m-way search trees.
The root node has at least two children and a maximum of $2\lfloor (2m-2)/3 \rfloor + 1$ children. Note: This could exceed m.
All non-terminal nodes ($S_i \ne 0$ for some i) other than the root have at least $\lceil (2m-1)/3 \rceil$ children; therefore each such node stores at least $\lceil (2m-1)/3 \rceil - 1$ key values $K_i$.
All terminal nodes ($S_i = 0$ for all i) are on the same level.
Terminal nodes are initially empty.

Illustrative Comparison
With m = 7, B-Trees require each (non-root) internal node to have 4 to 7 children or 3 to 6 keys per node.
With m = 7, B*-Trees require each (non-root) internal node to have 5 to 7 children or 4 to 6 keys per node.
With m = 31, B-Trees require each (non-root) internal node to have 16 to 31 children or 15 to 30 keys per node.
With m = 31, B*-Trees require each (non-root) internal node to have 21 to 31 children or 20 to 30 keys per node, since $\lceil (2 \cdot 31 - 1)/3 \rceil = 21$.
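The occupancy limits can be tabulated with a small helper. This sketch assumes the minimum child counts are $\lceil m/2 \rceil$ for B-Trees and $\lceil (2m-1)/3 \rceil$ for B*-Trees; the ceiling in the second formula is an assumption, chosen because it reproduces the m = 7 figures. With that formula, m = 31 gives a minimum of 21 children. Function names are illustrative.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def btree_bounds(m: int):
    """((min, max) children, (min, max) keys) for a non-root internal B-tree node."""
    lo = ceil_div(m, 2)
    return (lo, m), (lo - 1, m - 1)


def bstar_bounds(m: int):
    """Same limits for a B*-tree node, assuming ceil((2m - 1) / 3) children."""
    lo = ceil_div(2 * m - 1, 3)
    return (lo, m), (lo - 1, m - 1)


for m in (7, 31):
    print(m, btree_bounds(m), bstar_bounds(m))
# 7 ((4, 7), (3, 6)) ((5, 7), (4, 6))
# 31 ((16, 31), (15, 30)) ((21, 31), (20, 30))
```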

Insertion
Inserting a key into a full non-root node causes overflow of an excess key into its siblings using rotations through their parent.
If all siblings are also full, then the overflow node, a nearest sibling, and the dividing parent key are merged. This is then divided into three nodes as evenly as possible, with the 2 dividing keys moved up to their parent, keeping each new node at least about 2/3 full.
If the parent overflows, then adjust that node with its siblings and parent.
Inserting a key into a full root node creates a new root with the first half of the keys as its left child node, and the second half of the keys as its right child node.
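The two-into-three split can be sketched on plain key lists. This is a leaf-level illustration only: child pointers are ignored, and the function name and argument layout are hypothetical. The example uses m = 7, where a full node holds 6 keys.

```python
def two_to_three_split(left, separator, right, new_key):
    """Split two full siblings plus their parent key and one new key into 3 nodes.

    Returns (node_a, up1, node_b, up2, node_c); up1 and up2 are the two
    dividing keys that move up to the parent.
    """
    keys = sorted(left + [separator] + right + [new_key])
    staying = len(keys) - 2                   # two keys go up to the parent
    base, extra = divmod(staying, 3)
    sizes = [base + (1 if i < extra else 0) for i in range(3)]
    a_end = sizes[0]                          # index of the first dividing key
    b_end = a_end + 1 + sizes[1]              # index of the second dividing key
    return (keys[:a_end], keys[a_end],
            keys[a_end + 1:b_end], keys[b_end],
            keys[b_end + 1:])


left = [10, 20, 30, 40, 50, 60]               # full node for m = 7
right = [80, 90, 100, 110, 120, 130]          # full sibling
print(two_to_three_split(left, 70, right, 65))
# ([10, 20, 30, 40], 50, [60, 65, 70, 80], 90, [100, 110, 120, 130])
```

Each of the three resulting nodes holds 4 of the 6 possible keys, which is the two-thirds occupancy the rule aims for.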

Deletion
(skipped)

Analysis
Some nodes may not be full.
Nodes are more full than in B-trees.
  This reduces the number of nodes required.
Each node is at least two-thirds full, so the height is less than about $\log_{2m/3} n$.

B+-Trees
Description:
a modification of B-Trees where each node is at least half full
internal nodes do not contain addresses of key values: just keys and child pointers
each $K_i$ in internal nodes represents the largest key value in the left subtree pointed to by $S_{i-1}$. $S_t$ points to the rightmost subtree, whose key values are greater than $K_t$
all keys and their addresses are in the leaves
leaves are linked to each other

B+-Trees
Description (continued):
Often used as an indexing technique where each leaf is stored as a block in the file.
Each leaf node contains the key values, the address of the block in the disk, and the address of the next leaf.
The index is usually stored as a separate index file from the data file.
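
Example B+-Tree
Node layout: $(n, S_0, (K_1, S_1), (K_2, S_2), \ldots, (K_t, S_t))$

[Figure: B+-tree index. Root [40]; internal nodes [20] and [60 80]; linked leaves [10 20], [30 40], [50 60], [70 80], [90 100].]

Data file:
RelAddr    0   1   2   3   4   5   6   7   8    9
Key        40  50  20  70  10  30  90  60  100  80
OtherInfo  ... ... ... ... ... ... ... ... ...  ...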
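A minimal sketch of lookup and range scan on this example, assuming the index shape root [40], internal nodes [20] and [60 80], and five linked leaves. The classes `Leaf` and `Internal` and the function names are illustrative. Because each $K_i$ is the largest key in the subtree to its left, `bisect_left` picks the child directly.

```python
from bisect import bisect_left

# Key -> relative address in the data file, from the table above.
ADDR = {40: 0, 50: 1, 20: 2, 70: 3, 10: 4, 30: 5, 90: 6, 60: 7, 100: 8, 80: 9}


class Leaf:
    def __init__(self, *keys):
        self.entries = [(k, ADDR[k]) for k in keys]  # sorted (key, address) pairs
        self.next = None                             # link to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children


leaves = [Leaf(10, 20), Leaf(30, 40), Leaf(50, 60), Leaf(70, 80), Leaf(90, 100)]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Internal([40], [Internal([20], leaves[0:2]),
                       Internal([60, 80], leaves[2:5])])


def find_leaf(node, key):
    """Descend the index to the leaf that would hold key."""
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, key)]
    return node


def lookup(node, key):
    """Return the record address for key, or None if it is absent."""
    for k, addr in find_leaf(node, key).entries:
        if k == key:
            return addr
    return None


def range_scan(node, lo, hi):
    """Return (key, address) pairs with lo <= key <= hi via the leaf links."""
    leaf, out = find_leaf(node, lo), []
    while leaf is not None:
        for k, addr in leaf.entries:
            if k > hi:
                return out
            if k >= lo:
                out.append((k, addr))
        leaf = leaf.next
    return out


print(lookup(root, 70))          # 3
print(range_scan(root, 25, 65))  # [(30, 5), (40, 0), (50, 1), (60, 7)]
```

The range scan touches the index once and then only follows leaf links, which is the sequential half of "indexed sequential".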

Implementations
CDC machines use a variation of B+-Trees to implement their indexed sequential file organization, called SIS (Scope Indexed Sequential).
IBM uses a variation of B+-Trees to implement its indexed sequential file organization, called VSAM (Virtual Storage Access Method).
IBM also implemented ISAM (Indexed Sequential Access Method) using cylinders and surfaces to organize its indices, prime area and overflow areas. It is a variation of m-way search trees.

End