CS102-07 Indexed Sequential_1

  • 8/7/2019 CS102-07 Indexed Sequential_1

    CJD

    Perfect Binary Trees

    Perfect binary tree:

    a perfect binary tree has all nodes with 2 children, except the leaf nodes, which are all at the same level.

    a perfect binary tree has $2^{l-1}$ nodes at level $l$ (the root is at level 1).

    a perfect binary tree of height $h$ has $2^h - 1$ nodes.

    the height $h$ of a perfect binary tree with $n$ nodes is $\log_2(n+1)$.

    any binary tree with $n$ nodes has height at least $\lceil \log_2(n+1) \rceil$.

    a tree is balanced if its height is the minimum possible for its number of nodes. (Other, non-equivalent and approximate definitions of "balanced" exist.)
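
    The formulas above can be checked numerically. A minimal Python sketch, assuming levels are numbered from 1 at the root and height counts levels (so a single node has height 1); the function names are illustrative only.

```python
def nodes_at_level(l: int) -> int:
    """Nodes at level l of a perfect binary tree (root = level 1): 2^(l-1)."""
    return 2 ** (l - 1)


def perfect_tree_size(h: int) -> int:
    """Total nodes in a perfect binary tree of height h: 2^h - 1."""
    return 2 ** h - 1


def min_height(n: int) -> int:
    """Smallest possible height of a binary tree with n >= 1 nodes.

    ceil(log2(n + 1)) equals the bit length of n, which avoids
    floating-point rounding for large n.
    """
    return n.bit_length()


if __name__ == "__main__":
    # Summing the levels of a perfect tree gives its total size.
    for h in range(1, 5):
        assert sum(nodes_at_level(l) for l in range(1, h + 1)) == perfect_tree_size(h)
    print(perfect_tree_size(4), min_height(15), min_height(16), min_height(1_000_000))
```

    Running it prints `15 4 5 20`: a million nodes need at least 20 levels, which is why a balanced tree of that size can be searched in about 20 comparisons.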

    Example Perfect and Balanced Trees

    [Figure: example perfect and balanced binary trees, with nodes labelled A to O]

    Binary Search Trees

    Binary search tree:

    an organized binary tree such that

    the key value of every node is greater than every key value in its left subtree and less than every key value in its right subtree. (Comparing a node only with its two children is not enough.)

    if the tree is balanced, searching a BST is as efficient as binary search: at most $\lceil \log_2(n+1) \rceil$ comparisons.

    balanced binary search trees are used to organize data for quick access.
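
    A plain (unbalanced) BST search makes the role of balance concrete: the same seven keys take 3 comparisons in a balanced tree and 7 in a chain. A minimal Python sketch; `Node`, `insert`, `search` and `build` are illustrative names, and ignoring duplicate keys is an assumption, since the slides do not say how duplicates are handled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key into the BST (duplicates are ignored) and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root


def search(root: Optional[Node], key: int) -> Tuple[bool, int]:
    """Return (found, number of nodes compared) for a BST search."""
    node, probes = root, 0
    while node is not None:
        probes += 1
        if key == node.key:
            return True, probes
        node = node.left if key < node.key else node.right
    return False, probes


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


if __name__ == "__main__":
    balanced = build([40, 20, 60, 10, 30, 50, 70])   # height 3
    skewed = build([10, 20, 30, 40, 50, 60, 70])     # a chain of height 7
    print(search(balanced, 70), search(skewed, 70))  # (True, 3) (True, 7)
```

    Inserting already-sorted keys produces the chain, which is exactly the case the balanced trees below are designed to prevent.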

    AVL Trees

    Created by Adelson-Velskii and Landis in 1962

    height-balanced binary search trees

    AVL tree definition:

    the heights of the left and right subtrees of every node of T differ by at most 1

    Random retrieval in an AVL tree:

    maximum height $h < 1.4404 \log_2(n+2) - 0.328$, or roughly $1.5 \log_2 n$

    for $n = 1{,}000{,}000$ the bound evaluates to about 28.4, so the maximum height is 28.

    This bound was proven by Adelson-Velskii and Landis themselves.
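
    The sparsest AVL tree of each height satisfies a Fibonacci-like recurrence, which gives an exact maximum height to compare against the bound above. A minimal sketch, assuming height counts levels as in the perfect-binary-tree formulas; the function names are illustrative.

```python
import math


def min_avl_nodes(h: int) -> int:
    """Fewest nodes an AVL tree of height h can have (height counts levels).

    The sparsest AVL tree of height h has subtrees of heights h-1 and h-2,
    so N(h) = N(h-1) + N(h-2) + 1 with N(0) = 0 and N(1) = 1.
    """
    if h == 0:
        return 0
    prev, cur = 0, 1          # N(0), N(1)
    for _ in range(h - 1):
        prev, cur = cur, prev + cur + 1
    return cur


def max_avl_height(n: int) -> int:
    """Largest height any AVL tree with n nodes can have."""
    h = 0
    while min_avl_nodes(h + 1) <= n:
        h += 1
    return h


def avl_bound(n: int) -> float:
    """The upper bound 1.4404 log2(n + 2) - 0.328 quoted on the slide."""
    return 1.4404 * math.log2(n + 2) - 0.328


if __name__ == "__main__":
    n = 1_000_000
    print(max_avl_height(n), round(avl_bound(n), 2))
```

    The exact value (28) sits just below the bound, as it must; the rough estimate $1.5 \log_2 n$ is a little more pessimistic.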

    AVL Tree Insertion

    Use binary search to determine the insertion point of the new key, k. Insert a new node x containing k.

    Update the subtree heights h, starting with h(x) = 1 and moving upwards through all of x's ancestors (each height either stays the same or increases by 1).

    While updating subtree heights, if there is an imbalance at node a, i.e. $|h(a.left) - h(a.right)| > 1$:

    Let p be a.parent (p is null if a is the root).

    Let node b be the child of a on the path to x.

    Let node c be the child of b on the path to x.

    Fix the imbalance using the appropriate rotation.
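
    The insertion procedure can be sketched recursively in Python. This is a swapped-in formulation, not the slide's: instead of walking parent pointers upward, each recursive call returns the (possibly rotated) root of its subtree, and `rebalance` recomputes heights on the way back up. The RL and LR cases are done as two single rotations, which has the same effect as one double rotation. All names are illustrative.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.h = 1                      # height of the subtree rooted here


def height(n: Optional[Node]) -> int:
    return n.h if n else 0


def update(n: Node) -> None:
    n.h = 1 + max(height(n.left), height(n.right))


def rotate_left(a: Node) -> Node:
    """Single rotation for the RR case: a's right child b becomes the subtree root."""
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b


def rotate_right(a: Node) -> Node:
    """Mirror image, used for the LL case."""
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b


def rebalance(a: Node) -> Node:
    """Recompute h(a) and fix an imbalance at a with the appropriate rotation(s)."""
    update(a)
    if height(a.right) - height(a.left) > 1:
        if height(a.right.left) > height(a.right.right):   # RL case
            a.right = rotate_right(a.right)
        return rotate_left(a)                              # RR case, or 2nd half of RL
    if height(a.left) - height(a.right) > 1:
        if height(a.left.right) > height(a.left.left):     # LR case
            a.left = rotate_left(a.left)
        return rotate_right(a)                             # LL case, or 2nd half of LR
    return a


def insert(node: Optional[Node], key: int) -> Node:
    """Insert key and return the new root of this subtree; duplicates are ignored."""
    if node is None:
        return Node(key)                 # the new node x, with h(x) = 1
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node
    return rebalance(node)               # runs on every ancestor of x, bottom-up


def inorder(n: Optional[Node]) -> List[int]:
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []


if __name__ == "__main__":
    root = None
    for k in [10, 20, 30, 40, 50, 60]:  # the insertion sequence used in the slides
        root = insert(root, k)
    print(root.key, inorder(root))       # 40 [10, 20, 30, 40, 50, 60]
```

    Inserting 10, 20, 30, 40, 50, 35 instead ends with 30 at the root, the outcome of the RL case illustrated in the slides.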

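    AVL Tree Rotations (1)

    To maintain the AVL tree after insertions and deletions, local rotations are performed.

    [Figure: the initial BST is 10 with right child 20; inserting 30 gives the chain 10 (a) → 20 (b) → 30 (c); an RR-rotation (L2, the left subtree of b, is null) makes 20 the root with children 10 and 30]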

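    RR Rotation

        a.right ← b.left
        b.left ← a
        if p = NULL
            then root ← b
        else if p.left = a
            then p.left ← b
            else p.right ← b
        h(a) ← h(a) - 2

    (If a.left has height k, then after the insertion h(b) = k + 2 and h(a) = k + 3; after the rotation both subtrees of a have height k, so h(a) = k + 1, while h(b) and h(c) are unchanged.)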
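
    The RR pseudocode above translates almost line for line into Python once nodes carry parent pointers and stored heights. A minimal sketch; `Node`, `Tree`, `link` and `replace_child` are hypothetical helpers, and it assumes heights have already been updated up to a when the rotation is called.

```python
from typing import Optional


class Node:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.parent: Optional["Node"] = None
        self.h = 1                       # subtree height h(node)


class Tree:
    def __init__(self, root: Optional[Node] = None):
        self.root = root


def link(parent: Node, child: Node, side: str) -> None:
    """Attach child as parent.left or parent.right (side is 'left' or 'right')."""
    setattr(parent, side, child)
    child.parent = parent


def replace_child(tree: Tree, p: Optional[Node], old: Node, new: Node) -> None:
    """Hang `new` where `old` hung below p; if p is None, `new` becomes the root."""
    new.parent = p
    if p is None:
        tree.root = new
    elif p.left is old:
        p.left = new
    else:
        p.right = new


def rr_rotation(tree: Tree, a: Node) -> None:
    """RR rotation at a, with b = a.right (c = b.right is not moved)."""
    p, b = a.parent, a.right
    a.right = b.left                     # a.right <- b.left (L2, possibly None)
    if b.left is not None:
        b.left.parent = a
    b.left = a                           # b.left <- a
    a.parent = b
    replace_child(tree, p, a, b)         # p's link (or the root) now points to b
    a.h -= 2                             # h(a) <- h(a) - 2; h(b), h(c) unchanged


if __name__ == "__main__":
    # Rotations (1): 10 -> 20 -> 30, heights already updated after inserting 30.
    a, b, c = Node(10), Node(20), Node(30)
    t = Tree(a)
    link(a, b, "right")
    link(b, c, "right")
    a.h, b.h = 3, 2
    rr_rotation(t, a)
    print(t.root.key, t.root.left.key, t.root.right.key)  # 20 10 30
```

    The same function covers the "L2 not null" case: the former left subtree of b is simply re-hung as a's right subtree.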

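    AVL Tree Rotations (2)

    [Figure: inserting 40 makes it the right child of 30 in the tree 20 (children 10 and 30); inserting 50 then creates the chain 30 (a) → 40 (b) → 50 (c); an RR-rotation (L2 null) puts 40 in place of 30, with children 30 and 50]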

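    AVL Tree Rotations (3)

    Insert 60: RR-rotation, L2 not null

    [Figure: 60 becomes the right child of 50, unbalancing the root: a = 20, b = 40, c = 50; the RR-rotation makes 40 the root, with left child 20 (children 10 and 30) and right child 50 (right child 60); 30, the left subtree L2 of b, becomes the right child of 20]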

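    AVL Tree Rotations (4)

    Without 60, insert 35 or 25: RL-rotation

    RL-Rotation: 40's left ← 30's right; 20's right ← 30's left; 30's left ← 20; 30's right ← 40

    Note: LL- and LR-rotations are symmetric to RR- and RL-rotations.

    [Figure: 35 (or 25) becomes a child of 30, unbalancing the root: a = 20, b = 40, c = 30; after the RL-rotation 30 is the root, with left child 20 (children 10 and, if inserted, 25) and right child 40 (children 35, if inserted, and 50)]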

    RL Rotation

        a.right ← c.left
        b.left ← c.right
        c.left ← a
        c.right ← b
        if p = NULL
            then root ← c
        else if p.left = a
            then p.left ← c
            else p.right ← c
        h(a) ← h(a) - 2
        h(b) ← h(b) - 1
        h(c) ← h(c) + 1
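
    The RL pseudocode can be sketched the same way, with parent pointers and stored heights. A minimal sketch, assuming heights have already been updated up to a; `Node`, `Tree`, `link` and `replace_child` are hypothetical helpers.

```python
from typing import Optional


class Node:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.parent: Optional["Node"] = None
        self.h = 1                       # subtree height h(node)


class Tree:
    def __init__(self, root: Optional[Node] = None):
        self.root = root


def link(parent: Node, child: Node, side: str) -> None:
    """Attach child as parent.left or parent.right (side is 'left' or 'right')."""
    setattr(parent, side, child)
    child.parent = parent


def replace_child(tree: Tree, p: Optional[Node], old: Node, new: Node) -> None:
    """Hang `new` where `old` hung below p; if p is None, `new` becomes the root."""
    new.parent = p
    if p is None:
        tree.root = new
    elif p.left is old:
        p.left = new
    else:
        p.right = new


def rl_rotation(tree: Tree, a: Node) -> None:
    """RL rotation at a, with b = a.right and c = b.left."""
    p, b = a.parent, a.right
    c = b.left
    a.right = c.left                     # a.right <- c.left
    if c.left is not None:
        c.left.parent = a
    b.left = c.right                     # b.left <- c.right
    if c.right is not None:
        c.right.parent = b
    c.left, c.right = a, b               # c.left <- a, c.right <- b
    a.parent = b.parent = c
    replace_child(tree, p, a, c)         # p's link (or the root) now points to c
    a.h -= 2                             # h(a) <- h(a) - 2
    b.h -= 1                             # h(b) <- h(b) - 1
    c.h += 1                             # h(c) <- h(c) + 1


if __name__ == "__main__":
    # Rotations (4): 35 inserted below 30; heights already updated up to a = 20.
    n = {k: Node(k) for k in (10, 20, 30, 35, 40, 50)}
    t = Tree(n[20])
    link(n[20], n[10], "left")
    link(n[20], n[40], "right")
    link(n[40], n[30], "left")
    link(n[40], n[50], "right")
    link(n[30], n[35], "right")
    n[20].h, n[40].h, n[30].h = 4, 3, 2
    rl_rotation(t, n[20])
    print(t.root.key, t.root.left.key, t.root.right.key)   # 30 20 40
```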

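    AVL Updates and Deletions

    What about key updates?

    Delete the record, then insert a record with the new key.

    How do you handle deletions?

    Swap the node with its successor (the next key in the linear ordering). The successor has no left child; in an AVL tree it is therefore a leaf or has a single leaf as its right child. (A node with at most one child needs no swap.)

    Delete the node after the swap; if it still has a child, that child takes its place.

    Handle rotations from there: update the subtree heights towards the root,

    and do rotations wherever a node becomes imbalanced.

    Details are left to the student.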
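
    Deletion can be sketched in the same recursive style (a swapped-in formulation rather than a parent-pointer walk). The "swap" is done by copying the successor's key into the node and then deleting the successor from the right subtree; in a file index the record address would be copied along with the key. A minimal sketch with illustrative names; note that `rebalance` must use a single rotation when the taller child is itself balanced, a case that only arises during deletion.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.h = 1                                  # height of the subtree rooted here


def height(n: Optional[Node]) -> int:
    return n.h if n else 0


def update(n: Node) -> None:
    n.h = 1 + max(height(n.left), height(n.right))


def rotate_left(a: Node) -> Node:
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b


def rotate_right(a: Node) -> Node:
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b


def rebalance(a: Node) -> Node:
    update(a)
    if height(a.right) - height(a.left) > 1:
        # strict '>' so that a balanced child gets a single rotation
        if height(a.right.left) > height(a.right.right):
            a.right = rotate_right(a.right)
        return rotate_left(a)
    if height(a.left) - height(a.right) > 1:
        if height(a.left.right) > height(a.left.left):
            a.left = rotate_left(a.left)
        return rotate_right(a)
    return a


def insert(node: Optional[Node], key: int) -> Node:
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return rebalance(node)


def delete(node: Optional[Node], key: int) -> Optional[Node]:
    """Delete key (if present) and return the new root of this subtree."""
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is None or node.right is None:
        return node.left or node.right              # at most one child: splice it in
    else:
        succ = node.right                           # successor: leftmost node on the right
        while succ.left is not None:
            succ = succ.left
        node.key = succ.key                         # the "swap"
        node.right = delete(node.right, succ.key)   # remove the successor's old node
    return rebalance(node)                          # heights and rotations toward the root


def inorder(n: Optional[Node]) -> List[int]:
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []


if __name__ == "__main__":
    root = None
    for k in [10, 20, 30, 40, 50, 60]:
        root = insert(root, k)
    root = delete(root, 40)                         # node with two children
    print(root.key, inorder(root))                  # 50 [10, 20, 30, 50, 60]
```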

    AVL Trees Analysis

    If the AVL tree is created in main memory, there may not be enough space for all the key values.

    If the AVL tree is created on disk, the number of disk accesses required to access a record is at most the height of the tree (one access per node on the search path).

    For 1,000,000 records, no more than 28 disk accesses are required to search for any record in the file.

    This beats sequential access, which may have to read all 1,000,000 records.

    m-Way Search Trees

    Definition:

    All nodes have at most m children.

    Each node contains the following information:

    $t$ = the number of key values in the node, $1 \le t < m$

    $S_0$ = pointer to a subtree containing key values less than $K_1$

    $(K_1, A_1, S_1), \ldots, (K_t, A_t, S_t)$ = $t$ triples, described below

    m-Way Search Trees

    Definition (continued):

    $K_i$ ($1 \le i \le t$) are key values, where $K_i < K_{i+1}$ for $1 \le i < t$

    $S_i$ ($1 \le i < t$) are pointers to subtrees containing key values strictly between $K_i$ and $K_{i+1}$

    $S_t$ is a pointer to a subtree containing key values strictly greater than $K_t$

    $S_i$ ($0 \le i \le t$) are pointers to m-way search subtrees

    $A_i$ ($1 \le i \le t$) are addresses in the file of the record with key $K_i$
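
    The definition can be turned into a checker that walks the tree and enforces each condition. A minimal sketch, assuming integer keys and representing a 0 pointer as `None`; `MWayNode`, `leaf` and `is_mway_search_tree` are hypothetical names, and the addresses are placeholder strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MWayNode:
    keys: List[int]                          # K_1 .. K_t
    addrs: List[str]                         # A_1 .. A_t (record addresses)
    children: List[Optional["MWayNode"]]     # S_0 .. S_t; None stands for a 0 pointer


def leaf(*keys: int) -> MWayNode:
    """Hypothetical helper: a node whose subtree pointers are all 0."""
    return MWayNode(list(keys), [f"addr[{k}]" for k in keys], [None] * (len(keys) + 1))


def is_mway_search_tree(node: Optional[MWayNode], m: int, lo=None, hi=None) -> bool:
    """Check the definition: 1 <= t < m, K_i increasing, and every key in S_i
    strictly between K_i and K_{i+1} (lo and hi bound the whole subtree)."""
    if node is None:
        return True
    t = len(node.keys)
    if not 1 <= t < m or len(node.addrs) != t or len(node.children) != t + 1:
        return False
    if any(node.keys[i] >= node.keys[i + 1] for i in range(t - 1)):
        return False
    if (lo is not None and node.keys[0] <= lo) or (hi is not None and node.keys[-1] >= hi):
        return False
    bounds = [lo] + node.keys + [hi]         # S_i lies between bounds[i] and bounds[i + 1]
    return all(is_mway_search_tree(node.children[i], m, bounds[i], bounds[i + 1])
               for i in range(t + 1))
```

    Passing the bounds down the recursion is what enforces "every key in the subtree", not just "every child", mirroring the BST condition.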

    Access

    Accessing the m-way search tree:

    Each node of the tree is stored in one block of storage, so visiting a node costs one block read.

    All computations within a node are done in main memory.

    Illustration

    Node   Format
    a      2, b, (30, addr[30], c), (70, addr[70], d)
    b      2, 0, (10, addr[10], 0), (20, addr[20], 0)
    c      2, 0, (40, addr[40], 0), (50, addr[50], e)
    d      2, 0, (80, addr[80], 0), (90, addr[90], f)
    e      1, 0, (60, addr[60], 0)
    f      1, 0, (100, addr[100], 0)

    Tree: a = [30 70] has children b = [10 20], c = [40 50] and d = [80 90]; e = [60] is the rightmost child of c and f = [100] is the rightmost child of d.
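
    Encoding the illustration's node formats directly, a search reads one node (block) per level and scans the triples in main memory. A minimal sketch; the tuple encoding is an assumption, and the `addr[k]` strings stand in for real record addresses.

```python
from typing import List, Optional, Tuple

# A node is (S_0, [(K_i, A_i, S_i), ...]); None plays the role of a 0 pointer.
Node = Tuple[Optional[tuple], List[tuple]]


def build_illustration() -> Node:
    f = (None, [(100, "addr[100]", None)])
    e = (None, [(60, "addr[60]", None)])
    d = (None, [(80, "addr[80]", None), (90, "addr[90]", f)])
    c = (None, [(40, "addr[40]", None), (50, "addr[50]", e)])
    b = (None, [(10, "addr[10]", None), (20, "addr[20]", None)])
    return (b, [(30, "addr[30]", c), (70, "addr[70]", d)])      # node a


def search(root: Optional[Node], key: int) -> Tuple[Optional[str], int]:
    """Return (record address or None, number of nodes read)."""
    node, reads = root, 0
    while node is not None:
        reads += 1                        # one block read per node visited
        s0, triples = node
        nxt = s0                          # keys below K_1 live in S_0
        for k, addr, s in triples:        # in-memory scan of the node
            if key == k:
                return addr, reads
            if key < k:
                break                     # keep S_{i-1}
            nxt = s                       # key > K_i: candidate subtree S_i
        node = nxt
    return None, reads


if __name__ == "__main__":
    root = build_illustration()
    print(search(root, 60), search(root, 25))   # ('addr[60]', 3) (None, 2)
```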

    m-Way Analysis (1)

    Since there are more key values per node, there are fewer nodes.

    Since a node can have more than 2 children, the height is generally smaller than that of a binary search tree holding the same number of keys.

    If balanced, the height of an m-way search tree with n key values is about $\log_m n$, which is smaller than the $\log_2 n$ height of a balanced binary search tree (the case m = 2).
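
    As a rough illustration of the $\log_m n$ claim, the sketch below finds the fewest levels a balanced, completely full m-way tree needs for n keys; with m = 2 it reproduces the binary case. The full-node assumption is deliberate (a best case); real nodes are only partly full. The function name is illustrative.

```python
def min_levels(n: int, m: int) -> int:
    """Fewest levels an m-way search tree needs to hold n keys.

    A full tree of height h has (m^h - 1)/(m - 1) nodes with m - 1 keys each,
    i.e. m^h - 1 keys, so the height is about log_m n.
    """
    h = 0
    while m ** h - 1 < n:
        h += 1
    return h


if __name__ == "__main__":
    for m in (2, 5, 31):
        print(m, min_levels(1_000_000, m))
```

    For a million keys this gives 20 levels when m = 2, 9 when m = 5 and 5 when m = 31.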

    Example

    [Figure: partial insertion of 87 and 85; the overflow node is adjusted]

    Example

    [Figure: partial insertion of 87 and 85; the overflow node is adjusted]

    Example

    [Figure: complete insertion of 87 and 85]
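
    The "adjust overflow node" step is, in the usual B-tree formulation, a split: a node that ends up with m keys is divided around its median, which moves up to the parent. A minimal sketch of that single step, assuming the common convention that the median is promoted; the function name is hypothetical, and handling a parent that overflows in turn is left to the caller.

```python
import bisect
from typing import List, Optional, Tuple


def insert_into_node(keys: List[int], key: int, m: int
                     ) -> Tuple[List[int], Optional[int], Optional[List[int]]]:
    """Insert key into a sorted node of an order-m B-tree.

    Returns (keys, None, None) if the node still fits (at most m - 1 keys),
    otherwise (left, median, right): the overflow node is split and the
    median moves up to the parent, where the same step may repeat.
    """
    keys = sorted(keys)                  # work on a copy
    bisect.insort(keys, key)
    if len(keys) <= m - 1:
        return keys, None, None
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]


if __name__ == "__main__":
    # With an assumed order m = 4, inserting 85 into [80, 87, 95] overflows.
    print(insert_into_node([80, 87, 95], 85, 4))   # ([80, 85], 87, [95])
```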

    Deletion (1)

    Search for the key value and delete it from the containing node.

    If the node is a leaf and does not underflow, you're done.

    If the node is a leaf and underflows:

    search for an adjacent sibling with more than the minimum number of keys;

    if such a sibling is found, use a simple rotation through the parent to transfer an excess key to the current node: the separating key in the parent moves down into the current node, and the sibling's nearest key moves up to replace it.

    Deletion (2)

    If no such sibling is found:

    merge the node with a sibling node, demoting the intervening key value from the parent node into the merged node;

    handle the deletion of that key value from the parent as if the parent were a leaf (i.e. repeat this step for the parent).

    If the node is a non-leaf:

    replace the deleted key by its predecessor key in the tree (the largest key in its left subtree, which is always in a leaf);

    handle the deletion of the predecessor key from the containing leaf node.
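
    The leaf-level cases above (rotate through the parent, or merge and demote the separator) can be sketched on a single parent and its leaf children. A minimal sketch, assuming keys are kept in Python lists, borrowing is tried from the left sibling first (the slides do not fix an order), and `min_keys` is the minimum for a non-root node (2 when m = 5); the function name is hypothetical and propagating an underflow to the parent is not shown.

```python
from typing import List


def fix_leaf_underflow(parent_keys: List[int], leaves: List[List[int]],
                       i: int, min_keys: int) -> str:
    """Repair leaves[i] after a deletion left it with fewer than min_keys keys.

    parent_keys[j] separates leaves[j] and leaves[j + 1]; both lists are
    modified in place. Returns "rotate" or "merge". If the parent itself
    underflows after a merge, the same step is repeated one level up.
    """
    leaf = leaves[i]
    if i > 0 and len(leaves[i - 1]) > min_keys:                # borrow from the left
        leaf.insert(0, parent_keys[i - 1])                      # separator moves down
        parent_keys[i - 1] = leaves[i - 1].pop()                # sibling's largest key moves up
        return "rotate"
    if i + 1 < len(leaves) and len(leaves[i + 1]) > min_keys:  # borrow from the right
        leaf.append(parent_keys[i])
        parent_keys[i] = leaves[i + 1].pop(0)
        return "rotate"
    j = i - 1 if i > 0 else i                                   # merge leaves[j] and leaves[j + 1]
    separator = parent_keys.pop(j)                              # demoted into the merged node
    leaves[j] = leaves[j] + [separator] + leaves.pop(j + 1)
    return "merge"


if __name__ == "__main__":
    # Deleting 20 from the order-5 example: [10] underflows, [40 60] is minimal.
    parent, leaves = [30, 70], [[10], [40, 60], [75, 80]]
    print(fix_leaf_underflow(parent, leaves, 0, 2), parent, leaves)
    # merge [70] [[10, 30, 40, 60], [75, 80]]
```

    The deletions of 90 (rotation with [10 30 40 60]) and of 120 (merge into [70 75 80 100]) in the example below follow the same two paths.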

    Example: m = 5

    B-trees where m = 5:

    the root node has 2 to 5 children

    all other non-terminal nodes have at least $\lceil 5/2 \rceil = 3$ children: 3, 4 or 5

    all non-root nodes have 2, 3 or 4 keys

    all terminal nodes are on the same level
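
    The m = 5 figures follow from two rules: a non-root node has between $\lceil m/2 \rceil$ and m children, and one fewer key than children. A small sketch (names illustrative) that reproduces them for any order m.

```python
from typing import Dict, Tuple


def btree_bounds(m: int) -> Dict[str, Tuple[int, int]]:
    """(min, max) limits for a B-tree of order m."""
    half = (m + 1) // 2                  # ceil(m / 2), using integer arithmetic
    return {
        "root children": (2, m),
        "other node children": (half, m),
        "other node keys": (half - 1, m - 1),
    }


if __name__ == "__main__":
    print(btree_bounds(5))
    # {'root children': (2, 5), 'other node children': (3, 5), 'other node keys': (2, 4)}
```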

    Example: B-tree of order 5

    root:     [90]
    level 2:  [30 70]  [120 160 190 220]
    leaves:   [10 20] [40 60] [75 80] [100 110] [130 140 150] [170 180] [200 210] [230 240]

    Let's delete 20: it's in a leaf node;
    - this results in underflow in [10];
    - the adjacent sibling [40 60] is minimal,
    so merge with the adjacent sibling.

    Example: partial deletion of 20

    root:     [90]
    level 2:  [70]  [120 160 190 220]
    leaves:   [10 30 40 60] [75 80] [100 110] [130 140 150] [170 180] [200 210] [230 240]

    Adjust the underflow node [70]: rotate with its non-minimal sibling.

    Example: 20 deleted

    root:     [120]
    level 2:  [70 90]  [160 190 220]
    leaves:   [10 30 40 60] [75 80] [100 110] [130 140 150] [170 180] [200 210] [230 240]

    Let's delete 160: it's in a non-leaf, so replace it with its predecessor 150.

    Example: 160 deleted

    root:     [120]
    level 2:  [70 90]  [150 190 220]
    leaves:   [10 30 40 60] [75 80] [100 110] [130 140] [170 180] [200 210] [230 240]

    Let's delete 90: replace it with its predecessor 80, which results in underflow in [75];
    rotate [75] with its non-minimal sibling.

    Example: 90 deleted

    root:     [120]
    level 2:  [60 80]  [150 190 220]
    leaves:   [10 30 40] [70 75] [100 110] [130 140] [170 180] [200 210] [230 240]

    Let's delete 120: replace it with its predecessor 110;
    - this results in underflow in [100];
    - the adjacent sibling [70 75] is minimal, so merge!

    Example: partial deletion of 120

    root:     [110]
    level 2:  [60]  [150 190 220]
    leaves:   [10 30 40] [70 75 80 100] [130 140] [170 180] [200 210] [230 240]

    Adjust the underflow node [60]: rotate with its non-minimal sibling.

    Example: 120 deleted

    root:     [150]
    level 2:  [60 110]  [190 220]
    leaves:   [10 30 40] [70 75 80 100] [130 140] [170 180] [200 210] [230 240]

    Let's delete 190: replace it with its predecessor 180;
    - this results in underflow in [170];
    - the adjacent sibling [200 210] is minimal, so merge!

    Example: partial deletion of 190

    root:     [150]
    level 2:  [60 110]  [220]
    leaves:   [10 30 40] [70 75 80 100] [130 140] [170 180 200 210] [230 240]

    Adjust the underflow node [220]: its adjacent sibling [60 110] is minimal, so merge!

    Example: 190 deleted

    root:     [60 110 150 220]
    leaves:   [10 30 40] [70 75 80 100] [130 140] [170 180 200 210] [230 240]

    Analysis

    Some nodes may not be full (i.e. may not have the maximum number of children).

    This increases the number of nodes but does not substantially increase the height.

    It also makes insertions easy, because there may be space available in the nodes.

    Each non-root node is at least half full, so the height is at most about $\log_{\lceil m/2 \rceil} n$.

    B-trees are very efficient as an indexing structure for files and compare very well with a perfectly balanced tree.
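
    To see how small the height stays even when nodes are only half full, the sketch below counts the keys in the sparsest possible B-tree, which gives the worst-case number of levels (and so of disk accesses per search). It assumes height counts levels of key-holding nodes; the function name is illustrative.

```python
def btree_max_levels(n: int, m: int) -> int:
    """Most levels a B-tree of order m can have while holding n >= 1 keys.

    The sparsest tree has 1 key in the root and ceil(m/2) - 1 keys in every
    other node, so L levels hold at least 2 * d**(L - 1) - 1 keys, d = ceil(m/2).
    """
    d = (m + 1) // 2
    levels = 1
    while 2 * d ** levels - 1 <= n:      # n keys are enough for one more level
        levels += 1
    return levels


if __name__ == "__main__":
    n = 1_000_000
    for m in (5, 7, 31):
        print(m, btree_max_levels(n, m))
```

    For a million keys the worst case is 12 levels with m = 5 and only 5 with m = 31.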

    B*-Trees

    Description:

    This is a modification of B-trees that uses an overflow-handling technique.

    Insertion into a full node causes rotations that move excess keys into sibling nodes.

    Nodes are at least 2/3 full instead of at least half full.

    B*-Trees

    B*-Trees Definition:

    The empty tree is a special m-way search tree with no nodes.

    All non-empty subtrees are m-way search trees.

    The root node has at least two children and at most $2\lfloor (2m-2)/3 \rfloor + 1$ children. Note: this can exceed m.

    All non-terminal nodes ($S_i \ne 0$ for some i) other than the root have at least $\lceil (2m-1)/3 \rceil$ children; therefore each of them stores at least $\lceil (2m-1)/3 \rceil - 1$ key values $K_i$.

    All terminal nodes ($S_i = 0$ for all i) are on the same level. Terminal nodes are initially empty.

    Illustrative Comparison

    With m = 7, B-trees require each (non-root) internal node to have 4 to 7 children, i.e. 3 to 6 keys per node.

    With m = 7, B*-trees require each (non-root) internal node to have 5 to 7 children, i.e. 4 to 6 keys per node.

    With m = 31, B-trees require each (non-root) internal node to have 16 to 31 children, i.e. 15 to 30 keys per node.

    With m = 31, B*-trees require each (non-root) internal node to have 21 to 31 children, i.e. 20 to 30 keys per node ($\lceil 61/3 \rceil = 21$).
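
    The comparison numbers come straight from the two definitions, and computing them guards against rounding slips in the ceilings. A minimal sketch with illustrative names, covering non-root internal nodes plus the B*-tree root limit, which can exceed m.

```python
from typing import Tuple

Bounds = Tuple[Tuple[int, int], Tuple[int, int]]     # ((min, max) children, (min, max) keys)


def btree_bounds(m: int) -> Bounds:
    lo = (m + 1) // 2                                 # ceil(m / 2)
    return (lo, m), (lo - 1, m - 1)


def bstar_bounds(m: int) -> Bounds:
    lo = (2 * m + 1) // 3                             # ceil((2m - 1) / 3)
    return (lo, m), (lo - 1, m - 1)


def bstar_root_max_children(m: int) -> int:
    return 2 * ((2 * m - 2) // 3) + 1                 # 2 * floor((2m - 2) / 3) + 1


if __name__ == "__main__":
    for m in (7, 31):
        print(m, "B:", btree_bounds(m), "B*:", bstar_bounds(m))
    print(bstar_root_max_children(7))                 # 9, more than m = 7
```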

    Insertion

    Inserting a key into a full non-root node causes an excess key to overflow into one of its siblings, using a rotation through their parent.

    If the siblings are also full, then the overflowing node, a nearest sibling and the dividing parent key are merged. The result is divided into three nodes as evenly as possible, with the 2 dividing keys moved up to the parent, keeping each new node at least about 2/3 full.

    If the parent overflows, then adjust that node with its siblings and parent in the same way.

    Inserting a key into a full root node creates a new root holding the middle key, with the first half of the keys as its left child node and the second half of the keys as its right child node.
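
    Both insertion cases operate only on the key lists of two siblings and their separator, so they can be sketched without the rest of the tree. A minimal sketch with hypothetical function names; the example assumes m = 7 (a full node holds 6 keys, so an overflowing node holds 7), the rotation is shown toward the right sibling only, and updating the parent is left to the caller.

```python
from typing import List, Tuple


def overflow_to_right_sibling(node: List[int], separator: int, sibling: List[int]
                              ) -> Tuple[List[int], int, List[int]]:
    """Rotation through the parent: the node's largest key moves up to the
    parent and the old separator moves down to the front of the right sibling."""
    return node[:-1], node[-1], [separator] + sibling


def split_two_to_three(left: List[int], separator: int, right: List[int]):
    """Merge two full siblings with their dividing parent key and divide the keys
    into three nodes as evenly as possible; two dividing keys go up to the parent.
    Returns (node1, key1, node2, key2, node3)."""
    keys = left + [separator] + right
    base, extra = divmod(len(keys) - 2, 3)
    sizes = [base + (1 if i < extra else 0) for i in range(3)]
    i = sizes[0]                          # index of the first dividing key
    j = i + 1 + sizes[1]                  # index of the second dividing key
    return keys[:i], keys[i], keys[i + 1:j], keys[j], keys[j + 1:]


if __name__ == "__main__":
    # m = 7: overflowing node [1..7], separator 8, full right sibling [9..14].
    print(split_two_to_three(list(range(1, 8)), 8, list(range(9, 15))))
    # ([1, 2, 3, 4], 5, [6, 7, 8, 9], 10, [11, 12, 13, 14])
```

    Each resulting node holds 4 keys, exactly the B*-tree minimum $\lceil 13/3 \rceil - 1 = 4$ for m = 7.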

    Deletion: skipped

    Analysis

    Some nodes may not be full.

    Nodes are fuller than in B-trees.

    This reduces the number of nodes required.

    Each non-root node is at least two-thirds full, so the height is at most about $\log_{2m/3} n$.
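
    Counting the keys in the sparsest legal tree gives the worst-case number of levels; the sketch compares B-trees (at least $\lceil m/2 \rceil$ children per non-root node) with B*-trees (at least $\lceil (2m-1)/3 \rceil$). It assumes the root has at least 2 children in both cases and that height counts levels of key-holding nodes; names are illustrative.

```python
def max_levels(n: int, min_children: int) -> int:
    """Most levels a tree can have with n >= 1 keys if the root has at least
    2 children and every other node at least min_children:
    L levels then hold at least 2 * d**(L - 1) - 1 keys, d = min_children."""
    levels = 1
    while 2 * min_children ** levels - 1 <= n:
        levels += 1
    return levels


def compare(n: int, m: int):
    b = max_levels(n, (m + 1) // 2)           # B-tree: ceil(m / 2)
    b_star = max_levels(n, (2 * m + 1) // 3)  # B*-tree: ceil((2m - 1) / 3)
    return b, b_star


if __name__ == "__main__":
    print(compare(1_000_000, 7))   # (10, 9)
```

    With m = 7 the fuller nodes save one level for a million keys; with m = 31 both worst cases are already 5 levels.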

    B+-Trees

    Description:

    a modification of B-trees where each node is at least half full

    internal nodes do not contain the addresses of key values, just keys and child pointers

    each $K_i$ in an internal node represents the largest key value in the left subtree pointed to by $S_{i-1}$; $S_t$ points to the rightmost subtree, whose key values are greater than $K_t$

    all keys and their addresses are in the leaves

    the leaves are linked to each other

    B+-Trees

    Description (continued):

    Often used as an indexing technique where each leaf is stored as a block in the file.

    Each leaf node contains the key values, the address of the block on disk, and the address of the next leaf.

    The index is usually stored as an index file separate from the data file.

    Example B+-Tree

    Node layout: $(t, S_0, (K_1, S_1), (K_2, S_2), \ldots, (K_t, S_t))$

    root:     [40]
    level 2:  [20]  [60 80]
    leaves:   [10 20] → [30 40] → [50 60] → [70 80] → [90 100]

    Data file:

    RelAddr     0   1   2   3   4   5   6   7   8    9
    Key         40  50  20  70  10  30  90  60  100  80
    OtherInfo   ...
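
    The example can be modelled in memory to show both access paths a B+-tree supports: a search descends from the root using the rule that $K_i$ is the largest key in $S_{i-1}$, and sequential access follows the leaf links. A minimal sketch; the classes `Leaf` and `Internal` are hypothetical, and the tree shape (root 40; internal nodes 20 and 60 80) and the key-to-RelAddr pairs are read off the example and its data-file table.

```python
import bisect
from typing import List, Optional, Tuple, Union


class Leaf:
    def __init__(self, entries: List[Tuple[int, int]]):
        self.keys = [k for k, _ in entries]
        self.addrs = [a for _, a in entries]        # RelAddr of each record in the data file
        self.next: Optional["Leaf"] = None          # link to the next leaf


class Internal:
    def __init__(self, keys: List[int], children: list):
        self.keys = keys                            # K_1 .. K_t (no record addresses)
        self.children = children                    # S_0 .. S_t


def find_leaf(node: Union[Internal, Leaf], key: int) -> Leaf:
    while isinstance(node, Internal):
        # K_i is the largest key in S_{i-1}: take the first i with key <= K_i,
        # or S_t when key is greater than every K_i.
        node = node.children[bisect.bisect_left(node.keys, key)]
    return node


def search(root, key: int) -> Optional[int]:
    """Random access: return the RelAddr of key, or None if absent."""
    leaf = find_leaf(root, key)
    i = bisect.bisect_left(leaf.keys, key)
    return leaf.addrs[i] if i < len(leaf.keys) and leaf.keys[i] == key else None


def range_scan(root, lo: int, hi: int) -> List[Tuple[int, int]]:
    """Sequential access: locate lo's leaf, then follow the leaf links."""
    leaf: Optional[Leaf] = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for k, a in zip(leaf.keys, leaf.addrs):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, a))
        leaf = leaf.next
    return out


def build_example() -> Internal:
    reladdr = {40: 0, 50: 1, 20: 2, 70: 3, 10: 4, 30: 5, 90: 6, 60: 7, 100: 8, 80: 9}
    groups = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]
    leaves = [Leaf([(k, reladdr[k]) for k in g]) for g in groups]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Internal([40], [Internal([20], leaves[:2]), Internal([60, 80], leaves[2:])])


if __name__ == "__main__":
    root = build_example()
    print(search(root, 70), range_scan(root, 30, 60))
    # 3 [(30, 5), (40, 0), (50, 1), (60, 7)]
```

    Because the data file itself is unordered, the linked leaves are what make sequential processing in key order possible without sorting the file.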

    Implementations

    CDC machines use a variation of B+-trees to implement their indexed sequential file organization, called SIS (Scope Indexed Sequential).

    IBM uses a variation of B+-trees to implement its indexed sequential file organization, called VSAM (Virtual Storage Access Method).

    IBM also implemented ISAM (Indexed Sequential Access Method), which uses cylinders and surfaces to organize its indices, prime area and overflow areas. It is a variation of m-way search trees.

    End