University of Liverpool - Computer Science Intranetmichele/TEACHING/COMP102/2006/...AVL trees...


  • Data Structures and Information Systems
    Part 1: Data Structures

    Michele Zito

    Lecture 11: Trees

    . – p.1/87

    Topics

    Trees

    Binary trees

    Binary tree implementation

    Binary tree traversal

    . – p.2/87

    Trees

    1-dimensional arrays, linked lists, stacks and queues are linear data structures, that is, sequences of data items

    We have also seen 2-dimensional arrays which organise data items into a rigid 2-dimensional structure

    However, in a number of applications, data items are not organised in this way:

    directory structure of a file store

    structure of arithmetic expressions

    the part-whole relationship in physical objects

    hierarchy of an organisation

    Appropriate data structures for these applications are trees

    . – p.3/87

    Trees: Recursive definition

    A tree T of order n is either empty or consists of a node called root of T together with at most n trees, called the subtrees of T

    We usually depict a non-empty tree T as a set of nodes connected by directed links, with the root of T at the top:

    [Figures: a tree of order 3; a tree of order 2]

    . – p.4/87

  • Trees: Non-recursive definition (1)

    Trees are a special kind of directed graph

    A directed graph is a (finite) set of nodes connected by directed links called edges

    [Figure: a directed graph with the nodes a, b, c, d, e, g]

    A path in a directed graph is a list of nodes n_0, ..., n_k such that for each successive pair (n_i, n_{i+1}) there is a directed link from n_i to n_{i+1}
    E.g. a, c, e in the graph above

    . – p.5/87


    Trees: Non-recursive definition (2)

    A cycle in a directed graph is a path that starts and ends with the same node
    E.g. a, c, e, a in the graph below

    [Figure: a directed graph with the nodes a, b, c, d, e, g that contains the cycle a, c, e, a]

    A directed graph is cyclic if it contains at least one cycle, otherwise it is acyclic
    Obviously, the graph above is cyclic

    . – p.6/87

    Trees: Non-recursive definition (3)

    A directed graph is said to fan-in at node n if there are two distinct nodes n_1 and n_2 with edges from n_1 to n and from n_2 to n
    E.g. the graph below fans-in at node d

    [Figure: a directed graph with the nodes a, b, c, d, e, g]

    A root of a directed graph is a node with no edges leading to it
    E.g. the nodes a and g are both roots in the graph above

    . – p.7/87

  • Trees: Non-recursive definition (4)

    A tree is a directed graph G such that

      the graph G is acyclic
      the graph G has exactly one root
      the graph G does not fan-in at any node

    The order of the tree is the maximal number of edges leaving from a node in G

    [Figures: a graph with the nodes a, b, c, d, e, g that is not a tree; a tree of order 2 with the nodes a, b, c, d, e]

    . – p.8/87
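
    The three conditions can be checked mechanically. The following is only a sketch and not part of the lecture code: it assumes that the graph is given as a list of directed edges between nodes named by strings, that every node occurs in at least one edge, and the names TreeCheckSketch and isTree are made up for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TreeCheckSketch {

    // Each edge is a pair {from, to}; node names are plain strings (an assumption of this sketch).
    static boolean isTree(String[][] edges) {
        Map<String, Set<String>> parents = new HashMap<>();
        Map<String, Set<String>> children = new HashMap<>();
        for (String[] e : edges) {
            parents.computeIfAbsent(e[0], k -> new HashSet<>());
            parents.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
            children.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
        }

        String root = null;
        for (Map.Entry<String, Set<String>> entry : parents.entrySet()) {
            int incoming = entry.getValue().size();
            if (incoming > 1) return false;          // fan-in at this node
            if (incoming == 0) {
                if (root != null) return false;      // more than one root
                root = entry.getKey();
            }
        }
        if (root == null) return false;              // no root at all

        // With exactly one root and no fan-in, the graph is acyclic
        // exactly when every node can be reached from the root.
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String node = todo.pop();
            if (seen.add(node)) {
                for (String child : children.getOrDefault(node, Collections.<String>emptySet())) {
                    todo.push(child);
                }
            }
        }
        return seen.size() == parents.size();
    }

    public static void main(String[] args) {
        String[][] tree = {{"a", "b"}, {"a", "c"}, {"c", "d"}, {"c", "e"}};
        String[][] fanIn = {{"a", "b"}, {"a", "c"}, {"b", "d"}, {"c", "d"}};
        System.out.println(isTree(tree));   // prints true
        System.out.println(isTree(fanIn));  // prints false: fan-in at d
    }
}
```

    The two example graphs are invented for the sketch; they are not the graphs drawn on the slides.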

    Binary trees

    A tree of order 2 is called a binary tree

    In the following, we will concentrate on binary trees, so when we refer to a tree we mean a binary tree

    In a binary tree each node is connected to at most two other nodes which are called the left and the right child (one or both can be non-existent)

    A node with no children is called a leaf

    [Figure: a binary tree in which the root, its left child, its right child, and two leaves are labelled]

    . – p.9/87

    More precise definition

    A binary tree T is either empty or consists of a node called root of T together with at most two trees, called the left and right subtree of T, together with the following operations:

    Create an empty tree

    Test whether a tree is empty (isEmpty)

    Insert a new node

    Delete a node

    Retrieve the data item stored in the root of a non-empty tree

    Retrieve the left or right subtree of a non-empty tree

    List (traverse) the nodes of the tree

    . – p.10/87

    Binary tree implementation: Nodes

    Each node N in our implementation of binary trees has a data part and two references, left and right, referring to the roots of the left and right subtrees of N

    In a leaf, the left and right reference will refer to null

    [Figure: linked nodes, each drawn with the three fields left, data, right; references without a subtree point to null]

    . – p.11/87

  • Binary trees: Node implementation (1)

    public class TreeNode {
        protected Object data;     // Data stored in node
        protected TreeNode left;   // Link to left subtree
        protected TreeNode right;  // Link to right subtree

        // Constructors
        public TreeNode(Object o, TreeNode l, TreeNode r) {
            data = o;   // Set data
            left = l;   // Set reference to left subtree
            right = r;  // Set reference to right subtree
        }

        public TreeNode(int number, TreeNode l, TreeNode r) {
            data = Integer.valueOf(number);
            left = l;
            right = r;
        }

        // Accessors and Mutators
    }

    . – p.12/87

    Binary trees: Node implementation (2)

    // Accessors
    public Object getData() { return (data); }
    public int getIntData() {
        return (((Integer) data).intValue());
    }

    public TreeNode getLeftSubtree() { return (left); }
    public TreeNode getRightSubtree() { return (right); }

    // Mutators
    public void setData(Object o) { data = o; }
    public void setIntData(int number) {
        data = Integer.valueOf(number);
    }

    public void setLeftSubtree(TreeNode N) { left = N; }
    public void setRightSubtree(TreeNode N) { right = N; }

    . – p.13/87

    Binary trees: Implementation (1)

    In order to deal with illegal operations on empty trees we use java.lang.NullPointerException

    import java.lang.NullPointerException;

    public class BinaryTree {
        protected TreeNode root;

        // Constructors
        public BinaryTree() { root = null; }

        public BinaryTree(TreeNode N) { root = N; }

        public BinaryTree(Object o) {
            root = new TreeNode(o, null, null);
        }

        public boolean isEmpty() { return (root == null); }

        // Other methods
    }

    . – p.14/87

    Binary tree traversal (1)

    A common task we have considered for all our data structures is to access all the data items stored in a data structure one at a time (traversal)

    In a linear structure like an array or a linked list there are two natural traversals:

      from front to end (first to last data item)
      from end to front (last to first data item)

    Also there are natural solutions for how this can be done

      in an array we use a for-loop
      in a linked list we use a while-loop

    . – p.15/87

  • Binary tree traversal (2)

    Since binary trees are non-linear data structures, there are more possibilities
    We consider three ways of traversing a binary tree

    Preorder: first access the data of the root node, then traverse the left subtree, finally traverse the right subtree

    Inorder: first traverse the left subtree, then access the data of the root node, finally traverse the right subtree

    Postorder: first traverse the left subtree, then traverse the right subtree, finally access the data of the root node

    Notice that the prefix on the word order indicates when the data of the root node is accessed
    The standard solution for how these different possibilities are implemented is recursion

    . – p.16/87

    Binary tree traversal: Examples

    Given the following binary tree, we have:

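    [Figure: a binary tree with root A; A has left child B and right child C; B has children D and E; E has children H and K; C has the right child G]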

    Preorder (Data,L,R)

    A B D E H K C G

    Inorder (L,Data,R)

    D B H E K A C G

    Postorder (L,R,Data)

    D H K E B G C A

    . – p.17/87

    Preorder traversal: Code

    We can use preorder traversal to construct a string representation of a binary tree
    To this end we extend the BinaryTree class by:

    public String preorderString() {
        if (!isEmpty())
            return (root.preorderString());
        else return ("");
    }

    and the TreeNode class by:

    public String preorderString() {
        String stringRep = "";
        stringRep += data + " ";
        if (left != null) stringRep += left.preorderString();
        if (right != null) stringRep += right.preorderString();
        return (stringRep);
    }

    . – p.18/87

    Inorder traversal: Code

    We can use inorder traversal to construct a string representation of a binary tree
    To this end we extend the BinaryTree class by:

    public String inorderString() {
        if (!isEmpty())
            return (root.inorderString());
        else return ("");
    }

    and the TreeNode class by:

    public String inorderString() {
        String stringRep = "";
        if (left != null) stringRep += left.inorderString();
        stringRep += data + " ";
        if (right != null) stringRep += right.inorderString();
        return (stringRep);
    }

    . – p.19/87

  • Postorder traversal: Code

    We can use postorder traversal to construct a string representation of a binary tree
    To this end we extend the BinaryTree class by:

    public String postorderString() {
        if (!isEmpty())
            return (root.postorderString());
        else return ("");
    }

    and the TreeNode class by:

    public String postorderString() {
        String stringRep = "";
        if (left != null) stringRep += left.postorderString();
        if (right != null) stringRep += right.postorderString();
        stringRep += data + " ";
        return (stringRep);
    }

    . – p.20/87

    Constructing a tree: Example

    Below is an example program that constructs the tree on slide 17 and uses the methods preorderString, inorderString, postorderString to print out the tree:

    public class TestBinaryTree {
        public static void main(String args[]) {
            TreeNode nH = new TreeNode("H", null, null);
            TreeNode nK = new TreeNode("K", null, null);
            TreeNode nE = new TreeNode("E", nH, nK);
            TreeNode nD = new TreeNode("D", null, null);
            TreeNode nB = new TreeNode("B", nD, nE);
            TreeNode nG = new TreeNode("G", null, null);
            TreeNode nC = new TreeNode("C", null, nG);
            TreeNode nA = new TreeNode("A", nB, nC);
            BinaryTree myTree = new BinaryTree(nA);
            System.out.println("Preorder: " + myTree.preorderString());
            System.out.println("Inorder: " + myTree.inorderString());
            System.out.println("Postorder: " + myTree.postorderString());
        }
    }

    . – p.21/87

    Binary tree traversal: Example

    The output of our example program is as follows:

    Preorder: A B D E H K C G
    Inorder: D B H E K A C G
    Postorder: D H K E B G C A

    Compare the output of the program to the preorder, inorder, and postorder traversals we have computed on page 17

    . – p.22/87

    Data Structures and Information Systems
    Part 1: Data Structures

    Michele Zito

    Lecture 12: Efficient search: AVL trees

    . – p.23/87

  • Topics

    Searching in a binary tree

    Binary search trees

    AVL trees

    Inserting in an AVL tree

    Example: Train departure information

    . – p.24/87

    Binary trees

    A binary tree T is either empty or consists of a node called root of T together with at most two trees, called the left and right subtrees of T
    In a binary tree each node is connected to at most two other nodes which are called the left and the right child (one or both can be non-existent)
    A node with no children is called a leaf

    [Figure: a binary tree with levels 1 to 3 marked; the root, its left child, its right child, and two leaves are labelled]

    The root of a tree is at level 1, the children of the root are at level 2, etc.

    . – p.25/87

    Searching in a binary tree

    Consider the operation of searching for a given object o in a binary tree

    The only way to do that is

      Traverse the binary tree using either preorder, inorder, or postorder traversal, comparing the data item stored in each node with the given object o until the object is found or all nodes have been visited
      If all nodes have been visited, but the given object o has not been found, then it is not stored in the tree

    In the worst case we may visit all nodes of a tree (without finding o!)

    How can we improve on this???

    . – p.26/87

    Height of a binary tree

    The height of an empty binary tree is 0

    The height of a non-empty binary tree T with left subtree T_l and right subtree T_r is

    height(T) = 1 + max( height(T_l), height(T_r) )

    Examples:

    [Figures: a tree of height 3; a tree of height 2]

    Alternatively, the height of a tree T is the maximal number of nodes passed from the root to a leaf in the tree T

    . – p.27/87
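
    The recursive definition of the height translates directly into a recursive method. The sketch below is an illustration under its own assumptions: it uses a small stand-alone Node class rather than the TreeNode class of Lecture 11, and the names HeightSketch and height are made up for the example.

```java
class HeightSketch {

    // Minimal stand-alone node type used only by this sketch.
    static class Node {
        final Object data;
        final Node left;
        final Node right;

        Node(Object data, Node left, Node right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    // The empty tree (null) has height 0; otherwise add 1 for the root
    // to the larger of the two subtree heights.
    static int height(Node t) {
        if (t == null) return 0;
        return 1 + Math.max(height(t.left), height(t.right));
    }

    public static void main(String[] args) {
        Node chain = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        Node bushy = new Node(2, new Node(1, null, null), new Node(3, null, null));
        System.out.println(height(chain));  // prints 3
        System.out.println(height(bushy));  // prints 2
    }
}
```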

  • Binary search trees

    Assume that we are given a collection of data items together with an ordering

    A binary search tree is a binary tree that is either empty or in which each node contains a data item that satisfies the conditions:

    All data items (if any) in the left subtree of the root are smaller than the data item in the root

    The data item in the root is smaller than all data items (if any) in the right subtree

    The left and right subtrees of the root are again binary search trees

    . – p.28/87
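
    Whether a given binary tree satisfies these conditions can be tested by passing down the range of values that is still allowed in each subtree. This is a minimal sketch, assuming integer data items and a stand-alone Node class; BstCheckSketch and isBst are hypothetical names, and the two example trees are chosen for the sketch rather than taken from the slides.

```java
class BstCheckSketch {

    // Minimal stand-alone node type with integer data items.
    static class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isBst(Node t) {
        return isBst(t, null, null);
    }

    // low and high are exclusive bounds inherited from the ancestors;
    // null means "no bound in that direction".
    private static boolean isBst(Node t, Integer low, Integer high) {
        if (t == null) return true;                        // the empty tree is a binary search tree
        if (low != null && t.key <= low) return false;     // too small for this position
        if (high != null && t.key >= high) return false;   // too large for this position
        return isBst(t.left, low, t.key) && isBst(t.right, t.key, high);
    }

    public static void main(String[] args) {
        Node good = new Node(5,
                new Node(4, new Node(3, null, null), null),
                new Node(6, null, new Node(8, null, null)));
        // 7 sits in the left subtree of 5, which violates the first condition.
        Node bad = new Node(5,
                new Node(4, new Node(2, null, null), new Node(7, null, null)),
                new Node(8, null, null));
        System.out.println(isBst(good));  // prints true
        System.out.println(isBst(bad));   // prints false
    }
}
```

    Comparing each node only with its own children would not be enough: in the second tree every node is correctly ordered with respect to its children, yet 7 is larger than the root 5.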

    Binary search trees: Examples

    [Figures: four example trees with integer data items; two are captioned "a binary search tree" and two "not a binary search tree"]

    . – p.29/87

    Searching in a binary search tree

    If we want to find the node storing a given data item o in a binary search tree T, then we proceed as follows:

    1. If the tree T is empty, then we have failed to find o
    2. Otherwise, compare o with the data item o’ stored in the root R of T
         If o and o’ are identical, then return R
         If o is smaller than o’ then search for o in the left subtree of R (by recursively applying the same algorithm)
         If o is greater than o’ then search for o in the right subtree of R (again, recursively)

    . – p.30/87
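
    The two steps above can be written as a short recursive method. This is a sketch only, assuming integer data items and a stand-alone Node class; it is not the search method of the course's BSTree class, and the names BstSearchSketch and search are made up for the example.

```java
class BstSearchSketch {

    // Minimal stand-alone node type with integer data items.
    static class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the node storing o, or null if o is not in the tree.
    static Node search(Node t, int o) {
        if (t == null) return null;                 // step 1: empty tree, o not found
        if (o == t.key) return t;                   // step 2: o found in the root
        if (o < t.key) return search(t.left, o);    // continue in the left subtree
        return search(t.right, o);                  // continue in the right subtree
    }

    public static void main(String[] args) {
        Node tree = new Node(5,
                new Node(4, new Node(3, null, null), null),
                new Node(6, null, new Node(8, null, null)));
        System.out.println(search(tree, 8) != null);  // prints true
        System.out.println(search(tree, 7) != null);  // prints false
    }
}
```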

    Efficiency (1)

    Question: Is the new search method on binary search trees better than the old search method on binary (search) trees?
    Consider the following two binary search trees:

    [Figures: two binary search trees, each storing the numbers 1 to 7]

    In the left tree, the new search needs three accesses to data items to find the number 7 (the old search needs seven)
    In the right tree, the new search needs seven accesses to data items to find the number 7 (as does the old search)

    . – p.31/87

  • Efficiency (2)

    Answer: In a binary search tree of height h, the new search method needs at most h accesses; it only performs better than the old search method if h is (much) smaller than the number of nodes n in the tree
    Question: What distinguishes binary search trees on which our new search method is fast from those on which it is not?
    Answer: The binary search tree should be balanced!

    A binary tree of height h is balanced if all nodes at levels 1, ..., h−2 have two children, while nodes at level h−1 can have zero, one, or two children
    (Be careful: There are other definitions around!)

    . – p.32/87

    Balanced binary search trees

    One more question has to be answered:

    How do we construct balanced binary search trees?

    Note that with our current insert method for binary search trees the shape/height of a binary search tree depends on the order in which items are inserted

    [Figures: the tree obtained by inserting 1, 2, 3; the tree obtained by inserting 2, 1, 3]

    It turns out that constructing balanced trees is difficult
    Instead we focus on height-balanced trees

    . – p.33/87
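
    The effect of the insertion order can be reproduced with a few lines of code. The insert method used here is an assumption: it is the usual insert without rebalancing (descend left or right and attach the new item as a leaf), which may differ in detail from the course's own insert method. The names InsertOrderSketch, insert, height and build are hypothetical.

```java
class InsertOrderSketch {

    // Minimal stand-alone node type with integer data items.
    static class Node {
        final int key;
        Node left;
        Node right;

        Node(int key) {
            this.key = key;
        }
    }

    // Plain binary search tree insert without any rebalancing.
    static Node insert(Node t, int key) {
        if (t == null) return new Node(key);
        if (key < t.key) {
            t.left = insert(t.left, key);
        } else if (key > t.key) {
            t.right = insert(t.right, key);
        }
        return t;  // a key that is already present is ignored
    }

    static int height(Node t) {
        if (t == null) return 0;
        return 1 + Math.max(height(t.left), height(t.right));
    }

    // Inserts the keys in the given order into an initially empty tree.
    static Node build(int... keys) {
        Node t = null;
        for (int key : keys) {
            t = insert(t, key);
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(height(build(1, 2, 3)));  // prints 3: a chain
        System.out.println(height(build(2, 1, 3)));  // prints 2: root with two children
    }
}
```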

    AVL trees

    A binary tree is called height-balanced if its subtrees differ in height by no more than one and the subtrees are again height-balanced

    An AVL tree is a binary search tree in which the heights of the left and right subtrees of the root differ by at most 1 and in which the left and right subtrees are again AVL trees

    Obviously, AVL trees are height-balanced

    AVL trees are named after their inventors:

    G. M. Adelson-Velskii and E. M. Landis

    . – p.34/87
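
    The height condition can be checked in a single pass over the tree by computing heights bottom-up and stopping as soon as one node violates the condition. The sketch assumes a stand-alone Node class with integer data items and checks only the height condition, not the binary search tree ordering; AvlCheckSketch, checkedHeight and isHeightBalanced are hypothetical names.

```java
class AvlCheckSketch {

    // Minimal stand-alone node type with integer data items.
    static class Node {
        final int key;
        final Node left;
        final Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the height of t, or -1 if t or one of its subtrees
    // has left and right subtrees whose heights differ by more than 1.
    static int checkedHeight(Node t) {
        if (t == null) return 0;
        int leftHeight = checkedHeight(t.left);
        if (leftHeight < 0) return -1;
        int rightHeight = checkedHeight(t.right);
        if (rightHeight < 0) return -1;
        if (Math.abs(leftHeight - rightHeight) > 1) return -1;
        return 1 + Math.max(leftHeight, rightHeight);
    }

    static boolean isHeightBalanced(Node t) {
        return checkedHeight(t) >= 0;
    }

    public static void main(String[] args) {
        Node balanced = new Node(5,
                new Node(4, new Node(3, null, null), null),
                new Node(6, null, new Node(8, null, null)));
        Node chain = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        System.out.println(isHeightBalanced(balanced));  // prints true
        System.out.println(isHeightBalanced(chain));     // prints false
    }
}
```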

    AVL trees: Examples

    [Figures: four example trees with integer data items; two are captioned "an AVL tree" and two "not an AVL tree"]

    . – p.35/87

  • Inserting in an AVL tree: Single rotation

    Our insert method does not guarantee that we get a balanced tree nor does it guarantee that we get an AVL tree

    The idea underlying an insert method for AVL trees that will always produce an AVL tree is that we have to rotate nodes
    E.g. insert the number 6 into the AVL tree below

    [Figure: an AVL tree with the data items 7, 8, 4, 5, shown before inserting 6, after inserting 6, and after the rotation]

    . – p.36/87


    Single rotation to the right, general rule

    1. Assume the root of the tree is called N and that the sub-tree rooted at N’s left child (call such a node C) is one unit higher than the other sub-tree§

    2. Insert an element so that the height of the left sub-tree of C (T1 in the pictures) grows: this results in an unbalanced tree.

    3. To rebalance the tree, set the root to be C and C’s right child to be N, and ...

    4. then (to make sure the new tree is a binary search tree) set the left sub-tree of the tree rooted at N to be T2, the former right sub-tree of C.

    The resulting tree is now completely balanced (pictures on the following slides explain the situation).

    § If the two sub-trees have the same height no imbalance can occur!

    . – p.37/87
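
    Steps 3 and 4 amount to two reference updates. The following sketch assumes a stand-alone Node class with integer data items; the names RotateRightSketch, rotateRight and inorder are made up, and the example tree is chosen for the illustration (N = 5, C = 3, T1 = the subtree with 2 and 1, T2 = 4, T3 = 6).

```java
class RotateRightSketch {

    // Minimal stand-alone node type with integer data items.
    static class Node {
        final int key;
        Node left;
        Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Single rotation to the right around n; returns the new root C.
    static Node rotateRight(Node n) {
        Node c = n.left;      // C is the left child of N
        n.left = c.right;     // step 4: T2 becomes the left sub-tree of N
        c.right = n;          // step 3: N becomes the right child of C
        return c;             // step 3: C is the new root
    }

    // Inorder listing, used to show that the ordering is preserved.
    static String inorder(Node t) {
        if (t == null) return "";
        return inorder(t.left) + t.key + " " + inorder(t.right);
    }

    public static void main(String[] args) {
        Node t1 = new Node(2, new Node(1, null, null), null);  // T1 after it has grown
        Node c = new Node(3, t1, new Node(4, null, null));     // C with T1 and T2
        Node n = new Node(5, c, new Node(6, null, null));      // N with C and T3

        Node root = rotateRight(n);
        System.out.println(root.key);       // prints 3
        System.out.println(inorder(root));  // prints 1 2 3 4 5 6 (with a trailing space)
    }
}
```

    The caller has to store the returned node in place of N, for example as the new root of the tree or as the child reference of N's former parent.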

  • Single rotation, general rule

    [Figure sequence, slides 38 to 43: before the rotation N is the root with left child C and right sub-tree T3, and C has the sub-trees T1 and T2. After the rotation C is the root with left sub-tree T1 and right child N, and N has the sub-trees T2 and T3.]

    . – p.38/87 to p.43/87

    Inserting in an AVL tree: Double rotation

    If after a single rotation the tree is still not an AVL tree, then we perform a double rotation in the opposite direction

    E.g. insert the number 6 into the AVL tree below

    [Figure sequence: an AVL tree with the data items 7, 8, 3, 4, 5, shown before inserting 6, after inserting 6, as an intermediate tree with root 4, and as the final tree with root 5]

    Single and double rotations are the basic building blocks of inserting into an AVL tree

    . – p.44/87

    Double rotation: left-right, general rule

    1. Insert an element so that the height of the right sub-tree rooted at C grows due to the expansion of the left sub-tree rooted at C’s right child G!

    2. To rebalance the tree, we first perform a rotation to the left in the tree rooted at node C, and then ...

    3. since the intermediate tree is still not balanced we perform a right rotation of the nodes C, G, and N (see pictures).

    The resulting tree is an AVL tree.

    . – p.45/87
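
    The double rotation can be composed from two single rotations. This is a sketch under stated assumptions: a stand-alone Node class with integer data items, hypothetical method names, and an example tree chosen for the illustration (N = 7, C = 3, G = 5, with the newly inserted item 4 below G).

```java
class DoubleRotationSketch {

    // Minimal stand-alone node type with integer data items.
    static class Node {
        final int key;
        Node left;
        Node right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Single rotation to the left around c; returns the new root G.
    static Node rotateLeft(Node c) {
        Node g = c.right;
        c.right = g.left;
        g.left = c;
        return g;
    }

    // Single rotation to the right around n; returns the new root.
    static Node rotateRight(Node n) {
        Node c = n.left;
        n.left = c.right;
        c.right = n;
        return c;
    }

    // Left-right double rotation: rotate left at N's left child C,
    // then rotate right at N. Returns the new root G.
    static Node rotateLeftRight(Node n) {
        n.left = rotateLeft(n.left);
        return rotateRight(n);
    }

    public static void main(String[] args) {
        Node g = new Node(5, new Node(4, null, null), null);  // G after 4 was inserted below it
        Node c = new Node(3, new Node(2, null, null), g);     // C with left sub-tree and right child G
        Node n = new Node(7, c, new Node(8, null, null));     // N with left child C

        Node root = rotateLeftRight(n);
        System.out.println(root.key);        // prints 5
        System.out.println(root.left.key);   // prints 3
        System.out.println(root.right.key);  // prints 7
    }
}
```

    In the example the left sub-tree of N has height 3 and the right one height 1 before the rotation; afterwards both sub-trees of the new root have height 2.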

    Double rotation: left-right, general rule

    [Figure sequence, slides 46 to 56: initially N is the root with left child C and right sub-tree T4; C has the left sub-tree T1 and the right child G; G has the sub-trees T2 and T3. After the left rotation at C and the right rotation at N, G is the root with left child C (sub-trees T1 and T2) and right child N (sub-trees T3 and T4).]

    . – p.46/87 to p.56/87

    Missing cases ...

    The cases of a single rotation to the left and of a double right-left rotation have NOT been described.

    The first one is quite similar to the case of a single rotation to the right (but the starting tree is the mirror image, with respect to its root N, of the one on slide 38).

    The second one is again quite similar to the case of a double left-right rotation.

    It is a very useful exercise to draw on a piece of paper with a pencil (and an eraser) the starting situation in each case and then simulate the processing that takes place.

    . – p.57/87

    Using binary search trees

    Our examples for binary search trees so far have only used numbers and characters as data items

    However, we can use instances of any class that implements the Comparable interface

    That is, the class has to implement a method

        public int compareTo(Object other)

    which compares two objects

    In comparing two objects, compareTo does not have to take into account all information about the two objects

    E.g. if our data items are student records then it is enough to compare student ids (since they uniquely identify a student)
    The information that is used is called the key of a data item

    . – p.58/87
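
    The student record example could look as follows. The class StudentRecord and its fields are made up for this sketch. The slides use the raw Comparable interface with an Object parameter; the sketch uses the generic form Comparable<StudentRecord>, available since Java 5, which avoids the cast.

```java
class StudentRecord implements Comparable<StudentRecord> {
    private final int id;        // the key: uniquely identifies a student
    private final String name;   // further information, ignored by compareTo

    StudentRecord(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int getId() { return id; }
    String getName() { return name; }

    // Only the key (the student id) takes part in the comparison.
    @Override
    public int compareTo(StudentRecord other) {
        return Integer.compare(this.id, other.id);
    }
}

class StudentRecordDemo {
    public static void main(String[] args) {
        StudentRecord a = new StudentRecord(1001, "Ada");
        StudentRecord b = new StudentRecord(1002, "Bob");
        StudentRecord probe = new StudentRecord(1001, "");  // a search key: only the id is filled in

        System.out.println(a.compareTo(b) < 0);       // prints true
        System.out.println(a.compareTo(probe) == 0);  // prints true
    }
}
```

    Because only the key is compared, a record that contains nothing but the key can be used as the argument of a search.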

  • Example: Train departure information

    We want to develop yet another implementation of our information system for train departure information at Liverpool Lime Street Station

    The information is given in form of a table

    Time    Destination    Track
    9:10    Manchester     6
    9:15    Leeds          10
    9:20    Nottingham     7

    The only kind of query we deal with is ‘When and where does the next train to X leave?’

    The implementation should use the binary search tree data structure

    . – p.59/87

    Train departure information: Idea

    We have to come up with a representation of train departure information in a form that allows us to use a binary search tree

    Since in our queries we are looking for a city, this should be the key

    The information on all trains going to a particular city is then stored in a singly linked list of TDInfo objects

    [Figure: an object with the fields city and infos; infos has a head reference to a linked list whose nodes (data, next) each refer to an object with the fields time, city, track; the last next reference is null]

    . – p.60/87

    Train departure information: The code (1)

    The data structure that was just described is implemented in class TDData (note how compareTo is implemented)

    public class TDData implements Comparable {
        protected String city;
        protected LinkedList infos;

        public TDData(String Tcity) {
            city = Tcity;
            infos = new LinkedList();
        }

        public String getCity() { return (city); }
        public LinkedList getInfos() { return (infos); }

        public int compareTo(Object other) {
            TDData tdd = (TDData) other;
            return (this.getCity().compareTo(tdd.getCity()));
        }

        public void addLast(Object data) { infos.addLast(data); }
    }

    . – p.61/87

    Train departure information: The code (2)

    Next we construct the binary search tree storing the train departure information
    The root of the tree stores information on trains to Manchester, the left child on trains to Leeds, and the right child on trains to Nottingham

    class TDIBST {
        static final BSTree tdi = new BSTree();
        static {
            TDData toManchester = new TDData("Manchester");
            toManchester.addLast(new TDInfo("9:10", "", 6));
            TDData toLeeds = new TDData("Leeds");
            toLeeds.addLast(new TDInfo("9:15", "", 10));
            TDData toNottingham = new TDData("Nottingham");
            toNottingham.addLast(new TDInfo("9:20", "", 7));
            tdi.insert(toManchester);
            tdi.insert(toLeeds);
            tdi.insert(toNottingham);
        }

    . – p.62/87

  • Train departure information: The code (3)

    Implementing the searchDest method which returns departure time and track of the next train leaving for a given destination is straightforward using the search method for binary search trees
    However, retrieving the required information from a node returned by the search method is a bit tedious

    public static String[] searchDest(String dest) {
        String[] result = null;
        TDData tdDest = new TDData(dest);
        BSTNode node = tdi.search(tdDest);
        if (node == null)
            return result;
        TDData data = (TDData) node.getData();
        TDInfo train = (TDInfo) data.getInfos().getHead().getData();
        result = new String[2];
        result[0] = train.getTime();
        result[1] = String.valueOf(train.getTrack());
        return result;
    }

    . – p.63/87

    Train departure information: The code (4)

    Finally, here is our main program that asks the user where he wants to go, uses searchDest to find the required information, and prints it out
    Note that the main program is identical to the one in TDIDLL except that we use TDIBST instead of TDIDLL

    public static void main(String args[]) throws IOException {
        InputStreamReader input = new InputStreamReader(System.in);
        BufferedReader keyboardInput = new BufferedReader(input);
        System.out.println("Where do you want to go?");
        String[] info = TDIBST.searchDest(keyboardInput.readLine());
        if (info == null)
            System.out.println("Sorry, no train to this destination");
        else
            System.out.println(info[0] + " departing at track " + info[1]);
    }
    }

    . – p.64/87

    Train departure information: Script

    Here are two runs of the program:

    Information successfully retrieved

    Where do you want to go?
    Nottingham
    9:20 departing at track 7

    No information found

    Where do you want to go?
    London
    Sorry, no train to this destination

    . – p.65/87

    Announcements

    FOURTH ASSIGNMENT DEADLINE:

    Wednesday, February 22nd, 14.30, LAB 7 Boxes

    (late submissions to be left in the wooden box next to my office door).

    SIGN the PLAGIARISM DECLARATION!!

    EXTRA TUTORIALS:

    Additional lecture (with question & answer sessions) on MONDAY

    FEBRUARY 27th (Lecture theatre A, 12noon).

    BE THERE!

    . – p.66/87

  • Data Structures and Information Systems
    Part 1: Data Structures

    Michele Zito

    Lecture 13: B-trees

    . – p.67/87

    Topics

    Searching for data: Storage considerations

    B-trees

    B-trees: Variations

    Searching in a B-tree

    Inserting in a B-tree

    Deleting in a B-tree

    . – p.68/87

    Searching for data: Storage considerations (1)

    When considering how efficiently we can perform searches on a given data structure, we simply counted the number of accesses
    This assumes that all accesses are equally expensive

    If we consider very large collections of data items then we have to take into account whether data items are stored in main memory or on mass storage

    Main memory:  Access time 10 ns (10^-8 seconds)
                  Data transfer rate 1000 MByte/sec (so 10 Byte in 10 ns)
    Hard disk:    Access time 10 ms (10^-2 seconds)
                  Data transfer rate 50 MByte/sec (so 0.5 MByte in 10 ms)

    . – p.69/87

    Searching for data: Storage considerations (2)

    How data items are stored in main memory or on mass storage is also important
    Consider the following two structures:

    [Figure: first, three separate nodes, each with the fields left, data, right, storing o1, o2 and o3; second, a single node with the fields link0, data1, link1, data2, link2, data3, link3 storing all three items]

    If both are held on mass storage then the second one is much more time efficient for searches

    . – p.70/87

  • B-trees

    A B-tree of order M is a tree of order M with the following properties:

    Every node has at most M subtrees T0, T1, . . .

    The root is either a leaf or has at least 2 children

    Every node, except for root and leaves, has at least ⌈M/2⌉ children

    All leaves have the same distance to the root

    A non-leaf node with n+1 children contains n keys K1 < K2 < · · · < Kn and the following conditions hold:

    All keys in T0 are smaller than K1

    All keys in Tn are greater than Kn

    All keys in Ti, 0 < i < n, are greater than Ki and smaller than Ki+1
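    The properties above can be turned into a checker. The following is a minimal sketch, not the course's implementation: the class `Node` and the functions `leaf_depths` and `is_btree` are hypothetical names, and the limit of M−1 keys per leaf is an assumption (the definition only fixes the key count of non-leaf nodes).

```python
from math import ceil


class Node:
    """A B-tree node: sorted keys plus child subtrees (no children for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])

    def is_leaf(self):
        return not self.children


def leaf_depths(node, depth=1):
    """Set of levels at which leaves occur (the root is at level 1)."""
    if node.is_leaf():
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def is_btree(root, M):
    """Check the B-tree properties for order M."""

    def check(node, lo, hi, is_root):
        keys = node.keys
        # Keys must be strictly increasing: K1 < K2 < ... < Kn
        if any(a >= b for a, b in zip(keys, keys[1:])):
            return False
        # Keys must lie strictly between the bounds inherited from the parent
        if keys and lo is not None and keys[0] <= lo:
            return False
        if keys and hi is not None and keys[-1] >= hi:
            return False
        if node.is_leaf():
            return len(keys) <= M - 1  # assumption: a leaf holds at most M-1 keys
        n_children = len(node.children)
        # A non-leaf node with n+1 children contains n keys, at most M children
        if n_children != len(keys) + 1 or n_children > M:
            return False
        # Root: at least 2 children; other non-leaf nodes: at least ceil(M/2)
        if n_children < (2 if is_root else ceil(M / 2)):
            return False
        bounds = [lo] + keys + [hi]
        return all(
            check(child, bounds[i], bounds[i + 1], False)
            for i, child in enumerate(node.children)
        )

    # All leaves must have the same distance to the root
    return check(root, None, None, True) and len(leaf_depths(root)) == 1


example = Node([23], [
    Node([12, 19], [Node([6, 8, 9]), Node([13, 16]), Node([20, 21, 22])]),
    Node([29, 36], [Node([24, 25, 27]), Node([30, 32]), Node([41, 56, 91])]),
])
misordered = Node([23], [Node([12]), Node([20])])  # 20 is not greater than 23

print(is_btree(example, 4), is_btree(misordered, 3))
```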

    . – p.71/87

    B-trees: Remarks

    B-trees were first proposed by Bayer & McCreight in 1972 as a way of maintaining efficient indices for very large databases

    Sometimes B-trees are also referred to as balanced M-way multiway trees

    Multiway trees are trees with any number of children for each node

    M-way trees are trees with no more than M children for each node (a notion identical to a tree of order M)

    Most notions we have defined for trees carry over to B-trees, e.g. the height of a tree, the level of a node in a tree, etc.

    The root of a tree is at level 1, the children of the root are at level 2, etc.

    . – p.72/87

    B-trees: Example

    Below is a B-tree of order 4:

    Root: [23]

    Level 2: [12 19] [29 36]

    Level 3 (leaves): [6 8 9] [13 16] [20 21 22] [24 25 27] [30 32] [41 56 91]

    . – p.73/87

    B-trees: Variations (1)

    Unfortunately, there are a lot of slight variations of our definition of a B-tree around

    Some people define the order of a B-tree to be the maximal number of keys in a node

    In this case, a node in a B-tree of order M stores up to M keys and has up to M+1 children (which is one child more than according to our definition)

    Some people define the order of a B-tree to be the minimal number of keys in a non-root node

    In this case, a node in a B-tree of order M stores between M and 2M keys and has up to 2M+1 children (which is more than twice the number of children we have)

    . – p.74/87

  • B-trees: Variations (2)

    Note that in a B-tree of order 3 or in a B-tree of order 4 the minimal number of children of a node is only 2

    Thus, such a tree could degenerate to a binary search tree

    To prevent this, we can increase the minimal number of children of a node

    B∗-trees are very much like ordinary B-trees, except that each non-root node must be at least 2/3 full (instead of 1/2 full), that is,

    every node, except for the root and the leaves, has ≥ (2M−1)/3 children

    the root has at least 2 and at most 2⌈(2M−2)/3⌉+1 children
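    To see how the two lower bounds compare, here is a small sketch that evaluates them for a few orders. The function names are illustrative; since the number of children is a whole number, the bound (2M−1)/3 is rounded up.

```python
def min_children_btree(M: int) -> int:
    """Smallest whole number >= M/2 (minimum for inner nodes of a B-tree)."""
    return -(-M // 2)


def min_children_bstar(M: int) -> int:
    """Smallest whole number >= (2M-1)/3 (minimum for inner nodes of a B*-tree)."""
    return -(-(2 * M - 1) // 3)


for M in (3, 4, 6, 9, 100):
    print(M, min_children_btree(M), min_children_bstar(M))
```

    For order 4 the minimum rises from 2 to 3 children, while for order 3 both bounds still give 2.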

    . – p.75/87

    B-trees: Variations (3)

    Usually, some additional data is stored in the B-tree together with the keys, e.g. the address of the object identified by the key

    B+-trees are a variant of B-trees where all keys and their associated data are stored in the leaf nodes

    Some of the keys, but not their data, are duplicated in the upper levels of the B-tree

    In this case, the maximal number of keys stored in a leaf might be considerably smaller than the maximal number of keys in non-leaf nodes, due to the size of the associated data

    On the other hand, the order of a B+-tree could be much higher than that of a B-tree, since the non-leaf nodes do not contain any associated data at all (leaving space for more keys)

    . – p.76/87

    Searching in a B-tree

    If we want to find the node storing a given key o in a B-tree T, then we proceed as follows:

    1. If the B-tree T is empty, then we have failed to find o

    2. Otherwise, the root R of T contains keys K1, . . . , Kn and has subtrees T0, . . . , Tn

    We search linearly through the keys for o until one of the following is true:

    If we find o then we return R

    If we encounter the first key Ki+1 greater than o then we search for o in the subtree Ti (by recursively applying the same algorithm)

    If o is greater than Kn then we search for o in Tn (again, recursively)
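    The search procedure above might be sketched as follows. `Node` and `search` are hypothetical names, an empty tree is represented by `None`, and a leaf is a node without children; note that with 0-based lists the key Ki+1 sits at index i, which is also the index of the subtree Ti.

```python
class Node:
    """A B-tree node: sorted keys plus child subtrees (no children for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def search(node, o):
    """Return the node storing key o, or None if o does not occur in the tree."""
    if node is None:
        return None  # step 1: empty tree, the search fails
    for i, key in enumerate(node.keys):
        if key == o:
            return node  # found o in this node
        if key > o:
            # key is the first key greater than o: continue in the subtree
            # to its left, or fail if this node is a leaf
            return search(node.children[i], o) if node.children else None
    # o is greater than every key in this node: continue in the last subtree
    return search(node.children[-1], o) if node.children else None


# The B-tree of order 4 used as the running example
root = Node([23], [
    Node([12, 19], [Node([6, 8, 9]), Node([13, 16]), Node([20, 21, 22])]),
    Node([29, 36], [Node([24, 25, 27]), Node([30, 32]), Node([41, 56, 91])]),
])

found = search(root, 32)
print(found.keys if found else None)
```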

    . – p.77/87

    B-trees: Search example

    We search for the number 32 in the B-tree below:

    Root: [23]

    Level 2: [12 19] [29 36]

    Level 3 (leaves): [6 8 9] [13 16] [20 21 22] [24 25 27] [30 32] [41 56 91]

    The search visits [23], then [29 36], then [30 32], where 32 is found

    . – p.78/87

  • Inserting in a B-tree: Algorithm

    Similar to inserting in a binary search tree, to insert a key o into a B-tree T, we first have to locate the correct position for o

    For this we basically perform a search for o

    This search can have three outcomes:

    (1) The search fails because T is empty; create a new B-tree node storing just o

    (2) The search succeeds because T already stores o; leave T unchanged

    (3) The search fails on a non-empty T; we try to insert o into the leaf node visited last during our search
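    A minimal sketch of insertion, including the case where the target leaf is already full, is given below. It is one possible reading of the algorithm, not the course's code: `Node`, `insert` and `_insert` are hypothetical names, a node is assumed to hold at most M−1 keys, and an overflowing node is split at element ⌈M/2⌉ of its M keys, with that key moving up into the parent (or into a new root).

```python
from math import ceil


class Node:
    """A B-tree node: sorted keys plus child subtrees (no children for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def _insert(node, o, M):
    """Insert o below node. Return (median, right_sibling) if node was split."""
    i = 0
    while i < len(node.keys) and node.keys[i] < o:
        i += 1
    if i < len(node.keys) and node.keys[i] == o:
        return None  # outcome (2): o is already stored
    if not node.children:
        node.keys.insert(i, o)  # outcome (3): insert into the leaf visited last
    else:
        split = _insert(node.children[i], o, M)
        if split is None:
            return None
        median, right = split
        node.keys.insert(i, median)          # median key moves up into this node
        node.children.insert(i + 1, right)   # new sibling goes to its right
    if len(node.keys) <= M - 1:
        return None  # no overflow
    # Overflow: split at element ceil(M/2) of the M keys (0-based index mid)
    mid = ceil(M / 2) - 1
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right


def insert(root, o, M):
    """Insert key o into the B-tree of order M rooted at root; return the root."""
    if root is None:
        return Node([o])  # outcome (1): empty tree
    split = _insert(root, o, M)
    if split is None:
        return root
    median, right = split
    return Node([median], [root, right])  # the root was split: tree grows by one level


# Start from a B-tree of order 3 with root [23] and leaves [12], [29]
tree = Node([23], [Node([12]), Node([29])])
for key in (32, 10, 14, 16, 15):
    tree = insert(tree, key, 3)
print(tree.keys, [child.keys for child in tree.children])
```

    Splitting on the way back up from the leaf keeps the code short; the price is that the tree can end up taller than strictly necessary.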

    . – p.79/87

    Inserting in a B-tree (1)

    We want to insert the number 32 in the B-tree of order 3 below:

    Root: [23]

    Leaves: [12] [29]

    After inserting 32:

    Root: [23]

    Leaves: [12] [29 32]

    . – p.80/87

  • Inserting in a B-tree (2)

    We want to insert the number 10 in the B-tree of order 3 below:

    Root: [23]

    Leaves: [12] [29 32]

    After inserting 10:

    Root: [23]

    Leaves: [10 12] [29 32]

    . – p.81/87

    Inserting in a B-tree (3)

    We want to insert the number 14 in the B-tree of order 3 below:

    Root: [23]

    Leaves: [10 12] [29 32]

    The node into which we want to insert 14 is already full (overflow)

    The only option is to split the node at the median key (that's element ⌈M/2⌉) into a pair of siblings and move the median key 12 up into the parent node

    After the split, with 12 on its way up to the parent:

    Root: [23]

    Leaves: [10] [14] [29 32]

    After moving 12 into the parent:

    Root: [12 23]

    Leaves: [10] [14] [29 32]

    . – p.82/87

    Inserting in a B-tree (4)

    We want to insert the number 15 in the B-tree of order 3 below:

    Root: [12 23]

    Leaves: [10] [14 16] [29 32]

    The node into which we want to insert 15 is already full (overflow)

    The only option is to split the node at the median key (that's element ⌈M/2⌉) into a pair of siblings and move the median key 15 up into the parent node

    After the split, with 15 on its way up to the parent:

    Root: [12 23]

    Leaves: [10] [14] [16] [29 32]

    But the parent node into which we want to insert 15 is also full

    So, we have to split the parent node in the same way into a pair of siblings and move the median key 15 up into a new root node

    After splitting the parent:

    Root: [15]

    Level 2: [12] [23]

    Level 3 (leaves): [10] [14] [16] [29 32]

    . – p.83/87

  • Inserting in a B-tree: Remark

    The last example also shows that our algorithm for inserting in a B-tree does not always lead to B-trees of minimal height, that is, B-trees that would be optimal

    The B-tree below stores the same numbers as the B-tree we have just constructed, but only has height 2 (instead of 3)

    Root: [14 23]

    Leaves: [10 12] [15 16] [29 32]

    However, this B-tree would be more expensive to construct

    . – p.84/87

    Deleting in a B-tree: Algorithm

    To delete a key o from a B-tree T we again start by locating o in T

    This search can have three outcomes:

    (1) The search fails because o does not occur in T; terminate, possibly with an error message

    (2) The key o is found in a leaf node N; remove o from N; if N has underflowed, restock N

    (3) The key o is found in a non-leaf node N; replace o in N by the minimal key in the subtree of N to the right of o; if the leaf node L from which we take the minimal key underflows, restock L

    Restocking a node basically means moving keys over from a sibling node or merging nodes
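    Restocking a leaf can be sketched as below. This is an illustration under stated assumptions rather than a full deletion routine: `Node` and `restock_leaf` are hypothetical names, a non-root leaf is assumed to need at least ⌈M/2⌉−1 keys, the leaf is assumed to have at least one sibling, and a merge may leave the parent short of keys, which this sketch does not repair.

```python
from math import ceil


class Node:
    """A B-tree node: sorted keys plus child subtrees (no children for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])


def restock_leaf(parent, i, M):
    """Restock the underflowed leaf parent.children[i] from a sibling."""
    min_keys = ceil(M / 2) - 1  # assumed minimum number of keys in a non-root leaf
    leaf = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if right is not None and len(right.keys) > min_keys:
        # Move a key over from the right sibling: the separating key comes down
        # into the leaf, the sibling's smallest key goes up into the parent
        leaf.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
    elif left is not None and len(left.keys) > min_keys:
        # Move a key over from the left sibling (mirror image of the case above)
        leaf.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
    elif right is not None:
        # No sibling can spare a key: merge leaf, separating key and right sibling
        leaf.keys += [parent.keys.pop(i)] + right.keys
        parent.children.pop(i + 1)
    else:
        # Merge into the left sibling instead
        left.keys += [parent.keys.pop(i - 1)] + leaf.keys
        parent.children.pop(i)


# Deleting 12 from the order-3 tree with root [23] and leaves [12], [29 32]
tree = Node([23], [Node([12]), Node([29, 32])])
tree.children[0].keys.remove(12)   # the left leaf underflows
restock_leaf(tree, 0, 3)
print(tree.keys, [leaf.keys for leaf in tree.children])
```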

    . – p.85/87

    Deleting in a B-tree (1)

    We want to delete the number 12 in the B-tree of order 3 below:

    Root: [23]

    Leaves: [12] [29 32]

    If we simply removed 12 from the leaf, then the leaf would no longer store the required minimal number of keys

    Instead we move keys over from the sibling: 23 moves down from the root into the leaf and 29 moves up from the sibling into the root

    Root: [29]

    Leaves: [23] [32]

    . – p.86/87

    Deleting in a B-tree (2)

    We want to delete the number 23 in the B-tree of order 3 below:

    Root: [23]

    Leaves: [12 15] [29]

    If we simply removed 23 from the root, then the root would no longer store the required minimal number of keys

    Instead we replace it by the minimal key 29 from the right subtree

    Root: [29]

    Leaves: [12 15] [ ]

    But now we have an underflow in the right leaf

    So, we need to move over keys from the sibling of the right leaf

    Root: [15]

    Leaves: [12] [29]

    . – p.87/87
