University of Liverpool - Computer Science Intranet: michele/TEACHING/COMP102/2006/...AVL trees...

Data Structures and Information Systems, Part 1: Data Structures

Michele Zito

Lecture 11: Trees

. – p.1/87

Topics

Trees

Binary trees

Binary tree implementation

Binary tree traversal

. – p.2/87

Trees

1-dimensional arrays, linked lists, stacks and queues are linear data structures, that is, sequences of data items

We have also seen 2-dimensional arrays, which organise data items into a rigid 2-dimensional structure

However, in a number of applications, data items are not organised in this way:

directory structure of a file store

structure of arithmetic expressions

the part-whole relationship in physical objects

hierarchy of an organisation

Appropriate data structures for these applications are trees

. – p.3/87

Trees: Recursive definition

A tree T of order n is either empty or consists of a node called the root of T together with at most n trees, called the subtrees of T

We usually depict a non-empty tree T as a set of nodes connected by directed links, with the root of T at the top:

[Figure: a tree of order 3 and a tree of order 2]

. – p.4/87

Trees: Non-recursive definition (1)

Trees are a special kind of directed graph

A directed graph is a (finite) set of nodes connected by directed links called edges

[Figure: a directed graph on the nodes a, b, c, d, e, g]

A path in a directed graph is a list of nodes $n_0, \ldots, n_k$ such that for each successive pair $(n_i, n_{i+1})$ there is a directed link from $n_i$ to $n_{i+1}$. E.g. a, c, e in the graph above

. – p.5/87




A cycle in a directed graph is a path that starts and ends with the same node. E.g. a, c, e, a in the graph below

[Figure: the same directed graph on the nodes a, b, c, d, e, g]

A directed graph is cyclic if it contains at least one cycle, otherwise it is acyclic. The graph above is cyclic, since it contains the cycle a, c, e, a

. – p.6/87


A directed graph is said to fan-in at node n if there are two distinct nodes $n_1$ and $n_2$ with edges from $n_1$ to n and from $n_2$ to n. E.g. the graph below fans in at node d

[Figure: a directed graph on the nodes a, b, c, d, e, g that fans in at d]

A root of a directed graph is a node with no edges leading to it. E.g. the nodes a and g are both roots in the graph above

. – p.7/87


A tree is a directed graph G such that

the graph G is acyclic

the graph G has exactly one root

the graph G does not fan-in at any node

The order of the tree is the maximal number of edges leaving a node in G

[Figure, left: a graph on the nodes a, b, c, d, e, g that is not a tree; right: a tree of order 2 on the nodes a, b, c, d, e]

. – p.8/87
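The three conditions can also be checked mechanically. The sketch below is a hypothetical helper (the class TreeCheck and its methods are not part of the course code) that assumes a graph given as a map from each node to the set of its successors. It relies on the fact that, in a graph with exactly one root and no fan-in, the graph is acyclic exactly when every node can be reached from the root.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not course code): checks the non-recursive
// definition of a tree for a graph given as "node -> set of successors".
class TreeCheck {

    static boolean isTree(Map<String, Set<String>> graph) {
        // All nodes, including those that only appear as successors.
        Set<String> nodes = new HashSet<>(graph.keySet());
        for (Set<String> succ : graph.values()) nodes.addAll(succ);

        // Number of edges leading into each node.
        Map<String, Integer> inDegree = new HashMap<>();
        for (String n : nodes) inDegree.put(n, 0);
        for (Set<String> succ : graph.values())
            for (String s : succ) inDegree.put(s, inDegree.get(s) + 1);

        String root = null;
        for (String n : nodes) {
            int d = inDegree.get(n);
            if (d > 1) return false;             // the graph fans in at n
            if (d == 0) {
                if (root != null) return false;  // more than one root
                root = n;
            }
        }
        if (root == null) return false;          // no root at all

        // With one root and no fan-in, the graph is acyclic exactly
        // when every node is reachable from the root.
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String n = todo.pop();
            if (seen.add(n))
                for (String s : graph.getOrDefault(n, Collections.emptySet()))
                    todo.push(s);
        }
        return seen.size() == nodes.size();
    }

    // The order: maximal number of edges leaving a node.
    static int order(Map<String, Set<String>> graph) {
        int max = 0;
        for (Set<String> succ : graph.values()) max = Math.max(max, succ.size());
        return max;
    }
}
```

For example, the graph with edges a→b, a→c, b→d, c→e is a tree of order 2, while adding an edge g→d makes it fan in at d and gives it a second root.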

Binary trees

A tree of order 2 is called a binary tree

In the following, we concentrate on binary trees, so from now on "tree" means a binary tree

In a binary tree each node is connected to at most two other nodes, which are called the left and the right child (one or both can be non-existent)

A node with no children is called a leaf

[Figure: a binary tree with its root, the root's left and right child, and two leaves]

. – p.9/87

More precise definition

A binary tree T is either empty or consists of a node called the root of T together with at most two trees, called the left and right subtree of T, together with the following operations:

Create an empty tree

Test whether a tree is empty (isEmpty)

Insert a new node

Delete a node

Retrieve the data item stored in the root of a non-empty tree

Retrieve the left or right subtree of a non-empty tree

List (traverse) the nodes of the tree

. – p.10/87

Binary tree implementation: Nodes

Each node N in our implementation of binary trees has a data part and two references, left and right, referring to the roots of the left and right subtrees of N

In a leaf, both the left and the right reference are null

[Figure: a binary tree drawn as linked (left, data, right) cells over three levels; every missing child is a null reference]

. – p.11/87

Binary trees: Node implementation (1)

public class TreeNode {
    protected Object data;     // Data stored in node
    protected TreeNode left;   // Link to left subtree
    protected TreeNode right;  // Link to right subtree

    // Constructors
    public TreeNode(Object o, TreeNode l, TreeNode r) {
        data = o;    // Set data
        left = l;    // Set reference to left subtree
        right = r;   // Set reference to right subtree
    }

    public TreeNode(int number, TreeNode l, TreeNode r) {
        data = Integer.valueOf(number);   // box the int (new Integer(...) is deprecated)
        left = l;
        right = r;
    }

    // Accessors and Mutators
}

. – p.12/87

Binary trees: Node implementation (2)

// Accessors
public Object getData() { return (data); }
public int getIntData() {
    return (((Integer) data).intValue());
}

public TreeNode getLeftSubtree() { return (left); }
public TreeNode getRightSubtree() { return (right); }

// Mutators
public void setData(Object o) { data = o; }
public void setIntData(int number) {
    data = Integer.valueOf(number);   // box the int (new Integer(...) is deprecated)
}

public void setLeftSubtree(TreeNode N) { left = N; }
public void setRightSubtree(TreeNode N) { right = N; }

. – p.13/87

Binary trees: Implementation (1)

In order to deal with illegal operations on empty trees we use java.lang.NullPointerException

import java.lang.NullPointerException;

public class BinaryTree {

    protected TreeNode root;

    // Constructors
    public BinaryTree() { root = null; }

    public BinaryTree(TreeNode N) { root = N; }

    public BinaryTree(Object o) {
        root = new TreeNode(o, null, null);
    }

    public boolean isEmpty() { return (root == null); }

    // Other methods
}

. – p.14/87

Binary tree traversal (1)

A common task we have considered for all our data structures is to access all the data items stored in a data structure one at a time (traversal)

In a linear structure like an array or a linked list there are two natural traversals:

from front to end (first to last data item)

from end to front (last to first data item)

Also there are natural solutions for how this can be done:

in an array we use a for-loop

in a linked list we use a while-loop

. – p.15/87

Binary tree traversal (2)

Since binary trees are non-linear data structures, there are more possibilities. We consider three ways of traversing a binary tree:

Preorder: first access the data of the root node, then traverse the left subtree, finally traverse the right subtree

Inorder: first traverse the left subtree, then access the data of the root node, finally traverse the right subtree

Postorder: first traverse the left subtree, then traverse the right subtree, finally access the data of the root node

Notice that the prefix of the word "order" indicates when the data of the root node is accessed. The standard way of implementing these traversals is recursion

. – p.16/87

Binary tree traversal: Examples

Given the following binary tree, we have:

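[Figure: a binary tree with root A; A has children B and C; B has children D and E; E has children H and K; C has only a right child, G]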

Preorder (Data,L,R)

A B D E H K C G

Inorder (L,Data,R)

D B H E K A C G

Postorder (L,R,Data)

D H K E B G C A

. – p.17/87

Preorder traversal: Code

We can use preorder traversal to construct a string representation of a binary tree. To this end we extend the BinaryTree class by:

public String preorderString() {
    if (!isEmpty())
        return (root.preorderString());
    else
        return ("");
}

and the TreeNode class by:

public String preorderString() {
    String stringRep = "";
    stringRep += data + " ";
    if (left != null) stringRep += left.preorderString();
    if (right != null) stringRep += right.preorderString();
    return (stringRep);
}

. – p.18/87

Inorder traversal: Code

We can use inorder traversal to construct a string representation of a binary tree. To this end we extend the BinaryTree class by:

public String inorderString() {
    if (!isEmpty())
        return (root.inorderString());
    else
        return ("");
}

and the TreeNode class by:

public String inorderString() {
    String stringRep = "";
    if (left != null) stringRep += left.inorderString();
    stringRep += data + " ";
    if (right != null) stringRep += right.inorderString();
    return (stringRep);
}

. – p.19/87

Postorder traversal: Code

We can use postorder traversal to construct a string representation of a binary tree. To this end we extend the BinaryTree class by:

public String postorderString() {
    if (!isEmpty())
        return (root.postorderString());
    else
        return ("");
}

and the TreeNode class by:

public String postorderString() {
    String stringRep = "";
    if (left != null) stringRep += left.postorderString();
    if (right != null) stringRep += right.postorderString();
    stringRep += data + " ";
    return (stringRep);
}

. – p.20/87

Constructing a tree: Example

Below is an example program that constructs the tree on slide 17 and uses the methods preorderString, inorderString and postorderString to print out the tree:

public class TestBinaryTree {
    public static void main(String args[]) {
        TreeNode nH = new TreeNode("H", null, null);
        TreeNode nK = new TreeNode("K", null, null);
        TreeNode nE = new TreeNode("E", nH, nK);
        TreeNode nD = new TreeNode("D", null, null);
        TreeNode nB = new TreeNode("B", nD, nE);
        TreeNode nG = new TreeNode("G", null, null);
        TreeNode nC = new TreeNode("C", null, nG);
        TreeNode nA = new TreeNode("A", nB, nC);
        BinaryTree myTree = new BinaryTree(nA);
        System.out.println("Preorder: " + myTree.preorderString());
        System.out.println("Inorder: " + myTree.inorderString());
        System.out.println("Postorder: " + myTree.postorderString());
    }
}

. – p.21/87

Binary tree traversal: Example

The output of our example program is as follows:

Preorder: A B D E H K C G
Inorder: D B H E K A C G
Postorder: D H K E B G C A

Compare the output of the program to the preorder, inorder, and postorder traversals we have computed on slide 17

. – p.22/87


Michele Zito

Lecture 12: Efficient search: AVL trees

. – p.23/87

Topics

Searching in a binary tree

Binary search trees

AVL trees

Inserting in an AVL tree

Example: Train departure information

. – p.24/87

Binary trees

A binary tree T is either empty or consists of a node called the root of T together with at most two trees, called the left and right subtrees of T. In a binary tree each node is connected to at most two other nodes, which are called the left and the right child (one or both can be non-existent). A node with no children is called a leaf

[Figure: a binary tree over levels 1 to 3, showing the root, its left and right child, and two leaves]

The root of a tree is at level 1, the children of the root are at level 2, etc.

. – p.25/87

Searching in a binary tree

Consider the operation of searching for a given object o in a binary tree

The only way to do that is to traverse the binary tree using either preorder, inorder, or postorder traversal, comparing the data item stored in each node with the given object o until the object is found or all nodes have been visited. If all nodes have been visited but o has not been found, then it is not stored in the tree

In the worst case we may visit all nodes of a tree (without finding o!)

How can we improve on this?

. – p.26/87
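The exhaustive search described above can be sketched as a preorder traversal that stops as soon as o is found. The sketch uses a minimal stand-alone node class rather than the TreeNode class from Lecture 11 (LinearTreeSearch, Node, contains and exampleTree are hypothetical names), compares data items with equals, and counts the comparisons so that the worst case is visible.

```java
// Hypothetical sketch (not course code): exhaustive search in a plain
// binary tree, visiting nodes in preorder until o is found.
class LinearTreeSearch {

    // Minimal stand-alone node, mirroring the idea of TreeNode.
    static class Node {
        final Object data;
        final Node left, right;
        Node(Object data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    private int comparisons = 0;   // data items compared so far

    // Preorder: compare with the root, then search left, then right.
    boolean contains(Node n, Object o) {
        if (n == null) return false;          // empty subtree
        comparisons++;
        if (n.data.equals(o)) return true;    // found: stop here
        return contains(n.left, o) || contains(n.right, o);
    }

    int getComparisons() { return comparisons; }

    // The example tree used for the traversals in Lecture 11.
    static Node exampleTree() {
        Node e = new Node("E", new Node("H", null, null), new Node("K", null, null));
        Node b = new Node("B", new Node("D", null, null), e);
        Node c = new Node("C", null, new Node("G", null, null));
        return new Node("A", b, c);
    }
}
```

Searching the example tree for G compares all 8 data items (G comes last in preorder), and searching for an item that is not there, such as Z, also compares all 8.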

Height of a binary tree

The height of an empty binary tree is 0

The height of a non-empty binary tree T with left subtree $T_l$ and right subtree $T_r$ is

$\mathrm{height}(T) = 1 + \max(\mathrm{height}(T_l), \mathrm{height}(T_r))$

Examples:

[Figure: two example binary trees, one of height 3 and one of height 2]

Alternatively, the height of a non-empty tree T is the maximal number of nodes on a path from the root to a leaf of T

. – p.27/87
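The recursive definition of height translates directly into a recursive method. This is a minimal sketch; TreeHeight and its Node class are hypothetical stand-ins for the course's TreeNode.

```java
// A minimal sketch of the recursive height definition; the Node class
// is a hypothetical stand-in for TreeNode.
class TreeHeight {

    static class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // height(empty) = 0, height(T) = 1 + max(height(Tl), height(Tr))
    static int height(Node t) {
        if (t == null) return 0;
        return 1 + Math.max(height(t.left), height(t.right));
    }
}
```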

Binary search trees

Assume that we are given a collection of data items together with an ordering

A binary search tree is a binary tree that is either empty or in which each node contains a data item that satisfies the conditions:

All data items (if any) in the left subtree of the root are smaller than the data item in the root

The data item in the root is smaller than all data items (if any) in the right subtree

The left and right subtrees of the root are again binary search trees

. – p.28/87

Binary search trees: Examples

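[Figure, top row: a tree with root 5 and the keys 4, 6, 3, 8 below it (a binary search tree), and a tree with root 5 and the keys 4, 8, 2, 7 below it (not a binary search tree)]

[Figure, bottom row: a tree on the keys 1 to 5 (a binary search tree), and a tree on the keys 1 to 6 (not a binary search tree)]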

. – p.29/87

Searching in a binary search tree

If we want to find the node storing a given data item o in a binary search tree T, then we proceed as follows:

1. If the tree T is empty, then we have failed to find o

2. Otherwise, compare o with the data item o’ stored in the root R of T:

if o and o’ are identical, then return R

if o is smaller than o’, then search for o in the left subtree of R (by recursively applying the same algorithm)

if o is greater than o’, then search for o in the right subtree of R (again, recursively)

. – p.30/87
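The two steps above can be sketched as a recursive method. This sketch assumes a generic node class whose data implements Comparable (BSTSearch, Node and search are illustrative names, not the course's BSTree API); it returns the node storing o, or null when step 1 fails.

```java
// Hypothetical sketch (not the course's BSTree): recursive search in a
// binary search tree whose data items are Comparable.
class BSTSearch {

    static class Node<T extends Comparable<T>> {
        T data;
        Node<T> left, right;
        Node(T data, Node<T> left, Node<T> right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Returns the node storing o, or null if o is not in the tree.
    static <T extends Comparable<T>> Node<T> search(Node<T> t, T o) {
        if (t == null) return null;              // step 1: empty tree, failure
        int c = o.compareTo(t.data);             // step 2: compare o with o'
        if (c == 0) return t;                    // identical: return R
        if (c < 0) return search(t.left, o);     // smaller: left subtree
        return search(t.right, o);               // greater: right subtree
    }
}
```

On a tree with root 4, children 2 and 6, and leaves 1, 3, 5, 7, searching for 7 visits only 4, 6 and 7.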

Efficiency (1)

Question: Is the new search method on binary search trees better than the old search method on binary (search) trees? Consider the following two binary search trees:

[Figure: two binary search trees storing the numbers 1 to 7: on the left a balanced tree of height 3, on the right a chain of height 7 in which each node has only one child]

In the left tree, the new search needs three accesses to data items to find the number 7 (the old search needs seven). In the right tree, the new search needs seven accesses to data items to find the number 7 (as does the old search)

. – p.31/87

Efficiency (2)

Answer: In a binary search tree of height h, the new search method needs at most h accesses; it only performs better than the old search method if h is (much) smaller than the number of nodes n in the tree

Question: What distinguishes binary search trees on which our new search method is fast from those on which it is not?

Answer: The binary search tree should be balanced!

A binary tree of height h is balanced if all nodes at levels $1, \ldots, h-2$ have two children, while nodes at level $h-1$ can have zero, one, or two children (be careful: there are other definitions around!)

. – p.32/87

Balanced binary search trees

One more question has to be answered:

How do we construct balanced binary search trees?

Note that with our current insert method for binary search trees the shape/height of a binary search tree depends on the order in which items are inserted

[Figure: inserting 1, 2, 3 gives a chain 1, 2, 3 of height 3; inserting 2, 1, 3 gives root 2 with children 1 and 3, of height 2]

It turns out that constructing balanced trees is difficult. Instead we focus on height-balanced trees

. – p.33/87
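The dependence on insertion order is easy to observe. The insert method for binary search trees is not shown in these slides, so the sketch below assumes the standard unbalanced insertion (new keys become leaves, duplicates are ignored); BSTInsertOrder, build and height are hypothetical names.

```java
// Hypothetical sketch: standard (unbalanced) BST insertion, used to show
// that the height of the result depends on the insertion order.
class BSTInsertOrder {

    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Insert key as a new leaf; duplicates are ignored.
    static Node insert(Node t, int key) {
        if (t == null) return new Node(key);
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        return t;
    }

    static int height(Node t) {
        return t == null ? 0 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Builds a tree by inserting the keys in the given order.
    static Node build(int... keys) {
        Node t = null;
        for (int k : keys) t = insert(t, k);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(height(build(1, 2, 3)));               // 3
        System.out.println(height(build(2, 1, 3)));               // 2
        System.out.println(height(build(1, 2, 3, 4, 5, 6, 7)));   // 7
        System.out.println(height(build(4, 2, 6, 1, 3, 5, 7)));   // 3
    }
}
```

The same seven numbers give a chain of height 7 or a balanced tree of height 3, which are exactly the two search trees compared in the efficiency example.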

AVL trees

A binary tree is called height-balanced if its subtrees differ in height by no more than one and the subtrees are again height-balanced

An AVL tree is a binary search tree in which the heights of the left and right subtrees of the root differ by at most 1 and in which the left and right subtrees are again AVL trees

Obviously, AVL trees are height-balanced

AVL trees are named after their inventors:

G. M. Adelson-Velskii and E. M. Landis

. – p.34/87

AVL trees: Examples

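[Figure, top row: a tree with root 5 and the keys 4, 6, 3, 8 below it (an AVL tree), and a tree on the keys 1 to 6 (not an AVL tree)]

[Figure, bottom row: a tree with root 7, children 4 and 8, and 2 and 5 as the children of 4 (an AVL tree), and a tree with root 5 and the keys 4, 8, 2, 7 below it (not an AVL tree)]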

. – p.35/87
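The AVL condition combines the search-tree ordering with the height condition, and both can be checked in one recursive pass. In this sketch (AVLCheck, check and isAVL are hypothetical names; keys are ints) each call returns the height of a valid AVL subtree whose keys lie strictly between two bounds, or -1 if the subtree violates either condition.

```java
// Hypothetical sketch: checks that a tree is a binary search tree in
// which the subtree heights differ by at most 1 at every node.
class AVLCheck {

    static class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // Height of t if t is an AVL tree with all keys strictly between
    // lo and hi; -1 otherwise.
    private static int check(Node t, long lo, long hi) {
        if (t == null) return 0;
        if (t.key <= lo || t.key >= hi) return -1;      // search order violated
        int hl = check(t.left, lo, t.key);
        int hr = check(t.right, t.key, hi);
        if (hl < 0 || hr < 0 || Math.abs(hl - hr) > 1) return -1;
        return 1 + Math.max(hl, hr);
    }

    static boolean isAVL(Node t) {
        return check(t, Long.MIN_VALUE, Long.MAX_VALUE) >= 0;
    }
}
```

Using bounds rather than comparing each node only with its children matters: a key 7 placed as a child of 4 in the left subtree of 5 is fine locally but breaks the ordering with respect to the root.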

Inserting in an AVL tree: Single rotation

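Our insert method guarantees neither that we get a balanced tree nor that we get an AVL tree

The idea underlying an insert method for AVL trees that always produces an AVL tree is that we have to rotate nodes. E.g. insert the number 6 into the AVL tree below

[Figure: an AVL tree with root 7, left child 4 and right child 8; 5 is the right child of 4]

[Figure: after inserting 6 as the right child of 5, the tree is no longer an AVL tree: at node 4 the right subtree is two levels higher than the (empty) left subtree]

[Figure: after a single rotation at 4: root 7 with right child 8 and left child 5, whose children are 4 and 6]

. – p.36/87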

Single rotation to the right, general rule

1. Assume the root of the tree is called N, that the sub-tree rooted at N's left child (call this node C) is one unit higher than N's right sub-tree T3§, and that T1 and T2 are the left and right sub-trees of C

2. Insert an element so that the height of the left sub-tree of C (T1) grows: this results in an unbalanced tree.

3. To rebalance the tree, make C the root and make N the right child of C, and ...

4. then (to make sure the new tree is a binary search tree) make T2 the left sub-tree of N.

The resulting tree is an AVL tree again: the two sub-trees of the new root C now have the same height (pictures on the following slides explain the situation).

§ If the two sub-trees have the same height, no imbalance can occur!

. – p.37/87
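Steps 3 and 4 amount to three reference updates. A minimal sketch, assuming a bare node class with integer keys and no stored heights (RightRotation, rotateRight and inorder are illustrative names): the method returns the new root C, so the caller must store it where N used to be.

```java
// Hypothetical sketch: single rotation to the right at node N.
class RightRotation {

    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // C = N.left becomes the root, N becomes C's right child,
    // and C's old right sub-tree T2 becomes N's left sub-tree.
    static Node rotateRight(Node n) {
        Node c = n.left;
        n.left = c.right;   // T2 moves under N
        c.right = n;        // N becomes C's right child
        return c;           // C is the new root
    }

    // Inorder listing, to check that the search-tree order is preserved.
    static String inorder(Node t) {
        if (t == null) return "";
        return inorder(t.left) + t.key + " " + inorder(t.right);
    }
}
```

For example, in the tree with root 8, left child 4 (children 2 and 6) and right child 10, inserting 1 below 2 unbalances node 8; rotating right at 8 makes 4 the root, with 8 as its right child and 6 moved under 8, and the inorder listing stays 1 2 4 6 8 10.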

Single rotation, general rule

[Figure, before the rotation: N has left child C and right sub-tree T3; C has left sub-tree T1 and right sub-tree T2]

. – p.38/87

[Figure, after the rotation: C is the root, with left sub-tree T1 and right child N; N has left sub-tree T2 and right sub-tree T3]

Inserting in an AVL tree: Double rotation

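If after a single rotation the tree is still not an AVL tree, then we perform a double rotation instead: first a rotation in the opposite direction at the child of the unbalanced node, then the original rotation at the unbalanced node itself

E.g. insert the number 6 into the AVL tree below

[Figure: an AVL tree with root 7, left child 4 (with children 3 and 5) and right child 8]

[Figure: after inserting 6 as the right child of 5, the root 7 is unbalanced: its left sub-tree has height 3 and its right sub-tree height 1]

[Figure: a single rotation to the right at 7 gives root 4 with left child 3 and right child 7; 7 has children 5 and 8, and 6 is the right child of 5. This is still not an AVL tree: at 4 the sub-trees have heights 1 and 3]

[Figure: the double rotation instead gives root 5 with left child 4 (whose left child is 3) and right child 7 (with children 6 and 8)]

Single and double rotations are the basic building blocks of inserting into an AVL tree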

. – p.44/87

Double rotation: left-right, general rule

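Assume, as in the single rotation, that N is the root and C its left child, and let G be C's right child. T1 is the left sub-tree of C, T2 and T3 are the left and right sub-trees of G, and T4 is the right sub-tree of N.

1. Insert an element so that the height of the right sub-tree of C (rooted at G) grows, due to the expansion of the left sub-tree of G (T2)!

2. To rebalance the tree, we first perform a rotation to the left in the tree rooted at node C, and then ...

3. since the intermediate tree is still not balanced, we perform a right rotation of the nodes C, G and N (see pictures).

The resulting tree is an AVL tree.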

. – p.45/87
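The left-right double rotation described above is a left rotation at C followed by a right rotation at N. A minimal sketch, assuming a bare node class with integer keys (LeftRightRotation and its method names are illustrative): after both rotations G is the root, C keeps T1 and receives T2, and N receives T3 and keeps T4.

```java
// Hypothetical sketch: double (left-right) rotation at node N.
class LeftRightRotation {

    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // Rotation to the left at C: G = C.right moves up.
    static Node rotateLeft(Node c) {
        Node g = c.right;
        c.right = g.left;   // T2 moves under C
        g.left = c;
        return g;
    }

    // Rotation to the right at N: N's left child moves up.
    static Node rotateRight(Node n) {
        Node c = n.left;
        n.left = c.right;   // T3 moves under N
        c.right = n;
        return c;
    }

    // First rotate left at C = N.left, then rotate right at N.
    static Node rotateLeftRight(Node n) {
        n.left = rotateLeft(n.left);
        return rotateRight(n);
    }
}
```

Applied to the earlier example (root 7, left child 4 with children 3 and 5, 6 below 5, right child 8), rotateLeftRight at 7 yields root 5 with children 4 and 7, matching the double-rotation result shown there.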


[Figure, before: N has left child C and right sub-tree T4; C has left sub-tree T1 and right child G; G has sub-trees T2 and T3]

[Figure, after the rotation to the left at C: N has left child G and right sub-tree T4; G has left child C and right sub-tree T3; C has sub-trees T1 and T2]

[Figure, after the rotation to the right at N: G is the root, with left child C (sub-trees T1 and T2) and right child N (sub-trees T3 and T4)]

. – p.56/87

Missing cases ...

The cases of a single rotation to the left and of a double right-left rotation have NOT been described.

The first one is quite similar to the case of a single rotation to the right (but the starting tree is the mirror image of the one on slide 38, with respect to its root N).

The second one is again quite similar to the case of a double left-right rotation (again with the starting tree mirrored).

It is a very useful exercise to draw, on a piece of paper with a pencil (and an eraser), the starting situation in each case and then simulate the processing that takes place.

. – p.57/87
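Putting the pieces together, here is a sketch of AVL insertion covering all four cases (single right, single left, left-right and right-left). It is an illustration under assumptions, not the course implementation: keys are ints, each node stores the height of its subtree so balances can be computed in constant time, and the case is chosen by comparing the new key with the key of the child on the higher side. AVLTree and its methods are hypothetical names.

```java
// Hypothetical sketch (not the course implementation): AVL insertion with
// integer keys, where each node stores the height of its subtree.
class AVLTree {

    static class Node {
        int key;
        int height = 1;      // height of the subtree rooted at this node
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int height(Node t) { return t == null ? 0 : t.height; }

    private static void update(Node t) {
        t.height = 1 + Math.max(height(t.left), height(t.right));
    }

    // Single rotation to the right: the left child C becomes the root.
    static Node rotateRight(Node n) {
        Node c = n.left;
        n.left = c.right;
        c.right = n;
        update(n);
        update(c);
        return c;
    }

    // Single rotation to the left: the mirror image of rotateRight.
    static Node rotateLeft(Node n) {
        Node c = n.right;
        n.right = c.left;
        c.left = n;
        update(n);
        update(c);
        return c;
    }

    // Ordinary BST insertion, then rebalancing on the way back up.
    static Node insert(Node t, int key) {
        if (t == null) return new Node(key);
        if (key < t.key) t.left = insert(t.left, key);
        else if (key > t.key) t.right = insert(t.right, key);
        else return t;                            // key already present
        update(t);
        int balance = height(t.left) - height(t.right);
        if (balance > 1) {                        // left sub-tree two levels higher
            if (key > t.left.key)                 // left-right case: double rotation
                t.left = rotateLeft(t.left);
            return rotateRight(t);                // (left-left case: single rotation)
        }
        if (balance < -1) {                       // right sub-tree two levels higher
            if (key < t.right.key)                // right-left case: double rotation
                t.right = rotateRight(t.right);
            return rotateLeft(t);                 // (right-right case: single rotation)
        }
        return t;
    }

    static String inorder(Node t) {
        return t == null ? "" : inorder(t.left) + t.key + " " + inorder(t.right);
    }
}
```

Inserting 1, 2, ..., 7 in increasing order, which produces a chain of height 7 in an ordinary binary search tree, yields here the tree with root 4, children 2 and 6, and leaves 1, 3, 5, 7 (height 3). Inserting 7, 4, 8, 5, 6 and 7, 4, 8, 3, 5, 6 reproduces the single and double rotation examples above.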

Using binary search trees

Our examples for binary search trees so far have only used numbers and characters as data items

However, we can use instances of any class that implements the Comparable interface

That is, the class has to implement a method

public int compareTo(Object other)

which compares two objects, returning a negative number, zero or a positive number when the object is smaller than, equal to or greater than other

In comparing two objects, compareTo does not have to take into account all information about the two objects

E.g. if our data items are student records then it is enough to compare student ids (since they uniquely identify a student). The information that is used is called the key of a data item

. – p.58/87
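A sketch of such a student record, using the generic interface Comparable<StudentRecord> (available since Java 5) instead of the raw compareTo(Object other) form shown above. StudentRecord and its fields are hypothetical. Integer.compare is used rather than subtracting the ids, since subtraction can overflow for large values of opposite sign.

```java
// Hypothetical student record: only the id is used as the key.
class StudentRecord implements Comparable<StudentRecord> {
    private final int id;        // key: uniquely identifies a student
    private final String name;   // extra information, ignored by compareTo

    StudentRecord(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public int getId() { return id; }
    public String getName() { return name; }

    // Negative, zero or positive as this id is smaller than, equal to
    // or greater than the other id.
    @Override
    public int compareTo(StudentRecord other) {
        return Integer.compare(this.id, other.id);
    }
}
```

Because compareTo looks only at the id, two records with the same id are treated as the same key even if their names differ.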

Example: Train departure information

We want to develop yet another implementation of our information system for train departure information at Liverpool Lime Street Station

The information is given in the form of a table

Time   Destination   Track
9:10   Manchester    6
9:15   Leeds         10
9:20   Nottingham    7

The only kind of query we deal with is ‘When and where does the next train to X leave?’

The implementation should use the binary search tree data structure

. – p.59/87

Train departure information: Idea

We have to come up with a representation of train departure information in a form that allows us to use a binary search tree

Since in our queries we are looking for a city, this should be the key

The information on all trains going to a particular city is then stored in a singly linked list of TDInfo objects

[Figure: a tree node holding a city and a list infos; the list's head points to a chain of (data, next) cells, each data item holding a time, city and track, and the last next reference is null]

. – p.60/87

Train departure information: The code (1)

The data structure that was just described is implemented in class TDData (note how compareTo is implemented)

public class TDData implements Comparable {
    protected String city;
    protected LinkedList infos;

    public TDData(String Tcity) {
        city = Tcity;
        infos = new LinkedList();
    }

    public String getCity() { return (city); }
    public LinkedList getInfos() { return (infos); }

    public int compareTo(Object other) {
        TDData tdd = (TDData) other;
        return (this.getCity().compareTo(tdd.getCity()));
    }

    public void addLast(Object data) { infos.addLast(data); }
}

. – p.61/87


Next we construct the binary search tree storing the train departure information. The root of the tree stores information on trains to Manchester, the left child on trains to Leeds, and the right child on trains to Nottingham

class TDIBST {
    static final BSTree tdi = new BSTree();
    static {
        TDData toManchester = new TDData("Manchester");
        toManchester.addLast(new TDInfo("9:10", "", 6));
        TDData toLeeds = new TDData("Leeds");
        toLeeds.addLast(new TDInfo("9:15", "", 10));
        TDData toNottingham = new TDData("Nottingham");
        toNottingham.addLast(new TDInfo("9:20", "", 7));
        tdi.insert(toManchester);
        tdi.insert(toLeeds);
        tdi.insert(toNottingham);
    }

. – p.62/87


Implementing the searchDest method, which returns the departure time and track of the next train leaving for a given destination, is straightforward using the search method for binary search trees. However, retrieving the required information from a node returned by the search method is a bit tedious

public static String[] searchDest(String dest) {
    String[] result = null;
    TDData tdDest = new TDData(dest);
    BSTNode node = tdi.search(tdDest);
    if (node == null)
        return result;
    TDData data = (TDData) node.getData();
    TDInfo train = (TDInfo) data.getInfos().getHead().getData();
    result = new String[2];
    result[0] = train.getTime();
    result[1] = String.valueOf(train.getTrack());
    return result;
}

. – p.63/87


Finally, here is our main program that asks the user where they want to go, uses searchDest to find the required information, and prints it out. Note that the main program is identical to the one in TDIDLL except that we use TDIBST instead of TDIDLL

// requires import java.io.*; (BufferedReader, InputStreamReader, IOException) at the top of the file
public static void main(String args[]) throws IOException {
    InputStreamReader input = new InputStreamReader(System.in);
    BufferedReader keyboardInput = new BufferedReader(input);
    System.out.println("Where do you want to go?");
    String[] info = TDIBST.searchDest(keyboardInput.readLine());
    if (info == null)
        System.out.println("Sorry, no train to this destination");
    else
        System.out.println(info[0] + " departing at track " + info[1]);
}
}

. – p.64/87

Train departure information: Script

Here are two runs of the program:

Information successfully retrieved

Where do you want to go?
Nottingham
9:20 departing at track 7

No information found

Where do you want to go?
London
Sorry, no train to this destination

. – p.65/87
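For comparison, the same lookup can be sketched with the standard library's java.util.TreeMap, which is a red-black tree (another kind of balanced binary search tree) rather than the course's BSTree. TrainDepartures and TrainInfo are hypothetical stand-ins for TDIBST and TDInfo, and the sketch assumes that the departures for each city are added in chronological order, so that the first list entry is the next train.

```java
import java.util.LinkedList;
import java.util.TreeMap;

// Hypothetical sketch using java.util.TreeMap instead of the course's BSTree.
class TrainDepartures {

    // Stand-in for TDInfo: departure time and track.
    static class TrainInfo {
        final String time;
        final int track;
        TrainInfo(String time, int track) { this.time = time; this.track = track; }
    }

    // Key: destination city; value: departures in the order they were added.
    private final TreeMap<String, LinkedList<TrainInfo>> byCity = new TreeMap<>();

    void add(String city, String time, int track) {
        byCity.computeIfAbsent(city, c -> new LinkedList<>()).addLast(new TrainInfo(time, track));
    }

    // Returns {time, track} of the first train listed for dest, or null.
    String[] searchDest(String dest) {
        LinkedList<TrainInfo> infos = byCity.get(dest);
        if (infos == null || infos.isEmpty()) return null;
        TrainInfo train = infos.getFirst();
        return new String[] { train.time, String.valueOf(train.track) };
    }
}
```

With the three departures from the table added, searchDest("Nottingham") returns {"9:20", "7"} and searchDest("London") returns null, mirroring the two runs of the program.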

Announcements

FOURTH ASSIGNMENT DEADLINE:

Wednesday, February 22nd, 14.30, LAB 7 Boxes

(late submissions to be left in the wooden box next to my office door).

SIGN the PLAGIARISM DECLARATION!!

EXTRA TUTORIALS:

Additional lecture (with question & answer sessions) on MONDAY

FEBRUARY 27th (Lecture theatre A, 12noon).

BE THERE!

. – p.66/87


Michele Zito

Lecture 13: B-trees

. – p.67/87

Topics

Searching for data: Storage considerations

B-trees

B-trees: Variations

Searching in a B-tree

Inserting in a B-tree

Deleting in a B-tree

. – p.68/87

Searching for data: Storage considerations (1)

When considering how efficiently we can perform searches on a given data structure, we simply counted the number of accesses. This assumes that all accesses are equally expensive

If we consider very large collections of data items then we have to take into account whether data items are stored in main memory or on mass storage

Main memory: access time 10 ns ($10^{-8}$ seconds), data transfer rate 1000 MByte/sec, so 10 Byte in 10 ns

Hard disk: access time 10 ms ($10^{-2}$ seconds), data transfer rate 50 MByte/sec, so 0.5 MByte in 10 ms

. – p.69/87
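The transfer figures above are transfer rate times access time, taking 1 MByte = $10^6$ Byte (an assumption; the slide does not say which megabyte is meant). The last line, which is not on the slide, compares the two access times: one disk access takes as long as about a million main-memory accesses.

```latex
\begin{aligned}
\text{main memory:}\quad & 1000\,\text{MByte/s} \times 10^{-8}\,\text{s} = 10^{9}\,\text{Byte/s} \times 10^{-8}\,\text{s} = 10\,\text{Byte} \\
\text{hard disk:}\quad & 50\,\text{MByte/s} \times 10^{-2}\,\text{s} = 0.5\,\text{MByte} \\
\text{access time ratio:}\quad & \frac{10^{-2}\,\text{s}}{10^{-8}\,\text{s}} = 10^{6}
\end{aligned}
```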

Searching for data: Storage considerations (2)

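How data items are stored in main memory or on mass storage is also important. Consider the following two structures:

[Figure: left, three binary tree nodes, each a (left, data, right) cell, storing o1, o2 and o3; right, a single node (link0, data1, link1, data2, link2, data3, link3) storing o1, o2 and o3 together]

If both are held on mass storage then the second one is much more time efficient for searches, since a single disk access brings all three data items into main memory, whereas the first structure may need one disk access per node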

. – p.70/87

B-trees

A B-tree of order M is a tree of order M with the following properties:

Every node has at most M subtrees $T_0, T_1, \ldots$

The root is either a leaf or has at least 2 children

Every node, except for the root and the leaves, has at least $\lceil M/2 \rceil$ children

All leaves have the same distance to the root

A non-leaf node with $n+1$ children contains $n$ keys $K_1 < K_2 < \cdots < K_n$ and the following conditions hold:

All keys in $T_0$ are smaller than $K_1$

All keys in $T_n$ are greater than $K_n$

All keys in $T_i$, $0 < i < n$, are greater than $K_i$ and smaller than $K_{i+1}$

. – p.71/87

B-trees: Remarks

B-trees were first proposed by Bayer & McCreight in 1972 as a way of maintaining efficient indices for very large databases

Sometimes B-trees are also referred to as balanced M-way multiway trees

Multiway trees are trees with any number of children for each node

M-way trees are trees with no more than M children for each node (a notion identical to a tree of order M)

Most notions we have defined for trees carry over to B-trees, e.g. the height of a tree, the level of a node in a tree, etc.

The root of a tree is at level 1, the children of the root are atlevel 2, etc.

. – p.72/87

B-trees: Example

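Below is a B-tree of order 4 (the keys of each node in brackets, one line per level):

Level 1: [23]
Level 2: [12 19] [29 36]
Level 3: [6 8 9] [13 16] [20 21 22] [24 25 27] [30 32] [41 56 91]

The first three leaves are the children of [12 19], the last three the children of [29 36]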

. – p.73/87

B-trees: Variations (1)

Unfortunately, there are a lot of slight variations of our definition of a B-tree around

Some people define the order of a B-tree to be the maximal number of keys in a node

In this case, a node in a B-tree of order M stores up to M keys and has up to M+1 children (which is one child more than according to our definition)

Some people define the order of a B-tree to be the minimal number of keys in a non-root node

In this case, a node in a B-tree of order M stores between M and 2M keys and has up to 2M+1 children (which is more than twice the number of children we have)

. – p.74/87


Note that in a B-tree of order 3 or in a B-tree of order 4 the minimal number of children of a node is only 2

Thus, such a tree could degenerate to a binary search tree

To prevent this, we can increase the minimal number of children of a node

B∗-trees are very much like ordinary B-trees, except that each non-root node must be at least 2/3 full (instead of 1/2 full), that is,

every node, except for the root and the leaves, has $\geq (2M-1)/3$ children

the root has at least 2 and at most $2\lceil (2M-2)/3 \rceil + 1$ children

. – p.75/87


Usually, some additional data is stored in the B-tree together with the keys, e.g. the address of the object identified by the key

B+-trees are a variant of B-trees where all keys and their associated data are stored in the leaf nodes

Some of the keys, but not their data, are duplicated in the upper levels of the B-tree

In this case, the maximal number of keys stored in a leaf might be considerably smaller than the maximal number of keys in non-leaf nodes, due to the size of the associated data

On the other hand, the order of a B+-tree could be much higher than that of a B-tree, since the non-leaf nodes do not contain any associated data at all (leaving space for more keys)

. – p.76/87

Searching in a B-tree

If we want to find the node storing a given key o in a B-tree T, then we proceed as follows:

1. If the B-tree T is empty, then we have failed to find o

2. Otherwise, the root R of T contains keys $K_1, \ldots, K_n$ and has subtrees $T_0, \ldots, T_n$. We search linearly through the keys for o until one of the following is true:

if we find o, then we return R

if we encounter the first key $K_{i+1}$ greater than o, then we search for o in the subtree $T_i$ (by recursively applying the same algorithm)

if o is greater than $K_n$, then we search for o in $T_n$ (again, recursively)

. – p.77/87
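The search procedure above can be sketched in Java. This is a hypothetical illustration with integer keys (BTreeSearch, Node and search are names chosen here): keys $K_1, \ldots, K_n$ are stored at list indices 0 to n-1 and subtrees $T_0, \ldots, T_n$ at indices 0 to n, so "the first key greater than o is $K_{i+1}$" becomes "stop the scan at index i and descend into child i". A leaf has no children, so reaching one without finding o means o is not in the tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not course code) of searching a B-tree with integer keys.
class BTreeSearch {

    // Sorted keys K1..Kn (indices 0..n-1) and, unless the node is a leaf,
    // children T0..Tn (indices 0..n).
    static class Node {
        final List<Integer> keys;
        final List<Node> children;
        Node(List<Integer> keys, Node... children) {
            this.keys = new ArrayList<>(keys);
            this.children = new ArrayList<>(Arrays.asList(children));
        }
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Returns the node storing o, or null if o does not occur in the tree.
    static Node search(Node t, int o) {
        if (t == null) return null;                             // empty tree: failure
        int i = 0;
        while (i < t.keys.size() && t.keys.get(i) < o) i++;     // linear scan of the keys
        if (i < t.keys.size() && t.keys.get(i) == o) return t;  // found o in this node
        if (t.isLeaf()) return null;                            // no subtree left to search
        return search(t.children.get(i), o);                    // T_i (T_n if o > K_n)
    }
}
```

On the order-4 example tree, search(root, 32) visits [23], then [29 36], then the leaf [30 32], where 32 is found.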

B-trees: Search example

We search for the number 32 in the B-tree below:

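Level 1: [23]
Level 2: [12 19] [29 36]
Level 3: [6 8 9] [13 16] [20 21 22] [24 25 27] [30 32] [41 56 91]

Since 32 > 23, the search continues in the right subtree [29 36]; since 29 < 32 < 36, it continues in the subtree between 29 and 36, the leaf [30 32], where 32 is found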

. – p.78/87

Inserting in a B-tree: Algorithm

Similar to inserting in a binary search tree, to insert a key o into a B-tree T, we first have to locate the correct position for o

For this we basically perform a search for o

This search can have three outcomes:

(1) The search fails because T is empty; create a new B-tree node storing just o

(2) The search succeeds because T already stores o; leave T unchanged

(3) The search fails on a non-empty T; we try to insert o into the leaf node visited last during our search

. – p.79/87

Inserting in a B-tree (1)

We want to insert the number 32 in the B-tree of order 3 below:

[Figure: root [23] with leaves [12] and [29]]

The search for 32 ends in the leaf [29], which still has room for a second key, so 32 is simply added to it:

[Figure: root [23] with leaves [12] and [29 32]]

. – p.80/87

Next we insert 10; the search ends in the leaf [12], which also has room:

[Figure: root [23] with leaves [10 12] and [29 32]]

. – p.81/87

Next we want to insert 14. The node into which we want to insert 14 is already full (overflow)

The only option is to split the overflowing node [10 12 14] at the median key (that is, key number $\lceil M/2 \rceil$) into a pair of siblings and move the median key 12 up into the parent node:

[Figure: root [12 23] with leaves [10], [14] and [29 32]]

. – p.82/87

Inserting 16 then simply fills the leaf [14]:

[Figure: root [12 23] with leaves [10], [14 16] and [29 32]]

Finally, we insert 15 into the full leaf [14 16]: it is split into [14] and [16], and the median key 15 moves up. But the parent node into which we want to insert 15 is also full

So, we split the parent node in the same way into a pair of siblings and move the median key 15 up into a new root node:

[Figure: root [15] with children [12] and [23]; [12] has leaves [10] and [14], and [23] has leaves [16] and [29 32]]

. – p.83/87

Inserting in a B-tree: Remark

The last example also shows that our algorithm for inserting in a B-tree does not always lead to B-trees of minimal height, that is, B-trees that would be optimal

The B-tree below stores the same numbers as the B-tree we have just constructed (which has height 3), but only has height 2

[Figure: root [14 23] with leaves [10 12], [15 16] and [29 32]]

However, this B-tree would be more expensive to construct
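That height 2 is the best possible for these numbers can be checked with a short count. This assumes, as in the examples, that height counts the levels of the tree and that every node of a B-tree of order M holds at most M - 1 keys; level i then contains at most M^i nodes:

```latex
n \le (M-1)\sum_{i=0}^{h-1} M^{i} = M^{h} - 1
\quad\Longrightarrow\quad
h \ge \lceil \log_{M}(n+1) \rceil,
\qquad
M = 3,\ n = 8:\quad h \ge \log_{3} 9 = 2
```

The tree stores the 8 keys 10, 12, 14, 15, 16, 23, 29, 32, so no order-3 B-tree of height 1 (at most 2 keys) can hold them, and the height-2 tree above is completely full.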


Deleting in a B-tree: Algorithm

To delete a key o from a B-tree T we again start by locating o in T

This search can have three outcomes:

(1) The search fails because o does not occur in T; terminate, possibly with an error message

(2) The key o is found in a leaf node N; remove o from N; if N has underflowed (it now holds fewer than the required minimal number of keys), restock N

(3) The key o is found in a non-leaf node N; replace o in N by the minimal key in the subtree of N to the right of o; if the leaf node L from which we took the minimal key underflows, restock L

Restocking a node means either moving keys over from a sibling node (through the parent) or merging the node with a sibling
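Restocking can be sketched in Java for the case used in the examples, where the underflowing node is a leaf child of its parent. The sketch assumes non-root nodes of an order-M B-tree must keep at least ⌈M/2⌉ - 1 keys and that the parent has at least two children. After a merge the parent loses a key and may underflow in turn (or, if it is the root and becomes empty, its single child becomes the new root); the sketch does not handle those follow-up steps. `Restock` and `restock` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: restock an underflowing LEAF child of a B-tree of order M.
// Assumes non-root nodes need at least ceil(M/2) - 1 keys.
class Restock {
    static final class Node {
        final List<Integer> keys;
        final List<Node> children = new ArrayList<>();

        Node(Integer... ks) { keys = new ArrayList<>(Arrays.asList(ks)); }
    }

    // Repairs parent.children[i] after it lost a key.
    static void restock(Node parent, int i, int order) {
        int min = (order + 1) / 2 - 1;
        Node child = parent.children.get(i);
        Node left = i > 0 ? parent.children.get(i - 1) : null;
        Node right = i + 1 < parent.children.size() ? parent.children.get(i + 1) : null;
        if (left != null && left.keys.size() > min) {
            // Borrow via the parent: separator moves down, left's largest key moves up.
            child.keys.add(0, parent.keys.get(i - 1));
            parent.keys.set(i - 1, left.keys.remove(left.keys.size() - 1));
        } else if (right != null && right.keys.size() > min) {
            // Borrow via the parent: separator moves down, right's smallest key moves up.
            child.keys.add(parent.keys.get(i));
            parent.keys.set(i, right.keys.remove(0));
        } else if (left != null) {
            // No spare keys: merge child into its left sibling with the separator.
            left.keys.add(parent.keys.remove(i - 1));
            left.keys.addAll(child.keys);
            parent.children.remove(i);
        } else {
            // No spare keys: merge the right sibling into child with the separator.
            child.keys.add(parent.keys.remove(i));
            child.keys.addAll(right.keys);
            parent.children.remove(i + 1);
        }
    }

    public static void main(String[] args) {
        // Root [23] with leaves [12] and [29 32], after 12 has been removed.
        Node root = new Node(23);
        root.children.add(new Node());
        root.children.add(new Node(29, 32));
        restock(root, 0, 3);
        System.out.println(root.keys + " " + root.children.get(0).keys + " " + root.children.get(1).keys);
        // [29] [23] [32]
    }
}
```

Borrowing is tried before merging because it fixes the underflow without changing the number of keys in the parent.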


Deleting in a B-tree (1)

We want to delete the number 12 from the B-tree of order 3 below:

[Figure: root [23]; leaves [12] and [29 32]]

If we simply removed 12 from the leaf, the leaf would no longer store the required minimal number of keys

Instead we move keys over from the sibling: 23 moves down from the root into the left leaf and 29 moves up from the right leaf into the root

[Figure: root [29]; leaves [23] and [32]]
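The underflow test in this example can be made concrete with a short Java sketch of step (2). It assumes that a non-root node of a B-tree of order M must keep at least ⌈M/2⌉ - 1 keys (1 key for order 3); the root has a weaker requirement that the sketch ignores. `LeafDelete`, `minKeys` and `deleteFromLeaf` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of step (2): delete o from a non-root leaf and report underflow.
// Assumption: non-root nodes of a B-tree of order M keep at least ceil(M/2) - 1 keys.
class LeafDelete {
    static int minKeys(int order) {
        return (order + 1) / 2 - 1;
    }

    // Removes o from the leaf's keys; returns true if the leaf now needs restocking.
    static boolean deleteFromLeaf(List<Integer> leafKeys, int o, int order) {
        if (!leafKeys.remove(Integer.valueOf(o))) {
            throw new IllegalArgumentException(o + " is not in this leaf");
        }
        return leafKeys.size() < minKeys(order);
    }

    public static void main(String[] args) {
        List<Integer> leaf = new ArrayList<>(Arrays.asList(12));
        System.out.println(deleteFromLeaf(leaf, 12, 3)); // true: [] is below the minimum of 1 key
        List<Integer> other = new ArrayList<>(Arrays.asList(29, 32));
        System.out.println(deleteFromLeaf(other, 32, 3)); // false: [29] still holds 1 key
    }
}
```

`Integer.valueOf(o)` makes Java call `remove(Object)` rather than `remove(int index)`, so the key itself is removed and not the element at position o.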



Deleting in a B-tree (2)

We want to delete the number 23 from the B-tree of order 3 below:

[Figure: root [23]; leaves [12 15] and [29]]

If we simply removed 23 from the root, the root would no longer store the required minimal number of keys

Instead we replace it by the minimal key 29 from the right subtree

[Figure: root [29]; leaves [12 15] and an empty right leaf]

But now we have an underflow in the right leaf

So we move keys over from the sibling of the right leaf: 29 moves down into the right leaf and 15 moves up into the root

[Figure: root [15]; leaves [12] and [29]]
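Step (3) of the deletion algorithm, as used in this example, can be sketched in Java. The sketch assumes an order-M B-tree whose non-root nodes keep at least ⌈M/2⌉ - 1 keys, and, as in the example, that the subtree to the right of o is a single leaf, so N is that leaf's parent and can restock it directly; deeper trees would need the path down to the leaf. `InternalDelete` and `deleteFromInternal` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of step (3): delete key o stored in a non-leaf node n.
// Assumes the subtree right of o is a single leaf (as in the example) and that
// non-root nodes of a B-tree of order M keep at least ceil(M/2) - 1 keys.
class InternalDelete {
    static final class Node {
        final List<Integer> keys;
        final List<Node> children = new ArrayList<>();

        Node(Integer... ks) { keys = new ArrayList<>(Arrays.asList(ks)); }
    }

    static void deleteFromInternal(Node n, int o, int order) {
        int min = (order + 1) / 2 - 1;
        int i = n.keys.indexOf(o);
        if (i < 0) throw new IllegalArgumentException(o + " is not in this node");
        Node leaf = n.children.get(i + 1); // assumed to be a leaf
        // Replace o by the minimal key of the subtree to its right.
        n.keys.set(i, leaf.keys.remove(0));
        if (leaf.keys.size() >= min) return;
        Node left = n.children.get(i);
        if (left.keys.size() > min) {
            // Borrow via the parent: separator moves down, left's largest key moves up.
            leaf.keys.add(0, n.keys.get(i));
            n.keys.set(i, left.keys.remove(left.keys.size() - 1));
        } else {
            // Otherwise merge the leaf into its left sibling together with the separator.
            left.keys.add(n.keys.remove(i));
            left.keys.addAll(leaf.keys);
            n.children.remove(i + 1);
        }
    }

    public static void main(String[] args) {
        Node root = new Node(23);
        root.children.add(new Node(12, 15));
        root.children.add(new Node(29));
        deleteFromInternal(root, 23, 3);
        System.out.println(root.keys + " " + root.children.get(0).keys + " " + root.children.get(1).keys);
        // [15] [12] [29]
    }
}
```

The run reproduces the two steps above: 29 first replaces 23 in the root, then 29 moves down into the empty leaf while 15 moves up from its sibling.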

