Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.


Page 1: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Balanced Trees

Ellen Walker

CPSC 201 Data Structures

Hiram College

Page 2: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Search Tree Efficiency

• The average time to search a binary tree is the average path length from root to leaf

• In a tree with N nodes, this is…

– Best case: log N (the tree is full)

– Worst case: N (the tree has only one path)

• Worst case tree examples

– Items inserted in order

– Items inserted in reverse order
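A small sketch of the two extremes, assuming a plain (unbalanced) binary search tree. The names `Node`, `insert`, `height`, `build`, and `median_first` are illustrative and do not come from the slides.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain BST insert with no rebalancing (iterative, so long chains are fine)."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
                return root
            cur = cur.right

def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    best, stack = 0, [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        best = max(best, depth)
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return best

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

def median_first(lo, hi):
    """Insertion order that puts each median first, which yields a full tree."""
    if lo > hi:
        return []
    mid = (lo + hi) // 2
    return [mid] + median_first(lo, mid - 1) + median_first(mid + 1, hi)

n = 255
in_order = build(range(1, n + 1))          # worst case: one long right path
reverse_order = build(range(n, 0, -1))     # worst case: one long left path
full = build(median_first(1, n))           # best case: full tree
print(height(in_order), height(reverse_order), height(full))  # 255 255 8
```

With 255 items, the in-order and reverse-order trees are chains of 255 nodes, while the full tree has only 8 levels.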

Page 3: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Keeping Trees Balanced

• Change the insert algorithm to rebalance the tree

• Change the delete algorithm to rebalance the tree

• Many different approaches; we’ll look at one

– RED-BLACK trees

– Based on non-binary trees (2-3-4 trees)

Page 4: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

2-3 Trees

• Relax constraint that a node has 2 children

• Allow 2-child nodes and 3-child nodes

– With bigger nodes, tree is shorter & branchier

– 2-node is just like before (one item, two children)

– 3-node has two values and 3 children (left, middle, right)

<x, y>

left: <= x      middle: > x and <= y      right: > y

Page 5: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Why 2-3 tree

• Faster searching?

– Actually, no. 2-3 tree is about as fast as an “equally balanced” binary tree, because you sometimes have to make 2 comparisons to get past a 3-node

• Easier to keep balanced?

– Yes, definitely.

– Insertion can split 3-nodes into 2-nodes, or promote 2-nodes to 3-nodes to keep tree approximately balanced!
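To make the comparison count concrete, here is a minimal search sketch for a 2-3 tree. The class `Node23`, the function `search`, and the sample keys are assumptions made for illustration; each key examined is counted as one comparison.

```python
class Node23:
    """A 2-node holds 1 key and 2 children; a 3-node holds 2 keys and 3 children."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # an empty list marks a leaf

def search(root, target):
    """Return (found, comparisons), counting one comparison per key examined."""
    comparisons = 0
    node = root
    while node is not None:
        i = 0                            # index of the child to descend into
        for key in node.keys:
            comparisons += 1
            if target == key:
                return True, comparisons
            if target < key:
                break
            i += 1
        node = node.children[i] if node.children else None
    return False, comparisons

# A 3-node root <10, 20> with three leaves.
root = Node23([10, 20], [Node23([5]), Node23([15]), Node23([25, 30])])
print(search(root, 5))    # (True, 2)
print(search(root, 30))   # (True, 4)
print(search(root, 17))   # (False, 3)
```

Getting past the 3-node root costs two comparisons whenever the target is larger than the first key, which is why the shorter tree does not translate into fewer comparisons overall.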

Page 6: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

3-Node and Equivalent 2-Nodes

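[Figure: the 3-node <10, 20> with subtrees L, M, R, shown beside its two equivalent binary forms: 10 as the parent with 20 as its right child (L under 10; M and R under 20), or 20 as the parent with 10 as its left child (L and M under 10; R under 20).]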

Page 7: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Inserting into 2-3 Tree

• As for binary tree, start by searching for the item

• If you don’t find it, and you stop at a 2-node, upgrade the 2-node to a 3-node.

• If you don’t find it, and you stop at a 3-node, you can’t just add another value. So, replace the 3-node by 2 2-nodes and push the middle value up to the parent node

• Repeat recursively until you upgrade a 2-node or create a new root.

• When is a new root created?
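The insertion steps above can be sketched recursively. This is one possible arrangement, not necessarily the one used in the course: the names `Node`, `_insert`, `insert`, `inorder`, and `leaf_depths` are hypothetical, and a node is allowed to hold three keys for a moment before it is split.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # 1 or 2 keys (3 only momentarily)
        self.children = children or []    # empty for a leaf

    def is_leaf(self):
        return not self.children

def _insert(node, key):
    """Insert below node. Return None, or (middle, left, right) if node was split."""
    if key in node.keys:
        return None                       # already present
    if node.is_leaf():
        node.keys = sorted(node.keys + [key])
    else:
        i = sum(1 for k in node.keys if key > k)   # which child to descend into
        split = _insert(node.children[i], key)
        if split is None:
            return None
        middle, left, right = split       # the child split: absorb its middle value
        node.keys.insert(i, middle)
        node.children[i:i + 1] = [left, right]
    if len(node.keys) <= 2:
        return None                       # a 2-node was upgraded to a 3-node: done
    # Overflow: replace this node by two 2-nodes and push the middle value up.
    left = Node(node.keys[:1], node.children[:2])
    right = Node(node.keys[2:], node.children[2:])
    return node.keys[1], left, right

def insert(root, key):
    if root is None:
        return Node([key])
    split = _insert(root, key)
    if split is None:
        return root
    middle, left, right = split
    # A new root is created only when the split reaches the old root,
    # so every path gets one level longer at the same time.
    return Node([middle], [left, right])

def inorder(node):
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out

def leaf_depths(node, depth=1):
    if node.is_leaf():
        return [depth]
    return [d for c in node.children for d in leaf_depths(c, depth + 1)]

root = None
for k in [1, 2, 3, 4, 5, 6, 7]:           # in-order input, the binary tree's worst case
    root = insert(root, k)
print(root.keys, leaf_depths(root))       # [4] [3, 3, 3, 3]
```

Even with sorted input, every leaf ends up at the same depth, because height is only ever added at the root.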

Page 8: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Why is this better?

• Intuitively, you unbalance a binary tree when you add height to one path significantly more than other possible paths.

• With the 2-3 insert algorithm, you can only add height to the tree when you create a new root, and this adds one unit of height to all paths simultaneously.

• Hence, the average path length of the tree stays close to log N.

Page 9: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Deleting from a 2-3 Tree

• Like for a binary tree, we want to start our deletion at a leaf

• First, swap the value to be deleted with its immediate successor in the tree (like binary search tree delete)

• Next, delete the value from the node.

– If the node still has a value, you’ve changed a 3-node into a 2-node; you’re done

– If no value is left, find a value from sibling or parent

Page 10: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Deletion Cases

• If leaf has 2 items, remove one item (done)

• If leaf has 1 item

– If sibling has 2 items, redistribute items among sibling, parent, and leaf

– If sibling has 1 item, slide an item down from the parent to the sibling (merge)

– Recursively redistribute and merge up the tree until no change is needed, or root is reached. (If root becomes empty, replace by its child)

• Fig. 11.42-11.47, p. 602-603

Page 11: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Going Another Step

• If 2-3 trees are good, why not make bigger nodes?

• 2-3-4 trees have 3 kinds of nodes

• Remember a node is described by its number of children. It contains one fewer value than it has children

• So, a 4-node has 4 children and 3 values.

• Names of children are left, middle-left, middle-right, and right

Page 12: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

4-node is equivalent to 3 2-nodes

• A 4-node has 3 values, e.g. <10,20,30>

• A binary tree of those values would have the middle value (20) as the parent, and the outer values (10, 30) as the children

• So every 4-node can be replaced by 3 2-nodes.

• This leads naturally to a very nice insertion algorithm.
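As a small illustration of that equivalence, the sketch below turns one 4-node into three binary nodes. `BinNode` and `four_node_to_binary` are hypothetical names chosen for this example.

```python
class BinNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def four_node_to_binary(keys, children=(None, None, None, None)):
    """Replace a 4-node (3 keys, 4 children) by three 2-nodes.

    The middle key becomes the parent. The left and middle-left subtrees hang
    under the smallest key; the middle-right and right subtrees hang under the
    largest key.
    """
    low, middle, high = keys
    left, middle_left, middle_right, right = children
    return BinNode(middle,
                   BinNode(low, left, middle_left),
                   BinNode(high, middle_right, right))

top = four_node_to_binary([10, 20, 30])
print(top.key, top.left.key, top.right.key)   # 20 10 30
```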

Page 13: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Insert into 2-3-4 tree

• Find the place for the item in the usual way.

• On the way down the tree, if you see any 4-nodes, split them and pass the middle value up.

• If the leaf is a 2-node or 3-node, add the item to the leaf.

• If the leaf is a 4-node, split it into 2 2-nodes, passing the middle value up to the parent node, then insert the item into the appropriate leaf node.

• There will be room, because 4-nodes were split on the way down!

Page 14: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

2-3-4 Insert Example

Root: <6, 15, 25>

Leaves: <2, 4, 5>  <10>  <18, 20>  <30>

Insert 24, then 19

Page 15: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Insert 24: Split root first

Root: <15>

Children of root: <6>  <25>

Leaves: <2, 4, 5>  <10>  <18, 20, 24>  <30>

Page 16: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Insert 19, Split leaf (20 up) first

Root: <15>

Children of root: <6>  <20, 25>

Leaves: <2, 4, 5>  <10>  <18, 19>  <24>  <30>

Page 17: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

2-3-4 Algorithm is Simpler

• All splits happen on the way down the tree

• Therefore, there is always room in the leaf for the insertion

• And there is always room in the parent for a node that has to move up (because if the parent were a 4-node, it would already have been split!)
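The top-down insert can be sketched as follows, using the tree from the insert example (root <6, 15, 25>) as input. The names `Node`, `split_child`, and `insert` are assumptions for illustration, and the sketch does not handle duplicate keys.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)                 # 1 to 3 keys
        self.children = list(children or [])   # empty for a leaf

    def is_leaf(self):
        return not self.children

    def is_four_node(self):
        return len(self.keys) == 3

def split_child(parent, i):
    """Split the 4-node parent.children[i]; its middle value moves up into parent."""
    node = parent.children[i]
    left = Node(node.keys[:1], node.children[:2])
    right = Node(node.keys[2:], node.children[2:])
    parent.keys.insert(i, node.keys[1])
    parent.children[i:i + 1] = [left, right]

def insert(root, key):
    if root is None:
        return Node([key])
    if root.is_four_node():                    # splitting the root creates a new root
        new_root = Node([], [root])
        split_child(new_root, 0)
        root = new_root
    node = root
    while not node.is_leaf():
        i = sum(1 for k in node.keys if key > k)
        if node.children[i].is_four_node():
            # node is never a 4-node here, so it has room for the middle value.
            split_child(node, i)
            if key > node.keys[i]:
                i += 1
        node = node.children[i]
    node.keys = sorted(node.keys + [key])      # the leaf is a 2-node or 3-node
    return root

root = Node([6, 15, 25],
            [Node([2, 4, 5]), Node([10]), Node([18, 20]), Node([30])])
root = insert(root, 24)    # splits the root first
root = insert(root, 19)    # splits the leaf <18, 20, 24>, passing 20 up
print(root.keys)                                    # [15]
print([c.keys for c in root.children])              # [[6], [20, 25]]
print([c.keys for c in root.children[1].children])  # [[18, 19], [24], [30]]
```

Because every 4-node on the search path is split before the descent continues, no split ever has to propagate back up the tree.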

Page 18: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Deleting from a 2-3-4 Tree

• Find the value to be deleted and swap with inorder successor.

• On the way down the tree (both for value and successor), upgrade 2-nodes into 3-nodes or 4-nodes. This ensures that the deleted value will be in a 3-node or 4-node leaf

• Remove the value from the leaf.

Page 19: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Upgrade cases

• 2-node whose next sibling is a 2-node

– Combine sibling values and “divider” value from parent into a 4-node

– By the algorithm, parent cannot be a 2-node unless it is the root; in this case, our new 4-node becomes the root

• 2-node whose next sibling is a 3-node

– Move the sibling’s nearest value up to the parent, and move the divider value down into the 2-node (which becomes a 3-node)

Page 20: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Red-Black Trees

• Red-Black trees are binary trees

• But each node has an additional piece of information (color)

• Red nodes can be considered (with their parents) as 3-nodes or 4-nodes

• There can never be 2 red nodes in a row!

Page 21: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Advantages of Red-black trees

• Binary tree search algorithm and traversals hold directly (ignore color)

• 2-3-4 tree insert and delete algorithms keep tree balanced (consider color)

Page 22: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Splitting a 4-node

• A 4-node in a RB tree looks like a black node with two red children.

• If you make it a red node with 2 black children, you have split the node (and passed its middle value up to the parent).

• If the parent is red, you have to split it too.
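The split is only a recoloring, as the following sketch shows. The field name `red` and the helpers `is_red`, `is_four_node`, and `split_four_node` are illustrative assumptions.

```python
class Node:
    def __init__(self, key, red=True, left=None, right=None):
        self.key = key
        self.red = red
        self.left = left
        self.right = right

def is_red(node):
    return node is not None and node.red      # missing children count as black

def is_four_node(node):
    """A 4-node is a black node whose two children are both red."""
    return not node.red and is_red(node.left) and is_red(node.right)

def split_four_node(node):
    """Color flip: the middle value turns red (joins its parent), children turn black."""
    node.red = True
    node.left.red = False
    node.right.red = False

# The 4-node <10, 20, 30> as a black 20 with red children 10 and 30.
n = Node(20, red=False, left=Node(10), right=Node(30))
print(is_four_node(n))                        # True
split_four_node(n)
print(n.red, n.left.red, n.right.red)         # True False False
```

No pointers change. If the parent of the recolored node is itself red, the two consecutive red nodes still have to be repaired.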

Page 23: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Revising a 3-node

• To avoid having two red nodes in a row, you might have to rotate as well as color change.

• When the parent is red:

– If the parent’s value is between the child’s and the grandparent’s, do a single rotation

– If the child’s value is between the parent’s and the grandparent’s, do a double rotation
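A sketch of the two cases, assuming a black grandparent `g`, a red parent, and a red child. All names here (`Node`, `rotate_left`, `rotate_right`, `fix_red_red`) are hypothetical; each rotation also recolors, so the new top of the subtree is black and the old top is red.

```python
class Node:
    def __init__(self, key, red=True, left=None, right=None):
        self.key = key
        self.red = red
        self.left = left
        self.right = right

def is_red(node):
    return node is not None and node.red

def rotate_right(g):
    """g's left child becomes the subtree's top; returns the new top."""
    p = g.left
    g.left, p.right = p.right, g
    p.red, g.red = False, True
    return p

def rotate_left(g):
    """Mirror image of rotate_right."""
    p = g.right
    g.right, p.left = p.left, g
    p.red, g.red = False, True
    return p

def fix_red_red(g):
    """Repair a red parent with a red child below g; returns the new subtree top."""
    if is_red(g.left):
        p = g.left
        if is_red(p.left):                 # parent's value is in the middle: single
            return rotate_right(g)
        if is_red(p.right):                # child's value is in the middle: double
            g.left = rotate_left(p)
            return rotate_right(g)
    if is_red(g.right):
        p = g.right
        if is_red(p.right):
            return rotate_left(g)
        if is_red(p.left):
            g.right = rotate_right(p)
            return rotate_left(g)
    return g                               # nothing to repair

single = fix_red_red(Node(8, red=False, left=Node(4, left=Node(3))))
print(single.key, single.left.key, single.right.key)   # 4 3 8

double = fix_red_red(Node(8, red=False, left=Node(4, right=Node(6))))
print(double.key, double.left.key, double.right.key)   # 6 4 8
```

In both cases the middle of the three values ends up on top as a black node with two red children, which is the red-black form of a 4-node.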

Page 24: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Single Rotation

[Figure: single rotation example on a tree containing the values 3, 4, 6, and 8.]

Page 25: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Double Rotation

[Figure: double rotation example on a tree containing the values 4, 5, 6, and 8, drawn in three stages.]

Page 26: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Top Down Insertion Algorithm

• Search the binary tree in the usual way for the insertion algorithm

• If you pass a 4-node (black node with two red children) on the way down, split it

• Insert the node as a red child, and use the split algorithm to adjust via rotation if the parent is red also.

• Force the root to be black after every insertion.
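The following is a recursive sketch of this insertion, not the strictly one-pass version described on the slide: 4-nodes are still split on the way down, but the rotation that repairs two consecutive red nodes is applied when the recursion returns to the grandparent. The names (`Node`, `_insert`, `insert`, `show`, `black_height`) are assumptions for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.red = True                    # new nodes are inserted red
        self.left = None
        self.right = None

def is_red(n):
    return n is not None and n.red

def rotate_right(g):
    p = g.left
    g.left, p.right = p.right, g
    p.red, g.red = False, True             # new top is black, old top is red
    return p

def rotate_left(g):
    p = g.right
    g.right, p.left = p.left, g
    p.red, g.red = False, True
    return p

def _insert(node, key):
    if node is None:
        return Node(key)
    if is_red(node.left) and is_red(node.right):    # 4-node: split on the way down
        node.red = True
        node.left.red = node.right.red = False
    if key < node.key:
        node.left = _insert(node.left, key)
        if is_red(node.left):
            if is_red(node.left.left):               # single rotation
                node = rotate_right(node)
            elif is_red(node.left.right):            # double rotation
                node.left = rotate_left(node.left)
                node = rotate_right(node)
    elif key > node.key:
        node.right = _insert(node.right, key)
        if is_red(node.right):
            if is_red(node.right.right):
                node = rotate_left(node)
            elif is_red(node.right.left):
                node.right = rotate_right(node.right)
                node = rotate_left(node)
    return node                                      # equal keys are ignored

def insert(root, key):
    root = _insert(root, key)
    root.red = False                       # force the root to be black
    return root

def show(n):
    """Nested tuple (key, color, left, right) for inspecting the tree."""
    if n is None:
        return None
    return (n.key, "R" if n.red else "B", show(n.left), show(n.right))

def black_height(n):
    """Black height of n; raises AssertionError if a red-black rule is broken."""
    if n is None:
        return 1
    if n.red:
        assert not is_red(n.left) and not is_red(n.right)
    lh, rh = black_height(n.left), black_height(n.right)
    assert lh == rh
    return lh + (0 if n.red else 1)

root = None
for k in [1, 2, 3, 4, 5, 9, 6]:
    root = insert(root, k)
print(show(root))
```

For the keys 1, 2, 3, 4, 5, 9, 6 this yields root 2 with children 1 and 4, where 4 has children 3 and 6, and 6 has children 5 and 9.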

Page 27: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Insert 1, 2, 3

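[Figure, three stages: (1) root 1 with right child 2; (2) 3 is inserted as a red leaf below 2, giving 2 consecutive red nodes; (3) after a left single rotation, 2 is the root with children 1 and 3.]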

Page 28: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Continued: Insert 4, 5

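[Figure, three stages: (1) inserting 4: the root 2 with its 2 red children is a 4-node, so it is split on the way down (the root remains black), and 4 is added below 3; (2) inserting 5 below 4 gives consecutive red nodes; (3) a single rotation to avoid consecutive red nodes leaves root 2 with children 1 and 4, and 4 with children 3 and 5.]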

Page 29: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Continued, Insert 9, 6

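[Figure, three stages: (1) inserting 9: the 4-node (3,4,5) is split on the way down, so 4 is now red (passed up), and 9 is added below 5; (2) inserting 6 below 9 gives consecutive red nodes; (3) a double rotation leaves root 2 with children 1 and 4; 4 has children 3 and 6; 6 has children 5 and 9.]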

Page 30: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Deletion Algorithm

• Find the node to be deleted (NTBD)

– On the way down, if you pass a 2-node, upgrade it by borrowing from its neighbor and/or parent

• If the node is not a leaf node,

– Find its immediate successor, upgrading all 2-nodes

– Swap value of leaf node with value of NTBD

• Remove the current leaf node, which is now NTBD (because of swap, if it happened)

Page 31: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Red-black “neighbor” of a node

• Let X be a 2-node to be deleted

• If X is its parent’s left child, X’s right neighbor can be found by:

– Let S = parent’s right child. If S is black, it is the neighbor

– Otherwise, S’s left child is the neighbor.

• If X is parent’s left child, then X’s left neighbor is grandparent’s left child.

Page 32: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

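Neighbor examples

[Figure: tree with root 2; children of 2 are 1 and 4; children of 4 are 3 and 5; 9 is the right child of 5.]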

Right neighbor of 1 is 3

Right neighbor of 3 is 5

Left neighbor of 5 is 3

Left neighbor of 3 is 1

Page 33: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Upgrade a 2-node

• Find the 2-node’s neighbor (right if any, otherwise left)

• If neighbor is also a 2-node (2 black children)

– Create a 4-node from neighbors and their parent.

– If neighbors are actually siblings, this is a color swap.

– Otherwise, it requires a rotation

• If neighbor is a 3-node or 4-node (red child)

– Move “inner value” from neighbor to parent, and “dividing value” from parent to 2-node.

– This is a rotation

Page 34: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Deletion Examples (Delete 1)

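[Figure: before, root 2 with children 1 and 4; 4 has children 3 and 6; 6 has children 5 and 9. After, root 4 with children 2 and 6; 1 and 3 below 2; 5 and 9 below 6.]

Make a 4-node from 1, sibling 3, and “divider value” 2. [Single rotation of 2, 4, 6]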

Page 35: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

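Delete 6

[Figure, three stages: (1) root 4 with children 2 and 6; 3 below 2; 5 and 9 below 6; (2) after the swap, 9 takes the place of 6, and 6 sits in the leaf where 9 was; (3) after the leaf is removed, root 4 with children 2 and 9; 3 below 2; 5 below 9.]

Upgrade 6 by color flip, swap with successor (9)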

Page 36: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Delete 4

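[Figure, three stages: (1) root 2 with children 1 and 4; 4 has children 3 and 6; 6 has children 5 and 9; (2) after the swap, 5 takes the place of 4, and 4 sits in the leaf where 5 was; (3) after the leaf is removed, root 2 with children 1 and 5; 5 has children 3 and 6; 9 below 6.]

No 2-nodes to upgrade, swap with successor (5)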

Page 37: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

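Delete 2

[Figure: before, root 2 with children 1 and 5; 5 has children 3 and 6; 9 below 6. After a single rotation, root 2 with children 1 and 6; 6 has children 5 and 9; 3 below 5.]

Find 2-node en route to successor (3). Neighbor is 3-node (6,9). Shift to get (3,5) and 9 as children, 6 up to parent.

Single rotation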

Page 38: Balanced Trees Ellen Walker CPSC 201 Data Structures Hiram College.

Delete 2 (cont’d)

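[Figure, two stages: (1) swap with successor: 3 becomes the root with children 1 and 6; 6 has children 5 and 9; 2 sits in the leaf below 5; (2) remove leaf: root 3 with children 1 and 6; 6 has children 5 and 9.]

Swap with successor

Remove leaf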