Binary search trees Definition Binary search trees and dynamic set operations Balanced binary search...

Binary search trees

• Definition

• Binary search trees and dynamic set operations

• Balanced binary search trees

– Tree rotations

– Red-black trees

• Move to front “balancing” technique

– Unsorted linked lists

– Splay trees

Binary search trees

• Basic tree property

– For any node x:

  • every node in the left subtree of x has a key ≤ the key of x

  • every node in the right subtree of x has a key ≥ the key of x

BSTs and Dynamic Sets

• Dynamic set operations and binary search trees

– Search(S,k)

– Insert(S,x)

– Delete(S,x)

– Minimum or Maximum(S)

– Successor or Predecessor (S,x)

– List All(S)

– Merge(S1,S2)

Dynamic Set Operations

• Listall(T)?

– time to list?

• Search(T,k)?

– search time?

• Minimum(T)? Maximum(T)?

– search time?

• Successor(T,x)? Predecessor(T,x)?

– search time?

• Simple Insertion(T,x)
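The operations listed above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the `Node` class, the function names, and the use of parent pointers are assumptions made for the example. Each search-style operation walks a single root-to-leaf path, so it costs time proportional to the tree height, while listing all keys visits every node once.

```python
class Node:
    """Hypothetical BST node with a parent pointer (an assumption of this sketch)."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def insert(root, key):
    """Simple (unbalanced) insertion; returns the possibly new root."""
    node, parent, cur = Node(key), None, root
    while cur is not None:              # walk down to an empty slot
        parent = cur
        cur = cur.left if key < cur.key else cur.right
    node.parent = parent
    if parent is None:
        return node                     # tree was empty
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root


def search(root, key):
    """Return the node holding key, or None if absent."""
    cur = root
    while cur is not None and cur.key != key:
        cur = cur.left if key < cur.key else cur.right
    return cur


def minimum(node):
    while node.left is not None:        # leftmost node of the subtree
        node = node.left
    return node


def maximum(node):
    while node.right is not None:       # rightmost node of the subtree
        node = node.right
    return node


def successor(node):
    """Next node in sorted order, or None if node holds the largest key."""
    if node.right is not None:
        return minimum(node.right)
    parent = node.parent
    while parent is not None and node is parent.right:
        node, parent = parent, parent.parent   # climb until we leave a left subtree
    return parent


def list_all(root):
    """In-order traversal: returns all keys in sorted order."""
    out, stack, cur = [], [], root
    while stack or cur is not None:
        while cur is not None:
            stack.append(cur)
            cur = cur.left
        cur = stack.pop()
        out.append(cur.key)
        cur = cur.right
    return out
```

Predecessor is the mirror image of `successor` (swap left and right, and `minimum` for `maximum`).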

Simple deletion

• Delete(T,x): Three possible cases:

– a) x is a leaf:

– b) x has one child:

– c) x has two children: Replace x with successor(x).

  • Successor(x) has at most one child (why?);

  • Use step a or b on successor(x)
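The three cases can be sketched as a short recursive routine. This is one possible rendering under stated assumptions: keys are distinct, `Node`, `insert`, `delete`, and `inorder` are names chosen for the example, and case c copies the successor's key into x rather than relinking nodes, which is a simplification.

```python
class Node:
    """Hypothetical BST node without parent pointers."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def insert(root, key):
    """Simple insertion; returns the root of the updated subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def delete(root, key):
    """Remove key from the subtree rooted at root; returns the new subtree root."""
    if root is None:
        return None                      # key not present
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Cases a and b: zero or one child, so splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case c: two children. The successor is the minimum of the right
        # subtree; it has no left child, so removing it is case a or b.
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root


def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)
```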

Simple binary search trees

• What is the expected height of a binary search tree?

• Difficult to compute if we allow both insertions and deletions

• With insertions only, in random order, the analysis of Section 12.4 shows that the expected height is O(log n)

• Implications about BSTs as dynamic sets?

Tree-Balancing Algorithms

• Tree rotations

• Red-Black Trees

• Splay Trees

• Others

– AVL Trees

– 2-3 Trees and 2-3-4 Trees

Tree Rotations

[Figure: two trees on nodes A and B with subtrees T1, T2, T3; Right Rotate(T,A) and Left Rotate(T,B) convert one tree into the other.]
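A rotation can be sketched in a few lines. The version below is an illustration under assumptions: the functions take the upper node of the rotated edge and return the new subtree root, which may not match the argument convention used in the figure, and `Node`, `rotate_right`, `rotate_left`, and `inorder` are names invented for the example. The point it shows is that a rotation changes the shape of the tree but leaves the sorted order of the keys unchanged.

```python
class Node:
    """Hypothetical BST node without parent pointers."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_right(top):
    """Lift top's left child above top; returns the new subtree root."""
    child = top.left
    top.left = child.right      # the middle subtree moves under top
    child.right = top
    return child


def rotate_left(top):
    """Lift top's right child above top; returns the new subtree root."""
    child = top.right
    top.right = child.left      # the middle subtree moves under top
    child.left = top
    return child


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


# Usage: 4 is the root with left child 2; rotating right makes 2 the root.
tree = Node(4, Node(2, Node(1), Node(3)), Node(5))
rotated = rotate_right(tree)
restored = rotate_left(rotated)
```

Each rotation touches a constant number of pointers, so it takes O(1) time.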

Red-Black Trees

• All nodes in the tree are either red or black.

• Every null-child is included and colored black.

• All red nodes must have two black children.

• Every path from any node x (including the root) to a leaf must have the same number of black nodes.
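The properties above can be checked mechanically. The sketch below is an illustration, not part of the slides: `Node`, `black_height`, and `is_red_black` are hypothetical names, null children are represented by `None` and treated as black, and the check covers only the color properties listed above (it does not verify the binary search tree ordering).

```python
RED, BLACK = "red", "black"


class Node:
    """Hypothetical colored tree node; None stands for a black null child."""
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right


def black_height(node):
    """Number of black nodes on every path from node down to a null child
    (counting node itself and the null child), or None if a property fails."""
    if node is None:
        return 1                            # null children count as black
    left_bh = black_height(node.left)
    right_bh = black_height(node.right)
    if left_bh is None or right_bh is None or left_bh != right_bh:
        return None                         # paths disagree on black count
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return None                 # red node with a red child
        return left_bh
    return left_bh + 1


def is_red_black(root):
    return black_height(root) is not None
```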

• How balanced a tree will this produce?

• How hard will it be to maintain?

Example Red-Black Tree

Insertion(T,z)

• Find place to insert using simple insertion

• Color node z as red

• Fix up tree so that it is still a red-black tree

• What needs to be fixed?

– Problem 1: z is root (first node inserted)

• Minor detail

– Problem 2: parent(z) is red

RB-Insert-Fixup

• Situation: parent(z) is red and z is red

– Case 1: uncle(z) is red

  • Then make both uncle(z) and parent(z) black and parent(parent(z)) red, and recurse up the tree

[Figure: the tree before and after the case 1 recoloring, with z marked in each.]

RB-Insert-Fixup(parent(z) is right child of parent(parent(z)))

• Situation: parent(z) is red and z is red

– Case 2: uncle(z) is black and z is a left child

  • Right rotate to make into case 3

[Figure: the case 2 rotation on nodes A and B with subtrees T1, T2, T3; z is marked before and after.]

RB-Insert-Fixup(parent(z) is right child of parent(parent(z)))

• Situation: parent(z) is red and z is red

– Case 3: uncle(z) is black and z is a right child

  • Left rotate to make B root of tree

[Figure: the case 3 rotation on nodes A, B, C with subtrees T1, T2, T3; z is marked before the rotation.]

RB-Insert-Fixup Analysis(parent(z) is right child of parent(parent(z)))

• Situation: parent(z) is red and z is red

– Case 1: no rotations, always moving up tree

– Cases 2 and 3: At most 2 rotations total and tree ends up balanced

  • No more need to fix up once these cases are met

– Total cost: at most 2 rotations and O(log n) operations
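The insertion procedure and its three fix-up cases can be put together as follows. This is a sketch under assumptions rather than the slides' own code: it follows the common textbook formulation, uses `None` for black null children, and handles the situation where parent(z) is a left child by mirroring the cases described for a right child. `Node`, `RBTree`, `inorder`, and `height` are names chosen for the example.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key):
        self.key, self.color = key, RED          # new nodes start red
        self.left = self.right = self.parent = None


class RBTree:
    def __init__(self):
        self.root = None

    def _relink(self, x, y):
        """Make y take x's place under x's parent."""
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x is x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y

    def _rotate_left(self, x):                   # x's right child moves up
        y = x.right
        x.right = y.left
        if y.left is not None:
            y.left.parent = x
        self._relink(x, y)
        y.left, x.parent = x, y

    def _rotate_right(self, x):                  # x's left child moves up
        y = x.left
        x.left = y.right
        if y.right is not None:
            y.right.parent = x
        self._relink(x, y)
        y.right, x.parent = x, y

    def insert(self, key):
        z, parent, cur = Node(key), None, self.root
        while cur is not None:                   # simple BST insertion
            parent = cur
            cur = cur.left if key < cur.key else cur.right
        z.parent = parent
        if parent is None:
            self.root = z
        elif key < parent.key:
            parent.left = z
        else:
            parent.right = z
        self._fixup(z)

    def _fixup(self, z):
        while z.parent is not None and z.parent.color == RED:
            p, g = z.parent, z.parent.parent     # g exists: a red node is never the root
            parent_is_right = p is g.right
            uncle = g.left if parent_is_right else g.right
            if uncle is not None and uncle.color == RED:    # case 1: recolor, move up
                p.color = uncle.color = BLACK
                g.color = RED
                z = g
                continue
            if parent_is_right:
                if z is p.left:                  # case 2: rotate into case 3
                    self._rotate_right(p)
                    z, p = p, z
                p.color, g.color = BLACK, RED    # case 3
                self._rotate_left(g)
            else:                                # mirror image (assumed by symmetry)
                if z is p.right:
                    self._rotate_left(p)
                    z, p = p, z
                p.color, g.color = BLACK, RED
                self._rotate_right(g)
        self.root.color = BLACK                  # problem 1: keep the root black


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))
```

Inserting keys in sorted order, which would degrade a simple binary search tree into a list, leaves this tree with logarithmic height.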

Delete(T,z)

• Find node y to delete using simple deletion

• Let x be a child of y if such a child exists (otherwise x is a null child)

• If y is black, fix up tree so that it is still a red-black tree

• What needs to be fixed?

– Problem 1: y was root, so now we might have red root

– Problem 2: x and parent(y) are both red

– Problem 3: Removal of y violates black height properties of paths that used to go through y

Move To Front (MTF) Technique

• A powerful “balancing” mechanism is the “move to front” idea

• This technique is effective in managing both unsorted lists and binary search trees

• The idea: Whenever an item is accessed (by search or by insertion), it is always moved to the front

– In a list, the front is well-defined

– In a binary search tree, the front of the tree is the root

– A tree that implements this idea is called a splay tree

  • Rotations are not simple single rotations but occur in pairs

• We give some intuition about the power of MTF

Splay Tree Example

Effectiveness in lists

• Reference: Amortized efficiency of list update and paging rules, Sleator and Tarjan, CACM 1985

• Problem statement:

– Suppose you are maintaining an unsorted list where every search must progress from the front of the list to the item (or end of list if search is unsuccessful)

– Operations: search, insert, delete

• Costs: finding or deleting the ith item costs i

• Inserting a new item costs n+1, where n is the number of items currently in the list

• Immediately after an insertion or search of an item i, item i may be moved anywhere closer to the front of the list at no extra cost

• Goal: Find a way to manage list that minimizes total cost of a sequence of operations

Notation for computing costs

• S: sequence of requests (insertions, deletions, searches)

• A: any algorithm for maintaining list

– including those that know S in advance

• cA(S): cost incurred by algorithm A on sequence S not including paid exchanges

• xA(S): # of paid exchanges for A on S (swaps of adjacent items other than the free moves toward the front, each costing 1)

• fA(S): # of free exchanges for A on S

• Example:

– List: 5, 9, 2, 7, 3, 6 and we search for 7

– MTF then has list with 7, 5, 9, 2, 3, 6

– cMTF(S) increases by 4

– xMTF(S) increases by 0 since moving 7 to the front is free

– fMTF(S) increases by 3 since we made 3 free exchanges to move 7
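The cost accounting in this example can be reproduced with a few lines of Python. This is a minimal sketch: `mtf_search` is a hypothetical helper name, and it assumes the item is present in the list (the cost of an unsuccessful search is not modeled here).

```python
def mtf_search(lst, item):
    """Search lst from the front, then move item to the front in place.

    Returns (access_cost, free_exchanges). Assumes item is in lst.
    """
    index = lst.index(item)          # 0-based position of the item
    cost = index + 1                 # finding the ith item costs i
    lst.insert(0, lst.pop(index))    # move to front
    return cost, index               # index adjacent swaps, all of them free


items = [5, 9, 2, 7, 3, 6]
cost, free = mtf_search(items, 7)
# cost is 4, free is 3, and items is now [7, 5, 9, 2, 3, 6]
```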

Performance of MTF

• Thm: For any algorithm A and any sequence S of m operations,

xMTF(S) + cMTF(S) ≤ 2cA(S) + xA(S) – fA(S) – m

• Observation: xMTF(S) = 0

• Interpretation

– MTF incurs at most twice the cost of any other algorithm, even those that know the request sequence in advance

Direct Cost Comparison

• The ideal approach to proving this result would be to show that, for each operation, MTF incurs a cost that is at most twice that of algorithm A

• However, this is clearly not always true

• Example just before tth operation search(1):

– A’s list: 1, 20, 7, 9, 3, 5, 24, 4, 8

  • A’s cost is just 1

– MTF’s list: 5, 24, 8, 3, 9, 7, 20, 4, 1

  • MTF’s cost is 9

• How can this happen?

– Well, since the last access to item 1, items 5, 24, 8, 3, 9, 7, 20 and 4 have been accessed.

– Thus, A must have done some extra work since the last access to 1 in order to have 1 at the current front of the list

– This leads to the ideas of a potential function and amortized analysis

Potential Function Φ

• Consider A, MTF, and S

• Let t be the number of operations performed so far (0 ≤ t ≤ |S|)

• For any t, we define Φ(t) to be the number of inversions between A’s list and MTF’s list

– Inversion: a pair of elements x,y s.t. x appears before y in one list and y appears before x in the other list

• Example

– MTF: 1, 7, 2, 5

– A: 2, 1, 5, 7

– Inversions: (1,2), (2,7), (5,7)
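The inversion count can be computed directly from the definition. The helper below is a sketch for illustration (`inversions` is a hypothetical name) and assumes both lists contain the same distinct items; it checks every pair, which is quadratic but adequate for small examples.

```python
from itertools import combinations


def inversions(list_a, list_b):
    """Return the pairs (x, y), x < y, whose relative order differs between the lists."""
    pos_a = {value: i for i, value in enumerate(list_a)}
    pos_b = {value: i for i, value in enumerate(list_b)}
    return [
        (x, y)
        for x, y in combinations(sorted(pos_a), 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    ]


pairs = inversions([1, 7, 2, 5], [2, 1, 5, 7])
# pairs == [(1, 2), (2, 7), (5, 7)], so the potential is 3
```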

Amortized Cost

• Cost cA(t) is the cost of the tth operation for algorithm A

• Amortized cost aA(t) of the tth operation is cA(t) + Φ(t) - Φ(t-1)

– Cost of tth operation + change in potential function

• Key observation (the Φ terms telescope)

Σt aMTF(t) = Σt [cMTF(t) + Φ(t) - Φ(t-1)] = Φ(|S|) - Φ(0) + Σt cMTF(t)

Thus, cMTF(S) = Σt cMTF(t) = Φ(0) - Φ(|S|) + Σt aMTF(t)

Thus, cMTF(S) ≤ Σt aMTF(t)

• Note Φ(0) = 0 (assuming both algorithms start from the same list) and Φ(|S|) ≥ 0

Amortized Cost Comparison

• Our revised goal is to show that MTF incurs an amortized cost that is at most twice that of algorithm A

• Example just before tth operation search(1):

– A’s list: 1, 20, 7, 9, 3, 5, 24, 4, 8

  • A’s cost is just 1

– MTF’s list: 5, 24, 8, 3, 9, 7, 20, 4, 1

– MTF’s list afterwards: 1, 5, 24, 8, 3, 9, 7, 20, 4

  • MTF’s direct cost is 9

  • The change in potential function is -8 as 8 inversions (all involving 1) are eliminated after 1 is moved to the front of the list

  • MTF’s amortized cost is 9 - 8 = 1

Amortized Cost Comparison Cont’d

• General case

– We are searching for x which is at position i in A’s list and k in MTF’s list

• Direct costs

– A’s cost to access x is then i

– MTF’s cost to access x is k

• Potential function changes

– Let y be the number of items that precede x in MTF’s list but follow x in A’s list

– This means k-y-1 items precede x in both lists

  • Note k-y ≤ i. Why?

– After x is moved to front of MTF’s list

  • y inversions are eliminated

  • k-y-1 inversions are created

– Thus potential function change is k-2y-1

• MTF’s amortized cost is thus: k + (k-2y-1) = 2(k-y) - 1 ≤ 2i-1

• Similar analysis holds for other operations
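The bound can be checked numerically on the running example. The sketch below is an illustration only: `potential` and `mtf_search_amortized` are hypothetical names, both lists are assumed to hold the same distinct items, and A's list is assumed to stay unchanged during the operation.

```python
from itertools import combinations


def potential(list_a, list_mtf):
    """Number of inversions between the two lists."""
    pos_a = {value: i for i, value in enumerate(list_a)}
    pos_m = {value: i for i, value in enumerate(list_mtf)}
    return sum(
        (pos_a[x] < pos_a[y]) != (pos_m[x] < pos_m[y])
        for x, y in combinations(list_a, 2)
    )


def mtf_search_amortized(list_a, list_mtf, x):
    """Return (i, k, amortized cost of MTF) for a search for x."""
    i = list_a.index(x) + 1                      # A's direct cost
    k = list_mtf.index(x) + 1                    # MTF's direct cost
    after = [x] + [v for v in list_mtf if v != x]
    change = potential(list_a, after) - potential(list_a, list_mtf)
    return i, k, k + change


a_list = [1, 20, 7, 9, 3, 5, 24, 4, 8]
mtf_list = [5, 24, 8, 3, 9, 7, 20, 4, 1]
i, k, amortized = mtf_search_amortized(a_list, mtf_list, 1)
# i == 1, k == 9, amortized == 1, which meets the bound 2*i - 1 = 1
```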

Splay Tree Performance

• Analysis of splay trees also uses a potential function and amortized analysis

• Individual operations may take O(n) time

• However, it can be shown that any sequence of m operations including n insertions starting with an empty tree takes O(m log n) time

• Static optimality theorem

– For any sequence of access operations, a splay tree is asymptotically as efficient as the optimum static search tree (that cannot perform any rotations)

Dynamic Optimality Conjecture

• Splay trees are asymptotically as fast on any sequence of operations as any other type of search tree with rotations.

• What does this mean?

– Worst case sequence of splay tree operations takes amortized O(log n) time per operation

– Some sequences of operations take less.

  • Accessing the same ten items over and over again

– Splay tree should then take less on these sequences as well.

• One special case that has been proven:

– search in order from the smallest key to the largest key

– the total time for all n operations is O(n)
