1
Jeff Edmonds
York University COSC 2011
Abstract Data Types • Positions and Pointers • Loop Invariants • System Invariants • Time Complexity • Classifying Functions • Adding Made Easy • Understand Quantifiers • Recursion • Balanced Trees • Heaps • Huffman Codes • Hash Tables • Graphs • Paradigms
Midterm Review
2
Midterm Review• Review slides.
7
Midterm Review
• Review slides.
• Review the assignment notes and solutions!
8
Midterm Review
• Review slides.
• Review the assignment notes and solutions!
• Review 3101
  - Steps 0: Basic Math • First Order Logic • Time Complexity • Logs and Exp • Growth Rates • Adding Made Easy • Recurrence Relations
9
Midterm Review
• Review slides.
• Review the assignment notes and solutions!
• Review 3101
  - Steps 0: Basic Math
  - Steps 1: Loop Invariants
  - Steps 2: Recursion
10
Jeff Edmonds
York University COSC 2011Lecture 1
Abstractions (Hierarchy) • Elements • Sets • Lists, Stacks, & Queues • Trees • Graphs • Iterators • Abstract Positions/Pointers
Abstract Data Types
11
Software Engineering
• Software must be:
  - Readable and understandable: allows correctness to be verified, and software to be easily updated.
  - Correct and complete: works correctly for all expected inputs.
  - Robust: capable of handling unexpected inputs.
  - Adaptable: all programs evolve over time. Programs should be designed so that re-use, generalization and modification are easy.
  - Portable: easily ported to new hardware or operating system platforms.
  - Efficient: makes reasonable use of time and memory resources.
James Elder
Abstract Data Types (ADTs)
• An ADT is a model of a data structure that specifies
  - the type of data stored
  - the operations supported on these data.
• An ADT does not specify how the data are stored or how the operations are implemented.
• The abstraction of an ADT facilitates
  - Design of complex systems. Representing complex data structures by concise ADTs facilitates reasoning about and designing large systems of many interacting data structures.
  - Encapsulation/Modularity. If I just want to use an object / data structure, all I need to know is its ADT (not its internal workings).
Abstraction
James Elder
13
Abstract Data Types
Restricted Data Structures: Sometimes we limit what operations can be done
• for efficiency
• for understanding.
Stack: A list, but elements can only be pushed onto and popped from the top.
Queue: A list, but elements can only be added at the end and removed from the front. Important in handling jobs.
Priority Queue: The "highest priority" element is handled next.
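The restricted operations above can be written down as a Java interface whose implementation stays hidden behind the ADT; a minimal sketch (the names SimpleStack and ArrayStack are illustrative, not the course's API):

```java
import java.util.ArrayList;

// A minimal Stack ADT: the interface says WHAT is supported, not HOW.
interface SimpleStack<E> {
    void push(E e);     // add to the top
    E pop();            // remove and return the top
    boolean isEmpty();
}

// One possible implementation hidden behind the ADT.
class ArrayStack<E> implements SimpleStack<E> {
    private final ArrayList<E> data = new ArrayList<>();
    public void push(E e) { data.add(e); }
    public E pop() {
        if (isEmpty()) throw new RuntimeException("empty stack");
        return data.remove(data.size() - 1);   // top is the last slot
    }
    public boolean isEmpty() { return data.isEmpty(); }
}
```

A user of SimpleStack never sees the ArrayList; swapping in a linked-list implementation would not change any caller.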
14
Data Structure Implementations
• Array List - (Extendable) Array
• Node List - Singly or Doubly Linked List
• Stack - Array, or Singly Linked List
• Queue - Circular Array, or Singly or Doubly Linked List
• Priority Queue - Unsorted doubly-linked list, Sorted doubly-linked list, or Heap (array-based)
• Adaptable Priority Queue - Sorted doubly-linked list with location-aware entries, or Heap with location-aware entries
• Tree - Linked Structure
• Binary Tree - Linked Structure, or Array
15
Jeff Edmonds
York University COSC 2011Lecture 2
Abstract Positions/Pointers • Positions in an Array • Pointers in C • References in Java • Implementing Positions in Trees • Building Trees
Positions and Pointers
16
High Level Positions/Pointers
Positions: Given a data structure, we want to have one or more current elements that we are considering.
Conceptualizations:
• Fingers in pies
• Pins on maps
• Little girl dancing there
• Me
See Goodrich Sec 7.3 Positional Lists
17
Positions/Pointers: Implementations of Positions/Pointers
Now let's redo it in Java.
[Figure: head holds address 2039; the node at 2039 has fields element = 5 and next = 2182; head.next.next follows two links]
head.next = 2182;
The right-hand side of the "=" specifies a memory location. So does its left-hand side. The action is to put the value contained in the first into the second.
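The memory picture above can be mimicked in Java, where a reference plays the role of the address; a small sketch with an illustrative ListNode class (the values 2, 5, 9 are made up):

```java
// Each node holds an element and a reference (the "address") of the next node.
class ListNode {
    int element;
    ListNode next;
    ListNode(int element, ListNode next) { this.element = element; this.next = next; }
}

public class PointerDemo {
    public static void main(String[] args) {
        ListNode third  = new ListNode(9, null);
        ListNode second = new ListNode(5, third);
        ListNode head   = new ListNode(2, second);
        // head.next.next follows two links: it names the third node.
        System.out.println(head.next.next.element); // 9
        // Right side of "=" is read from; left side is written to.
        head.next = third;                          // head now skips the second node
        System.out.println(head.next.element);      // 9
    }
}
```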
18
Implementing Positions in Trees
class LinkedBinaryTree {
  class Node { E element; Node parent; Node left; Node right; }
  private Node root = null;
tree
19
Implementing Positions in Trees
class LinkedBinaryTree {
  Position sibling(Position p) {
    Node n = (Node) p;
    if( n.parent != null )
      if( n.parent.right == n ) return n.parent.left;
      else return n.parent.right;
    else throw new IllegalArgumentException("p is the root");
  }
At any time the user can move a position to the sibling: p3 = tree.sibling(p2);
[Figure: positions p1, p2, p3 in the tree]
20
Implementing Positions in Trees
class LinkedBinaryTree {
  Position addRight(Position p, E e) {
    Node n = (Node) p;
    if( n.right == null ) {
      n.right = new Node(e, n, null, null);
      return n.right;
    } else throw new IllegalArgumentException("p already has a right child");
  }
At any time the user can add a position/node to the right of a position: p3 = tree.addRight(p2, "Toronto");
[Figure: node "Toronto" added as the right child of p2]
21
Implementing Positions in Trees
Defining the class of trees whose nodes can have many children: we use the data structure Set or List to store the Positions of a node's children.
class LinkedTree { …
22
Jeff Edmonds
York University COSC 2011Lecture 3
Contracts • Assertions • Loop Invariants • The Sum of Objects • Insertion and Selection Sort • Binary Search Like Examples • Bucket (Quick) Sort for Humans • Reverse Polish Notation (Stack) • Whose Blocking your View (Stack) • Parsing (Stack) • Data Structure Invariants • Stack and Queue in an Array • Linked Lists
Contracts, Assertions, and Invariants
23
Precondition Postcondition
One Step at a Time
I implored you to not worry about the entire computation. It can be difficult to understand where computations go.
Trust whoever passes you the baton, and go around once.
24
Iterative Algorithm with Loop Invariants• Precondition:
What is true about input
• Post condition: What is true about output.
25
Iterative Algorithm with Loop Invariants
Goal: Your goal is to prove that
• no matter what the input is, as long as it meets the precondition,
• and no matter how many times your algorithm iterates, as long as eventually the exit condition is met,
• then the postcondition is guaranteed to be achieved.
Proves that IF the program terminates then it works:
<PreCond> & <code> ⇒ <PostCond>
26
Iterative Algorithm with Loop Invariants• Loop Invariant:
Picture of what is true at top of loop.
27
Iterative Algorithm with Loop Invariants
• Establishing the Loop Invariant.
• Our computation has just begun. All we know is that we have an input instance that meets the Pre Condition.
• Being lazy, we want to do the minimum work.
• And then prove that it follows that the Loop Invariant is made true.
<preCond>codeA
<loop-invariant>
Establishing Loop Invariant
28
Iterative Algorithm with Loop Invariants
• Maintaining the loop invariant (while making progress)
• We arrived at the top of the loop knowing only • the Loop Invariant is true • and the Exit Condition is not.
• We must take one step (iteration) (making some kind of progress).
• And then prove that the Loop Invariant will be true when we arrive back at the top of the loop.
<loop-invariant_t> ¬<exit Cond> codeB
<loop-invariant_t+1>
Maintaining Loop Invariant
Exit
29
<loop-invariant> <exit Cond> codeC
<postCond>
Obtain the Post Condition
• We know the Loop Invariant is true because we have maintained it. We know the Exit Condition is true because we exited.
• We do a little extra work.
• And then prove that it follows that the Post Condition is true.
Iterative Algorithm with Loop Invariants
30
Iterative Algorithm with Loop Invariants
[Figure: input 88, 14, 98, 25, 62, 52, 79, 30, 23, 31; output 14,23,25,30,31,52,62,79,88,98]
• Precondition: What is true about input
• Post condition: What is true about output.
Insertion Sort
31
Iterative Algorithm with Loop Invariants• Loop Invariant:
Picture of what is true at top of loop.
[Figure: unsorted elements 14, 98, 25, 62, 79, 30; sorted sub-list 23,31,52,88]
Sorted sub-list
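The loop invariant above, a sorted sub-list that grows by one element per iteration, maps directly onto code; a sketch of insertion sort with the invariant recorded as comments:

```java
import java.util.Arrays;

public class InsertionSort {
    public static void sort(int[] a) {
        // Loop invariant: a[0..i-1] is a sorted permutation of the first i inputs.
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            // Shift larger elements right to open a slot for key.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;   // invariant restored with one more element sorted
        }
    }
    public static void main(String[] args) {
        int[] a = {88, 14, 98, 25, 62, 52, 79, 30, 23, 31};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```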
32
Iterative Algorithm with Loop Invariants
• Making progress while maintaining the loop invariant
[Figure: 62 is inserted into the sorted sub-list, giving 23,31,52,62,88; unsorted 14, 98, 25, 79, 30 remain; 6 elements to go, like km left to school]
33
Iterative Algorithm with Loop Invariants
• Beginning & Ending
[Beginning: all n elements remain unsorted ("n km to school"). Ending: 0 elements remain; output 14,23,25,30,31,52,62,79,88,98]
34
Iterative Algorithm with Loop Invariants
• Running Time = 1 + 2 + 3 + … + n = θ(n²)
[Figure: Gauss sum picture]
35
Iterative Algorithm with Loop Invariants
Define Problem • Define Loop Invariants • Define Measure of Progress • Define Step • Define Exit Condition • Maintain Loop Inv • Make Progress • Initial Conditions • Ending
36
Proves that IF the program terminates then it works:
<PreCond> & <code> ⇒ <PostCond>
Establishing the Loop Invariant: <preCond> codeA <loop-invariant>
Clean up loose ends: <loop-invariant> <exit Cond> codeC <postCond>
Maintaining the Loop Invariant: <loop-invariant_t> ¬<exit Cond> codeB <loop-invariant_t+1>
Iterative Algorithm with Loop Invariants
37
Iterative Algorithm with Loop Invariants• Precondition:
What is true about input
• Post condition: What is true about output.
Binary Searchkey 25
3 5 6 13 18 21 21 25 36 43 49 51 53 60 72 74 83 88 91 95
38
Iterative Algorithm with Loop Invariants• Loop Invariant:
Picture of what is true at top of loop.
key 25
3 5 6 13 18 21 21 25 36 43 49 51 53 60 72 74 83 88 91 95
• If the key is contained in the original list, then the key is contained in the sub-list.
39
Iterative Algorithm with Loop Invariants
• Making progress while Maintaining the loop invariant
key 25
3 5 6 13 18 21 21 25 36 43 49 51 53 60 72 74 83 88 91 95
If key ≤ mid, then key is in the left half.
If key > mid, then key is in the right half.
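This maintenance step is essentially the whole algorithm; a sketch of binary search in Java, with the loop invariant as a comment (returning the first index of the key, or -1 when absent, is a choice of this sketch, not necessarily the course's):

```java
public class BinarySearch {
    // Returns an index of key in the sorted array a, or -1 if absent.
    public static int search(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        // Loop invariant: if key is in the original array, then key is in a[lo..hi].
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (key <= a[mid]) hi = mid;   // key is in the left half (including mid)
            else lo = mid + 1;             // key is in the right half
        }
        // Exit: sub-list has size <= 1; a little extra work obtains the postcondition.
        return (lo < a.length && a[lo] == key) ? lo : -1;
    }
}
```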
40
Iterative Algorithm with Loop Invariants
key 25
3 5 6 13 18 21 21 25 36 43 49 51 53 60 72 74 83 88 91 95
If key ≤ mid, then key is in the left half.
If key > mid, then key is in the right half.
• Running Time
The sub-list is of size n, n/2, n/4, n/8, …, 1. Each step takes θ(1) time.
Total = θ(log n)
41
Iterative Algorithm with Loop Invariants
• Beginning & Ending
key 25
3 5 6 13 18 21 21 25 36 43 49 51 53 60 72 74 83 88 91 95
42
Parsing with a Stack
Input: A string of brackets.
Output: Each "(", "{", or "[" must be paired with a matching ")", "}", or "]".
Loop Invariant: A prefix has been read.
• Matched brackets are matched and removed.
• Unmatched brackets are on the stack.
Opening Bracket: push it on the stack.
Closing Bracket: if it matches the bracket on top of the stack, pop and match; else return(unmatched).
[Example: after reading "[( ) ( ( ) ) { ( (", the stack holds the unmatched "[ ( { ( ("]
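The loop invariant above can be implemented with a stack of characters; a sketch (the class name BracketMatcher is illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BracketMatcher {
    // Loop invariant: brackets matched so far have been removed;
    // still-unmatched opening brackets sit on the stack, innermost on top.
    public static boolean matched(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);                        // opening bracket: push
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false;    // closing with nothing to match
                char open = stack.pop();
                if ((c == ')' && open != '(') ||
                    (c == ']' && open != '[') ||
                    (c == '}' && open != '{')) return false;
            }
        }
        return stack.isEmpty();   // anything left on the stack is unmatched
    }
}
```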
43
Dude! You have been teaching 3101 too long. This is not a course on Algorithms, but on Data Structures!
Data Structure Invariants
The importance of invariants is the same.
Differences:
1. An algorithm must terminate with an answer, while systems and data structures may run forever.
2. An algorithm gets its full input at the beginning, while a data structure gets a continuous stream of instructions from the user.
• Both have invariants that must be maintained.
44
Data Structure Invariants
Assume we fly in from Mars and <Invariants_DataStruc, t> is true.
Assume the user correctly calls the Push Operation:
<preCond_Push>: the input is the info for a new element.
The implementer must ensure:
<postCond_Push>: the element is pushed on top of the stack, and <Invariants_DataStruc, t+1> holds.
45
Data Structure Invariants
<Invariants_DataStruc, t> & <preCond_Push> → Push Operation → <postCond_Push> & <Invariants_DataStruc, t+1>
top = top + 1;  A[top] = info;
46
Data Structure Invariants
Queue: Add and Remove from opposite ends.
Algorithm dequeue()
  if isEmpty() then
    throw EmptyQueueException
  else
    info ← A[bottom]
    bottom ← (bottom + 1) mod N
    return info
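The same mod-N wrap-around arithmetic drives both ends of a circular-array queue; a sketch that tracks the front index and the size (field names are illustrative; the slide's "bottom" corresponds to front here):

```java
public class ArrayQueue {
    private final int[] A;
    private int front = 0, size = 0;
    // Invariant: the queue's elements are A[front], A[(front+1) mod N], ...,
    // size of them, in FIFO order.
    public ArrayQueue(int capacity) { A = new int[capacity]; }

    public void enqueue(int info) {
        if (size == A.length) throw new IllegalStateException("full");
        A[(front + size) % A.length] = info;   // rear slot wraps around
        size++;
    }

    public int dequeue() {
        if (size == 0) throw new IllegalStateException("empty");
        int info = A[front];
        front = (front + 1) % A.length;        // front advances mod N
        size--;
        return info;
    }
}
```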
47
Data Structure Invariants
<Invariants_DataStruc, t> & <preCond_Push> → Push Operation → <postCond_Push> & <Invariants_DataStruc, t+1>
Don’t panic. Just draw the pictures and move the pointers.
48
Data Structure Invariants
<Invariants_DataStruc, t> & <preCond_Push> → Push Operation → <postCond_Push> & <Invariants_DataStruc, t+1>
49
Data Structure Invariants
<Invariants_DataStruc, t> & <preCond_Push> → Push Operation → <postCond_Push> & <Invariants_DataStruc, t+1>
Special Case: Empty
50
Data Structure Invariants
<Invariants_DataStruc, t> & <preCond_RemoveRear> → Remove Rear Operation → <postCond_RemoveRear> & <Invariants_DataStruc, t+1>
How about removing an element from the rear?
Is it so easy???
last must point at the second-to-last element.
How do we find it? You have to walk there from first!
Time = θ(# of elements) instead of constant.
51
Data Structure Invariants
               Front      Rear
Add Element    Constant   Constant
Remove Element Constant   θ(n)
Stack: Add and Remove from same end.
Actually, for a Stack the last pointer is not needed.
52
Data Structure Invariants
               Front      Rear
Add Element    Constant   Constant
Remove Element Constant   θ(n)
Stack: Add and Remove from same end.
Queue: Add and Remove from opposite ends.
53
Data Structure Invariants
               Front      Rear
Add Element    Constant   Constant
Remove Element Constant   θ(n) → Constant
[Figure: doubly-linked list with header and trailer sentinel nodes/positions holding the elements]
Doubly-linked lists allow more flexible list operations.
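With header and trailer sentinels, the rear neighbour is reachable without walking the list, so removing at the rear becomes constant time; a sketch (names are illustrative):

```java
public class DoublyLinkedList {
    static class Node {
        int element;
        Node prev, next;
    }
    // Sentinels: header.next is the first real node, trailer.prev the last.
    private final Node header = new Node(), trailer = new Node();

    public DoublyLinkedList() { header.next = trailer; trailer.prev = header; }

    public void addLast(int e) {
        Node n = new Node();
        n.element = e;
        n.prev = trailer.prev;  n.next = trailer;
        trailer.prev.next = n;  trailer.prev = n;
    }

    // Constant time: trailer.prev already points at the last node,
    // and its prev pointer gives the second-to-last -- no walking from first.
    public int removeLast() {
        Node n = trailer.prev;
        if (n == header) throw new IllegalStateException("empty");
        n.prev.next = trailer;  trailer.prev = n.prev;
        return n.element;
    }
}
```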
54
Data Structure Invariants
Exit
55
Jeff Edmonds
York University COSC 2011Lecture 4
Asymptotic Analysis of Time Complexity • History of Classifying Problems • Growth Rates • Time Complexity • Linear vs Constant Time • Binary Search Time θ(log n) • Insertion Sort Time (Quadratic) • Don't Redo Work • Test (linear) vs Search (Exponential) • Multiplying (Quadratic vs Exponential) • Bits of Input • Cryptography • Amortized Time Complexity • Worst Case Input • Classifying Functions (BigOh) • Adding Made Easy • Logs and Exponentials • Understand Quantifiers
56
Some Math
Time Complexity: t(n) = θ(n²)  [time as a function of input size]
Classifying Functions: f(n) = n^θ(1)
Logs and Exps: 2^a × 2^b = 2^(a+b);  2^(log n) = n
Adding Made Easy: ∑_{i=1..n} f(i)
Logic Quantifiers: ∃g ∀b Loves(b,g);  ∀b ∃g Loves(b,g)
Recurrence Relations: T(n) = a T(n/b) + f(n)
57
The Time Complexity of an Algorithm
• Specifies how the running time depends on the size of the input.
A function mapping the "size" of the input (work for me to give you the instance) to the "time" T(n) executed (work for you to solve it).
58
History of Classifying Problems
Computable:
• Exp = 2^n - Brute Force (Infeasible)
• Poly = n^c - Considered Feasible (Mathematicians' dream)
• Quadratic = n² - Slow sorting
• n log n - Fast sorting
• Linear = n - Look at input
• log n - Binary Search
• Constant - Time does not depend on input.
Halting - Impossible
59
Growth Rates
[Plot of time vs input size: 5 log n << n << n² << 2^n. Constant: time does not depend on input; log n: Binary Search; Linear = n: look at input; Quadratic = n²: slow sorting; Exp = 2^n: brute force (infeasible)]
60
Linear vs Constant Time
Search: Input: a linked list. Output: find the end. Alg: walk there. Time = θ(n), where n = # of records.
Insert Front: Input: a linked list. Output: add a record to the front. Alg: play with pointers. Time = 4.
61
Linear vs Constant Time
• Time = 4 = Constant time = O(1). Time does not "depend" on input.
∃ a Java Program J, ∃ an integer k, ∀ inputs I, Time(J,I) ≤ k
Is this "Constant time" = O(1)? Yes, because it is bounded by a constant.
62
Test vs Search
Test/Evaluate: Input: a circuit & an assignment. Output: the value at the output gate. Alg: let values percolate down. Time: θ(# of gates).
Search/Satisfiability: Input: a circuit. Output: an assignment giving true. Alg: try all assignments (Brute Force). Time: 2^n.
[Figure: circuit of AND/OR/NOT gates over inputs x1, x2, x3 = F, T, F]
63
[Figure: n × n grid of single-digit multiplications - n² of them]
Grade School vs Kindergarten
Kindergarten: a × b = a + a + a + … + a (b copies of a).
Running Time: T = θ(b) = linear time.
Grade School: T(n) = θ(n²) = quadratic time.
Which is faster?
92834765225674897 × 838839775901103948759
64
Size of Input Instance
• Size of paper - n = 2 in² - Intuitive
• # of bits - n = 17 bits - Formal
• # of digits - n = 5 digits - Reasonable
• Value - n = 83920 - Unreasonable
# of bits = log₂(Value);  Value = 2^(# of bits)
65
The Time Complexity of an Algorithm
• Specifies how the running time depends on the size of the input.
A function mapping the "size" of the input (work for me to give you the instance) to the "time" T(n) executed (work for you to solve it).
66
Grade School vs Kindergarten
Which is faster? 92834765225674897 × 8388397759011039475
Grade School: n = # digits = 20 → Time ≈ 20² ≈ 400
Kindergarten: b = value = 8388397759011039475 → Time ≈ 8388397759011039475
67
Grade School vs Kindergarten
Which is faster? 92834765225674897 × 8388397759011039475
Grade School: n = # digits = 20 → Time ≈ 20² ≈ 400
Kindergarten: b = value ≈ 10^n → Time ≈ 10^n ≈ exponential!!!
68
Grade School vs Kindergarten
Which is faster? 92834765225674897 × 8388397759011039475
Grade School: n = # digits = 20 → Time ≈ 20² ≈ 400
Kindergarten: adding a single digit multiplies the time by 10!
69
Time Complexity of Algorithm
O(n²): Prove that for every input of size n, the algorithm takes no more than c·n² time.
Ω(n²): Find one input of size n for which the algorithm takes at least this much time.
θ(n²): Do both.
The time complexity of an algorithm isthe largest time required on any input of size n.
70
Time Complexity of Problem
O(n²): Provide an algorithm that solves the problem in no more than this time.
Ω(n²): Prove that no algorithm can solve it faster.
θ(n²): Do both.
The time complexity of a problem is the time complexity of the fastest algorithm that solves the problem.
71
Classifying Functions
Constant << Poly-Logarithmic << Polynomial << Exponential << Exp << Double Exp
(log n)^5 << n^5 << 2^(5n) << 2^(n^5) << 2^(2^(5n))
(log n)^θ(1) << n^θ(1) << 2^θ(n) << 2^(n^θ(1)) << 2^(2^θ(n))
72
Classifying Functions
Polynomial = n^θ(1): Linear θ(n), Quadratic θ(n²), Cubic θ(n³), θ(n⁴), …
Others: θ(n³ log⁷(n)) - the log(n) is not absorbed because it is not a multiplicative constant.
73
BigOh and Theta?
• 5n² + 8n + 2 log n = θ(n²)
Drop low-order terms. Drop the multiplicative constant.
• 5n² log n + 8n + 2 log n = θ(n² log n)
74
Notations
Theta f(n) = θ(g(n)) f(n) ≈ c g(n)
BigOh f(n) = O(g(n)) f(n) ≤ c g(n)
Omega f(n) = Ω(g(n)) f(n) ≥ c g(n)
Little Oh f(n) = o(g(n)) f(n) << c g(n)
Little Omega f(n) = ω(g(n)) f(n) >> c g(n)
75
Definition of Theta
f(n) = θ(g(n)) : ∃c₁, ∃c₂, ∃n₀, ∀n ≥ n₀, c₁·g(n) ≤ f(n) ≤ c₂·g(n)
f(n) is sandwiched between c₁g(n) and c₂g(n)
• for some sufficiently small c₁ (= 0.0001)
• for some sufficiently large c₂ (= 1000)
• for all sufficiently large n
• for some definition of "sufficiently large"
80
Gauss: ∑_{i=1..n} i = 1 + 2 + 3 + … + n = θ(# of terms · last term) = θ(n²)
Arithmetic Sum
[Figure: Gauss pairing picture - pairs summing to n+1]
Adding Made Easy
81
Gauss-like: ∑_{i=1..n} i³ = 1³ + 2³ + 3³ + … + n³ = θ(# of terms · last term)
Arithmetic Sum
True whenever the terms increase slowly.
Adding Made Easy
82
∑_{i=0..n} r^i = r⁰ + r¹ + r² + … + r^n = θ(biggest term)
Geometric Increasing
Adding Made Easy
83
• Geometric Like: If f(n) ≥ 2^Ω(n), then ∑_{i=1..n} f(i) = θ(f(n)).
• Arithmetic Like: If f(n) = n^(θ(1)−1), then ∑_{i=1..n} f(i) = θ(n · f(n)).
• Harmonic: If f(n) = 1/n, then ∑_{i=1..n} f(i) = log_e n + θ(1).
• Bounded Tail: If f(n) ≤ n^(−1−Ω(1)), then ∑_{i=1..n} f(i) = θ(1).
(For +, −, ×, ÷, exp, log functions f(n))
This may seem confusing, but it is really not. It should help you compute most sums easily.
Adding Made Easy
84
Logs and Exp
• Properties of logarithms:
  log_b(xy) = log_b x + log_b y
  log_b(x/y) = log_b x − log_b y
  log_b(x^a) = a·log_b x
  log_b a = log_x a / log_x b
• Properties of exponentials:
  a^(b+c) = a^b · a^c
  a^(bc) = (a^b)^c
  a^b / a^c = a^(b−c)
  b = a^(log_a b)
  b^c = a^(c·log_a b)
85
Say, I have a game for you. We will each choose an integer. You win if yours is bigger. I am so nice, I will even let you go first.
Easy. I choose a trillion trillion.
Well done. That is big!
Understand Quantifiers!!!
But I choose a trillion trillion and oneso I win.
86
You laugh, but this is a very important game in theoretical computer science.
You choose the size of your Java program. Then I choose the size of the input.
Likely |I| >> |J|, so you had better be sure your Java program can handle such long inputs.
Understand Quantifiers!!!
87
In first order logic we can state that I win the game:
∀x, ∃y, y > x
The proof: Let x be an arbitrary integer. Let y = x+1. Note y = x+1 > x.
Understand Quantifiers!!!
Good game.Let me try again. I will win this time!
88
Understand Quantifiers!!!
Fred
LaytonJohn
Bob
Sam
One politician
Fred
LaytonJohn
HarperBob
Sam
Could be a different politician.
∃ politician, ∀ voters, Loves(v, p)
∀ voters, ∃ politician, Loves(v, p)
89
Fred
LaytonJohn
Bob
Sam
Fred
LaytonJohn
HarperBob
Sam
“There is a politician that is loved by everyone.”
This statement is “about” a politician.
The existence of such a politician.
We claim that this politician is “loved by everyone”.
∃ politician, ∀ voters, Loves(v, p)
∀ voters, ∃ politician, Loves(v, p)
[ ]
[ ]
“Every voter loves some politician.”
This statement is “about” voters.
Something is true about every voter.
We claim that he “loves some politician.”
Understand Quantifiers!!!
90
A Computational Problem P states• for each possible input I • what the required output P(I) is.
An Algorithm/Program/Machine M is
• a set of instructions (described by a finite string "M")
• that, on a given input I, follows the instructions and
• produces output M(I), or runs forever.
Eg: Sorting
Eg: Insertion Sort
Understand Quantifiers!!!
91
Problem P is computable if ∃M, ∀I, M(I) = P(I):
there exists a single algorithm/machine that solves P for every input.
Understand Quantifiers!!!
Play the following game to prove it!
92
∃M, ∀I, M(I) = P(I)
Problem P iscomputable if
Understand Quantifiers!!!
Two players:a prover and a disprover.
93
∃M, ∀I, M(I) = P(I)
Problem P iscomputable if
Understand Quantifiers!!!
They read the statement left to right.
The prover produces the object when it is an ∃.
The disprover produces the object when it is a ∀.
The prover can always win if and only if the statement is true.
The order the players go REALLY matters.
94
I have a machine M that I claim works.
I win if M on input I gives the correct output
Oh yeah, I have an input I for which it does not.
∃M, ∀I, M(I) = P(I)
Problem P iscomputable if
What we have been doing all along.
Understand Quantifiers!!!
95
"M, $I, M(I) P(I)M(I)=P(I)"I,$M,
Problem P isuncomputable if
I win if M on input I gives the wrong output
I have a machine M that I claim works.
I find one counter example input I for which
his machine M fails us.
Problem P iscomputable if
Generally very hard to do.
Understand Quantifiers!!!
96
"M, $I, M(I) Halting(I)M(I)=Sorting(I)"I,$M,"I, $M, M(I) = Halting(I)
The order the players go REALY matters.
If you don’t know if it is true or not, trust the
game.
true
Problem P isuncomputable if
Problem P iscomputable if true
Understand Quantifiers!!!
97
"M, $I, M(I) Halting(I)
Given I either Halting(I) = yes orHalting(I) = no.
I give you an input I.
"I, Myes(I) says yes
"I, Mno(I) says no
"I, $M, M(I) = Halting(I)
I don’t know which, but one of these does the trick.
true
trueM(I)=Sorting(I)"I,$M,
Problem P iscomputable if true
Problem P isuncomputable if
A tricky one.
Understand Quantifiers!!!
98
• Problem P is computable in polynomial time:
  ∃M, ∃c, ∃n₀, ∀I, M(I)=P(I) & (|I| < n₀ or Time(M,I) ≤ |I|^c)
• Problem P is not computable in polynomial time:
  ∀M, ∀c, ∀n₀, ∃I, M(I)≠P(I) or (|I| ≥ n₀ & Time(M,I) > |I|^c)
• Problem P is computable in exponential time:
  ∃M, ∃c, ∃n₀, ∀I, M(I)=P(I) & (|I| < n₀ or Time(M,I) ≤ 2^(c|I|))
• The computational class "Exponential Time" is strictly bigger than the computational class "Polynomial Time":
  ∃P, [∀M, ∀c, ∀n₀, ∃I, M(I)≠P(I) or (|I| ≥ n₀ & Time(M,I) > |I|^c)] & [∃M, ∃c, ∃n₀, ∀I, M(I)=P(I) & (|I| < n₀ or Time(M,I) ≤ 2^(c|I|))]
Understand Quantifiers!!!
99
Jeff Edmonds
York UniversityCOSC 2011Lecture 5
One Step at a Time • Stack of Stack Frames • Friends and Strong Induction • Recurrence Relations • Towers of Hanoi • Check List • Merge & Quick Sort • Simple Recursion on Trees • Binary Search Tree • Things not to do • Heap Sort & Priority Queues • Trees Representing Equations • Pretty Print • Parsing • Iterate over all s-t Paths • Recursive Images • Ackermann's Function
Recursion
100
Precondition Postcondition
One Step at a Time
I implored you to not worry about the entire computation. It can be difficult to understand where computations go.
Strange(x,y):
  x1 = x/4;     y1 = 3y;  f1 = Strange( x1, y1 );
  x2 = x − 3x1;  y2 = y;   f2 = Strange( x2, y2 );
  return( f1+f2 );
[Friend sub-instances: (x/4, 3y) and (x − 3·(x/4), y)]
101
Friends & Strong Induction
• Consider your input instance.
• If it is small enough, solve it on your own.
• Allocate work: construct one or more sub-instances
  - each must be smaller
  - and must meet the precondition.
• Assume by magic your friends give you the answer for these.
• Use this help to solve your own instance.
• Do not worry about anything else:
  - do not micro-manage friends by tracing out what they and their friends' friends do;
  - do not worry about who your boss is.
Know - Precond: ints x, y.  Postcond: ???
Strange(x,y): If x < 4 then return( xy );
  x1 = x/4;     y1 = 3y;  f1 = Strange( x1, y1 );
  x2 = x − 3x1;  y2 = y;   f2 = Strange( x2, y2 );
  return( f1+f2 );
Trace: x = 30, y = 5 → friend 1 gets x1 = 7, y1 = 15 and returns f1 = 105; friend 2 gets x2 = 9, y2 = 5 and returns f2 = 45; return 150.
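The trace above suggests the hidden postcondition: Strange(x,y) returns x·y, since f1 + f2 = (x1)(3y) + (x − 3·x1)·y = x·y. A runnable sketch (the proof idea is recorded in the comments):

```java
public class StrangeDemo {
    // Postcondition (by strong induction on x): returns x*y.
    // f1 = (x/4)*(3y) and f2 = (x - 3*(x/4))*y, so f1 + f2 = x*y.
    public static int strange(int x, int y) {
        if (x < 4) return x * y;                // base case: solve it yourself
        int x1 = x / 4,      y1 = 3 * y;        // friend 1's smaller sub-instance
        int f1 = strange(x1, y1);
        int x2 = x - 3 * x1, y2 = y;            // friend 2's smaller sub-instance
        int f2 = strange(x2, y2);
        return f1 + f2;                         // trust the friends; combine
    }
}
```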
102
Recurrence Relations ≈ Time of a Recursive Program
procedure Eg(In)
  n = |In|
  if( n ≤ 1 ) then put "Hi"
  else
    loop i = 1..n^c: put "Hi"
    loop i = 1..a: Eg(In/b)    % In/b = In cut into b pieces
T(1) = 1;  T(n) = a·T(n/b) + n^c
• n is the "size" of our input.
• a is the number of "friends".
• n/b is the "size" of each friend's input.
• n^c is the work I personally do.
103
T(n) = [recursion tree: one instance of size n, then 4 of size n/2, 16 of size n/4, …, down to instances of size 1]
104
Evaluating: T(n) = aT(n/b) + f(n)
Level             | Instance size | Work in stack frame | # stack frames  | Work in Level
0                 | n             | f(n)                | 1               | 1 · f(n)
1                 | n/b           | f(n/b)              | a               | a · f(n/b)
2                 | n/b²          | f(n/b²)             | a²              | a² · f(n/b²)
i                 | n/b^i         | f(n/b^i)            | a^i             | a^i · f(n/b^i)
h = log n / log b | n/b^h = 1     | T(1)                | n^(log a/log b) | n^(log a/log b) · T(1)
Total Work: T(n) = ∑_{i=0..h} a^i · f(n/b^i)
105
Evaluating: T(n) = aT(n/b)+f(n)
106
Evaluating: T(n) = aT(n/b) + n^c = 4T(n/2) + n
Time for top level: f(n) = n^c = n¹
Time for base cases: θ(n^(log a/log b)) = θ(n^(log 4/log 2)) = θ(n²)
Dominated?: c = 1 < 2 = log a/log b
Hence, T(n) = θ(base cases) = θ(n^(log a/log b)) = θ(n²).
If we reduce the number of friends from 4 to 3, is the savings just 25%?
107
Evaluating: T(n) = aT(n/b) + n^c = 3T(n/2) + n
Time for top level: f(n) = n^c = n¹
Time for base cases: θ(n^(log a/log b)) = θ(n^(log 3/log 2)) = θ(n^1.58)
Dominated?: c = 1 < 1.58 = log a/log b
Hence, T(n) = θ(base cases) = θ(n^(log a/log b)) = θ(n^1.58).
Not just a 25% savings! θ(n²) vs θ(n^1.58…)
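The gap between 4 friends and 3 friends can be checked numerically by unrolling T(n) = aT(n/2) + n; a small sketch (for a = 4 the exact value works out to 2n² − n on powers of 2):

```java
public class RecurrenceDemo {
    // Evaluate T(1) = 1, T(n) = a*T(n/2) + n for n a power of 2.
    public static long T(int a, long n) {
        if (n <= 1) return 1;          // base case
        return a * T(a, n / 2) + n;    // a friends of half the size, plus n work
    }
    public static void main(String[] args) {
        // a = 4 grows like n^2; a = 3 grows like n^1.58 -- far more than 25% apart.
        for (long n = 1; n <= (1 << 10); n *= 2)
            System.out.println(n + ": " + T(4, n) + " vs " + T(3, n));
    }
}
```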
108
Evaluating: T(n) = aT(n/b) + n^c = 3T(n/2) + n²
Time for top level: n², c = 2
Time for base cases: θ(n^(log a/log b)) = θ(n^(log 3/log 2)) = θ(n^1.58)
Dominated?: c = 2 > 1.58 = log a/log b
Hence, T(n) = θ(top level) = θ(n²).
109
Evaluating: T(n) = aT(n/b) + n^c = 3T(n/2) + n^1.58
Time for top level: n^1.58, c = 1.58
Time for base cases: θ(n^(log a/log b)) = θ(n^(log 3/log 2)) = θ(n^1.58)
Dominated?: c = 1.58 = log a/log b. Hence each of the θ(log n) levels requires θ(n^1.58) work; the sum of the levels is no longer geometric.
Hence T(n) = θ(n^1.58 · log n).
110
Evaluating: T(n) = aT(n/b) + f(n)
Level             | Instance size | Work in stack frame | # stack frames  | Work in Level
0                 | n             | f(n)                | 1               | 1 · f(n)
1                 | n/b           | f(n/b)              | a               | a · f(n/b)
2                 | n/b²          | f(n/b²)             | a²              | a² · f(n/b²)
i                 | n/b^i         | f(n/b^i)            | a^i             | a^i · f(n/b^i)
h = log n / log b | n/b^h = 1     | T(1)                | n^(log a/log b) | n^(log a/log b) · T(1)
All levels the same: Top Level to Base Cases
111
Evaluating: T(n) = aT(n/b)+f(n)
112
Check Lists for Recursive Programs
This is the format of “all” recursive programs.Don’t deviate from this.
Or else!
113
Merge Sort
[88, 14, 98, 25, 62, 52, 79, 30, 23, 31]
Split the set into two (no real work).
Get one friend to sort the first half: 25,31,52,88,98
Get one friend to sort the second half: 14,23,30,62,79
114
Merge Sort
Merge two sorted lists into one
25,31,52,88,98
14,23,30,62,79
14,23,25,30,31,52,62,79,88,98
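The split/merge story above can be sketched as runnable Java (an illustrative out-of-place version, not necessarily the course's implementation):

```java
import java.util.Arrays;

public class MergeSort {
    public static int[] sort(int[] a) {
        if (a.length <= 1) return a;                              // small enough: done
        int mid = a.length / 2;                                   // split (no real work)
        int[] left  = sort(Arrays.copyOfRange(a, 0, mid));        // friend 1
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length)); // friend 2
        return merge(left, right);                                // merge two sorted lists
    }

    private static int[] merge(int[] l, int[] r) {
        int[] out = new int[l.length + r.length];
        int i = 0, j = 0, k = 0;
        while (i < l.length && j < r.length)
            out[k++] = (l[i] <= r[j]) ? l[i++] : r[j++];  // take the smaller front
        while (i < l.length) out[k++] = l[i++];           // copy leftovers
        while (j < r.length) out[k++] = r[j++];
        return out;
    }
}
```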
115
Java Implementation
Andranik Mirzaian
116
Java Implementation
Andranik Mirzaian
117
Quick Sort
[88, 14, 98, 25, 62, 52, 79, 30, 23, 31]
Partition the set into two using a randomly chosen pivot:
[14, 25, 30, 23, 31] ≤ 52 ≤ [88, 98, 62, 79]
118
Quick Sort
[14, 25, 30, 23, 31] ≤ 52 ≤ [88, 98, 62, 79]
Get one friend to sort the first half: 14,23,25,30,31
Get one friend to sort the second half: 62,79,88,98
119
Quick Sort
14,23,25,30,31 + 52 + 62,79,88,98
Glue the pieces together (no real work): 14,23,25,30,31,52,62,79,88,98
Faster because it is all done "in place", i.e. in the input array.
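The partition-around-a-random-pivot step can be sketched in place, as the slide emphasizes (the Lomuto-style partition below is one common choice, not necessarily the course's):

```java
import java.util.Random;

public class QuickSort {
    private static final Random rand = new Random();

    public static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;                 // 0 or 1 element: already sorted
        int p = partition(a, lo, hi);         // pivot lands in its final spot
        sort(a, lo, p - 1);                   // friend 1: left part
        sort(a, p + 1, hi);                   // friend 2: right part
    }

    // Partition around a randomly chosen pivot, in place.
    private static int partition(int[] a, int lo, int hi) {
        swap(a, lo + rand.nextInt(hi - lo + 1), hi);  // move random pivot to the end
        int pivot = a[hi], small = lo;
        for (int i = lo; i < hi; i++)
            if (a[i] <= pivot) swap(a, small++, i);   // grow the "<= pivot" region
        swap(a, small, hi);                           // pivot between the two parts
        return small;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```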
120
Java Implementation
Andranik Mirzaian
121
Recursion on Trees
A binary tree is:
- the empty tree, or
- a node with a right and a left sub-tree.
[Figure: example binary tree]
(define)
122
Recursion on Trees: number of nodes = ?
[Figure: example tree; left subtree has 6 nodes, right subtree has 5]
Get help from friends
(friends)
123
Recursion on Trees: number of nodes
= number on left + number on right + 1 = 6 + 5 + 1 = 12
[Figure: same example tree]
(friends)
124
Recursion on Trees
Base Case ?
3
8
1
3 2
2
7
6
5
9
4
1
number of nodes
0
Base case!
(base case)
125
Recursion on Trees (communication)
Being lazy, I will only consider my root node and my communication with my friends.
I will never look at my children's subtrees but will trust them to my friends.
[Figure: same example tree]
126
Recursion on Trees (code)
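The slide's code was not captured here; the counting rule above, with the empty tree as base case, typically looks like this in Java (a sketch; names are illustrative):

```java
public class TreeCount {
    public static class Node {
        public Node left, right;
        public Node(Node left, Node right) { this.left = left; this.right = right; }
    }

    // number of nodes = number on left + number on right + 1;
    // the empty tree (null) is the base case with 0 nodes.
    public static int numberNodes(Node tree) {
        if (tree == null) return 0;           // base case: empty tree
        return numberNodes(tree.left)         // friend 1: left subtree
             + numberNodes(tree.right)        // friend 2: right subtree
             + 1;                             // the root itself
    }
}
```

One stack frame per node (and per empty subtree), constant work each, gives θ(n) time, matching the analysis two slides below.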
127
Recursion on Trees
Designing Program/Test Cases
• Generic case: subtrees of sizes n1 and n2 → n1 + n2 + 1 nodes. Same code works!
• One empty subtree: sizes n1 and 0 → n1 + 0 + 1. Same code works!
• Two empty subtrees: 0 + 0 + 1 = 1. Try the same code.
• Base Case: the empty tree → 0.
(cases)
128
Recursion on Trees
One stack frame for each node in the tree and for each empty tree hanging off, and constant work per stack frame.
Time: T(n) = ∑_{stack frames} work done by the stack frame = θ(n) × θ(1) = θ(n)
(time)
129
Recursion on Trees: Many Children
number of nodes: one friend for each sub-tree.
4 + 2 + 4 + 2 + 1 = 13
[Figure: root with four subtrees of sizes 4, 2, 4, 2]
(friends)(mult-children)
130
Recursion on Trees
Designing Program/Test Cases (many children)
• Generic case: subtrees of sizes n1, n2, n3 → n1 + n2 + n3 + 1. Same code works!
• One subtree of size n1 → n1 + 1. Same code works!
• Leaf: 0 + 1 = 1. Try the same code.
• But is the empty-tree case needed (if it cannot be the input)?
(cases)(mult-children)
131
Recursion on Trees (code)(mult-children)
132
Recursion on Trees
Time: T(n) = ∑_{stack frames} work done by the stack frame
= ∑_{stack frames} θ(# subroutine calls)
= ∑_{nodes} θ(# edges at the node)
= θ(edges in tree) = θ(nodes in tree) = θ(n)
(time)(mult-children)
133
We pass the recursive program a "binary tree". But what type is it really? This confused Jeff at first.
class LinkedBinaryTree {
  class Node { E element; Node parent; Node left; Node right; }
  Node root = null;
Tree
Recursion on Trees
134
One would think tree is of type LinkedBinaryTree. Then getting its left subtree, left_Tree(LinkedBinaryTree tree), would be confusing.
class LinkedBinaryTree {
  class Node { E element; Node parent; Node left; Node right; }
  Node root = null;
136
It is easier to have tree be of type Node. But it is thought of as the subtree rooted at the node pointed at. The left child is then
class LinkedBinaryTree { class Node { E element; Node parent; Node left; Node right; } Node root = null;
Tree
Recursion on Trees
(Node tree)
tree
tree.left or tree.Getleft()
or Tree.leftSub(tree) or leftSub(tree)
137
But the outside user does not know about pointers to nodes.

class LinkedBinaryTree<E> {
    class Node { E element; Node parent; Node left; Node right; }
    Node root = null;
    public int NumberNodes() { return NumberNodesRec( root ); }
    private int NumberNodesRec( Node tree ) { ... }
}

Recursion on Trees
138
Jeff Edmonds
York University COSC 2011, Lecture 6
Balanced Trees
Dictionary/Map ADT
Binary Search Trees
Insertions and Deletions
AVL Trees
Rebalancing AVL Trees
Union-Find Partition
Heaps & Priority Queues
Communication & Huffman Codes
(Splay Trees)
139
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
k1,v1k2,v2k3,v3k4,v4
Examples:• key = word, value = definition• key = social insurance number
value = person’s data
140
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
k1,v1k2,v2k3,v3k4,v4
Implementations:       Insert   Search
Unordered Array        O(1)     O(n)
Array
0
1
2
3
4
5
6
7
…
k5,v5
141
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
2,v34,v47,v19,v2
Implementations:       Insert   Search
Unordered Array        O(1)     O(n)
Ordered Array          O(n)     O(log n)
Array
0
1
2
3
4
5
6
7
…
6,v5
142
trailerheader
nodes/positions
entries
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
2,v34,v47,v19,v2
Implementations:       Insert   Search
Unordered Array        O(1)     O(n)
Ordered Array          O(n)     O(log n)
Ordered Linked List    O(n)     O(n)
6,v5
Inserting is O(1) if you have the spot, but O(n) to find the spot.
143
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
2,v34,v47,v19,v2
Implementations:       Insert     Search
Unordered Array        O(1)       O(n)
Ordered Array          O(n)       O(log n)
Binary Search Tree     O(log n)   O(log n)
144
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
2,v34,v47,v19,v2
Implementations:       Insert     Search
Unordered Array        O(1)       O(n)
Ordered Array          O(n)       O(log n)
Binary Search Tree     O(log n)   O(log n)
Heaps                  O(log n)   O(n)  (Max: O(1))
Heaps are good for Priority Queues.
145
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
2,v34,v47,v19,v2
Implementations:       Insert        Search
Unordered Array        O(1)          O(n)
Ordered Array          O(n)          O(log n)
Binary Search Tree     O(log n)      O(log n)
Heaps                  O(log n)      O(n)  (Max: O(1))
Hash Tables            O(1) (Avg)    O(1) (Avg)
Hash Tables are very fast, but keys have no order.
Next
146
[Table comparing Unsorted List, Sorted List, Balanced Trees, Splay Trees, Heap (Priority Queue), and Hash Tables (Dictionary, static) on Search, Insert/Delete, Find Max, and Find Next in Order. Highlights: Balanced Trees are O(log n) for all operations; Splay Trees are O(log n) amortized (worst case O(n), better in practice); Heaps give O(1) Find Max and O(log n) Insert/Delete but O(n) Search; Hash Tables give O(1) Insert/Search but O(n) for Find Max and Find Next in Order; Sorted Lists give O(1) Find Max and Find Next but O(n) Insert.]
From binary search to Binary Search Trees
147
Binary Search Tree
All nodes in left subtree ≤ Any node ≤ All nodes in right subtree
key 17
Algorithm TreeSearch(k, v)
    v = T.root()
    loop
        if T.isExternal(v)
            return "not there"
        if k < key(v)
            v = T.left(v)
        else if k = key(v)
            return v
        else { k > key(v) }
            v = T.right(v)
    end loop
Move down the tree. Loop Invariant: If the key is contained in the original tree, then the key is contained in the sub-tree rooted at the current node.
Iterative Algorithm
key 17
Recursive Algorithm: If the key is not at the root, ask a friend to look for it in the appropriate subtree.

Algorithm TreeSearch(k, v)
    if T.isExternal(v)
        return "not there"
    if k < key(v)
        return TreeSearch(k, T.left(v))
    else if k = key(v)
        return v
    else { k > key(v) }
        return TreeSearch(k, T.right(v))
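The recursive pseudocode can be rendered in Java roughly as follows (a sketch with illustrative names, using null in place of external nodes):

```java
class BSTSearch {
    static class Node {
        int key;
        Node left, right;
        Node(int k, Node l, Node r) { key = k; left = l; right = r; }
    }

    // Returns the node holding k, or null ("not there").
    static Node treeSearch(int k, Node v) {
        if (v == null) return null;               // external node: not there
        if (k < v.key) return treeSearch(k, v.left);
        if (k == v.key) return v;
        return treeSearch(k, v.right);            // k > key(v)
    }
}
```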
Insertions/Deletions
To insert(key, data):
We search for key.
Not being there, we end up in an empty tree.
Insert the key there.
Insert 10
Insertions/Deletions
To Delete(keydel, data):
If it does not have two children,
point its one child at its parent.
Delete 4
keydel
Insertions/Deletions
To Delete(keydel, data):
else find the next key, keynext, in order: go right once, then left, left, left, … to an empty tree.
Replace keydel (the key to delete) with keynext, and point keynext's one child at its parent.
Delete 3
keynext
keydel
Performance:
• find, insert and remove take O(height) time.
• In a balanced tree, the height is O(log n).
• In the worst case, it is O(n).
• Thus it is worthwhile to balance the tree (next topic)!
AVL Trees: AVL trees are "mostly" balanced. A tree is said to be an AVL Tree if and only if the heights of siblings differ by at most 1:
balanceFactor(v) = height(rightChild(v)) − height(leftChild(v)) ∈ {−1, 0, 1}.
Claim: The height of an AVL tree storing n keys is O(log n).
[Example tree: subtrees of heights 2 and 3 give balanceFactor = 2 − 3 = −1.]
Cases
Rebalancing after an Insertion
[Diagrams: the four imbalance cases after an insertion. In each, the subtree has height h; z is the imbalanced node with balanceFactor ±2, y its taller child with balanceFactor ±1 or ∓1, and the subtrees T0–T3 have heights h−2 and h−3 (one of the grandchild subtrees is h−3 and one is h−4).]
Rebalancing after an Insertion
Inserting new leaf 6 into the AVL tree increases heights along the path from leaf to root. Problem! Some node's balanceFactor becomes +2.
Rebalancing after an Insertion
Denote
z = the lowest imbalanced node
y = the child of z with the taller subtree
x = the child of y with the taller subtree
Inserting new leaf 6 into AVL tree
Rebalancing after an Insertion
[Diagrams: trinode restructuring of z, y, x with subtrees T1–T4 — rotateL(y) followed by rotateR(z); the order y ≤ x ≤ z is preserved.]
Inserting new leaf 6 into AVL tree
Inserting new leaf 6 into AVL tree
Rebalancing after an Insertion
[Diagram: after restructuring, x is the subtree root with children y and z, holding subtrees T1–T4.]
• This subtree is balanced.
• And shorter by one.
• Hence the whole is an AVL Tree.
162
[AVL tree diagram with heights]
Rebalancing after an Insertion — Example: Insert 12
163
[AVL tree diagram with heights]
Step 1.1: top-down search to external node w.
Rebalancing after an Insertion — Example: Insert 12
164
[AVL tree diagram with heights]
Step 1.2: expand w and insert the new item 12 in it.
Rebalancing after an Insertion — Example: Insert 12
165
[AVL tree diagram with heights]
Step 2.1: move up along the ancestral path of w; update ancestor heights; find the unbalanced node (imbalance).
Rebalancing after an Insertion — Example: Insert 12
166
[AVL tree diagram with heights]
Step 2.2: trinode x, y, z discovered (needs double rotation).
Rebalancing after an Insertion — Example: Insert 12
167
[AVL tree diagram with heights]
Step 2.3: trinode restructured; balance restored. DONE!
Rebalancing after an Insertion — Example: Insert 12
Rebalancing after a deletion
Very similar to before.
Unfortunately, trinode restructuring may reduce the height of the subtree, causing another imbalance further up the tree.
Thus this search and repair process must in the worst case be repeated until we reach the root.
See text for implementation.
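As a companion to "see the text", here is a hedged sketch of one single rotation with the height bookkeeping a rebalance needs (field and method names are my own, not the textbook's):

```java
class AvlRotation {
    static class Node {
        int key, height;
        Node left, right;
        Node(int k, Node l, Node r) { key = k; left = l; right = r; update(this); }
    }

    static int h(Node v) { return v == null ? -1 : v.height; }

    static void update(Node v) { v.height = 1 + Math.max(h(v.left), h(v.right)); }

    // Right rotation at z: y = z.left moves up, z becomes y's right child.
    static Node rotateR(Node z) {
        Node y = z.left;
        z.left = y.right;   // subtree T2 moves across
        y.right = z;
        update(z);          // recompute z first: it is now below y
        update(y);
        return y;           // new subtree root
    }
}
```

A double rotation (for the zig-zag cases) is just rotateL at y followed by rotateR at z.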
169
End
Midterm Review
170
Union-Find Data structure.
Average time = α(E), the inverse Ackermann function, which is ≤ 4 for any practical input size.
171
Heaps, Heap Sort, & Priority Queues
J. W. J. Williams, 1964
172
Abstract Data Types
Restricted Data Structure: Sometimes we limit what operations can be done, for efficiency and for understanding.
Stack: A list, but elements can only be pushed onto and popped from the top.
Queue: A list, but elements can only be added at the end and removed from the front. Important in handling jobs.
Priority Queue: The "highest priority" element is handled next.
173
Priority Queues
                 Sorted List   Unsorted List   Heap
Insert           O(n)          O(1)            O(log n)
Remove max       O(1)          O(n)            O(log n)

• Items arrive with a priority.
• The item removed is the one with the highest priority.
174
Heap Definition
• Completely balanced binary tree.
• The value of each node ≥ each of the node's children. (The left or right child could be the larger one.)
Where can 1 go? Where can 8 go? Where can 9 go? The maximum is at the root.
175
Heap Data StructureCompletely Balanced Binary Tree
Implemented by an Array
176
Heap Pop/Push/Changes
With Pop, a Priority Queue returns the highest priority data item.
This is at the root.
177
Heap Pop/Push/Changes
But this is now the wrong shape! To keep the shape of the tree, which space should be deleted?
178
Heap Pop/Push/Changes
What do we do with the element that was there? Move it to the root.
179
Heap Pop/Push/Changes
But now it is not a heap! The left and right subtrees still are heaps.
180
Heap Pop/Push/Changes
But now it is not a heap!
The 3 “bubbles” down until it finds its spot.
The max of these three moves up.
Time = O(log n)
181
When inserting a new item, to keep the shape of the tree, which new space should be filled?
Heap Pop/Push/Changes
182
Heap Pop/Push/Changes
But now it is not a heap! The 21 "bubbles" up until it finds its spot. The max of these two moves up.
Time = O(log n)
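The pop/push slides above can be sketched as an array-backed max-heap (a simplified illustration, not the course's implementation; parent of index i is (i−1)/2, children are 2i+1 and 2i+2):

```java
import java.util.ArrayList;

class ArrayHeap {
    private final ArrayList<Integer> a = new ArrayList<>();

    void push(int x) {
        a.add(x);                          // fill the new shape position
        int i = a.size() - 1;
        while (i > 0 && a.get(i) > a.get((i - 1) / 2)) {   // bubble up
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    int pop() {
        int max = a.get(0);                // highest priority is at the root
        a.set(0, a.get(a.size() - 1));     // move the last element to the root
        a.remove(a.size() - 1);            // delete the last shape position
        int i = 0;
        while (true) {                     // bubble down
            int l = 2 * i + 1, r = 2 * i + 2, big = i;
            if (l < a.size() && a.get(l) > a.get(big)) big = l;
            if (r < a.size() && a.get(r) > a.get(big)) big = r;
            if (big == i) break;           // the max of the three is on top
            swap(i, big);
            i = big;
        }
        return max;
    }

    private void swap(int i, int j) {
        int t = a.get(i); a.set(i, a.get(j)); a.set(j, t);
    }
}
```

Both push and pop do O(log n) swaps, one per level of the completely balanced tree.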
183
Adaptable Heap Pop/Push/Changes
But now it is not a heap! The 39 "bubbles" down or up until it finds its spot.
Suppose some outside user knows about some data item c, remembers where it is in the heap, and changes its priority from 21 to 39.
184
Adaptable Heap Pop/Push/Changes
But now it is not a heap! The 39 "bubbles" down or up until it finds its spot.
Suppose some outside user also knows about data item f, and its location in the heap just changed. The Heap must be able to find this outside user and tell him it moved.
Time = O(log n)
185
Heap Implementation
• A location-aware heap entry is an object storing: key, value, and the position of the entry in the underlying heap.
• In turn, each heap position stores an entry.
• Back pointers are updated during entry swaps.
Last Update: Oct 23, 2014
Andy 185
186
Selection Sort
Largest i values are sorted on the side. Remaining values are off to the side. Selection: repeatedly find the max of the remaining values.
Max is easier to find if a heap.
Selection
187
Heap Sort
Largest i values are sorted on the side. Remaining values are in a heap.
O(n log n) time.
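Heap Sort can be sketched in place (an illustrative sketch assuming the usual array heap layout — parent (i−1)/2, children 2i+1 and 2i+2): first heapify, then repeatedly swap the max into the sorted region on the right.

```java
class HeapSort {
    static void sort(int[] a) {
        // Heapify: bubble down from the last internal node to the root.
        for (int i = a.length / 2 - 1; i >= 0; i--) bubbleDown(a, i, a.length);
        // Repeatedly move the max to the sorted side and shrink the heap.
        for (int end = a.length - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t;   // largest joins sorted side
            bubbleDown(a, 0, end);
        }
    }

    static void bubbleDown(int[] a, int i, int n) {
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, big = i;
            if (l < n && a[l] > a[big]) big = l;
            if (r < n && a[r] > a[big]) big = r;
            if (big == i) return;
            int t = a[i]; a[i] = a[big]; a[big] = t;
            i = big;
        }
    }
}
```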
188
Communication & Entropy
Claude Shannon (1948)
Use a Huffman Code described by a binary tree.
001000101
I first get the first symbol, then I start over to get the next.
189
Communication & Entropy
Claude Shannon (1948)
Objects that are more likely will have shorter codes.
I get it. I am likely to answer this one, so you give it a 1-bit code.
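Building the Huffman tree can be sketched with a priority queue: repeatedly merge the two least likely subtrees, so more likely symbols end up nearer the root with shorter codes. (The class design and the frequencies in the test are mine, for illustration only.)

```java
import java.util.PriorityQueue;

class Huffman {
    static class Tree implements Comparable<Tree> {
        double freq; char symbol; Tree left, right;
        Tree(double f, char s) { freq = f; symbol = s; }                     // leaf
        Tree(Tree l, Tree r) { freq = l.freq + r.freq; left = l; right = r; } // merge
        public int compareTo(Tree o) { return Double.compare(freq, o.freq); }

        // Code length of symbol c = depth of its leaf (-1 if absent).
        int depthOf(char c, int d) {
            if (left == null) return symbol == c ? d : -1;
            int dl = left.depthOf(c, d + 1);
            return dl >= 0 ? dl : right.depthOf(c, d + 1);
        }
    }

    static Tree build(Tree... leaves) {
        PriorityQueue<Tree> q = new PriorityQueue<>();
        for (Tree t : leaves) q.add(t);
        while (q.size() > 1) q.add(new Tree(q.poll(), q.poll())); // merge two least likely
        return q.poll();
    }
}
```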
190
Jeff Edmonds
York University COSC 2011, Lecture 7
Hash Tables
Dictionary/Map ADTDirect AddressingHash TablesRandom Algorithms and Hash FunctionsKey to IntegerSeparate ChainingProbe SequencePut, Get, Del, IteratorsRunning TimeSimpler Schemes
191
Random Balls in Bins Throw m/2 balls (keys)
randomlyinto m bins (array cells)
The balls get spread out reasonably well.
– Exp( # of balls a ball shares a bin with ) = O(1)
– Only O(1) bins contain as many as O(log n) balls.
192
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
k1,v1k2,v2k3,v3k4,v4
Examples:• key = word, value = definition• key = social insurance number
value = person’s data
193
• Map ADT methods:– put(k, v): insert entry (k, v) into the map M.
• If there is a previous value associated with k return it.• (Multi-Map allows multiple values with same key)
– get(k): returns value v associated with key k. (else null)– remove(k): remove key k and its associated value.– size(), isEmpty()– Iterator:
• keys(): over the keys k in M• values(): over the values v in M• entries(): over the entries k,v in M
Dictionary/Map ADT
194
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
k1,v1k2,v2k3,v3k4,v4
Implementations:       Insert   Search
Unordered Array        O(1)     O(n)
Array
0
1
2
3
4
5
6
7
…
k5,v5
195
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
2,v34,v47,v19,v2
Implementations:       Insert   Search
Unordered Array        O(1)     O(n)
Ordered Array          O(n)     O(log n)
Array
0
1
2
3
4
5
6
7
…
6,v5
196
trailerheader
nodes/positions
entries
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
2,v34,v47,v19,v2
Implementations:       Insert   Search
Unordered Array        O(1)     O(n)
Ordered Array          O(n)     O(log n)
Ordered Linked List    O(n)     O(n)
6,v5
Inserting is O(1) if you have the spot, but O(n) to find the spot.
197
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
2,v34,v47,v19,v2
Implementations:       Insert     Search
Unordered Array        O(1)       O(n)
Ordered Array          O(n)       O(log n)
Ordered Linked List    O(n)       O(n)
Binary Search Tree     O(log n)   O(log n)
198
Dictionary/Map ADT
Problem: Store value/data associated with keys.
Inputkey, value
Implementations:       Insert        Search
Unordered Array        O(1)          O(n)
Ordered Array          O(n)          O(log n)
Ordered Linked List    O(n)          O(n)
Binary Search Tree     O(log n)      O(log n)
Hash Tables            O(1) (Avg)    O(1) (Avg)
Next
Hash Tables are very fast, but keys have no order.
5,v19,v22,v37,v4
199
Inputkey, value
Hash Tables
0
1
2
3
4
5
6
7
8
9
The Mappingfrom key
to Array Cellis many to one
CalledHash FunctionHash(key) = i
Universe of Keys
…
Consider an array whose size is about the # of items stored.
0123456789
Universe of keys is likely huge.
(eg social insurance numbers)
200
Inputkey, value
Hash Tables
0
1
2
3
4
5
6
7
8
9
The Mappingfrom key
to Array Cellis many to one
CalledHash FunctionHash(key) = i
Universe of Keys
…
Consider an array whose size is about the # of items stored.
0123456789
5,v1
Hash Function O(1) O(1)Insert SearchImplementations:
4,v55,?
9,v22,v37,v4
5,v1
4,v5
9,v22,v37,v4
Collisions are a problem.
201
I have an algorithm A that I claim works.
Actually, my algorithm always gives the right answer.
Oh yeah? I have a worst case input I for which it does not.
Understand Quantifiers!!! Problem P is computable if
∃A, ∀I, A(I) = P(I) & Time(A,I) ≤ T
Ok, but I found a set of keys I = key1, …, keyn for which lots of collisions happen and hence the time is bad.
202
I have a random algorithm A that I claim works.
Actually, my algorithm always gives the right answer. And for EVERY input I, the expected running time (over the choice of R) is great.
I know the algorithm A, but not its random coin flips R. I do my best to give you a worst case input I.
Understand Quantifiers!!! Problem P is computable by a random algorithm if
∃A, ∀I, [ ∀R, A_R(I) = P(I) ] & Expected_R Time(A_R, I) ≤ T
The random coin flips R are independent of the input. Remember Quick Sort: there are worst case coin flips but no worst case inputs.
203
Inputkey, value
Random Hash Functions
0
1
2
3
4
5
6
7
8
9
Universe of Keys
Fix the worst case input I
Choose a random mapping
Hash(key) = i
287 005,v1
193 005,v5287 005,?
923 005,v2394 005,v3482 005,v4
We don’t expect there to be a lot of collisions.
(Actually, the random Hash function likely is
chosen and fixed before the input comes, but the key is that the worst case input
does not “know” the hash function.)
287 005
193 005
923 005394 005
482 005
204
Inputkey, value
Random Hash Functions
0
1
2
3
4
5
6
7
8
9
Universe of Keys
287 005,v1
193 005,v5287 005,?
923 005,v2394 005,v3482 005,v4
287 005
193 005
923 005394 005
482 005
Throw m/2 balls (keys)randomly
into m bins (array cells)
The balls get spread out reasonably well.
– Exp( # of balls a ball shares a bin with ) = O(1)
– Only O(1) bins contain as many as O(log n) balls.
205
Random Hash Functions
0
1
2
3
4
5
6
7
8
9
Universe of Keys
…
0123456789
Choose a random mapping Hash(key) = i.
We want Hash to be computed in O(1) time.
Theory people use Hash(key) = (a·key mod p) mod N, where
  N = size of the array
  p is a prime > |U|
  a is randomly chosen from [1..p−1]
  n is the number of data items.
The a adds just enough randomness. The integers mod p form a finite field similar to the reals. The mod N ensures the result indexes a cell in the array.
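The "theory people's" hash can be sketched directly (a minimal sketch; the constants in the test, p = 1009, N = 11, a = 832, are the ones the later probe-sequence slide works through):

```java
class UniversalHash {
    final long p, N, a;   // p prime > |U|, N = array size, a random in [1, p-1]

    UniversalHash(long p, long N, long a) { this.p = p; this.N = N; this.a = a; }

    // Hash_a(key) = ((a * key) mod p) mod N, computed in O(1) time.
    long hash(long key) {
        return ((a * key) % p) % N;
    }
}
```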
206
Random Hash Functions
0
1
2
3
4
5
6
7
8
9
Universe of Keys
…
0123456789
Pairwise Independence: ∀ k1 ≠ k2, Pr_a( Hash_a(k1) = Hash_a(k2) ) = 1/N
Choose a random mapping Hash(key) = i
We want Hash to be computed in O(1) time.
Theory people use Hash(key) = (a·key mod p) mod N, where N = size of the array, p is a prime > |U|, a is randomly chosen from [1..p−1], and n is the number of data items.
Proof: Fix distinct k1, k2 ∈ U. Because p > |U|, (k2 − k1) mod p ≠ 0. Because p is prime, every nonzero element has an inverse (e.g. 2·3 mod 5 = 1), so let e = (k2 − k1)⁻¹. Let D = a·(k2 − k1) mod p, so that a = D·e mod p, and let d = D mod N. Keys k1 and k2 collide iff d = 0 iff D = jN for some j ∈ [0..p/N] iff a = jNe mod p. The probability that a takes one of these p/N values is (1/p)·(p/N) = 1/N.
207
Random Hash Functions
0
1
2
3
4
5
6
7
8
9
Universe of Keys
…
0123456789
Pairwise Independence: ∀ k1 ≠ k2, Pr_a( Hash_a(k1) = Hash_a(k2) ) = 1/N
Choose a random mapping Hash(key) = i; we want Hash computed in O(1) time.
Theory people use Hash(key) = (a·key mod p) mod N, where N = size of the array, p is a prime > |U|, a is randomly chosen from [1..p−1], and n is the number of data items.
Insert key k. Exp( # of other keys in its cell ) = Exp( Σ_{k1 ≠ k} collision(k1, k) ) = Σ_{k1 ≠ k} Pr( collision(k1, k) ) = n · 1/N = O(1).
208
Random Hash Functions
0
1
2
3
4
5
6
7
8
9
Universe of Keys
…
0123456789
Pairwise Independence: ∀ k1 ≠ k2, Pr_a( Hash_a(k1) = Hash_a(k2) ) = 1/N
Choose a random mapping Hash(key) = i; we want Hash computed in O(1) time.
Theory people use Hash(key) = (a·key mod p) mod N, where N = size of the array, p is a prime > |U|, a is randomly chosen from [1..p−1], and n is the number of data items.
Not much more independence: knowing that Hash_a(k1) = Hash_a(k2) decreases the range of a from p to p/N values. Doing this log p / log N times likely determines a, and hence all further collisions.
209
Random Hash Functions
0
1
2
3
4
5
6
7
8
9
Universe of Keys
…
0123456789
This is usually written a·key + b. The b adds randomness to which cells get hit, but does not help with collisions.
Choose a random mapping Hash(key) = i; we want Hash computed in O(1) time.
Theory people use Hash(key) = (a·key mod p) mod N, where N = size of the array, p is a prime > |U|, a is randomly chosen from [1..p−1], and n is the number of data items.
210
Inputkey, value
Handling Collisions
0
1
2
3
4
5
6
7
8
9
10
Universe of Keys
…
0123456789
583, v3394,v1
482,v2
394,v1482,v2 583,v3
Handling Collisions: when different data items are mapped to the same cell.
211
Inputkey, value
Separate Chaining
Universe of Keys
…
0123456789
583, v3394,v1
482,v2
394,v1482,v2 583,v3
Separate Chaining: each cell uses external memory to store all the data items hitting that cell. Simple, but requires additional memory.
482,v2
394,v1 583,v3
0
1
2
3
4
5
6
7
8
9
10
212
Inputkey, value
A Sequence of Probes
Universe of Keys
…
0123456789
583, v3394,v1
482,v2
394,v1482,v2 583,v3
Open addressing: the colliding item is placed in a different cell of the table.
0
1
2
3
4
5
6
7
8
9
10
213
Inputkey, value
A Sequence of Probes
Universe of Keys
…
0123456789
583, v3
394,v1
482,v2
Cells are chosen by a sequence of probes.
put(key k, value v): i1 = (a·k mod p) mod N
Theory people use i1 = Hash(key) = (a·key mod p) mod N, with N = 11 (size of the array), p = 1009 (a prime > |U|), and a = 832 (randomly chosen from [1..p−1]).
a·key = 832 · 103 = 85696
85696 mod 1009 = 940
940 mod 11 = 5 = i1
5 0
1
2
3
4
5
6
7
8
9
10
214
Inputkey, value
A Sequence of Probes
Universe of Keys
…
0123456789
394,v1
482,v2
103,v6Cells chosen by a
sequence of probes.
put(key k, value v) i1 = (ak mod p) mod N
903,v5
290,v4
5
103,v6
This was our first in the sequence of probes.
1
0
1
2
3
4
5
6
7
8
9
10
583, v3
215
Inputkey, value
A Sequence of Probes
Universe of Keys
…
0123456789
394,v1
482,v2
103,v6
put(key k, value v): i1 = (a·k mod p) mod N; d = (b·k mod q) + 1
Double Hash to get the sequence distance d.
1
0
1
2
3
4
5
6
7
8
9
10
3
583, v3
216
Inputkey, value
A Sequence of Probes
Universe of Keys
…
0123456789
394,v1
482,v2
103,v6
903,v5
290,v4
5
1
0
1
2
3
4
5
6
7
8
9
10
3
put(key k, value v)
    i1 = (a·k mod p) mod N
    d = (b·k mod q) + 1
    for j = 1..N
        i = i1 + (j−1)·d mod N
If N is prime, this sequence will reach each cell.
Double Hash to get the sequence distance d.
217
Inputkey, value
A Sequence of Probes
Universe of Keys
…
0123456789
394,v1
482,v2
103,v6
903,v5
290,v4
1
0
1
2
3
4
5
6
7
8
9
10
put(key k, value v)
    i1 = (a·k mod p) mod N
    d = (b·k mod q) + 1
    for j = 1..N
        i = i1 + (j−1)·d mod N
        if ( cell(i) == empty )
            cell(i) = ⟨k,v⟩
            return "was not there"
        if ( cellKey(i) == k )
            vold = cellValue(i)
            cell(i) = ⟨k,v⟩
            return vold
2
583, v3
3
3
4
5
Stop this sequence of probes when: the cell is empty (the key was not there) or the key is already there (return the old value).
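The put routine with double hashing can be sketched as follows (a simplified illustration: the table layout and the second-hash constants b and q are my assumptions, and N should be prime or coprime to every d so the probes reach each cell):

```java
class DoubleHashTable {
    final long[] keys; final Object[] vals; final boolean[] used;
    final long p = 1009, q = 7, a = 832, b = 5, N;   // illustrative constants

    DoubleHashTable(int n) {
        N = n; keys = new long[n]; vals = new Object[n]; used = new boolean[n];
    }

    Object put(long k, Object v) {
        long i1 = ((a * k) % p) % N;          // first probe
        long d = (b * k) % q + 1;             // never 0, so the probes move
        for (long j = 0; j < N; j++) {
            int i = (int) ((i1 + j * d) % N); // i = i1 + (j-1)*d mod N
            if (!used[i]) {                   // empty cell: k was not there
                used[i] = true; keys[i] = k; vals[i] = v;
                return null;
            }
            if (keys[i] == k) {               // key already there: swap value
                Object old = vals[i]; vals[i] = v;
                return old;
            }
        }
        throw new IllegalStateException("table full");
    }
}
```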
218
Inputkey, value
A Sequence of Probes
Universe of Keys
…
0123456789
394,v1
482,v2
103,v6
903,v5
290,v4
0
1
2
3
4
5
6
7
8
9
10
583, v3
103,v6
value del(key k): e.g. Del 583 — the vacated cell is marked "deleted" rather than emptied, so that probe sequences for other keys are not broken.
219
Inputkey, value
Running Time: the load factor α = n/N < 0.9, i.e. (# of data items) / (# of array cells).
Universe of Keys
…
0123456789
394,v1
0
1
2
3
4
5
6
7
8
9
10
938,v2
193,v3472,v4
873,v5093,v6
N = 11, n = 6, α = 6/11
For EVERY input: the expected number of probes is O(1/(1−α)) = O(1).
220
Input: key, value
When the load factor gets bigger than some threshold, rehash all items into an array that is double the size.
Universe of Keys
…
0123456789
394,v1
0
1
2
3
4
5
6
7
8
9
10
938,v2
193,v3472,v4
873,v5093,v6
Total cost of doubling = 1 + 2 + 4 + 8 + 16 + … + n = 2n − 1
Amortized time = TotalTime(n)/n = (2n − 1)/n = O(1).
Running Time
221
Searching in a Graph
Jeff Edmonds
York University COSC 3101, Lecture 5
Generic SearchBreadth First SearchDijkstra's Shortest Paths AlgorithmDepth First SearchLinear Order
Graphs• A graph is a pair (V, E), where
– V is a set of nodes, called vertices– E is a collection of pairs of vertices, called edges– Vertices and edges are positions and store elements
• Example:– A vertex represents an airport and stores the three-letter airport code– An edge represents a flight route between two airports and stores the
mileage of the route
Andy Mirzaian 222
[Flight-network diagram: vertices are airport codes ORD, PVD, MIA, DFW, SFO, LAX, LGA, HNL; edges store mileages such as 849, 802, 1387, 1743, 1843, 2555.]
Last Update: Dec 4, 2014
Edge Types• Directed edge
– ordered pair of vertices (u,v)– first vertex u is the origin– second vertex v is the destination– e.g., a flight
• Undirected edge– unordered pair of vertices (u,v)– e.g., a flight route
• Directed graph– all the edges are directed– e.g., route network
• Undirected graph– all the edges are undirected– e.g., flight network
[Diagram: directed edge ORD → PVD labelled flight AA 1206; undirected edge ORD – PVD labelled 849 miles.]
Applications• Electronic circuits
– Printed circuit board– Integrated circuit
• Transportation networks– Highway network– Flight network
• Computer networks– Local area network– Internet– Web
• Databases– Entity-relationship diagram
Terminology• End vertices (or endpoints) of an edge
– U and V are the endpoints of a• Edges incident on a vertex
– a, d, and b are incident on V• Adjacent vertices
– U and V are adjacent• Degree of a vertex
– X has degree 5 • Parallel edges
– h and i are parallel edges• Self-loop
– j is a self-loop
[Example graph: vertices U, V, W, X, Y, Z; edges a–j, with parallel edges h, i and self-loop j.]
Terminology (cont.)• Path
– sequence of alternating vertices and edges – begins with a vertex– ends with a vertex– each edge is preceded and followed
by its endpoints• Simple path
– path such that all its vertices and edges are distinct
• Examples:– P1 = (V,b,X,h,Z) is a simple path– P2 = (U,c,W,e,X,g,Y,f,W,d,V) is a path that is not simple
[Diagram: paths P1 and P2 in the example graph.]
Terminology (cont.)• Cycle
– circular sequence of alternating vertices and edges – each edge is preceded and followed by its endpoints
• Simple cycle– cycle such that all its vertices and
edges are distinct
• Examples:
– C1 = (V,b,X,g,Y,f,W,c,U,a,V) is a simple cycle
– C2 = (U,c,W,e,X,g,Y,f,W,d,V,a,U) is a cycle that is not simple
[Diagram: cycles C1 and C2 in the example graph.]
Properties
Notation: n = number of vertices, m = number of edges, deg(v) = degree of vertex v.
Property 1: Σ_v deg(v) = 2m. Proof: each edge is counted twice.
Property 2: In an undirected graph with no self-loops and no multiple edges, m ≤ n(n − 1)/2. Proof: each vertex has degree at most n − 1.
What is the bound for a directed graph?
Vertices and Edges• A graph is a collection of vertices and edges. • We model the abstraction as a combination of three data
types: Vertex, Edge, and Graph. • A Vertex is a lightweight object that stores an arbitrary
element provided by the user (e.g., an airport code)– We assume it supports a method, element(), to retrieve the stored
element.
• An Edge stores an associated object (e.g., a flight number, travel distance, cost), retrieved with the element( ) method.
Graph ADT: part 1
Graph ADT: part 2
Edge List Structure
• Vertex object– element– reference to position in vertex sequence
• Edge object– element– origin vertex object– destination vertex object– reference to position in edge sequence
• Vertex sequence– sequence of vertex objects
• Edge sequence– sequence of edge objects
Adjacency List Structure
• Incidence sequence for each vertex– sequence of references to edge objects
of incident edges
• Augmented edge objects– references to associated positions in
incidence sequences of end vertices
Adjacency Map Structure
• Incidence sequence for each vertex– sequence of references to adjacent
vertices, each mapped to edge object of the incident edge
• Augmented edge objects– references to associated positions in
incidence sequences of end vertices
Adjacency Matrix Structure
• Edge list structure• Augmented vertex objects
– Integer key (index) associated with vertex• 2D-array adjacency array
– Reference to edge object for adjacent vertices– Null for non-adjacent vertices
• The “old fashioned” version just has0 for no edge and 1 for edge
Performance (n vertices, m edges, no parallel edges, no self-loops):

                      Edge List   Adjacency List        Adjacency Matrix
Space                 n + m       n + m                 n²
incidentEdges(v)      m           deg(v)                n
areAdjacent(v, w)     m           min(deg(v), deg(w))   1
insertVertex(o)       1           1                     n²
insertEdge(v, w, o)   1           1                     1
removeVertex(v)       m           deg(v)                n²
removeEdge(e)         1           max(deg(v), deg(w))   1
Subgraphs• A subgraph S of a graph G is a graph such that
– The vertices of S are a subset of the vertices of G– The edges of S are a subset of the edges of G
• A spanning subgraph of G is a subgraph that contains all the vertices of G
Subgraph Spanning subgraph
Connectivity• A graph is connected if there is a path between every
pair of vertices• A connected component of a graph G is a maximal
connected subgraph of G
Connected graphNon connected graph with
two connected components
Trees and Forests• A (free) tree is an undirected graph T such that
– T is connected– T has no cyclesThis definition of tree is different from the one of a rooted tree
• A forest is an undirected graph without cycles• The connected components of a forest are trees
Tree Forest
Spanning Trees and Forests• A spanning tree of a connected graph is a spanning
subgraph that is a tree• A spanning tree is not unique unless the graph is a tree• Spanning trees have applications to the design of
communication networks• A spanning forest of a graph is a spanning subgraph
that is a forest
Graph Spanning tree
241
[Graph-search diagram]
We know found nodes are reachable from s because we have traced out a path.
If a node has been handled, then all of its neighbors have been found.
Graph Search
242
Graph Search
Which foundNotHandled node do we handle?
• Queue: Handle in order found.
• Breadth-First Search
• Stack: Handle most recently found
• Depth-First Search
• Priority Queue: Handle node that seems to be closest to s.
• Shortest (Weighted) Paths:
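The "which container of foundNotHandled nodes?" idea can be sketched: a deque used as a queue (poll) gives breadth-first search, and the same code with pop instead gives depth-first. (The plain adjacency-map representation here is my simplification.)

```java
import java.util.*;

class GraphSearch {
    static List<Integer> bfs(Map<Integer, List<Integer>> adj, int s) {
        List<Integer> handled = new ArrayList<>();
        Set<Integer> found = new HashSet<>();
        Deque<Integer> foundNotHandled = new ArrayDeque<>();
        found.add(s);
        foundNotHandled.add(s);
        while (!foundNotHandled.isEmpty()) {
            int u = foundNotHandled.poll();   // poll() = queue (BFS); pop() = stack (DFS)
            handled.add(u);                   // handling u finds all its neighbors
            for (int v : adj.getOrDefault(u, List.of()))
                if (found.add(v))             // true only the first time v is found
                    foundNotHandled.add(v);
        }
        return handled;
    }
}
```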
243
Dijkstra's
[Diagram: handled nodes and found nodes]
Handled paths go through handled edges, through any number of handled nodes, followed by a last edge to an unhandled node.
For handled w, d(w) is the length of the shortest path to w.
For found v, d(v) is the length of the shortest handled path to v.
Handle the node with smallest d(u).
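The rule "handle the node with smallest d(u)" can be sketched with a priority queue (a sketch assuming non-negative edge weights; stale queue entries are simply skipped when popped):

```java
import java.util.*;

class Dijkstra {
    // adj: u -> (v -> weight). Returns d(v) for every reached node v.
    static Map<Integer, Integer> shortest(Map<Integer, Map<Integer, Integer>> adj, int s) {
        Map<Integer, Integer> d = new HashMap<>();
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        d.put(s, 0);
        pq.add(new int[]{s, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();            // handle node with smallest d(u)
            int u = top[0];
            if (top[1] > d.get(u)) continue;  // stale entry: u already handled
            for (Map.Entry<Integer, Integer> e : adj.getOrDefault(u, Map.of()).entrySet()) {
                int v = e.getKey(), nd = d.get(u) + e.getValue();
                if (nd < d.getOrDefault(v, Integer.MAX_VALUE)) {
                    d.put(v, nd);             // found a shorter handled path to v
                    pq.add(new int[]{v, nd});
                }
            }
        }
        return d;
    }
}
```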
244
Dijkstra's
[Diagram: handling node c updates the tentative distances of its neighbors; Π(d) = c, Π(e) = c.]
245
DFS
[DFS diagram]
Found Not Handled — Stack of ⟨node, # edges⟩: s,1  a,1  c,2  i,0
246
Algorithmic Paradigms
Jeff Edmonds, York University, COSC 2011, Lecture 9
Brute Force: Optimization Problem
Greedy Algorithm: Minimal Spanning Tree
Dual Hill Climbing: Max Flow / Min Cut
Linear Programming: Hotdogs
Recursive Back Tracking: Bellman-Ford
Dynamic Programming: Bellman-Ford
NP-Complete Problems
247
• Ingredients:
  • Instances: the possible inputs to the problem.
  • Solutions for an Instance: each instance has an exponentially large set of solutions.
  • Cost of a Solution: each solution has an easy-to-compute cost or value.
• Specification:
  • <preCond>: the input is one instance.
  • <postCond>: a valid solution with optimal (minimum or maximum) cost.
Optimization Problems
248
Iterative Greedy Algorithm:
Loop, grabbing the best object, then the second best, …
  If it conflicts with committed objects or fulfills no new requirements, reject this next best object;
  else commit to it.
Problem: Choose the best m prizes.
Greedy Algorithms
249
We have not gone wrong: there is at least one optimal solution S_t that extends the choices A_t made so far.
Loop Invariant
Take the lion because it looks best.
Consequences: If you take the lion, you can't take the elephant.
Maybe some optimal solutions do not contain the lion.But at least one does.
Greedy Algorithms
250
Minimal Spanning Tree
Instance: An undirected graph with weights on the edges.
251
Minimal Spanning Tree
Instance: An undirected graph with weights on the edges.
Solution: A subset of the edges that is
• a tree (no cycles, not rooted)
• spanning (connected nodes remain connected).
Cost: the sum of the edge weights.
Goal: Find a Minimal Spanning Tree.
252
Minimal Spanning Tree
Instance: An undirected graph with weights on the edges.
Solution: A subset of the edges that is a tree (no cycles) and spanning (connected nodes remain connected).
Cost: the sum of the edge weights.
Goal: Find a Minimal Spanning Tree.
Greedy Alg: Commit to the edge that looks the "best."
Can't add an edge because of a cycle.
Must prove that the result is • acyclic • spanning • optimal.
Done
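The greedy MST loop can be sketched in Kruskal's style (a sketch, not the course's code): commit to the lightest edge unless it would close a cycle, using a tiny union-find — the partition structure mentioned earlier in the deck — for the cycle test.

```java
import java.util.*;

class KruskalMST {
    static int[] parent;

    // Union-find with path compression: root of x's component.
    static int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

    // edges: {u, v, weight}; returns the total weight of the spanning tree.
    static int mstWeight(int n, int[][] edges) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        Arrays.sort(edges, Comparator.comparingInt((int[] e) -> e[2])); // best first
        int total = 0;
        for (int[] e : edges) {
            int ru = find(e[0]), rv = find(e[1]);
            if (ru == rv) continue;   // would close a cycle: reject
            parent[ru] = rv;          // commit to this edge
            total += e[2];
        }
        return total;
    }
}
```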
253
Fixed Priority: Sort the objects from best to worst and loop through them.
Adaptive Priority:
– The greedy criterion depends on which objects have been committed to so far.
– At each step, the next "best" object is chosen according to the current greedy criterion.
– Searching or re-sorting takes too much time.
– Use a priority queue.
Adaptive Greedy
254
•Max Flow• Min Cut
Goal: Max Flow
Network Flow
U
V
= Canada
= USA
255
Hill Climbing: make small local changes to your solution to construct a slightly better solution.
We have a valid solution (not necessarily optimal); initially the "zero" flow. Take a step that goes up. Measure of progress: the value of our solution.
Problems:
• Exit: can't take a step that goes up — a Local Max that is not the Global Max. Can our Network Flow Algorithm get stuck in a local maximum?
• Running time? If you take small steps, it could be exponential time.
Primal-Dual Hill Climbing
256
Primal-Dual Hill Climbing
No Gap
Flow: the alg’s witness that the network has this flow.
Cut: the alg’s witness that the network has no bigger flow.
Prove: for every location L to stand, either
• the alg takes a step up, or
• the alg gives a reason that explains why not, by giving a ceiling of equal height,
i.e. ∀L [∃L′ height(L′) > height(L) or ∃R height(R) = height(L)].
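For network flow this primal-dual argument is constructive: a BFS of the residual graph either finds an augmenting path (a step up) or the set of reached nodes forms a cut whose capacity equals the current flow (the matching ceiling). A minimal Edmonds-Karp-style sketch; the dict-of-dicts residual-capacity representation is my assumption:

```python
from collections import deque

def max_flow(cap, s, t):
    """cap[u][v] = residual capacity of edge u->v (mutated in place)."""
    flow = 0
    while True:
        # BFS for an augmenting s-t path in the residual graph
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow               # no step up; reached nodes give a min cut
        # find the bottleneck capacity along the path
        bottleneck, v = float('inf'), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # push flow: decrease forward capacities, increase reverse ones
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0) + bottleneck
            v = u
        flow += bottleneck
```

Using the shortest (BFS) augmenting path is what avoids the exponential-time trap of taking arbitrary small steps.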
257
Given today’s prices,what is a fast algorithm to find the cheapest hotdog?
Linear Programming
258
Ingredients: pork, grain, water, sawdust
Cost per unit: 29, 8, 1, 2
Amount to add: x1, x2, x3, x4
Cost of Hotdog: 29x1 + 8x2 + 1x3 + 2x4
Constraints (moisture, protein, …):
3x1 + 4x2 − 7x3 + 8x4 ≤ 12
2x1 − 8x2 + 4x3 − 3x4 ≤ 24
−8x1 + 2x2 − 3x3 − 9x4 ≤ 8
x1 + 2x2 + 9x3 − 3x4 ≤ 31
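To make the instance concrete, here is a sketch that evaluates a candidate recipe against these exact constraints. This is only a feasibility-and-cost check; actually minimizing the cost over all feasible recipes is the job of an LP solver such as the simplex method:

```python
def cost(x):
    """Cost of the hotdog for amounts x = (x1, x2, x3, x4)."""
    x1, x2, x3, x4 = x
    return 29*x1 + 8*x2 + 1*x3 + 2*x4

def feasible(x):
    """Check the four linear constraints from the slide."""
    x1, x2, x3, x4 = x
    return (3*x1 + 4*x2 - 7*x3 + 8*x4 <= 12
            and 2*x1 - 8*x2 + 4*x3 - 3*x4 <= 24
            and -8*x1 + 2*x2 - 3*x3 - 9*x4 <= 8
            and x1 + 2*x2 + 9*x3 - 3*x4 <= 31)
```

For example, one unit of everything is feasible and costs 29 + 8 + 1 + 2 = 40.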
Linear Programming (Abstract Out Essentials)
259
• Consider your instance I.
• Ask a little question (to the little bird) about its optimal solution.
• Try all answers k.
• Knowing k about the solution restricts your instance to a subinstance subI.
• Ask your recursive friend for an optimal solution subsol for it.
• Construct a solution optS<I,k> = subsol + k for your instance that is the best of those consistent with the kth bird’s answer.
• Return the best of these best solutions.
Recursive Back TrackingBellman Ford
260
Specification: All Nodes Shortest-Weighted Paths
• <preCond>: The input is a graph G (directed or undirected) with edge weights (possibly negative) and an integer l.
• <postCond>: For each pair u,v, find a shortest path from u to v with at most l edges, stored in a matrix Dist[u,v,l].
[Figure: weighted graph on nodes b, c, d, g, h, i, k, u, v; example u-v paths with l=3 and l=4]
For a recursive algorithm, we must give our friend a smaller subinstance. How can this instance be made smaller? Remove a node? An edge?
Recursive Back Tracking: Bellman Ford
261
[Figure: weighted graph; a u-v path with l=4]
Recursive Back Tracking: Bellman Ford
• Consider your instance I = ⟨u,v,l⟩.
• Ask a little question (to the little bird) about its optimal solution: “What node is in the middle of the path?”
• She answers node k.
• I ask one friend subI = ⟨u,k,l/2⟩ and another subI = ⟨k,v,l/2⟩.
• optS<I,k> = subsol⟨u,k,l/2⟩ + k + subsol⟨k,v,l/2⟩ is the best solution for I consistent with the kth bird’s answer.
• Try all k and return the best of these best solutions.
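The recursion above can be sketched directly, memoized so each subinstance ⟨u,v,l⟩ is solved only once. The representation below is my own; it assumes l is a power of 2 and that weights have no negative self-loops:

```python
from functools import lru_cache

def make_dist(w):
    """w[u][v] = weight of edge u->v, or None if absent.
    dist(u, v, l) = weight of a shortest u-v path with at most l edges."""
    n = len(w)
    INF = float('inf')

    @lru_cache(maxsize=None)
    def dist(u, v, l):
        if l == 1:                      # base: the empty path or a single edge
            if u == v:
                return 0
            return w[u][v] if w[u][v] is not None else INF
        # try every bird answer k: the node in the middle of the path
        return min(dist(u, k, l // 2) + dist(k, v, l // 2)
                   for k in range(n))
    return dist
```

Because k may equal u or v, paths shorter than l edges are covered automatically (a half of length zero costs nothing).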
262
Dynamic Programming Algorithm
• Given an instance I,
• Imagine running the recursive alg on it.
• Determine the complete set of subinstances subI ever given to you, your friends, their friends, …
• Build a table indexed by these subI.
• Fill in the table in an order so that nobody waits.
Recursive Back Tracking
Given graph G, find Dist[u,v,l] for l = 1, 2, 4, 8, …, n.
[Figure: weighted graph; a u-v path with l=4]
263
[Figure: weighted graph; a u-v path with l=4]
Dynamic Programming Algorithm
Loop Invariant: For each u,v,
Dist[u,v,l] = the weight of a shortest path from u to v with ≤ l edges.
Exit
for l = 2, 4, 8, 16, …, 2n
    % Find Dist[u,v,l] from Dist[u,v,l/2]
    for all u,v ∈ Vertices
        Dist[u,v,l] = Dist[u,v,l/2]
        for all k ∈ Vertices
            Dist[u,v,l] = min( Dist[u,v,l], Dist[u,k,l/2] + Dist[k,v,l/2] )
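The same doubling loop as runnable Python (a sketch: INF encodes “no edge”, w[u][u] = 0, and each round doubles the number of edges the paths may use):

```python
def all_pairs_shortest(w):
    """w[u][v] = weight of edge u->v, or INF; w[u][u] = 0.
    After round i, dist[u][v] = shortest path with at most 2**i edges."""
    INF = float('inf')
    n = len(w)
    dist = [row[:] for row in w]          # l = 1: direct edges only
    l = 1
    while l < n - 1:                      # simple paths use at most n-1 edges
        dist = [[min(dist[u][v],
                     min(dist[u][k] + dist[k][v] for k in range(n)))
                 for v in range(n)] for u in range(n)]
        l = 2 * l                         # paths may now use twice as many edges
    return dist
```

With only log n rounds of n^3 work each, this runs in O(n^3 log n), versus the recursive version’s redundant calls without memoization.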
NP-Complete Problems
Computable
Exp
Poly
Known
GCD, Matching
Halting
Jack Edmonds Steve Cook
NP
• exponential time to search• poly time to verify given witness
Non-Deterministic Polynomial Time
Circuit-Sat Problem: Does a circuit have a satisfying assignment?
SAT
Industry would love a free lunch:
• Given a description of a good plane, automatically find one.
• Given a circuit, find a satisfying assignment.
• Given a graph, find a bichromatic coloring.
• Given a course description, find a schedule.
NP-Complete Problems
Find the biggest clique, i.e. a subset of nodes that are all connected.
NP-Complete Problems
Find the LONGEST simple s-t path.
NP-Complete Problems
Find a partition of the nodes into two sets with most edges between them.
Colour each node
Use the fewest # of colours.
Nodes with lines between them must have different colours.
NP-Complete Problems
Try all possible colourings
Too many to try: a 50-node graph has more colourings than the number of atoms.
NP-Complete Problems
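The flip side: although finding a colouring seems to need exponential search, verifying a proposed colouring (the witness) takes only polynomial time, which is exactly what places the problem in NP. A sketch of such a verifier:

```python
def verify_colouring(edges, colour, k):
    """Poly-time check of a witness: `colour` maps each node to a
    colour, uses at most k colours, and the endpoints of every
    edge get different colours."""
    if len(set(colour.values())) > k:
        return False
    return all(colour[u] != colour[v] for u, v in edges)
```

One pass over the edges suffices, so verification is linear in the size of the graph plus the witness.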
Is there a fast algorithm?
Most people think not.
We have not been able to prove that there is not.
It is one of the biggest open problems in the field.
NP-Complete Problems
272
End