1
CSE 326 Data Structures: Complexity
Lecture 2: Wednesday, Jan 8, 2003
2
Overview of the Quarter
• Complexity and analysis of algorithms
• List-like data structures
• Search trees
• Priority queues
• Hash tables
• Sorting
• Disjoint sets
• Graph algorithms
• Algorithm design
• Advanced topics
(Midterm around here)
3
Complexity and analysis of algorithms
• Weiss Chapters 1 and 2
• Additional material
  – Cormen, Leiserson, Rivest – on reserve at Eng. library
  – Graphical analysis
  – Amortized analysis
4
Program Analysis
• Correctness
  – Testing
  – Proofs of correctness
• Efficiency
  – Asymptotic complexity: how running time scales as a function of the size of the input
5
Proving Programs Correct
• Often takes the form of an inductive proof
• Example: summing an array

int sum(int v[], int n)
{
  if (n==0) return 0;
  else return v[n-1]+sum(v,n-1);
}

What are the parts of an inductive proof?
6
Inductive Proof of Correctness

int sum(int v[], int n)
{
  if (n==0) return 0;
  else return v[n-1]+sum(v,n-1);
}

Theorem: sum(v,n) correctly returns the sum of the first n elements of array v, for any n.
Basis Step: The program is correct for n=0: it returns 0.
Inductive Hypothesis (n=k): Assume sum(v,k) returns the sum of the first k elements of v.
Inductive Step (n=k+1): sum(v,k+1) returns v[k]+sum(v,k), which is the sum of the first k+1 elements of v.
7
Inductive Proof of Correctness
Binary search in a sorted array v[0] ≤ v[1] ≤ ... ≤ v[n-1]. Given x, find m s.t. x = v[m].

int find(int x, int v[], int n)
{
  int left = -1;
  int right = n;
  while (left+1 < right) {
    int m = (left+right) / 2;
    if (x == v[m]) return m;
    if (x < v[m]) right = m;
    else left = m;
  }
  return -1; /* not found */
}

Exercise 1: prove that it is correct.
Exercise 2: compute m, k s.t. v[m-1] < x = v[m] = v[m+1] = ... = v[k] < v[k+1].
8
Proof by Contradiction
• Assume the negation of the goal, then show this leads to a contradiction
• Example: there is no program that solves the "halting problem" – deciding whether any other program runs forever or not
Alan Turing, 1937
9
Program NonConformist (Program P)
  If ( HALT(P) = "never halts" ) Then
    Halt
  Else
    Do While (1 > 0)
      Print "Hello!"
    End While
  End If
End Program

• Does NonConformist(NonConformist) halt?
• Yes? That means HALT(NonConformist) = "never halts"
• No? That means HALT(NonConformist) = "halts"
Contradiction!
10
Defining Efficiency
• Asymptotic complexity: how running time scales as a function of the size of the input
• Two problems:
  – What is the "input size"?
  – How do we express the running time? (the Big-O notation)
11
Input Size
• Usually: length (in characters) of the input
• Sometimes: value of the input (if it is a number)
• Which inputs?
  – Worst case: tells us how good an algorithm works (a guarantee over all inputs)
  – Best case: tells us how bad an algorithm works (a limit even on favorable inputs)
  – Average case: useful in practice, but there are technical problems here (next)
12
Input Size
Average Case Analysis
• Assume inputs are randomly distributed according to some "realistic" distribution
• Compute the expected running time:

    E[T(n)] = Σ_{x ∈ Inputs(n)} Prob(x) · RunTime(x)

• Drawbacks
  – Often hard to define realistic random distributions
  – Usually hard to perform the math
13
Input Size
• Recall the function: find(x, v, n)
• Input size: n (the length of the array)
• T(n) = "running time for size n"
• But T(n) needs clarification:
  – Worst case T(n): it runs in at most T(n) time for any x, v
  – Best case T(n): it takes at least T(n) time for any x, v
  – Average case T(n): average time over all v and x
14
Input Size
Amortized Analysis
• Instead of a single input, consider a sequence of inputs:
  – This is interesting when the running time on some input depends on the result of processing previous inputs
• Worst case analysis over the sequence of inputs
• Determine average running time on this sequence
• Will illustrate in the next lecture
15
Definition of Order Notation
• Upper bound: T(n) = O(f(n))          "Big-O"
  There exist constants c and n' such that
      T(n) ≤ c·f(n) for all n ≥ n'
• Lower bound: T(n) = Ω(g(n))          "Omega"
  There exist constants c and n' such that
      T(n) ≥ c·g(n) for all n ≥ n'
• Tight bound: T(n) = Θ(f(n))          "Theta"
  When both hold:
      T(n) = O(f(n))
      T(n) = Ω(f(n))
Other notations: o(f), ω(f) – see Cormen et al.
16
Example: Upper Bound

Claim: n^2 + 100n = O(n^2)
Proof: Must find c, n' such that for all n ≥ n',
    n^2 + 100n ≤ c·n^2
Let's try setting c = 2. Then
    n^2 + 100n ≤ 2n^2
    100n ≤ n^2
    100 ≤ n
So we can set n' = 100 and reverse the steps above.
17
Using a Different Pair of Constants

Claim: n^2 + 100n = O(n^2)
Proof: Must find c, n' such that for all n ≥ n',
    n^2 + 100n ≤ c·n^2
Let's try setting c = 101. Then
    n^2 + 100n ≤ 101n^2
    100n ≤ 100n^2
    1 ≤ n            (divide both sides by 100n)
So we can set n' = 1 and reverse the steps above.
18
Example: Lower Bound

Claim: n^2 + 100n = Ω(n^2)
Proof: Must find c, n' such that for all n ≥ n',
    n^2 + 100n ≥ c·n^2
Let's try setting c = 1. Then
    n^2 + 100n ≥ n^2
    100n ≥ 0
So we can set n' = 0 and reverse the steps above.
Thus we can also conclude n^2 + 100n = Θ(n^2).
19
Conventions of Order Notation

Order notation is not symmetric: write 2n^2 + 4n = O(n^2), but never O(n^2) = 2n^2 + 4n.

The expression O(f(n)) = O(g(n)) is equivalent to f(n) = O(g(n)).
The expression Ω(f(n)) = Ω(g(n)) is equivalent to g(n) = Ω(f(n)).

The right-hand side is a "cruder" version of the left:
    n^2 = O(n^2) = O(n^3)
    18 n^2 log(2n) = Ω(n^2 log(n)) = Ω(n^2)
20
Which Function Dominates?

    f(n) =                 g(n) =
    n^3 + 2n^2             100n^2 + 1000
    n^0.1                  log n
    n + 100n^0.1           2n + 10 log n
    5n^5                   n!
    n^(-15) · 2^(n/100)    1000n^15
    8^(2 log n)            3n^7 + 7n

Question to class: is f = O(g)? Is g = O(f)?
21
Race I: f(n) = n^3 + 2n^2 vs. g(n) = 100n^2 + 1000
22
Race II: n^0.1 vs. log n
23
Race III: n + 100n^0.1 vs. 2n + 10 log n
24
Race IV: 5n^5 vs. n!
25
Race V: n^(-15) · 2^(n/100) vs. 1000n^15
26
Race VI: 8^(2 log(n)) vs. 3n^7 + 7n
27
• Eliminate low-order terms
• Eliminate constant coefficients

    16 n^3 log_8(10 n^2) + 100 n^2
    ⇒ 16 n^3 log_8(10 n^2)              (drop low-order term)
    ⇒ n^3 log_8(10 n^2)                 (drop constant coefficient)
    = n^3 (log_8(10) + log_8(n^2))
    ⇒ n^3 log_8(n^2)                    (drop low-order term)
    = 2 n^3 log_8(n)
    ⇒ n^3 log_8(n)                      (drop constant coefficient)
    = n^3 log_8(2) · log(n)
    ⇒ n^3 log(n)                        (drop constant coefficient)

So 16 n^3 log_8(10 n^2) + 100 n^2 = O(n^3 log(n)).
28
Common Names
(from slowest to fastest growth)
constant:         O(1)
logarithmic:      O(log n)
linear:           O(n)
log-linear:       O(n log n)
quadratic:        O(n^2)
exponential:      O(c^n)  (c is a constant > 1)
hyperexponential: O(2^2^...^2)  (a tower of n exponentials)

Other names:
superlinear: O(n^c)  (c is a constant > 1)
polynomial:  O(n^c)  (c is a constant > 0)
29
Sums and Recurrences
Often the function f(n) is not explicit, but is expressed as:
• A sum, or
• A recurrence
We need to obtain an analytical formula first.
30
Sums

f(n) = 1^2 + 2^2 + ... + n^2 = Σ_{i=1..n} i^2 = n(n+1)(2n+1)/6 = O(n^3)

f(n) = 1 + 2 + ... + n = Σ_{i=1..n} i = n(n+1)/2 = O(n^2)

f(n) = 1 + 3 + 5 + ... + (2n-1) = Σ_{i=1..n} (2i-1) = O(n^2)

f(n) = 1^3 + 2^3 + ... + n^3 = O(?)

f(n) = 1^4 + 5^4 + 9^4 + ... + (4n-3)^4 = Σ_{i=1..n} (4i-3)^4 = O(??)
31
More Sums

f(n) = 1 + 3 + 3^2 + ... + 3^n = Σ_{i=0..n} 3^i = (3^(n+1) - 1)/2 = O(3^n)

Sometimes sums are easiest computed with integrals:

f(n) = 1 + 1/2 + 1/3 + ... + 1/n = Σ_{i=1..n} 1/i ≈ ∫_1^n dx/x:
    ln(n+1) ≤ f(n) ≤ 1 + ln(n), hence f(n) = O(ln(n))

f(n) = 1 + 1/2^2 + 1/3^2 + ... + 1/n^2 ≤ 1 + ∫_1^n dx/x^2 = 1 + (1 - 1/n) = 2 - 1/n = O(1)
32
Recurrences
• f(n) = 2f(n-1) + 1, f(0) = T
• Telescoping:
    f(n)+1   = 2(f(n-1)+1)
    f(n-1)+1 = 2(f(n-2)+1)     (multiply by 2)
    f(n-2)+1 = 2(f(n-3)+1)     (multiply by 2^2)
    . . .
    f(1)+1   = 2(f(0)+1)       (multiply by 2^(n-1))
  Multiplying out the chain:
    f(n)+1 = 2^n (f(0)+1) = 2^n (T+1)
    f(n) = 2^n (T+1) - 1
33
Recurrences
• Fibonacci: f(n) = f(n-1) + f(n-2), f(0) = f(1) = 1
  Try f(n) = A·c^n. What is c?
      A·c^n = A·c^(n-1) + A·c^(n-2)
      c^2 - c - 1 = 0
      c_{1,2} = (1 ± √(1+4))/2 = (1 ± √5)/2

  f(n) = A·((1+√5)/2)^n + B·((1-√5)/2)^n = O(((1+√5)/2)^n)

Constants A, B can be determined from f(0), f(1) – not interesting for us for the Big-O notation.
34
Recurrences
• f(n) = f(n/2) + 1, f(1) = T
• Telescoping:
    f(n)   = f(n/2) + 1
    f(n/2) = f(n/4) + 1
    . . .
    f(2)   = f(1) + 1 = T + 1
  f(n) = T + log n = O(log n)