Algorithms

Algorithms Modified from slides by Sanjoy Dasgupta


Transcript of Algorithms

Page 1: Algorithms

Algorithms

Modified from slides by Sanjoy Dasgupta

Page 2: Algorithms

Why you should know Algorithms

• Courses: CSE 21 (Discrete Math), CSE 100 (Data Structures), CSE 101 (Algorithms)

• Motivation? If you want to be a good programmer, why should you know this stuff?

• Definition: Algorithms are recipes for faster, smarter computation in lieu of brute force (more HW)

• Apps!: Many real world problems and systems (Roomba, Google, Facebook, Internet) have a few core algorithms

• Bottom line: Knowing algorithms can help you make some spectacular speed wins in your problem area.

Page 3: Algorithms

Some big algorithm wins

• Compression: Lempel and Ziv. Gzip for files → faster downloads, less disk storage

• Fast Fourier Transforms (FFTs): Cooley-Tukey. Signal processing → modern cell phones

• Shortest Path Algorithm: Dijkstra. Computing Internet routes → fast response when comm links crash

• Primality Testing: Rabin-Miller. Finding large primes for public key encryption → PGP

• String Matching: Boyer-Moore. Fast search for a keyword in a file → grep

Page 4: Algorithms

Algorithm Areas

Useful recipes in different areas

• Mathematical Computing – App: Solid modeling, more interactive

• String matching – App: Google search, faster search

• Graph – App: Computing Internet Routes, fast failure recovery

• Scientific Computation – App: Drug Design, faster design

Page 5: Algorithms

Area 1: Computing Mathematics

Page 6: Algorithms

Counting rabbits

A royal mathematical challenge (1202):

Suppose that rabbits take exactly one month to become fertile, after which they produce one child per month, forever. Starting with one rabbit, how many are there after n months?

Leonardo da Pisa, aka Fibonacci

Page 7: Algorithms

The proliferation of rabbits

[Figure: the rabbit population initially and after one, two, three, four and five months, with fertile and not-yet-fertile rabbits marked]

Page 8: Algorithms

Formula for rabbits

Let Fn = number of rabbits at month n

F1 = 1

F2 = 1

Fn = Fn-1 + Fn-2

These are the Fibonacci numbers: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

They grow very fast: F31 > 10^6!

In fact, Fn ≈ 2^(0.694n), exponential growth. Useful in CSE: merge sort, compression

Page 9: Algorithms

Computing Fibonacci numbers

function F(n)

if n = 1 return 1

if n = 2 return 1

return F(n-1) + F(n-2)

Recursion tree for F(5):

F(5)
  F(4)
    F(3)
      F(2)
      F(1)
    F(2)
  F(3)
    F(2)
    F(1)

A recursive algorithm
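As a minimal sketch, the recursive definition can be transcribed into Python almost line for line. The name fib_recursive and the input check are illustrative choices, not part of the slides.

```python
def fib_recursive(n: int) -> int:
    """Return the n-th Fibonacci number (F1 = F2 = 1) by direct recursion."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n <= 2:
        return 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)


print([fib_recursive(i) for i in range(1, 12)])
# prints [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```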

Page 10: Algorithms

Two questions we always ask

Does it work correctly? (Proofs)

Yes – it directly implements the definition of Fibonacci numbers.

How long does it take? (Time analysis)

This is not so obvious…

Page 11: Algorithms

Running time analysis

function F(n)
  if n = 1 return 1
  if n = 2 return 1
  return F(n-1) + F(n-2)

Let T(n) = number of steps needed to compute F(n).

Then: T(n) > T(n-1) + T(n-2)

But recall that Fn = Fn-1 + Fn-2. Therefore T(n) > Fn ≈ 2^(0.694n)!

Exponential time.
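One way to see the bound is to count calls rather than time them. The sketch below uses a hypothetical helper, count_calls, which follows the same recurrence as T(n) if every call is charged one step.

```python
def count_calls(n: int) -> int:
    """Number of calls to F made while computing F(n) by direct recursion."""
    if n <= 2:
        return 1  # base case: a single call, no further recursion
    # One call for F(n) itself, plus everything its two sub-calls trigger.
    return 1 + count_calls(n - 1) + count_calls(n - 2)


for n in (10, 20, 25):
    print(n, count_calls(n))
# prints:
# 10 109
# 20 13529
# 25 150049
```

The counts work out to 2·Fn - 1, so they grow exactly as fast as the Fibonacci numbers themselves.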

Page 12: Algorithms

How bad is exponential time?

Need about 2^(0.694n) operations to compute Fn.

E.g. computing F200 needs about 2^140 operations.

How long does this take on a fast computer?

Page 13: Algorithms

NEC Earth Simulator

Page 14: Algorithms

NEC Earth Simulator

Can perform up to 40 trillion operations per second.

Page 15: Algorithms

Is exponential time all that bad?

The Earth Simulator needs about 2^95 seconds for F200.

Time in seconds    Interpretation
2^10               17 minutes
2^20               12 days
2^30               34 years
2^40               cave paintings
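These figures are plain arithmetic and can be checked in a few lines. The sketch takes the operation count (about 2^140) and the machine speed (40 trillion operations per second) from the slides; the constant names and the 365.25-day year are assumptions made here.

```python
import math

OPS_NEEDED_LOG2 = 140        # about 2^140 operations for F200 (from the slides)
OPS_PER_SECOND = 40e12       # Earth Simulator: 40 trillion operations per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed average year length

# Dividing by the machine speed is a subtraction in log2 terms.
log2_seconds = OPS_NEEDED_LOG2 - math.log2(OPS_PER_SECOND)
print(f"about 2^{log2_seconds:.0f} seconds")   # prints: about 2^95 seconds

# Convert a few powers of two from seconds into more familiar units.
for k in (10, 20, 30, 40):
    seconds = 2 ** k
    print(f"2^{k} s: {seconds / 60:.0f} minutes, "
          f"{seconds / 86400:.1f} days, "
          f"{seconds / SECONDS_PER_YEAR:.1f} years")
```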

Page 16: Algorithms

Post mortem

What takes so long? Let’s unravel the recursion…

Level 0: F(n)
Level 1: F(n-1)  F(n-2)
Level 2: F(n-2) F(n-3)   F(n-3) F(n-4)
Level 3: F(n-3) F(n-4)   F(n-4) F(n-5)   F(n-4) F(n-5)   F(n-5) F(n-6)

The same subproblems get solved over and over again!

Page 17: Algorithms

A better algorithm

function F(n)
  Create an array fib[1..n]
  fib[1] = 1
  fib[2] = 1
  for i = 3 to n:
    fib[i] = fib[i-1] + fib[i-2]
  return fib[n]

There are n subproblems F1, F2, …, Fn. Solve them in order from low to high (unlike recursion, from high to low)

The usual two questions:

1. Does it return the correct answer?

2. How fast is it?

Page 18: Algorithms

Running time analysis

function F(n)
  Create an array fib[1..n]
  fib[1] = 1
  fib[2] = 1
  for i = 3 to n:
    fib[i] = fib[i-1] + fib[i-2]
  return fib[n]

The number of operations is proportional to n. [Previous method: 2^(0.7n)]

F200 is now reasonable to compute, as are F2000 and F20000.

Motto: the right algorithm makes all the difference.
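A Python sketch of the bottom-up method follows. It mirrors the array-based pseudocode; the function name fib and the guard for n = 1 are additions made for this example.

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number (F1 = F2 = 1), filling a table bottom-up."""
    if n < 1:
        raise ValueError("n must be at least 1")
    table = [0] * (n + 1)      # table[i] will hold F_i; index 0 is unused
    table[1] = 1
    if n >= 2:
        table[2] = 1
    for i in range(3, n + 1):  # each subproblem is solved exactly once
        table[i] = table[i - 1] + table[i - 2]
    return table[n]


print(fib(200))  # returns immediately, unlike the recursive version
```

Python integers have arbitrary precision, so large values such as fib(20000) need no special handling.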

Page 19: Algorithms

Big-O notation

function F(n)
  Create an array fib[1..n]
  fib[1] = 1
  fib[2] = 1
  for i = 3 to n:
    fib[i] = fib[i-1] + fib[i-2]
  return fib[n]

Running time is proportional to n.

But what is the constant: is it 2n or 3n or what?

The constant depends upon:
- the units of time (minutes, seconds, milliseconds, …)
- specifics of the computer architecture

It is much too hairy to figure out exactly. Moreover, it is nowhere near as important as the huge gulf between n and 2^n. So we simply say the running time is O(n).

Page 20: Algorithms

A more careful analysis

function F(n)
  Create an array fib[1..n]
  fib[1] = 1
  fib[2] = 1
  for i = 3 to n:
    fib[i] = fib[i-1] + fib[i-2]
  return fib[n]

Dishonest accounting: the O(n) operations aren’t all constant-time.

The numbers get very large: Fn ≈ 2^(0.7n) has about 0.7n bits. Adding numbers of this size is hardly a unit operation.
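The bit-length claim is easy to check empirically. This sketch uses Python's built-in int.bit_length; the helper fib is written here only for the example.

```python
def fib(n: int) -> int:
    """n-th Fibonacci number with F1 = F2 = 1, keeping only the last two values."""
    a, b = 1, 1                    # F1, F2
    for _ in range(n - 2):
        a, b = b, a + b
    return b if n >= 2 else a


for n in (100, 1000, 10000):
    bits = fib(n).bit_length()     # number of bits needed to write F_n
    print(n, bits, round(bits / n, 3))
```

The ratio of bits to n settles near 0.694, which the slides round to 0.7.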

Page 21: Algorithms

Addition

Adding two n-bit numbers

[22]     1 0 1 1 0
[13]       1 1 0 1
       -----------
[35]   1 0 0 0 1 1

Takes O(n) operations… and we can’t hope for better. Addition takes linear time.
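The column-by-column procedure can be sketched in Python as below. Representing the numbers as strings of binary digits is a choice made for this illustration; the function name add_binary is hypothetical.

```python
def add_binary(x: str, y: str) -> str:
    """Add two binary numerals digit by digit, right to left, carrying as needed."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)   # pad the shorter numeral with zeros
    carry = 0
    digits = []
    for a, b in zip(reversed(x), reversed(y)):
        total = int(a) + int(b) + carry     # one constant-time step per column
        digits.append(str(total % 2))
        carry = total // 2
    if carry:
        digits.append("1")
    return "".join(reversed(digits))


print(add_binary("10110", "1101"))  # prints 100011 (22 + 13 = 35)
```

The loop touches each column once, which is where the linear running time comes from.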

Page 22: Algorithms

Revised time analysis

function F(n)
  Create an array fib[1..n]
  fib[1] = 1
  fib[2] = 1
  for i = 3 to n:
    fib[i] = fib[i-1] + fib[i-2]
  return fib[n]

Takes O(n) simple operations and O(n) additions, each addition on numbers of up to O(n) bits… a total of O(n^2), quadratic time.

The inefficient algorithm: not O(2^(0.7n)) but O(n·2^(0.7n)).

Page 23: Algorithms

Polynomial vs. exponential

Running times like n, n^2, n^3 are polynomial.

Running times like 2^n and e^n are exponential.

To an excellent first approximation:
polynomial is reasonable
exponential is not reasonable

This is the most fundamental dichotomy in algorithms.

Page 24: Algorithms

Multiplication

Intuitively harder than addition… a time analysis lets us quantify this.

[22]       1 0 1 1 0
[5]            1 0 1
       -------------
           1 0 1 1 0
         0 0 0 0 0
       1 0 1 1 0
       -------------
[110]  1 1 0 1 1 1 0

To multiply two n-bit numbers: create an array of n intermediate sums, and add them up. Each addition is O(n)… a total of O(n^2).

It seems that addition is linear, while multiplication is quadratic.
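A minimal sketch of the grade-school method, assuming non-negative integers: each 1-bit of the second number contributes one shifted copy of the first. The name multiply is illustrative.

```python
def multiply(x: int, y: int) -> int:
    """Grade-school binary multiplication for non-negative integers."""
    result = 0
    shift = 0
    while y > 0:
        if y & 1:                  # this bit of y is 1: add a shifted copy of x
            result += x << shift
        y >>= 1                    # move on to the next bit of y
        shift += 1
    return result


print(multiply(22, 5))  # prints 110
```

For n-bit inputs the loop runs up to n times and each addition handles O(n) bits, which gives the quadratic total.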

Page 25: Algorithms

Euro-multiplication

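There are other ways to multiply!

 22      5
 11     10
  5     20
  2     40
  1     80
----------
       110

Halve the left column (dropping remainders) and double the right column until the left reaches 1, then add the right-hand entries whose left-hand entry is odd: 10 + 20 + 80 = 110.

Still quadratic, but other methods are close to linear!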
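The halve-and-double scheme is short enough to sketch directly: halve the first number (dropping remainders), double the second, and add up the doubled values that sit next to an odd half. The function name and the decision to return the rows are choices made for this example.

```python
def halve_and_double(x: int, y: int):
    """Multiply positive integers by repeated halving and doubling."""
    rows = []
    total = 0
    while x >= 1:
        rows.append((x, y))
        if x % 2 == 1:      # only rows with an odd left entry are added
            total += y
        x //= 2             # halve, dropping the remainder
        y *= 2              # double
    return total, rows


product, rows = halve_and_double(22, 5)
for left, right in rows:
    print(left, right)
print(product)  # prints 110
```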

Page 26: Algorithms

Area 2: String Matching

Page 27: Algorithms

Imaginary Scenario

• Context: You agree to help a historian build a web site of famous speeches

• Problem: Historian wants a keyword facility: e.g., all lines with keyword “truth”

• Measure: Need fast response to search so that the site feels “interactive”

Page 28: Algorithms

Keyword Search in File

• Naïve: search for keyword at every offset in file, one character at a time

We hold these truths
truths
 truths
  truths
   truths

Page 29: Algorithms

Usual Questions

• How fast does it run? If file length is F and keyword is K, can take F * K time.

• e.g. F = 10 Mbyte, K = 10: 10 M → 10 sec

aaaaaa . . . . aaaaz
aaaaz
 aaaaz
  aaaaz
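A sketch of the naive search, instrumented to count character comparisons so that the F * K behaviour is visible. The function name and the returned pair (offset, comparisons) are assumptions of this example.

```python
def naive_search(text: str, keyword: str):
    """Try every offset in turn; return (first match offset or -1, comparisons made)."""
    comparisons = 0
    for offset in range(len(text) - len(keyword) + 1):
        matched = 0
        while matched < len(keyword):
            comparisons += 1
            if text[offset + matched] != keyword[matched]:
                break
            matched += 1
        if matched == len(keyword):
            return offset, comparisons
    return -1, comparisons


print(naive_search("We hold these truths", "truths"))   # prints (14, 22)
# Bad case: almost every offset matches four characters before failing.
print(naive_search("a" * 1000 + "aaaaz", "aaaaz"))      # prints (1000, 5005)
```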

Page 30: Algorithms

Boyer-Moore String Matching

• Ideas: Compare last character first; on failure consult table to find next offset to try

• e.g. F = 10 Mbyte, K = 10: 0.2M → 0.2 sec

aaaabaaaabaaaaz
aaaaz
     aaaaz
          aaaaz

Table computed at start:

Failure      Skip
a for z      5
. . .        . . .
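The sketch below is not full Boyer-Moore. It is the simpler Boyer-Moore-Horspool variant, which keeps the two ideas on the slide: look at the character under the keyword's last position first, and use a table computed at the start to decide how far to jump. The function name is hypothetical.

```python
def horspool_search(text: str, keyword: str) -> int:
    """Return the first offset of keyword in text, or -1 (Horspool variant)."""
    k = len(keyword)
    if k == 0:
        return 0
    # Table computed at start: for a character aligned with the keyword's last
    # position, how far the keyword can slide. Characters absent from the
    # keyword (apart from its final position) allow a full jump of k.
    shift = {ch: k - 1 - i for i, ch in enumerate(keyword[:-1])}
    offset = 0
    while offset + k <= len(text):
        last = text[offset + k - 1]
        j = k - 1
        while j >= 0 and text[offset + j] == keyword[j]:   # compare right to left
            j -= 1
        if j < 0:
            return offset
        offset += shift.get(last, k)
    return -1


print(horspool_search("aaaabaaaabaaaaz", "aaaaz"))  # prints 10
```

On this input the search inspects only offsets 0, 5 and 10, because each mismatching 'b' permits a jump of five positions.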

Page 31: Algorithms

Area 3: Graph Algorithms

Page 32: Algorithms

A cartographer’s problem

Color this map (one color per country) using as few colors as possible.

Neighboring countries must be different colors.

Page 33: Algorithms

Graphs

Graph specified by nodes and edges.
node = country
edge = neighbors

The graph coloring problem: color the nodes of the graph with as few colors as possible, such that there is no edge between nodes of the same color.

[Figure: the map redrawn as a graph with nodes numbered 1 to 13]

Page 34: Algorithms

Exam scheduling

Schedule final exams:
- use as few time slots as possible
- can’t schedule two exams in the same slot if there’s a student taking both classes.

This is also graph coloring!
Node = exam
Edge = some student is taking both endpoint-exams
Color = time slot

[Figure: exam conflict graph with nodes numbered 1 to 5]
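As a rough illustration, a greedy heuristic gives each exam the lowest-numbered slot not already used by a conflicting exam. It always produces a valid schedule but is not guaranteed to use the fewest slots. The exam names and conflicts below are made up for the example.

```python
def greedy_coloring(neighbors: dict) -> dict:
    """Assign each node the smallest color not used by an already-colored neighbor."""
    color = {}
    for node in neighbors:
        used = {color[m] for m in neighbors[node] if m in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    return color


# Hypothetical conflict graph: an edge means some student takes both exams.
conflicts = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C"},
    "E": set(),
}
slots = greedy_coloring(conflicts)
print(slots)  # prints {'A': 0, 'B': 1, 'C': 0, 'D': 1, 'E': 0}
```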

Page 35: Algorithms

Building short networks

[Figure: map of Albuquerque, Wichita, Little Rock, Houston, San Antonio, El Paso, Amarillo, Tulsa, Dallas and Abilene; how should they be connected?]

Minimum spanning tree

Page 36: Algorithms

Finding the minimum spanning tree

Kruskal’s algorithm (1956):

Repeat until nodes are connected:
  Add the shortest edge which doesn’t create a cycle

Greedy strategy: at each stage, make the move which gives the greatest immediate benefit.

Greedy: a popular algorithmic paradigm.
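Kruskal's rule can be sketched with a small union-find structure that detects whether an edge would close a cycle. The node names and edge lengths below are hypothetical, not the distances between the cities on the map.

```python
def kruskal(nodes, edges):
    """Return the edges of a minimum spanning tree; edges are (length, u, v) tuples."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving keeps the trees shallow
            v = parent[v]
        return v

    tree = []
    for length, u, v in sorted(edges):      # shortest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # different components: no cycle created
            parent[root_u] = root_v
            tree.append((u, v, length))
    return tree


# Hypothetical example graph.
nodes = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (2, "B", "C"), (3, "A", "C"), (4, "C", "D"), (5, "B", "D")]
print(kruskal(nodes, edges))  # prints [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 4)]
```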

Page 37: Algorithms

Network Example: Shortest Path

• Sally to Jorge via ISP ATT

[Figure: network with nodes Seattle, Los Angeles, Chicago, Boston, New York, Sally and Jorge; link labels shown: 2, 1, 3, 12, 15, 2, 14]

Page 38: Algorithms

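Dijkstra’s Algorithm (Greedy)

• At each step, add to the tree the node closest to the source that is not already in it

[Figure: network with nodes Seattle, Los Angeles, Chicago, Boston, New York, Sally and Jorge; labels shown: 23, 16, 12]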
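A compact sketch of Dijkstra's algorithm using Python's heapq module as the priority queue. The graph below is a hypothetical four-node network, not the one in the figure, and all link costs are assumed to be non-negative.

```python
import heapq


def dijkstra(graph: dict, source):
    """Return the shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)       # closest node not yet finalized
        if u in done:
            continue                     # stale queue entry, skip it
        done.add(u)
        for v, cost in graph[u].items():
            candidate = d + cost
            if v not in dist or candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical network: graph[u][v] is the cost of the link between u and v.
graph = {
    "S": {"A": 2, "B": 5},
    "A": {"S": 2, "B": 1, "T": 6},
    "B": {"S": 5, "A": 1, "T": 2},
    "T": {"A": 6, "B": 2},
}
print(dijkstra(graph, "S"))  # prints {'S': 0, 'A': 2, 'B': 3, 'T': 5}
```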

Page 39: Algorithms

Network Example: Shortest Path

[Figure: network with nodes Seattle, Los Angeles, Chicago, Boston, New York, Sally and Jorge; labels shown: 23, 6, 12]

Page 40: Algorithms

Area 4: Scientific Computation

Page 41: Algorithms

A slight variation of MST

[Figure: Chicago, Boston and Atlanta joined through an added junction, the “Steiner point”]

Minimum Steiner tree

Page 42: Algorithms

A slight variation

[Figure: map of Albuquerque, Wichita, Little Rock, Houston, San Antonio, El Paso, Amarillo, Tulsa, Dallas and Abilene, connected as a minimum Steiner tree]

Minimum Steiner tree

Page 43: Algorithms

Bad news

The existing methods for Steiner tree fall into two categories:

1. Correct answer, but exponential time.

2. Efficient, but often the wrong answer.

Let’s look at the second category…

Page 44: Algorithms

Biological inspiration

What are the underlying algorithmic paradigms behind evolution?

1. Mutation

2. Crossover (during reproduction)

3. Survival of the fittest

Can these ideas be used to design algorithms?

Page 45: Algorithms

Hill climbing

Start with a solution
  in this case: minimum spanning tree, but more generally scientific optimizations
Produce a mutation
  add/remove/move a Steiner point and its connections
Keep it if it works well
  i.e. if it reduces the total length

Length keeps improving at each step…
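A toy version of this loop, assuming three terminals at made-up coordinates and a single Steiner point that is only ever moved (adding and removing points is left out). Starting the point on one terminal reproduces the spanning tree of this triangle; each random nudge is kept only if it shortens the total length.

```python
import math
import random

# Hypothetical coordinates for three terminals.
TERMINALS = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]


def star_length(point):
    """Total length of the edges joining the Steiner point to every terminal."""
    return sum(math.dist(point, t) for t in TERMINALS)


def hill_climb(start, steps=5000, step_size=0.1, seed=0):
    rng = random.Random(seed)               # fixed seed so runs are repeatable
    best, best_len = start, star_length(start)
    for _ in range(steps):
        # Mutation: move the Steiner point a small random distance.
        candidate = (best[0] + rng.uniform(-step_size, step_size),
                     best[1] + rng.uniform(-step_size, step_size))
        candidate_len = star_length(candidate)
        if candidate_len < best_len:        # keep it only if the tree gets shorter
            best, best_len = candidate, candidate_len
    return best, best_len


start = TERMINALS[2]
point, length = hill_climb(start)
print(round(star_length(start), 3), round(length, 3))
```

For this triangle the total length starts at about 7.21 and approaches the optimum of about 6.46. On harder instances the same loop can stall at a local optimum.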

Page 46: Algorithms

Hill climbing/Simulated Annealing

[Figure: total length plotted across candidate solutions, marking a local optimum and the optimal solution]

Page 47: Algorithms

Genetic algorithms

“Breed” good solutions…

Maintain a population of candidate solutions.
At each time step, have them reproduce:

Solution 1 [set of Steiner points] + Solution 2 [set of Steiner points] → “Child” solution

Routinely eliminate “unfit” solutions.
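The breeding loop can be sketched on a toy problem. Here a candidate solution is a list of 0/1 bits and the fitness is simply the number of 1 bits, standing in for a real objective such as the length of a Steiner tree. The function name, parameters and population sizes are arbitrary choices for this illustration.

```python
import random


def genetic_search(fitness, length=20, population_size=30, generations=40, seed=0):
    """Evolve bit lists toward higher fitness; returns the best one found."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Survival of the fittest: the better half survives and breeds.
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)        # crossover point
            child = mother[:cut] + father[cut:]
            flip = rng.randrange(length)          # mutation: flip one bit
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = genetic_search(fitness=sum)   # toy fitness: count the 1 bits
print(sum(best), "of", len(best))
```

Because the surviving parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.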

Page 48: Algorithms

Applications to scientific computing

• Problems in physics, chemistry and biology boil down to searching for an “optimal” solution

• For example, recall Bafna’s lecture: finding best arrangement of primers for cancer

• Simulated annealing and genetic algorithms are very useful for such problems

• No strong bounds on performance: yet they do very well in practice.

Page 49: Algorithms
Page 50: Algorithms

Questions

1. Write in big-O notation.

(a) 24n
(b) 320n^2

2. Draw the graph which captures the adjacency relationships in this map (the countries are Guatemala, Belize, El Salvador, Honduras, Nicaragua, Costa Rica).

Page 51: Algorithms

Conclusions

• Most programs have a few pieces of performance-critical code that can benefit from a knowledge of algorithms

• Beyond application, it’s a fundamental way of thinking

• This way of thinking is impacting all the sciences: physics, chemistry, linguistics

• Even if you love to program, take CSE 21 and 101 seriously. It will help you!

Page 52: Algorithms

Specs for Earth simulator

Distributed-memory parallel computing system in which 640 processor nodes are interconnected by a single-stage crossbar network

Processor Node: 8 vector processors with shared memory

Peak Performance: 40 Tflops

Total Main Memory: 10 TB

Purpose: Climate change

IBM said that Blue Gene/L is special for its size compared to the Yokohama, Japan-based Simulator, which gauges climate changes. Blue Gene/L is one-hundredth the physical size (320 vs. 32,500 square feet) and consumes one-twentieth the power (216 kilowatts vs. 6,000 KW) compared to the Earth Simulator.

Page 53: Algorithms

Leonardo da Pisa a.k.a. Fibonacci

His Liber abaci (1202) begins:

These are the nine figures of the Indians:

9 8 7 6 5 4 3 2 1.

With these nine figures, and with the sign 0 which in Arabic is called zephirum, any number can be written, as will be demonstrated.