
8/31/2016


Advanced Algorithms and Data Structures

Prof. Tapio Elomaa

Course Prerequisites

• A seven credit unit course
• We take things a bit further than basic algorithms / data structures courses that you might have attended
• We will assume familiarity with
– Necessary mathematics
– Elementary data structures
– Programming

31-Aug-16 MAT-72006 AADS, Fall 2016


Course Basics

• There will be 4 hours of lectures per week
• Weekly exercises start in a week's time
• We will not have a programming exercise this year (unless you demand to have one)
• We might consider organizing a seminar with voluntary presentations (yielding extra points) at the end of the course


Organization & Timetable

• Lectures: Prof. Tapio Elomaa
– Mon & Wed 12–14 in TB216
– Aug. 29 – Dec. 7, 2016
• Period break Oct. 17–23, 2016
• Exercises: M.Sc. Juho Lauri
– Thu 12–14 in TB224, start: Sept. 8
• Exam: Fri Dec. 16, 2016, 13–16


Course Grading


• Exam: Maximum of 30 points
• Weekly exercises yield extra points
– 40% of questions answered: 1 point
– 80% answered: 6 points
– In between: linear scale (so that decimals are possible)
• Final grading depends on what we agree as course components

Material

• The textbook of the course is
– Cormen, Leiserson, Rivest, Stein: Introduction to Algorithms, 3rd ed., MIT Press, 2009
• There is no prepared material; the slides appear on the web as the lectures proceed
– http://www.cs.tut.fi/~elomaa/teach/72006.html
• The exam is based on the lectures (i.e., not on the slides only)


Content (Plan)

I. Foundations
II. (Sorting and) Order Statistics
III. Data Structures
IV. Advanced Design and Analysis Techniques
V. Advanced Data Structures
VI. Graph Algorithms
VII. Selected Topics


I Foundations

The Role of Algorithms in Computing
Getting Started
Growth of Functions
Recurrences
Probabilistic Analysis and Randomized Algorithms


II (Sorting and) Order Statistics

Heapsort
Quicksort
Sorting in Linear Time
Medians and Order Statistics

III Data Structures

Elementary Data Structures
Hash Tables
Binary Search Trees
Red-Black Trees


IV Advanced Design and Analysis Techniques

Dynamic Programming
Greedy Algorithms
Amortized Analysis

V Advanced Data Structures

B-Trees
Binomial Heaps
Fibonacci Heaps


VI Graph Algorithms

Elementary Graph Algorithms
Minimum Spanning Trees
Single-Source Shortest Paths
All-Pairs Shortest Paths
Maximum Flow

VII Selected Topics

Matrix Operations
Linear Programming
Number-Theoretic Algorithms
Approximation Algorithms

Some of these could also be student presentation topics


The sorting problem

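• Input: A sequence of $n$ numbers $\langle a_1, a_2, \ldots, a_n \rangle$
• Output: A permutation (reordering) $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the input sequence such that $a'_1 \le a'_2 \le \cdots \le a'_n$
• The numbers that we wish to sort are also known as keys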


INSERTION-SORT(A)
1. for j = 2 to A.length
2.     key = A[j]
3.     // Insert A[j] into the sorted sequence A[1..j-1]
4.     i = j - 1
5.     while i > 0 and A[i] > key
6.         A[i + 1] = A[i]
7.         i = i - 1
8.     A[i + 1] = key
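As a minimal sketch, the INSERTION-SORT pseudocode can be translated into Python roughly as follows. Python lists are 0-indexed, so the loop bounds shift by one compared with the 1-indexed pseudocode; the function name `insertion_sort` is our own choice for illustration and is not part of the course material.

```python
def insertion_sort(a):
    """Sort the list a in place in nondecreasing order and return it."""
    for j in range(1, len(a)):        # pseudocode: for j = 2 to A.length
        key = a[j]
        # Insert a[j] into the sorted prefix a[0..j-1].
        i = j - 1
        while i >= 0 and a[i] > key:  # pseudocode tests i > 0 (1-indexed)
            a[i + 1] = a[i]           # shift the larger element one step right
            i -= 1
        a[i + 1] = key
    return a


print(insertion_sort([5, 2, 4, 6, 1, 3]))  # prints [1, 2, 3, 4, 5, 6]
```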


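Operation of INSERTION-SORT on the array ⟨5, 2, 4, 6, 1, 3⟩. Contents of the array at the start of each iteration of the for loop, and at the end:

j = 2:   5 2 4 6 1 3
j = 3:   2 5 4 6 1 3
j = 4:   2 4 5 6 1 3
j = 5:   2 4 5 6 1 3
j = 6:   1 2 4 5 6 3
final:   1 2 3 4 5 6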

Correctness of the Algorithm

• The following loop invariant helps us understand why the algorithm is correct:

At the start of each iteration of the for loop of lines 1–8, the subarray $A[1..j-1]$ consists of the elements originally in $A[1..j-1]$ but in sorted order


Initialization

• The loop invariant holds before the first loop iteration, when $j = 2$:
– The subarray $A[1..j-1]$, therefore, consists of just the single element $A[1]$
– It is the original element in $A[1]$
– This subarray is trivially sorted
– Therefore, the loop invariant holds prior to the first iteration of the loop


Maintenance

• Each iteration maintains the loop invariant:
– The body of the for loop works by moving $A[j-1]$, $A[j-2]$, $A[j-3]$, … by one position to the right until the proper position for $A[j]$ is found (lines 4–7)
– At this point the value of $A[j]$ is inserted (line 8)
– The subarray $A[1..j]$ then consists of the elements originally in $A[1..j]$, but in sorted order


Termination

• The condition causing the for loop to terminate is that $j > A.length = n$
• Because each loop iteration increases $j$ by 1, we must have $j = n + 1$ at that time
• Substituting $n + 1$ for $j$ in the wording of the loop invariant, we have that the subarray $A[1..n]$ consists of the elements originally in $A[1..n]$, but in sorted order
• $A[1..n]$ is the entire array, so the entire array is sorted
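The three parts of the argument (initialization, maintenance, termination) can also be checked mechanically. The sketch below is our own illustration, not part of the lecture: it asserts the loop invariant at the start of every iteration and the termination property at the end, using Python's built-in `sorted` as the reference. The name `insertion_sort_checked` is hypothetical.

```python
def insertion_sort_checked(a):
    """Insertion sort that asserts the loop invariant at every iteration."""
    original = list(a)                     # snapshot of the input
    for j in range(1, len(a)):
        # Invariant: a[0..j-1] holds the original first j elements, sorted.
        assert a[:j] == sorted(original[:j])
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    # Termination: the invariant now covers the entire array.
    assert a == sorted(original)
    return a


insertion_sort_checked([5, 2, 4, 6, 1, 3])  # runs without an AssertionError
```

Such assertions do not prove correctness, but a failing assertion would pinpoint the iteration at which the invariant breaks.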

Analysis of insertion sort

• The time taken by INSERTION-SORT depends on the input:
– sorting a thousand numbers takes longer than sorting three numbers
• Moreover, the procedure can take different amounts of time to sort two input sequences of the same size
– depending on how nearly sorted they already are


Input size

• The time taken by an algorithm grows with the size of the input
• It is traditional to describe the running time of a program as a function of the size of its input
• For many problems, such as sorting, the most natural measure of input size is the number of items in the input, i.e., the array size


• For other problems, e.g., multiplying two integers, the best measure is the total number of bits needed to represent the input in binary notation
• Sometimes it is more appropriate to describe the size with two numbers rather than one
• E.g., if the input to an algorithm is a graph, the input size can be described by the numbers of vertices and edges in it


Running time

• Running time of an algorithm on an input:
– The number of primitive operations (“steps”) executed
• The notion of a step should be as machine-independent as possible
• For the moment:
– Constant amount of time to execute each line of pseudocode
– We assume that each execution of the $i$th line takes time $c_i$, where $c_i$ is a constant


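INSERTION-SORT(A)                                          cost   times
1  for j = 2 to A.length                                   c_1    n
2      key = A[j]                                          c_2    n - 1
3      // Insert A[j] into the sorted sequence A[1..j-1]   0      n - 1
4      i = j - 1                                           c_4    n - 1
5      while i > 0 and A[i] > key                          c_5    $\sum_{j=2}^{n} t_j$
6          A[i + 1] = A[i]                                 c_6    $\sum_{j=2}^{n} (t_j - 1)$
7          i = i - 1                                       c_7    $\sum_{j=2}^{n} (t_j - 1)$
8      A[i + 1] = key                                      c_8    n - 1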


• $t_j$ denotes the number of times the while loop test in line 5 is executed for that value of $j$
• When a for or while loop exits in the usual way, the test is executed one time more than the loop body
• Comments are not executable statements, so they take no time
• Running time of the algorithm is the sum of those for each statement executed


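• To compute $T(n)$, the running time of INSERTION-SORT on an input of $n$ values,
– we sum the products of the cost and times columns, obtaining

$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1)$$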


Best case

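• The best case occurs if the array is already sorted
• For each $j = 2, 3, \ldots, n$, we then find that $A[i] \le key$ in line 5 when $i$ has its initial value of $j - 1$
• Thus $t_j = 1$ for $j = 2, 3, \ldots, n$, and the best-case running time is

$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 (n-1) + c_8 (n-1) = (c_1 + c_2 + c_4 + c_5 + c_8)\, n - (c_2 + c_4 + c_5 + c_8)$$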


• We can express this best-case running time as $an + b$ for constants $a$ and $b$ that depend on the statement costs $c_i$
• It is a linear function of $n$

Worst case

• The worst case results when the array is in reverse sorted order, that is, in decreasing order
• We must compare each element $A[j]$ with each element in the entire sorted subarray $A[1..j-1]$, and so $t_j = j$ for $j = 2, 3, \ldots, n$


• Note that

$$\sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1$$

and

$$\sum_{j=2}^{n} (j - 1) = \frac{n(n-1)}{2}$$

by the summation of an arithmetic series

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$$
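The counts of the while loop test can be checked empirically. The sketch below is our own illustration (the name `count_while_tests` is hypothetical): it counts every evaluation of the line 5 test, i.e., the sum of the $t_j$, for an already sorted input ($t_j = 1$) and for a reverse sorted input ($t_j = j$).

```python
def count_while_tests(values):
    """Return the total number of times the while-loop test (line 5) runs."""
    a = list(values)
    tests = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while True:
            tests += 1                       # one evaluation of the line 5 test
            if not (i >= 0 and a[i] > key):
                break                        # test failed: loop exits
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return tests


n = 10
best = count_while_tests(range(1, n + 1))    # already sorted input
worst = count_while_tests(range(n, 0, -1))   # reverse sorted input
print(best, n - 1)                   # prints 9 9
print(worst, n * (n + 1) // 2 - 1)   # prints 54 54
```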

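• The worst-case running time of INSERTION-SORT is

$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \left( \frac{n(n+1)}{2} - 1 \right) + c_6 \left( \frac{n(n-1)}{2} \right) + c_7 \left( \frac{n(n-1)}{2} \right) + c_8 (n-1)$$

$$= \left( \frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2} \right) n^2 + \left( c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8 \right) n - (c_2 + c_4 + c_5 + c_8)$$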


• We can express this worst-case running time as $an^2 + bn + c$ for constants $a$, $b$, and $c$ that depend on the statement costs $c_i$
• It is a quadratic function of $n$
• It is the rate of growth, or order of growth, of the running time that really interests us
• We consider only the leading term of a formula (e.g., $an^2$); the lower-order terms are relatively insignificant for large values of $n$


• We also ignore the leading term’s constant coefficient, since constant factors are less significant than the rate of growth in determining computational efficiency for large inputs
• For insertion sort, we are left with the factor of $n^2$ from the leading term
• We write that insertion sort has a worst-case running time of $\Theta(n^2)$ (“theta of $n$-squared”)


2.3 Designing algorithms

• Insertion sort uses an incremental approach: having sorted $A[1..j-1]$, we insert $A[j]$ into its proper place, yielding the sorted subarray $A[1..j]$
• Let us examine an alternative design approach, known as “divide-and-conquer”
• We design a sorting algorithm whose worst-case running time is much lower than that of insertion sort
• The running times of divide-and-conquer algorithms are often easily determined


The divide-and-conquer approach

• Many useful algorithms are recursive:
– to solve a problem, they call themselves to deal with closely related subproblems
• These algorithms typically follow a divide-and-conquer approach:
– Break the problem into subproblems that resemble the original problem but are smaller,
– Solve the subproblems recursively,
– Combine these solutions to create a solution to the original problem


• The paradigm involves three steps at each level of the recursion:
1. Divide the problem into a number of subproblems that are smaller instances of the same problem
2. Conquer the subproblems by solving them recursively
– If the sizes are small enough, just solve the subproblems in a straightforward manner
3. Combine the solutions to the subproblems into the solution for the original problem


The merge sort algorithm

• Divide: Divide the $n$-element sequence into two subsequences of $n/2$ elements each
• Conquer: Sort the two subsequences recursively using merge sort
• Combine: Merge the two sorted subsequences to produce the sorted answer
– Recursion “bottoms out” when the sequence to be sorted has length 1: a sequence of length 1 is already in sorted order


• The key operation is the merging of two sorted sequences in the “combine” step
• We call an auxiliary procedure MERGE($A$, $p$, $q$, $r$), where $A$ is an array and $p$, $q$, and $r$ are indices such that $p \le q < r$
• The procedure assumes that the subarrays $A[p..q]$ and $A[q+1..r]$ are in sorted order and merges them to form a single sorted subarray that replaces the current subarray $A[p..r]$


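MERGE(A, p, q, r)
1.  n1 = q - p + 1
2.  n2 = r - q
3.  Let L[1..n1 + 1] and R[1..n2 + 1] be new arrays
4.  for i = 1 to n1
5.      L[i] = A[p + i - 1]
6.  for j = 1 to n2
7.      R[j] = A[q + j]
8.  L[n1 + 1] = ∞
9.  R[n2 + 1] = ∞
10. i = 1
11. j = 1
12. for k = p to r
13.     if L[i] ≤ R[j]
14.         A[k] = L[i]
15.         i = i + 1
16.     else A[k] = R[j]
17.         j = j + 1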
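A minimal Python sketch of the MERGE procedure follows. It is our own translation under two assumptions: indices are 0-based and inclusive (so the call below corresponds to merging two halves of an 8-element list), and the keys are numbers, so that `math.inf` can serve as the sentinel ∞. The function name `merge` is chosen for illustration.

```python
import math


def merge(a, p, q, r):
    """Merge sorted a[p..q] and a[q+1..r] (inclusive, 0-indexed) in place."""
    left = a[p:q + 1] + [math.inf]        # copy of a[p..q] plus sentinel
    right = a[q + 1:r + 1] + [math.inf]   # copy of a[q+1..r] plus sentinel
    i = j = 0
    for k in range(p, r + 1):             # exactly r - p + 1 iterations
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


a = [2, 4, 5, 7, 1, 2, 3, 6]
merge(a, 0, 3, 7)
print(a)  # prints [1, 2, 2, 3, 4, 5, 6, 7]
```

Thanks to the sentinels, the loop never has to test whether one of the two copies is exhausted: a sentinel is never smaller than a real key, so it is never copied before all real keys have been.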


• Line 1 computes the length $n_1$ of the subarray $A[p..q]$; similarly for $n_2$ and $A[q+1..r]$ on line 2
• Line 3 creates arrays $L$ (left) and $R$ (right), of lengths $n_1 + 1$ and $n_2 + 1$, respectively
– the extra position will hold the sentinel
• The for loop of lines 4–5 copies $A[p..q]$ into $L[1..n_1]$
• Lines 6–7 copy $A[q+1..r]$ into $R[1..n_2]$
• Lines 8–9 put the sentinels at the ends of $L$ and $R$


• Lines 10–17 perform the $r - p + 1$ basic steps by maintaining the following loop invariant:
– At the start of each iteration of the for loop of lines 12–17, $A[p..k-1]$ contains the $k - p$ smallest elements of $L[1..n_1 + 1]$ and $R[1..n_2 + 1]$, in sorted order
– Moreover, $L[i]$ and $R[j]$ are the smallest elements of their arrays that have not been copied back into $A$


Operation of lines 10–17 in the call MERGE($A$, 9, 12, 16), when the subarray $A[9..16]$ contains ⟨2, 4, 5, 7, 1, 2, 3, 6⟩. After the copying and the insertion of sentinels, $L$ = ⟨2, 4, 5, 7, ∞⟩ and $R$ = ⟨1, 2, 3, 6, ∞⟩. Contents of $A[9..16]$ before the for loop of lines 12–17 and after each of its iterations:

before k = 9:    2 4 5 7 1 2 3 6
after k = 9:     1 4 5 7 1 2 3 6
after k = 10:    1 2 5 7 1 2 3 6
after k = 11:    1 2 2 7 1 2 3 6
after k = 12:    1 2 2 3 1 2 3 6
after k = 13:    1 2 2 3 4 2 3 6
after k = 14:    1 2 2 3 4 5 3 6
after k = 15:    1 2 2 3 4 5 6 6
after k = 16:    1 2 2 3 4 5 6 7

• The needed $r - p + 1$ iterations of the last for loop have been executed:
– $A[9..16]$ is sorted, and
– the two sentinels in $L$ and $R$ are the only two elements in these arrays that have not been copied into $A$
• The MERGE procedure runs in $\Theta(n)$ time, where $n = r - p + 1$
– each of lines 1–3 and 8–11 takes constant time
– the for loops of lines 4–7 take $\Theta(n_1 + n_2) = \Theta(n)$ time
– there are $n$ iterations of the for loop of lines 12–17, each of which takes constant time


Merge sort

• The procedure MERGE-SORT($A$, $p$, $r$) sorts the elements in $A[p..r]$
• If $p \ge r$, the subarray has at most one element and is therefore already sorted
• Otherwise, the divide step computes an index $q$ that partitions $A[p..r]$ into two subarrays:
– $A[p..q]$, containing $\lceil n/2 \rceil$ elements
– $A[q+1..r]$, containing $\lfloor n/2 \rfloor$ elements


MERGE-SORT(A, p, r)
1. if p < r
2.     q = ⌊(p + r)/2⌋
3.     MERGE-SORT(A, p, q)
4.     MERGE-SORT(A, q + 1, r)
5.     MERGE(A, p, q, r)
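The recursive structure of MERGE-SORT can be sketched in Python as follows. This is an illustrative translation under our own assumptions: 0-based inclusive indices, numeric keys (so `math.inf` works as the sentinel), and default arguments so that `merge_sort(a)` sorts the whole list. The helper `merge` is included to keep the example self-contained.

```python
import math


def merge(a, p, q, r):
    """Merge sorted a[p..q] and a[q+1..r] (inclusive, 0-indexed) in place."""
    left = a[p:q + 1] + [math.inf]
    right = a[q + 1:r + 1] + [math.inf]
    i = j = 0
    for k in range(p, r + 1):
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


def merge_sort(a, p=0, r=None):
    """Sort a[p..r] in place (whole list by default) and return a."""
    if r is None:
        r = len(a) - 1
    if p < r:                      # at least two elements remain
        q = (p + r) // 2           # divide: floor of the midpoint
        merge_sort(a, p, q)        # conquer the left half
        merge_sort(a, q + 1, r)    # conquer the right half
        merge(a, p, q, r)          # combine the two sorted halves
    return a


print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))  # prints [1, 2, 2, 3, 4, 5, 6, 7]
```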


Analysis of merge sort

• Our analysis assumes that the original problem size is a power of 2
• Each divide step then yields two subsequences of size exactly $n/2$
• We set up the recurrence for $T(n)$, the worst-case running time of merge sort on $n$ numbers
• Merge sort on just one element takes constant time


• When we have $n > 1$ elements, we break down the running time as follows:
– Divide: The step just computes the middle of the subarray, which takes constant time: $D(n) = \Theta(1)$
– Conquer: We recursively solve two subproblems, each of size $n/2$, which contributes $2T(n/2)$ to the running time
– Combine: the MERGE procedure on an $n$-element subarray takes time $\Theta(n)$, and so $C(n) = \Theta(n)$


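• When we add $D(n)$ and $C(n)$, we are adding functions that are $\Theta(1)$ and $\Theta(n)$, respectively
• This sum is a linear function of $n$, that is, $\Theta(n)$
• Adding it to the $2T(n/2)$ term from the “conquer” step gives the recurrence for $T(n)$:

$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 \\ 2T(n/2) + \Theta(n) & \text{if } n > 1 \end{cases}$$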


• To intuitively see that the solution to the recurrence is $T(n) = \Theta(n \lg n)$, where $\lg n$ stands for $\log_2 n$, let us rewrite it as

$$T(n) = \begin{cases} c & \text{if } n = 1 \\ 2T(n/2) + cn & \text{if } n > 1 \end{cases}$$

where the constant $c$ represents the time required to solve problems of size 1 as well as the time per array element of the divide and combine steps


• An inductive argument shows that the total number of levels of the recursion tree is $\lg n + 1$, where $n$ is the number of leaves
– Base case $n = 1$: the tree has only one level; since $\lg 1 = 0$, $\lg n + 1$ is the correct number of levels
– Inductive hypothesis: the number of levels of a tree with $2^i$ leaves is $\lg 2^i + 1 = i + 1$
– Because we assume that the input size is a power of 2, the next input size to consider is $2^{i+1}$
– A tree with $n = 2^{i+1}$ leaves has one more level than a tree with $2^i$ leaves, and so the total number of levels is $(i + 1) + 1 = \lg 2^{i+1} + 1$


• To compute the total cost represented by the recurrence, we simply add up the costs of all the levels:
– The recursion tree has $\lg n + 1$ levels, each costing $cn$, for a total cost of $cn(\lg n + 1) = cn \lg n + cn$
– Ignoring the low-order term and the constant $c$ gives the desired result of $\Theta(n \lg n)$
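The closed form can be checked numerically for small powers of 2. The sketch below is our own illustration (the function name `T` and the default $c = 1$ are assumptions): it evaluates the recurrence $T(1) = c$, $T(n) = 2T(n/2) + cn$ directly and compares it with $cn \lg n + cn$.

```python
def T(n, c=1):
    """Evaluate T(n) = c if n == 1, else 2*T(n/2) + c*n, for n a power of 2."""
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c * n


for k in range(5):
    n = 2 ** k                    # so lg n = k
    closed_form = n * k + n       # c*n*lg(n) + c*n with c = 1
    print(n, T(n), closed_form)
# prints:
# 1 1 1
# 2 4 4
# 4 12 12
# 8 32 32
# 16 80 80
```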
