Advanced Algorithms and Data Structureselomaa/teach/AADS-2016-1.pdf · Advanced Algorithms and Data...
Transcript of Advanced Algorithms and Data Structureselomaa/teach/AADS-2016-1.pdf · Advanced Algorithms and Data...
8/31/2016
1
Advanced Algorithmsand Data Structures
Prof. Tapio [email protected]
Course Prerequisites
• A seven credit unit course• We take things a bit further than basic
algorithms / data structures courses that youmight have attended
• We will assume familiarity with– Necessary mathematics– Elementary data structures– Programming
31-Aug-16MAT-72006 AADS, Fall 2016 2
8/31/2016
2
Course Basics
• There will be 4 hours of lectures per week• Weekly exercises start in a weeks time
• We will not have a programming exercise thisyear (unless you demand to have one)
• We might consider organizing a seminar withvoluntary presentations (yielding extra points)at the end of the course
31-Aug-16MAT-72006 AADS, Fall 2016 3
Organization & Timetable
• Lectures: Prof. Tapio Elomaa– Mon & Wed 12–14 PM in TB216– Aug. 29 – Dec. 7, 2016
• Period break Oct. 17–23, 2016
• Exercises: M.Sc. Juho LauriThu 12–14 TB224, Start: Sept. 8
• Exam: Fri Dec. 16, 2016 @ 13—16
31-Aug-16MAT-72006 AADS, Fall 2016 4
8/31/2016
3
Course Grading
31-Aug-16MAT-72006 AADS, Fall 2016 5
• Exam: Maximum of 30 points• Weekly exercises yield extra points
• 40% of questions answered: 1 point• 80% answered: 6 points• In between: linear scale (so that
decimals are possible)• Final grading depends on what we
agree as course components
Material
• The textbook of the course is– Cormen, Leiserson, Rivest, Stein: Introduction
to Algorithms, 3rd ed., MIT Press, 2009• There is no prepared material, the slides
appear in the web as the lectures proceed– http://www.cs.tut.fi/~elomaa/teach/72006.html
• The exam is based on the lectures (i.e., noton the slides only)
31-Aug-16MAT-72006 AADS, Fall 2016 6
8/31/2016
4
Content (Plan)
I. FoundationsII. (Sorting and) Order StatisticsIII. Data StructuresIV. Advanced Design and Analysis TechniquesV. Advanced Data StructuresVI. Graph AlgorithmsVII.Selected Topics
31-Aug-16MAT-72006 AADS, Fall 2016 7
I Foundations
The Role of Algorithms in ComputingGetting Started
Growth of FunctionsRecurrences
Probabilistic Analysis andRandomized Algorithms
8/31/2016
5
II (Sorting and) OrderStatistics
HeapsortQuicksort
Sorting in Linear TimeMedians and Order Statistics
III Data Structures
Elementary Data StructuresHash Tables
Binary Search TreesRed-Black Trees
8/31/2016
6
IV Advanced Design andAnalysis Techniques
Dynamic ProgrammingGreedy AlgorithmsAmortized Analysis
V Advanced DataStructures
B-TreesBinomial Heaps
Fibonacci Heaps
8/31/2016
7
VI Graph Algorithms
Elementary Graph AlgorithmsMinimum Spanning Trees
Single-Source Shortest PathsAll-Pairs Shortest Paths
Maximum Flow
VII Selected Topics
Matrix OperationsLinear Programming
Number-Theoretic AlgorithmsApproximation Algorithms
Part of these could also bestudent presentation topics
8/31/2016
8
The sorting problem
• Input: A sequence of numbers, , … ,
• Output: A permutation (reordering), , … ,
of the input sequence such that
• The numbers that we wish to sort are alsoknown as keys
31-Aug-16MAT-72006 AADS, Fall 2016 15
INSERTION-SORT( )1. for 2 to .2.3. // Insert [ ] into the sorted sequence [1. . – 1]4. 15. while > 0 and [ ] >6. [ + 1] [ ]7. 18. [ + 1]
31-Aug-16MAT-72006 AADS, Fall 2016 16
8/31/2016
9
5 2 4 6 1 3
31-Aug-16MAT-72006 AADS, Fall 2016 17
2 5 4 6 1 3
2 4 5 6 1 3 2 4 5 6 1 3
1 2 4 5 6 3 1 2 3 4 5 6
Correctness of the Algorithm
• The following loop invariant helps usunderstand why the algorithm is correct:
At the start of each iteration of the forloop of lines 1–8, the subarray [1. . – 1]consists of the elements originally in
[1. . – 1] but in sorted order
31-Aug-16MAT-72006 AADS, Fall 2016 18
8/31/2016
10
Initialization
• The loop invariant holds before the first loopiteration, when = 2:– The subarray, therefore, consists of just the
single element [1]– It is the original element in [1]– This subarray is trivially sorted– Therefore, the loop invariant holds prior to
the first iteration of the loop
31-Aug-16MAT-72006 AADS, Fall 2016 19
Maintenance• Each iteration maintains the loop invariant:
– The body of the for loop works by moving[ – 1], [ – 2], [ – 3], … by one position to
the right until the proper position for [ ] isfound (lines 4–7)
– At this point the value of [ ] is inserted (line8)
– The subarray [1. . ] then consists of theelements originally in [1. . ], but in sortedorder
31-Aug-16MAT-72006 AADS, Fall 2016 20
8/31/2016
11
Termination
• The condition causing the for loop toterminate is that > . =
• Because each loop iteration increases by 1,we must have = + 1 at that time
• Substituting + 1 for j in the wording of loopinvariant, we have that the subarray [1. . ]consists of the elements originally in [1. . ],but in sorted order
• [1. . ] is the entire array31-Aug-16MAT-72006 AADS, Fall 2016 21
Analysis of insertion sort
• The time taken by the INSERTION-SORTdepends on the input:– sorting a thousand numbers takes longer than
sorting three numbers• Moreover, the procedure can take different
amounts of time to sort two input sequencesof the same size– depending on how nearly sorted they already
are31-Aug-16MAT-72006 AADS, Fall 2016 22
8/31/2016
12
Input size
• The time taken by an algorithm grows withthe size of the input
• Traditional to describe the running time of aprogram as a function of the size of its input
• For many problems, such as sorting mostnatural measure for input size is the numberof items in the input—i.e., the array size
31-Aug-16MAT-72006 AADS, Fall 2016 23
• For, e.g., multiplying two integers, the bestmeasure is the total number of bits needed torepresent the input in binary notation
• Sometimes, more appropriate to describe thesize with two numbers rather than one
• E.g., if the input to an algorithm is a graph,the input size can be described by thenumbers of vertices and edges in it
31-Aug-16MAT-72006 AADS, Fall 2016 24
8/31/2016
13
Running time
• Running time of an algorithm on an input:– The number of primitive operations (“steps”)
executed• Step as machine-independent as possible• For the moment:
– Constant amount of time to execute each lineof pseudocode
– We assume that each execution of the th linetakes time , where is a constant
31-Aug-16MAT-72006 AADS, Fall 2016 25
31-Aug-16MAT-72006 AADS, Fall 2016 26
INSERTION-SORT( ) cost times
1 for 2 to . 1
2 [ ] 2 – 13 // Insert [ ] into the sorted sequence [1. . ] 0 – 1
4 – 1 4 – 1
5 while > 0 and [ ] > 5
6 [ + 1] [ ] 6 ( 1)
7 – 1 7 ( 1)
8 [ + 1] 8 – 1
8/31/2016
14
• denotes the number of times the while looptest in line 5 is executed for that value of
• When a for or while loop exits in the usualway, the test is executed one time more thanthe loop body
• Comments are not executable statements, sothey take no time
• Running time of the algorithm is the sum ofthose for each statement executed
31-Aug-16MAT-72006 AADS, Fall 2016 27
• To compute ( ), the running time ofINSERTION-SORT on an input of values,– we sum the products of the cost and times
columns, obtaining
= + 1 + 1 +
+ ( 1) + 1 + ( 1)
31-Aug-16MAT-72006 AADS, Fall 2016 28
8/31/2016
15
Best case
• The best case occurs if the array is alreadysorted
• For each = 2, 3, … , , we then nd that [ ] <in line 5 when has its initial value of 1
• Thus = 1 for = 2, 3, … , , and the best-caserunning time is
= + 1 + 1 + 1 + 1= ( + + + + ) ( + + + )
31-Aug-16MAT-72006 AADS, Fall 2016 29
Worst case
• We can express this as + for constantsand that depend on the statement costs
• It is a linear function of• The worst case results when the array is in
reverse sorted order — in decreasing order• We must compare each element [ ] with each
element in the entire sorted subarray [1. . – 1],and so = for = 2, 3, … ,
31-Aug-16MAT-72006 AADS, Fall 2016 30
8/31/2016
16
• Note that
=( + 1)
21
and
1 =( 1)
2by the summation of an arithmetic series
=( + 1)
231-Aug-16MAT-72006 AADS, Fall 2016 31
• The worst-case running time of INSERTION-SORT is
= + 1 + 1
+( + 1)
2 1 +( 1)
2
+( 1)
2 + ( 1)
= 2 + 2 + 2 + +
( + + )31-Aug-16MAT-72006 AADS, Fall 2016 32
8/31/2016
17
• We can express this worst-case running timeas 2 + + for constants , , and thatdepend on the statement costs
• It is a quadratic function of• The rate of growth, or order of growth, of
the running time really interests us• We consider only the leading term of a
formula ( 2); the lower-order terms arerelatively insigni cant for large values of
31-Aug-16MAT-72006 AADS, Fall 2016 33
• We also ignore the leading term’s coef cient,constant factors are less signi cant than therate of growth in determining computationalef ciency for large inputs
• For insertion sort, we are left with the factorof 2 from the leading term
• We write that insertion sort has a worst-caserunning time of ( 2)(“theta of -squared”)
31-Aug-16MAT-72006 AADS, Fall 2016 34
8/31/2016
18
2.3 Designing algorithms
• Insertion sort is an incremental approach: havingsorted [1. . – 1], we insert [ ] into its properplace, yielding sorted subarray [1. . ]
• Let us examine an alternative design approach,known as “divide-and-conquer”
• We design a sorting algorithm whose worst-caserunning time is much lower
• The running times of divide-and-conqueralgorithms are often easily determined
31-Aug-16MAT-72006 AADS, Fall 2016 35
The divide-and-conquer approach
• Many useful algorithms are recursive:– to solve a problem, they call themselves to
deal with closely related subproblems• These algorithms typically follow a divide-
and-conquer approach:– Break the problem into subproblems that
resemble the original problem but are smaller,– Solve the subproblems recursively,– Combine these solutions to create a solution
to the original problem31-Aug-16MAT-72006 AADS, Fall 2016 36
8/31/2016
19
• The paradigm involves three steps at eachlevel of the recursion:1. Divide the problem into a number of
subproblems that are smaller instances ofthe same problem
2. Conquer the subproblems by solving themrecursively
• If the sizes are small enough, just solve thesubproblems in a straightforward manner
3. Combine the solutions to the subproblemsinto the solution for the original problem
31-Aug-16MAT-72006 AADS, Fall 2016 37
The merge sort algorithm
• Divide: Divide the -element sequence intotwo subsequences of /2 elements each
• Conquer: Sort the two subsequencesrecursively using merge sort
• Combine: Merge the two sortedsubsequences to produce the sorted answer– Recursion “bottoms out” when the sequence
to be sorted has length 1: a sequence oflength 1 is already in sorted order
31-Aug-16MAT-72006 AADS, Fall 2016 38
8/31/2016
20
• The key operation is the merging of two sortedsequences in the “combine” step
• We call auxiliary procedure MERGE( , , , ),where is an array and , , and are indicessuch that <
• The procedure assumes that the subarrays[ . . ] and [ + 1. . ] are in sorted order and
• merges them to form a single sorted subarraythat replaces the current subarray [ . . ]
31-Aug-16MAT-72006 AADS, Fall 2016 39
MERGE( , , , )1. 1 – + 12. 2 –3. Let [1. . 1 + 1] and
[1. . 2 + 1] be newarrays
4. for 1 to 1
5. [ [ + – 1]6. for 1 to 2
7. [ [ + ]
8. [ 1 + 1]9. [ 2 + 1]10. 111. 112. for to13. if [ [ ]14. [ [ ]15. + 116. else [ [ ]17. + 1
31-Aug-16MAT-72006 AADS, Fall 2016 40
8/31/2016
21
• Line 1 computes the length 1 of the subarray[ . . ]; similarly for 2 and [ + 1. . ] on line 2
• Line 3 creates arrays (left) and (right), oflengths 1 + 1 and 2 + 1, respectively– the extra position will hold the sentinel
• The for loop of lines 4–5 copies [ . . ] into[1. . ];
• Lines 6–7 copy [ + 1. . ] into [1. . 2]• Lines 8–9 put the sentinels at the ends of and
31-Aug-16MAT-72006 AADS, Fall 2016 41
• Lines 10–17 perform the + 1 basic stepsby maintaining the following loop invariant:– At the start of each iteration of the for loop of
lines 12–17, [ . . – 1] contains the –smallest elements of [1. . 1 + 1] and
[1. . 2 + 1], in sorted order– Moreover, [ ] and [ ] are the smallest
elements of their arrays that have not beencopied back into
31-Aug-16MAT-72006 AADS, Fall 2016 42
8/31/2016
22
1 2 3 4 5
1 2 3 6
1 2 3 4 5
2 4 5 7
31-Aug-16MAT-72006 AADS, Fall 2016 43
8 9 10 11 12 13 14 15 16 17
… 2 4 5 7 1 2 3 6 …
MERGE( , 9, 12, 16)
1 2 3 4 5
2 4 5 7
1 2 3 4 5
1 2 3 6
8 9 10 11 12 13 14 15 16 17
… 1 4 5 7 1 2 3 6 …
1 2 3 4 5
1 2 3 6
1 2 3 4 5
2 4 5 7
31-Aug-16MAT-72006 AADS, Fall 2016 44
8 9 10 11 12 13 14 15 16 17
… 1 2 2 7 1 2 3 6 …
8 9 10 11 12 13 14 15 16 17
… 1 2 5 7 1 2 3 6 …
1 2 3 4 5
2 4 5 7
1 2 3 4 5
1 2 3 6
8/31/2016
23
1 2 3 4 5
1 2 3 6
31-Aug-16MAT-72006 AADS, Fall 2016 45
8 9 10 11 12 13 14 15 16 17
… 1 2 2 3 4 2 3 6 …
1 2 3 4 5
2 4 5 7
8 9 10 11 12 13 14 15 16 17
… 1 2 2 3 1 2 3 6 …
1 2 3 4 5
2 4 5 7
i
1 2 3 4 5
1 2 3 6
1 2 3 4 5
1 2 3 6
31-Aug-16MAT-72006 AADS, Fall 2016 46
8 9 10 11 12 13 14 15 16 17
… 1 2 2 3 4 5 6 6 …
1 2 3 4 5
2 4 5 7
8 9 10 11 12 13 14 15 16 17
… 1 2 2 3 4 5 3 6 …
1 2 3 4 5
2 4 5 7
1 2 3 4 5
1 2 3 6
8/31/2016
24
31-Aug-16MAT-72006 AADS, Fall 2016 47
8 9 10 11 12 13 14 15 16 17
… 1 2 2 3 4 5 6 7 …
1 2 3 4 5
2 4 5 7
1 2 3 4 5
1 2 3 6
• The needed – + 1 iterations of the lastfor loop have been executed:• [9. . 16] is sorted, and• the two sentinels in and are the only
two elements in these arrays that havenot been copied into
• MERGE procedure runs in ( ) time, where= – + 1
– each of lines 1–3 and 8–11 takes constanttime
– the for loops of lines 4–7 take ( 1 + 2) =( ) time
– there are iterations of the for loop of lines12–17, each of which takes constant time
31-Aug-16MAT-72006 AADS, Fall 2016 48
8/31/2016
25
Merge sort
• The procedure MERGE-SORT( , , ) sorts theelements in [ . . ]
• If , the subarray has at most oneelement and is therefore already sorted
• Otherwise, the divide step computes an indexthat partitions [ . . ] into two subarrays:
– [ . . ], containing /2 elements– [ + 1. . ], containing /2 elements
31-Aug-16MAT-72006 AADS, Fall 2016 49
MERGE-SORT( , , )1. if <2. ( + )/23. MERGE-SORT( , , )4. MERGE-SORT( , + 1, )5. MERGE( , , , )
31-Aug-16MAT-72006 AADS, Fall 2016 50
8/31/2016
26
Analysis of merge sort
• Our analysis assumes that the originalproblem size is a power of 2
• Each divide step then yields twosubsequences of size exactly /2
• We set up the recurrence for ( ), the worst-case running time of merge sort onnumbers
• Merge sort on just one element takesconstant time
31-Aug-16MAT-72006 AADS, Fall 2016 51
• When we have > 1 elements, we breakdown the running time as follows:– Divide: The step just computes the middle of
the subarray, which takes constant time:( ) = (1)
– Conquer: We recursively solve twosubproblems, each of size /2, whichcontributes ( /2) to the running time
– Combine: the MERGE procedure on an -element array takes time ( ), and so
( ) = ( )
31-Aug-16MAT-72006 AADS, Fall 2016 52
8/31/2016
27
• When we add the ( ) and ( ), we areadding functions that are ( ) and (1)
• This sum is a linear function of• Adding it to the ( /2) term from the
“conquer” step gives the recurrence for ( ):
=(1) if = 1
( 2) + ( ) if > 1
31-Aug-16MAT-72006 AADS, Fall 2016 53
• To intuitively see that the solution to therecurrence is ( ) = ( lg ), where lgstands for log2 , let us rewrite it as
= if = 1( 2) + if > 1
where constant represents time required tosolve problems of size 1 and that per arrayelement of the divide and combine steps
31-Aug-16MAT-72006 AADS, Fall 2016 54
8/31/2016
28
31-Aug-16MAT-72006 AADS, Fall 2016 55
• Inductive argument shows that total numberof levels of the recursion tree is lg + 1– base case = 1: tree has only one level; lg 1
= 0 lg + 1 is the correct number of levels– Inductive hypothesis: number of levels of a
tree with 2 leaves is lg 2 + 1 = + 1– We assume that the input size is a power of 2,
the next input size to consider is 2– A tree with = 2 leaves has one more
level than a tree with 2 leaves, and so thetotal number of levels is ( + 1) + 1 = lg 2 +1
31-Aug-16MAT-72006 AADS, Fall 2016 56
8/31/2016
29
• To compute the total cost represented by therecurrence, we simply add up the costs of allthe levels:– The recursion tree has lg + 1 levels, each
costing , for a total cost of(lg + 1) = lg +
– Ignoring the low-order term and theconstant gives the desired result of
( lg )31-Aug-16MAT-72006 AADS, Fall 2016 57