8/3/2019 Chap12 Slides
Dynamic Programming
Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar
To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003
Topic Overview
Overview of Serial Dynamic Programming
Serial Monadic DP Formulations
Nonserial Monadic DP Formulations
Serial Polyadic DP Formulations
Nonserial Polyadic DP Formulations
Overview of Serial Dynamic Programming
Dynamic programming (DP) is used to solve a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management.
Break problems into subproblems and combine their solutions into solutions to larger problems.
In contrast to divide-and-conquer, there may be relationships across subproblems.
Dynamic Programming: Example
Consider the problem of finding a shortest path between a pair of vertices in an acyclic graph.
An edge connecting node i to node j has cost c(i,j).
The graph contains n nodes numbered 0, 1, ..., n-1, and has an edge from node i to node j only if i < j. Node 0 is the source and node n-1 is the destination.
Let f(x) be the cost of the shortest path from node 0 to node x.
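This serial formulation can be sketched in a few lines. The graph below is illustrative (not the figure from the slides); f(x) is computed for each node x in increasing order, since all predecessors j of x satisfy j < x.

```python
# A minimal sketch of the serial DP for the shortest path in a DAG whose
# nodes 0, 1, ..., n-1 admit an edge (i, j) only when i < j.
# The edge costs below are illustrative, not taken from the slides.

def shortest_path_dag(n, cost):
    """cost[(i, j)] is c(i, j); f[x] is the cost of the best 0 -> x path."""
    INF = float("inf")
    f = [INF] * n
    f[0] = 0  # node 0 is the source
    for x in range(1, n):
        # f(x) = min over predecessors j of f(j) + c(j, x)
        f[x] = min((f[j] + cost[(j, x)] for j in range(x) if (j, x) in cost),
                   default=INF)
    return f[n - 1]

edges = {(0, 1): 2, (0, 2): 5, (1, 2): 1, (1, 3): 6, (2, 3): 2, (3, 4): 1}
print(shortest_path_dag(5, edges))  # 6, via the path 0 -> 1 -> 2 -> 3 -> 4
```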
Dynamic Programming: Example
A graph for which the shortest path between nodes 0 and 4 is to be computed.
Dynamic Programming
The solution to a DP problem is typically expressed as a minimum (or maximum) of possible alternate solutions.
If r represents the cost of a solution composed of subproblems x1, x2, ..., xl, then r can be written as

    r = g(f(x1), f(x2), ..., f(xl))

Here, g is the composition function.
If the optimal solution to each problem is determined by composing optimal solutions to the subproblems and selecting the minimum (or maximum), the formulation is said to be a DP formulation.
Dynamic Programming: Example
The computation and composition of subproblem solutions to solve problem f(x8).
Dynamic Programming
The recursive DP equation is also called the functional equation or optimization equation.
In the equation for the shortest path problem, the composition function is f(j) + c(j,x). This contains a single recursive term (f(j)). Such a formulation is called monadic.
If the RHS has multiple recursive terms, the DP formulation is called polyadic.
Dynamic Programming
The dependencies between subproblems can be expressed as a graph.
If the graph can be levelized (i.e., solutions to problems at a level depend only on solutions to problems at the previous level), the formulation is called serial; otherwise, it is called nonserial.
Based on these two criteria, we can classify DP formulations into four categories: serial monadic, serial polyadic, nonserial monadic, and nonserial polyadic.
This classification is useful since it identifies concurrency and dependencies that guide parallel formulations.
Serial Monadic DP Formulations
It is difficult to derive canonical parallel formulations for the entire class of formulations.
For this reason, we select two representative examples: the shortest-path problem for a multistage graph and the 0/1 knapsack problem.
We derive parallel formulations for these problems and identify common principles guiding design within the class.
Shortest-Path Problem
Special class of shortest-path problems in which the graph is a weighted multistage graph of r + 1 levels.
Each level is assumed to have n nodes, and every node at level i is connected to every node at level i + 1.
Levels zero and r contain only one node each: the source and destination nodes, respectively.
The objective of this problem is to find the shortest path from S to R.
Shortest-Path Problem
An example of a serial monadic DP formulation for finding the shortest path in a graph whose nodes can be organized into levels.
Shortest-Path Problem
The ith node at level l in the graph is labeled v_i^l, and the cost of an edge connecting v_i^l to node v_j^{l+1} is labeled c_{i,j}^l.
The cost of reaching the goal node R from any node v_i^l is represented by C_i^l.
If there are n nodes at level l, the vector [C_0^l, C_1^l, ..., C_{n-1}^l]^T is referred to as C^l. Note that C^0 = [C_0^0].
We have C_i^l = min { (c_{i,j}^l + C_j^{l+1}) | j is a node at level l + 1 }.
Shortest-Path Problem
Since all nodes v_j^{r-1} have only one edge connecting them to the goal node R at level r, the cost C_j^{r-1} is equal to c_{j,R}^{r-1}.
We have:

    C_j^{r-1} = c_{j,R}^{r-1}

Notice that this problem is serial and monadic.
Shortest-Path Problem
The cost of reaching the goal node R from any node at level l (0 < l < r - 1) is

    C_i^l = min { (c_{i,j}^l + C_j^{l+1}) | j is a node at level l + 1 }
Shortest-Path Problem
We can express the solution to the problem as a modified sequence of matrix-vector products.
Replacing the addition operation by minimization and the multiplication operation by addition, the preceding set of equations becomes:

    C^l = M_{l,l+1} x C^{l+1}

where C^l and C^{l+1} are n x 1 vectors representing the cost of reaching the goal node from each node at levels l and l + 1.
Shortest-Path Problem
Matrix M_{l,l+1} is an n x n matrix in which entry (i, j) stores the cost of the edge connecting node i at level l to node j at level l + 1.
The shortest-path problem has thus been formulated as a sequence of r matrix-vector products.
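The modified matrix-vector product can be sketched as follows, with multiplication replaced by addition and addition by minimization. The matrix and vector values below are illustrative, not from the slides.

```python
# Sketch of the (min, +) matrix-vector product used in the formulation
# C^l = M_{l,l+1} x C^{l+1}. Entry M[i][j] holds the edge cost c^l_{i,j}
# (use inf for an absent edge).

INF = float("inf")

def min_plus_matvec(M, C_next):
    """Return C^l given M_{l,l+1} and C^{l+1}."""
    n = len(C_next)
    return [min(M[i][j] + C_next[j] for j in range(n)) for i in range(n)]

# Illustrative interior level of a multistage graph with n = 2 nodes:
M = [[1, 4],
     [2, 3]]
C_next = [5, 2]
print(min_plus_matvec(M, C_next))  # [min(1+5, 4+2), min(2+5, 3+2)] = [6, 5]
```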
Parallel Shortest-Path
We can parallelize this algorithm using the parallel algorithms for the matrix-vector product.
Θ(n) processing elements can compute each vector C^l in time Θ(n) and solve the entire problem in time Θ(rn).
In many instances of this problem, the matrix M may be sparse. For such problems, it is highly desirable to use sparse matrix techniques.
0/1 Knapsack Problem
We are given a knapsack of capacity c and a set of n objects numbered 1, 2, ..., n. Each object i has weight w_i and profit p_i.
Let v = [v_1, v_2, ..., v_n] be a solution vector in which v_i = 0 if object i is not in the knapsack, and v_i = 1 if it is in the knapsack.
The goal is to find a subset of objects to put into the knapsack so that

    Σ_{i=1}^{n} w_i v_i ≤ c

(that is, the objects fit into the knapsack) and

    Σ_{i=1}^{n} p_i v_i

is maximized (that is, the profit is maximized).
0/1 Knapsack Problem
The naive method is to consider all 2^n possible subsets of the n objects and choose the one that fits into the knapsack and maximizes the profit.
Let F[i,x] be the maximum profit for a knapsack of capacity x using only objects {1, 2, ..., i}. The DP formulation is:

    F[i,x] = F[i-1,x]                                 if w_i > x
    F[i,x] = max { F[i-1,x], F[i-1, x-w_i] + p_i }    otherwise

with F[0,x] = 0 for all x.
0/1 Knapsack Problem
Construct a table F of size n x c in row-major order.
Filling an entry in a row requires two entries from the previous row: one from the same column and one from the column offset by the weight of the object corresponding to the row.
Computing each entry takes constant time; the sequential run time of this algorithm is Θ(nc).
The formulation is serial-monadic.
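The row-major table construction can be sketched as below; the weights, profits, and capacity are illustrative values, not from the slides.

```python
# A minimal sketch of the serial 0/1 knapsack DP. F[i][x] is the best
# profit achievable using objects 1..i with capacity x.

def knapsack(weights, profits, c):
    n = len(weights)
    F = [[0] * (c + 1) for _ in range(n + 1)]  # row 0: no objects, profit 0
    for i in range(1, n + 1):
        w, p = weights[i - 1], profits[i - 1]
        for x in range(c + 1):
            if w > x:
                F[i][x] = F[i - 1][x]          # object i cannot fit
            else:                              # better of "skip" vs. "take"
                F[i][x] = max(F[i - 1][x], F[i - 1][x - w] + p)
    return F[n][c]

print(knapsack([2, 3, 4], [3, 4, 5], 5))  # 7 (take objects 1 and 2)
```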
0/1 Knapsack Problem
Computing entries of table F for the 0/1 knapsack problem. The computation of entry F[i,j] requires communication with processing elements containing entries F[i-1,j] and F[i-1,j-w_i].
0/1 Knapsack Problem
Using c processors in a PRAM, we can derive a simple parallel algorithm that runs in O(n) time by partitioning the columns across processors.
In a distributed-memory machine, in the jth iteration, for computing F[j,r] at processing element P_{r-1}, F[j-1,r] is available locally but F[j-1, r-w_j] must be fetched.
The communication operation is a circular shift, and the time is given by (t_s + t_w) log c. The total time is therefore t_c + (t_s + t_w) log c.
Across all n iterations (rows), the parallel time is O(n log c). Note that this is not cost-optimal.
0/1 Knapsack Problem
Using p processing elements, each processing element computes c/p elements of the table in each iteration.
The corresponding shift operation takes time (2t_s + t_w c/p), since the data block may be partitioned across two processors, but the total volume of data is c/p.
The corresponding parallel time is n(t_c c/p + 2t_s + t_w c/p), or O(nc/p) (which is cost-optimal).
Note that there is an upper bound on the efficiency ofthis formulation.
Nonserial Monadic DP Formulations: Longest-Common-Subsequence
Given a sequence A = ⟨a_1, a_2, ..., a_n⟩, a subsequence of A can be formed by deleting some entries from A.
Given two sequences A = ⟨a_1, a_2, ..., a_n⟩ and B = ⟨b_1, b_2, ..., b_m⟩, find the longest sequence that is a subsequence of both A and B.
If A = ⟨c, a, d, b, r, z⟩ and B = ⟨a, s, b, z⟩, the longest common subsequence of A and B is ⟨a, b, z⟩.
Longest-Common-Subsequence Problem
Let F[i,j] denote the length of the longest common subsequence of the first i elements of A and the first j elements of B. The objective of the LCS problem is to find F[n,m].
We can write:

    F[i,j] = 0                            if i = 0 or j = 0
    F[i,j] = F[i-1,j-1] + 1               if i, j > 0 and a_i = b_j
    F[i,j] = max { F[i,j-1], F[i-1,j] }   if i, j > 0 and a_i ≠ b_j
Longest-Common-Subsequence Problem
The algorithm computes the two-dimensional F table in a row- or column-major fashion. The complexity is Θ(nm).
Treating nodes along a diagonal as belonging to one level, each node depends on two subproblems at the preceding level and one subproblem two levels prior. This DP formulation is nonserial monadic.
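The serial computation of the F table can be sketched as below; the sequences here are the amino-acid example used later in these slides.

```python
# A minimal sketch of the serial LCS DP. F[i][j] is the LCS length of
# the first i elements of A and the first j elements of B.

def lcs_length(A, B):
    n, m = len(A), len(B)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # row/column 0 stay zero
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if A[i - 1] == B[j - 1]:
                F[i][j] = F[i - 1][j - 1] + 1
            else:
                F[i][j] = max(F[i][j - 1], F[i - 1][j])
    return F[n][m]

print(lcs_length("HEAGAWGHEE", "PAWHEAE"))  # 5, the length of "AWHEE"
```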
Longest-Common-Subsequence Problem
(a) Computing entries of table F for the longest-common-subsequence problem. Computation proceeds along the dotted diagonal lines. (b) Mapping elements of the table to processing elements.
Longest-Common-Subsequence: Example
Consider the LCS of two amino-acid sequences H E A G A W G H E E and P A W H E A E. For the interested reader, the names of the corresponding amino acids are A: Alanine, E: Glutamic acid, G: Glycine, H: Histidine, P: Proline, and W: Tryptophan.
The F table for computing the LCS of the sequences. The LCS is A W H E E.
Parallel Longest-Common-Subsequence
Table entries are computed in a diagonal sweep from the top-left to the bottom-right corner.
Using n processors in a PRAM, each entry in a diagonal can be computed in constant time.
For two sequences of length n, there are 2n - 1 diagonals.
The parallel run time is Θ(n) and the algorithm is cost-optimal.
Parallel Longest-Common-Subsequence
Consider a (logical) linear array of processors. Processing element P_i is responsible for the (i+1)th column of the table.
To compute F[i,j], processing element P_{j-1} may need either F[i-1,j-1] or F[i,j-1] from the processing element to its left. This communication takes time t_s + t_w. The computation takes constant time (t_c). We have:

    T_P = (2n - 1)(t_c + t_s + t_w)

Note that this formulation is cost-optimal; however, its efficiency is upper-bounded by 0.5!
Can you think of how to fix this?
Serial Polyadic DP Formulation: Floyd's All-PairsShortest Path
Given a weighted graph G(V,E), Floyd's algorithm determines the cost d_{i,j} of the shortest path between each pair of nodes in V.
Let d_{i,j}^k be the minimum cost of a path from node i to node j, using only nodes v_0, v_1, ..., v_{k-1}.
We have:

    d_{i,j}^0 = c(i,j)
    d_{i,j}^k = min { d_{i,j}^{k-1}, d_{i,k-1}^{k-1} + d_{k-1,j}^{k-1} }   for 1 ≤ k ≤ n

Each iteration requires time Θ(n^2) and the overall run time of the sequential algorithm is Θ(n^3).
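The serial algorithm can be sketched with the usual in-place update; the adjacency matrix below is illustrative, with inf marking an absent edge.

```python
# A minimal sketch of serial Floyd's all-pairs shortest paths. The
# matrix d is updated in place; after the iteration that admits node k
# as an intermediate, d[i][j] holds the cost using intermediates 0..k.

INF = float("inf")

def floyd(d):
    n = len(d)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# Illustrative 3-node graph (c(i,i) = 0, INF = no direct edge):
d = [[0, 3, INF],
     [INF, 0, 1],
     [2, INF, 0]]
print(floyd(d)[0][2])  # 4, the cost of 0 -> 1 -> 2
```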
Serial Polyadic DP Formulation: Floyd's All-PairsShortest Path
A PRAM formulation of this algorithm uses n^2 processors in a logical 2D mesh. Processor P_{i,j} computes the value of d_{i,j}^k for k = 1, 2, ..., n in constant time.
The parallel run time is Θ(n) and it is cost-optimal.
The algorithm can easily be adapted to practical architectures, as discussed in our treatment of Graph Algorithms.
Nonserial Polyadic DP Formulation: Optimal Matrix-Parenthesization Problem
When multiplying a sequence of matrices, the order of multiplication significantly impacts operation count.
Let C[i,j] be the optimal cost of multiplying the matrices A_i, ..., A_j.
The chain of matrices can be expressed as a product of two smaller chains, A_i, A_{i+1}, ..., A_k and A_{k+1}, ..., A_j.
The chain A_i, A_{i+1}, ..., A_k results in a matrix of dimensions r_{i-1} x r_k, and the chain A_{k+1}, ..., A_j results in a matrix of dimensions r_k x r_j.
The cost of multiplying these two matrices is r_{i-1} r_k r_j.
Optimal Matrix-Parenthesization Problem
We have:

    C[i,j] = 0                                                        if i = j
    C[i,j] = min { C[i,k] + C[k+1,j] + r_{i-1} r_k r_j | i ≤ k < j }  if i < j
Optimal Matrix-Parenthesization Problem
A nonserial polyadic DP formulation for finding an optimal matrix parenthesization for a chain of four matrices. A square node represents the optimal cost of multiplying a matrix chain. A circle node represents a possible parenthesization.
Optimal Matrix-Parenthesization Problem
The goal of finding C[1,n] is accomplished in a bottom-up fashion.
Visualize this by thinking of filling in the C table diagonally. Entries in diagonal l correspond to the cost of multiplying matrix chains of length l+1.
The value of C[i,j] is computed as min { C[i,k] + C[k+1,j] + r_{i-1} r_k r_j }, where k can take values from i to j-1.
Computing C[i,j] requires that we evaluate (j-i) terms and select their minimum.
The computation of each term takes time t_c, and the computation of C[i,j] takes time (j-i)t_c. Each entry in diagonal l can be computed in time lt_c.
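The diagonal, bottom-up computation can be sketched as below; the matrix dimensions are illustrative values, not from the slides.

```python
# A minimal sketch of the diagonal computation of C[i,j] for a chain
# A_1 ... A_n, where A_i has dimensions r[i-1] x r[i].

def matrix_chain_cost(r):
    n = len(r) - 1                      # number of matrices in the chain
    C = [[0] * (n + 1) for _ in range(n + 1)]  # C[i][i] = 0
    for l in range(1, n):               # diagonal l: chains of length l+1
        for i in range(1, n - l + 1):
            j = i + l
            # Evaluate the (j - i) terms and take their minimum.
            C[i][j] = min(C[i][k] + C[k + 1][j] + r[i - 1] * r[k] * r[j]
                          for k in range(i, j))
    return C[1][n]

# Four matrices with dimensions 10x20, 20x5, 5x30, 30x8:
print(matrix_chain_cost([10, 20, 5, 30, 8]))  # minimum multiplication count
```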
Optimal Matrix-Parenthesization Problem
The algorithm computes the (n-1) chains of length two; this takes time (n-1)t_c. Computing the (n-2) chains of length three takes time 2(n-2)t_c. In the final step, the algorithm computes one chain of length n in time (n-1)t_c.
It follows that the serial time is Θ(n^3).
Optimal Matrix-Parenthesization Problem
The diagonal order of computation for the optimal matrix-parenthesization problem.
Parallel Optimal Matrix-Parenthesization Problem
Consider a logical ring of processors. In step l, each processor computes a single element belonging to the lth diagonal.
On computing the assigned value of the element in table C, each processor sends its value to all other processors using an all-to-all broadcast.
The next value can then be computed locally. The total time required to compute the entries along diagonal l is lt_c + t_s log n + t_w(n-1). The corresponding parallel time is given by:

    T_P = Σ_{l=1}^{n-1} (lt_c + t_s log n + t_w(n-1))
        = t_c n(n-1)/2 + t_s (n-1) log n + t_w (n-1)^2
Parallel Optimal Matrix-Parenthesization Problem
When using p (< n) processing elements, each processing element computes n/p elements of each diagonal, and the computation and communication times scale accordingly.
Discussion of Parallel Dynamic Programming Algorithms
By representing computation as a graph, we identify three sources of parallelism: parallelism within nodes, parallelism across nodes at a level, and pipelining nodes across multiple levels. The first two are available in serial formulations and the third in nonserial formulations.
Data locality is critical for performance. Different DP formulations, by the very nature of the problem instance, have different degrees of locality.