8/13/2019 Algorithm Analysis - Max Alekseyev
CSCE750: Analysis of Algorithms
Lecture 6
Max Alekseyev
University of South Carolina
September 10, 2013
Outline
Divide-and-Conquer
Fast Integer Multiplication
Fast Matrix Multiplication
Fast Integer Multiplication
Let b, c ≥ 0 be integers, represented in binary, with n bits each.
Let us assume that n is large, so that b and c cannot be added, subtracted, or multiplied in constant time.
We imagine that b and c are both represented as arrays of n bits: b = b_{n−1} ... b_1 b_0 and c = c_{n−1} ... c_1 c_0, where the b_i and c_i are individual bits (leading 0s are allowed). Thus,

b = b_0·2^0 + b_1·2^1 + ... + b_{n−1}·2^{n−1} = sum_{i=0}^{n−1} b_i·2^i,
c = c_0·2^0 + c_1·2^1 + ... + c_{n−1}·2^{n−1} = sum_{i=0}^{n−1} c_i·2^i.
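To make the representation concrete, here is a minimal Python sketch (not part of the slides; the names `bits` and `from_bits` are illustrative) that decomposes an integer into its bit array and reassembles it via the sums above:

```python
def bits(x, n):
    """Return the n-bit binary representation of x as a list
    [b_0, b_1, ..., b_{n-1}], least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bs):
    """Reassemble an integer from its bit list: sum of b_i * 2^i."""
    return sum(b << i for i, b in enumerate(bs))

b = 45  # binary 101101
assert bits(b, 6) == [1, 0, 1, 1, 0, 1]
assert from_bits(bits(b, 6)) == b
```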
Addition
The usual sequential binary add-with-carry algorithm that we all learned in school takes time Θ(n), since we spend a constant amount of time at each column, from right to left. The sum is representable by n + 1 bits (at most).
Q: Can we do better?
This algorithm is clearly asymptotically optimal, since to produce the correct sum we must at least examine each bit of b and of c.
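The add-with-carry procedure can be sketched as follows; this is an illustrative Python version (not from the slides), with bit lists stored least-significant-bit first:

```python
def add_bits(b, c):
    """Ripple-carry addition of two n-bit numbers given as bit lists
    (least significant bit first).  Constant work per column, so Theta(n)
    total; the result has n + 1 bits."""
    n = len(b)
    result, carry = [], 0
    for i in range(n):
        s = b[i] + c[i] + carry
        result.append(s & 1)   # current output bit
        carry = s >> 1         # carry into the next column
    result.append(carry)       # the (n+1)-st bit
    return result
```

For example, adding 5 (bits `[1, 0, 1]`) and 3 (bits `[1, 1, 0]`) yields the bit list of 8.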
Subtraction
Similar to addition, the usual subtract-and-borrow algorithm takes Θ(n) time, which is clearly asymptotically optimal. The result can be represented by at most n bits.
Multiplication
If we multiply b and c using the naive grade-school algorithm, then it takes quadratic (i.e., Θ(n^2)) time. Essentially, this algorithm is tantamount to expanding the product bc according to the expressions above:

bc = ( sum_{i=0}^{n−1} b_i·2^i ) · ( sum_{j=0}^{n−1} c_j·2^j ) = sum_{i,j} b_i·c_j·2^{i+j},

then adding everything up term by term. There are n^2 terms.
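As an illustration, the double-sum expansion translates directly into code; this Python sketch (illustrative names, not from the slides) spends Θ(n^2) work on the n^2 terms:

```python
def naive_multiply(b, c):
    """Grade-school multiplication via the double sum:
    b*c = sum over i, j of b_i * c_j * 2^(i+j).
    Inputs are bit lists, least significant bit first; n^2 terms total."""
    total = 0
    for i, bi in enumerate(b):
        for j, cj in enumerate(c):
            total += (bi * cj) << (i + j)
    return total
```

For example, 6 (bits `[0, 1, 1]`) times 5 (bits `[1, 0, 1]`) gives 30.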
Q: Can we do better?
Multiplying with Divide-and-Conquer
If n = 1, then the multiplication is trivial, so assume that n > 1. Let us further assume for simplicity that n is even. In fact, we can assume that n is a power of 2: if it is not, pad each number with leading 0s to the next power of 2; at worst this just doubles the input size.
Let m = n/2. Split b and c up into their m least and m most significant bits. Let b_l and b_h be the numbers given by the low m bits and the high m bits of b, respectively. Similarly, let c_l and c_h be the low and high halves of c. Thus, 0 ≤ b_l, b_h, c_l, c_h < 2^m.
We then have
bc = (b_l + b_h·2^m)(c_l + c_h·2^m) = b_l·c_l + (b_l·c_h + b_h·c_l)·2^m + b_h·c_h·2^n.

This suggests that we can compute bc with four recursive multiplications of pairs of m-bit numbers (b_l·c_l, b_l·c_h, b_h·c_l, and b_h·c_h) as well as Θ(n) time spent doing other things, namely some additions and multiplications by powers of two (the latter amount to arithmetic shifts of the bits, which can be done in linear time).

The time for this divide-and-conquer multiplication algorithm thus satisfies the recurrence

T(n) = 4·T(m) + Θ(n) = 4·T(n/2) + Θ(n).

The Master Theorem (Case 1) then gives T(n) = Θ(n^2), which is asymptotically no better than the naive algorithm.
Better Approach
Another way to compute

bc = (b_l + b_h·2^m)(c_l + c_h·2^m) = b_l·c_l + (b_l·c_h + b_h·c_l)·2^m + b_h·c_h·2^n.

Split b and c up into their low and high halves as above, but then recursively compute only three products:

x = b_l·c_l,
y = b_h·c_h,
z = (b_l + b_h)(c_l + c_h).

Now you should verify for yourself that

bc = x + (z − y − x)·2^m + y·2^n,
which the algorithm then computes.
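A hedged Python sketch of this three-multiplication recursion (Karatsuba's algorithm; the slides give only the identity, so the splitting details below are one possible realization operating directly on Python ints):

```python
def karatsuba(b, c):
    """Three-multiplication recursion: bc = x + (z - y - x)*2^m + y*2^(2m),
    where x = b_l*c_l, y = b_h*c_h, z = (b_l + b_h)(c_l + c_h).
    The base case falls back to built-in multiply for tiny operands."""
    if b < 2 or c < 2:
        return b * c
    n = max(b.bit_length(), c.bit_length())
    m = n // 2
    mask = (1 << m) - 1
    bl, bh = b & mask, b >> m   # low and high halves of b
    cl, ch = c & mask, c >> m   # low and high halves of c
    x = karatsuba(bl, cl)
    y = karatsuba(bh, ch)
    z = karatsuba(bl + bh, cl + ch)
    return x + ((z - y - x) << m) + (y << (2 * m))
```

The shifts by m and 2m implement the multiplications by 2^m and 2^n = 2^{2m} in the identity above.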
Running Time Analysis
How much time does this take? Besides the recursive calls, there is a linear time's worth of overhead: additions, subtractions, and arithmetic shifts. There are three recursive calls, computing x, y, and z. The numbers x and y are products of two m-bit integers each, and z is the product of (at most) two (m + 1)-bit integers. Thus the running time satisfies

T(n) = 2·T(n/2) + T(n/2 + 1) + Θ(n).
It can be shown that the +1 doesn't affect the result, so the recurrence is effectively

T(n) = 3·T(n/2) + Θ(n),

which yields T(n) = Θ(n^{lg 3}) by the Master Theorem. Since lg 3 ≈ 1.585 < 2, this is asymptotically faster than the naive quadratic algorithm.
A Bit of History
This approach dates back at least to Gauss, who discovered (using the same trick) that multiplying two complex numbers together could be done with only three real multiplications instead of the more naive four.

The same idea has been applied to long integer multiplication by Karatsuba and to matrix multiplication by Strassen.
Matrix Multiplication
Given two n × n matrices A = (a_ij), i, j = 1..n, and B = (b_ij), i, j = 1..n, their product is defined as follows:

A · B = (c_ij), where c_ij = sum_{k=1}^{n} a_ik · b_kj.

Therefore, to compute the matrix product, we need to compute n^2 matrix entries. A naive approach takes n multiplications and n − 1 additions for each entry.
Naive Matrix Multiplication Pseudocode
Matrix-Multiply(A, B)
1. n = A.rows
2. let C be a new n × n matrix
3. for i = 1 to n
4.     for j = 1 to n
5.         c_ij = 0
6.         for k = 1 to n
7.             c_ij = c_ij + a_ik · b_kj
8. return C
Running time is Θ(n^3).
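For reference, a direct Python transcription of the pseudocode (an illustrative sketch using 0-based lists of lists rather than the slide's 1-based indexing):

```python
def matrix_multiply(A, B):
    """Naive Theta(n^3) product of two n x n matrices given as
    lists of rows: three nested loops, one per index i, j, k."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```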
Can we do better?
Is Θ(n^3) the best we can do? Can we multiply matrices in o(n^3) time?
It seems like any algorithm to multiply matrices must take Ω(n^3) time:
We must compute n^2 entries.
Each entry is the sum of n terms.
But with Strassen's method, we can multiply matrices in o(n^3) time:
Strassen's algorithm runs in Θ(n^{lg 7}) time, and lg 7 ≈ 2.80 < 3.
Divide-and-Conquer Multiplication Algorithm
For simplicity assume that n is a power of 2. To compute the product of matrices, we subdivide each of the matrices into four n/2 × n/2 submatrices so that the equation C = A · B takes the form:

[C_11 C_12]   [A_11 A_12]   [B_11 B_12]
[C_21 C_22] = [A_21 A_22] · [B_21 B_22]

This matrix equation corresponds to the following four equations on the submatrices:

C_11 = A_11 · B_11 + A_12 · B_21
C_12 = A_11 · B_12 + A_12 · B_22
C_21 = A_21 · B_11 + A_22 · B_21
C_22 = A_21 · B_12 + A_22 · B_22
Divide-and-Conquer Multiplication Pseudocode
Matrix-Multiply-Recursive(A, B)
1. n = A.rows
2. let C be a new n × n matrix
3. if n == 1
4.     c_11 = a_11 · b_11
5. else partition each of A, B, C into four submatrices
6.     C_11 = Matrix-Multiply-Recursive(A_11, B_11) + Matrix-Multiply-Recursive(A_12, B_21)
7.     C_12 = Matrix-Multiply-Recursive(A_11, B_12) + Matrix-Multiply-Recursive(A_12, B_22)
8.     C_21 = Matrix-Multiply-Recursive(A_21, B_11) + Matrix-Multiply-Recursive(A_22, B_21)
9.     C_22 = Matrix-Multiply-Recursive(A_21, B_12) + Matrix-Multiply-Recursive(A_22, B_22)
10. return C
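A runnable Python sketch of this recursion (illustrative only; unlike Step 5 of the pseudocode, it copies the submatrices rather than using index calculations, and it assumes n is a power of 2):

```python
def matrix_multiply_recursive(A, B):
    """Divide-and-conquer product of n x n matrices, n a power of 2.
    Eight recursive calls on n/2 x n/2 blocks: T(n) = 8T(n/2) + Theta(n^2)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    m = n // 2

    def quad(M):
        """Split M into its four n/2 x n/2 quadrants (copies entries)."""
        return ([row[:m] for row in M[:m]], [row[m:] for row in M[:m]],
                [row[:m] for row in M[m:]], [row[m:] for row in M[m:]])

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    f = matrix_multiply_recursive
    C11 = add(f(A11, B11), f(A12, B21))
    C12 = add(f(A11, B12), f(A12, B22))
    C21 = add(f(A21, B11), f(A22, B21))
    C22 = add(f(A21, B12), f(A22, B22))
    # Reassemble C from its quadrants.
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```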
Divide-and-Conquer Multiplication Running Time
Using index calculation, we can execute Step 5 in Θ(1) time (in contrast to the Θ(n^2) that would be required if we created submatrices and copied their entries). However, that does not make a difference asymptotically.
The running time T(n) for Matrix-Multiply-Recursive on n × n matrices satisfies the recurrence

T(n) = Θ(1) + 8·T(n/2) + Θ(n^2) = 8·T(n/2) + Θ(n^2)

with T(1) = Θ(1). By the Master Theorem, T(n) = Θ(n^3), which is unfortunately no faster than the naive method Matrix-Multiply.
Divide-and-Conquer Multiplication Drawback
Each time we split the matrix sizes in half, but we do not actually reduce the total amount of work.

Assume that naive matrix multiplication takes c·n^3 time. Then computing each product of submatrices takes c·(n/2)^3 = c·n^3/8, and we need eight such products, resulting in a total time of 8 · c·n^3/8 = c·n^3 (plus overhead), which is no better than simply doing the multiplication in the naive way.
In contrast, consider Merge-Sort, with the running-time recurrence T(n) = 2·T(n/2) + Θ(n). Even if we did naive quadratic (that is, c·n^2-time) sorting for each of the subproblems, the total time would be 2 · c·(n/2)^2 = c·n^2/2 (plus overhead of Θ(n)), which is faster than naive sorting of the whole problem by a factor of 2. This tells us that divide-and-conquer sorting may be more efficient than naive sorting (and it is indeed so, as the Master Theorem proves).
Strassen's Method

The idea behind Strassen's method is to reduce the number of multiplications at each recursive call from eight to seven. That makes the recursion tree slightly less bushy.
Strassen's method has four steps:
1. Divide the input matrices A and B into submatrices as before, using index calculations in Θ(1) time.
2. Create ten n/2 × n/2 matrices S_1, S_2, ..., S_10, each equal to the sum or difference of two submatrices created in Step 1. This step takes Θ(n^2) time.
3. Using the submatrices created in Steps 1 and 2, compute seven products P_1, P_2, ..., P_7, each of size n/2 × n/2.
4. Compute the submatrices of C by adding or subtracting various combinations of the matrices P_i. This step takes Θ(n^2) time.
The running time for Strassen's method satisfies the recurrence:

T(n) = Θ(1)                 if n = 1,
T(n) = 7·T(n/2) + Θ(n^2)    if n > 1.

By the Master Theorem, T(n) = Θ(n^{lg 7}).
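The four steps can be sketched in Python as follows (an illustrative implementation assuming n is a power of 2; the particular sums and products below follow the standard formulation of Strassen's identities, which the slides do not spell out, and submatrices are copied rather than indexed):

```python
def strassen(A, B):
    """Strassen's method: seven recursive products of n/2 x n/2 blocks,
    so T(n) = 7T(n/2) + Theta(n^2) = Theta(n^lg7).  n must be a power of 2."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    m = n // 2

    def quad(M):
        return ([row[:m] for row in M[:m]], [row[m:] for row in M[:m]],
                [row[:m] for row in M[m:]], [row[m:] for row in M[m:]])

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    # Seven products on sums/differences of submatrices (Steps 2-3).
    P1 = strassen(A11, sub(B12, B22))
    P2 = strassen(add(A11, A12), B22)
    P3 = strassen(add(A21, A22), B11)
    P4 = strassen(A22, sub(B21, B11))
    P5 = strassen(add(A11, A22), add(B11, B22))
    P6 = strassen(sub(A12, A22), add(B21, B22))
    P7 = strassen(sub(A11, A21), add(B11, B12))
    # Combine the P_i into the quadrants of C (Step 4).
    C11 = add(sub(add(P5, P4), P2), P6)
    C12 = add(P1, P2)
    C21 = add(P3, P4)
    C22 = sub(sub(add(P5, P1), P3), P7)
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

Note that only seven recursive calls occur per level; all other work (the `add`/`sub` helpers) is the Θ(n^2) overhead in the recurrence.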
Notes
Strassen's algorithm was the first to beat Θ(n^3) time, but it is not the asymptotically fastest known. A method by Coppersmith and Winograd runs in O(n^2.376) time.

Practical issues against Strassen's algorithm:
Higher constant factor than the obvious Θ(n^3)-time method.
Not good for sparse matrices.
Not numerically stable: larger errors accumulate than in the naive method.
Submatrices consume space, especially if copying.

Various researchers have tried to find the crossover point, where Strassen's algorithm runs faster than the naive Θ(n^3)-time method. Theoretical analyses (that ignore caches and hardware pipelines) have produced crossover points as low as n = 8, and practical experiments have found crossover points as low as n = 400.