8/13/2019 Algorithm Analysis - Max Alekseyev
CSCE750: Analysis of Algorithms
Lecture 6
Max Alekseyev
University of South Carolina
September 10, 2013
Outline
Divide-and-Conquer
Fast Integer Multiplication
Fast Matrix Multiplication
Fast Integer Multiplication
Let b, c ≥ 0 be integers, represented in binary, with n bits each.
Let us assume that n is large, so that b and c cannot be added, subtracted, or multiplied in constant time.
We imagine that b and c are both represented as arrays of n bits: b = b_{n−1} ... b_1 b_0 and c = c_{n−1} ... c_1 c_0, where the b_i and c_i are individual bits (leading 0s are allowed). Thus,

b = b_0·2^0 + b_1·2^1 + ... + b_{n−1}·2^{n−1} = sum_{i=0}^{n−1} b_i·2^i,
c = c_0·2^0 + c_1·2^1 + ... + c_{n−1}·2^{n−1} = sum_{i=0}^{n−1} c_i·2^i.
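To make the representation concrete, here is a minimal Python sketch (not part of the slides; the names `bits` and `from_bits` are illustrative) that decomposes an integer into its bit array and reassembles it via the sums above:

```python
def bits(x, n):
    """Return the n-bit binary representation of x as a list
    [b_0, b_1, ..., b_{n-1}], least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bs):
    """Reassemble an integer from its bit list: sum of b_i * 2^i."""
    return sum(b << i for i, b in enumerate(bs))

b = 45  # binary 101101
assert bits(b, 6) == [1, 0, 1, 1, 0, 1]
assert from_bits(bits(b, 6)) == b
```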
Addition
The usual sequential binary add-with-carry algorithm that we all learned in school takes time Θ(n), since we spend a constant amount of time at each column, from right to left. The sum is representable by n + 1 bits (at most).
Q: Can we do better?
This algorithm is clearly asymptotically optimal, since to produce the correct sum we must at least examine each bit of b and of c.
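The add-with-carry procedure can be sketched as follows; this is an illustrative Python version (not from the slides), with bit lists stored least-significant-bit first:

```python
def add_bits(b, c):
    """Ripple-carry addition of two n-bit numbers given as bit lists
    (least significant bit first).  Constant work per column, so Theta(n)
    total; the result has n + 1 bits."""
    n = len(b)
    result, carry = [], 0
    for i in range(n):
        s = b[i] + c[i] + carry
        result.append(s & 1)   # current output bit
        carry = s >> 1         # carry into the next column
    result.append(carry)       # the (n+1)-st bit
    return result
```

For example, adding 5 (bits `[1, 0, 1]`) and 3 (bits `[1, 1, 0]`) yields the bit list of 8.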
Subtraction
Similar to addition, the usual subtract-and-borrow algorithm takes Θ(n) time, which is clearly asymptotically optimal. The result can be represented by at most n bits.
Multiplication
If we multiply b and c using the naive grade-school algorithm, then it takes quadratic (i.e., Θ(n^2)) time. Essentially, this algorithm is tantamount to expanding the product bc according to the expressions above:

bc = ( sum_{i=0}^{n−1} b_i·2^i ) · ( sum_{j=0}^{n−1} c_j·2^j ) = sum_{i,j} b_i·c_j·2^{i+j},

then adding everything up term by term. There are n^2 terms.
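As an illustration, the double-sum expansion translates directly into code; this Python sketch (illustrative names, not from the slides) spends Θ(n^2) work on the n^2 terms:

```python
def naive_multiply(b, c):
    """Grade-school multiplication via the double sum:
    b*c = sum over i, j of b_i * c_j * 2^(i+j).
    Inputs are bit lists, least significant bit first; n^2 terms total."""
    total = 0
    for i, bi in enumerate(b):
        for j, cj in enumerate(c):
            total += (bi * cj) << (i + j)
    return total
```

For example, 6 (bits `[0, 1, 1]`) times 5 (bits `[1, 0, 1]`) gives 30.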
Q: Can we do better?
Multiplying with Divide-and-Conquer
If n = 1, then the multiplication is trivial, so assume that n > 1. Let us further assume for simplicity that n is even. In fact, we can assume that n is a power of 2: if it is not, pad each number with leading 0s to the next power of 2; at worst this just doubles the input size.
Let m = n/2. Split b and c up into their m least and m most significant bits. Let b_l and b_h be the numbers given by the low m bits and the high m bits of b, respectively. Similarly, let c_l and c_h be the low and high halves of c. Thus, 0 ≤ b_l, b_h, c_l, c_h < 2^m.
We then have
bc = (b_l + b_h·2^m)(c_l + c_h·2^m) = b_l·c_l + (b_l·c_h + b_h·c_l)·2^m + b_h·c_h·2^n.

This suggests that we can compute bc with four recursive multiplications of pairs of m-bit numbers (b_l·c_l, b_l·c_h, b_h·c_l, and b_h·c_h) as well as Θ(n) time spent doing other things, namely some additions and multiplications by powers of two (the latter amount to arithmetic shifts of the bits, which can be done in linear time).

The time for this divide-and-conquer multiplication algorithm thus satisfies the recurrence

T(n) = 4·T(m) + Θ(n) = 4·T(n/2) + Θ(n).

The Master Theorem (Case 1) then gives T(n) = Θ(n^2), which is asymptotically no better than the naive algorithm.
Better Approach
Another way to compute

bc = (b_l + b_h·2^m)(c_l + c_h·2^m) = b_l·c_l + (b_l·c_h + b_h·c_l)·2^m + b_h·c_h·2^n.

Split b and c up into their low and high halves as above, but then recursively compute only three products:

x = b_l·c_l,
y = b_h·c_h,
z = (b_l + b_h)(c_l + c_h).

Now you should verify for yourself that

bc = x + (z − y − x)·2^m + y·2^n,
which the algorithm then computes.
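A hedged Python sketch of this three-multiplication recursion (Karatsuba's algorithm; the slides give only the identity, so the splitting details below are one possible realization operating directly on Python ints):

```python
def karatsuba(b, c):
    """Three-multiplication recursion: bc = x + (z - y - x)*2^m + y*2^(2m),
    where x = b_l*c_l, y = b_h*c_h, z = (b_l + b_h)(c_l + c_h).
    The base case falls back to built-in multiply for tiny operands."""
    if b < 2 or c < 2:
        return b * c
    n = max(b.bit_length(), c.bit_length())
    m = n // 2
    mask = (1 << m) - 1
    bl, bh = b & mask, b >> m   # low and high halves of b
    cl, ch = c & mask, c >> m   # low and high halves of c
    x = karatsuba(bl, cl)
    y = karatsuba(bh, ch)
    z = karatsuba(bl + bh, cl + ch)
    return x + ((z - y - x) << m) + (y << (2 * m))
```

The shifts by m and 2m implement the multiplications by 2^m and 2^n = 2^{2m} in the identity above.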
Running Time Analysis
How much time does this take? Besides the recursive calls, there is a linear time's worth of overhead: additions, subtractions, and arithmetic shifts. There are three recursive calls, computing x, y, and z. The numbers x and y are products of two m-bit integers each, and z is the product of (at most) two (m + 1)-bit integers. Thus the running time satisfies

T(n) = 2·T(n/2) + T(n/2 + 1) + Θ(n).
It can be shown that the +1 doesn't affect the result, so the recurrence is effectively

T(n) = 3·T(n/2) + Θ(n),

which yields T(n) = Θ(n^{lg 3}) by the Master Theorem. Since lg 3 ≈ 1.585 < 2, this is asymptotically faster than the naive quadratic algorithm.
A Bit of History
This approach dates back at least to Gauss, who discovered (using the same trick) that multiplying two complex numbers together could be done with only three real multiplications instead of the more naive four.

The same idea has been applied to long integer multiplication by Karatsuba and to matrix multiplication by Strassen.
Matrix Multiplication
Given two n × n matrices A = (a_ij), i, j = 1..n, and B = (b_ij), i, j = 1..n, their product is defined as follows:

A · B = (c_ij), where c_ij = sum_{k=1}^{n} a_ik · b_kj.

Therefore, to compute the matrix product, we need to compute n^2 matrix entries. A naive approach takes n multiplications and n − 1 additions for each entry.
Naive Matrix Multiplication Pseudocode
Matrix-Multiply(A, B)
1. n = A.rows
2. let C be a new n × n matrix
3. for i = 1 to n
4.     for j = 1 to n
5.         c_ij = 0
6.         for k = 1 to n
7.             c_ij = c_ij + a_ik · b_kj
8. return C
Running time is Θ(n^3).
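For reference, a direct Python transcription of the pseudocode (an illustrative sketch using 0-based lists of lists rather than the slide's 1-based indexing):

```python
def matrix_multiply(A, B):
    """Naive Theta(n^3) product of two n x n matrices given as
    lists of rows: three nested loops, one per index i, j, k."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```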
Can we do better?
Is Θ(n^3) the best we can do? Can we multiply matrices in o(n^3) time?
It seems like any algorithm to multiply matrices must take Ω(n^3) time:
We must compute n^2 entries.
Each entry is the sum of n terms.
But with Strassen's method, we can multiply matrices in o(n^3) time:
Strassen's algorithm runs in Θ(n^{lg 7}) time, and lg 7 ≈ 2.80 < 3.
Divide-and-Conquer Multiplication Algorithm
For simplicity assume that n is a power of 2. To compute the product of matrices, we subdivide each of the matrices into four n/2 × n/2 submatrices so that the equation C = A · B takes the form:

[C_11 C_12]   [A_11 A_12]   [B_11 B_12]
[C_21 C_22] = [A_21 A_22] · [B_21 B_22]

This matrix equation corresponds to the following four equations on the submatrices:

C_11 = A_11 · B_11 + A_12 · B_21
C_12 = A_11 · B_12 + A_12 · B_22
C_21 = A_21 · B_11 + A_22 · B_21
C_22 = A_21 · B_12 + A_22 · B_22
Divide-and-Conquer Multiplication Pseudocode
Matrix-Multiply-Recursive(A, B)
1. n = A.rows
2. let C be a new n × n matrix
3. if n == 1
4.     c_11 = a_11 · b_11
5. else partition each of A, B, C into four submatrices
6.     C_11 = Matrix-Multiply-Recursive(A_11, B_11) + Matrix-Multiply-Recursive(A_12, B_21)
7.     C_12 = Matrix-Multiply-Recursive(A_11, B_12) + Matrix-Multiply-Recursive(A_12, B_22)
8.     C_21 = Matrix-Multiply-Recursive(A_21, B_11) + Matrix-Multiply-Recursive(A_22, B_21)
9.     C_22 = Matrix-Multiply-Recursive(A_21, B_12) + Matrix-Multiply-Recursive(A_22, B_22)
10. return C
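A runnable Python sketch of this recursion (illustrative only; unlike Step 5 of the pseudocode, it copies the submatrices rather than using index calculations, and it assumes n is a power of 2):

```python
def matrix_multiply_recursive(A, B):
    """Divide-and-conquer product of n x n matrices, n a power of 2.
    Eight recursive calls on n/2 x n/2 blocks: T(n) = 8T(n/2) + Theta(n^2)."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    m = n // 2

    def quad(M):
        """Split M into its four n/2 x n/2 quadrants (copies entries)."""
        return ([row[:m] for row in M[:m]], [row[m:] for row in M[:m]],
                [row[:m] for row in M[m:]], [row[m:] for row in M[m:]])

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    f = matrix_multiply_recursive
    C11 = add(f(A11, B11), f(A12, B21))
    C12 = add(f(A11, B12), f(A12, B22))
    C21 = add(f(A21, B11), f(A22, B21))
    C22 = add(f(A21, B12), f(A22, B22))
    # Reassemble C from its quadrants.
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```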
Divide-and-Conquer Multiplication Running Time
Using index calculation, we can execute Step 5 in Θ(1) time (in contrast to the Θ(n^2) that would be required if we created submatrices and copied their entries). However, that does not make a difference asymptotically.
The running time T(n) for Matrix-Multiply-Recursive on n × n matrices satisfies the recurrence

T(n) = Θ(1) + 8·T(n/2) + Θ(n^2) = 8·T(n/2) + Θ(n^2)

with T(1) = Θ(1). By the Master Theorem, T(n) = Θ(n^3), which is unfortunately no faster than the naive method Matrix-Multiply.
Divide-and-Conquer Multiplication Drawback
Each time we split the matrix sizes in half, but we do not actually reduce the total amount of work.

Assume that naive matrix multiplication takes c·n^3 time. Then computing each product of submatrices takes c·(n/2)^3 = c·n^3/8, and we need eight such products, resulting in a total time of 8 · c·n^3/8 = c·n^3 (plus overhead), which is no better than simply doing the multiplication in the naive way.
In contrast, consider Merge-Sort, with the running-time recurrence T(n) = 2·T(n/2) + Θ(n). Even if we did naive quadratic (that is, c·n^2-time) sorting for each of the subproblems, the total time would be 2 · c·(n/2)^2 = c·n^2/2 (plus overhead of Θ(n)), which is faster than naive sorting of the whole problem by a factor of 2. This tells us that divide-and-conquer sorting may be more efficient than naive sorting (and it is indeed so, as the Master Theorem proves).
Strassen's Method

The idea behind Strassen's method is to reduce the number of multiplications at each recursive call from eight to seven. That makes the recursion tree slightly less bushy.
Strassen's method has four steps:
1. Divide the input matrices A and B into submatrices as before, using index calculations in Θ(1) time.
2. Create ten n/2 × n/2 matrices S_1, S_2, ..., S_10, each equal to the sum or difference of two submatrices created in Step 1. This step takes Θ(n^2) time.
3. Using the submatrices created in Steps 1 and 2, compute seven products P_1, P_2, ..., P_7, each of size n/2 × n/2.
4. Compute the submatrices of C by adding or subtracting various combinations of the matrices P_i. This step takes Θ(n^2) time.
The running time for Strassen's method satisfies the recurrence:

T(n) = Θ(1)                 if n = 1,
T(n) = 7·T(n/2) + Θ(n^2)    if n > 1.

By the Master Theorem, T(n) = Θ(n^{lg 7}).
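The four steps can be sketched in Python as follows (an illustrative implementation assuming n is a power of 2; the particular sums and products below follow the standard formulation of Strassen's identities, which the slides do not spell out, and submatrices are copied rather than indexed):

```python
def strassen(A, B):
    """Strassen's method: seven recursive products of n/2 x n/2 blocks,
    so T(n) = 7T(n/2) + Theta(n^2) = Theta(n^lg7).  n must be a power of 2."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    m = n // 2

    def quad(M):
        return ([row[:m] for row in M[:m]], [row[m:] for row in M[:m]],
                [row[:m] for row in M[m:]], [row[m:] for row in M[m:]])

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    def sub(X, Y):
        return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    # Seven products on sums/differences of submatrices (Steps 2-3).
    P1 = strassen(A11, sub(B12, B22))
    P2 = strassen(add(A11, A12), B22)
    P3 = strassen(add(A21, A22), B11)
    P4 = strassen(A22, sub(B21, B11))
    P5 = strassen(add(A11, A22), add(B11, B22))
    P6 = strassen(sub(A12, A22), add(B21, B22))
    P7 = strassen(sub(A11, A21), add(B11, B12))
    # Combine the P_i into the quadrants of C (Step 4).
    C11 = add(sub(add(P5, P4), P2), P6)
    C12 = add(P1, P2)
    C21 = add(P3, P4)
    C22 = sub(sub(add(P5, P1), P3), P7)
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

Note that only seven recursive calls occur per level; all other work (the `add`/`sub` helpers) is the Θ(n^2) overhead in the recurrence.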
Notes
Strassen's algorithm was the first to beat Θ(n^3) time, but it is not the asymptotically fastest known. A method by Coppersmith and Winograd runs in O(n^2.376) time.

Practical issues against Strassen's algorithm:
Higher constant factor than the obvious Θ(n^3)-time method.
Not good for sparse matrices.
Not numerically stable: larger errors accumulate than in the naive method.
Submatrices consume space, especially if copying.

Various researchers have tried to find the crossover point, where Strassen's algorithm runs faster than the naive Θ(n^3)-time method. Theoretical analyses (that ignore caches and hardware pipelines) have produced crossover points as low as n = 8, and practical experiments have found crossover points as low as n = 400.