Fast Bilinear Algorithms for Convolution

Caleb Ju

CS598EVS

March 5, 2020



Convolution

The discrete convolution between vectors $f \in \mathbb{R}^r$ and $g \in \mathbb{R}^n$ is

$$y_k = \sum_i f_i\, g_{k-i}.$$

View this as a matrix-vector product between a matrix $T$ and the vector $f$,

$$y_k = \sum_i g_{k-i}\, f_i = \sum_j T_{k,j}\, f_j, \quad\text{i.e., } y = Tf.$$

What does the matrix $T$ look like?

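Denote it by $T_{\langle g,r\rangle}$, which is a Toeplitz matrix with $T_{\langle g,r\rangle} \in \mathbb{R}^{(n+r-1)\times r}$:

$$T_{\langle g,r\rangle} = \begin{bmatrix} g_0 & 0 & \cdots & 0 \\ \vdots & g_0 & \ddots & \vdots \\ g_{n-1} & \vdots & \ddots & 0 \\ 0 & g_{n-1} & \ddots & g_0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & g_{n-1} \end{bmatrix}.$$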
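A minimal sketch in plain Python (no libraries assumed; `toeplitz`, `matvec` and `conv_direct` are hypothetical helper names) that builds $T_{\langle g,r\rangle}$ column by column, with column $j$ holding $g$ shifted down by $j$ rows, and checks that $Tf$ matches the direct sum.

```python
def conv_direct(f, g):
    """Linear convolution y_k = sum_i f_i g_{k-i}, of length len(f) + len(g) - 1."""
    y = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            y[i + j] += a * b
    return y

def toeplitz(g, r):
    """T<g,r>: an (n+r-1) x r matrix whose column j is g shifted down by j rows."""
    n = len(g)
    T = [[0] * r for _ in range(n + r - 1)]
    for j in range(r):
        for i in range(n):
            T[i + j][j] = g[i]
    return T

def matvec(T, f):
    return [sum(t * x for t, x in zip(row, f)) for row in T]

f, g = [1, 2, 3], [4, 5]
print(matvec(toeplitz(g, len(f)), f))  # [4, 13, 22, 15]
```

Forming $T$ explicitly costs $O(nr)$ memory, so this is only a way to see the structure; the fast algorithms below never build it.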


Convolution and its Variants

Linear convolution is

$$y_k = \sum_{i=\max(0,\,k-n+1)}^{\min(k,\,r-1)} f_i\, g_{k-i}.$$

The bounds ensure that we never index past either end of the vectors $f$ and $g$.

We also have cyclic convolution,

$$y_k = \sum_{i=0}^{r-1} f_i\, g_{(k-i) \bmod n}.$$

We can also derive correlation,

$$y_k = \sum_{i=0}^{r-1} f_i\, g_{k+i}.$$
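The three variants differ only in how indices into $g$ are handled. A minimal sketch in plain Python (function names are hypothetical); for correlation we assume only the "valid" outputs, where every $g_{k+i}$ stays inside $g$, are kept, which the slide leaves unspecified.

```python
def linear_conv(f, g):
    r, n = len(f), len(g)
    # The summation bounds keep both i and k - i in range.
    return [sum(f[i] * g[k - i] for i in range(max(0, k - n + 1), min(k, r - 1) + 1))
            for k in range(r + n - 1)]

def cyclic_conv(f, g):
    n = len(g)
    # Output has length n; indices into g wrap around modulo n.
    return [sum(f[i] * g[(k - i) % n] for i in range(len(f))) for k in range(n)]

def correlation(f, g):
    r = len(f)
    # Assumption: keep only outputs where k + i stays inside g (len(g) - r + 1 values).
    return [sum(f[i] * g[k + i] for i in range(r)) for k in range(len(g) - r + 1)]

f, g = [1, 2], [1, 0, 3]
print(linear_conv(f, g), cyclic_conv(f, g), correlation(f, g))  # [1, 2, 3, 6] [7, 2, 3] [1, 6]
```

Note how the cyclic result is the linear result with its tail (here the 6) wrapped onto the front.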


Applications of Convolution

String matching (Clifford and Clifford, 2007)

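Let the pattern be $p \in \Sigma^m$ and the text be $t \in \Sigma^n$. Then

$$\sum_{j=0}^{m-1} (p_j - t_{i+j})^2 = \sum_{j=0}^{m-1} \big(p_j^2 - 2p_j t_{i+j} + t_{i+j}^2\big), \quad \forall\, 0 \le i \le n-m.$$

The middle term is a correlation of $p$ with $t$, so all $n-m+1$ sums can be computed with fast convolution; position $i$ is a match exactly when its sum is zero.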
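A minimal sketch of the expansion above, assuming characters are mapped to integers with `ord` (our choice). The correlations are computed directly here for clarity; the point of the identity is that a fast convolution routine could be substituted. `correlate` and `match_scores` are hypothetical names, and this omits the wildcard handling of the cited paper.

```python
def correlate(p, t):
    """Valid correlation: out[i] = sum_j p[j] * t[i + j]."""
    m = len(p)
    return [sum(p[j] * t[i + j] for j in range(m)) for i in range(len(t) - m + 1)]

def match_scores(pattern, text):
    p = [ord(c) for c in pattern]
    t = [ord(c) for c in text]
    m = len(p)
    p2 = sum(x * x for x in p)                    # sum_j p_j^2, the same for every i
    cross = correlate(p, t)                       # sum_j p_j t_{i+j}
    t2 = correlate([1] * m, [x * x for x in t])   # sliding sum_j t_{i+j}^2
    return [p2 - 2 * c + s for c, s in zip(cross, t2)]

scores = match_scores("ab", "cabab")
print([i for i, s in enumerate(scores) if s == 0])  # [1, 3]
```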

Image Processing (Convolutional Neural Network)

Given $K$ filters in a tensor $F$ of size $r \times r$ and $N$ input images in a tensor $G$ of size $n \times n$, we seek to sum over all $H$ channels,

$$y_{ikxy} = \sum_{c=1}^{H} \sum_{v=1}^{r} \sum_{u=1}^{r} f_{kcuv}\, g_{i,c,x+u,y+v}.$$

Other applications: cosmological simulation, solutions to partial differential equations, signal processing, integer multiplication, ...


Fast Algorithms for Computing Convolution

A direct computation has $O(n^2)$ cost.

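Consider complex multiplication,

$$x \times y = (a + bi) \times (c + di) = (ac - bd) + (ad + bc)i = (ac - bd) + \big(ac + bd - (a - b)(c - d)\big)i,$$

which needs only three real multiplications ($ac$, $bd$, and $(a-b)(c-d)$) instead of four.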
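Applied recursively to polynomials (the idea behind Karatsuba's algorithm), the same identity forms the middle block of coefficients from one product of differences. A minimal sketch in plain Python, assuming both coefficient lists have the same power-of-two length; `karatsuba` is a hypothetical name, and the recursion bottoms out at single coefficients for clarity rather than speed.

```python
def karatsuba(a, b):
    """Coefficients of a(x) b(x) for equal-length lists whose length is a power of two."""
    n = len(a)
    if n == 1:
        return [a[0] * b[0]]
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    low = karatsuba(a0, b0)                                  # a0 * b0
    high = karatsuba(a1, b1)                                 # a1 * b1
    diff = karatsuba([x - y for x, y in zip(a0, a1)],
                     [x - y for x, y in zip(b0, b1)])        # (a0 - a1)(b0 - b1)
    # a0*b1 + a1*b0 = low + high - diff, exactly the complex-multiplication trick.
    mid = [l + hi - d for l, hi, d in zip(low, high, diff)]
    c = [0] * (2 * n - 1)
    for i, x in enumerate(low):
        c[i] += x
    for i, x in enumerate(mid):
        c[i + h] += x
    for i, x in enumerate(high):
        c[i + 2 * h] += x
    return c

print(karatsuba([1, 2, 3, 4], [5, 6, 7, 8]))  # [5, 16, 34, 60, 61, 52, 32]
```

Each level makes three half-size recursive calls, which is where the $O(n^{\log_2 3})$ cost comes from.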

Karatsuba’s algorithm applies this idea recursively for $O(n^{\log_2 3})$ cost. Convolution can also be computed with the discrete Fourier transform,

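$$a * b = \mathrm{IDFT}\big(\mathrm{DFT}(a) \odot \mathrm{DFT}(b)\big),$$

where $a$ and $b$ are zero-padded to the length of the output.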

Using the fast Fourier transform (FFT), we can compute linear convolution in $O(n \log n)$ time.
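A minimal sketch of the FFT route using only `cmath`: a textbook recursive radix-2 Cooley-Tukey transform (our choice of FFT variant), with both inputs zero-padded to a power of two at least as long as the output, so the cyclic DFT product equals the linear convolution. The names `fft` and `fft_convolve` are hypothetical.

```python
import cmath

def fft(x, invert=False):
    """Recursive radix-2 FFT with omega = exp(-2*pi*i/n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if invert else -1
    even, odd = fft(x[0::2], invert), fft(x[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fft_convolve(a, b):
    m = len(a) + len(b) - 1
    size = 1
    while size < m:
        size *= 2
    # Zero-pad so that no wrap-around occurs in the cyclic product.
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    c = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return [(v / size).real for v in c[:m]]

print([round(v, 6) for v in fft_convolve([1, 2, 3], [4, 5])])
```

The result matches the direct convolution only up to floating-point rounding, which is why the later slides measure error.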

Other algorithms: Winograd’s minimal filtering method, matrix multiplication, fast symmetric multiplication.


Derivation of Bilinear Algorithms

Recall a bilinear algorithm is

$$c = F^{(C)}\big((F^{(A)T} a) \odot (F^{(B)T} b)\big), \quad\text{i.e.,}\quad c_k = \sum_i \sum_j t_{ijk}\, a_i b_j.$$

The discrete linear convolution of $f$ and $g$ is given by

$$y_k = \sum_{i=\max(0,\,k-n+1)}^{\min(k,\,r-1)} f_i\, g_{k-i} = \sum_{i,j} t_{ijk}\, f_i g_j,$$

where the tensor $T$ is defined by

$$t_{ijk} = \begin{cases} 1 & i + j - k = 0, \\ 0 & \text{otherwise.} \end{cases}$$
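A minimal sketch that builds this tensor explicitly and contracts it with $f$ and $g$ (plain Python; `conv_tensor` and `apply_tensor` are hypothetical names). This is the naive evaluation with $rn$ products; a fast bilinear algorithm instead factors $T$ into fewer rank-one terms.

```python
def conv_tensor(r, n):
    """t[i][j][k] = 1 if i + j - k == 0 else 0, for f in R^r and g in R^n."""
    R = r + n - 1
    return [[[1 if i + j == k else 0 for k in range(R)] for j in range(n)]
            for i in range(r)]

def apply_tensor(t, f, g):
    """y_k = sum_{i,j} t_ijk f_i g_j."""
    r, n, R = len(t), len(t[0]), len(t[0][0])
    return [sum(t[i][j][k] * f[i] * g[j] for i in range(r) for j in range(n))
            for k in range(R)]

print(apply_tensor(conv_tensor(3, 2), [1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```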


Convolution is Multiplication

How can we derive fast bilinear algorithms for convolution?

Define polynomials $a(x) = a_0 + a_1 x + \cdots + a_{r-1}x^{r-1}$ and $b(x) = b_0 + b_1 x + \cdots + b_{n-1}x^{n-1}$. Their product is

$$c(x) = a(x)b(x) = \sum_{k=0}^{r+n-2} \ \sum_{i=\max(0,\,k-n+1)}^{\min(k,\,r-1)} (a_i\, b_{k-i})\, x^k.$$

The coefficients of $c(x) = c_0 + c_1 x + \cdots + c_{r+n-2}x^{r+n-2}$ are determined by linear convolution.


Convolution as Multiplication

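How can we compute $c(x)$? Suppose we know the values $c(x_i)$ at some nodes $x_0, \dots, x_{R-1}$, where $R = \deg c(x) + 1$. Let the coefficients of $c(x)$ be $c$. We can get $c$ by solving the linear system given by

$$c(x_i) = \sum_{k=0}^{R-1} x_i^k\, c_k = V_{i,:}\, c, \quad\text{where } V = \begin{bmatrix} x_0^0 & \cdots & x_0^{R-1} \\ \vdots & & \vdots \\ x_{R-1}^0 & \cdots & x_{R-1}^{R-1} \end{bmatrix} \in \mathbb{C}^{R\times R}.$$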

How can we compute $c(x_i)$? Recall $c(x) = a(x)b(x)$. Therefore,

$$c(x_i) = a(x_i)\, b(x_i).$$

How can we compute $a(x_i)$? Let $a$ be the coefficients of polynomial $a(x)$ (and $b$ for $b(x)$). Then computing $a(x_i)$ is an inner product,

$$a(x_i) = \sum_{k=0}^{r-1} x_i^k\, a_k = \tilde V_{i,:}\, a,$$

where $\tilde V$ is the first $r$ columns of $V$.


Toom-Cook Algorithm

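Toom-Cook

1. Evaluate $\alpha = \tilde V a$ and $\beta = \tilde V b$, where $\tilde V$ denotes the leading columns of $V$.
2. Compute the products $\nu = \alpha \odot \beta$.
3. Interpolate by solving the linear system $Vc = \nu$.

We can express this three-step computation as the following bilinear algorithm,

$$c = V^{-1}_{(2n-1)\times(2n-1)}\big(V_{(2n-1)\times n}\, a \odot V_{(2n-1)\times n}\, b\big),$$

where $V$ is a Vandermonde matrix,

$$V = \begin{bmatrix} x_0^0 & \cdots & x_0^{R-1} \\ \vdots & & \vdots \\ x_{R-1}^0 & \cdots & x_{R-1}^{R-1} \end{bmatrix}.$$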
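The three Toom-Cook steps can be sketched in exact rational arithmetic with `fractions.Fraction`, so no rounding error enters. The nodes $0, 1, -1, 2$ are an assumption chosen for illustration, the helper names are hypothetical, and the sketch allows $a$ and $b$ of different lengths (the slide's formula takes both of length $n$).

```python
from fractions import Fraction

def vandermonde(nodes, cols):
    """Rows [x^0, x^1, ..., x^(cols-1)] for each node x."""
    return [[Fraction(x) ** k for k in range(cols)] for x in nodes]

def solve(V, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(V)
    M = [row[:] + [Fraction(b)] for row, b in zip(V, rhs)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        for i in range(n):
            if i != c and M[i][c] != 0:
                factor = M[i][c] / M[c][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def toom_cook(a, b, nodes):
    R = len(a) + len(b) - 1                  # number of nodes = deg c + 1
    assert len(nodes) == R
    alpha = [sum(x * y for x, y in zip(row, a)) for row in vandermonde(nodes, len(a))]
    beta = [sum(x * y for x, y in zip(row, b)) for row in vandermonde(nodes, len(b))]
    nu = [x * y for x, y in zip(alpha, beta)]    # R element-wise products
    return solve(vandermonde(nodes, R), nu)      # interpolate: V c = nu

print(toom_cook([1, 2, 3], [4, 5], [0, 1, -1, 2]))
```

In floating point the same steps become inaccurate as the nodes spread out, since the Vandermonde matrix grows ill-conditioned.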


Discrete Fourier Transform

Figure: (a) Chebyshev nodes; (b) equispaced nodes on the unit circle

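Discrete Fourier Transform

Let $\omega_{(n)} = \exp(-2\pi i/n)$, the $n$th primitive root of unity. Set the nodes of $V$ to the powers $\omega_{(R)}^0, \omega_{(R)}^1, \dots, \omega_{(R)}^{R-1}$, where $R = 2n - 1$ is the number of nodes. Then $V$ is the Fourier matrix (and $V^{-1}$ is the inverse Fourier matrix), leading to the bilinear algorithm

$$c = F^{-1}_{(2n-1)\times(2n-1)}\big(F_{(2n-1)\times n}\, a \odot F_{(2n-1)\times n}\, b\big).$$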
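Unlike an FFT, the bilinear form above can be written with the dense Fourier matrices directly. A minimal sketch using only `cmath` (names hypothetical): it forms the $(2n-1)\times n$ encoding matrices and the $(2n-1)\times(2n-1)$ inverse, so it costs $O(n^2)$ and $R = 2n-1$ need not be a power of two.

```python
import cmath

def fourier_matrix(R, cols, inverse=False):
    """First `cols` columns of the R x R Fourier matrix F_jk = omega^(j*k) with
    omega = exp(-2*pi*i/R); the inverse uses the conjugate root and a 1/R factor."""
    sign = 1 if inverse else -1
    scale = 1 / R if inverse else 1
    return [[scale * cmath.exp(sign * 2j * cmath.pi * j * k / R) for k in range(cols)]
            for j in range(R)]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def dft_convolve(a, b):
    n = len(a)
    assert len(b) == n
    R = 2 * n - 1
    alpha = matvec(fourier_matrix(R, n), a)       # F_(2n-1 x n) a
    beta = matvec(fourier_matrix(R, n), b)        # F_(2n-1 x n) b
    nu = [x * y for x, y in zip(alpha, beta)]     # element-wise products
    return [v.real for v in matvec(fourier_matrix(R, R, inverse=True), nu)]

print([round(v, 6) for v in dft_convolve([1, 2], [3, 4])])
```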


Alternative Bilinear Algorithms

The Toom-Cook method and fast Fourier transform work well for small and large convolution problems, respectively.

- The Toom-Cook method is numerically inaccurate for convolutions of size greater than four
- The FFT has significant hidden constants

Now we examine alternative algorithms that offer trade-offs between computational efficiency and numerical accuracy.


Modular Polynomial Multiplication

Let’s revisit convolution as a polynomial multiplication problem,

$$c(x) = a(x)b(x) = \sum_{k=0}^{2n-2} \ \sum_{i=\max(0,\,k-n+1)}^{\min(k,\,n-1)} (a_i\, b_{k-i})\, x^k.$$

What is the remainder of $c(x)$ divided by a polynomial $M$ where $\deg M > \deg c(x)$?

$$c(x) = r(x) \equiv c(x) \pmod M.$$

What if we use a polynomial $m$ where $\deg m \le \deg c(x)$?

$$c(x) \ne r(x) \equiv c(x) \pmod m.$$


Modular Polynomial Multiplication

Why use modular polynomial multiplication? Reducing modulo $m$ decreases the size of the inputs:

$$c(x) \equiv a(x)b(x) \equiv \big(a(x) \bmod m\big)\big(b(x) \bmod m\big) \pmod m.$$

However, this yields an answer that is only congruent to the actual product, i.e., not the solution we actually want.
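Both observations can be checked with a small sketch in plain Python using `fractions.Fraction` for exact arithmetic. Polynomials are coefficient lists, lowest degree first (our convention), and `poly_mul`/`poly_mod` are hypothetical helper names. It shows that a divisor of larger degree leaves $c(x)$ unchanged, that a smaller divisor does not, and that reducing the inputs first gives a congruent product.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Coefficients of a(x) b(x) (i.e., linear convolution)."""
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def poly_mod(p, m):
    """Remainder of p(x) / m(x) with deg(m) coefficients; m's last entry must be nonzero."""
    r = [Fraction(x) for x in p]
    d = len(m) - 1                                # deg m
    for s in range(len(r) - 1 - d, -1, -1):       # cancel x^(s+d), highest first
        q = r[s + d] / m[d]
        for i, mi in enumerate(m):
            r[s + i] -= q * mi
    return (r + [Fraction(0)] * d)[:d]

a, b = [1, 2, 3], [4, 5, 6]
c = poly_mul(a, b)                      # deg c = 4
big = [0, 0, 0, 0, 0, 1]                # M(x) = x^5, so deg M > deg c
m = [1, 0, 1]                           # m(x) = x^2 + 1, so deg m <= deg c

same = poly_mod(c, big) == c            # True: the remainder is c itself
reduced = poly_mod(c, m)                # [-6, -14], i.e. -6 - 14x
congruent = poly_mod(poly_mul(poly_mod(a, m), poly_mod(b, m)), m) == reduced  # True
```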

Can we compute the polynomial multiplication using modular polynomial multiplication?

Yes, using the Chinese Remainder Theorem.


Chinese Remainder Theorem

Theorem

Let $m^{(1)}, \dots, m^{(k)}$ be pairwise coprime integers and $M = \prod_{i=1}^{k} m^{(i)}$. Given remainders $r^{(1)}, \dots, r^{(k)}$ where $0 \le r^{(i)} < m^{(i)}$, the Chinese Remainder Theorem (CRT) asserts that there exists a unique integer $x$ (modulo $M$) such that

$$x \equiv r^{(i)} \pmod{m^{(i)}} \quad \forall i \in [k].$$

Further, this mapping between integers and remainders is a ring isomorphism (structure preserving).

Example

Let $m^{(1)} = 3$, $m^{(2)} = 4$, and $M = 12$. Let $x = 7 \pmod M$, with remainders

$$x \equiv r^{(1)} \equiv 1 \pmod 3 \quad\text{and}\quad x \equiv r^{(2)} \equiv 3 \pmod 4.$$


Chinese Remainder Theorem: Example

Let $x \equiv 7 \pmod{12}$. We seek to compute $(7 \times 4) \bmod 12$.

Figure: Ring Isomorphism


Chinese Remainder Theorem: Example

$$x \equiv r^{(1)} \equiv 1 \pmod 3 \quad\text{and}\quad x \equiv r^{(2)} \equiv 3 \pmod 4.$$

Figure: Ring Isomorphism


Chinese Remainder Theorem: Example

$$r'^{(1)} \equiv r^{(1)} \times 4 \equiv 4 \equiv 1 \pmod 3 \quad\text{and}\quad r'^{(2)} \equiv r^{(2)} \times 4 \equiv 0 \pmod 4.$$

Figure: Ring Isomorphism


Chinese Remainder Theorem: Example

$y \equiv 28 \equiv 4 \pmod{12}$ satisfies $r'^{(1)} \equiv 1 \pmod 3$ and $r'^{(2)} \equiv 0 \pmod 4$.

Figure: Ring Isomorphism
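The isomorphism and the worked example can be checked exhaustively for $M = 12$. A minimal sketch in plain Python (`to_residues` and `combine` are hypothetical names); the inverse map is found here by brute-force search purely for illustration, whereas the CRT gives a closed-form reconstruction.

```python
MODULI = (3, 4)
M = 12

def to_residues(x):
    """Map x in Z_12 to its remainders (x mod 3, x mod 4)."""
    return tuple(x % m for m in MODULI)

def combine(u, v, op):
    """Apply op component-wise in Z_3 x Z_4."""
    return tuple(op(a, b) % m for a, b, m in zip(u, v, MODULI))

# The map is a bijection: all 12 residue pairs are distinct.
assert len({to_residues(x) for x in range(M)}) == M

# It preserves addition and multiplication.
for x in range(M):
    for y in range(M):
        assert to_residues((x + y) % M) == combine(to_residues(x), to_residues(y), lambda a, b: a + b)
        assert to_residues((x * y) % M) == combine(to_residues(x), to_residues(y), lambda a, b: a * b)

# The worked example: multiply 7 by 4 in residue form, then map back.
prod = combine(to_residues(7), to_residues(4), lambda a, b: a * b)
back = next(z for z in range(M) if to_residues(z) == prod)   # brute-force inverse
print(prod, back)  # (1, 0) 4
```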


Modular Polynomial Multiplication

Akin to interpolation, modular polynomial multiplication can be computed via the following steps:

- Compute the remainders of $a(x)$ and $b(x)$ for a series of coprime divisors $m^{(i)}$
- Multiply the corresponding remainders (normal polynomial multiplication can be used)
- Map the remainders back to their (unique) polynomial

How do we recover the polynomial from its remainders? The Chinese Remainder Theorem also tells us how to do so.


Chinese Remainder Theorem (part 2)

Theorem

Recall the polynomial divisors $m^{(i)}$ are coprime, $M = \prod_i m^{(i)}$, and we have a set of remainders $r^{(i)}$. To solve for $x$, we compute

$$x = \Big(\sum_{i=1}^{k} r^{(i)} M^{(i)} N^{(i)}\Big) \bmod M,$$

where $M^{(i)} = M/m^{(i)}$, and $N^{(i)}$ and $n^{(i)}$ are any polynomials satisfying Bezout’s identity,

$$M^{(i)} N^{(i)} + m^{(i)} n^{(i)} = 1.$$
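The same reconstruction formula works for integers, which is an easy way to sanity-check it. A minimal sketch in plain Python: `ext_gcd` (the extended Euclidean algorithm) supplies the Bezout coefficients $N^{(i)}, n^{(i)}$, and `crt` is a hypothetical name. With the running example it reproduces $N^{(1)} = 1, n^{(1)} = -1$ and $N^{(2)} = -1, n^{(2)} = 1$.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with a*s + b*t = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def crt(remainders, moduli):
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(remainders, moduli):
        Mi = M // m                      # M^(i) = M / m^(i)
        g, Ni, ni = ext_gcd(Mi, m)       # Bezout: Mi*Ni + m*ni = 1
        assert g == 1, "moduli must be pairwise coprime"
        x += r * Mi * Ni
    return x % M

print(ext_gcd(4, 3), ext_gcd(3, 4))  # (1, 1, -1) (1, -1, 1)
print(crt([1, 0], [3, 4]))           # 4
```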


Chinese Remainder Theorem (part 2): Example

Coprime polynomial divisors $m^{(i)}$, where $M = \prod_i m^{(i)}$ and $M^{(i)} = M/m^{(i)}$. Let $N^{(i)}, n^{(i)}$ be such that $M^{(i)}N^{(i)} + m^{(i)}n^{(i)} = 1$ for all $i$. The solution is

$$x = \Big(\sum_{i=1}^{k} r^{(i)} M^{(i)} N^{(i)}\Big) \bmod M.$$

Compute the product $y = (4 \times 7) \bmod 12$.

We have $M^{(1)} = 4$, $m^{(1)} = 3$, $M^{(2)} = 3$, $m^{(2)} = 4$, and $M = 12$, with remainders $r'^{(1)} \equiv 1 \pmod 3$ and $r'^{(2)} \equiv 0 \pmod 4$.

Note that $4N^{(1)} + 3n^{(1)} = 1$ and $3N^{(2)} + 4n^{(2)} = 1$ are satisfied with $N^{(1)} = 1$, $n^{(1)} = -1$, $N^{(2)} = -1$, and $n^{(2)} = 1$.

So we have

$$\sum_i r'^{(i)} M^{(i)} N^{(i)} = 1(4)(1) + 0(3)(-1) = 4 \equiv 28 \pmod{12}.$$


Chinese Remainder Theorem (part 2)

$$x = \Big(\sum_{i=1}^{k} r^{(i)} M^{(i)} N^{(i)}\Big) \bmod M$$

Why does this work?

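Since $M^{(i)}N^{(i)} = 1 - m^{(i)}n^{(i)}$, and every term with $j \ne i$ vanishes modulo $m^{(i)}$ (because $M^{(j)}$ contains the factor $m^{(i)}$), for a fixed $i$,

$$x = \sum_j r^{(j)} M^{(j)} N^{(j)} \equiv r^{(i)}\big(\underbrace{1 - m^{(i)}n^{(i)}}_{=M^{(i)}N^{(i)}}\big) \equiv r^{(i)} \pmod{m^{(i)}}.$$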

The Chinese Remainder Theorem tells us there is a bijection between remainders and the original polynomial. Therefore, any polynomial satisfying the remainder equivalences is equivalent to the original polynomial (modulo $M$)!


Modular Polynomial Multiplication

The Chinese Remainder Theorem required that $M^{(i)}N^{(i)} + m^{(i)}n^{(i)} = 1$ for all $i$. Do such $N^{(i)}, n^{(i)}$ even exist?

Theorem (Bezout’s identity)

Let $p$ and $q$ be coprime polynomials (they do not share any roots); then there exist polynomials $u$ and $v$ such that $pu + qv = 1$.

Since $M^{(i)}$ and $m^{(i)}$ are coprime, there exist polynomials $N^{(i)}$ and $n^{(i)}$ such that

$$M^{(i)} N^{(i)} + m^{(i)} n^{(i)} = 1.$$


Winograd Convolution Algorithm

Let $f \in \mathbb{R}^r$ and $g \in \mathbb{R}^n$ be the vectors we seek to convolve. Recall that we first compute the remainders,

$$r^{(i)}(f) \equiv f \pmod{m^{(i)}} \quad\text{and}\quad r^{(i)}(g) \equiv g \pmod{m^{(i)}}.$$

Next, we compute the product of the remainders using a convolution algorithm,

$$r^{(i)} = \big(r^{(i)}(f) * r^{(i)}(g)\big) \bmod m^{(i)}.$$

We use the Chinese Remainder Theorem to recover the unique solution,

$$y = \Big(\sum_i r^{(i)} * M^{(i)} * N^{(i)}\Big) \bmod M,$$

where $M^{(i)} = M/m^{(i)}$ and $M^{(i)}N^{(i)} + m^{(i)}n^{(i)} = 1$.
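A minimal exact-arithmetic sketch of these three steps, assuming polynomials are coefficient lists (lowest degree first) over `fractions.Fraction`. The Bezout multipliers $N^{(i)}$ are found with the extended Euclidean algorithm for polynomials (our choice; the slides instead set up a linear system), and each remainder product is computed directly rather than with a fast convolution. All helper names are hypothetical, and the divisors $x^2+1$, $x$, $x-1$ are picked only for illustration; since $\deg M = n + r - 1 > \deg c$, reducing modulo $M$ returns the exact product.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def lincomb(a, b, sign=1):
    """Return a + sign * b."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([Fraction(x) + sign * y for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return trim(c)

def divmod_poly(p, m):
    """Quotient and remainder of p(x) / m(x)."""
    r, m = [Fraction(x) for x in trim(p)], trim(m)
    q = [Fraction(0)] * max(len(r) - len(m) + 1, 0)
    for s in range(len(r) - len(m), -1, -1):
        q[s] = r[s + len(m) - 1] / m[-1]
        for i, mi in enumerate(m):
            r[s + i] -= q[s] * mi
    return trim(q), trim(r)

def bezout(p, q):
    """Extended Euclid: return (u, v) with p*u + q*v = 1 for coprime p, q."""
    r0, r1 = trim(p), trim(q)
    u0, u1, v0, v1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:
        quo, rem = divmod_poly(r0, r1)
        r0, r1 = r1, rem
        u0, u1 = u1, lincomb(u0, mul(quo, u1), -1)
        v0, v1 = v1, lincomb(v0, mul(quo, v1), -1)
    assert len(r0) == 1, "divisors must be coprime"
    return [x / r0[0] for x in u0], [x / r0[0] for x in v0]

def winograd_convolve(f, g, divisors):
    """Linear convolution of f and g via the CRT over pairwise coprime divisors."""
    size = len(f) + len(g) - 1
    M = [Fraction(1)]
    for m in divisors:
        M = mul(M, m)
    assert len(M) - 1 == size, "need deg M = n + r - 1"
    y = []
    for m in divisors:
        rf = divmod_poly(f, m)[1]                  # r(f) = f mod m^(i)
        rg = divmod_poly(g, m)[1]                  # r(g) = g mod m^(i)
        ri = divmod_poly(mul(rf, rg), m)[1]        # (r(f) * r(g)) mod m^(i)
        Mi = divmod_poly(M, m)[0]                  # M^(i) = M / m^(i)
        Ni, _ = bezout(Mi, m)                      # M^(i) N^(i) + m^(i) n^(i) = 1
        y = lincomb(y, mul(mul(ri, Mi), Ni))
    y = divmod_poly(y, M)[1]                       # reduce modulo M
    return y + [Fraction(0)] * (size - len(y))     # restore trailing zeros

# m = x^2 + 1, x, x - 1: degrees 2 + 1 + 1 = 4 = n + r - 1 for r = 3, n = 2
print(winograd_convolve([1, 2, 3], [4, 5], [[1, 0, 1], [0, 1], [-1, 1]]))
```

Choosing only linear divisors $x - x_i$ turns every remainder into an evaluation $a(x_i)$, recovering Toom-Cook as a special case.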


Toom-Cook vs. Winograd Convolution Algorithm

Toom-Cook

1. Evaluate at a set of unique integer points
2. Compute the element-wise multiplication (these are evaluated points of the product)
3. Interpolate to recover the product polynomial

Winograd Convolution Algorithm

1. Evaluate the remainder with the set of coprime polynomial divisors $m^{(i)}$
2. Compute the element-wise polynomial multiplication (via convolution)
3. Use the CRT to recover the product polynomial modulo $M$



Evaluate the Remainder of a Polynomial Division

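Denote the coefficients of an arbitrary polynomial $p$ as $\mathbf{p}$; e.g., $p = 3x^2 - 1$ is represented as $\mathbf{p} = \begin{bmatrix} -1 & 0 & 3 \end{bmatrix}$. Let $p$ and $m$ be polynomials where $\deg(m) \le \deg(p)$.

Modulo Operation

Lemma

Let $r = p \bmod m$, with $d = \deg p$. There exists a matrix $X_{\langle m,d\rangle}$ such that $\mathbf{r} = X_{\langle m,d\rangle}\mathbf{p}$.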
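Because reduction modulo $m$ is linear in the coefficients, one way to build $X_{\langle m,d\rangle}$ is column by column: column $j$ holds the coefficients of $x^j \bmod m$. This is a different route from the block elimination in the proof that follows, shown here as a minimal sketch with `fractions.Fraction`; `poly_mod` and `remainder_matrix` are hypothetical names.

```python
from fractions import Fraction

def poly_mod(p, m):
    """Remainder of p(x) / m(x) with exactly deg(m) coefficients (lowest degree first)."""
    r = [Fraction(x) for x in p]
    d = len(m) - 1
    for s in range(len(r) - 1 - d, -1, -1):
        q = r[s + d] / m[d]
        for i, mi in enumerate(m):
            r[s + i] -= q * mi
    return (r + [Fraction(0)] * d)[:d]

def remainder_matrix(m, d):
    """X<m,d>: a deg(m) x (d+1) matrix whose column j holds x^j mod m."""
    cols = [poly_mod([0] * j + [1], m) for j in range(d + 1)]
    return [[col[i] for col in cols] for i in range(len(m) - 1)]

m = [1, 0, 1]                 # m(x) = x^2 + 1
p = [-1, 0, 3]                # p(x) = 3x^2 - 1, the example above
X = remainder_matrix(m, len(p) - 1)
r = [sum(x * c for x, c in zip(row, p)) for row in X]   # r == [-4, 0]
```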


Evaluate the Remainder of a Polynomial Division

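Lemma

Let $r = p \bmod m$, with $d = \deg p$. There exists a matrix $X_{\langle m,d\rangle}$ such that $\mathbf{r} = X_{\langle m,d\rangle}\mathbf{p}$.

Proof. We know $p = mq + r$ for some polynomial $q$. Then,

$$T_{\langle m,\, d-\deg m+1\rangle}\,\mathbf{q} + \begin{bmatrix} \mathbf{r} \\ 0 \end{bmatrix} = \begin{bmatrix} m_0 & & \\ \vdots & \ddots & \\ m_{\deg m} & & m_0 \\ & \ddots & \vdots \\ & & m_{\deg m} \end{bmatrix}\mathbf{q} + \begin{bmatrix} \mathbf{r} \\ 0 \end{bmatrix} = \begin{bmatrix} U \\ L \end{bmatrix}\mathbf{q} + \begin{bmatrix} \mathbf{r} \\ 0 \end{bmatrix} = \begin{bmatrix} \mathbf{p}^{(A)} \\ \mathbf{p}^{(B)} \end{bmatrix}.$$

Solving both systems ($L\mathbf{q} = \mathbf{p}^{(B)}$, then $\mathbf{r} = \mathbf{p}^{(A)} - U\mathbf{q}$), we get $\mathbf{r} = -UL^{-1}\mathbf{p}^{(B)} + \mathbf{p}^{(A)}$, i.e., $X_{\langle m,d\rangle} = \begin{bmatrix} I & -UL^{-1} \end{bmatrix}$.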


Solve Bezout’s identity

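Lemma

Write $MN + mn = 1$ as

$$\underbrace{\begin{bmatrix} T_{\langle M,\, \deg m - 1\rangle} & T_{\langle m,\, \deg M - 1\rangle} \end{bmatrix}}_{A} \begin{bmatrix} N \\ n \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots \end{bmatrix}^T.$$

Proof. Show that the matrix $A$ is invertible.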


Winograd Convolution Algorithm

Theorem (Winograd Convolution Algorithm)

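Given coprime polynomials $m^{(1)}, m^{(2)}$ such that $M = m^{(1)}m^{(2)}$ and $\deg M = n + r - 1$, and bilinear algorithms $(A^{(i)}, B^{(i)}, C^{(i)})$ for a convolution of dimension $\deg m^{(i)}$ for $i \in \{1, 2\}$, the triplet $(A, B, C)$ is a convolution algorithm for vectors of dimension $r$ and $n$, where

$$A = \begin{bmatrix} X^T_{\langle m^{(1)},\, r-1\rangle} A^{(1)} & X^T_{\langle m^{(2)},\, r-1\rangle} A^{(2)} \end{bmatrix}, \qquad B = \begin{bmatrix} X^T_{\langle m^{(1)},\, n-1\rangle} B^{(1)} & X^T_{\langle m^{(2)},\, n-1\rangle} B^{(2)} \end{bmatrix},$$

$$C = \begin{bmatrix} \bar C^{(1)} & \bar C^{(2)} \end{bmatrix},$$

with $\bar C^{(i)} = X_{\langle M,\, \deg M + \deg m^{(i)} - 2\rangle}\, T_{\langle e^{(i)},\, \deg m^{(i)}\rangle}\, X_{\langle m^{(i)},\, 2\deg m^{(i)} - 1\rangle}\, C^{(i)}$, where the polynomial $e^{(i)} = M^{(i)}N^{(i)} \bmod M$ is defined from Bezout’s identity.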


Rank of Winograd Convolution Algorithm

Given $f \in \mathbb{R}^r$ and $g \in \mathbb{R}^n$, the solution is $y \in \mathbb{R}^{r+n-1}$. Therefore, select $M$ to be a polynomial of degree $n + r - 1$.

Remark. The bilinear rank $R$ of the Winograd convolution algorithm with polynomial divisors $m^{(1)}, \dots, m^{(k)}$ is

$$R = \sum_{i=1}^{k} \big(2 \deg m^{(i)} - 1\big).$$

Observation. Increasing the bilinear rank of the Winograd convolution algorithm by using (at least one) superlinear polynomial divisor (degree greater than one) improves the numerical accuracy of convolution.
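The rank formula makes the trade-off concrete. A tiny sketch (the function name is hypothetical) for a problem with $\deg M = n + r - 1 = 4$, assuming each remainder product uses a minimal-rank convolution of size $\deg m^{(i)}$: all-linear divisors reach the lower bound $n + r - 1$, and each superlinear divisor adds one multiplication per extra degree.

```python
def winograd_rank(divisor_degrees):
    """R = sum_i (2 deg m^(i) - 1)."""
    return sum(2 * d - 1 for d in divisor_degrees)

# Three ways to split deg M = 4 (e.g. r = 3, n = 2):
print(winograd_rank([1, 1, 1, 1]))  # 4, four linear divisors (Toom-Cook-like)
print(winograd_rank([2, 1, 1]))     # 5, one quadratic divisor
print(winograd_rank([2, 2]))        # 6, two quadratic divisors
```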


Nested and Multidimensional Convolution

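Given $F, G \in \mathbb{R}^{n\times n}$, a 2D convolution is defined as

$$y_{xy} = \sum_{i=0}^{r} \sum_{j=0}^{r} f_{ij}\, g_{x+i,\, y+j} = \sum_i \sum_j f_{ij}\, g_{uv}, \quad u = x + i,\ v = y + j.$$

We can nest the tensors,

$$y_{ab} = \sum_{i=0}^{r} \sum_{j=0}^{r} \sum_{u=0}^{n} \sum_{v=0}^{n} t^{(A)}_{iau}\, t^{(B)}_{jbv}\, f_{ij}\, g_{uv}.$$

Equivalently, we have the following nested bilinear algorithm,

$$\mathrm{vec}(Y) = (C \otimes C)\Big[\big((A \otimes A)^T \mathrm{vec}(F)\big) \odot \big((B \otimes B)^T \mathrm{vec}(G)\big)\Big],$$

or otherwise,

$$Y = C\big[(A^T F A) \odot (B^T G B)\big]C^T.$$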
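A minimal check of the matrix form $Y = C[(A^T F A) \odot (B^T G B)]C^T$ in plain Python. The 1D algorithm is an assumption of ours: the rank-3, Karatsuba-style triplet for two length-2 inputs. Nesting it gives 9 products instead of 16 for a $2\times 2$ by $2\times 2$ 2D linear convolution $Y_{ab} = \sum f_{ij}\, g_{a-i,\,b-j}$, the index-reversed form of the sum above.

```python
# Rank-3 bilinear algorithm (A, B, C) for 1D linear convolution of two length-2 vectors.
A = [[1, 0, 1],
     [0, 1, 1]]               # A^T f = [f0, f1, f0 + f1]
B = A
C = [[1, 0, 0],
     [-1, -1, 1],
     [0, 1, 0]]               # y0 = p0, y1 = p2 - p0 - p1, y2 = p1

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nested_conv2d(F, G):
    """Y = C [ (A^T F A) elementwise* (B^T G B) ] C^T."""
    U = matmul(matmul(transpose(A), F), A)
    V = matmul(matmul(transpose(B), G), B)
    P = [[u * v for u, v in zip(ru, rv)] for ru, rv in zip(U, V)]   # 9 products
    return matmul(matmul(C, P), transpose(C))

def direct_conv2d(F, G):
    Y = [[0] * 3 for _ in range(3)]
    for i in range(2):
        for j in range(2):
            for u in range(2):
                for v in range(2):
                    Y[i + u][j + v] += F[i][j] * G[u][v]
    return Y

F, G = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
print(nested_conv2d(F, G) == direct_conv2d(F, G))  # True
```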


Overlap Add

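We can use multidimensional convolution to solve 1D convolution problems. Let the recomposition matrix be

$$Q_{(\gamma,\eta)} = \begin{bmatrix} I_{\eta-1} & & & & & & & & & \\ & 1 & & & & & & & & \\ & & I_{\eta-1} & I_{\eta-1} & & & & & & \\ & & & & 1 & & & & & \\ & & & & & \ddots & & & & \\ & & & & & & I_{\eta-1} & I_{\eta-1} & & \\ & & & & & & & & 1 & \\ & & & & & & & & & I_{\eta-1} \end{bmatrix}.$$

Lemma

Let $Y = F * G$, where $F, G \in \mathbb{R}^{\gamma\times\eta}$. Then, if $f = \mathrm{vec}(F)$ and $g = \mathrm{vec}(G)$, $f * g = \mathrm{vec}(Q_{(\gamma,\eta)} Y)$.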
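A minimal overlap-add sketch in plain Python: reshape each length-$\gamma\eta$ vector into $\gamma$ blocks of length $\eta$ (a row-major reshape, our assumption, so the overlaps have length $\eta - 1$ as in the $I_{\eta-1}$ blocks of $Q$), do one 2D convolution, then add each output row back at offset $\eta$ times its row index. Function names are hypothetical.

```python
def conv1d(f, g):
    y = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            y[i + j] += a * b
    return y

def conv2d(F, G):
    Y = [[0] * (len(F[0]) + len(G[0]) - 1) for _ in range(len(F) + len(G) - 1)]
    for i, row_f in enumerate(F):
        for j, a in enumerate(row_f):
            for u, row_g in enumerate(G):
                for v, b in enumerate(row_g):
                    Y[i + u][j + v] += a * b
    return Y

def overlap_add(f, g, gamma, eta):
    """1D convolution of two length gamma*eta vectors via one 2D convolution."""
    F = [f[a * eta:(a + 1) * eta] for a in range(gamma)]   # gamma blocks of length eta
    G = [g[a * eta:(a + 1) * eta] for a in range(gamma)]
    Y = conv2d(F, G)                                       # (2gamma-1) x (2eta-1)
    out = [0] * (2 * gamma * eta - 1)
    for row_idx, row in enumerate(Y):
        for col_idx, val in enumerate(row):
            # Consecutive rows overlap in eta - 1 positions and are summed there.
            out[row_idx * eta + col_idx] += val
    return out

f, g = [1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]
print(overlap_add(f, g, 2, 3) == conv1d(f, g))  # True
```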


Numerical Accuracy

Figure: 1D convolution error


Numerical Accuracy

Figure: 2D convolution error


Properties of Bilinear Algorithms

Matrix Interchange

- How can we build new algorithms with the same encoding/decoding matrices?
- Can we design new algorithms with the same complexity as similar bilinear algorithms?

Asymptotic Complexity

- The role of bilinear rank.
- How can we nest bilinear algorithms?

Lower Bounds

- What are lower bounds for bilinear algorithms?


Matrix Interchange

Recall the definitions of the discrete convolution and correlation algorithms,

$$y_k = \sum_{i=0}^{r-1} f_i\, g_{k-i} \quad\text{and}\quad y_k = \sum_{i=0}^{r-1} f_i\, g_{k+i}.$$

Theorem (Matrix Interchange)

Let the bilinear algorithm for the discrete convolution of $f$ and $g$ be defined as $C\big((A^T f) \odot (B^T g)\big)$. The correlation algorithm with output size $m = n$ is

$$B\big((A^T f) \odot (C^T g)\big).$$
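A minimal numerical check of the interchange, using a rank-3 (Karatsuba-style) triplet for $r = n = 2$ as our example algorithm; the helper names are hypothetical. Swapping the roles of $B$ and $C$ turns the length-3 convolution into a correlation of $f$ against a length-3 input with output size $m = n = 2$.

```python
A = [[1, 0, 1], [0, 1, 1]]
B = [[1, 0, 1], [0, 1, 1]]
C = [[1, 0, 0], [-1, -1, 1], [0, 1, 0]]

def T(M):
    return [list(c) for c in zip(*M)]

def mv(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def bilinear(X, Y, Z, f, g):
    """Evaluate Z[(X^T f) * (Y^T g)] with an element-wise product."""
    return mv(Z, [p * q for p, q in zip(mv(T(X), f), mv(T(Y), g))])

f = [2, 3]
conv = bilinear(A, B, C, f, [5, 7])        # convolution: y_k = sum_i f_i g_{k-i}
corr = bilinear(A, C, B, f, [5, 7, 11])    # B and C swapped: y_k = sum_i f_i g_{k+i}
print(conv, corr)  # [10, 29, 21] [31, 47]
```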


Matrix Interchange

Let the bilinear algorithm for the discrete convolution of $f$ and $g$ be defined as $C\big((A^T f) \odot (B^T g)\big)$. The correlation algorithm with output size $m = n$ is

$$B\big((A^T f) \odot (C^T g)\big).$$

Proof. The tensor $T$ in $y_k = \sum_{ij} t_{ijk} f_i g_j$ is 1 if and only if $i + j - k = 0$.

Moreover, the tensor $T^{\mathrm{corr}}$ in $y_k = \sum_{ij} t^{\mathrm{corr}}_{ijk} f_i g_j$ is one if and only if $i - j + k = 0$.

We see that the roles of index $j$ (belonging to encoding matrix $B$) and index $k$ (belonging to decoding matrix $C$) are swapped.


Bilinear Rank

We will denote the bilinear algorithm

$$y_k = \sum_{l=0}^{R-1} c_{kl} \Big(\sum_{i=0}^{r-1} a_{il} f_i\Big)\Big(\sum_{j=0}^{n-1} b_{jl} g_j\Big), \quad\text{i.e., } y = C\big[(A^T f) \odot (B^T g)\big],$$

with the triplet $(A, B, C)$. The variable $R$ determines the number of element-wise multiplications.

Theorem (Correlation Rank Lower Bound (Winograd, 1980))

Given a filter of size $r$ and output of size $m$, the minimum rank of a correlation algorithm is $m + r - 1$.

Corollary

Given a filter of size $r$ and input of size $n$, the minimum rank of a linear convolution algorithm is $n + r - 1$.


Asymptotic Complexity

As in matrix multiplication, we can recursively compute a larger convolution using a smaller one.

Given a convolution algorithm that divides the problem size by $b$ and has bilinear rank $R$, the cost of the algorithm is

$$T(n) = R \cdot T(n/b) + (c \cdot b) \cdot n/b = O\big(n^{\log_b R}\big) \quad (R > b).$$
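The recurrence can be checked numerically. The sketch assumes Karatsuba-like parameters ($R = 3$, $b = 2$, $c = 1$) and $T(1) = 1$, all chosen for illustration; for these values the recurrence solves exactly to $T(n) = 3n^{\log_2 3} - 2n$, so the printed ratio $T(n)/n^{\log_2 3}$ approaches 3.

```python
import math

def cost(n, R=3, b=2, c=1):
    """T(n) = R*T(n/b) + c*n with T(1) = 1, for n a power of b."""
    return 1 if n == 1 else R * cost(n // b, R, b, c) + c * n

for k in (5, 10, 15, 20):
    n = 2 ** k
    print(n, cost(n) / n ** math.log2(3))   # ratio tends to a constant
```

The linear term $cn$ only affects the constant; the exponent is set by $\log_b R$, which is why lowering the rank $R$ matters.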


Error Bounds

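Convolution is an ill-posed problem

Consider the cyclic convolution (of even length)

$$\begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \\ \vdots \end{bmatrix} *_{\text{cyclic}} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ \vdots \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \vdots \end{bmatrix}.$$

Because the exact result is zero, any nonzero rounding error gives an unbounded relative error. Therefore, we will use absolute error rather than relative error.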


Error Bounds

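Theorem (1D bilinear algorithm convolution error)

Given inputs $f \in \mathbb{R}^r$ and $g \in \mathbb{R}^n$, the absolute error of the bilinear algorithm satisfies

$$\|\delta y\| \le 2\big(\|C\| \cdot \|A\| \cdot \|B\| \cdot \|f\| \cdot \|g\|\big)\epsilon + O(\epsilon^2),$$

where $\|\cdot\|$ is the 2-norm.

Corollary

A $d$-nested convolution with $F \in \mathbb{R}^{r\times\cdots\times r}$ and $G \in \mathbb{R}^{n\times\cdots\times n}$ has an error of

$$\|\delta Y\| \le 2\big(\|C\|^d \cdot \|A\|^d \cdot \|B\|^d \cdot \|\mathrm{vec}(F)\| \cdot \|\mathrm{vec}(G)\|\big)\epsilon + O(\epsilon^2).$$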


Error Bounds

Proof. We can use the fact $\|Ax\| \le \|A\| \cdot \|x\|$ for the encoding and decoding steps. To bound the error from the element-wise product, we use the fact that

$$\|x \odot y\|^2 = \sum_i |x_i y_i|^2 \le \Big(\sum_i |x_i|^2\Big)\Big(\sum_i |y_i|^2\Big) = \|x\|^2 \cdot \|y\|^2,$$

which leads to $\|x \odot y\| \le \|x\| \cdot \|y\|$.


Error Mitigation

Theorem (Pan 2016)

For a Vandermonde matrix $V$ where $s$ is the largest node magnitude, the condition number is bounded below as

$$\kappa(V) = \Omega\big(s^{n-1}/\sqrt{n}\big).$$

We therefore need to either decrease $\kappa(V)$ or use a different matrix.


Error Mitigation

Better node choice

Numerical accuracy of interpolation improves with better node choices:

- Chebyshev nodes
- Brute-force search

We can combine small convolution algorithms into larger convolution algorithms. Given matrices $A, B$ where $C = A \otimes B$, we have

$$\kappa(C) = \kappa(A)\,\kappa(B).$$

Instead of having $\|A\| = \Omega(n^n)$, we have $\|A\| = \Omega\big(n^{n^{1/d}}\big)$ when nesting $d$ levels of smaller algorithms.


Numerical Accuracy

Figure: 1D convolution error


Numerical Accuracy

Figure: 2D convolution error


Arithmetic Complexity

Let $\mathrm{nnz}(A)$ be the number of nonzeros, $a(A)$ the number of additions needed, and $m(A)$ the number of multiplications. We have

$$a(A) \le \mathrm{nnz}(A) - \#\mathrm{row}(A) \quad\text{and}\quad m(A) \le \mathrm{nnz}(A).$$

Therefore, the overall cost of a convolution is

$$a(F) \le a(A) + a(B) + a(C) \quad\text{and}\quad m(F) \le m(A) + m(B) + m(C) + R.$$
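To make the counting concrete, the sketch below applies these bounds to the rank-3 (Karatsuba-style) algorithm for two length-2 inputs, an example of our choosing. It counts nonzeros and nonempty rows of each matrix as it is applied to a vector ($A^T$, $B^T$, $C$), which is our reading of $\#\mathrm{row}$; `cost_bounds` is a hypothetical name.

```python
def cost_bounds(M):
    """(addition bound, multiplication bound) for computing M @ x:
    each nonempty row needs (its nonzeros - 1) additions, and at most nnz products."""
    nnz = sum(1 for row in M for x in row if x != 0)
    rows = sum(1 for row in M if any(x != 0 for x in row))
    return nnz - rows, nnz

AT = [[1, 0], [0, 1], [1, 1]]             # encoding: A^T f = [f0, f1, f0 + f1]
BT = AT
C = [[1, 0, 0], [-1, -1, 1], [0, 1, 0]]   # decoding
R = 3                                     # element-wise products

adds = sum(cost_bounds(M)[0] for M in (AT, BT, C))
mults = sum(cost_bounds(M)[1] for M in (AT, BT, C)) + R
print(adds, mults)  # 4 16
```

Because every entry here is $\pm 1$, the multiplication bound is loose: only the $R = 3$ element-wise products are genuine multiplications, while the 4 additions are tight.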


Final Thoughts

We can also use bilinear algorithms to:

- Find communication lower bounds
- Discover alternative bilinear algorithms

Concluding Thoughts

We have derived a family of fast bilinear algorithms.

We analyzed the error bounds and arithmetic costs of the different algorithms, especially bounded vs. unbounded algorithms.


Thanks!

Remaining Questions

- Communication lower bounds for nested convolution algorithms
- Error lower bounds with node and polynomial divisor choice
- Do polynomial and interpolation-based algorithms cover the entire class of fast bilinear algorithms?

More information is covered in the paper:

Caleb Ju and Edgar Solomonik. Derivation and analysis of fast bilinear algorithms for convolution. arXiv:1910.13367 [math.NA], October 2019.