Reconfigurable Computing (EN2911X, Fall07) Lecture 16: Application-Driven


Reconfigurable Computing (EN2911X, Fall07)

Lecture 16: Application-Driven Hardware Acceleration (1/4)

Prof. Sherief Reda
Division of Engineering, Brown University

http://ic.engin.brown.edu


Fast Fourier transform

• One of the most important subroutines in scientific computing

• Used in many applications, including signal and image processing, solution of differential equations, multiplication of polynomials, and data compression

• One of the most widely implemented hardware accelerators


Discrete Fourier transform

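[Diagram: a "DFT" box maps the inputs x_0, x_1, ..., x_{N-1} to the outputs X_0, X_1, ..., X_{N-1}.]

$$X_k = \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N}, \qquad k = 0, 1, \ldots, N-1$$

Maps a set of input points to another set of output points. The operation is reversible.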
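The reversibility can be written out explicitly. The block below is a sketch under one common convention, assumed here because the slide does not state it: the forward transform uses a negative exponent and the inverse carries the 1/N factor.

```latex
X_k = \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N}
\quad\Longleftrightarrow\quad
x_i = \frac{1}{N} \sum_{k=0}^{N-1} X_k \, e^{+j 2\pi i k / N},
\qquad i, k = 0, 1, \ldots, N-1
```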


Roots of unity

[Figure: complex plane with real and imaginary axes; the points (1, 0), (0, j), (-1, 0), and (0, -j) are marked.]

• What are the Nth roots of unity? If N = 8 then we have

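$$e^{-j 2\pi \cdot 0/8},\; e^{-j 2\pi \cdot 1/8},\; e^{-j 2\pi \cdot 2/8},\; e^{-j 2\pi \cdot 3/8},\; e^{-j 2\pi \cdot 4/8},\; e^{-j 2\pi \cdot 5/8},\; e^{-j 2\pi \cdot 6/8},\; e^{-j 2\pi \cdot 7/8}$$

Define

$$W_N = e^{-j 2\pi / N}$$

so that the N-th roots of unity are W_N^0, W_N^1, ..., W_N^{N-1}.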


Calculating the DFT

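$$X_k = \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N} = \sum_{i=0}^{N-1} x_i \, W_N^{ik}$$

In matrix form:

$$
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \\ \vdots \\ X_{N-1} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & W_N & W_N^{2} & W_N^{3} & \cdots & W_N^{N-1} \\
1 & W_N^{2} & W_N^{4} & W_N^{6} & \cdots & W_N^{2(N-1)} \\
1 & W_N^{3} & W_N^{6} & W_N^{9} & \cdots & W_N^{3(N-1)} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{N-1} & W_N^{2(N-1)} & W_N^{3(N-1)} & \cdots & W_N^{(N-1)^2}
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{N-1} \end{bmatrix}
$$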

How many arithmetic (+ and *) operations do we need to calculate the DFT?
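To make the operation count concrete, the direct evaluation of the DFT sum can be sketched in Python. This is a minimal illustration, not an optimized implementation; the name dft_direct and the negative-exponent sign convention are assumptions made for this example. It counts one complex multiply-accumulate per matrix entry.

```python
import cmath


def dft_direct(x):
    """Direct DFT: X[k] = sum over i of x[i] * W**(i*k), W = exp(-2j*pi/N).

    Returns the output list and the number of multiply-accumulate steps.
    The sign of the exponent is an assumed convention.
    """
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)
    macs = 0
    out = []
    for k in range(n):          # one output point per row of the matrix
        acc = 0j
        for i in range(n):      # one multiply-accumulate per matrix entry
            acc += x[i] * w ** (i * k)
            macs += 1
        out.append(acc)
    return out, macs
```

For an input of length N the counter ends at N^2, so the direct method needs on the order of N^2 complex multiplications and additions.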


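Computing the DFT using the FFT

• How can we do better? Fast Fourier Transform (FFT)

$$
\begin{aligned}
X_k &= \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N} \\
    &= \sum_{i=0}^{N/2-1} x_{2i} \, e^{-j 2\pi (2i) k / N} + \sum_{i=0}^{N/2-1} x_{2i+1} \, e^{-j 2\pi (2i+1) k / N} \\
    &= \sum_{i=0}^{N/2-1} x_{2i} \, e^{-j 2\pi i k / (N/2)} + e^{-j 2\pi k / N} \sum_{i=0}^{N/2-1} x_{2i+1} \, e^{-j 2\pi i k / (N/2)} \\
    &= \underbrace{\sum_{i=0}^{N/2-1} x_{2i} \, e^{-j 2\pi i k / (N/2)}}_{\text{DFT of even indices}} + W_N^{k} \underbrace{\sum_{i=0}^{N/2-1} x_{2i+1} \, e^{-j 2\pi i k / (N/2)}}_{\text{DFT of odd indices}} \\
    &= X^{e}_k + W_N^{k} X^{o}_k
\end{aligned}
$$

The N-point DFT sum has been broken into two N/2-point DFTs.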
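The even/odd split translates directly into a recursive routine. The sketch below assumes the input length is a power of two and uses the negative-exponent convention; the name fft_recursive is hypothetical. It also relies on the identity W_N^(k+N/2) = -W_N^k so that each product with the odd-index DFT is reused for two outputs.

```python
import cmath


def fft_recursive(x):
    """Radix-2 decimation-in-time FFT (assumes len(x) is a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_recursive(x[0::2])   # N/2-point DFT of even-indexed samples
    odd = fft_recursive(x[1::2])    # N/2-point DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # W_N^k * Xo_k
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t   # uses W_N^(k+N/2) = -W_N^k
    return out
```

Each level of the recursion does a linear amount of work, and there are log2(N) levels.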


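Example when N=8

Objective: Compute X0, X1, …, X7 given x0, x1, …, x7

[Figure: two "magic boxes". The first takes the even-indexed inputs x0, x2, x4, x6 and produces X^e_0, X^e_1, X^e_2, X^e_3; the second takes the odd-indexed inputs x1, x3, x5, x7 and produces X^o_0, X^o_1, X^o_2, X^o_3. The box outputs are combined with the weights W_8^0, W_8^1, …, W_8^7 to give X0, X1, …, X7.]

Note that

$$W_N^{k+N/2} = -W_N^{k}$$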


Now let’s apply the idea recursively

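[Figure: the same 8-point structure with each 4-point box split again. Inputs appear in the order x0, x4, x2, x6, x1, x5, x3, x7. The inner boxes produce X^{ee}_0, X^{ee}_1, X^{eo}_0, X^{eo}_1 and X^{oe}_0, X^{oe}_1, X^{oo}_0, X^{oo}_1, which are combined with the weights W_4^0, W_4^1, W_4^2, W_4^3 to give X^e_0, …, X^e_3 and X^o_0, …, X^o_3. These are then combined with the weights W_8^0, …, W_8^7 to give X0, X1, …, X7.]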


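One more time

[Figure: the fully decomposed 8-point FFT. Inputs appear in the order x0, x4, x2, x6, x1, x5, x3, x7. The first stage produces X^{ee}_0, X^{ee}_1, X^{eo}_0, X^{eo}_1, X^{oe}_0, X^{oe}_1, X^{oo}_0, X^{oo}_1; the second stage uses the weights W_4^0, …, W_4^3 to produce X^e_0, …, X^e_3 and X^o_0, …, X^o_3; the last stage uses the weights W_8^0, …, W_8^7 to produce X0, X1, …, X7.]

• How many operations do we need now?
• What is the execution time on a general-purpose CPU?
• What is the execution time on an FPGA? How many resources do you need?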
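One way to answer the first question, assuming N is a power of two and each butterfly costs one complex multiplication and two complex additions (an accounting chosen for this sketch; trivial multiplications by 1 are not discounted):

```latex
\text{stages} = \log_2 N, \qquad
\text{butterflies} = \frac{N}{2}\log_2 N, \qquad
\text{complex multiplications} = \frac{N}{2}\log_2 N, \qquad
\text{complex additions} = N \log_2 N
```

For N = 8 this gives 3 stages and 12 butterflies, that is, 12 complex multiplications and 24 complex additions, compared with 64 multiplications and 56 additions for the direct matrix-vector product.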


Another way to visualize FFT computations

How can we determine the order of the first inputs?

[Figure: 8-point FFT signal-flow graph. Inputs appear in the order x0, x4, x2, x6, x1, x5, x3, x7; three stages of four butterflies each (12 butterflies in total); outputs X0 through X7.]
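The input order in the figure matches what is usually called bit-reversed order: write each index in log2(N) bits and reverse the bits. The sketch below shows this under the assumption that N is a power of two; the function name bit_reversed_order is hypothetical.

```python
def bit_reversed_order(n):
    """Return the indices 0..n-1 in bit-reversed order (n a power of two)."""
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    order = []
    for i in range(n):
        rev = 0
        for b in range(bits):
            if i & (1 << b):                 # bit b of i is set ...
                rev |= 1 << (bits - 1 - b)   # ... so set the mirrored bit
        order.append(rev)
    return order


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

For N = 8 this reproduces the order x0, x4, x2, x6, x1, x5, x3, x7 shown at the inputs.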


Application of FFT: faster multiplication of two polynomials

Suppose we want to evaluate A(x) at x0. How many operations do we need?

Use Horner’s rule
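Horner's rule rewrites A(x) = a0 + x(a1 + x(a2 + …)) so that each coefficient costs one multiplication and one addition. A minimal sketch, assuming the coefficients are stored lowest degree first (the name horner and this storage order are choices made for the example):

```python
def horner(coeffs, x0):
    """Evaluate a polynomial at x0; coeffs[i] is the coefficient of x**i."""
    result = 0
    for a in reversed(coeffs):   # start from the highest-degree coefficient
        result = result * x0 + a
    return result


print(horner([1, 2, 3], 2))  # 1 + 2*2 + 3*2**2 = 17
```

Evaluating a polynomial with N coefficients this way takes O(N) operations.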

Suppose you have two polynomials A(x) and B(x) represented by the coefficient vectors (a0, a1, …, aN-1) and (b0, b1, …, bN-1).

• How many operations does it take to add these two polynomials?
• How many operations does it take to multiply these two polynomials?
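Both questions can be explored with a short sketch working directly on coefficient lists (lowest degree first). The function names are hypothetical. Addition touches each coefficient once, while the ordinary "schoolbook" product pairs every coefficient of one polynomial with every coefficient of the other.

```python
def poly_add(a, b):
    """Coefficient-wise sum: O(N) additions."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))   # pad the shorter list with zeros
    b = list(b) + [0] * (n - len(b))
    return [ai + bi for ai, bi in zip(a, b)]


def poly_mul_schoolbook(a, b):
    """Ordinary product: every a[i] meets every b[j], so O(N^2) operations."""
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


print(poly_add([1, 2], [3, 4, 5]))        # [4, 6, 5]
print(poly_mul_schoolbook([1, 2], [3, 4]))  # (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
```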


Point value representation

A point-value representation of a polynomial A(x) of degree-bound N is a set of N point-value pairs

$$\{(x_0, y_0), (x_1, y_1), \ldots, (x_{N-1}, y_{N-1})\}$$

such that all of the x_k are distinct and y_k = A(x_k) for k = 0, 1, …, N-1

How many operations do we need to compute the point representation of a polynomial? How can we do better?


Interpolation of polynomials from point-value representations

Given the point representation of a polynomial, how can we invert the evaluation, i.e., determine the coefficient form of the polynomial from its point representation?

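$$
\begin{bmatrix}
1 & x_0 & x_0^{2} & \cdots & x_0^{N-1} \\
1 & x_1 & x_1^{2} & \cdots & x_1^{N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{N-1} & x_{N-1}^{2} & \cdots & x_{N-1}^{N-1}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix}
=
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{bmatrix}
$$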

How can we find the a’s?
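One standard answer, given here as a sketch rather than as the slide's own derivation: the matrix of powers of the x_k (a Vandermonde matrix, written V below as an assumed symbol) is invertible whenever the x_k are distinct, so the coefficients follow from solving the linear system, or equivalently from Lagrange's formula.

```latex
\mathbf{a} = V(x_0, x_1, \ldots, x_{N-1})^{-1}\, \mathbf{y},
\qquad
A(x) = \sum_{k=0}^{N-1} y_k \prod_{m \neq k} \frac{x - x_m}{x_k - x_m}
```

Solving the system by Gaussian elimination takes O(N^3) operations, while Lagrange's formula can be evaluated in O(N^2).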


Adding and multiplying polynomials in point representation

If polynomial C(x)=A(x)+B(x) then we can get point representation of C easily

Polynomial A

Polynomial B

How many operations do we need? How about C(x)=A(x)*B(x)?
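Written out, assuming A and B are sampled at the same points x_k with values y_k and y'_k (the primed symbol is notation introduced for this sketch):

```latex
C = A + B:\quad \{(x_k,\; y_k + y'_k)\}_{k=0}^{N-1},
\qquad
C = A \cdot B:\quad \{(x_k,\; y_k \, y'_k)\}_{k=0}^{2N-1}
```

Both take O(N) operations. The product has degree-bound 2N, so A and B must each be given at 2N points for the result to determine C.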


How can we convert a polynomial quickly from coefficient form to point-value and back?

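[Diagram: two routes from the coefficient forms of A and B to the coefficient form of their product]

Ordinary multiplication (coefficient form): O(N^2)

Evaluate: O(N^2) → Point-wise multiplication: O(N) → Interpolate: O(N^2)

This route does not pay off yet. How can we evaluate and interpolate faster than O(N^2)? Can we choose the evaluation points smartly?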


Choosing the evaluation points smartly

.

.

.


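Finally multiplying polynomials in O(N log N)

[Diagram: two routes from the coefficient forms of A and B to the coefficient form of their product]

Ordinary multiplication (coefficient form): O(N^2)

FFT: O(N log N) → Point-wise multiplication: O(N) → Inverse FFT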
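The FFT route in the diagram can be sketched end to end. The code below is a minimal illustration under stated assumptions: coefficient lists are stored lowest degree first, inputs are zero-padded to a power of two at least as long as the product, and the helper _fft and the function poly_mul_fft are hypothetical names. Results are floating point, so small rounding errors are expected.

```python
import cmath


def _fft(x, sign):
    """Radix-2 FFT; sign=-1 for the forward transform, +1 for the unscaled inverse."""
    n = len(x)
    if n == 1:
        return list(x)
    even = _fft(x[0::2], sign)
    odd = _fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def poly_mul_fft(a, b):
    """Multiply two real-coefficient polynomials via FFT, point-wise product, inverse FFT."""
    if not a or not b:
        return []
    m = len(a) + len(b) - 1          # number of coefficients in the product
    n = 1
    while n < m:                     # pad to a power of two >= m
        n *= 2
    fa = _fft(list(a) + [0] * (n - len(a)), -1)   # evaluate A at the roots of unity
    fb = _fft(list(b) + [0] * (n - len(b)), -1)   # evaluate B at the same points
    fc = [u * v for u, v in zip(fa, fb)]          # point-wise multiplication, O(N)
    c = _fft(fc, +1)                              # interpolate (inverse FFT) ...
    return [(v / n).real for v in c[:m]]          # ... with the 1/n scaling
```

For example, poly_mul_fft([1, 2], [3, 4]) returns values close to [3, 10, 8], matching (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2.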


Back to signal processing

[Diagram: the input signal (a0, a1, …, aN-1) enters a linear system with impulse response (b0, b1, …, bN-1).]

T=0: a0b0

T=1: a0b1+a1b0

T=2: a0b2+a1b1+a2b0

…

The response of the system to the input signal at different times equals the coefficients of the polynomial produced by multiplying the input-signal polynomial by the impulse-response polynomial. This is commonly known as the convolution of the input and the system’s impulse response. How can we find the output response faster than O(N^2)?
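The pattern in the T = 0, 1, 2 terms generalizes as follows. The output symbol y_t is notation assumed for this sketch, and terms whose indices fall outside 0, …, N-1 are taken as zero.

```latex
y_t = \sum_{i=0}^{t} a_i \, b_{t-i}, \qquad t = 0, 1, \ldots, 2N-2
```

Because this is exactly polynomial multiplication, the same FFT route applies: zero-pad both sequences to length at least 2N-1, transform, multiply point-wise, and apply the inverse transform, for O(N log N) operations in total.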


Summary

• The lecture covered one of the most important hardware accelerators: FFT

• We have seen how it can be parallelized and sped up

• We examined some of its applications
