Reconfigurable Computing (EN2911X, Fall07) Lecture 16: Application-Driven
Reconfigurable Computing, S. Reda, Brown University
Reconfigurable Computing(EN2911X, Fall07)
Lecture 16: Application-Driven Hardware Acceleration (1/4)
Prof. Sherief Reda
Division of Engineering, Brown University
http://ic.engin.brown.edu
Fast Fourier transform
• One of the most important subroutines in scientific computing
• Used in many applications, including signal and image processing, solution of differential equations, multiplication of polynomial functions, and data compression
• One of the most widely implemented hardware accelerators
Discrete Fourier transform
$x_0, x_1, \ldots, x_{N-1}$ → DFT → $X_0, X_1, \ldots, X_{N-1}$
$$X_k = \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N}$$
Maps a set of input points to another set of output points. The operation is reversible.
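The reversibility claim can be checked numerically. The following is a minimal sketch, assuming the forward transform uses a negative exponent and the inverse uses a positive exponent with a 1/N factor (a common convention, not stated on the slide); the names `dft` and `idft` are illustrative.

```python
import cmath

def dft(x):
    """Forward transform: X_k = sum_i x_i * exp(-j*2*pi*i*k/N)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def idft(X):
    """Inverse transform (assumed convention): positive exponent, scaled by 1/N."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * i * k / n) for k in range(n)) / n
            for i in range(n)]

x = [1.0, 2.0, 0.0, -1.0]
X = dft(x)
x_back = idft(X)  # equal to x up to floating-point rounding
```

Applying `idft` to the output of `dft` returns the original samples, which is what "reversible" means here.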
Roots of unity
[Figure: unit circle in the complex plane (real and imaginary axes) with the points (1, 0), (0, j), (-1, 0), and (0, -j) marked.]
• What are the Nth roots of unity? If N = 8 then we have
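$$e^{-j 2\pi \cdot 0/8},\; e^{-j 2\pi \cdot 1/8},\; e^{-j 2\pi \cdot 2/8},\; e^{-j 2\pi \cdot 3/8},\; e^{-j 2\pi \cdot 4/8},\; e^{-j 2\pi \cdot 5/8},\; e^{-j 2\pi \cdot 6/8},\; e^{-j 2\pi \cdot 7/8}$$
Define $W_N = e^{-j 2\pi / N}$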
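As a quick illustration, the eight roots for N = 8 can be listed as powers of W_8. This sketch assumes the negative-exponent convention W_N = exp(-j·2π/N); the helper name `twiddle` is hypothetical.

```python
import cmath

def twiddle(n, k=1):
    """Return W_n^k, assuming W_n = exp(-j*2*pi/n)."""
    return cmath.exp(-2j * cmath.pi * k / n)

roots = [twiddle(8, k) for k in range(8)]
for k, w in enumerate(roots):
    # Each root raised to the 8th power should return to 1 (up to rounding).
    is_root = cmath.isclose(w ** 8, 1, abs_tol=1e-12)
    print(k, f"{w.real:+.4f} {w.imag:+.4f}j", is_root)
```

With this sign choice the powers walk clockwise around the unit circle; the opposite sign gives the same set of eight points in the other direction.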
Calculating the DFT
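$$X_k = \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N} = \sum_{i=0}^{N-1} x_i \, W_N^{ik}$$
$$
\begin{bmatrix} X_0 \\ X_1 \\ X_2 \\ X_3 \\ \vdots \\ X_{N-1} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
1 & W_N^{1} & W_N^{2} & W_N^{3} & \cdots & W_N^{N-1} \\
1 & W_N^{2} & W_N^{4} & W_N^{6} & \cdots & W_N^{2(N-1)} \\
1 & W_N^{3} & W_N^{6} & W_N^{9} & \cdots & W_N^{3(N-1)} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
1 & W_N^{N-1} & W_N^{2(N-1)} & W_N^{3(N-1)} & \cdots & W_N^{(N-1)(N-1)}
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{N-1} \end{bmatrix}
$$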
How many arithmetic (+ and *) operations do we need to calculate the DFT?
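One way to approach the question is to carry out the matrix-vector product directly and count operations. The sketch below assumes W_N = exp(-j·2π/N) and counts every complex multiplication and addition, including trivial multiplications by 1; the function names are illustrative.

```python
import cmath

def dft_matrix(n):
    """Build the N x N matrix whose (k, i) entry is W_N^(i*k)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (i * k) for i in range(n)] for k in range(n)]

def dft_with_counts(x):
    """Multiply the DFT matrix by x, counting complex multiplies and adds."""
    n = len(x)
    W = dft_matrix(n)
    mults = adds = 0
    X = []
    for k in range(n):
        acc = W[k][0] * x[0]
        mults += 1
        for i in range(1, n):
            acc += W[k][i] * x[i]
            mults += 1
            adds += 1
        X.append(acc)
    return X, mults, adds

X, mults, adds = dft_with_counts([1, 2, 3, 4, 5, 6, 7, 8])
print(mults, adds)  # 64 56, i.e. N*N multiplications and N*(N-1) additions for N = 8
```

Both counts grow as O(N^2), which is the cost the FFT aims to reduce.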
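Computing the DFT using the FFT
• How can we do better? Fast Fourier Transform (FFT)
$$
\begin{aligned}
X_k &= \sum_{i=0}^{N-1} x_i \, e^{-j 2\pi i k / N} \\
    &= \sum_{i=0}^{N/2-1} x_{2i} \, e^{-j 2\pi (2i) k / N} + \sum_{i=0}^{N/2-1} x_{2i+1} \, e^{-j 2\pi (2i+1) k / N} \\
    &= \sum_{i=0}^{N/2-1} x_{2i} \, e^{-j 2\pi i k / (N/2)} + e^{-j 2\pi k / N} \sum_{i=0}^{N/2-1} x_{2i+1} \, e^{-j 2\pi i k / (N/2)} \\
    &= \sum_{i=0}^{N/2-1} x_{2i} \, e^{-j 2\pi i k / (N/2)} + W_N^{k} \sum_{i=0}^{N/2-1} x_{2i+1} \, e^{-j 2\pi i k / (N/2)} \\
    &= X_k^{e} + W_N^{k} X_k^{o}
\end{aligned}
$$
$X_k^{e}$: DFT of even indices
$X_k^{o}$: DFT of odd indices
The N-point DFT sum has been broken into two N/2-point DFTs.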
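The even/odd split translates almost line for line into a recursive routine. This is a minimal sketch, assuming N is a power of two and W_N = exp(-j·2π/N); it also uses the identity W_N^(k+N/2) = -W_N^k so that each product is computed once and used for two outputs.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT sketch; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # N/2-point DFT of the even-indexed samples
    odd = fft(x[1::2])   # N/2-point DFT of the odd-indexed samples
    X = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # W_N^k * X_k^o
        X[k] = even[k] + t
        X[k + n // 2] = even[k] - t  # W_N^(k+N/2) = -W_N^k
    return X

x = [1, 2, 3, 4, 5, 6, 7, 8]
X = fft(x)
```

The recursion stops at one-point transforms, where the DFT of a single sample is the sample itself.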
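Example when N=8
Objective: Compute X0, X1, …, X7 given x0, x1, …, x7
[Figure: the even-indexed inputs x0, x2, x4, x6 enter one "magic box" that outputs $X^e_0, X^e_1, X^e_2, X^e_3$; the odd-indexed inputs x1, x3, x5, x7 enter a second "magic box" that outputs $X^o_0, X^o_1, X^o_2, X^o_3$. The two sets of outputs are combined using the factors $W_8^0, W_8^1, \ldots, W_8^7$ to produce X0, X1, …, X7.]
Note that $W_N^{k+N/2} = -W_N^{k}$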
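A short worked step shows why this identity matters. It assumes the even/odd split $X_k = X_k^e + W_N^k X_k^o$ and uses the fact that $X^e$ and $X^o$, being N/2-point DFTs, repeat with period N/2:

$$
\begin{aligned}
X_k &= X_k^{e} + W_N^{k} X_k^{o} \\
X_{k+N/2} &= X_k^{e} + W_N^{k+N/2} X_k^{o} = X_k^{e} - W_N^{k} X_k^{o}, \qquad k = 0, 1, \ldots, N/2 - 1
\end{aligned}
$$

For N = 8, the outputs X0 and X4 share the product $W_8^0 X_0^o$, X1 and X5 share $W_8^1 X_1^o$, and so on, so only four products are needed to form all eight outputs from the two half-size results.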
Now let’s apply the idea recursively
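[Figure: the 8-point diagram with the split applied again inside each 4-point box. Inputs appear in the order x0, x4, x2, x6, x1, x5, x3, x7. Intermediate values $X^{ee}_0, X^{ee}_1, X^{eo}_0, X^{eo}_1$ and $X^{oe}_0, X^{oe}_1, X^{oo}_0, X^{oo}_1$ are combined using the factors $W_4^0, W_4^1, W_4^2, W_4^3$ to give $X^e_0, \ldots, X^e_3$ and $X^o_0, \ldots, X^o_3$, which are then combined using $W_8^0, \ldots, W_8^7$ to produce X0, X1, …, X7.]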
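One more time
[Figure: the 8-point diagram with the decomposition applied one more level. Inputs appear in the order x0, x4, x2, x6, x1, x5, x3, x7; the intermediate values are $X^{ee}_0, X^{ee}_1, X^{eo}_0, X^{eo}_1, X^{oe}_0, X^{oe}_1, X^{oo}_0, X^{oo}_1$, then $X^e_0, \ldots, X^e_3$ and $X^o_0, \ldots, X^o_3$; the factors are $W_4^0, \ldots, W_4^3$ and $W_8^0, \ldots, W_8^7$; the outputs are X0, X1, …, X7.]
• How many operations do we need now?
• What is the execution time on a general-purpose CPU?
• What is the execution time on an FPGA? How many resources do you need?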
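For the first question, a rough count can be tabulated. The sketch below assumes a radix-2 FFT with log2(N) stages of N/2 butterflies, where each butterfly costs one complex multiplication and two complex additions/subtractions, and it counts trivial multiplications by 1 as well. The function names are illustrative, and the numbers say nothing about CPU or FPGA timing.

```python
def dft_ops(n):
    """Direct DFT: N*N complex multiplications and N*(N-1) complex additions."""
    return n * n, n * (n - 1)

def fft_ops(n):
    """Radix-2 FFT (n a power of two): (N/2)*log2(N) multiplications, N*log2(N) additions."""
    stages = n.bit_length() - 1  # log2(n) for a power of two
    return (n // 2) * stages, n * stages

for n in (8, 64, 1024):
    print(n, dft_ops(n), fft_ops(n))
```

For N = 8 this gives 12 multiplications and 24 additions, compared with 64 and 56 for the direct matrix product.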
Another way to visualize FFT computations
How can we determine the order of the first inputs?
[Figure: 8-point FFT drawn as a network of twelve butterflies. Inputs enter in the order x0, x4, x2, x6, x1, x5, x3, x7; the outputs are X0, X1, …, X7.]
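The input order in the diagram matches a bit-reversal permutation: position p holds the sample whose index is p with its binary digits reversed. The following sketch assumes N is a power of two; the helper names are hypothetical.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` binary digits of i."""
    return int(format(i, f"0{bits}b")[::-1], 2)

def input_order(n):
    """Index of the sample placed at each input position of an n-point FFT."""
    bits = n.bit_length() - 1  # log2(n) for a power of two
    return [bit_reverse(p, bits) for p in range(n)]

print(input_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

For example, position 1 is 001 in binary, which reversed is 100, so x4 goes there.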
Application of FFT: faster multiplication of two polynomials
Suppose we want to evaluate A(x) at x0, how many operations do we need?
Use Horner’s rule
Suppose you have two polynomials represented by the coefficient vectors
• How many operations does it take to add these two polynomials?
• How many operations does it take to multiply these two polynomials?
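These three operations can be sketched in coefficient form as follows. The code assumes a polynomial is stored as a list of coefficients with the constant term first; the function names and the sample polynomials are illustrative.

```python
def horner(coeffs, x0):
    """Evaluate A(x0) with Horner's rule: N-1 multiplications and N-1 additions."""
    result = coeffs[-1]
    for a in reversed(coeffs[:-1]):
        result = result * x0 + a
    return result

def poly_add(a, b):
    """Add two polynomials of equal length: N additions, O(N)."""
    return [ai + bi for ai, bi in zip(a, b)]

def poly_mul(a, b):
    """Multiply two polynomials term by term: N*N multiplications, O(N^2)."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            c[i + k] += ai * bk
    return c

A = [1, 2, 3]  # 1 + 2x + 3x^2
B = [4, 5, 6]  # 4 + 5x + 6x^2
print(horner(A, 2))    # 17
print(poly_add(A, B))  # [5, 7, 9]
print(poly_mul(A, B))  # [4, 13, 28, 27, 18]
```

Evaluation and addition are linear in N, while the direct product is quadratic.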
Point value representation
A point-value representation of a polynomial A(x) of degree-bound N is a set of N point-value pairs
$$\{(x_0, y_0), (x_1, y_1), \ldots, (x_{N-1}, y_{N-1})\}$$
such that all of the $x_k$ are distinct and $y_k = A(x_k)$ for k = 0, 1, …, N-1
How many operations do we need to compute the point representation of a polynomial? How can we do better?
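A straightforward way to build the point-value form is to evaluate the polynomial at each of the N points in turn. This sketch assumes coefficients are stored constant term first and uses arbitrary distinct points; each evaluation costs O(N), so the whole conversion costs O(N^2).

```python
def evaluate(coeffs, x):
    """Evaluate the polynomial at x with Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def to_point_value(coeffs, xs):
    """Return the list of (x_k, y_k) pairs; xs must hold distinct points."""
    if len(set(xs)) != len(xs):
        raise ValueError("evaluation points must be distinct")
    return [(x, evaluate(coeffs, x)) for x in xs]

A = [1, 2, 3]  # 1 + 2x + 3x^2
print(to_point_value(A, [0, 1, 2]))  # [(0, 1), (1, 6), (2, 17)]
```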
Interpolation of polynomials from point-value representations
Given the point representation of a polynomial, how can we invert the evaluation, i.e., determine the coefficient form of the polynomial from its point representation?
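$$
\begin{bmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^{N-1} \\
1 & x_1 & x_1^2 & \cdots & x_1^{N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{N-1} & x_{N-1}^2 & \cdots & x_{N-1}^{N-1}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix}
=
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{bmatrix}
$$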
How can we find the a’s?
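One possible answer is Lagrange interpolation, which solves this linear system without forming a matrix inverse explicitly. The sketch below is a deliberately simple version that uses exact rational arithmetic; as written it costs O(N^3), and a more careful arrangement of the same formula brings it to O(N^2). The function name is illustrative.

```python
from fractions import Fraction

def interpolate(points):
    """Recover coefficients (constant term first) from N point-value pairs."""
    n = len(points)
    coeffs = [Fraction(0)] * n
    for k, (xk, yk) in enumerate(points):
        basis = [Fraction(1)]  # coefficients of prod over m != k of (x - x_m)
        denom = Fraction(1)    # prod over m != k of (x_k - x_m)
        for m, (xm, _) in enumerate(points):
            if m == k:
                continue
            shifted = [Fraction(0)] * (len(basis) + 1)
            for d, c in enumerate(basis):
                shifted[d] -= c * xm   # contribution of the -x_m term
                shifted[d + 1] += c    # contribution of the x term
            basis = shifted
            denom *= xk - xm
        scale = Fraction(yk) / denom
        for d, c in enumerate(basis):
            coeffs[d] += scale * c
    return coeffs

print(interpolate([(0, 1), (1, 6), (2, 17)]))  # coefficients of 1 + 2x + 3x^2
```

The points must have distinct x values, otherwise `denom` becomes zero.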
Adding and multiplying polynomials in point representation
If polynomial C(x)=A(x)+B(x) then we can get point representation of C easily
Polynomial A
Polynomial B
How many operations do we need? How about C(x)=A(x)*B(x)?
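Both operations act pair by pair when the two polynomials are sampled at the same points. In this sketch the sample polynomials and points are arbitrary choices; note the assumption that the product, which has 2N-1 coefficients, needs at least 2N-1 points to be determined.

```python
def evaluate(coeffs, x):
    """Evaluate a polynomial (constant term first) at x with Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

A = [1, 2, 3]         # 1 + 2x + 3x^2
B = [4, 5, 6]         # 4 + 5x + 6x^2
xs = [0, 1, 2, 3, 4]  # 2N-1 = 5 shared evaluation points

ya = [evaluate(A, x) for x in xs]
yb = [evaluate(B, x) for x in xs]

y_sum = [p + q for p, q in zip(ya, yb)]   # values of A(x) + B(x), one addition per point
y_prod = [p * q for p, q in zip(ya, yb)]  # values of A(x) * B(x), one multiplication per point
```

Each result takes one operation per point, so both addition and multiplication are O(N) in point-value form.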
How can we convert a polynomial quickly from coefficient form to point-value and back?
• Ordinary multiplication (coefficient form): O(N^2)
• Evaluate: O(N^2)
• Point-wise multiplication: O(N)
• Interpolate: O(N^2)
This route does not pay off yet. How can we evaluate and interpolate faster than O(N^2)? Can we choose the evaluation points smartly?
Choosing the evaluation points smartly
Finally, multiplying polynomials in O(N log N)
• Ordinary multiplication (coefficient form): O(N^2)
• FFT: O(N log N)
• Point-wise multiplication: O(N)
• Inverse FFT: O(N log N)
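The three steps can be sketched end to end. This version assumes integer coefficients (so the result can be rounded), pads both inputs to a power-of-two length, and uses a recursive radix-2 FFT with W_N = exp(-j·2π/N) for the forward direction and the opposite sign plus a 1/N scale for the inverse; all names are illustrative.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; the inverse omits the 1/N scale (applied by the caller)."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def poly_mul_fft(a, b):
    """Multiply integer-coefficient polynomials (constant term first) via the FFT."""
    needed = len(a) + len(b) - 1
    size = 2 ** (needed - 1).bit_length()  # smallest power of two that is >= needed
    fa = fft(list(a) + [0] * (size - len(a)))  # evaluate A at the roots of unity
    fb = fft(list(b) + [0] * (size - len(b)))  # evaluate B at the same points
    fc = [p * q for p, q in zip(fa, fb)]       # point-wise multiplication
    c = fft(fc, inverse=True)                  # interpolate back to coefficients
    return [round((v / size).real) for v in c[:needed]]

print(poly_mul_fft([1, 2, 3], [4, 5, 6]))  # [4, 13, 28, 27, 18]
```

Rounding is only appropriate because the inputs are integers; for real-valued coefficients the real part would be kept as is.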
Back to signal processing
Input signal: (a0, a1, …, aN-1)
Linear system with impulse response (b0, b1, …, bN-1)
T=0: a0b0
T=1: a0b1+a1b0
T=2: a0b2+a1b1+a2b0
…
The response of the system to the input signal at different times is equal to the coefficients of the polynomial produced by multiplying the input signal polynomial with the impulse response polynomial. This is commonly known as the convolution of the input and the system's impulse response. How can we find the output response faster than O(N^2)?
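For reference, the direct O(N^2) computation of the response can be written time step by time step. This is a minimal sketch with made-up sample values; the output at time t is the sum of a_i·b_(t-i) over all valid i, which matches the T=0, T=1, T=2 lines above.

```python
def convolve(a, b):
    """Direct convolution of input a with impulse response b."""
    out = [0] * (len(a) + len(b) - 1)
    for t in range(len(out)):
        first = max(0, t - len(b) + 1)  # smallest i with t - i inside b
        last = min(t, len(a) - 1)       # largest i inside a
        for i in range(first, last + 1):
            out[t] += a[i] * b[t - i]
    return out

a = [1, 0, 2]  # hypothetical input samples
b = [3, 1]     # hypothetical impulse response
print(convolve(a, b))  # [3, 1, 6, 2]
```

Since the output values are the coefficients of the product polynomial, any faster polynomial multiplication method applies to convolution as well.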
Summary
• The lecture covered one of the most important hardware accelerators: FFT
• We have seen how it can be parallelized and sped up
• Examined some of the applications