Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class...
Transcript of Computation of DFT - VLSI Signal Processing Lab, EE, NCTUtwins.ee.nctu.edu.tw/courses/dsp_16/Class...
DSP (2015 Spring) Computation of DFT
NCTU EE 1
Computation of DFT
Efficient algorithms for computing DFT – Fast Fourier Transform.
(a) Compute only a few points out of all N points
(b) Compute all N points
What are the efficiency criteria?
Number of multiplications
Number of additions
Chip area in VLSI implementation
DFT as a Linear Transformation Matrix representation of DFT
Definition of DFT:
1,,1,0,)(1
)(
1,,1,0,)()(
1
0
1
0
NnWkXN
nx
NkWnxkX
N
k
knN
N
n
knN
where
Let ,
)1(
)1(
)0(
,
)1(
)1(
)0(
NX
X
X
Nx
x
x
NN
Xx
and
)1)(1()1(2)1(
)1(242
12
1
1
1
1111
NNN
NN
NN
NNNN
NNNN
N
WWW
WWW
WWW
W
Thus,
NN
NNN
NNN
N
N
N
XW
XWx
xWX
*
1
1IDFTpoint -
point DFT-
Because the matrix (transformation) NW has a specific structure and because k
NW has par-
ticular values (for some k and n), we can reduce the number of arithmetic operations for
computing this transform.
DSP (2015 Spring) Computation of DFT
NCTU EE 2
Example 3] 2 1 0[][ nx
jj
jj
WWW
WWW
WWW
WWWW
WWWW
WWWW
WWWW
11
1111
11
1111
1
1
1
1111
14
24
34
24
04
24
34
24
14
94
64
34
04
64
44
24
04
34
24
14
04
04
04
04
04
4W
Only additions are needed to compute this specific transform.
(This is a well-known radix-4 FFT)
Thus, the DFT of ][nx is
j
j
22
2
22
6
444 xWX
Fast Fourier Transform -- Highly efficient algorithms for computing DFT
General principle: Divide-and-conquer
Specific properties of kNW
Complex conjugate symmetry: *)( knN
knN WW
Symmetry: kN
Nk
N WW 2
Periodicity: kN
NkN WW
Particular values of k and n: e.g., radix-4 FFT (no multiplications)
Direct computation of DFT
1
0
1
0
Re][ImIm][Re
Im][ImRe][Re
1,,1,0 ,][][
N
nkn
Nkn
N
knN
knN
N
n
knN
WnxWnxj
WnxWnx
NkWnxkX
For each k, we need N complex multiplications and N-1 complex additions. 4N real
multiplications and 4N-2 real additions.
DSP (2015 Spring) Computation of DFT
NCTU EE 3
We will show how to use the properties of k
NW to reduce computations.
Radix-2 algorithms: Decimation-in-time; Decimation-in-frequency
Composite N algorithms: Cooley-Tukey; Prime factor
Winograd algorithm
Chirp transform algorithm
Radix-2 Decimation-in-time Algorithms -- Assume N-point DFT and 2N
Idea: N-point DFT 2
N -point DFT 4
N -point DFT
4
N -point DFT
2
N -point DFT 4
N -point DFT
4
N -point DFT
Sequence: ]1[]2[]3[]2[]1[]0[ Nxnxxxxx
Even index: ]2[]2[]0[ Nxxx
Odd index: ]1[]3[]1[ Nxxx
2/2/
22
22
12
0
12
0
)12(2
12
odd
2
even
1
0
]12[]2[
][][
1,,1,0 ,][][
NN
jN
j
N
N
r
N
r
krN
rkN
rn
n
knN
rn
n
knN
N
n
knN
WeeW
WrxWrx
WnxWnx
NkWnxkX
][][
]12[]2[][
point DFT-2
12
02/
point DFT-2
12
02/
kHWkG
WrxWWrxkX
kN
N
N
r
rkN
kN
N
N
r
rkN
DSP (2015 Spring) Computation of DFT
NCTU EE 4
Comparison:
(a) Direct computation of N-point DFT (N frequency samples):
~ 2N complex multiplications and 2N complex adds
(b) Direct computation of 2
N -point DFT:
~ 2
2
N complex multiplications and
2
2
N complex adds
+ additional N complex multis and N complex adds
~ (Total:) 22
222
NN
NN
complex multis and adds
(c) N2log -stage FFT
Since 2N , we can further break 2
N -point DFT into two 4
N -point DFT and
so on.
DSP (2015 Spring) Computation of DFT
NCTU EE 5
At each stage: ~ N complex multis and adds
Total: ~ NN 2log complex multis and adds (--> NN
2log2
)
Number of points, N
Direct Computation: Complex Multis
FFT: Complex Multis
Speed Im-provement Factor
4 16 4 4.08 64 12 5.3
16 256 32 864 4,096 192 21.3
256 65,536 1,024 64.01024 1,048,576 5,120 204.8
Butterfly: Basic unit in FFT
Two multiplications:
DSP (2015 Spring) Computation of DFT
NCTU EE 6
One multiplication:
In-place computations
Only two registers are needed for computing a butterfly unit.
][][][
][][][
11
11
qXWpXqX
qXWpXpX
mr
Nmm
mr
Nmm
Advantage: less storage!
In order to retain the in-place computation property, the input data are accessed in the
bit-reversed order.
Note: The outputs are in the normal order (same as the “position”)
Position Binary equivalent Bit reversed Sequence index
6 110 011 3 2 010 010 2
DSP (2015 Spring) Computation of DFT
NCTU EE 7
Remark: Index 3 input data is placed at position 6.
We may also place the inputs in the normal order; then the outputs are in the bit-reversed
order.
If we try to maintain the normal order of both inputs and outputs, then in-place compu-
tation structure is destroyed.
DSP (2015 Spring) Computation of DFT
NCTU EE 8
Radix-2 Decimation-in-frequency Algorithms Dividing the output sequence ][kX into smaller pieces.
1,,1,0,)()(1
0
NkWnxkX
N
n
knN
If k is even, rk 2 .
12
02
12
0
2
22]2[2
12
0
12
0
22
2
12
0
1
2
22
1
0
2][
2][
2][
)2(][][
12
,,1,0 ,][]2[
N
n
nrN
N
n
nrN
rnN
rNN
rnN
Nnr
N
N
n
N
n
Nnr
Nnr
N
N
n
N
Nn
nrN
nrN
N
n
knN
WN
nxnx
WN
nxnx
WWWW
WN
nxWnx
NnnWnxWnx
NrWnxrX
Similarly, if k is odd, 12 rk .
12
022
][]12[
N
n
nrN
nN WW
NnxnxrX
DSP (2015 Spring) Computation of DFT
NCTU EE 9
12
02
12
02
2][]12[
2][]2[
N
n
nrN
nN
N
n
nrN
WWN
nxnxrX
WN
nxnxrX
Let
2][][
2][][
Nnxnxnh
Nnxnxng
We can further break ]2[ rX into even and odd groups …
Again, we can reduce the two-multiplication butterfly into one multiplication. Hence, the
computational complexity is bout NN
2log2
. The in-place computation property holds if the
outputs are in bit-reversed order (when inputs are in the normal order).
DSP (2015 Spring) Computation of DFT
NCTU EE 10
FFT for Composite N -- Cooley-Tukey Algorithm:
21NNN
10
10:index Freq.
10
10:index Time
22
11211
22
11212
Nk
NkkNkk
Nn
NnnnNn
Remark: ),( 21 nnn and ),( 21 kkk
Goal: Decompose N-point DFT into two stages:
1N -point DFT 2N -point DFT
point-
1
0
factortwiddle
point-
1
0212
1
0
1
01
212
1
0
1
0212
211
1
0
2
2
2
22
2
21
1
1
1
11
1
2
2
1
1
1221
222
21121
111
112
2
2
1
1
212211
][
][
][
][
10 ,][][
N
N
n
nkN
nkN
N
N
n
nkN
N
n
N
n
nkNNN
W
nNkN
nkN
W
nkNN
N
n
N
n
nnNkNkN
N
n
knN
WWWnnNx
WWWWnnNx
WnnNx
kNkX
NkWnxkX
nkN
nkN
Procedure
(1) Compute 1N -point DFT: (row transform)
1
021212
1
1
11
1][],[
N
n
nkNWnnNxknG
(2) Multiply twiddle factors:
],[],[~
121221 knGWknG nk
N
(3) Compute 2N -point DFT: (column transform)
1
012211
2
2
22
2],[
~][
N
n
nkNWknGkNkX
DSP (2015 Spring) Computation of DFT
NCTU EE 11
(Computation of N=15-point DFT by means of 3-point and 5-point DFTs.)
DSP (2015 Spring) Computation of DFT
NCTU EE 12
Extension: NNNN 21
21 If
point DFT-for tionsmultiplica ofnumber )(Let
NNN
NN
)(: transfrmcolumn 3.
:factors twiddle2.
)(:ormrow transf 1.
21
21
12
NN
NNN
NN
1)()(
)()()(
2
2
1
1
2112
N
N
N
NN
NNNNNN
In general,
1
)1()(
)(i i
i
N
NNN
In fact, the term 1 should be 2
1 because rearranging the butterfly structure
would make half of the branches becoming “1”.
Special Case: 221 NNN
Radix-2: 221 NNN and N2log
2
1)( NN multiplications because )2( requires no multiplications.
Radix-4: 421 NNN and N4log
2
1)( NN multiplications because )4( requires no multiplications. This FFT
has fewer stages than Radix-2 ==> fewer multiplications.
jj
jj
WWW
WWW
WWW
WWWW
WWWW
WWWW
WWWW
11
1111
11
1111
1
1
1
1111
14
24
34
24
04
24
34
24
14
94
64
34
04
64
44
24
04
34
24
14
04
04
04
04
04
4W
DSP (2015 Spring) Computation of DFT
NCTU EE 13
Inverse FFT IDFT:
1
0
][1
][N
k
knNWkX
Nnx
(*)
DFT:
1
0
][][N
n
nkNWnxkX
Hence, take the conjugate of (*) :
)( DFT1
][1
][1
][1
][
*
1
0
*
1
0
*
*1
0
*
kXN
WkXN
WkXN
WkXN
nx
N
k
knN
N
k
knN
N
k
knN
Take the conjugate of the above equation:
**
**
)(FFT 1
)(DFT 1
][
kXN
kXN
nx
Thus, we can use the FFT algorithm to compute the inverse DFT.
The Goertzel Algorithm
12)
2(
kjNk
Nj
kNN eeW
1
0
)(1
0
][][][N
r
rNkN
N
r
krN
kNN WrxWrxWkX
If we define x[n] = 0 for n < 0 and n N
and ])[(][][][][ )( nuWnxrnuWrxny knN
r
rNkNk
,
Nnk nykX
][][
DSP (2015 Spring) Computation of DFT
NCTU EE 14
11
1)(
zWzH
kN
k
If x[n] is complex, we need 4 real multiplications and 4 real additions to compute each
yk[n].
To compute yk[N], we need to compute yk[1], yk[2], …, yk[N-1].
We need 4N real multiplications and 4N real additions to compute X[k].
Remarks:
less efficient than the direct method.
Avoid the computation or storage of the coefficients kn
NW .
To reduce the number of multiplications,
21
1
11
1
)/2cos(21
1
)1)(1(
1)(
zzNk
zW
zWzW
zWzH
kN
kN
kN
kN
k
If x[n] is complex, we only need 2 real multiplications and 4 real additions to implement
the poles of the system. (The complex multiplication by k
NW needs not be performed at every iteration.)
To compute X[k], we need 2N real multiplications and 4N real additions for the poles
and 4 real multiplications and 4 real additions for the zero.
Remarks:
Avoid the computation or storage of the coefficients kn
NW .
Only need to compute and save kNW and )/2cos( Nk .