Implementation of DSP IC
Transcript of Implementation of DSP IC
![Page 1: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/1.jpg)
VSP Lecture4 - Fast Algorithms ([email protected])
Implementation of DSP IC
Implementation of DSP IC
Lecture 4 Fast Algorithms for Digital Signal Processing
1
![Page 2: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/2.jpg)
VSP Lecture4 - Fast Algorithms ([email protected])
Algorithm Strength Reduction• Strength reduction leads to a reduction in
hardware complexity by exploiting substructure sharing and leads to less silicon area or power consumption in a VLSI ASIC or iteration period in a programmable DSP implementation
• Strength reduction enables design of parallel FIR filters with a less-than-linear increase in hardware
2
![Page 3: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/3.jpg)
Algorithm Strength Reduction• Motivation
– The number of strong operations, such as multiplications, is reduced possibly at the expense of an increase in the number of weaker operations, such as additions.
• Reduce computation complexity• Example: Complex multiplication
– (a+jb)(c+jd)=e+jf, a,b,c,d,e,f R– The direct implementation requires 4 multiplications and 2
additions
– However, the number of multiplication can be reduced to 3 at the expense of 3 extra additions by using the identities
ba
cddc
fe
)()()()(
baddcbbcadbaddcabdac
3 multiplications
5 additions
![Page 4: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/4.jpg)
VSP Lecture4 - Fast Algorithms ([email protected])
Complex Multiplication
Reduce the number of strong operation (less switched capacitance), however, increase the critical path
Speed?, Area?, Power? ….
4
![Page 7: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/7.jpg)
Continuous-Time and Continuous-Frequency
ContinuousAperiodic
ContinuousAperiodic
![Page 8: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/8.jpg)
Continuous-Time and Discrete-Frequency
Fourier series of periodic continuous signals
PeriodicContinuous
Discrete Aperiodic
![Page 9: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/9.jpg)
Discrete-Time and Continuous-Frequency
Fourier transform of aperiodic discrete signals
DiscreteAperiodic Continuous
Periodic
![Page 10: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/10.jpg)
Discrete Fourier Transform
• DFT is identical to samples of Fourier transforms• In DSP applications, we are able to store only a finite number of samples• we are able to compute the spectrum only at specific discrete values of
![Page 11: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/11.jpg)
Discrete Fourier Transform• Discrete Fourier transform (DFT) pairs
knN
jknN
N
k
knN
N
n
knN
eW
NnWkXN
nx
NkWnxkX
2
1
0
1
0
where
,1,,1,0 ,][1][
1,,1,0 ,][][
• DFT/IDFT can be implemented by using the same hardware• It requires N2 complex multiplications and N(N-1) complex additions
N complex multiplicationsN-1 complex additions
![Page 12: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/12.jpg)
More About DFT• Properties of Discrete Fourier
Transform• Linear Convolution and Discrete
Fourier Transform
![Page 13: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/13.jpg)
Periodic Sequence• Consider a periodic sequence of period N• The sequence can be represented by Fourier
series
• The Fourier series for any discrete-time signal with period N requires only N harmonically related complex exponentials.
][~ nx
k
knNjekXN
nx /2][~1][~
][][ /2/2 neeene kknNjnlNkNj
lNk
1
0
/2][~1][~ N
k
knNjekXN
nx
![Page 14: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/14.jpg)
Apply the Orthogonality property, we have
Interchange the order of summation
The coefficients are also periodic with period N
![Page 15: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/15.jpg)
DFS Representation of a Periodic Sequence
Synthesis equation Analysis equation
NnxkX period of sequence periodic are ~ and ~
![Page 18: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/18.jpg)
Sampling the Fourier Transform
N2
unit circle
Then
or
The sampling sequence is periodic with period N
Suppose exists
Since
![Page 20: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/20.jpg)
Aliasing Problem 2• If x[n] is finite-length sequence, 0nM-1• Consider the case NM
][][~ nxnx
][~ nx
![Page 22: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/22.jpg)
Circular Shift of a Sequence
][]2[~
]2[~
][~
][
nRnx
nx
nx
nx
N
N=15
A rotation ofthe cylinder
![Page 23: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/23.jpg)
Circular Shift of a Sequence
][]13[~
]13[~
][~
][
15 nRnx
nx
nx
nx
N=15
A rotation ofthe cylinder
![Page 24: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/24.jpg)
Review of Convolution
• Given two sequences:– Data sequence xi, 0 ≤ i≤ N-1, of length N– Filter sequence hi, 0 ≤ i≤ L-1, of length L
• Linear convolution
• Direct computation, for example 2-by-2 convolution2,,1,0 , NLixhhxy iiiii
NL multiplications
hx sL-point sequence N-point
sequence
(L+N-1)-point sequence
1
0
1
01
0
2
1
0
0
0
xx
hhh
h
sss require 4 multiplications
and 1 addition
![Page 29: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/29.jpg)
Circular Convolution Definition• Suppose two finite-length duration sequences:
x1[n] and x2[n] of length N
x3[n] is also a finite-length duration sequences of length N
![Page 30: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/30.jpg)
Computation for Circular Convolution
1. To period the two sequence with period N (large enough)
2. To compute the periodic convolution of the two periodic sequences
3. To get out the duration sequence between [0, N-1]
![Page 32: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/32.jpg)
Circular Convolution Property• Usually, we use the following notation to
represent the circular convolution of length N
• Circular convolution property
![Page 33: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/33.jpg)
Circular Convolution Implementation
• Direct Implementation
hx sN-point sequence N-point
sequence
N-point sequence
44 cyclic convolution
16 multiplications12 additions
Circular Convolution
~ O(N2)
![Page 34: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/34.jpg)
Using Circular Convolution to Implement Linear Convolution
• Consider two sequences x1[n] of length L and x2[n] of length P, respectively
• The linear convolution x3=x1[n] x2[n]
• Choose N, such that NL+P-1, then
a sequence of length L+P-1The same concept related to Winogrand Algorithm
![Page 36: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/36.jpg)
Circular Convolution with N=L+P-1
Time aliasing in the circular convolution of two finite-length sequence can be avoided if N L+P-1
![Page 37: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/37.jpg)
Concluding Remarks• The convolution of two finite-length sequences can be
interpreted by circular convolution with large enough length• Circular convolution can be implemented by DFT/FFT
• However, in real applications….– For an FIR system, the input sequence is of indefinite duration– To store the entire input signal requires ?
• A large delay in processing• An indefinite memory
– Block convolution
![Page 38: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/38.jpg)
Block Convolution• Step1: To segment a sequence into
sections of length L• Step2: Each section is convolved with the
finite-length impulse response of length P by using DFT/FFT of length N=L+P-1
• Step3: The filtered sections are fitted together in an appropriate way
• Overlap-add method• Overlap-save method
![Page 40: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/40.jpg)
Step2&
Step3
Time shift
][ ][][][][ N nhnxnhnxny rrr with L+P-1 length
Time shift
![Page 41: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/41.jpg)
Fast Convolution with the FFT• Given two sequences x1 and x2 of length N1 and N2
respectively– Direct implementation requires N1N2 complex
multiplications• Consider using FFT to convolve two sequences:
– Pick N, a power of 2, such that N≥N1+N2-1– Zero-pad x1 and x2 to length N– Compute N-point FFTs of zero-padded x1 and x2, one
obtains X1 and X2– Multiply X1 and X2– Apply the IFFT to obtain the convolution sum of x1 and
x2– Computation complexity: 2(N/2) log2N + N + (N/2)log2N
![Page 42: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/42.jpg)
Example• A sequence x[n] of length 1024• FIR filter h[n] of length 34
• Direct computation: 341024=34816• Using radix-2 FFT: 35840 (N=2048)• Using overlap-add radix-2 FFT:
– x[n] is segmented into a set of contiguous blocks of equal length 95
– Apply radix-2 FFT of length 128– Each segment requires 1472 multiplications– This algorithm requires total 16192 multiplications
![Page 43: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/43.jpg)
Discrete Fourier Transform• Discrete Fourier transform (DFT) pairs
knN
jknN
N
k
knN
N
n
knN
eW
NnWkXN
nx
NkWnxkX
2
1
0
1
0
where
,1,,1,0 ,][1][
1,,1,0 ,][][
• DFT/IDFT can be implemented by using the same hardware• It requires N2 complex multiplications and N(N-1) complex additions
N complex multiplicationsN-1 complex additions
2/N
![Page 44: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/44.jpg)
Decimation in Time
N+2(N/2)2 complex multiplications vs. N2 complex multiplication
twiddle factor
n2ℓ
2ℓ+1
![Page 45: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/45.jpg)
![Page 46: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/46.jpg)
Flow Graph of the DIT FFT
![Page 47: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/47.jpg)
![Page 48: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/48.jpg)
8-point DIT DFT
![Page 49: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/49.jpg)
Remarks• It requires v=log2N stages. Each stage has N/2 butterfly
operation (radix-2 DIT FFT), which requires 2 complex multiplications and 2 complex additions
• Each stage has N complex multiplications and N complex additions
• The number of complex multiplications (as well as additions) is equal to N log2N
• By symmetry property, we have (butterfly operation)222 N
Njr
NN
Nr
NNr
N WeWWWW
2 complex multiplications2 complex additions
1 complex multiplications2 complex additions
![Page 50: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/50.jpg)
8-point FFT
Normal orderBit-Reversed order
![Page 51: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/51.jpg)
In-Place Computation
Stage 1
X0[000]
X0[001]
X0[010]
X0[011]
X0[100]
X0[101]
X0[110]
X0[111]
X1[000]
X1[001]
X1[010]
X1[011]
X1[100]
X1[101]
X1[110]
X1[111]
X2[000]
X2[001]
X2[010]
X2[011]
X2[100]
X2[101]
X2[110]
X2[111]
Stage 3Stage 2
X3[000]
X3[001]
X3[010]
X3[011]
X3[100]
X3[101]
X3[110]
X3[111]
The same register array can be used in each stage
![Page 52: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/52.jpg)
8-point FFT
Normal order Bit-reversed order
![Page 53: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/53.jpg)
Normal-Order Sorting v.s. Bit-Reversed Sorting
Normal Order Bit-reversed Order
even
odd
top
bottom
![Page 54: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/54.jpg)
DFT v.s. Radix-2 FFT• DFT: N2 complex multiplications and N(N-1)
complex additions• Recall that each butterfly operation requires one
complex multiplication and two complex additions• FFT: (N/2) log2N multiplications and N log2N
complex additions
• In-place computations: the input and the output nodes for each butterfly operation are horizontally adjacent only one storage arrays will be required
![Page 55: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/55.jpg)
Decimation in Frequency (DIF)• Recall that the DFT is
• DIT FFT algorithm is based on the decomposition of the DFT computations by forming small subsequences in time domain index “n”: n=2ℓ or n=2ℓ+1
• One can consider dividing the output sequence X[k], in frequency domain, into smaller subsequences: k=2r or k=2r+1:
10 ,][1
0
NkWnxkXN
n
nkN
Substitution of variables
![Page 56: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/56.jpg)
DIF FFT Algorithm (1)
is just N/2-point DFT. Similarly,
![Page 57: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/57.jpg)
DIF FFT Algorithm (2)
v=log2N stages, each stage has N/2 butterfly operation.
(N/2)log2N complex multiplications, N complex additions
![Page 58: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/58.jpg)
Remarks• The basic butterfly operations for DIT FFT and DIF FFT
respectively are transposed-form pair.
• The I/O values of DIT FFT and DIF FFT are the same• Applying the transpose transform to each DIT FFT
algorithm, one obtains DIF FFT algorithm
DIF BF unitDIT BF unit
![Page 59: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/59.jpg)
Fast Convolution with the FFT• Given two sequences x1 and x2 of length N1 and N2
respectively– Direct implementation requires N1N2 complex
multiplications• Consider using FFT to convolve two sequences:
– Pick N, a power of 2, such that N≥N1+N2-1– Zero-pad x1 and x2 to length N– Compute N-point FFTs of zero-padded x1 and x2, then we
obtain X1 and X2– Multiply X1 and X2– Apply the IFFT to obtain the convolution sum of x1 and
x2– Computation complexity: 2(N/2) log2N + N + (N/2)log2N
![Page 60: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/60.jpg)
Implementation Issues• Radix-2, Radix-4, Radix-8, Split-Radix,Radix-22, …, • I/O Indexing• In-place computation
– Bit-reversed sorting is necessary– Efficient use of memory– Random access (not sequential) of memory. An address
generator unit is required.– Good for cascade form: FFT followed by IFFT (or vice
versa)• E.g. fast convolution algorithm
• Twiddle factors– Look up table– CORDIC rotator
![Page 61: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/61.jpg)
FIR Filters
2
1
0
1
01
01
0
3
2
1
0
000
000
xxx
hhh
hhh
yyyy Transform-domainTime-domain
![Page 62: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/62.jpg)
Example: Linear Phase FIRLinear phase FIR filter: with approximately constant frequency-response magnitude and linear phase (constant group delay) in pass-band
N-tap
N multipliersN-1 adders
(N+1)/2 multipliersN-1 adders, if odd N
N/2 multipliersN-1 adders, if even N
By exploiting substructure sharing to reduce area
![Page 63: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/63.jpg)
An Efficient Decomposition• Example: 2-fold decomposition
• Example 3-fold decomposition
• General case (N-fold decomposition)
)(
421
)(
642
654321
21
20
)]5[]3[]1[()]6[]4[]2[]0[( ]6[]5[]4[]3[]2[]1[]0[)(
zHzH
zhzhhzzhzhzhhzhzhzhzhzhzhhzH
)(
32
)(
31
)(
63
654321
32
31
30
)]5[]2[()]4[]1[()]6[]3[]0[( ]6[]5[]4[]3[]2[]1[]0[)(
zHzHzH
zhhzzhhzzhzhhzhzhzhzhzhzhhzH
k
kl
N
l
Nl
l
k
k zlNkhzHzHzzkhzH ][)( where,)(][)(1
0
![Page 64: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/64.jpg)
Traditional Parallel Architecture• 2-fold parallel architecture
4(N/2) multiplicationsN/2-tap 4(N/2-1)+2 additions
![Page 65: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/65.jpg)
Traditional Parallel FIR
L-parallel FIR filter of length N/L requires 1. L2 (N/L) multiplications, i.e. LN multiplications2. L2 (N/L -1) +L(L-1) additions, i.e. L(N-1) additions
~ LN multiply-add operations
![Page 66: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/66.jpg)
![Page 67: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/67.jpg)
Fast FIR Algorithm (FFA)• First by applying L-fold polyphase
decomposition for H(z)– There are L filters of length N/L
• By applying Winograd algorithm– 2 polynomials of degree (N/L)-1 can be
implemented by using 2 (N/L)-1 product terms.– Each product terms are equivalent to filtering
operations in the block formulation– Consequently, it can be realized using
approximately L FIR filters of length N/L It requires 2N-L multiplications
![Page 69: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/69.jpg)
VSP Lecture4 - Fast Algorithms ([email protected])
Traditional Parallel Architecture• 2-fold parallel architecture
4(N/2) multiplicationsN/2-tap 4(N/2-1)+2 additions
69
![Page 71: Implementation of DSP IC](https://reader030.fdocuments.us/reader030/viewer/2022041219/6251314668fa150f57573afb/html5/thumbnails/71.jpg)
VSP Lecture4 - Fast Algorithms ([email protected])
2-Parallel FFA• It requires 3 distinct sub-filters of length N/2
and 4 pre/post-processing additions. • Totally, it requires 1.5N multiplications and
3(N/2 -1)+4=1.5N +1 additions
71