Implementation of DSP IC

VSP Lecture4 - Fast Algorithms ([email protected])

Implementation of DSP IC

Implementation of DSP IC

Lecture 4 Fast Algorithms for Digital Signal Processing

1


Algorithm Strength Reduction• Strength reduction leads to a reduction in

hardware complexity by exploiting substructure sharing and leads to less silicon area or power consumption in a VLSI ASIC or iteration period in a programmable DSP implementation

• Strength reduction enables design of parallel FIR filters with a less-than-linear increase in hardware

2

Algorithm Strength Reduction• Motivation

– The number of strong operations, such as multiplications, is reduced possibly at the expense of an increase in the number of weaker operations, such as additions.

• Reduce computation complexity• Example: Complex multiplication

– (a+jb)(c+jd)=e+jf, a,b,c,d,e,f R– The direct implementation requires 4 multiplications and 2

additions

– However, the number of multiplication can be reduced to 3 at the expense of 3 extra additions by using the identities

ba

cddc

fe

)()()()(

baddcbbcadbaddcabdac

3 multiplications

5 additions


Complex Multiplication

Reduce the number of strong operation (less switched capacitance), however, increase the critical path

Speed?, Area?, Power? ….

4

[email protected] 5

Review of Discrete Fourier Transform

[email protected] 6

4 Forms of Fourier Analysis

“Sampled” frequency

[email protected] 7

Continuous-Time and Continuous-Frequency

ContinuousAperiodic

ContinuousAperiodic

[email protected] 8

Continuous-Time and Discrete-Frequency

Fourier series of periodic continuous signals

PeriodicContinuous

Discrete Aperiodic

[email protected] 9

Discrete-Time and Continuous-Frequency

Fourier transform of aperiodic discrete signals

DiscreteAperiodic Continuous

Periodic

[email protected] 10

Discrete Fourier Transform

• DFT is identical to samples of Fourier transforms• In DSP applications, we are able to store only a finite number of samples• we are able to compute the spectrum only at specific discrete values of


Discrete Fourier Transform• Discrete Fourier transform (DFT) pairs

knN

jknN

N

k

knN

N

n

knN

eW

NnWkXN

nx

NkWnxkX

2

1

0

1

0

where

,1,,1,0 ,][1][

1,,1,0 ,][][

• DFT/IDFT can be implemented by using the same hardware• It requires N2 complex multiplications and N(N-1) complex additions

N complex multiplicationsN-1 complex additions


More About DFT• Properties of Discrete Fourier

Transform• Linear Convolution and Discrete

Fourier Transform


Periodic Sequence• Consider a periodic sequence of period N• The sequence can be represented by Fourier

series

• The Fourier series for any discrete-time signal with period N requires only N harmonically related complex exponentials.

][~ nx

k

knNjekXN

nx /2][~1][~

][][ /2/2 neeene kknNjnlNkNj

lNk

1

0

/2][~1][~ N

k

knNjekXN

nx


Apply the Orthogonality property, we have

Interchange the order of summation

The coefficients are also periodic with period N


DFS Representation of a Periodic Sequence

Synthesis equation Analysis equation

NnxkX period of sequence periodic are ~ and ~


Physical Significance

Let

One period

Then


vs][~ kX )( jeX

Example


Sampling the Fourier Transform

N2

unit circle

Then

or

The sampling sequence is periodic with period N

Suppose exists

Since


Aliasing Problem 1• x[n] is infinite-length sequence

][~ nx

x


Aliasing Problem 2• If x[n] is finite-length sequence, 0nM-1• Consider the case NM

][][~ nxnx

][~ nx


Concluding Remarks

][~ nx

The case NM

or


Circular Shift of a Sequence

][]2[~

]2[~

][~

][

nRnx

nx

nx

nx

N

N=15

A rotation ofthe cylinder


Circular Shift of a Sequence

][]13[~

]13[~

][~

][

15 nRnx

nx

nx

nx

N=15

A rotation ofthe cylinder


Review of Convolution

• Given two sequences:– Data sequence xi, 0 ≤ i≤ N-1, of length N– Filter sequence hi, 0 ≤ i≤ L-1, of length L

• Linear convolution

• Direct computation, for example 2-by-2 convolution2,,1,0 , NLixhhxy iiiii

NL multiplications

hx sL-point sequence N-point

sequence

(L+N-1)-point sequence

1

0

1

01

0

2

1

0

0

0

xx

hhh

h

sss require 4 multiplications

and 1 addition


Linear Convolution

Linear Shift

Linear Shift

Linear Shift


Linear Shift vs Circular Shift

Conventional shift(linear shift)


Circular Shift Example


Periodic/Circular Convolution

Circular Shift


Circular Convolution Definition• Suppose two finite-length duration sequences:

x1[n] and x2[n] of length N

x3[n] is also a finite-length duration sequences of length N


Computation for Circular Convolution

1. To period the two sequence with period N (large enough)

2. To compute the periodic convolution of the two periodic sequences

3. To get out the duration sequence between [0, N-1]


Example

Step 1

Step 2

Step 3


Circular Convolution Property• Usually, we use the following notation to

represent the circular convolution of length N

• Circular convolution property


Circular Convolution Implementation

• Direct Implementation

hx sN-point sequence N-point

sequence

N-point sequence

44 cyclic convolution

16 multiplications12 additions

Circular Convolution

~ O(N2)


Using Circular Convolution to Implement Linear Convolution

• Consider two sequences x1[n] of length L and x2[n] of length P, respectively

• The linear convolution x3=x1[n] x2[n]

• Choose N, such that NL+P-1, then

a sequence of length L+P-1The same concept related to Winogrand Algorithm


Linear Convolution


Circular Convolution with N=L+P-1

Time aliasing in the circular convolution of two finite-length sequence can be avoided if N L+P-1


Concluding Remarks• The convolution of two finite-length sequences can be

interpreted by circular convolution with large enough length• Circular convolution can be implemented by DFT/FFT

• However, in real applications….– For an FIR system, the input sequence is of indefinite duration– To store the entire input signal requires ?

• A large delay in processing• An indefinite memory

– Block convolution


Block Convolution• Step1: To segment a sequence into

sections of length L• Step2: Each section is convolved with the

finite-length impulse response of length P by using DFT/FFT of length N=L+P-1

• Step3: The filtered sections are fitted together in an appropriate way

• Overlap-add method• Overlap-save method


Overlap-Add Methodhx y

x[n]

h[n]

Step1 Zero padding

Zero padding

Zero padding


Step2&

Step3

Time shift

][ ][][][][ N nhnxnhnxny rrr with L+P-1 length

Time shift


Fast Convolution with the FFT• Given two sequences x1 and x2 of length N1 and N2

respectively– Direct implementation requires N1N2 complex

multiplications• Consider using FFT to convolve two sequences:

– Pick N, a power of 2, such that N≥N1+N2-1– Zero-pad x1 and x2 to length N– Compute N-point FFTs of zero-padded x1 and x2, one

obtains X1 and X2– Multiply X1 and X2– Apply the IFFT to obtain the convolution sum of x1 and

x2– Computation complexity: 2(N/2) log2N + N + (N/2)log2N


Example• A sequence x[n] of length 1024• FIR filter h[n] of length 34

• Direct computation: 341024=34816• Using radix-2 FFT: 35840 (N=2048)• Using overlap-add radix-2 FFT:

– x[n] is segmented into a set of contiguous blocks of equal length 95

– Apply radix-2 FFT of length 128– Each segment requires 1472 multiplications– This algorithm requires total 16192 multiplications

Discrete Fourier Transform• Discrete Fourier transform (DFT) pairs

knN

jknN

N

k

knN

N

n

knN

eW

NnWkXN

nx

NkWnxkX

2

1

0

1

0

where

,1,,1,0 ,][1][

1,,1,0 ,][][

• DFT/IDFT can be implemented by using the same hardware• It requires N2 complex multiplications and N(N-1) complex additions

N complex multiplicationsN-1 complex additions

2/N

Decimation in Time

N+2(N/2)2 complex multiplications vs. N2 complex multiplication

twiddle factor

n2ℓ

2ℓ+1

Flow Graph of the DIT FFT

8-point DIT DFT

Remarks• It requires v=log2N stages. Each stage has N/2 butterfly

operation (radix-2 DIT FFT), which requires 2 complex multiplications and 2 complex additions

• Each stage has N complex multiplications and N complex additions

• The number of complex multiplications (as well as additions) is equal to N log2N

• By symmetry property, we have (butterfly operation)222 N

Njr

NN

Nr

NNr

N WeWWWW

2 complex multiplications2 complex additions

1 complex multiplications2 complex additions

8-point FFT

Normal orderBit-Reversed order

In-Place Computation

Stage 1

X0[000]

X0[001]

X0[010]

X0[011]

X0[100]

X0[101]

X0[110]

X0[111]

X1[000]

X1[001]

X1[010]

X1[011]

X1[100]

X1[101]

X1[110]

X1[111]

X2[000]

X2[001]

X2[010]

X2[011]

X2[100]

X2[101]

X2[110]

X2[111]

Stage 3Stage 2

X3[000]

X3[001]

X3[010]

X3[011]

X3[100]

X3[101]

X3[110]

X3[111]

The same register array can be used in each stage

8-point FFT

Normal order Bit-reversed order

Normal-Order Sorting v.s. Bit-Reversed Sorting

Normal Order Bit-reversed Order

even

odd

top

bottom

DFT v.s. Radix-2 FFT• DFT: N2 complex multiplications and N(N-1)

complex additions• Recall that each butterfly operation requires one

complex multiplication and two complex additions• FFT: (N/2) log2N multiplications and N log2N

complex additions

• In-place computations: the input and the output nodes for each butterfly operation are horizontally adjacent only one storage arrays will be required

Decimation in Frequency (DIF)• Recall that the DFT is

• DIT FFT algorithm is based on the decomposition of the DFT computations by forming small subsequences in time domain index “n”: n=2ℓ or n=2ℓ+1

• One can consider dividing the output sequence X[k], in frequency domain, into smaller subsequences: k=2r or k=2r+1:

10 ,][1

0

NkWnxkXN

n

nkN

Substitution of variables

DIF FFT Algorithm (1)

is just N/2-point DFT. Similarly,

DIF FFT Algorithm (2)

v=log2N stages, each stage has N/2 butterfly operation.

(N/2)log2N complex multiplications, N complex additions

Remarks• The basic butterfly operations for DIT FFT and DIF FFT

respectively are transposed-form pair.

• The I/O values of DIT FFT and DIF FFT are the same• Applying the transpose transform to each DIT FFT

algorithm, one obtains DIF FFT algorithm

DIF BF unitDIT BF unit

Fast Convolution with the FFT• Given two sequences x1 and x2 of length N1 and N2

respectively– Direct implementation requires N1N2 complex

multiplications• Consider using FFT to convolve two sequences:

– Pick N, a power of 2, such that N≥N1+N2-1– Zero-pad x1 and x2 to length N– Compute N-point FFTs of zero-padded x1 and x2, then we

obtain X1 and X2– Multiply X1 and X2– Apply the IFFT to obtain the convolution sum of x1 and

x2– Computation complexity: 2(N/2) log2N + N + (N/2)log2N

Implementation Issues• Radix-2, Radix-4, Radix-8, Split-Radix,Radix-22, …, • I/O Indexing• In-place computation

– Bit-reversed sorting is necessary– Efficient use of memory– Random access (not sequential) of memory. An address

generator unit is required.– Good for cascade form: FFT followed by IFFT (or vice

versa)• E.g. fast convolution algorithm

• Twiddle factors– Look up table– CORDIC rotator

FIR Filters

2

1

0

1

01

01

0

3

2

1

0

000

000

xxx

hhh

hhh

yyyy Transform-domainTime-domain

Example: Linear Phase FIRLinear phase FIR filter: with approximately constant frequency-response magnitude and linear phase (constant group delay) in pass-band

N-tap

N multipliersN-1 adders

(N+1)/2 multipliersN-1 adders, if odd N

N/2 multipliersN-1 adders, if even N

By exploiting substructure sharing to reduce area

An Efficient Decomposition• Example: 2-fold decomposition

• Example 3-fold decomposition

• General case (N-fold decomposition)

)(

421

)(

642

654321

21

20

)]5[]3[]1[()]6[]4[]2[]0[( ]6[]5[]4[]3[]2[]1[]0[)(

zHzH

zhzhhzzhzhzhhzhzhzhzhzhzhhzH

)(

32

)(

31

)(

63

654321

32

31

30

)]5[]2[()]4[]1[()]6[]3[]0[( ]6[]5[]4[]3[]2[]1[]0[)(

zHzHzH

zhhzzhhzzhzhhzhzhzhzhzhzhhzH

k

kl

N

l

Nl

l

k

k zlNkhzHzHzzkhzH ][)( where,)(][)(1

0

Traditional Parallel Architecture• 2-fold parallel architecture

4(N/2) multiplicationsN/2-tap 4(N/2-1)+2 additions

Traditional Parallel FIR

L-parallel FIR filter of length N/L requires 1. L2 (N/L) multiplications, i.e. LN multiplications2. L2 (N/L -1) +L(L-1) additions, i.e. L(N-1) additions

~ LN multiply-add operations

Fast FIR Algorithm (FFA)• First by applying L-fold polyphase

decomposition for H(z)– There are L filters of length N/L

• By applying Winograd algorithm– 2 polynomials of degree (N/L)-1 can be

implemented by using 2 (N/L)-1 product terms.– Each product terms are equivalent to filtering

operations in the block formulation– Consequently, it can be realized using

approximately L FIR filters of length N/L It requires 2N-L multiplications


FIR Using Polyphase Decomposition

68


Traditional Parallel Architecture• 2-fold parallel architecture

4(N/2) multiplicationsN/2-tap 4(N/2-1)+2 additions

69


2-Parallel Fast FIR Filter

70


2-Parallel FFA• It requires 3 distinct sub-filters of length N/2

and 4 pre/post-processing additions. • Totally, it requires 1.5N multiplications and

3(N/2 -1)+4=1.5N +1 additions

71

Implementation of DSP IC

Documents

Transcript of Implementation of DSP IC