CHAPTER 5 Software Implementation of FFT Using the SC3850...

CHAPTER 5

Software Implementation of FFT Using the SC3850 Core


Fast Fourier Transform (FFT)

• The Discrete Fourier Transform (DFT) is defined by:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j2\pi/N}$$

• Theoretical arithmetic complexity:
– $N^2$ complex multiplications and
– $N(N-1)$ complex additions.

• Real-number computation:
– $4N^2$ real multiplications and
– $4N^2 - 2N$ real additions.
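The definition can be evaluated directly as a baseline. A minimal Python sketch follows (the helper names `dft` and `dft_op_counts` are illustrative, not part of any SC3850 library); it costs O(N²) operations and also tabulates the counts listed above.

```python
import cmath


def dft(x):
    """Direct DFT: X(k) = sum_n x(n) * W_N^(nk), with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


def dft_op_counts(N):
    """Operation counts of the direct DFT, as listed in the complexity bullets."""
    return {
        "complex_mults": N * N,
        "complex_adds": N * (N - 1),
        "real_mults": 4 * N * N,         # each complex multiply = 4 real multiplies
        "real_adds": 4 * N * N - 2 * N,  # 2 per complex multiply + 2 per complex add
    }


# The DFT of a unit impulse is flat: every X(k) equals 1.
print([round(abs(v), 6) for v in dft([1, 0, 0, 0])])  # [1.0, 1.0, 1.0, 1.0]
print(dft_op_counts(16)["real_adds"])                 # 992
```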



• Fast algorithms compute the DFT with a much smaller number of operations when the length is a power of a radix R:

$$N = R^M$$

• These are called radix-R algorithms.

• Radix-2 FFT reduces the complexity to:
– $(N/2)\log_2 N$ complex multiplications and
– $N\log_2 N$ complex additions.

• Radix-4 FFT needs only 75% as many complex multiplications as radix-2:
– $(3N/4)\log_4 N = (3N/8)\log_2 N$ complex multiplications and
– $8(N/4)\log_4 N = N\log_2 N$ complex additions.
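To make the comparison concrete, the sketch below evaluates the radix-2 and radix-4 formulas for a few power-of-4 lengths. The helper names are hypothetical, and, like the formulas, the counts include trivial twiddle multiplications by 1.

```python
def log2_exact(N):
    """Return log2(N) for a power of two N, else raise."""
    if N < 1 or N & (N - 1):
        raise ValueError("N must be a power of 2")
    return N.bit_length() - 1


def radix2_counts(N):
    """(complex mults, complex adds) = ((N/2) log2 N, N log2 N)."""
    m = log2_exact(N)
    return (N // 2) * m, N * m


def radix4_counts(N):
    """(complex mults, complex adds) = ((3N/4) log4 N, 8(N/4) log4 N)."""
    m2 = log2_exact(N)
    if m2 % 2:
        raise ValueError("N must be a power of 4")
    m = m2 // 2  # number of radix-4 stages
    return (3 * N // 4) * m, 8 * (N // 4) * m


for N in (16, 64, 256, 1024):
    r2, r4 = radix2_counts(N), radix4_counts(N)
    print(N, r2, r4, r4[0] / r2[0])
# 16 (32, 64) (24, 64) 0.75
# 64 (192, 384) (144, 384) 0.75
# 256 (1024, 2048) (768, 2048) 0.75
# 1024 (5120, 10240) (3840, 10240) 0.75
```

The addition counts are identical for both radices; only the multiplication count drops, to 3/4 of the radix-2 value.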

Transmitter and Receiver Structure of SC-FDMA and OFDMA Systems


• DFT:

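$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j2\pi/N}$$

• Twiddle-factor properties:

– Periodicity property: $W_N^{k+N} = W_N^{k}\,W_N^{N} = W_N^{k}$, since $W_N^{N} = e^{-j2\pi} = 1$.

– Symmetry property: $W_N^{k+N/2} = W_N^{k}\,W_N^{N/2} = -W_N^{k}$, since $W_N^{N/2} = e^{-j\pi} = -1$.

– Base change: $W_N^{nk} = e^{-j2\pi nk/N} = W_{N/n}^{k}$ (for $n$ dividing $N$).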
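The three properties can be spot-checked numerically. This is a minimal floating-point sketch; `W` and `close` are illustrative helpers, and the tolerance only absorbs rounding error.

```python
import cmath


def W(N, e):
    """Twiddle factor W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)


def close(a, b, tol=1e-9):
    return abs(a - b) < tol


N = 16
for k in range(N):
    assert close(W(N, k + N), W(N, k))        # periodicity
    assert close(W(N, k + N // 2), -W(N, k))  # symmetry
    assert close(W(N, 2 * k), W(N // 2, k))   # base change, n = 2
    assert close(W(N, 4 * k), W(N // 4, k))   # base change, n = 4
print("twiddle identities hold for N =", N)
```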

Radix-4 FFT

$$N = 4^M$$

• Decimation-In-Time (DIT) algorithm:

$$n = 4n_1 + n_2, \qquad k = k_1 + \tfrac{N}{4}k_2, \qquad n_1, k_1 = 0, 1, \ldots, \tfrac{N}{4} - 1; \quad n_2, k_2 = 0, 1, 2, 3$$

• Decimation-In-Frequency (DIF) algorithm:

$$n = n_1 + \tfrac{N}{4}n_2, \qquad k = 4k_1 + k_2, \qquad n_1, k_1 = 0, 1, \ldots, \tfrac{N}{4} - 1; \quad n_2, k_2 = 0, 1, 2, 3$$
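A short sketch, assuming N is a power of 4, confirms that both index maps cover every input and output index exactly once, and that the cross term dropped in the DIT derivation is a multiple of N. The function names are illustrative.

```python
def dit_maps(N):
    """Radix-4 DIT: n = 4*n1 + n2, k = k1 + (N/4)*k2."""
    Q = N // 4
    n = [4 * n1 + n2 for n1 in range(Q) for n2 in range(4)]
    k = [k1 + Q * k2 for k1 in range(Q) for k2 in range(4)]
    return n, k


def dif_maps(N):
    """Radix-4 DIF: n = n1 + (N/4)*n2, k = 4*k1 + k2."""
    Q = N // 4
    n = [n1 + Q * n2 for n1 in range(Q) for n2 in range(4)]
    k = [4 * k1 + k2 for k1 in range(Q) for k2 in range(4)]
    return n, k


N = 16
Q = N // 4
# Each map visits every index 0..N-1 exactly once.
for idx in (*dit_maps(N), *dif_maps(N)):
    assert sorted(idx) == list(range(N))

# In the DIT exponent, the cross term 4*n1*(N/4)*k2 = N*n1*k2 is a multiple
# of N, so its twiddle factor is 1 and drops out of the derivation.
for n1 in range(Q):
    for n2 in range(4):
        for k1 in range(Q):
            for k2 in range(4):
                full = (4 * n1 + n2) * (k1 + Q * k2)
                assert full % N == (4 * n1 * k1 + n2 * k1 + Q * n2 * k2) % N
```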

Radix-4 DIT FFT

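$$
\begin{aligned}
X\!\left(k_1 + \tfrac{N}{4}k_2\right)
&= \sum_{n_2=0}^{3} \sum_{n_1=0}^{N/4-1} x(4n_1 + n_2)\, W_N^{(4n_1 + n_2)\left(k_1 + \frac{N}{4}k_2\right)} \\
&= \sum_{n_2=0}^{3} \left[ \sum_{n_1=0}^{N/4-1} x(4n_1 + n_2)\, W_{N/4}^{n_1 k_1} \right] W_N^{n_2 k_1}\, W_4^{n_2 k_2}
\end{aligned}
$$

because $W_N^{4n_1k_1} = W_{N/4}^{n_1k_1}$, $W_N^{Nn_1k_2} = 1$ and $W_N^{\frac{N}{4}n_2k_2} = W_4^{n_2k_2}$. Each bracketed sum is an N/4-point DFT, so for $k_2 = 0, 1, 2, 3$:

$$
\begin{aligned}
X\!\left(k_1 + \tfrac{N}{4}k_2\right)
&= \sum_{n_1=0}^{N/4-1} x(4n_1)\, W_{N/4}^{n_1 k_1}
 + W_N^{k_1} W_4^{k_2} \sum_{n_1=0}^{N/4-1} x(4n_1 + 1)\, W_{N/4}^{n_1 k_1} \\
&\quad + W_N^{2k_1} W_4^{2k_2} \sum_{n_1=0}^{N/4-1} x(4n_1 + 2)\, W_{N/4}^{n_1 k_1}
 + W_N^{3k_1} W_4^{3k_2} \sum_{n_1=0}^{N/4-1} x(4n_1 + 3)\, W_{N/4}^{n_1 k_1}
\end{aligned}
$$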

Radix-4 DIT FFT

[Figure: radix-4 DIT decomposition. The subsequences x(4n1), x(4n1+1), x(4n1+2), x(4n1+3) each feed an N/4-point DFT (k1 = 0 to N/4 − 1); the outputs of the last three are multiplied by $W_N^{k_1}$, $W_N^{2k_1}$, $W_N^{3k_1}$, and a 4-point DFT over k2 = 0, 1, 2, 3 produces X(k1), X(k1+N/4), X(k1+N/2), X(k1+3N/4).]
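The decomposition above can be written as a recursive floating-point routine. This is a sketch for checking the math, assuming N is a power of 4; it is out-of-place and does not reflect the in-place, fixed-point SC3850 implementation. `fft_radix4_dit` is a hypothetical name.

```python
import cmath


def fft_radix4_dit(x):
    """Recursive radix-4 DIT FFT (floating point); len(x) must be a power of 4."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 4:
        raise ValueError("length must be a power of 4")
    Q = N // 4
    # Four N/4-point DFTs of the decimated inputs x(4*n1 + n2), n2 = 0..3
    sub = [fft_radix4_dit(x[n2::4]) for n2 in range(4)]
    X = [0j] * N
    for k1 in range(Q):
        # Twiddle sub-DFT n2 by W_N^(n2*k1)
        t = [sub[n2][k1] * cmath.exp(-2j * cmath.pi * n2 * k1 / N) for n2 in range(4)]
        # 4-point DFT over n2 with W_4 = -j gives X(k1 + (N/4)*k2), k2 = 0..3
        X[k1] = t[0] + t[1] + t[2] + t[3]
        X[k1 + Q] = t[0] - 1j * t[1] - t[2] + 1j * t[3]
        X[k1 + 2 * Q] = t[0] - t[1] + t[2] - t[3]
        X[k1 + 3 * Q] = t[0] + 1j * t[1] - t[2] - 1j * t[3]
    return X
```

For example, `fft_radix4_dit([1] + [0] * 15)` returns sixteen values equal to 1, the flat spectrum of an impulse.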

Radix-4 DIT Butterfly

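• Computational complexity:
– 3 complex multiplications
– 8 complex additions

• Real computations (each complex multiplication costs 4 real multiplications and 2 real additions; each complex addition costs 2 real additions):
– 12 real multiplications
– 22 real additions (3 × 2 + 8 × 2)

[Figure: radix-4 DIT butterfly. The inputs for n2 = 1, 2, 3 are multiplied by $W_N^{k_1}$, $W_N^{2k_1}$, $W_N^{3k_1}$; the outputs X(k1), X(k1+N/4), X(k1+N/2), X(k1+3N/4) (k2 = 0, 1, 2, 3) are formed with the coefficients ±1 and ±j.]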

Radix-4 DIT Butterfly

Here A, B, C, D are the outputs of the four N/4-point DFTs at index k1; B, C, D are multiplied by the twiddle factors Wb = $W_N^{k_1}$, Wc = $W_N^{2k_1}$, Wd = $W_N^{3k_1}$, and A′, B′, C′, D′ are X(k1), X(k1+N/4), X(k1+N/2), X(k1+3N/4).

Ar′ = Ar + (Cr × Wcr – Ci × Wci) + (Br × Wbr – Bi × Wbi) + (Dr × Wdr – Di × Wdi)

Ai′ = Ai + (Cr × Wci + Ci × Wcr) + (Br × Wbi + Bi × Wbr) + (Dr × Wdi + Di × Wdr)

Br′ = Ar – (Cr × Wcr – Ci × Wci) + (Br × Wbi + Bi × Wbr) – (Dr × Wdi + Di × Wdr)

Bi′ = Ai – (Cr × Wci + Ci × Wcr) – (Br × Wbr – Bi × Wbi) + (Dr × Wdr – Di × Wdi)

Cr′ = Ar + (Cr × Wcr – Ci × Wci) – (Br × Wbr – Bi × Wbi) – (Dr × Wdr – Di × Wdi)

Ci′ = Ai + (Cr × Wci + Ci × Wcr) – (Br × Wbi + Bi × Wbr) – (Dr × Wdi + Di × Wdr)

Dr′ = Ar – (Cr × Wcr – Ci × Wci) – (Br × Wbi + Bi × Wbr) + (Dr × Wdi + Di × Wdr)

Di′ = Ai – (Cr × Wci + Ci × Wcr) + (Br × Wbr – Bi × Wbi) – (Dr × Wdr – Di × Wdi)
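The equations above map directly onto real arithmetic. In this sketch (hypothetical function name, values as (re, im) tuples), the shared partial sums bring the cost to 12 real multiplications and 22 real additions, matching the butterfly counts given earlier.

```python
def dit_butterfly(A, B, C, D, Wb, Wc, Wd):
    """Radix-4 DIT butterfly in real arithmetic; all arguments are (re, im) tuples.

    B, C, D are twiddled by Wb, Wc, Wd (W_N^k1, W_N^2k1, W_N^3k1); the outputs are
    X(k1), X(k1 + N/4), X(k1 + N/2), X(k1 + 3N/4).
    """
    Ar, Ai = A
    # 3 complex multiplications = 12 real multiplications + 6 real additions
    br, bi = B[0] * Wb[0] - B[1] * Wb[1], B[0] * Wb[1] + B[1] * Wb[0]
    cr, ci = C[0] * Wc[0] - C[1] * Wc[1], C[0] * Wc[1] + C[1] * Wc[0]
    dr, di = D[0] * Wd[0] - D[1] * Wd[1], D[0] * Wd[1] + D[1] * Wd[0]
    # Shared partial sums: 8 real additions
    s0r, s0i = Ar + cr, Ai + ci  # A + C
    s1r, s1i = Ar - cr, Ai - ci  # A - C
    s2r, s2i = br + dr, bi + di  # B + D
    s3r, s3i = bi - di, br - dr  # -j(B - D) = s3r - j*s3i
    # Outputs: 8 more real additions (22 in total with the 6 above)
    return ((s0r + s2r, s0i + s2i),   # A + B + C + D
            (s1r + s3r, s1i - s3i),   # A - jB - C + jD
            (s0r - s2r, s0i - s2i),   # A - B + C - D
            (s1r - s3r, s1i + s3i))   # A + jB - C - jD
```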


Radix-4 DIF FFT

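$$
\begin{aligned}
X(4k_1 + k_2)
&= \sum_{n_1=0}^{N/4-1} \sum_{n_2=0}^{3} x\!\left(n_1 + \tfrac{N}{4}n_2\right) W_N^{\left(n_1 + \frac{N}{4}n_2\right)(4k_1 + k_2)} \\
&= \sum_{n_1=0}^{N/4-1} \left[ \sum_{n_2=0}^{3} x\!\left(n_1 + \tfrac{N}{4}n_2\right) W_4^{n_2 k_2} \right] W_N^{n_1 k_2}\, W_{N/4}^{n_1 k_1} \\
&= \sum_{n_1=0}^{N/4-1} \left[ x(n_1) + W_4^{k_2}\, x\!\left(n_1 + \tfrac{N}{4}\right) + W_4^{2k_2}\, x\!\left(n_1 + \tfrac{N}{2}\right) + W_4^{3k_2}\, x\!\left(n_1 + \tfrac{3N}{4}\right) \right] W_N^{n_1 k_2}\, W_{N/4}^{n_1 k_1}
\end{aligned}
$$

because $W_N^{\frac{N}{4}n_2 \cdot 4k_1} = W_N^{Nn_2k_1} = 1$, $W_N^{\frac{N}{4}n_2k_2} = W_4^{n_2k_2}$ and $W_N^{4n_1k_1} = W_{N/4}^{n_1k_1}$. Since $W_4 = e^{-j2\pi/4} = e^{-j\pi/2} = -j$, the factors are $W_4^{k_2} = (-j)^{k_2}$, $W_4^{2k_2} = (-1)^{k_2}$ and $W_4^{3k_2} = j^{k_2}$:

$$X(4k_1 + k_2) = \sum_{n_1=0}^{N/4-1} \left[ x(n_1) + (-j)^{k_2}\, x\!\left(n_1 + \tfrac{N}{4}\right) + (-1)^{k_2}\, x\!\left(n_1 + \tfrac{N}{2}\right) + j^{k_2}\, x\!\left(n_1 + \tfrac{3N}{4}\right) \right] W_N^{n_1 k_2}\, W_{N/4}^{n_1 k_1}$$

For example, for $k_2 = 1$:

$$X(4k_1 + 1) = \sum_{n_1=0}^{N/4-1} \left[ x(n_1) - j\,x\!\left(n_1 + \tfrac{N}{4}\right) - x\!\left(n_1 + \tfrac{N}{2}\right) + j\,x\!\left(n_1 + \tfrac{3N}{4}\right) \right] W_N^{n_1}\, W_{N/4}^{n_1 k_1}$$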
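The DIF derivation can be checked the same way: the bracketed 4-point sums are formed first, twiddled by $W_N^{n_1k_2}$, and then passed to N/4-point DFTs. This recursive floating-point sketch assumes N is a power of 4 and writes each result to its natural position X(4k1 + k2); `fft_radix4_dif` is an illustrative name.

```python
import cmath


def fft_radix4_dif(x):
    """Recursive radix-4 DIF FFT (floating point); len(x) must be a power of 4.

    Results are written back to natural order X(4*k1 + k2).
    """
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 4:
        raise ValueError("length must be a power of 4")
    Q = N // 4
    y = [[0j] * Q for _ in range(4)]
    for n1 in range(Q):
        a, b, c, d = x[n1], x[n1 + Q], x[n1 + 2 * Q], x[n1 + 3 * Q]
        # Bracketed term for k2 = 0..3: coefficients 1, (-j)^k2, (-1)^k2, j^k2
        bracket = (a + b + c + d,
                   a - 1j * b - c + 1j * d,
                   a - b + c - d,
                   a + 1j * b - c - 1j * d)
        for k2 in range(4):
            # Twiddle by W_N^(n1*k2) before the N/4-point DFTs
            y[k2][n1] = bracket[k2] * cmath.exp(-2j * cmath.pi * n1 * k2 / N)
    X = [0j] * N
    for k2 in range(4):
        # N/4-point DFT over n1 (kernel W_{N/4}^(n1*k1)) yields X(4*k1 + k2)
        for k1, v in enumerate(fft_radix4_dif(y[k2])):
            X[4 * k1 + k2] = v
    return X
```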

Radix-4 DIF FFT

Radix-4 DIF Butterfly

Here A, B, C, D are the inputs x(n1), x(n1+N/4), x(n1+N/2), x(n1+3N/4); the twiddle factors are Wb = $W_N^{n_1}$, Wc = $W_N^{2n_1}$, Wd = $W_N^{3n_1}$, and A′, B′, C′, D′ correspond to k2 = 0, 1, 2, 3.

Ar′ = Ar + Br + Cr + Dr = (Ar + Br) + (Cr + Dr)

Ai′ = Ai + Bi + Ci + Di = (Ai + Ci) + (Bi + Di)

Br′ = (Ar + Bi – Cr – Di) × Wbr – (Ai – Br – Ci + Dr) × Wbi = ((Ar – Cr) + (Bi – Di)) × Wbr – ((Ai – Ci) – (Br – Dr)) × Wbi

Bi′ = (Ai – Br – Ci + Dr) × Wbr + (Ar + Bi – Cr – Di) × Wbi = ((Ai – Ci) – (Br – Dr)) × Wbr + ((Ar – Cr) + (Bi – Di)) × Wbi

Cr′ = (Ar – Br + Cr – Dr) × Wcr – (Ai – Bi + Ci – Di) × Wci = ((Ar + Cr) – (Br + Dr)) × Wcr – ((Ai + Ci) – (Bi + Di)) × Wci

Ci′ = (Ai – Bi + Ci – Di) × Wcr + (Ar – Br + Cr – Dr) × Wci = ((Ai + Ci) – (Bi + Di)) × Wcr + ((Ar + Cr) – (Br + Dr)) × Wci

Dr′ = (Ar – Bi – Cr + Di) × Wdr – (Ai + Br – Ci – Dr) × Wdi = ((Ar – Cr) – (Bi – Di)) × Wdr – ((Ai – Ci) + (Br – Dr)) × Wdi

Di′ = (Ai + Br – Ci – Dr) × Wdr + (Ar – Bi – Cr + Di) × Wdi = ((Ai – Ci) + (Br – Dr)) × Wdr + ((Ar – Cr) – (Bi – Di)) × Wdi
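A real-arithmetic sketch of the DIF butterfly follows, reusing the grouped terms (Ar + Cr), (Ar – Cr), (Br + Dr), (Br – Dr) and their imaginary counterparts. The function name and the (re, im) tuple layout are assumptions for illustration.

```python
def dif_butterfly(A, B, C, D, Wb, Wc, Wd):
    """Radix-4 DIF butterfly in real arithmetic: add/subtract first, twiddle last.

    All arguments are (re, im) tuples; Wb, Wc, Wd = W_N^n1, W_N^2n1, W_N^3n1.
    """
    Ar, Ai = A
    Br, Bi = B
    Cr, Ci = C
    Dr, Di = D
    # Shared grouped terms (8 real additions)
    s_ac_r, s_ac_i = Ar + Cr, Ai + Ci
    d_ac_r, d_ac_i = Ar - Cr, Ai - Ci
    s_bd_r, s_bd_i = Br + Dr, Bi + Di
    d_bd_r, d_bd_i = Br - Dr, Bi - Di
    a_out = (s_ac_r + s_bd_r, s_ac_i + s_bd_i)  # k2 = 0, no twiddle
    # Pre-twiddle values for k2 = 1, 2, 3
    pb = (d_ac_r + d_bd_i, d_ac_i - d_bd_r)     # A - jB - C + jD
    pc = (s_ac_r - s_bd_r, s_ac_i - s_bd_i)     # A - B + C - D
    pd = (d_ac_r - d_bd_i, d_ac_i + d_bd_r)     # A + jB - C - jD

    def cmul(p, w):
        # Real part: p.r*w.r - p.i*w.i; imaginary part: p.i*w.r + p.r*w.i
        return p[0] * w[0] - p[1] * w[1], p[1] * w[0] + p[0] * w[1]

    return a_out, cmul(pb, Wb), cmul(pc, Wc), cmul(pd, Wd)
```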




16-point Radix-4 DIF FFT


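Digit-Reversed Order of a 16-Point Radix-4 FFT

| Index | Digit pattern (base 4) | Digit-reversed pattern | Digit-reversed index |
|---|---|---|---|
| 0 | 00 | 00 | 0 |
| 1 | 01 | 10 | 4 |
| 2 | 02 | 20 | 8 |
| 3 | 03 | 30 | 12 |
| 4 | 10 | 01 | 1 |
| 5 | 11 | 11 | 5 |
| 6 | 12 | 21 | 9 |
| 7 | 13 | 31 | 13 |
| 8 | 20 | 02 | 2 |
| 9 | 21 | 12 | 6 |
| 10 | 22 | 22 | 10 |
| 11 | 23 | 32 | 14 |
| 12 | 30 | 03 | 3 |
| 13 | 31 | 13 | 7 |
| 14 | 32 | 23 | 11 |
| 15 | 33 | 33 | 15 |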
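Digit reversal reverses the base-4 digits of the index. The sketch below computes it and, assuming N is a power of 4, pairs it with an iterative in-place radix-4 DIF FFT (hypothetical names, floating point) to show that the in-place algorithm leaves X(digit_reverse(i)) at position i.

```python
import cmath


def digit_reverse(index, num_digits, radix=4):
    """Reverse the base-`radix` digits of index (num_digits digits)."""
    result = 0
    for _ in range(num_digits):
        index, digit = divmod(index, radix)
        result = result * radix + digit
    return result


def fft_radix4_dif_inplace(x):
    """Iterative in-place radix-4 DIF FFT (floating point).

    The output is left in digit-reversed order: a[i] == X(digit_reverse(i)).
    """
    a = [complex(v) for v in x]
    N = len(a)
    if N < 1 or N & (N - 1) or (N.bit_length() - 1) % 2:
        raise ValueError("length must be a power of 4")
    span = N
    while span > 1:
        Q = span // 4
        for start in range(0, N, span):
            for n1 in range(Q):
                i0, i1, i2, i3 = (start + n1 + m * Q for m in range(4))
                A, B, C, D = a[i0], a[i1], a[i2], a[i3]
                w = cmath.exp(-2j * cmath.pi * n1 / span)  # W_span^n1
                a[i0] = A + B + C + D
                a[i1] = (A - 1j * B - C + 1j * D) * w
                a[i2] = (A - B + C - D) * w ** 2
                a[i3] = (A + 1j * B - C - 1j * D) * w ** 3
        span = Q
    return a


print([digit_reverse(i, 2) for i in range(16)])
# [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```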


Bit-Reversed Order

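| Index | Bit pattern | Bit-reversed pattern | Bit-reversed index |
|---|---|---|---|
| 0 | 0000 | 0000 | 0 |
| 1 | 0001 | 1000 | 8 |
| 2 | 0010 | 0100 | 4 |
| 3 | 0011 | 1100 | 12 |
| 4 | 0100 | 0010 | 2 |
| 5 | 0101 | 1010 | 10 |
| 6 | 0110 | 0110 | 6 |
| 7 | 0111 | 1110 | 14 |
| 8 | 1000 | 0001 | 1 |
| 9 | 1001 | 1001 | 9 |
| 10 | 1010 | 0101 | 5 |
| 11 | 1011 | 1101 | 13 |
| 12 | 1100 | 0011 | 3 |
| 13 | 1101 | 1011 | 11 |
| 14 | 1110 | 0111 | 7 |
| 15 | 1111 | 1111 | 15 |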
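Bit reversal is the radix-2 analogue: the binary digits of the index are reversed. A minimal sketch (illustrative name):

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index (radix-2 reordering)."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # move the current LSB into result
        index >>= 1
    return result


print([bit_reverse(i, 4) for i in range(16)])
# [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```

For N = 16 the two orders differ: index 1 maps to 4 in digit-reversed order but to 8 in bit-reversed order.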


Bit-Reversed Addressing


Two Registers in Bit-Reversed Addressing Mode

| Index | Digit-reversed index | Bit-reversed index with one address register | Bit-reversed index with two address registers (R0, R1) |
|---|---|---|---|
| 0 | 0 | 0 | 0 (r0) |
| 1 | 4 | 8 | 4 (r0) |
| 2 | 8 | 4 | 8 (r1) |
| 3 | 12 | 12 | 12 (r1) |
| 4 | 1 | 2 | 2 (r0) |
| 5 | 5 | 10 | 6 (r0) |
| 6 | 9 | 6 | 10 (r1) |
| 7 | 13 | 14 | 14 (r1) |
| 8 | 2 | 1 | 1 (r0) |
| 9 | 6 | 9 | 5 (r0) |
| 10 | 10 | 5 | 9 (r1) |
| 11 | 14 | 13 | 13 (r1) |
| 12 | 3 | 3 | 3 (r0) |
| 13 | 7 | 11 | 7 (r0) |
| 14 | 11 | 7 | 11 (r1) |
| 15 | 15 | 15 | 15 (r1) |
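The last column can be reproduced by a simple model: r0 steps through a 3-bit bit-reversed sequence over indices 0 to 7, r1 steps through the same sequence offset by 8, and the accesses alternate r0, r0, r1, r1. This is a hypothetical model that reproduces the table values only; it is not a description of how the SC3850 address generation unit is programmed.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def two_register_order(N=16):
    """Hypothetical model of the two-register column of the table above.

    r0 walks a bit-reversed sequence over the lower N/2 entries, r1 the same
    sequence offset by N/2, and accesses alternate r0, r0, r1, r1, ...
    """
    half = N // 2
    bits = half.bit_length() - 1
    r0 = [bit_reverse(i, bits) for i in range(half)]
    r1 = [half + v for v in r0]
    order = []
    for p in range(0, half, 2):
        order += [r0[p], r0[p + 1], r1[p], r1[p + 1]]
    return order


print(two_register_order(16))
# [0, 4, 8, 12, 2, 6, 10, 14, 1, 5, 9, 13, 3, 7, 11, 15]
```

The resulting order visits the groups {0, 4, 8, 12}, {2, 6, 10, 14}, {1, 5, 9, 13}, {3, 7, 11, 15}: the same groups of four as the digit-reversed column, with the second and third groups swapped.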


Scaling

• The real and imaginary parts of each butterfly output can grow by up to a factor of 4.

• The fixed-scaling method scales down by a fixed factor of 4 at each stage.

• If an FFT consists of M stages, the output is scaled down by $4^M = N$ ($M = \log_4 N$), where N is the length of the FFT.
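A fixed-point sketch of the fixed-scaling method, assuming Q15 data as (re, im) integer pairs and a radix-4 DIF structure: each butterfly result is shifted right by 2 before the twiddle multiplication, so after $\log_4 N$ stages the output approximates DFT(x)/N. Python integers do not overflow, so 16-bit saturation is not modeled, and all names are illustrative rather than SC3850 library calls.

```python
import cmath

Q15_ONE = 1 << 15  # 1.0 in Q15 (not itself representable; max is 0x7FFF)


def to_q15(v):
    """Round a float to a Q15 integer, saturating to [-32768, 32767]."""
    return max(-Q15_ONE, min(Q15_ONE - 1, round(v * Q15_ONE)))


def cmul_q15(p, w):
    """Multiply two (re, im) Q15 values; round the Q30 products back to Q15."""
    re = (p[0] * w[0] - p[1] * w[1] + (1 << 14)) >> 15
    im = (p[1] * w[0] + p[0] * w[1] + (1 << 14)) >> 15
    return re, im


def fft_q15_fixed_scaling(x):
    """Radix-4 DIF FFT on Q15 (re, im) pairs with a fixed 1/4 scaling per stage.

    The result approximates DFT(x) / N, because log4(N) stages each divide by 4.
    """
    N = len(x)
    if N == 1:
        return list(x)
    Q = N // 4
    y = [[None] * Q for _ in range(4)]
    for n1 in range(Q):
        (ar, ai), (br, bi), (cr, ci), (dr, di) = (x[n1 + m * Q] for m in range(4))
        sums = ((ar + br + cr + dr, ai + bi + ci + di),  # k2 = 0
                (ar + bi - cr - di, ai - br - ci + dr),  # k2 = 1: A - jB - C + jD
                (ar - br + cr - dr, ai - bi + ci - di),  # k2 = 2: A - B + C - D
                (ar - bi - cr + di, ai + br - ci - dr))  # k2 = 3: A + jB - C - jD
        for k2, (re, im) in enumerate(sums):
            scaled = (re >> 2, im >> 2)  # fixed scaling: arithmetic shift right by 2
            w = cmath.exp(-2j * cmath.pi * n1 * k2 / N)
            y[k2][n1] = cmul_q15(scaled, (to_q15(w.real), to_q15(w.imag)))
    X = [None] * N
    for k2 in range(4):
        for k1, v in enumerate(fft_q15_fixed_scaling(y[k2])):
            X[4 * k1 + k2] = v
    return X
```

To use it, convert each input with `to_q15`, run the transform, and compare `complex(re, im) / Q15_ONE` for each output against the floating-point DFT divided by N.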


SIMD Instruction Data Types


| Instruction | Description |
|---|---|
| ADD2 | Packed addition |
| SUB2 | Packed subtraction |
| NEG2 | Two Words Negate |
| IMACSU2 | Two integer multiply-accumulate, signed by unsigned |
| PACK.2W | Packs two words |
| PACK.2F | Packs two fractional words |
| ADD.W | Add 16-bit or 20-bit value |
| ABS2 | Two Words Absolute Value |
| ASL2 | Arithmetic Shift Left by One of Two Word Operands |
| ASLL2 | Multiple-Bit Arithmetic Shift Left of Two Word Operands |
| ASRR2 | Multiple-Bit Arithmetic Shift Right of Two Word Operands |
| LSLL2 | Multiple-Bit Bitwise Shift Left of Two Word Operands |
| LSR2 | Bitwise Shift Right One Bit of Two Word Operands |
| LSRR2 | Multiple-Bit Bitwise Shift Right of Two Word Operands |
| SOD2ffcc | Sum Or Difference of Two 16-Bit Values, function & cross |
| MIN2 | Transfer two 16-bit minimum signed values |
| MAX2 | Transfer two 16-bit maximum signed values |
| SUB.W | Subtract 16-bit or 20-bit value |
| MPY2 | Multiply 2 pairs of 16-bit data |
| MPY2R | Multiply 2 pairs of 16-bit data and round the lower 16 bits of the result |
| MAC2 | Multiply 2 pairs of 16-bit data, clip the lower 16 bits of each result into a 16-bit word, and accumulate it with the 20-bit accumulator input |
| MAC2R | Multiply 2 pairs of 16-bit data, round the lower 16 bits of each result into a 16-bit word, and accumulate it with the 20-bit accumulator input |
| CLIP20 | Clip two 20-bit operands |
| SATU20.B | Saturate two unsigned bytes |
| MAC2ffggR | Multiply 2 pairs of 16-bit data, add or subtract them from each portion … -specific format used for FFT calculation |
| MAC2ffggI | … |

Complex Arithmetic

• Addition of a + jb and c + jd to form e + jf: e = a + c; f = b + d.

• Multiplication of a + jb and c + jd to form e + jf: e = ac − bd; f = bc + ad.
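The two formulas translate into real arithmetic as follows: an addition needs 2 real additions, and a multiplication needs 4 real multiplications plus 2 real additions/subtractions. A minimal sketch with illustrative names:

```python
def cadd(a, b, c, d):
    """(a + jb) + (c + jd) = e + jf: e = a + c, f = b + d (2 real additions)."""
    return a + c, b + d


def cmul(a, b, c, d):
    """(a + jb)(c + jd) = e + jf: e = ac - bd, f = bc + ad.

    4 real multiplications and 2 real additions/subtractions.
    """
    return a * c - b * d, b * c + a * d


print(cmul(1, 2, 3, 4))  # (-5, 10), since (1 + 2j)(3 + 4j) = -5 + 10j
```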


Instructions for Complex Arithmetic

| Syntax | Description |
|---|---|
| SOD2FFCC Da,Db,Dn | Sum or Difference of Two Word Values, Function and Cross. Performs two separate 16-bit additions or subtractions between the high and low portions of two source data registers and stores the results in the two portions of the destination data register. The values of FF and CC determine the behavior: FF is A for addition and S for subtraction; CC is XX for crossed and II for not crossed. This instruction enables the use of the adder for smaller-precision values and therefore increases the number of operations that can be performed simultaneously. |
| MPYRE | Assuming the complex type is stored in the register as 16-bit real, 16-bit imaginary, this computes the real part of the complex multiplication: (Da.H * Db.H) - (Da.L * Db.L) -> Dn |
| MPYIM | Assuming the complex type is stored in the register as 16-bit real, 16-bit imaginary, this computes the imaginary part of the complex multiplication: (Da.L * Db.H) + (Da.H * Db.L) -> Dn |
| MPYCIM | Assuming the complex type is stored in the register as 16-bit real, 16-bit imaginary, this computes the conjugate imaginary part of the complex multiplication: (Da.L * Db.H) - (Da.H * Db.L) -> Dn |
| MACRE | Performs MPYRE with accumulation |
| MACIM | Performs MPYIM with accumulation |
| MACCIM | Performs MPYCIM with accumulation |
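The MPYRE, MPYIM, and MPYCIM formulas above can be modeled on a 32-bit word holding the real part in the high 16 bits and the imaginary part in the low 16 bits. This sketch models only the integer products as written in the table; it ignores any fractional shift, rounding, or saturation the SC3850 applies, and the helper names are hypothetical. Note that MPYCIM yields the imaginary part of Da × conj(Db).

```python
def pack(re, im):
    """Pack a 16-bit real part (high half) and imaginary part (low half) into 32 bits."""
    return ((re & 0xFFFF) << 16) | (im & 0xFFFF)


def _signed16(v):
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def hi(d):
    """Signed high 16-bit portion (Da.H, the real part)."""
    return _signed16(d >> 16)


def lo(d):
    """Signed low 16-bit portion (Da.L, the imaginary part)."""
    return _signed16(d)


def mpyre(da, db):   # (Da.H * Db.H) - (Da.L * Db.L): Re(a * b)
    return hi(da) * hi(db) - lo(da) * lo(db)


def mpyim(da, db):   # (Da.L * Db.H) + (Da.H * Db.L): Im(a * b)
    return lo(da) * hi(db) + hi(da) * lo(db)


def mpycim(da, db):  # (Da.L * Db.H) - (Da.H * Db.L): Im(a * conj(b))
    return lo(da) * hi(db) - hi(da) * lo(db)


def macre(acc, da, db):
    """MPYRE with accumulation, modeled as plain integer addition."""
    return acc + mpyre(da, db)


a, b = pack(3, -4), pack(2, 5)  # a = 3 - 4j, b = 2 + 5j
print(mpyre(a, b), mpyim(a, b), mpycim(a, b))  # 26 7 -23
```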
