CHAPTER 5
Software Implementation of FFT Using the SC3850 Core
Fast Fourier Transform (FFT)
• Discrete Fourier Transform (DFT) is defined by:
\[
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \quad W_N = e^{-j 2\pi / N}
\]
• Theoretical arithmetic complexity:
– $N^2$ complex multiplications and
– $N(N-1)$ complex additions.
• Real-number computation:
– $4N^2$ real multiplications and
– $4N^2 - 2N$ real additions.
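The definition and the operation counts above can be sketched in a few lines of Python. This is an illustrative reference implementation only, not SC3850 code; the function names `dft` and `dft_real_op_counts` are assumptions introduced for this example.

```python
import cmath

def dft(x):
    """Direct DFT: X(k) = sum_n x(n) * W_N^(n*k), with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (n * k) for n in range(N)) for k in range(N)]

def dft_real_op_counts(N):
    """Real multiplications and additions of a length-N direct DFT."""
    complex_mults = N * N
    complex_adds = N * (N - 1)
    # One complex multiplication costs 4 real multiplications and 2 real additions;
    # one complex addition costs 2 real additions.
    real_mults = 4 * complex_mults
    real_adds = 2 * complex_mults + 2 * complex_adds
    return real_mults, real_adds
```

The second function shows where $4N^2 - 2N$ comes from: $2N^2$ additions inside the complex multiplications plus $2N(N-1)$ for the complex additions.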
Fast Fourier Transform (FFT)
• There are fast algorithms that compute a DFT with a smaller number of operations when the length is a power of the radix R:
\[
N = R^M
\]
• These are called Radix-R algorithms.
• Radix-2 FFT reduces the complexity to
– $(N/2)\log_2 N$ complex multiplications
– $N\log_2 N$ complex additions
• Radix-4 FFT needs 75% of the radix-2 multiplications:
– $(3N/4)\log_4 N = (3N/8)\log_2 N$ complex multiplications
– $8(N/4)\log_4 N = N\log_2 N$ complex additions
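A quick way to compare the two radices is to evaluate the counts above for a concrete length. The sketch below assumes N is a power of 4, so both formulas apply; the function names are hypothetical.

```python
def radix2_counts(N):
    """(complex multiplications, complex additions) for a radix-2 FFT."""
    stages = N.bit_length() - 1          # log2(N) for a power of two
    return (N // 2) * stages, N * stages

def radix4_counts(N):
    """(complex multiplications, complex additions) for a radix-4 FFT."""
    log2n = N.bit_length() - 1
    if log2n % 2 != 0:
        raise ValueError("N must be a power of 4")
    stages = log2n // 2                  # log4(N)
    return 3 * (N // 4) * stages, 8 * (N // 4) * stages

mults2, adds2 = radix2_counts(1024)
mults4, adds4 = radix4_counts(1024)
ratio = mults4 / mults2                  # fraction of radix-2 multiplications
```

For N = 1024 the addition counts are equal and the multiplication ratio is 0.75, which is the 75% figure quoted in the slide.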
Transmitter and Receiver Structure of SC-FDMA and OFDMA Systems
Fast Fourier Transform (FFT)
• DFT:
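\[
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1, \quad W_N = e^{-j 2\pi / N}
\]
• Twiddle factor properties:
– Periodicity property:
\[
W_N^{k+N} = W_N^{k}\, W_N^{N} = W_N^{k}\, e^{-j \frac{2\pi}{N} N} = W_N^{k}
\]
– Symmetry property:
\[
W_N^{k+N/2} = W_N^{k}\, W_N^{N/2} = W_N^{k}\, e^{-j \frac{2\pi}{N} \frac{N}{2}} = -W_N^{k}
\]
– Base change:
\[
W_N^{nk} = e^{-j \frac{2\pi k}{N} n} = W_{N/k}^{n}
\]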
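The three properties can be checked numerically. In this sketch the helper `twiddle` and the sample values of N, k, and n are assumptions chosen for illustration; k is picked to divide N so that N/k is an integer in the base-change check.

```python
import cmath

def twiddle(N, exponent=1):
    """W_N^exponent with W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / N)

N, k, n = 16, 4, 3

# Periodicity: W_N^(k+N) = W_N^k
periodicity_error = abs(twiddle(N, k + N) - twiddle(N, k))
# Symmetry: W_N^(k+N/2) = -W_N^k
symmetry_error = abs(twiddle(N, k + N // 2) + twiddle(N, k))
# Base change: W_N^(n*k) = W_(N/k)^n
base_change_error = abs(twiddle(N, n * k) - twiddle(N // k, n))
```

All three error values should be at the level of floating-point rounding.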
Radix-4 FFT
\[
N = 4^M
\]
• Decimation-In-Time (DIT) algorithm:
\[
n = 4 n_1 + n_2, \quad n_1 = 0, 1, \ldots, \tfrac{N}{4} - 1, \quad n_2 = 0, 1, 2, 3;
\]
\[
k = k_1 + \tfrac{N}{4} k_2, \quad k_1 = 0, 1, \ldots, \tfrac{N}{4} - 1, \quad k_2 = 0, 1, 2, 3.
\]
• Decimation-In-Frequency (DIF) algorithm:
\[
n = n_1 + \tfrac{N}{4} n_2, \quad n_1 = 0, 1, \ldots, \tfrac{N}{4} - 1, \quad n_2 = 0, 1, 2, 3;
\]
\[
k = 4 k_1 + k_2, \quad k_1 = 0, 1, \ldots, \tfrac{N}{4} - 1, \quad k_2 = 0, 1, 2, 3.
\]
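Each index map has to reach every value from 0 to N-1 exactly once. A minimal sketch that enumerates both maps (function names are hypothetical):

```python
def dit_index_maps(N):
    """DIT: n = 4*n1 + n2 and k = k1 + (N/4)*k2."""
    q = N // 4
    n = [4 * n1 + n2 for n1 in range(q) for n2 in range(4)]
    k = [k1 + q * k2 for k1 in range(q) for k2 in range(4)]
    return n, k

def dif_index_maps(N):
    """DIF: n = n1 + (N/4)*n2 and k = 4*k1 + k2."""
    q = N // 4
    n = [n1 + q * n2 for n1 in range(q) for n2 in range(4)]
    k = [4 * k1 + k2 for k1 in range(q) for k2 in range(4)]
    return n, k
```

Sorting any of the returned lists for N = 16 gives 0 through 15, so the two-digit decomposition is a valid re-indexing of the input and output.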
Radix-4 DIT FFT
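\[
X\!\left(k_1 + \tfrac{N}{4} k_2\right)
= \sum_{n_1=0}^{N/4-1} \sum_{n_2=0}^{3} x(4 n_1 + n_2)\, W_N^{(4 n_1 + n_2)\left(k_1 + \frac{N}{4} k_2\right)}
\]
\[
= \sum_{n_2=0}^{3} \left[ W_N^{n_2 k_1} \sum_{n_1=0}^{N/4-1} x(4 n_1 + n_2)\, W_{N/4}^{n_1 k_1} \right] W_4^{n_2 k_2}
\]
\[
= \sum_{n_1=0}^{N/4-1} x(4 n_1)\, W_{N/4}^{n_1 k_1}
+ W_N^{k_1} W_4^{k_2} \sum_{n_1=0}^{N/4-1} x(4 n_1 + 1)\, W_{N/4}^{n_1 k_1}
\]
\[
\quad + W_N^{2 k_1} W_4^{2 k_2} \sum_{n_1=0}^{N/4-1} x(4 n_1 + 2)\, W_{N/4}^{n_1 k_1}
+ W_N^{3 k_1} W_4^{3 k_2} \sum_{n_1=0}^{N/4-1} x(4 n_1 + 3)\, W_{N/4}^{n_1 k_1}
\]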
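The DIT decomposition maps directly onto a recursive routine: four N/4-point DFTs of the decimated sequences, a twiddle multiplication by $W_N^{n_2 k_1}$, and a 4-point DFT over $k_2$. The following floating-point sketch assumes N is a power of 4 and is meant only to illustrate the structure; it is not the SC3850 implementation, and `fft_radix4_dit` is a name chosen for this example.

```python
import cmath

def fft_radix4_dit(x):
    """Recursive radix-4 decimation-in-time FFT (len(x) must be a power of 4)."""
    N = len(x)
    if N == 1:
        return list(x)
    q = N // 4
    # N/4-point DFTs of the decimated sequences x(4*n1 + n2), n2 = 0..3
    sub = [fft_radix4_dit(x[n2::4]) for n2 in range(4)]
    X = [0j] * N
    for k1 in range(q):
        # Twiddle factors W_N^(n2*k1)
        t = [sub[n2][k1] * cmath.exp(-2j * cmath.pi * n2 * k1 / N)
             for n2 in range(4)]
        for k2 in range(4):
            # 4-point DFT over n2, using W_4^(n2*k2) = (-j)^(n2*k2)
            X[k1 + q * k2] = sum(t[n2] * (-1j) ** (n2 * k2) for n2 in range(4))
    return X
```

A production routine would run the stages iteratively and in place; the recursion is used here because it mirrors the derivation line by line.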
Radix-4 DIT FFT
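[Figure: radix-4 DIT decomposition. The four decimated sequences x(4n1), x(4n1+1), x(4n1+2), and x(4n1+3) each feed an N/4-point DFT with outputs indexed k1 = 0, ..., N/4-1. The outputs of the last three DFTs are multiplied by the twiddle factors $W_N^{k_1}$, $W_N^{2k_1}$, and $W_N^{3k_1}$, and a 4-point DFT over k2 combines the four branches into X(k1), X(k1+N/4), X(k1+N/2), and X(k1+3N/4).]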
Radix-4 DIT Butterfly
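• Computation complexity
– 3 complex multiplications
– 8 complex additions
• Real computations
– 12 real multiplications
– 22 real additions
[Figure: radix-4 DIT butterfly. Three of the four inputs are multiplied by the twiddle factors $W_N^{k_1}$, $W_N^{2k_1}$, and $W_N^{3k_1}$; branch weights of 1, -j, -1, and j then combine them into the outputs X(k1) for k2 = 0, X(k1+N/4) for k2 = 1, X(k1+N/2) for k2 = 2, and X(k1+3N/4) for k2 = 3.]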
Radix-4 DIT Butterfly
Ar′ = Ar + (Cr × Wcr – Ci × Wci) + (Br × Wbr – Bi × Wbi) + (Dr × Wdr – Di × Wdi)
Ai′ = Ai + (Cr × Wci + Ci × Wcr) + (Br × Wbi + Bi × Wbr) + (Dr × Wdi + Di × Wdr)
Br′ = Ar – (Cr × Wcr – Ci × Wci) + (Br × Wbi + Bi × Wbr) – (Dr × Wdi + Di × Wdr)
Bi′ = Ai – (Cr × Wci + Ci × Wcr) – (Br × Wbr – Bi × Wbi) + (Dr × Wdr – Di × Wdi)
Cr′ = Ar + (Cr × Wcr – Ci × Wci) – (Br × Wbr – Bi × Wbi) – (Dr × Wdr – Di × Wdi)
Ci′ = Ai + (Cr × Wci + Ci × Wcr) – (Br × Wbi + Bi × Wbr) – (Dr × Wdi + Di × Wdr)
Dr′ = Ar – (Cr × Wcr – Ci × Wci) – (Br × Wbi + Bi × Wbr) + (Dr × Wdi + Di × Wdr)
Di′ = Ai – (Cr × Wci + Ci × Wcr) + (Br × Wbr – Bi × Wbi) – (Dr × Wdr – Di × Wdi)
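The eight real-valued equations of the DIT butterfly can be written out as a small function. This is a sketch under the assumption that every operand is passed as a (real, imaginary) pair; the name `dit_butterfly` and the intermediate variable names are introduced for illustration only.

```python
def dit_butterfly(A, B, C, D, Wb, Wc, Wd):
    """Radix-4 DIT butterfly on (real, imag) pairs; returns (A', B', C', D')."""
    (Ar, Ai), (Br, Bi), (Cr, Ci), (Dr, Di) = A, B, C, D
    (Wbr, Wbi), (Wcr, Wci), (Wdr, Wdi) = Wb, Wc, Wd
    # Three complex multiplications: 12 real multiplications, 6 real additions
    bwr, bwi = Br * Wbr - Bi * Wbi, Br * Wbi + Bi * Wbr
    cwr, cwi = Cr * Wcr - Ci * Wci, Cr * Wci + Ci * Wcr
    dwr, dwi = Dr * Wdr - Di * Wdi, Dr * Wdi + Di * Wdr
    # Combination of the four branches with weights 1, -j, -1, j
    A_out = (Ar + cwr + bwr + dwr, Ai + cwi + bwi + dwi)
    B_out = (Ar - cwr + bwi - dwi, Ai - cwi - bwr + dwr)
    C_out = (Ar + cwr - bwr - dwr, Ai + cwi - bwi - dwi)
    D_out = (Ar - cwr - bwi + dwi, Ai - cwi + bwr - dwr)
    return A_out, B_out, C_out, D_out
```

In complex notation the outputs are A + BW_b + CW_c + DW_d, A - jBW_b - CW_c + jDW_d, A - BW_b + CW_c - DW_d, and A + jBW_b - CW_c - jDW_d.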
Radix-4 DIF FFT
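\[
X(4 k_1 + k_2)
= \sum_{n_1=0}^{N/4-1} \sum_{n_2=0}^{3} x\!\left(n_1 + \tfrac{N}{4} n_2\right) W_N^{\left(n_1 + \frac{N}{4} n_2\right)(4 k_1 + k_2)}
\]
\[
= \sum_{n_1=0}^{N/4-1} \sum_{n_2=0}^{3} x\!\left(n_1 + \tfrac{N}{4} n_2\right) W_{N/4}^{n_1 k_1}\, W_N^{n_1 k_2}\, W_4^{n_2 k_2}
\]
\[
= \sum_{n_1=0}^{N/4-1} \left[ x(n_1) + x\!\left(n_1 + \tfrac{N}{4}\right) W_4^{k_2} + x\!\left(n_1 + \tfrac{N}{2}\right) W_4^{2 k_2} + x\!\left(n_1 + \tfrac{3N}{4}\right) W_4^{3 k_2} \right] W_N^{n_1 k_2}\, W_{N/4}^{n_1 k_1}
\]
Since
\[
W_4^{n_2 k_2} = e^{-j \frac{2\pi}{4} n_2 k_2} = (-j)^{n_2 k_2},
\]
this becomes
\[
X(4 k_1 + k_2) = \sum_{n_1=0}^{N/4-1} \left[ x(n_1) + (-j)^{k_2} x\!\left(n_1 + \tfrac{N}{4}\right) + (-1)^{k_2} x\!\left(n_1 + \tfrac{N}{2}\right) + (j)^{k_2} x\!\left(n_1 + \tfrac{3N}{4}\right) \right] W_N^{n_1 k_2}\, W_{N/4}^{n_1 k_1}
\]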
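In the DIF form the butterfly comes first: the four quarter-length input blocks are combined, multiplied by $W_N^{n_1 k_2}$, and each resulting sequence goes to an N/4-point DFT that produces the outputs X(4k1 + k2). A recursive floating-point sketch follows, again assuming N is a power of 4; `fft_radix4_dif` is a hypothetical name, and the routine writes results back in natural order rather than leaving them digit-reversed as an in-place version would.

```python
import cmath

def fft_radix4_dif(x):
    """Recursive radix-4 decimation-in-frequency FFT (len(x) must be a power of 4)."""
    N = len(x)
    if N == 1:
        return list(x)
    q = N // 4
    X = [0j] * N
    for k2 in range(4):
        # Butterfly over n2 with weights (-j)^(n2*k2), then twiddle W_N^(n1*k2)
        y = [sum(x[n1 + q * n2] * (-1j) ** (n2 * k2) for n2 in range(4))
             * cmath.exp(-2j * cmath.pi * n1 * k2 / N)
             for n1 in range(q)]
        # The N/4-point DFT of y gives X(4*k1 + k2) for k1 = 0..N/4-1
        X[k2::4] = fft_radix4_dif(y)
    return X
```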
Radix-4 DIF FFT
Radix-4 DIF Butterfly
Ar′ = Ar + Br + Cr + Dr
    = (Ar + Br) + (Cr + Dr)
Ai′ = Ai + Bi + Ci + Di
    = (Ai + Ci) + (Bi + Di)
Br′ = (Ar + Bi – Cr – Di) × Wbr – (Ai – Br – Ci + Dr) × Wbi
    = ((Ar – Cr) + (Bi – Di)) × Wbr – ((Ai – Ci) – (Br – Dr)) × Wbi
Bi′ = (Ai – Br – Ci + Dr) × Wbr + (Ar + Bi – Cr – Di) × Wbi
    = ((Ai – Ci) – (Br – Dr)) × Wbr + ((Ar – Cr) + (Bi – Di)) × Wbi
Radix-4 DIF Butterfly
Cr′ = (Ar – Br + Cr – Dr) × Wcr – (Ai – Bi + Ci – Di) × Wci
    = ((Ar + Cr) – (Br + Dr)) × Wcr – ((Ai + Ci) – (Bi + Di)) × Wci
Ci′ = (Ai – Bi + Ci – Di) × Wcr + (Ar – Br + Cr – Dr) × Wci
    = ((Ai + Ci) – (Bi + Di)) × Wcr + ((Ar + Cr) – (Br + Dr)) × Wci
Dr′ = (Ar – Bi – Cr + Di) × Wdr – (Ai + Br – Ci – Dr) × Wdi
    = ((Ar – Cr) – (Bi – Di)) × Wdr – ((Ai – Ci) + (Br – Dr)) × Wdi
Di′ = (Ai + Br – Ci – Dr) × Wdr + (Ar – Bi – Cr + Di) × Wdi
    = ((Ai – Ci) + (Br – Dr)) × Wdr + ((Ar – Cr) – (Bi – Di)) × Wdi
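The grouped form of the DIF butterfly equations exposes the sums and differences that the four outputs share. The sketch below computes each shared term once; operands are assumed to be (real, imaginary) pairs, and `dif_butterfly` with its temporary names is hypothetical.

```python
def dif_butterfly(A, B, C, D, Wb, Wc, Wd):
    """Radix-4 DIF butterfly on (real, imag) pairs; returns (A', B', C', D')."""
    (Ar, Ai), (Br, Bi), (Cr, Ci), (Dr, Di) = A, B, C, D
    (Wbr, Wbi), (Wcr, Wci), (Wdr, Wdi) = Wb, Wc, Wd
    # Shared first-level sums and differences
    sr, si = Ar + Cr, Ai + Ci      # A + C
    dr, di = Ar - Cr, Ai - Ci      # A - C
    tr, ti = Br + Dr, Bi + Di      # B + D
    ur, ui = Br - Dr, Bi - Di      # B - D
    # Second-level combinations, before the twiddle multiplications
    br, bi = dr + ui, di - ur      # A - jB - C + jD
    cr, ci = sr - tr, si - ti      # A - B + C - D
    er, ei = dr - ui, di + ur      # A + jB - C - jD
    A_out = (sr + tr, si + ti)     # no twiddle on the first output
    B_out = (br * Wbr - bi * Wbi, bi * Wbr + br * Wbi)
    C_out = (cr * Wcr - ci * Wci, ci * Wcr + cr * Wci)
    D_out = (er * Wdr - ei * Wdi, ei * Wdr + er * Wdi)
    return A_out, B_out, C_out, D_out
```

Unlike the DIT butterfly, the twiddle multiplications here follow the additions, which matches the DIF derivation.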
Radix-4 DIF Butterfly
16-point Radix-4 DIF FFT
Digit-Reversed Order of a 16-Point Radix-4 FFT
Index   Digit pattern   Digit-reversed pattern   Digit-reversed index
0       00              00                       0
1       01              10                       4
2       02              20                       8
3       03              30                       12
4       10              01                       1
5       11              11                       5
6       12              21                       9
7       13              31                       13
8       20              02                       2
9       21              12                       6
10      22              22                       10
11      23              32                       14
12      30              03                       3
13      31              13                       7
14      32              23                       11
15      33              33                       15
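The digit-reversed index in the table is obtained by writing the index in base 4 and reversing the order of the digits. A minimal sketch, with `digit_reverse` as a hypothetical helper name:

```python
def digit_reverse(index, num_digits, radix=4):
    """Reverse the base-`radix` digits of `index`, using `num_digits` digits."""
    reversed_index = 0
    for _ in range(num_digits):
        # Move the lowest remaining digit to the low end of the result
        reversed_index = reversed_index * radix + index % radix
        index //= radix
    return reversed_index

# A 16-point radix-4 FFT uses two base-4 digits per index
digit_reversed_order = [digit_reverse(i, 2) for i in range(16)]
```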
Bit-Reversed Order
Index   Bit pattern   Bit-reversed pattern   Bit-reversed index
0       0000          0000                   0
1       0001          1000                   8
2       0010          0100                   4
3       0011          1100                   12
4       0100          0010                   2
5       0101          1010                   10
6       0110          0110                   6
7       0111          1110                   14
8       1000          0001                   1
9       1001          1001                   9
10      1010          0101                   5
11      1011          1101                   13
12      1100          0011                   3
13      1101          1011                   11
14      1110          0111                   7
15      1111          1111                   15
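Bit reversal is the radix-2 counterpart of digit reversal. The sketch below reproduces the table and also shows the usual swap-based reordering of a buffer; both function names are assumptions made for this example.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest `num_bits` bits of `index`."""
    reversed_index = 0
    for _ in range(num_bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index

def bit_reverse_permute(data):
    """Return a copy of `data` (length a power of two) in bit-reversed order."""
    N = len(data)
    num_bits = N.bit_length() - 1
    out = list(data)
    for i in range(N):
        j = bit_reverse(i, num_bits)
        if i < j:                      # swap each pair only once
            out[i], out[j] = out[j], out[i]
    return out
```

Because bit reversal is its own inverse, applying `bit_reverse_permute` twice restores the original order.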
Bit-Reversed Addressing
Two Registers in Bit-Reversed Addressing Mode
Index   Digit-Reversed Index   Bit-Reversed Index with     Bit-Reversed Index with Two
                               One Address Register        Address Registers R0, R1
0       0                      0                           0 (r0)
1       4                      8                           4 (r0)
2       8                      4                           8 (r1)
3       12                     12                          12 (r1)
4       1                      2                           2 (r0)
5       5                      10                          6 (r0)
6       9                      6                           10 (r1)
7       13                     14                          14 (r1)
8       2                      1                           1 (r0)
9       6                      9                           5 (r0)
10      10                     5                           9 (r1)
11      14                     13                          13 (r1)
12      3                      3                           3 (r0)
13      7                      11                          7 (r0)
14      11                     7                           11 (r1)
15      15                     15                          15 (r1)
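The last column of the table can be reproduced with a simple model. This is an interpretation of the table rather than a description of the SC3850 addressing hardware: it assumes r0 steps through the bit-reversed sequence of the lower half of the buffer, r1 follows the same sequence offset by N/2, and two accesses through r0 alternate with two accesses through r1. All names are hypothetical.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest `num_bits` bits of `index`."""
    reversed_index = 0
    for _ in range(num_bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index

def two_register_order(N=16):
    """Access order with two bit-reversed pointers (assumed model, see text)."""
    half = N // 2
    num_bits = half.bit_length() - 1
    r0_sequence = [bit_reverse(i, num_bits) for i in range(half)]
    r1_sequence = [half + value for value in r0_sequence]
    order = []
    for i in range(0, half, 2):
        order += r0_sequence[i:i + 2]   # two accesses through r0
        order += r1_sequence[i:i + 2]   # two accesses through r1
    return order
```

Under this model the order for N = 16 matches the two-register column: each group of four holds the same indices as the digit-reversed column, with the second and third groups exchanged.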
Scaling
• The real and imaginary parts of the butterfly output can grow by a factor of up to 4.
• The fixed-scaling method scales down by a fixed factor of 4 at each stage.
• If an FFT consists of M stages, the output is scaled down by $4^M$ ($M = \log_4 N$), where N is the length of the FFT.
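The effect of fixed scaling can be illustrated by dividing every butterfly output by 4. The sketch below uses floating-point arithmetic and a recursive radix-4 DIT structure purely for illustration; the fixed-point code on the SC3850 is not shown in these slides, and `fft_radix4_dit_scaled` is a hypothetical name.

```python
import cmath

def fft_radix4_dit_scaled(x):
    """Radix-4 DIT FFT with a fixed scale-down by 4 at every stage."""
    N = len(x)
    if N == 1:
        return list(x)
    q = N // 4
    sub = [fft_radix4_dit_scaled(x[n2::4]) for n2 in range(4)]
    X = [0j] * N
    for k1 in range(q):
        t = [sub[n2][k1] * cmath.exp(-2j * cmath.pi * n2 * k1 / N)
             for n2 in range(4)]
        for k2 in range(4):
            total = sum(t[n2] * (-1j) ** (n2 * k2) for n2 in range(4))
            X[k1 + q * k2] = total / 4     # fixed scaling: divide by 4 per stage
    return X
```

With M stages the result equals the unscaled DFT divided by $4^M = N$, and no output magnitude exceeds the largest input magnitude, which is what prevents overflow in a fixed-point implementation.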
SIMD Instruction Data Types
Instruction   Description
ADD2          Packed addition
SUB2          Packed subtraction
NEG2          Two Words Negate
IMACSU2       Two integer multiply accumulate signed by unsigned
PACK.2W       Packs two words
PACK.2F       Packs two fractional words
ADD.W         Add 16-bit or 20-bit value
ABS2          Two Words Absolute Value
ASL2          Arithmetic Shift Left by One of Two Word Operands
ASLL2         Multiple-Bit Arithmetic Shift Left of Two Word Operands
ASRR2         Multiple-Bit Arithmetic Shift Right of Two Word Operands
LSLL2         Multiple-Bit Bitwise Shift Left of Two Word Operands
LSR2          Bitwise Shift Right One Bit of Two Word Operands
LSRR2         Multiple-Bit Bitwise Shift Right of Two Word Operands
SOD2ffcc      Sum Or Difference of Two 16-Bit Values, function & cross
MIN2          Transfer two 16-bit minimum signed values
MAX2          Transfer two 16-bit maximum signed values
SUB.W         Subtract 16-bit or 20-bit value
MPY2          Multiply 2 pairs of 16-bit data.
MPY2R         Multiply 2 pairs of 16-bit data and round the lower 16 bits of the result.
MAC2          Multiply 2 pairs of 16-bit data, clip the lower 16 bits of each result into 16-bit word and
              accumulate it with 20-bit accumulator input.
MAC2R         Multiply 2 pairs of 16-bit data, round the lower 16 bits of each result into 16-bit word and
              accumulate it with 20-bit accumulator input.
CLIP20        Clip two 20-bit operands.
SATU20.B      Saturate two unsigned bytes.
MAC2ffggR,    Multiply 2 pairs of 16-bit data, add or subtract them from each portion;
MAC2ffggI     specific format used for FFT calculation.
Complex Arithmetic
• Addition of a+jb and c+jd to form e+jf
e = a + c;
f = b + d;
• Multiplication of a+jb and c+jd to form e+jf
e = ac - bd;
f = bc + ad;
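Written with real arithmetic only, the two operations look as follows. The function names `complex_add` and `complex_multiply` are chosen for this sketch; the multiplication uses four real multiplications and two real additions, consistent with the operation counts given earlier in the chapter.

```python
def complex_add(a, b, c, d):
    """(a + jb) + (c + jd) -> (e, f)."""
    return a + c, b + d

def complex_multiply(a, b, c, d):
    """(a + jb) * (c + jd) -> (e, f), using 4 real multiplications."""
    e = a * c - b * d      # real part
    f = b * c + a * d      # imaginary part
    return e, f
```

For example, (1 + 2j)(3 + 4j) gives e = -5 and f = 10.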
Instructions for Complex Arithmetic
Syntax              Description
SOD2FFCC Da,Db,Dn   Sum or Difference of Two Word Values: Function and Cross.
                    Performs two separate 16-bit additions or subtractions between the high and low
                    portions of two source data registers and stores the results in the two portions
                    of the destination data register. The values of FF and CC determine the behavior.
                    FF: A for addition and S for subtraction.
                    CC: XX for crossed and II for not crossed.
                    This instruction enables the use of the adder for smaller precision values and
                    therefore increases the number of operations that can be performed simultaneously.
MPYRE               Assuming the complex type is stored in the register as 16-bit real, 16-bit
                    imaginary, this computes the real part of the complex multiplication.
                    (Da.H * Db.H) - (Da.L * Db.L) -> Dn
MPYIM               Assuming the complex type is stored in the register as 16-bit real, 16-bit
                    imaginary, this computes the imaginary part of the complex multiplication.
                    (Da.L * Db.H) + (Da.H * Db.L) -> Dn
MPYCIM              Assuming the complex type is stored in the register as 16-bit real, 16-bit
                    imaginary, this computes the conjugate imaginary part of the complex
                    multiplication. (Da.L * Db.H) - (Da.H * Db.L) -> Dn
MACRE               Performs MPYRE with accumulation
MACIM               Performs MPYIM with accumulation
MACCIM              Performs MPYCIM with accumulation
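The arithmetic expressions in the table can be mirrored in a small model to see how the instructions fit together. This is only a behavioral sketch in plain Python integers: it assumes the real part sits in the high half (H) and the imaginary part in the low half (L), and it ignores fractional formats, rounding, and saturation, none of which are specified in the slide. The `Reg` type and the function names are hypothetical, not an SC3850 API.

```python
from collections import namedtuple

# Model of a data register holding a complex value: H = real, L = imaginary
Reg = namedtuple("Reg", ["H", "L"])

def mpyre(Da, Db):
    """(Da.H * Db.H) - (Da.L * Db.L): real part of Da * Db."""
    return Da.H * Db.H - Da.L * Db.L

def mpyim(Da, Db):
    """(Da.L * Db.H) + (Da.H * Db.L): imaginary part of Da * Db."""
    return Da.L * Db.H + Da.H * Db.L

def mpycim(Da, Db):
    """(Da.L * Db.H) - (Da.H * Db.L): imaginary part of Da * conj(Db)."""
    return Da.L * Db.H - Da.H * Db.L

def macre(Dn, Da, Db):
    """MPYRE with accumulation into Dn."""
    return Dn + mpyre(Da, Db)

def macim(Dn, Da, Db):
    """MPYIM with accumulation into Dn."""
    return Dn + mpyim(Da, Db)

def maccim(Dn, Da, Db):
    """MPYCIM with accumulation into Dn."""
    return Dn + mpycim(Da, Db)
```

In this model one MPYRE and one MPYIM together produce a full complex product, and the accumulating forms add further products into a running sum, as in the twiddle multiplications of a butterfly.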