A Modified Radix-2 SDF Pipelined OFDM Module for FPGA Based MB-OFDM UWB Systems


    M. Santhi, S. Arun Kumar, G. S. Praveen Kalish, K. Murali, S. Siddharth, G. Lakshminarayanan
    Department of ECE, National Institute of Technology

    Abstract - The OFDM module in the MB-OFDM UWB transmitter must operate at 528 MHz. This is a challenging task because the OFDM module in a UWB system has to calculate a 128-point IFFT. Earlier papers used the radix-2 SDF algorithm with parallel processing architectures of block size two to achieve the required speed and implemented the module on an ASIC. In this paper a novel scheme, the "modified radix-2 SDF algorithm", is proposed for calculating the 128-point IFFT. In the proposed scheme, the order of the twiddle factor sequence differs from that of the earlier radix-2 SDF algorithm. The change in twiddle factor sequence makes the CSD multiplier used for the IFFT calculation easier to implement. It is also proposed that the required speed can be achieved on an FPGA itself without using parallel processing architectures. This is done by pipelining the OFDM module as well as using LPMs, which reduces the area compared to the earlier approach of using parallel processing architectures of block size two. To improve accuracy, the internal word length in the proposed scheme is maintained at 13 bits, which is 7 bits more than the input, to account for overflow at each of the 7 stages of the OFDM module. The proposed scheme, with its increased complexity for better accuracy, is tested on an ALTERA Stratix III EP3SL50F484C2 device. From the implementation, it is verified that the OFDM module achieves a maximum throughput of 528 MSamples/s. Since ASICs are in general about three times faster than FPGAs, operating an ASIC-based OFDM module at 528 MHz with the proposed modified radix-2^4 SDF pipelined algorithm would be considerably easier.

    Keywords - MB-OFDM, SDF, FFT, FPGA.

    I. INTRODUCTION

    Ultra wideband (UWB) communication systems, which enable the delivery of data from a rate of 110 Mb/s at a distance of 10 m to a rate of 480 Mb/s at a distance of 2 m, are ideally suited to short-range wireless communications because they can share a frequency band with existing narrowband systems and offer a higher data rate than 802.11 or Bluetooth [1]. One of the communication methods for the IEEE 802.15.3a standard is Multiband Orthogonal Frequency Division Multiplexing (MB-OFDM), which offers 528 MHz bandwidth [2][3]. MB-OFDM-based UWB not only provides reliable high-data-rate transmission in time-dispersive or frequency-selective channels without complex time-domain channel equalizers, but also provides high spectral efficiency.

    The FFT/IFFT processor is one of the modules with high computational complexity in the physical layer of the UWB system, and the execution time of the 128-point FFT/IFFT in a UWB system is only 312.5 ns. In our processor, power consumption and hardware cost are reduced by using a higher-radix FFT algorithm, less memory and fewer complex multipliers.

    This paper is organized as follows. Section II describes the design issues of MB-OFDM UWB communication systems. Section III describes the proposed 128-point radix-2 algorithm. Section IV describes the proposed 128-point radix-2 FFT/IFFT architecture. In Section V, the implementation and performance of the proposed architecture are discussed. Conclusions are presented in Section VI.

    II. DESIGN ISSUES OF THE FFT PROCESSOR

    A block diagram of the proposed physical layer of the OFDM-based UWB system is shown in Fig. 1 [4]. In the UWB system, the data rate ranges from 53.3 Mb/s to 480 Mb/s at various code rates. The bandwidth of the transmitted signal is 528 MHz and the OFDM symbol duration is 312.5 ns, including 60.61 ns for the cyclic prefix and 9.47 ns for the guard interval [2][3]. Thus, an FFT/IFFT processor has to compute one OFDM symbol within 312.5 ns, and the throughput rate required by this specification for a 128-point FFT/IFFT is up to 409.6 MSamples/s.

    Various FFT architectures, such as single-memory architecture, dual-memory architecture, pipelined architecture, array architecture, and cached-memory architecture, have been proposed in the last three decades. In our view, the pipelined architecture should be the best choice for UWB systems since it can provide a high throughput rate with acceptable hardware cost.

    The pipelined FFT architecture typically falls into one of the two following categories: multipath delay commutator (MDC) and single-path delay feedback (SDF) [5]. In general, the MDC scheme can achieve a higher throughput rate, while the SDF scheme needs less memory and hardware cost. In addition, a higher-radix FFT algorithm is difficult to implement in the traditional MDC architecture. Table 1 compares the hardware requirements of various architectures. The proposed architecture, based on the radix-2 SDF architecture, was selected for implementation owing to its low hardware cost and greater area efficiency; it can also provide a throughput rate sufficient to meet the UWB specifications.
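The timing figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that samples are clocked at 528 MHz (taken from the stated bandwidth); the sample counts of 32 and 5 for the cyclic prefix and guard interval are not given in the text and are inferred here from the quoted durations.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 528_000_000      # assumption: one sample per cycle of the 528 MHz clock
FFT_POINTS = 128                  # FFT/IFFT length from the text
SYMBOL_NS = Fraction(3125, 10)    # OFDM symbol duration, 312.5 ns


def samples_in(duration_ns):
    """Number of sample periods that fit in a duration given in nanoseconds."""
    return duration_ns * SAMPLE_RATE_HZ / 1_000_000_000


def duration_ns(samples):
    """Duration in nanoseconds of a given number of sample periods."""
    return samples * 1e9 / SAMPLE_RATE_HZ


# 312.5 ns at 528 MHz is exactly 165 sample periods (128 + 32 + 5).
symbol_samples = samples_in(SYMBOL_NS)

# 128 useful samples must be produced every 312.5 ns.
required_throughput_msps = float(FFT_POINTS / SYMBOL_NS * 1000)

print(symbol_samples, required_throughput_msps)
print(round(duration_ns(32), 2), round(duration_ns(5), 2))
```

Under these assumptions the required rate of 409.6 MSamples/s is the average over a whole symbol, while a pipeline that accepts one sample per 528 MHz clock cycle runs at 528 MSamples/s.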

    Proceedings of the 2008 International Conference on Computing, Communication and Networking (ICCCN 2008) 978-1-4244-3595-1/08/$25.00


    Fig. 1. Block diagram of the MB-OFDM UWB receiver system

    TABLE 1 COMPARISON OF HARDWARE REQUIREMENTS FOR N-LENGTH FFT WITH DIFFERENT ARCHITECTURES

    Architecture   Complex multipliers   Complex adders   Memory size   Control circuit
    R2SDF          log2(N)-2             log2(N)          N-1           simple
    R2MDC          log2(N)-2             4log4(N)         3N/2-2        simple
    R4SDF          log4(N)-1             log4(N)          N-1           medium
    R4MDC          3(log4(N)-1)          8log4(N)         5N/2-4        simple
    R2^2SDF        log4(N)-1             4log4(N)         N-1           simple
    R2^3SDF        log8(N)-1             4log4(N)         N-1           simple
    R2^4SDF        log16(N)-1            4log4(N)         N-1           simple

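    III. PROPOSED RADIX-2^4 SDF ALGORITHM

    A discrete Fourier transform (DFT) of length N (= 128) is defined as

    $$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad 0 \le k < N \tag{1}$$

    where $W_N = e^{-j2\pi/N}$, the so-called "twiddle factor", denotes the N-th primitive root of unity, with its exponent evaluated modulo N. Here k is the frequency index and n is the time index. In order to derive the radix-2^4 algorithm, consider the first steps of decomposition [6]. Applying a 5-dimensional linear index map, wherein the 5th dimension is itself decomposed into a 2-bit and a 1-bit index, we have

    $$n = \left\langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + \tfrac{N}{16} n_4 + n_5 \right\rangle_N, \qquad k = \left\langle k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 \right\rangle_N \tag{2}$$

    The common factor algorithm (CFA) takes the form of

    $$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5) = \sum_{n_5=0}^{N/16-1} \sum_{n_4=0}^{1} \sum_{n_3=0}^{1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + \tfrac{N}{16} n_4 + n_5\right) W_N^{nk} \tag{3}$$

    Carrying out the summations over $n_1$ and $n_2$ reduces (3) to

    $$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5) = \sum_{n_5=0}^{N/16-1} \sum_{n_4=0}^{1} \sum_{n_3=0}^{1} H(n', k_1, k_2)\, W_N^{n'k}, \qquad n' = \tfrac{N}{8} n_3 + \tfrac{N}{16} n_4 + n_5$$

    where H denotes the second butterfly unit,

    $$H(n, k_1, k_2) = B(n, k_1) + (-j)^{(k_1 + 2k_2)}\, B\!\left(n + \tfrac{N}{4}, k_1\right) \tag{4}$$

    and B(n, k_1) denotes the first butterfly unit as follows.

    $$B(n, k_1) = x(n) + (-1)^{k_1}\, x\!\left(n + \tfrac{N}{2}\right) \tag{5}$$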
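The role of the first butterfly can be checked numerically. The sketch below is an illustration rather than the authors' code: it assumes the forward-DFT convention W_N = exp(-j2π/N) and the first-butterfly form B(n, k1) = x(n) + (-1)^k1 x(n + N/2), and confirms that summing B(n, k1) W_N^(nk) over only half the samples reproduces the direct DFT. The function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def first_butterfly(x, n, k1):
    """B(n, k1) = x(n) + (-1)^k1 * x(n + N/2)."""
    half = len(x) // 2
    return x[n] + (-1) ** k1 * x[n + half]


def dft_via_first_butterfly(x):
    """DFT computed after the first decomposition step.

    For every output index k, k1 is its least significant bit, and the
    remaining sum runs over the N/2 butterfly outputs only.
    """
    n_pts = len(x)
    half = n_pts // 2
    result = []
    for k in range(n_pts):
        k1 = k % 2
        result.append(
            sum(
                first_butterfly(x, n, k1) * cmath.exp(-2j * cmath.pi * n * k / n_pts)
                for n in range(half)
            )
        )
    return result
```

Comparing `dft(x)` with `dft_via_first_butterfly(x)` for any 128-sample input should show agreement to within floating-point rounding; the later butterflies repeat the same step on successively shorter distances.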

    The algorithm can use complex constant multipliers instead of programmable complex multipliers. The Canonic Signed Digit (CSD) constant multiplier contains the fewest non-zero bits, so it can be used to reduce the area and power consumption [7]. Fig. 2 shows the signal flow graph (SFG) of the 128-point radix-2^4 SDF FFT algorithm.
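To illustrate why a CSD constant needs few adders, the following sketch recodes a non-negative integer constant into signed digits from {-1, 0, +1} with no two adjacent non-zero digits, and then multiplies by it using only shifts, additions and subtractions. It is a software model under our own assumptions (integer constants, hypothetical function names), not the hardware multiplier of the paper.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or +1, and no two adjacent digits are both non-zero.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative constants only")
    digits = []
    while value != 0:
        if value & 1:
            # +1 when value % 4 == 1, -1 when value % 4 == 3
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in `digits` using shift-and-add only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


def nonzero_count(digits):
    """Number of add/subtract terms the constant requires."""
    return sum(1 for d in digits if d != 0)
```

For example, the constant 7 is binary 111 (three non-zero bits) but its CSD form is 8 - 1, so `to_csd(7)` has only two non-zero digits and the product needs a single subtraction.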

    Fig. 2. Signal flow graph of the proposed R2^4SDF algorithm

    IV. PROPOSED FFT ARCHITECTURE FOR THE MB-OFDM UWB SYSTEM

    A block diagram of the proposed single data-path 128-point R2^4SDF processor is shown in Fig. 3. The proposed architecture consists of a memory block, butterfly units (BF1, BF2), programmable complex multipliers, CSD complex constant multipliers, register files, and some multiplexers. The FFT processor can be transformed into an IFFT block by performing the operation shown in Fig. 4. The outputs of the butterfly units are the complex sum and the complex difference of two input samples, x[n] and a sample at a stage-dependent distance from it (x[n + N/2] in the first stage), where N = 128.

    Due to the spatial regularity of the radix-2^4 algorithm, the synchronization control of the processor is very simple. A (log2 N)-bit binary counter serves two purposes: synchronization controller and address counter for twiddle factor reading in each stage. For the first N/2 cycles, the 2-to-1 multiplexers in butterfly module I (as shown in Fig. 5.i) switch to position "0", and the butterfly is idle. The input data from the left is directed to the shift registers until they are filled. In the next N/2 cycles, the multiplexers turn to position "1", and the butterfly computes a 2-point DFT of the incoming data and the data stored in the shift registers.

    $$Z_1(n) = x(n) + x(n + N/2), \qquad Z_1(n + N/2) = x(n) - x(n + N/2), \qquad 0 \le n < N/2 \tag{6}$$
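The fill-then-compute behaviour of the first butterfly can be mimicked with a short behavioural model. This is a sketch under simplifying assumptions (integer samples, one sample per cycle, a `deque` standing in for the shift registers, and a cycle counter standing in for the multiplexer select); the function name is hypothetical and the model omits the twiddle multiplication that follows the butterfly.

```python
from collections import deque


def sdf_butterfly_stage(samples, delay):
    """Stream samples through one radix-2 SDF butterfly.

    `delay` is the depth of the feedback shift register (N/2 for the first
    stage of an N-point transform). Returns one output per input sample.
    """
    fifo = deque([0] * delay)  # feedback shift registers, initially empty
    outputs = []
    for cycle, x in enumerate(samples):
        stored = fifo.popleft()
        if (cycle // delay) % 2 == 0:
            # Multiplexers at "0": the butterfly is idle. The input fills the
            # registers while differences left over from the previous frame
            # drain to the output.
            outputs.append(stored)
            fifo.append(x)
        else:
            # Multiplexers at "1": 2-point DFT of the stored and incoming data.
            outputs.append(stored + x)  # Z1(n), passed on to the next block
            fifo.append(stored - x)     # Z1(n + N/2), fed back to the registers
    return outputs


# One 8-sample frame followed by 4 flush samples (the start of the next frame).
frame = list(range(8))
out = sdf_butterfly_stage(frame + [0] * 4, delay=4)
print(out[4:8])   # sums x(n) + x(n + 4)
print(out[8:12])  # differences x(n) - x(n + 4)
```

In this model the sums appear during the second half of the frame and the differences during the first half of the following frame, which is why consecutive frames can be processed without a pause.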

    The butterfly output Z1(n) is passed on to the twiddle factor multiplication, and Z1(n + N/2) is sent back to the shift registers to be "multiplied" in the following N/2 cycles, when the first half of the next frame of the time sequence is loaded in. The operation of the second butterfly is similar to that of the first one, except that the "distance" of the butterfly input sequence is just N/4 and the trivial twiddle factor multiplication is implemented by real-imaginary swapping with a commutator and controlled add/subtract operations, as in Fig. 5.ii, which requires a two-bit control signal from the synchronizing counter. The data then goes through a full complex multiplier, working at 75% utility, which accomplishes the result of the first level of the radix-4 DFT word by word. Further processing repeats this pattern, with the distance of the input data decreasing by half at each consecutive butterfly stage. After N-1 clock cycles, the complete DFT result streams out to the right, in bit-reversed order. The next frame can be transformed without pausing, owing to the pipelined processing of each stage.

    Single-data-path architectures based on the radix-2^4 FFT algorithm have fewer multipliers than those based on lower-radix FFT algorithms. For example, the radix-2^4 algorithm has the same number of multipliers as the radix-2^2 algorithm but reduces the multiplicative complexity by replacing half of the full complex multipliers with trivial constant multipliers [8]. In the CSD complex constant multiplier, the multiplication by the twiddle factors is processed according to their scheduling in the signal flow graph. The output data generated by the BF in the sixth stage are multiplied by a trivial twiddle factor, -j, W(16) or W(48), before they are fed to the last stage.

    Fig. 3. Block diagram of the FFT/IFFT processor

    Fig. 4. Block diagram of the proposed 128-point R2 SDF FFT/IFFT processor

    Fig. 5.i. Structure of BF1

    Authorized licensed use limited to: VELLORE INSTITUTE OF TECHNOLOGY. Downloaded on August 3, 2009 at 08:01 from IEEE Xplore. Restrictions apply.

    The Simplification of the Complex Multiplication

    Complex multiplication is the key design issue in the FFT algorithm. Consider a complex multiplication whose two inputs are the data $x_r + j x_i$ and the coefficient $W = \exp(j2\pi/N) = \cos\alpha + j\sin\alpha$. The result can be expressed as $Y = y_r + j y_i$, where

    $$\begin{aligned} y_r &= x_r\cos\alpha - x_i\sin\alpha = x_i(\cos\alpha - \sin\alpha) - (x_i - x_r)\cos\alpha \\ y_i &= x_i\cos\alpha + x_r\sin\alpha = x_r(\cos\alpha + \sin\alpha) + (x_i - x_r)\cos\alpha \end{aligned} \tag{7}$$

    Fig. 5.ii. Structure of BF2

    After the transformation of Eq. (7), the complex multiplication needs only 3 real multiplications, 1 addition and 2 subtractions, provided that the sum and the difference of the real and imaginary parts of the twiddle factor are precomputed and stored in the ROM. This algorithm is used for the programmable complex multiplier to reduce the hardware complexity and to increase the speed.
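A floating-point sketch of such a three-multiplication complex multiplier is given below. The grouping of terms is one algebraically valid arrangement chosen by us to match the stated operation count (3 multiplications, 1 addition, 2 subtractions); the ROM entry layout and the function names are assumptions made for illustration.

```python
import math


def twiddle_rom_entry(alpha):
    """Precomputed ROM contents for W = cos(alpha) + j*sin(alpha).

    Stores cos, (cos - sin) and (cos + sin) so the multiplier itself never
    has to add or subtract the coefficient parts.
    """
    c, s = math.cos(alpha), math.sin(alpha)
    return c, c - s, c + s


def complex_mult_3(xr, xi, rom_entry):
    """Compute (xr + j*xi) * W with three real multiplications."""
    c, c_minus_s, c_plus_s = rom_entry
    shared = (xi - xr) * c        # multiplication 1, subtraction 1
    yr = xi * c_minus_s - shared  # multiplication 2, subtraction 2
    yi = xr * c_plus_s + shared   # multiplication 3, addition 1
    return yr, yi
```

Expanding the products shows that `yr` equals xr·cos α - xi·sin α and `yi` equals xi·cos α + xr·sin α, so the result matches the ordinary four-multiplication form while sharing the term (xi - xr)·cos α between both outputs.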

    CSD Multiplier

    Since the twiddle factors in the FFT processor are known in advance, we propose the use of a multiplier-less architecture to perform the multiplication by the twiddle factors using shift-and-add operations. The canonical signed digit (CSD) algorithm has been applied to this architecture to further reduce the number of shift-and-add operations required. In this architecture, trivial multiplications are implemented without any multipliers by passing the data through, swapping the real and imaginary parts of the complex data, or changing the sign. The design presented in this paper takes advantage of the symmetries of the twiddle factors in the complex plane. When the real and imaginary values of a twiddle factor are the same, two CSD constant multipliers and two adder/subtractors are used to generate the output. When the real and imaginary values are not the same, three CSD constant multipliers are used. If an input does not need to be multiplied by a twiddle factor, the output is generated from the input directly.

    Pipelining

    The radix-2^4 architecture was thoroughly analyzed to find possible areas to be pipelined, based on the design and the critical path delays between the various implemented blocks. The processor was extensively pipelined to achieve the high working frequency needed to meet the UWB specification. Shimming registers are also needed for the control signals to comply with the revised timing.

    V. IMPLEMENTATION AND PERFORMANCE

    The external word length of the proposed FFT is 6 bits [9] for both the real and imaginary parts. The 2's complement representation of numbers is used in the processor. Due to overflow in each adder of the butterfly unit, a 13-bit internal FFT precision has been maintained. The chosen word length not only keeps the quantization noise low but also minimizes the hardware complexity. After the appropriate word length of the proposed FFT/IFFT processor was chosen, the architecture of the processor was modeled in Verilog for an ALTERA Stratix III FPGA. Some of the modules were generated with the ALTERA MegaWizard Plug-In Manager and others were written at the RTL level, including the top-level wrapper file. It contains all the instantiated modules and the connectivity information in RTL (Verilog HDL). The TimeQuest timing analyzer and Chip Planner (Floorplan and Chip Editor) of QUARTUS II 8.0 were applied to analyze timing, hardware expenditure and so on.

    Vector waveforms associated with the RTL description were created and the stimulus was provided in an external file. Using the vector waveform file, simulations were carried out to validate the behavioral description of the design. The results were obtained incrementally, first for a sub-block comprising one module of the FFT. Finally, the results were obtained for the whole design, comprising seven such sub-blocks, the global clock and dual-port RAMs. The output of the Verilog-coded architecture agreed with the output data of MATLAB and the FFT/IFFT in our UWB platform, which was designed on an EXCEL worksheet that clearly depicts the outputs with the signal flow graph.

    TABLE 2 IMPLEMENTATION RESULTS OF THE PROPOSED PROCESSOR

    Family                            ALTERA Stratix III     ALTERA Stratix II
    Device                            EP3SL50F484C2          EP2S60F1020C4
    ALUTs                             7972/38000 (3%)        7822/48352 (16%)
    ALMs                              3986/19000 (3%)        4375/19000 (3%)
    DSP block elements                6/216 (3%)             6/288 (2%)
    Total memory bits                 3328/1880064 (<1%)     8192/2544192 (<1%)
    Word length                       I: 6 bits, Q: 6 bits   I: 6 bits, Q: 6 bits
    Number of registers               7580/38000 (20%)       7697/38000 (20%)
    Programmable complex multipliers  1                      1
    Constant complex multipliers      2                      2
    Number of complex adders          28                     28
    Clock rate                        528 MHz                350 MHz
    Throughput rate                   528 MSamples/s         350 MSamples/s
    Critical path delay               1.87 ns                2.87 ns

    The implementation of the proposed processor was carried out on a Stratix II EP2S60F1020C4 device and simulated for an ALTERA Stratix III EP3SL50F484C2. The input data is supplied through a dual-port RAM, and a PLL unit is used to provide the required clock frequency. The output is checked using a dual-port RAM and the in-system memory content editor. Table 2 shows the performance and resource usage of the implemented processor. It shows that the processor is area efficient, so the entire MB-OFDM receiver/transmitter with the other modules can be accommodated in a single chip. The processor also has a significantly reduced number of complex multiplications and complex additions. The critical path delay occurs between the input RAM and the first butterfly unit, so the processor is capable of running at UWB speeds if implemented within a larger system.

    All the previous implementations were on ASICs [9], so a comparison with them is not meaningful. Table 3 compares the performance of different FFT processors implemented on FPGAs. The validity and efficiency of the proposed architecture have been verified by extensive simulation and implementation. Fig. 6 shows the implementation results of the proposed FFT processor.
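The choice of a 13-bit internal word length for 6-bit inputs can be motivated by tracking the worst-case range through the butterfly adders. The sketch below is our own bookkeeping, not taken from the paper: it assumes 2's complement data, counts only the growth caused by the add/subtract in each of the 7 butterfly stages, and ignores any growth or rounding in the twiddle multipliers.

```python
def signed_bits_needed(lo, hi):
    """Smallest 2's complement width whose range covers [lo, hi]."""
    bits = 1
    while lo < -(1 << (bits - 1)) or hi > (1 << (bits - 1)) - 1:
        bits += 1
    return bits


def butterfly_growth(input_bits, stages):
    """Word length required after each butterfly stage, starting with the input."""
    lo = -(1 << (input_bits - 1))
    hi = (1 << (input_bits - 1)) - 1
    widths = [signed_bits_needed(lo, hi)]
    for _ in range(stages):
        # A butterfly outputs a + b and a - b for operands a, b in [lo, hi].
        lo, hi = min(2 * lo, lo - hi), max(2 * hi, hi - lo)
        widths.append(signed_bits_needed(lo, hi))
    return widths


WIDTHS = butterfly_growth(input_bits=6, stages=7)
print(WIDTHS)
```

Under these assumptions each stage adds exactly one bit, so seven stages turn a 6-bit input into a 13-bit result, in line with the word lengths reported in this section.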

    TABLE 3 COMPARISON OF THE PERFORMANCE OF DIFFERENT PROCESSORS

    Family                                                    Maximum frequency
    Altera FFT MegaCore function on Stratix III [10]          456 MHz
    Proposed processor on ALTERA Stratix II EP2S60F1020C4     350 MHz
    Proposed processor on ALTERA Stratix III EP3SL50F484C2    528 MHz

    VI. CONCLUSION

    An OFDM module, realized as a 128-point FFT/IFFT processor for an FPGA-based MB-OFDM UWB system using the proposed modified radix-2 SDF pipelined algorithm, has been successfully implemented on ALTERA Stratix III and Stratix II FPGAs without using parallel processing architectures. The high speed is achieved by using extensive pipelining together with Altera's LPMs. The hardware cost of memory and complex multipliers is reduced by adopting delay feedback and data scheduling approaches. In addition, the number of complex multiplications is reduced effectively by using a higher-radix algorithm and CSD complex multipliers. To improve accuracy, the internal word length in the proposed scheme is maintained at 13 bits, which is 7 bits more than the input, to account for overflow at each of the 7 stages of the OFDM module. The implementation results show that the throughput rate is 350 MSamples/s at 350 MHz on the ALTERA Stratix II device and 528 MSamples/s at 528 MHz on the ALTERA Stratix III device. The high throughput rate of the OFDM module, with the internal word length increased from 6 bits to 13 bits to improve accuracy, comfortably meets the specifications of the MB-OFDM UWB system.

    Fig. 6. Results of the implemented processor

    VII. REFERENCES

    [1] Time Domain, "UWB Applications, Demonstration & Regulatory Update," Sept 2001 workshop, March 20, 2001.

    [2] A. Batra et al., "Multi-band OFDM Physical Layer Proposal for IEEE 802.15 Task Group 3a," IEEE P802.15-03/268r3, March 2004.

    [3] A. Batra, J. Balakrishnan, G. R. Aiello, J. R. Foerster, A. Dabak, "Design of Multiband OFDM System for Realistic UWB Channel Environment," IEEE Trans. on Microwave Theory and Techniques, vol. 52, no. 9, pp. 2123-2138, Sept. 2004.

    [4] Y-W. Lin, H-Y. Liu, and C-Y. Lee, "A 1-GS/s FFT/IFFT processor for UWB applications," IEEE Journal of Solid-State Circuits, vol. 40, no. 8, pp. 1726-1735, August 2005.

    [5] S. He and M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," in Proc. URSI Int. Symp. Signals, Systems, and Electronics, vol. 29, Oct. 1998, pp. 257-262.

    [6] J. Lee, H. Lee, S-I. Cho, S-S. Choi, "A High-Speed, Low-Complexity Radix-2 FFT Processor for MB-OFDM UWB Systems," IEEE Inter. Symp. on Circuits and Systems, pp. 4719-4722.

    [7] S-M. Kim, J-G. Chung, and K. K. Parhi, "Low Error Fixed-width CSD Multiplier with Efficient Sign Extension," IEEE Transactions on Circuits and Systems-II, vol. 50, no. 12, Dec. 2003.

    [8] H. Lee, M. Shin, "A High-Speed Low-Complexity Two-Parallel Radix-2 FFT/IFFT Processor for UWB Applications," IEEE Asian Solid-State Circuits Conference, November 2007.

    [9] R. S. Sherratt, S. Makino, "Numerical Precision Requirements on the Multiband Ultra-Wideband System for Practical Consumer Electronic Devices," IEEE Transactions on Consumer Electronics, vol. 51, no. 2, May 2005.

    [10] FFT MegaCore Function User Guide, MegaCore Version 7.2, www.altera.com