Unified Architecture for Reed-Solomon Decoder Combined With Burst-Error Correction




  • 1346 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 7, JULY 2012


    Unified Architecture for Reed-Solomon Decoder Combined With Burst-Error Correction

    Li Li, Bo Yuan, Zhongfeng Wang, Jin Sha, Hongbing Pan, and Weishan Zheng

    Abstract: Reed-Solomon (RS) codes are widely used as forward error correction (FEC) codes in digital communication and storage systems. Correcting random errors with RS codes has been extensively studied in both academia and industry. However, research on burst-error correction is still quite limited because of its very high computational complexity. In this brief, starting from a recent theoretical work, a low-complexity reformulated inversionless burst-error correcting (RiBC) algorithm is developed for practical applications. Then, based on the proposed algorithm, a unified VLSI architecture capable of correcting burst errors, as well as random errors and erasures, is presented for the first time to meet multi-mode decoding requirements. This new architecture is denoted the unified hybrid decoding (UHD) architecture. It will be shown that, as the first RS decoder with enhanced burst-error correcting capability, it achieves significantly better error-correcting capability than the traditional hard-decision decoding (HDD) design.

    Index Terms: Burst errors, Reed-Solomon (RS) codes, unified architecture, VLSI.

    I. INTRODUCTION

    Reed-Solomon (RS) codes have been widely employed for error correction in modern digital communication and data storage systems. As with other forward error correction (FEC) codes, when RS codes are used for channel coding, the errors that occur during transmission are typically divided into random errors and burst errors. For decoding RS codes with random-error correction, numerous studies have covered both theoretical algorithms and hardware implementations [1]–[3], [8]. However, for dedicated RS burst-error decoder design, although some dedicated algorithms have been reported [4], [5], VLSI implementations of these burst-error correcting algorithms remain under-investigated. They are limited by the algorithms' cubic computational complexity, expressed in terms of the codeword length and the traditional correction capability. Recently, a novel low-complexity RS burst-error correcting algorithm was proposed by Wu [6]. It can be proved that the algorithm is capable of correcting a long burst of errors together with possible random errors.

    Manuscript received August 4, 2010; revised December 11, 2010 and April 4, 2011; accepted April 20, 2011. Date of publication June 16, 2011; date of current version June 1, 2012. This work was supported in part by the National Natural Science Foundation of China under Grant 60876017 and Grant 61006018, by the Joint Prospective Funds for Production, Education, and Research of Jiangsu Province under Grant 2009146, and by the Fundamental Research Funds for the Central Universities under Grant 1095021031. This paper was presented in part at the 8th IEEE International Conference on ASIC, Changsha, China, October 2009.

    L. Li, B. Yuan, J. Sha, H. Pan, and W. Zheng are with the Institute of VLSI Design, Nanjing University, Nanjing 210093, China.

    Z. Wang is with Broadcom Corporation, Irvine, CA 92617 USA.

    Digital Object Identifier 10.1109/TVLSI.2011.2154369

    In this brief, building on the above algorithm, a high-speed reformulated inversionless burst-error correcting (RiBC) algorithm is proposed, and a unified hybrid decoding (UHD) architecture that supports three decoding modes is presented for the first time. It will be shown that, compared with a traditional RS decoder, the proposed UHD architecture achieves significantly better burst-error correcting capability.

    The rest of this brief is organized as follows. Section II describes the proposed RiBC algorithm. The architecture and latency of the new UHD decoder are presented in Section III. Section IV provides the hardware performance and comparison. Conclusions are drawn in Section V.

    II. PROPOSED RIBC ALGORITHM

    A. Original Burst-Error Correcting (BC) Algorithm

    In [6], Wu proposed a new approach to track the position of a burst of errors. By introducing a new polynomial that is a special linear function of the syndromes, Wu proved that the desired single burst of errors can be located by tracking the longest run of consecutive roots of the new polynomial. Furthermore, that approach was extended to the BC algorithm for correcting a long burst of errors of bounded length plus a limited number of random errors.

    In the BC algorithm, a pre-chosen parameter determines the specific error-correcting capability: it sets both the length of the burst that the decoder can correct and the maximum number of accompanying random errors. In this case, the miscorrection probability is upper bounded; readers are strongly encouraged to refer to [6] for a detailed description of the BC algorithm.

    1063-8210/$26.00 © 2011 IEEE

    B. Proposed Reformulated Inversionless Burst-Error Correcting (RiBC) Algorithm

    Although the BC algorithm has reduced computational complexity, several disadvantages impede its efficient VLSI design: 1) an inversion operation exists in step A2.2.4; 2) the computations in steps A2.2.1 and A2.2.2 contain a long data path and data dependency; 3) the calculation in step A4 requires extra cycles or another copy of the original circuitry.

    To resolve these problems, we apply an arithmetic transformation similar to the one presented in [1] and reformulate the BC algorithm into the proposed RiBC algorithm.
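    To illustrate what removing the inversion means, the following is a minimal sketch of the textbook inversionless Berlekamp-Massey iteration in the style of [1], applied to random errors only. It is not the RiBC algorithm itself, whose steps are given in the original brief. The field GF(2^8) with primitive polynomial 0x11D, the syndrome indexing, and all function names are assumptions made for this example.

```python
# GF(2^8) tables; the primitive polynomial 0x11D and alpha = 2 are assumptions.
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i], LOG[_v] = _v, _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements via log/antilog tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(p, x):
    """Evaluate p (p[i] is the coefficient of x^i) with Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def syndromes_of(errors, two_t):
    """Syndromes s_j = e(alpha^(j+1)) of an error pattern {position: value}."""
    out = []
    for j in range(two_t):
        acc = 0
        for pos, val in errors.items():
            acc ^= gf_mul(val, EXP[((j + 1) * pos) % 255])
        out.append(acc)
    return out


def inversionless_bm(s, t):
    """Return a nonzero scalar multiple of the error locator polynomial."""
    lam = [1] + [0] * t          # lambda(x), t + 1 coefficients
    b = [1] + [0] * t            # auxiliary polynomial b(x)
    gamma, k = 1, 0              # gamma replaces the division by the old discrepancy
    for r in range(2 * t):
        delta = 0                # discrepancy: sum_i lam[i] * s[r - i]
        for i in range(min(r, t) + 1):
            delta ^= gf_mul(lam[i], s[r - i])
        xb = [0] + b[:t]         # x * b(x), truncated to degree t
        new_lam = [gf_mul(gamma, lam[i]) ^ gf_mul(delta, xb[i])
                   for i in range(t + 1)]
        if delta != 0 and k >= 0:
            b, gamma, k = lam, delta, -k - 1
        else:
            b, k = xb, k + 1
        lam = new_lam
    return lam


# Hypothetical error pattern for an RS(255, 239) word (t = 8).
errors = {3: 0x1F, 100: 0xA2, 254: 0x01}
lam = inversionless_bm(syndromes_of(errors, 16), t=8)
# Position p is an error location when lambda(alpha^-p) = 0.
found = [p for p in range(255) if poly_eval(lam, EXP[(255 - p) % 255]) == 0]
print(found)
```

    The update multiplies the current locator by the stored discrepancy `gamma` instead of dividing by it, so the result is a scaled locator with the same roots and no field inverter is needed.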

    The RiBC algorithm is a kind of list decoding algorithm. Eight polynomials are updated simultaneously in each iteration. After each complete set of inner iterations, a candidate for the error locator polynomial of the random errors is computed for the current outer iteration. When the outer iterations are complete, we track the candidate that stays identical over the longest run of consecutive outer iterations and record the last iteration of that run. The corresponding polynomials at that iteration are then marked as the overall error locator polynomial and the error evaluator polynomial, respectively. Finally, the Forney algorithm is used to calculate the error value at each error position, with an upper-bounded miscorrection probability.
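    The final Forney step can be sketched as follows. This is the textbook form for syndromes evaluated at alpha^1 through alpha^(2t), with the locator built directly from known positions. The evaluator produced by an inversionless key equation solver may be scaled or shifted differently, so the field polynomial 0x11D, the indexing, and the function names are assumptions of this sketch.

```python
# GF(2^8) tables; the primitive polynomial 0x11D and alpha = 2 are assumptions.
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i], LOG[_v] = _v, _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide a by a nonzero b in GF(2^8)."""
    return 0 if a == 0 else EXP[LOG[a] + 255 - LOG[b]]


def poly_eval(p, x):
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)
    return out


def syndromes_of(errors, two_t):
    """Syndromes s_j = e(alpha^(j+1)) of an error pattern {position: value}."""
    out = []
    for j in range(two_t):
        acc = 0
        for pos, val in errors.items():
            acc ^= gf_mul(val, EXP[((j + 1) * pos) % 255])
        out.append(acc)
    return out


def forney(synd, positions):
    """Return {position: error value} for known, distinct error positions."""
    lam = [1]
    for p in positions:
        lam = poly_mul(lam, [1, EXP[p % 255]])       # factor (1 + alpha^p x)
    omega = poly_mul(synd, lam)[:len(synd)]          # S(x) * Lambda(x) mod x^(2t)
    # Formal derivative in characteristic 2: only odd-degree terms survive.
    deriv = [c if i % 2 == 1 else 0 for i, c in enumerate(lam)][1:]
    values = {}
    for p in positions:
        x_inv = EXP[(255 - p) % 255]                 # alpha^-p
        values[p] = gf_div(poly_eval(omega, x_inv), poly_eval(deriv, x_inv))
    return values


# Hypothetical error pattern for an RS(255, 239) word.
errors = {3: 0x1F, 100: 0xA2, 254: 0x01}
recovered = forney(syndromes_of(errors, 16), sorted(errors))
print(recovered == errors)
```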

    The proposed RiBC algorithm targets the correction of a burst of errors plus some random errors. For the case where the channel condition guarantees that only a single long burst of errors occurs, Wu [6] presented a low-complexity single long burst of errors correcting (sLBC) algorithm. The sLBC algorithm is a special version of the RiBC algorithm, and its miscorrection probability is also upper bounded.

    In the next section, a unified hybrid decoding architecture that can implement the RiBC, sLBC, and classical random errors-and-erasures correcting (rEEC) algorithms [2] will be presented. Hence, at the end of this section, for the reader's convenience, we introduce the rEEC algorithm. A detailed description of this algorithm can be found in [2].

    III. PROPOSED UNIFIED HYBRID DECODING ARCHITECTURE

    The proposed RiBC algorithm is very effective for correcting a combination of burst errors and random errors (mode-1), while the sLBC and rEEC algorithms are well suited to single-burst correction (mode-2) and random error-and-erasure correction (mode-3), respectively. Comparing the three algorithms shows that they share many common or similar computation steps. Based on this similarity, a unified hybrid decoding (UHD) architecture capable of correcting these three types of error patterns (referred to as three work modes) is given in this section.

    Fig. 1 shows the overall architecture of the UHD decoder. Three types of lines illustrate the data flows for the different work modes: solid lines for mode-1, dashed lines for mode-2, and dotted lines for mode-3. Different blocks are used to process different steps. Since the SC and CSEE blocks have been widely discussed in the previous literature, their architectures are not discussed in this brief.

    Fig. 1. Overall architecture of proposed UHD decoder.

    Fig. 2. Block diagram of -Block.
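    For readers unfamiliar with the front end, the sketch below shows a conventional syndrome computation, assuming that SC denotes syndrome computation (the acronym is not expanded in the text). The field polynomial 0x11D, the first root alpha^1, and the function names are assumptions, not details taken from the brief.

```python
# GF(2^8) tables; the primitive polynomial 0x11D and alpha = 2 are assumptions.
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i], LOG[_v] = _v, _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(p, x):
    """Evaluate p (p[i] is the coefficient of x^i) with Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def syndromes(received, two_t, first_root=1):
    """Evaluate the received word at alpha^first_root ... alpha^(first_root + 2t - 1)."""
    return [poly_eval(received, EXP[(first_root + j) % 255]) for j in range(two_t)]


# The all-zero word is a codeword of any linear code, so its syndromes vanish.
clean = [0] * 255
# A single hypothetical symbol error of value 0x53 at position 10.
corrupted = list(clean)
corrupted[10] = 0x53
print(syndromes(clean, 16))
print(syndromes(corrupted, 16))
```

    With a single error of value v at position p, the j-th syndrome is v times alpha^((j+1)p), which is the structure the later decoding steps exploit.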

    A. -Block Architecture

    This block is used to process steps B1, C1, or D1 in the different work modes. Regardless of which work mode is selected, the computation always takes the same form (1), with operands that depend on the selected mode. By feeding the inputs to the block serially, (1) can be implemented as shown in Fig. 2.

    In Fig. 2, once the required work mode is selected, the left-most register is initialized to a mode-specific value. Then, after a number of clock cycles that depends on the selected mode, each accumulate unit has computed its corresponding coefficient of the output polynomial. Note that if, in mode-3, the decoder detects that the current received symbol is not an erasure, input 0 of the multiplexer is selected.

    B. -Block Architecture

    Steps B2.1 and C3.1 are implemented in this block (Fig. 3). For these steps, the common operation is a multiply-accumulate for each coefficient of the polynomial. Only a slight difference exists in step C3.1: it is a Chien search-like step, so an extra adder tree is required to verify the validity of the current received symbol. Note that this block is idle in mode-3.

    1) In mode-1, the coefficients of the input polynomial are fed into the multiply-accumulate units for iterated multiplication. For each outer iteration in step B2.1, the current coefficients must be held for several cycles, so 3:1 multiplexers are introduced to let the lower registers keep them during that interval. Once the outer iteration index increases by 1, the lower registers output the updated coefficients to the following block one cycle later.

    2) In mode-2, the coefficients of a different polynomial are selected as the input. Then, using the adder tree and zero detection, the block finds the roots of that polynomial over a number of cycles (step C3.1).
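    The adder-tree and zero-detect structure mentioned for step C3.1 resembles a conventional Chien search, which can be sketched as below. This models a generic Chien search over GF(2^8), not the exact polynomial or schedule of step C3.1. The field polynomial 0x11D and the function names are assumptions.

```python
# GF(2^8) tables; the primitive polynomial 0x11D and alpha = 2 are assumptions.
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i], LOG[_v] = _v, _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)
    return out


def chien_search(lam):
    """Return the positions p for which lam(alpha^-p) = 0."""
    regs = list(lam)                  # register i starts with coefficient lam[i]
    positions = []
    for cycle in range(255):
        total = 0
        for r in regs:                # adder tree: XOR of all registers
            total ^= r
        if total == 0:                # zero detect: lam(alpha^cycle) == 0
            positions.append((255 - cycle) % 255)
        # Multiply-accumulate step: register i is scaled by alpha^i each cycle,
        # so after c cycles it holds lam[i] * alpha^(i * c).
        regs = [gf_mul(r, EXP[i % 255]) for i, r in enumerate(regs)]
    return sorted(positions)


# Hypothetical locator with roots at alpha^-5, alpha^-17, and alpha^-200.
lam = [1]
for p in (5, 17, 200):
    lam = poly_mul(lam, [1, EXP[p]])
print(chien_search(lam))
```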

    Fig. 3. Block diagram of -block.

    Fig. 4. Block diagram of -block.

    C. -Block Architecture

    This block executes steps B2.2, C2, and D2. In essence, each of these steps is a multiplication of two polynomials, which can be implemented as shown in Fig. 4.

    In Fig. 4, the initial values of the registers are all set to 0, and the operating procedures for the three work modes are as follows.

    1) For mode-1, the coefficients of the input polynomial (step B2.2) are fed serially into the block. After the required number of cycles, the coefficients of the product polynomial are stored in the registers.

    2) For mode-2, unlike mode-1 and mode-3, the coefficients of the input polynomial (step C2) are fed into the block concurrently, and after only 1 cycle the coefficients of the product polynomial are calculated and stored in the registers.

    3) For mode-3, as in mode-1, the coefficients of the input polynomial (step D2) are fed serially into the block. After the required number of cycles, the coefficients of the product polynomial are stored in the registers.
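    The serial operation described for mode-1 and mode-3 can be modeled in software as a register bank that starts at 0 and accumulates one input coefficient per cycle. The sketch below assumes the coefficients arrive highest degree first and that one coefficient is consumed per cycle. Both are illustrative assumptions, as are the field polynomial 0x11D and the function names.

```python
# GF(2^8) tables; the primitive polynomial 0x11D and alpha = 2 are assumptions.
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i], LOG[_v] = _v, _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def schoolbook_mul(a, b):
    """Reference product used only to check the serial model."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)
    return out


def serial_poly_mul(a, b):
    """Multiply a(x) by b(x), consuming one coefficient of a(x) per cycle."""
    regs = [0] * (len(a) + len(b) - 1)     # all registers start at 0
    for a_i in reversed(a):                # highest-degree coefficient first
        regs = [0] + regs[:-1]             # shift: multiply the partial product by x
        for j, b_j in enumerate(b):        # parallel multiply-accumulate units
            regs[j] ^= gf_mul(a_i, b_j)
    return regs


a = [0x02, 0x00, 0x1D, 0x07]               # hypothetical coefficients, lowest degree first
b = [0x01, 0x8E, 0x33]
print(serial_poly_mul(a, b) == schoolbook_mul(a, b))
```

    In this model the cycle count equals the number of serial input coefficients, whereas feeding all coefficients concurrently, as described for mode-2, trades extra multipliers for a single-cycle result.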

    D. Key Equation Solver (KES) Block Architecture

    In the UHD decoder, the KES block carries out steps B2.4, C4, C5, and D4. Fig. 5 presents the overall architecture of the KES block and the internal structure of its two types of processing elements (PEs): PE0 and PE1.

    As shown in Fig. 5(a), the KES block consists of PE0s and PE1s. The detailed operating scheme is presented as follows.

    1) For mode-1 (step B2.4), in each iteration, every register in the PEs stores the corresponding coefficient of one of the different polynomials [see Fig. 5(b) and (c)]. In each outer iteration, the block takes a number of cycles to compute the coefficients of two updated polynomials. Meanwhile, the candidate polynomial is also computed and output to the PT block, which tracks the longest run of consecutive candidates that are identical.

    2) For mode-2, as mentioned above, the KES block carries out steps C4 and C5. Accordingly, both the initial register values and the input signals differ from those in mode-1, and the block operates on the following schedule:

    i) First, the PE0s compute step C4. In each PE0, the second uppermost register (denoted as group A) is initialized with a step-specific value; in addition, Ctrl1 and a second control signal are always set to 1 and 0, respectively. After the required number of cycles, the registers in group A store the coefficients of the resulting polynomial.

    ii) The subsequent step C5 is carried out by the PE1s. In each PE1, the uppermost register (denoted as group B) is initialized with 0, while the third uppermost register (denoted as group C) is given a step-specific initial value; meanwhile, Ctrl and a second control signal are always set to 0 and 1, respectively, and one further input is set cycle by cycle. After the required number of cycles, the registers in group B store the coefficients of the resulting polynomial.

    3) For mode-3, since step D4 has a form similar to that of step B2.4, the rEEC algorithm can be carried out directly by the KES block by substituting the corresponding inputs in each PE. Note that in this case only half of the hardware of each PE0/PE1 is utilized. Therefore, the total throughput can be doubled if two independent codewords are input.

    Fig. 5. (a) Overall architecture of KES block. (b) The block diagram of PE0. (c) The block diagram of PE1.

    Fig. 6. Architecture of PT block for mode-1.

    E. Position Track (PT) Block Architecture

    The PT block tracks either the longest run of consecutive polynomials that are identical (step B3) or the positions of roots (steps C3.2 and C3.3).

    1) Fig. 6 illustrates the architecture of the PT block for mode-1. The block receives the polynomial coefficients output by the KES block at each outer iteration. The registers labeled (temp) hold the coefficients received in the current iteration, the registers labeled (store) hold the coefficients of the current run of continuously identical polynomials, and the registers labeled (longest) hold the coefficients of the longest such run found so far. The control signals shift and equal are generated from Schedule A. After the last outer iteration, the contents of the (longest) registers are output as the coefficients of the overall error locator polynomial and the overall error evaluator polynomial.

    2) For mode-2, Schedule B is proposed to calculate the starting position and length of the single burst in the sLBC algorithm. Because root finding is already implemented in the block shown in Fig. 3, the PT block does not need to perform it again; it only receives the signal output by that block, which indicates whether the current evaluation is zero. The PT block can therefore implement Schedule B with a simple control unit, so the extra PT architecture for executing Schedule B is omitted here.
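    The mode-1 tracking behavior can be sketched in software with three variables that mirror the (temp), (store), and (longest) registers. Schedule A itself is not reproduced in the text, so the comparison rule, the tie-breaking (the earliest longest run wins), and the function name are assumptions of this sketch.

```python
def track_longest_identical(candidates):
    """Find the longest run of identical consecutive candidate polynomials.

    Returns (last_index, polynomial, run_length), where last_index is the
    outer iteration at which the winning run ends.
    """
    store, run = None, 0                       # current run and its length
    longest, longest_run, longest_last = None, 0, -1
    for index, temp in enumerate(candidates):  # temp: candidate of this iteration
        if temp == store:                      # plays the role of the 'equal' signal
            run += 1
        else:                                  # start a new run with this candidate
            store, run = temp, 1
        if run > longest_run:                  # strictly longer, so earlier ties win
            longest, longest_run, longest_last = store, run, index
    return longest_last, longest, longest_run


# Hypothetical candidates, one coefficient tuple per outer iteration.
candidates = [(1, 2), (1, 3), (1, 3), (1, 3), (1, 2), (1, 2)]
print(track_longest_identical(candidates))
```

    Because only the current run and the best run so far are kept, the storage does not grow with the number of outer iterations, which matches the small register set shown for the PT block.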

    F. Timing Chart of Proposed RS Decoder

    The timing charts for the three modes are illustrated in Fig. 7. The worst-case latency of each mode is counted in cycles, excluding the SC and CSEE blocks.

    IV. HARDWARE PERFORMANCE

    In this section, the hardware and error-correction performance of the proposed UHD decoder is given for an example RS (255, 239) code.
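    For the RS (255, 239) example, the standard code parameters work out as follows. The symbols $n$, $k$, $t$, $\nu$ (number of random errors), and $\rho$ (number of erasures) are conventional labels assumed here rather than taken from the text, and the last line is the textbook errors-and-erasures condition for bounded-distance decoding, not a statement about the burst-error modes.

```latex
\begin{align*}
n &= 255, \qquad k = 239, \qquad n - k = 16 \text{ parity symbols},\\
t &= \left\lfloor \frac{n - k}{2} \right\rfloor = 8,\\
2\nu + \rho &\le n - k = 16 .
\end{align*}
```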

    Table I presents the comparison between the proposed UHD and RiBM decoders for the employed RS (255, 239) code. The hardware complexity is estimated based on the work in [8], and the throughput has been scaled accordingly. Although the area requirement of the UHD decoder is about 1.7 times that of the RiBM decoder, the UHD decoder achieves significantly enhanced burst-error correcting capability with multiple work modes. In channel environments that are likely to generate long bursts of errors, such as high-density storage systems, the traditional RiBM decoder fails to decode the codewords because of its limited error-correcting capability, while the UHD decoder remains effective (mode-1 and mode-2). For random error-and-erasure correction (mode-3), the proposed UHD design has lower throughput than RiBM. However, since only half of the KES block resources are utilized in this mode, employing one additional copy of the SC, CSEE, FIFO, and polynomial computation blocks would approximately double the throughput by feeding two independent codewords into the decoder, which would significantly outperform the RiBM architecture.

    Fig. 7. Timing charts for different work modes: (a) mode-1, (b) mode-2, and (c) mode-3.

    TABLE I. COMPARISONS OF PERFORMANCE ON HARDWARE AND ERROR CORRECTION

    Being the first RS decoder capable of correcting both burst errors and random errors, the proposed UHD design provides an efficient and attractive unified solution for multi-mode RS decoding in practical applications. The three work modes cover different applications: mode-1 can be used for applications with low or moderate data rates (e.g., ADSL and DVB-T); mode-2 is suitable for medium- to high-speed (e.g., 1–2 Gbps) systems; and mode-3 is a good choice for very high-speed optical communication.

    V. CONCLUSION

    In this brief, a high-speed RiBC algorithm for RS burst-error correction and a UHD architecture that supports three decoding modes are proposed. Comparison results show that the UHD decoder achieves enhanced capability for correcting long bursts of errors with good hardware efficiency.

    REFERENCES

    [1] D. V. Sarwate and N. R. Shanbhag, "High-speed architectures for Reed-Solomon decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 5, pp. 641–655, Oct. 2001.

    [2] T. Zhang and K. K. Parhi, "On the high-speed VLSI implementation of errors-and-erasures correcting Reed-Solomon decoders," in Proc. ACM Great Lake Symp. VLSI (GLVLSI), 2002, pp. 89–93.

    [3] Z. Wang and J. Ma, "High-speed interpolation architecture for soft-decision decoding of Reed-Solomon codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 9, pp. 937–950, Sep. 2006.

    [4] E. Dawson and A. Khodkar, "Burst error-correcting algorithm for Reed-Solomon codes," Electron. Lett., vol. 31, pp. 848–849, 1995.

    [5] L. Yin, J. Lu, K. B. Letaief, and Y. Wu, "Burst-error-correcting algorithm for Reed-Solomon codes," Electron. Lett., vol. 37, no. 11, pp. 695–697, May 2001.

    [6] Y. Wu, "Novel burst error correcting algorithms for Reed-Solomon codes," in Proc. IEEE Allerton Conf. Commun., Control, Comput., 2009, pp. 1047–1052.

    [7] S. Shamshiri and K.-T. Cheng, "Error-locality-aware linear coding to correct multi-bit upsets in SRAMs," in Proc. IEEE Int. Test Conf., 2010, pp. 1–10.

    [8] X. Zhang and J. Zhu, "High-throughput interpolation architecture for algebraic soft-decision Reed-Solomon decoding," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 581–591, Mar. 2010.