Unified Architecture for Reed-Solomon Decoder Combined With Burst-Error Correction

1346 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 7, JULY 2012



Li Li, Bo Yuan, Zhongfeng Wang, Jin Sha, Hongbing Pan, and Weishan Zheng

Abstract: Reed-Solomon (RS) codes are widely used as forward error correction (FEC) codes in digital communication and storage systems. Random-error correction for RS codes has been studied extensively in both academia and industry. For burst-error correction, however, research is still quite limited because of its very high computational complexity. In this brief, starting from a recent theoretical work, a low-complexity reformulated inversionless burst-error correcting (RiBC) algorithm is developed for practical applications. Then, based on the proposed algorithm, a unified VLSI architecture that is capable of correcting burst errors, as well as random errors and erasures, is presented for the first time to meet multi-mode decoding requirements. This new architecture is denoted as the unified hybrid decoding (UHD) architecture. It will be shown that, as the first RS decoder with enhanced burst-error correcting capability, it achieves significantly better error-correcting capability than the traditional hard-decision decoding (HDD) design.

Index Terms: Burst errors, Reed-Solomon (RS) codes, unified architecture, VLSI.

I. INTRODUCTION

Reed-Solomon (RS) codes have been widely employed for error correction in modern digital communication and data storage systems. As with other forward error correction (FEC) codes, when RS codes are used for channel coding, the errors that occur during transmission are typically divided into random errors and burst errors. For decoding RS codes with random-error correction, both theoretical algorithms and hardware implementations have been studied extensively [1]-[3], [8]. However, for dedicated RS burst-error decoder design, although some algorithms have been reported [4], [5], VLSI implementations of these burst-error correcting algorithms remain under-investigated because of their cubic computation complexity , where and are the length of the codeword and the traditional correction capability, respectively. Recently, a novel low-complexity RS burst-error correcting algorithm that only requires computation was proposed by Wu [6]. It can be proved that the algorithm is capable of correcting a long burst of errors together with possible random errors.

Manuscript received August 04, 2010; revised December 11, 2010 and April 04, 2011; accepted April 20, 2011. Date of publication June 16, 2011; date of current version June 01, 2012. This work was supported in part by the National Nature Science Foundation of China under Grant 60876017 and Grant 61006018, by the Joint Prospective Funds for Production, Education, and Research of Jiangsu Province under Grant 2009146, and by the Fundamental Research Funds for the Central Universities under Grant 1095021031. This paper was presented in part at the 8th IEEE International Conference on ASIC, Changsha, China, October 2009.

L. Li, B. Yuan, J. Sha, H. Pan, and W. Zheng are with the Institute of VLSI Design, Nanjing University, Nanjing 210093, China.

Z. Wang is with Broadcom Corporation, Irvine, CA 92617 USA.

Digital Object Identifier 10.1109/TVLSI.2011.2154369

In this brief, building on the above algorithm, a high-speed reformulated inversionless burst-error correcting (RiBC) algorithm is proposed, and a unified hybrid decoding (UHD) architecture that supports three decoding modes is presented for the first time. It will be shown that, compared with a traditional RS decoder, the proposed UHD architecture achieves significantly better burst-error correcting capability.

The rest of this brief is organized as follows. Section II describes the proposed RiBC algorithm. The architecture and latency of the new UHD decoder are presented in Section III. Section IV provides the hardware performance and comparisons. Conclusions are drawn in Section V.

II. PROPOSED RIBC ALGORITHM

A. Original Burst-Error Correcting (BC) Algorithm

In [6], Wu proposed a new approach to track the position of a burst of errors. By introducing a new polynomial that is a special linear function of the syndromes, Wu proved that the desired single burst of errors can be found by tracking the longest run of consecutive roots of the new polynomial. Furthermore, that approach was extended to the BC algorithm for correcting a long burst of errors with length up to plus a maximum of random errors.

In the BC algorithm, is a pre-chosen parameter that determines the specific error correcting capability. It indicates that the decoder is capable of correcting a -length burst of errors plus a maximum of random errors. In this case, the miscorrection probability is upper bounded by . Readers are strongly recommended to refer to [6] for a detailed description of the BC algorithm.

B. Proposed Reformulated Inversionless Burst-Error Correcting (RiBC) Algorithm

Although the BC algorithm has reduced computation complexity, several drawbacks impede its efficient VLSI design: 1) an inversion operation exists in step A2.2.4; 2) the computations in steps A2.2.1 and A2.2.2 involve a long data path and data dependency; 3) for calculating (step A4), extra cycles or another copy of the original circuitry are required.

To resolve the above problems, by applying an arithmetic transformation similar to the one presented in [1], we reformulate the BC algorithm into the proposed RiBC algorithm.

The RiBC algorithm is a kind of list decoding algorithm. Eight polynomials are updated simultaneously in each iteration. After every inner iterations, , as the candidate of the error locator polynomial of the random errors, is computed for the current th outer iteration. When reaches , we track the that is identical for the longest run of consecutive , and record the last element of the consecutive s. Then the corresponding and at the th loop are marked as the overall error locator polynomial and error evaluator polynomial , respectively. Finally, the Forney algorithm is used to calculate the error value in each error position with the miscorrection probability up to .
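The final Forney step can be illustrated in software. The following is a minimal sketch, not the RiBC datapath: it assumes GF(2^8) with the primitive polynomial 0x11D, syndromes evaluated at alpha^0 through alpha^(2t-1), and a locator of the form prod(1 + X_i x). These conventions and all function names are illustrative assumptions that do not come from the brief, and the textbook form of the Forney formula is used.

```python
# Textbook Forney step over GF(2^8). The field polynomial 0x11D and the
# syndrome convention (first root alpha^0) are assumptions for illustration.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    # b is assumed to be nonzero
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def poly_mul(a, b):
    # coefficients are stored lowest degree first
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)
    return out


def poly_eval(p, x):
    y = 0
    for c in reversed(p):  # Horner's rule
        y = gf_mul(y, x) ^ c
    return y


def forney(locator, evaluator, positions):
    # Formal derivative in characteristic 2: only odd-power terms survive.
    deriv = [locator[j] if j % 2 else 0 for j in range(1, len(locator))]
    values = {}
    for pos in positions:
        x_loc = EXP[pos % 255]          # X = alpha^pos
        x_inv = EXP[(255 - pos) % 255]  # X^-1
        num = poly_eval(evaluator, x_inv)
        den = poly_eval(deriv, x_inv)
        values[pos] = gf_mul(x_loc, gf_div(num, den))
    return values


# Hypothetical example: three errors and 2t = 16 syndromes.
errors = {5: 0x1D, 6: 0xA7, 40: 0x01}
two_t = 16
syndromes = [0] * two_t
for j in range(two_t):
    for pos, val in errors.items():
        syndromes[j] ^= gf_mul(val, EXP[(pos * j) % 255])
locator = [1]
for pos in errors:
    locator = poly_mul(locator, [1, EXP[pos]])            # (1 + X x)
evaluator = poly_mul(syndromes, locator)[:two_t]          # S(x) L(x) mod x^2t
recovered = forney(locator, evaluator, list(errors))
```

Because both polynomials enter only through the ratio of the evaluator to the locator derivative, a common scale factor on the two polynomials does not change the recovered values.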

The proposed RiBC algorithm targets the correction of a burst error plus some random errors. If the channel condition guarantees that only a single long burst of errors occurs, Wu [6] presented a low-complexity single long burst of errors correcting (sLBC) algorithm for that case. The sLBC algorithm is a special version of the RiBC algorithm, and its miscorrection probability is upper bounded by .

In the next section, a unified hybrid decoding architecture that can implement the RiBC, sLBC, and classical random errors and erasures correcting (rEEC) algorithms [2] will be presented. Hence, at the end of this section, for the reader's convenience, we introduce the rEEC algorithm. A detailed description of this algorithm can be found in [2].

III. PROPOSED UNIFIED HYBRID DECODING ARCHITECTURE

The proposed RiBC algorithm is very effective for correcting a combination of burst errors and random errors (mode-1), while the sLBC and rEEC algorithms are well suited for single-burst (mode-2) and random errors-and-erasures (mode-3) correction. Comparing the three algorithms shows that they share many common or similar computation steps. Based on this similarity, a unified hybrid decoding (UHD) architecture that is capable of correcting these three different types of error patterns (referred to as three work modes) is given in this section.

Fig. 1 shows the overall architecture of the UHD decoder. Three types of lines illustrate the data flows for the different work modes: solid lines for mode-1, dashed lines for mode-2, and dotted lines for mode-3. Different blocks are used to process different steps. Since the SC and CSEE blocks have been widely discussed in the previous literature, their architectures are not discussed in this brief.

Fig. 1. Overall architecture of proposed UHD decoder.

Fig. 2. Block diagram of -Block.

A. -Block Architecture

-block is used to process steps B1, C1, or D1 in the different work modes. No matter which work mode is selected, the computation of is always carried out as follows:

where denotes or , and denotes , , or .

By inputting to the block serially, it can be found that (1) can be implemented as shown in Fig. 2.

In Fig. 2, once the required work mode is selected, the left-most register is initialized to a specific value. Then, after a certain number of clock cycles that depends on the selected mode, each accumulate unit computes its corresponding coefficient of . Note that if, in mode-3, the decoder detects that the current received symbol is not an erasure, the input 0 of the multiplexer will be selected.

B. -Block Architecture

Steps B2.1 and C3.1 are implemented in -block (Fig. 3). For these steps, the common operation is a multiply-accumulate for each coefficient of the polynomial. Only a slight difference exists in step C3.1: it is a Chien search-like step, hence an extra adder tree is required to verify the validity of the current received symbol. Notice that -block is idle in mode-3.

1) In mode-1, , as the coefficients of , are inputted into each multiply-accumulate unit for iterated multiplication. For each in step B2.1, since should be maintained within 2 cycles, 3:1 multiplexers are introduced to help the lower registers keep the coefficients of the current during the above time interval. The value of will increase by 1 every cycles. Once increases by 1, after 1 cycle, the lower registers will output (the coefficients of ) to the -block.

2) In mode-2, , as the coefficients of , are selected to be inputted instead of . Then, by employing the adder tree and zero detect, it takes cycles for -block to find the roots of (step C3.1).
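The Chien search-like root finding mentioned above (multiply-accumulate cells, an adder tree, and a zero detector) can be modeled in a few lines. This is a sketch under stated assumptions: GF(2^8) with primitive polynomial 0x11D, evaluation at alpha^0, alpha^1, ... in that order, and the hypothetical names `chien_like_search` and `poly_mul`. The evaluation order and cell structure in Fig. 3 may differ.

```python
# Software model of a Chien search-like evaluation over GF(2^8).
# Field polynomial 0x11D and the evaluation order are assumptions.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_mul(a, b):
    # coefficients are stored lowest degree first
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)
    return out


def chien_like_search(coeffs, n=255):
    """Return every exponent i in [0, n) for which p(alpha^i) == 0."""
    regs = list(coeffs)  # cell j holds coeffs[j] * alpha^(j*i) at cycle i
    roots = []
    for i in range(n):
        acc = 0
        for r in regs:   # adder tree: XOR of all cell outputs
            acc ^= r
        if acc == 0:     # zero detect
            roots.append(i)
        # each cell multiplies its register by the constant alpha^j
        regs = [gf_mul(r, EXP[j % 255]) for j, r in enumerate(regs)]
    return roots


# Hypothetical polynomial with the consecutive roots alpha^3, alpha^4, alpha^5.
poly = [1]
for k in (3, 4, 5):
    poly = poly_mul(poly, [EXP[k], 1])  # (x + alpha^k)
roots = chien_like_search(poly)
```

One register update per cycle means that a full sweep over all n candidate positions costs n cycles, independent of the polynomial degree.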

Fig. 3. Block diagram of -block.

Fig. 4. Block diagram of -block.

C. -Block Architecture

-block is used to execute steps B2.2, C2, and D2. The inherent nature of steps B2.2, C2, and D2 is the multiplication of two polynomials. This operation can be implemented as shown in Fig. 4.

In Fig. 4, the initial values of the registers are all set to 0, and the operating procedures for the three work modes are as follows.

1) For mode-1, the coefficients of (step B2.2) are serially inputted into -block. After cycles, , as the coefficients of , are stored in the registers.

2) For mode-2, unlike mode-1 and mode-3, the coefficients of (step C2) are concurrently fed into -block, and then after only 1 cycle, , as the coefficients of , are calculated and stored in the registers.

3) For mode-3, similar to mode-1, the coefficients of (step D2) are serially inputted into -block. After cycles, , as the coefficients of , are stored in the registers.
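The serial operation in mode-1 and mode-3 (registers cleared to 0, one coefficient fed per cycle, product available in the registers at the end) can be sketched as a shift-and-accumulate loop. This is an illustrative model only: the name `serial_poly_mul`, the highest-degree-first input order, and the GF(2^8) field with primitive polynomial 0x11D are assumptions, and the actual register structure of Fig. 4 may differ.

```python
# Cycle-by-cycle model of a serial polynomial multiplier over GF(2^8).
# Field polynomial 0x11D and the input order are assumptions.
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def serial_poly_mul(a_serial, b):
    """Multiply A(x) by B(x).

    a_serial: coefficients of A(x), highest degree first, one per cycle.
    b: coefficients of B(x), lowest degree first, available in parallel.
    Returns the product, lowest degree first.
    """
    regs = [0] * (len(a_serial) + len(b) - 1)  # all registers start at 0
    for a in a_serial:                         # one loop pass = one cycle
        regs = [0] + regs[:-1]                 # shift: running result times x
        for j, bj in enumerate(b):
            regs[j] ^= gf_mul(a, bj)           # accumulate a * B(x)
    return regs


# (x + 1)(x + 1) = x^2 + 1 in characteristic 2
square = serial_poly_mul([1, 1], [1, 1])
# (2x + 1)(x + 2) = 2x^2 + 5x + 2 over GF(2^8)
mixed = serial_poly_mul([2, 1], [2, 1])
```

The loop runs once per serial coefficient, which matches the idea that the serial modes need a number of cycles proportional to the length of the serial operand, whereas feeding all coefficients concurrently trades extra multipliers for a single-cycle result.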

D. Key Equation Solver (KES) Block Architecture

In the UHD decoder, the KES block is employed to carry out steps B2.4, C4, C5, and D4. Fig. 5 presents the overall architecture of the KES block and the internal structure of its two types of processing elements (PE): PE0 and PE1.

As shown in Fig. 5(a), the KES block consists of PE0s and PE1s. The detailed operating scheme is presented as follows.

1) For mode-1 (step B2.4), in the th iteration, each register in stores the corresponding coefficients of different polynomials [see Fig. 5(b)(c)]. For each outer iteration, it takes cycles to compute and as the coefficients of and . Meanwhile, will also be computed and outputted into the PT block to track the longest consecutive that are identical.

2) For mode-2, as aforementioned, the KES block is arranged to carry out steps C4 and C5. Accordingly, both the initial values in the registers and the input signals are different from those in mode-1, and they are operated based on the following schedule:

i) First, PE0s compute step C4 . In each , the second uppermost register (denoted as group A) is initialized with ; in addition, Ctrl1 and are always set to 1 and 0, respectively. Then after cycles, these registers in group A store the coefficients of polynomial .

ii) The successive step C5 is carried out by PE1s. In each , the uppermost register (denoted as group B) is initialized with 0, while the initial value in the third uppermost register (denoted as group C) is ; meanwhile, Ctrl and are always set to 0 and 1. Additionally, in the th cycle, is set to for 0 . Then after cycles, the registers in group B store the coefficients of . Notice that here stores .

3) For mode-3, since step D4 has a similar form to step B2.4, the rEEC algorithm can be directly carried out by the KES block by replacing by in each s. Note that in this case only half of the hardware components of each PE0/PE1 are utilized. Therefore, the total throughput could be doubled if two independent codewords are inputted.

Fig. 5. (a) Overall architecture of KES block. (b) The block diagram of . (c) The block diagram of .

Fig. 6. Architecture of PT block for mode-1.

E. Position Track (PT) Block Architecture

The PT block is used to track the longest consecutive polynomials that are identical (step B3) or the positions of roots (steps C3.2 and C3.3).

1) Fig. 6 illustrates the architecture of the PT block for mode-1. The inputted , and from the KES block at the th outer iteration are denoted as , and . In addition, (temp) represents , while (store) are the coefficients of the current continuously identical . Moreover, (longest) stores the coefficients of the current longest continuously identical . Control signals shift and equal are generated from Schedule A. After reaches , (longest) and (longest) are outputted as the coefficients of the overall error locator polynomial and the overall error evaluator polynomial .

2) For mode-2, Schedule B is proposed to calculate the single burst's starting position and its length in the sLBC algorithm. Notice that since finding the roots of has been implemented in -block (see Fig. 3), there is no need for the PT block to carry out this function; it only receives the signal outputted from -block, which indicates whether or not. It is then feasible for the PT block to implement Schedule B with a simple control unit. Hence the extra architecture of the PT block for executing Schedule B is omitted in this section.
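The mode-1 tracking described above (a temp value holding the previous candidate, a count for the current run, and a record of the longest run so far) can be expressed as a short software model. This is a sketch, not Schedule A: the function name `track_longest_identical`, the representation of each candidate as a tuple of coefficients, and the tie-breaking rule (the earliest longest run wins) are assumptions.

```python
def track_longest_identical(candidates):
    """Return (last_index, run_length) of the longest run of identical
    consecutive candidates. Ties keep the earliest run (an assumption)."""
    temp = None        # candidate seen in the previous outer iteration
    run_len = 0        # length of the current run of identical candidates
    best_len = 0       # length of the longest run seen so far
    best_last = None   # outer-iteration index of the last element of that run
    for l, cand in enumerate(candidates):
        if l > 0 and cand == temp:
            run_len += 1   # "equal": the current run continues
        else:
            run_len = 1    # a new run starts at this iteration
        temp = cand
        if run_len > best_len:
            best_len, best_last = run_len, l
    return best_last, best_len


# Hypothetical stream of candidate coefficient tuples, one per outer iteration.
stream = [(1, 7), (1, 9), (1, 9), (1, 9), (1, 4), (1, 4)]
last_index, run_length = track_longest_identical(stream)
```

The returned index is the last element of the longest run, which is the iteration whose locator and evaluator polynomials would be kept as the overall result.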

F. Timing Chart of Proposed RS Decoder

The timing charts for the three modes are illustrated in Fig. 7. Their worst-case latencies are , and cycles, respectively (excluding the SC and CSEE blocks).

IV. HARDWARE PERFORMANCE

In this section, the hardware and error correction performance of the proposed UHD decoder for an example RS (255, 239) code will be given.
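For orientation, the classical parameters implied by an RS (255, 239) code follow from simple arithmetic. The short sketch below uses illustrative variable names and covers only the standard random-error capability; the burst-related parameters of the brief are not reproduced.

```python
# Classical parameters of an RS (n, k) code with n = 2^m - 1 symbols.
n, k = 255, 239
m = (n + 1).bit_length() - 1   # symbol width in bits, since n + 1 = 2^m
parity = n - k                 # number of parity symbols
t = parity // 2                # random symbol errors correctable by HDD
rate = k / n                   # code rate
```

With these values a conventional hard-decision decoder corrects up to 8 random symbol errors per 255-symbol codeword.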

Table I presents the comparison between the proposed UHD and RiBM decoders. Here for the employed RS (255, 239) code, , and . The hardware complexity is estimated based on the work in [8], and the throughput has been scaled properly. Although the area requirement of the UHD decoder is about 1.7 times that of the RiBM decoder, the UHD decoder achieves significantly enhanced burst-error correcting capability with multiple work modes. In channel environments that are likely to generate long bursts of errors , such as high-density storage systems, the traditional RiBM decoder fails to decode the codewords because of its limited error correcting capability, while the UHD decoder remains effective (mode-1 and mode-2). For random error-and-erasure correction (mode-3), the proposed UHD design has lower throughput than RiBM. However, considering that only half of the resources of the KES block are utilized, if one additional copy of the SC, CSEE, FIFO, and -blocks is employed, its throughput can be approximately doubled by inputting two independent codewords into the decoder, which will outperform the RiBM architecture significantly.

Being the first RS decoder that is capable of correcting both burst errors and random errors, the proposed UHD design provides an efficient and attractive unified solution for multi-mode RS decoding in practical applications. The proposed three work modes cover different applications: mode-1 can be used for applications with low or moderate data rates (e.g., ADSL and DVB-T); mode-2 is suitable for medium- to high-speed (e.g., 12 Gbps) systems; and mode-3 is a good choice for very high-speed optical communication.

Fig. 7. Timing charts for different work modes: (a) mode-1, (b) mode-2, and (c) mode-3.

TABLE I
COMPARISONS OF PERFORMANCE ON HARDWARE AND ERROR CORRECTION

V. CONCLUSION

In this brief, a high-speed RiBC algorithm for RS burst-error correction and a UHD architecture that supports three different decoding modes are proposed. Comparison results show that the UHD decoder achieves enhanced capability of correcting long bursts of errors with good hardware efficiency.

REFERENCES

[1] D. V. Sarwate and N. R. Shanbhag, "High-speed architectures for Reed-Solomon decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 5, pp. 641-655, Oct. 2001.

[2] T. Zhang and K. K. Parhi, "On the high-speed VLSI implementation of errors-and-erasures correcting Reed-Solomon decoders," in Proc. ACM Great Lake Symp. VLSI (GLVLSI), 2002, pp. 89-93.

[3] Z. Wang and J. Ma, "High-speed interpolation architecture for soft-decision decoding of Reed-Solomon codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 9, pp. 937-950, Sep. 2006.

[4] E. Dawson and A. Khodkar, "Burst error-correcting algorithm for Reed-Solomon codes," Electron. Lett., vol. 31, pp. 848-849, 1995.

[5] L. Yin, J. Lu, K. B. Letaief, and Y. Wu, "Burst-error-correcting algorithm for Reed-Solomon codes," Electron. Lett., vol. 37, no. 11, pp. 695-697, May 2001.

[6] Y. Wu, "Novel burst error correcting algorithms for Reed-Solomon codes," in Proc. IEEE Allerton Conf. Commun., Control, Comput., 2009, pp. 1047-1052.

[7] S. Shamshiri and K.-T. Cheng, "Error-locality-aware linear coding to correct multi-bit upsets in SRAMs," in Proc. IEEE Int. Test Conf., 2010, pp. 1-10.

[8] X. Zhang and J. Zhu, "High-throughput interpolation architecture for algebraic soft-decision Reed-Solomon decoding," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 581-591, Mar. 2010.
