New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit...

15
Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita Gangopadhyay and Arash Reyhani-Masoleh, Member, IEEE Abstract—As a result of huge advancements in VLSI technology, more and more complex circuits are being implemented making not only the whole digital system more prone to faults, but also the fault detector itself susceptible to faults resulting in the requirement of concurrent fault detection architecture of the encoders and decoders. In this paper, we present a multiple-bit parity-based fault detection architecture for parallel CRC computation. After analyzing the parallel implementation of CRC, we present a formulation to generate a multiple-bit parity prediction structure to incorporate the fault detection architecture. Using the formulations of digit level CRC architecture, the checksum is divided into few blocks and predicted multiple-bit parity of the blocks are compared with the actual parity bits. Finally, with the help of software simulation and ASIC implementation, we show that the proposed scheme is highly efficient in terms of fault detection capability whereas it involves small area and time overhead. As an example, we have shown that the worst case area overhead is 25:7 percent for CRC32 with four parity bits, and corresponding time overhead is 15:6 percent. Index Terms—CRC, parity prediction, fault detection, multiple-bit Ç 1 INTRODUCTION T HE recent advancements in wireless technology have brought forth very high speed transmission which faces a major challenge from noisy channels. Factors such as sig- nal attenuation, multiple access interference, inter symbol interference and Doppler shift degrade the quality of data to a great extent. As a result, the received signal has a chance of getting corrupted. Consequently, for the allevia- tion of this problem, error control coding is not only desir- able, but has become a must to achieve an acceptable Bit Error Rate (BER). For the purpose of checking the integrity of the data being stored or sent over noisy channels, the most widely adopted among all the error control codes is Cyclic Redundancy Check Code (CRC). CRCs, first intro- duced by Peterson and Brown in 1961 [1], are used for the detection of any corruption of the digital signal during its production, transmission, processing as well as storage stage. Recent wireless technologies namely, Asynchronous Transfer Mode (ATM) [2], IEEE communication standards, such as wired Ethernet (IEEE 802.3) [3], Wi-Fi (IEEE 802.11) [4] and WiMAX (IEEE 802.16) [5], employ CRC for error detection purpose. A very well written understanding of the CRC computa- tion and collection of most of the published hardware and software implementations of CRC can be found in [6], [7] and [8]. Various parallel hardware structures have been described in [9], [10], [11], [12], while [13] talks about cascading. Several literature have proposed highly efficient architecture based on performance enhancement [14], [17] as well as resource utilization [18]. The list also includes Two-Step [19], Cascade [13], Look-Ahead [12], State-Space Transformed [20], and Retimed Architectures [14], [25]. The authors of [11] have proposed a three-step LFSR architec- ture which makes it easy to address the two major issues of large fanout limitation and iteration bound limitation in parallel LFSR architecture which is an integral part of CRC computation. In [15], the authors have proposed an improved solution for the fanout problem. The authors of [15] have shown that the proposed architecture achieves sig- nificant improvement in terms of processing speed and hardware efficiency. In [16], the authors have extended the above idea and presented a mathematical proof showing that a transformation exists in state space that can reduce the complexity of the parallel LFSR feedback loop along with a new idea for high speed parallel implementation of linear feedback shift registers based upon parallel IIR filter. In [32], the authors have proposed a novel parallel imple- mentation of long BCH encoders to achieve significant speedup by eliminating the effect of large fanout. The authors have mentioned that similar technique can also be used for CRC. Rapid advancement of semiconductor technologies results in rapidly increasing soft error rates [26], [27], [28] due to shrink in area, time and power. Temporary faults caused by cosmic radiation, known as single event upset (SEU) [29], are the main sources of errors. A lot of attention has been paid for the reliability improvement of all the ele- mentary components reflected by their much lower failure rates [30]. But the large number and dense population of these elements on the chip degrade the reliability to a large extent. While error detection codes and error correction codes have long been used for checking errors, the reliability The authors are with the Department of Electrical and Computer Engineer- ing, The University of Western Ontario, London, ON, Canada N6A 5B9. E-mail: {dgangop, areyhani}@uwo.ca. Manuscript received 17 Feb. 2014; revised 26 May 2015; accepted 9 June 2015. Date of publication 16 Sept. 2015; date of current version 15 June 2016. Recommended for acceptance by D. Kagaris. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TC.2015.2479617 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 7, JULY 2016 2143 0018-9340 ß 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Transcript of New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit...

Page 1: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

Multiple-Bit Parity-Based Concurrent FaultDetection Architecture for Parallel CRC

ComputationDipanwita Gangopadhyay and Arash Reyhani-Masoleh,Member, IEEE

Abstract—As a result of huge advancements in VLSI technology, more and more complex circuits are being implemented making not

only the whole digital system more prone to faults, but also the fault detector itself susceptible to faults resulting in the requirement of

concurrent fault detection architecture of the encoders and decoders. In this paper, we present a multiple-bit parity-based fault

detection architecture for parallel CRC computation. After analyzing the parallel implementation of CRC, we present a formulation to

generate a multiple-bit parity prediction structure to incorporate the fault detection architecture. Using the formulations of digit level

CRC architecture, the checksum is divided into few blocks and predicted multiple-bit parity of the blocks are compared with the actual

parity bits. Finally, with the help of software simulation and ASIC implementation, we show that the proposed scheme is highly efficient

in terms of fault detection capability whereas it involves small area and time overhead. As an example, we have shown that the worst

case area overhead is 25:7 percent for CRC�32 with four parity bits, and corresponding time overhead is 15:6 percent.

Index Terms—CRC, parity prediction, fault detection, multiple-bit

Ç

1 INTRODUCTION

THE recent advancements in wireless technology havebrought forth very high speed transmission which faces

a major challenge from noisy channels. Factors such as sig-nal attenuation, multiple access interference, inter symbolinterference and Doppler shift degrade the quality of datato a great extent. As a result, the received signal has achance of getting corrupted. Consequently, for the allevia-tion of this problem, error control coding is not only desir-able, but has become a must to achieve an acceptable BitError Rate (BER). For the purpose of checking the integrityof the data being stored or sent over noisy channels, themost widely adopted among all the error control codes isCyclic Redundancy Check Code (CRC). CRCs, first intro-duced by Peterson and Brown in 1961 [1], are used for thedetection of any corruption of the digital signal during itsproduction, transmission, processing as well as storagestage. Recent wireless technologies namely, AsynchronousTransfer Mode (ATM) [2], IEEE communication standards,such as wired Ethernet (IEEE 802.3) [3], Wi-Fi (IEEE 802.11)[4] and WiMAX (IEEE 802.16) [5], employ CRC for errordetection purpose.

A very well written understanding of the CRC computa-tion and collection of most of the published hardware andsoftware implementations of CRC can be found in [6], [7]and [8]. Various parallel hardware structures have beendescribed in [9], [10], [11], [12], while [13] talks about

cascading. Several literature have proposed highly efficientarchitecture based on performance enhancement [14], [17]as well as resource utilization [18]. The list also includesTwo-Step [19], Cascade [13], Look-Ahead [12], State-SpaceTransformed [20], and Retimed Architectures [14], [25]. Theauthors of [11] have proposed a three-step LFSR architec-ture which makes it easy to address the two major issues oflarge fanout limitation and iteration bound limitation inparallel LFSR architecture which is an integral part of CRCcomputation. In [15], the authors have proposed animproved solution for the fanout problem. The authors of[15] have shown that the proposed architecture achieves sig-nificant improvement in terms of processing speed andhardware efficiency. In [16], the authors have extended theabove idea and presented a mathematical proof showingthat a transformation exists in state space that can reducethe complexity of the parallel LFSR feedback loop alongwith a new idea for high speed parallel implementation oflinear feedback shift registers based upon parallel IIR filter.In [32], the authors have proposed a novel parallel imple-mentation of long BCH encoders to achieve significantspeedup by eliminating the effect of large fanout. Theauthors have mentioned that similar technique can also beused for CRC.

Rapid advancement of semiconductor technologiesresults in rapidly increasing soft error rates [26], [27], [28]due to shrink in area, time and power. Temporary faultscaused by cosmic radiation, known as single event upset(SEU) [29], are the main sources of errors. A lot of attentionhas been paid for the reliability improvement of all the ele-mentary components reflected by their much lower failurerates [30]. But the large number and dense population ofthese elements on the chip degrade the reliability to a largeextent. While error detection codes and error correctioncodes have long been used for checking errors, the reliability

� The authors are with the Department of Electrical and Computer Engineer-ing, The University of Western Ontario, London, ON, Canada N6A 5B9.E-mail: {dgangop, areyhani}@uwo.ca.

Manuscript received 17 Feb. 2014; revised 26 May 2015; accepted 9 June 2015.Date of publication 16 Sept. 2015; date of current version 15 June 2016.Recommended for acceptance by D. Kagaris.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference the Digital Object Identifier below.Digital Object Identifier no. 10.1109/TC.2015.2479617

IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 7, JULY 2016 2143

0018-9340� 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Page 2: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

of the error correction or detection encoder itself has come upas a major issue. High speed data transmission requires highspeed error detection and parallel processing at the cost ofincreased hardware, which increases the probability of erroroccurrence of internal faults. As a result, fault detection ofencoder and decoder have gained a lot of importance.

In [31], the authors have addressed the problem ofdesigning parallel fault-secure encoders by generating notonly error correcting check bits, but also independently andin parallel, error detecting check bits. The next step involvesthe comparison of error detection check bits with anotherset of error detecting check bits generated from error correc-tion check bits. This design is significantly cost effectivecompared to duplication with comparison technique. Thereare several other literature which deal with the similar topic.In [34], the authors have discussed the design of a faultsecure encoder in between a fault secure RAM and a faultychannel. Error detection and error correction using ResidueNumber System have been discussed in [35]. In [36], wefind the description of fault detection of Reed Solomonencoder, while [37] covers both Reed Solomon encoders anddecoders. In both [36] and [37], the area overhead betweenthe self checking implementation and without the selfchecking capability is mentioned as 50 percent.

In [23], the authors have presented a new approach forthe automated synthesis of multilevel circuits with concur-rent error detection using a parity-check code. They havedeveloped a new procedure namely structure-constrainedlogic optimization and proved that this design along with achecker forms a self-checking circuit. In [24], the authorshave addressed the issue of developing reliability drivenlogic synthesis algorithms by proposing three schemes. Thefirst scheme is duplication and comparison, the secondbeing utilizing Berger code and the third scheme involvespartitioning the outputs of the circuit and concurrent errordetection based on parity code. In this third scheme, theauthors have partitioned the circuit into groups and gener-ated the parity signals in such a way that only one outputsignal from each group influences a parity signal. Theyhave considered non-overlapping multilevel circuits for thesynthesis of outputs in each partition and the parity signals.Our proposed method of the multiple bit parity based con-current fault detection scheme for parallel CRC computa-tion inherently exploits the non-overlapping feature of theparity generation circuits.

In this paper, we have presented a design concept of amultiple-bit parity-based concurrent fault detection for par-allel CRC architecture. In the proposed scheme, the gener-ated CRC output is divided into a few blocks based on thenumber of parity bits and has been made to go through amultiple-bit parity comparison unit. Then, the actual paritybits are compared with the predicted multiple-bit parity ofthe blocks to incorporate the fault detection architecture.We note that to the best of our knowledge, this paper isthe first study specific to parallel CRC computation to gen-erate concurrent fault detection using multiple-bit parity.The contribution of this paper is summarized as follows:

� First, we have derived the formulation for the multi-ple-bit parity-based fault detection of parallel CRCarchitecture for the case l � m when l is the number

of input message bits processed in each iteration andm is the degree of the CRC generator polynomialwhich defines the underlying CRC code. Next, afterextending the matrix based formulation of parallelCRC structure for l < m, we have proposed the con-current fault detection formulation for this case too.

� Having derived all the required formulations, wehave presented the hardware architecture based onthe proposed formulation for both the above cases.

� We have discussed a thorough theoretical error andfault analysis using detailed error and fault modelsalong with the investigation of the influence of thegenerator polynomial on the error propagation. Wehave also formulated the error detection capabilityof the proposed scheme covering all the possible2m � 1 errors. Furthermore, we have done severalfault detection simulation using C codes for variousCRC polynomials and different values of the numberof parity bits. The simulation results presented in thetabular form confirm the significantly high faultdetection capability of the proposed scheme.

� Finally, we have presented a theoretical complexityanalysis showing the low required overhead of theproposed scheme. We have done ASIC implementa-tion using Verilog code to confirm these small valuesof the theoretical complexity overheads.

The remainder of this paper is organized as follows. Firstin Section 2, preliminary ideas regarding parallel CRC havebeen provided which are essential for the understanding ofthe proposed scheme. Next in Section 3, we propose multi-ple-bit parity prediction formulations for parallel CRC for allthe three possible cases i.e., l > m, l ¼ m and l < m. In thissection we also provide the architecture of our proposedscheme for all these cases. In Section 4, we present our analy-sis and simulation results followed by complexity analysis inSection 5. Finally, we conclude the paper in Section 6.

2 PRELIMINARIES

In this section the basic concept of CRC computation isdescribed followed by a brief description of parallel CRCformulations. Then, matrix multiplication based parallelCRC architecture proposed in [17] is discussed. We havespecially emphasized on this architecture since our pro-posed concurrent fault detection architecture described inthe next section, is based on this architecture. It is noted thatthe proposed scheme can be applicable on the other parallelCRC structures as well. In case of other parallel CRC archi-tecture, from the syndrome equation, using a similarapproach as shown in the proof of Lemma 1, one can derivethe necessary equation required to design the fault detectionunit. For the parallel structure which uses matrix formula-tion, our approach can be directly used. Parallel CRCcomputations proposed in [9], [11], [12] and [20] are formu-lations where our proposed scheme can be applied. How-ever, for non-matrix formulation structures, we need tofollow a similar derivation along with certain modifications.As a result, our multiple bit parity prediction basedapproach will be applicable for most of the parallel CRCarchitecture by using a similar derivation on the digit levelsyndrome formulation.

2144 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 7, JULY 2016

Page 3: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

2.1 CRC Basics

The generator polynomial of a systematic linear ðmþ k; kÞerror detecting code over GF ð2Þ is represented by

GðxÞ ¼ 1þXt�1

i¼1

gixi þ xt þ xm; gi 2 f0; 1g: (1)

In CRC computation first, a message is processed to gener-ate its CRC checksum. Then the CRC is appended to themessage to form the codeword, which is transmitted to thereceiver over the channel. The receiver again calculatesthe CRC of the codeword and compares it with the receivedCRC. If an error is found, the transmission protocol eitherdiscards the corrupted data and/or sends a retransmissionrequest. The k - bit input message UðxÞ can similarly be rep-

resented as UðxÞ ¼ Pk�1i¼0 uix

i; ui 2 f0; 1g: The m�bit CRCchecksum, known as the syndrome, is the remainder of thedivision of UðxÞ xm by GðxÞ in GF ð2Þ i.e., SðxÞ ¼ ðxm UðxÞÞmod GðxÞ:The checksum SðxÞ is concatenated with the

input message UðxÞ to generate the encoded messageMðxÞ ¼ xm UðxÞ þ SðxÞ;which is transmitted over the trans-mission channel. In a CRC decoder,MðxÞ is divided byGðxÞto find the remainder. A non-zero value of the remainderimplies the presence of an error.

The above polynomial division of CRC is achieved byfeeding the message bits UðxÞ into a LFSR structure consist-ing of registers each of which is accompanied by commonclock and clear signals. The coefficient of the highest orderterm is the first bit to enter the LFSR, while the coefficient ofthe lowest order term is the last bit [8]. At the end, the calcu-lated value of the remainder is shown by the content of theregisters. It is noted that syndrome calculation of a k�bitmessage requires k clock cycles in a LFSR-based CRC com-putation. The number of clock cycles can be reduced usingparallel circuit. This is explained in the next section.

2.2 Parallel CRC

The parallel computation of CRC can be performed by treat-ing the message block-wise iteratively where l messagebits are processed at each iteration. If k mod l 6¼ 0, thenðl� k mod lÞ zeroes are prepended to UðxÞ to make themessage length a multiple of l. The k�bit message is split

into q ¼ dkle message blocks each being l�bit long. Thus as

shown in Fig. 1, UðxÞ is divided into q parts, UðxÞ ¼Pq�1n¼0 UnðxÞ; where UnðxÞ ¼ BðnÞðxÞxlðq�1�nÞ and BðnÞðxÞ ¼Pl�1j¼0 ulðq�1�nÞþj x

j.

Since UnðxÞ is the l-bit message block or binary polyno-

mial being processed at the nth iteration, BðnÞðxÞ can havemaximum of degree ðl� 1Þ and can also be represented as

BðnÞðxÞ ¼Xl�1

j¼0

bðnÞj xj; b

ðnÞj 2 f0; 1g: (2)

The portion of UðxÞ that contains all the blocks for j ¼ 0 to

j ¼ n is denoted by U ðnÞðxÞ and can be written as a recursiveequation as follows

U ðnÞðxÞ ¼Xnj¼0

UjðxÞ ¼Xk�1

j¼fðq�1Þ�nglujx

j

¼ xlUðn�1ÞðxÞ þBðnÞðxÞ; (3)

where Uð0ÞðxÞ ¼ U0ðxÞ and U ðq�1ÞðxÞ ¼ UðxÞ.

In Fig. 1, it is shown that U0ðxÞ ¼Pk�1

j¼ðq�1Þl ujxj which has

ðl� k mod lÞ zeroes prepended to it. In the first iterationcycle of the parallel CRC structure, U0ðxÞ is processed to

generate the output Sð0ÞðxÞ. Then in the next iteration,

U1ðxÞ ¼Pðq�1Þl�1

j¼ðq�2Þl ujxj is processed to generate the output

Sð1ÞðxÞ. So after two cycles the processed portion of the mes-

sage is U ð1ÞðxÞ ¼ U ð0ÞðxÞ þ U1ðxÞ. In this way, after ðq � 1Þcycles, the processed message is U ðq�2ÞðxÞ ¼ U ðq�3ÞðxÞþUðq�2ÞðxÞ. Then in the last iteration cycle, the last block i.e.,

Uðq�1ÞðxÞ is processed to generate Sðq�1ÞðxÞ. So after the last

iteration cycle the processed message is U ðq�1ÞðxÞ ¼Uðq�2ÞðxÞ þ Uðq�1ÞðxÞ, which is actually the whole message

as shown in Fig. 1.Let SðnÞðxÞ be the syndrome of UðnÞðxÞ, which can be rep-

resented as

SðnÞðxÞ ¼ ðxm U ðnÞðxÞÞ mod GðxÞ: (4)

By substituting (3) into (4), one can obtain

SðnÞðxÞ ¼ fxl Sðn�1ÞðxÞ þ xm BðnÞðxÞg mod GðxÞ: (5)

We assume l > m for the rest of the discussion in thissection. The other two conditions, namely l < m and l ¼ m,will be considered in Section 4. Now we get from [17], thatfor l > m (5) reduces to

SðnÞðxÞ ¼ fxm T ðnÞðxÞg mod GðxÞ; (6)

where

T ðnÞðxÞ ¼ xl�m Sðn�1ÞðxÞ þBðnÞðxÞ: (7)

T ðnÞðxÞ can be represented as

T ðnÞðxÞ ¼Xl�1

j¼0

tðnÞj xj; t

ðnÞj 2 f0; 1g; (8)

and SðnÞðxÞ can be represented as SðnÞðxÞ ¼ Pm�1j¼0 s

ðnÞj xj;

sðnÞj 2 f0; 1g: From (6), (7) and (8), we can write

tðnÞj ¼ b

ðnÞj j 2 ½0; l�m� 1�bðnÞj þ s

ðn�1Þj�ðl�mÞ j 2 ½l�m; l� 1�:

((9)

Fig. 1. Message splitted in q blocks each being of length l.

GANGOPADHYAY AND REYHANI-MASOLEH: MULTIPLE-BIT PARITY-BASED CONCURRENT FAULT DETECTION ARCHITECTURE FOR PARALLEL... 2145

Page 4: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

A matrix multiplication scheme has been proposed in [17],which represents (8) in matrix multiplication notations as

T ðnÞðxÞ ¼ x tðnÞ; (10)

where T ðnÞðxÞ is a scalar, x ¼ x0; . . . ; xl�1� �

is the row vec-

tor and tðnÞ ¼ tðnÞ0 ; . . . ; t

ðnÞl�1

h iTis the column vector.

Similarly, using (7) and (10), we can get the matrix for-mulation as

SðnÞðxÞ ¼ ðxm x tðnÞÞ mod GðxÞ ¼ x sðnÞ;where

sðnÞ ¼ G tðnÞ: (11)

sðnÞ ¼ sðnÞ0 . . . s

ðnÞm�1

h iTis the column vector and G is a m� l

matrix whose each element can be represented as gi;j whengi;j 2 f0; 1g for 0 � i � m� 1 and 0 � j � l� 1:

The parallel implementation of a basic structure of CRCis shown in Fig. 2.

In this figure, at the nth iteration, l- bit BðnÞðxÞ and m- bit

Sðn�1ÞðxÞ are shifted and XORed to generate ðmþ lÞ- bit

T ðnÞðxÞ according to (7) and then following (6), T ðnÞðxÞ ismade to pass through an array of XOR gates to generate

T ðnÞðxÞ mod GðxÞ which is the next state value of the syn-

drome SðnÞðxÞ. This SðnÞðxÞ is then passed through flip-flops

and the current state value of the syndrome Sðn�1ÞðxÞ is sentback as feedback. In the first iteration cycle, Bð0ÞðxÞ and

Sð�1ÞðxÞ are processed to generate Sð0ÞðxÞwhen Sð�1ÞðxÞ ¼ 0,

while, in the second iteration cycle, this Sð0ÞðxÞ is fed back

and processed with Bð1ÞðxÞ to generate Sð1ÞðxÞ. Both Bð0ÞðxÞand Bð1ÞðxÞ can be obtained from Uð0ÞðxÞ and Uð1ÞðxÞ asshown in Fig. 1. In a similar way, the whole message is proc-

essed in total q cycles. In the last cycle, Bðq�1ÞðxÞ and

Sðq�2ÞðxÞ are processed to generate Sðq�1ÞðxÞ which is the

final CRC value i.e., Sðq�1ÞðxÞ ¼ SðxÞ . The relation between

U ðq�1ÞðxÞ and Bðq�1ÞðxÞ is shown in Fig. 1.The parallel implementation based on the above matrix

multiplication of CRC for the case of l > m proposed in[17] is shown in Fig. 3.

In Fig. 3, the whole design has been split in two differ-ent blocks, specifically the T Generation Unit and theMatrix Multiplication Unit. The shift and the adder withtwo ðmþ lÞ inputs stage shown in Fig. 2 are combinedinto a single stage in Fig. 3 and is named the T GenerationUnit, while the reduction stage consisting of an array ofXOR in Fig. 2 is named the Matrix Multiplication Unit inFig. 3, since the reduction is achieved using matrix multi-plication as given in (11). In the T Generation Unit, check-

sum values from a previous iteration i.e., Sðn�1ÞðxÞ and

the present block of the message i.e., BðnÞðxÞ are taken as

inputs and XORed to get the output which is T ðnÞðxÞ

according to (7). In the Matrix Multiplication Unit, this

T ðnÞðxÞ is multiplied with G for the computation of the

present CRC terms SðnÞðxÞ based on (11). Each I.S. box inthis unit works as input selection (I.S.) depending on itsinput from the G matrix. We consider gi;j as the ith rowand jth column element of the G matrix. The functionalityof I.S. box is similar to the AND gate. If the value of gi;j is0, then the corresponding I.S. box becomes open-circuitand if the value of gi;j is 1, then the corresponding I.S.box becomes short-circuit, i.e., if the value of gi;j is 1, thenthe output of the corresponding I.S. box is tj and if thevalue of gi;j is 0, then the output is 0.

In the next section, we present the formulation of multi-ple-bit parity-based concurrent fault detection of parallelCRC architecture. Towards that end, we derive a formula-tion for the calculation of the predicted parity of the syn-drome bits, for each of the three conditions i.e., l > m,l ¼ m and l < m, when l is the number of input messagebits in each iteration and m is the number of CRC bits. Westart with the case l > m since the matrix based CRC for-mulation of this case has already been presented in [17]. Forthe other two cases, we have to extend the idea given in [17]to generate the required CRC formulation. In this section,we also present the architecture of the proposed concurrentfault detection of parallel CRC structure based on multiple-bit parity prediction for all the three cases. We present themultiple-bit parity formulation and architecture followingthe order, i.e., first the structure for l > m is discussed fol-lowed by the design for l ¼ m and finally the structure cor-responding to l < m is constructed.

Before going to the next section, we present Table 1 list-ing all the symbols used throughtout this paper along withtheir definitions.

3 CONCURRENT FAULT DETECTION OF PARALLEL

CRC STRUCTURE

In this section, we consider the parallel CRC architecture pro-posed in [17] to apply a multiple-bit parity approach on thisarchitecture to achieve concurrent fault detection. This multi-ple-bit parity checking circuit will compare predicted parityand actual parity of the syndrome bits at the end of each

Fig. 2. Parallel Implementation of CRC. The block diagram is derivedfrom [9], [11].

Fig. 3. Parallel Implementation of CRC based on Matrix Multiplication forl > m [17]. The circuit is derived from [9], [11].

2146 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 7, JULY 2016

Page 5: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

iteration and raise an alarm if they do not match. The blockdiagram of the proposed scheme has been depicted in Fig. 4.

In Fig. 4, the output of the parity calculation unit is theactual multiple-bit parity of CRC syndrome bits after nthiteration, whereas the output of the parity prediction unit isthe predicted multiple-bit parity. The actual parity isobtained by XORing the outputs of the Matrix Multiplica-tion Unit, whereas the predicted parity is evaluated as a dif-ferent function of inputs. Functionality of T Generation Unitand Matrix Multiplication Unit have already been explainedin Section 2 regarding Fig. 3. The coloured portion of Fig. 4shows the overhead. The additional design unit, shown inFig. 4, computes the parity of the output of the flip-flopsand compares it with the actual parity.

Now we propose the formulation for the multiple-bitparity prediction when l � m and then we show the corre-sponding hardware structure.

3.1 Formulation of Parity Prediction for l � m

In this section, first we present formulations for a multiple-bit parity prediction architecture for CRC when the numberof message bits in each iteration, i.e., l is greater than orequal to the number of CRC bits, i.e.,m.

From (9) and (11), we get the value of the ith bit of them-bit syndrome after the nth iteration for l > m as follows,

sðnÞi ¼

Xl�m�1

j¼0

gi;j bðnÞj þ

Xl�1

j¼l�m

gi;j

�bðnÞj þ s

ðn�1Þj�ðl�mÞ

�: (12)

Again, for l ¼ m, we derive from (5), SðnÞðxÞ ¼fxm T ðnÞðxÞg mod GðxÞ; where

T ðnÞðxÞ ¼ Sðn�1ÞðxÞ þBðnÞðxÞ: (13)

Next from (6), (7), (8) and (13), for j 2 ½0; l� 1�,

tðnÞj ¼ b

ðnÞj þ s

ðn�1Þj ðxÞ: (14)

Now from (2), (11), (12) and (14), we derive the value of thesyndrome after the nth iteration,

sðnÞi ¼

Xl�1

j¼0

gi;j ðbðnÞj þ sðn�1Þj Þ: (15)

The above formulation can be extended for the multiple-bit parity structure as explained in the next lemma. We con-sider that the parallel CRC structure, having l-bit input andm-bit output, is splitted into w blocks while each block ischaracterized by r-bit output. Without loss of generality, wecan consider that m ¼ w � r. Using this information, wepropose a lemma for w-bit parity.

Lemma 1. Let pðsðnÞc Þ 2 f0; 1g be the predicted parity of the cthblock of the syndrome for nth iteration and pðgj;cÞ 2 f0; 1g bethe parity of the r elements of the jth column of G startingfrom c:rth element, then for l � m,

pðsðnÞc Þ ¼Xl�1

j¼0

bðnÞj pðgj;cÞ þ

Xl�1

j¼l�m

sðn�1Þj�ðl�mÞ pðgj;cÞ: (16)

where pðgj;cÞ is the parity of the r elements of the jth column ofG starting from the c rth element and is described as

pðgj;cÞ ¼Prðcþ1Þ�1

i¼cr gi;j and 0 � c � w� 1.

Proof. Using (12), the predicted parity of the cth block of thesyndrome after the nth iteration can be expressed as,

pðsðnÞc Þ ¼Xrðcþ1Þ�1

i¼cr

sðnÞi ; (17)

TABLE 1List of the Symbols Used Throughout the Paper

Symbols Definitions

l Number of input message bits in each iterationm Degree of CRC generator polynomialGðxÞ Generator polynomial of CRC over GF ð2ÞUðxÞ k - bit input messageSðxÞ m�bit CRC checksum, known as the syndromeMðxÞ The encoded message to be transmittedq Number of message blocks each being l�bit longUnðxÞ Message fragment processed at n�th iterationU ðnÞðxÞ Message segment containing all the

blocks processed till n�th iterationBðnÞðxÞ Present block of the message

when UnðxÞ ¼ BðnÞðxÞxlðq�1�nÞ

T ðnÞðxÞ xl�m Sðn�1ÞðxÞ þBðnÞðxÞG m� lmatrix whose each element

can be represented as gi;jsðnÞi

i-th bit of them-bit syndrome after the nth iteration

pðsðnÞc Þ Predicted parity of the c�th block

of the syndrome for n�th iterationpðgj;cÞ Parity of the r elements of j�th column ofG

pðsðnÞ0 Þ Actual parity of the c�th block of

the syndrome for n�th iterationw Number of parity bits i.e., number of blocks

in whichm�bit CRC output is splittedr Number of bits of each blockeðxÞ Error polynomialEc Theoretical error coverage

Fig. 4. Simplified block diagram of the proposed multiple-bit parity-basederror detection structure of parallel CRC architecture. The coloured por-tion shows the overhead.

GANGOPADHYAY AND REYHANI-MASOLEH: MULTIPLE-BIT PARITY-BASED CONCURRENT FAULT DETECTION ARCHITECTURE FOR PARALLEL... 2147

Page 6: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

From (12) and (17) for l > m,

pðsðnÞc Þ ¼Xrcþr�1

i¼cr

Xl�m�1

j¼0

gi;j bðnÞj þ

"

þXl�1

j¼l�m

gi;j�bðnÞj þ s

ðn�1Þj�ðl�mÞ

�#: (18)

Again for l ¼ m, the predicted parity for the cth block

pðsðnÞc Þ ¼Xrðcþ1Þ�1

i¼cr

Xl�1

j¼0

gi;j�bðnÞj þ s

ðn�1Þj

�" #: (19)

Now combining (18) and (19), we can further derive (16)for l � m. tuThis formulation will be used to generate the proposed

fault detection of parallel CRC architecture using multiple-bit parity for l � m. A parity prediction circuit will bedesigned based on (16) and the actual multiple-bit parity ofthe syndrome will be compared with these predicted paritybits generating an error signal in the case of a mismatch.

Now we will provide an illustrative example for the casel ¼ m.

Example 1. We consider CRC generator polynomial as

GðxÞ ¼ 1þ x3 þ x4 þ x5 þ x8 and degree of parallelisml ¼ 8. Then the equation of the first column of G is

r0ðxÞ ¼ 1þ x3 þ x4 þ x5. Now we get,r1ðxÞ ¼ xþ x4 þ x5 þ x6, r2ðxÞ ¼ x2 þ x5 þ x6 þ x7, r3

ðxÞ ¼ 1þ x4 þ x5 þ x6 þ x7; r4ðxÞ ¼ 1þ xþ x3 þ x4 þ x6

þx7; r5ðxÞ ¼ 1þ xþ x2 þ x3 þ x7, r6ðxÞ ¼ 1þ xþ x2þ x5,

r7ðxÞ ¼ xþ x2 þ x3 þ x6. Combining all these,

G ¼

1 0 0 1 1 1 1 00 1 0 0 1 1 1 10 0 1 0 0 1 1 11 0 0 0 1 1 0 11 1 0 1 1 0 0 01 1 1 1 0 0 1 00 1 1 1 1 0 0 10 0 1 1 1 1 0 0

266666666664

377777777775:

We assume that BðnÞðxÞ ¼ 1þ x6 þ x7 and

Sðn�1ÞðxÞ ¼ 1þ x2 þ x6. So, bðnÞ0 ¼ 1, b

ðnÞ1 ¼ 0, b

ðnÞ2 ¼ 0,

bðnÞ3 ¼ 0, b

ðnÞ4 ¼ 0, b

ðnÞ5 ¼ 0, b

ðnÞ6 ¼ 1, b

ðnÞ7 ¼ 1 and s

ðn�1Þ0 ¼ 1,

sðn�1Þ1 ¼ 0, s

ðn�1Þ2 ¼ 1, s

ðn�1Þ3 ¼ 0, s

ðn�1Þ4 ¼ 0, s

ðn�1Þ5 ¼ 0,

sðn�1Þ6 ¼ 1, s

ðn�1Þ7 ¼ 0. Now using (15), we get, s

ðnÞ0 ¼ g0;2þ

g0;7 ¼ 0, sðnÞ1 ¼ g1;2 þ g1;7 ¼ 1, s

ðnÞ2 ¼ g2;2 þ g2;7 ¼ 0, s

ðnÞ3 ¼

g3;2 þ g3;7 ¼ 1, sðnÞ4 ¼ g4;2 þ g4;7 ¼ 0, s

ðnÞ5 ¼ g5;2 þ g5;7 ¼ 1,

sðnÞ6 ¼ g6;2 þ g6;7 ¼ 0, s

ðnÞ7 ¼ g7;2 þ g7;7 ¼ 1. So, output of the

parallel CRC circuit for nth iteration is SðnÞðxÞ ¼xþ x3 þ x5 þ x7. Now considering w ¼ 4 and r ¼ 2; actualmultiple-bit parity of these syndrome bits can be repre-

sented as pðsðnÞ0 Þ ¼ sðnÞ0 þ s

ðnÞ1 ¼ 1, pðsðnÞ1 Þ ¼ s

ðnÞ2 þ s

ðnÞ3 ¼ 1,

pðsðnÞ2 Þ ¼ sðnÞ4 þ s

ðnÞ5 ¼ 1, pðsðnÞ3 Þ ¼ s

ðnÞ6 þ s

ðnÞ7 ¼ 1.

Again considering w ¼ 4, we get the values of the pre-

dicted parity from (16) as, pðsðnÞ0 Þ ¼ pðg2;0Þ þ pðg7;0Þ ¼ ðg0;2 þg1;2Þþðg0;7 þ g1;7Þ ¼ s

ðnÞ0 þ s

ðnÞ1 ¼ pðsðnÞ0 Þ ¼ 1: Similarly,

pðsðnÞ1 Þ ¼ pðg2;2Þ þ pðg7;2Þ ¼ ðg2;2 þ g3;2Þ þ ðg2;7þ g3;7Þ ¼ sðnÞ2 þ

sðnÞ3 ¼ pðsðnÞ0 Þ ¼ 1, pðsðnÞ2 Þ ¼ pðg2;4Þ þ pðg7;4Þ ¼ ðg4;2þ g5;2Þ þðg4;7 þ g5;7Þ ¼ s

ðnÞ4 þ s

ðnÞ5 ¼ pðsðnÞ2 Þ ¼ 1 and pðsðnÞ3 Þ ¼ pðg2;6Þþ

pðg7;6Þ ¼ ðg6;2 þ g7;2Þ þ ðg6;7 þ g7;7Þ ¼ sðnÞ6 þ s

ðnÞ7 ¼ pðsðnÞ3 Þ ¼ 1:

The values of the predicted parity exactly matches with thevalues of the actual parity which clearly shows the correct-ness of the proposed Lemma 1.

Next, we assume there is a stuck-at-fault in the syndromegeneration circuit resulting in an erroneous value of theCRC output. Let the actual erroneous output be

SðnÞðxÞ ¼ xþ x3 þ x6 þ x7. Now, the 4�bit parity of the

actual syndrome bits are pðsðnÞ0 Þ ¼ 1, pðsðnÞ1 Þ ¼ 1, pðsðnÞ2 Þ ¼ 0

and pðsðnÞ3 Þ ¼ 0. A comparison of the actual parity and the

predicted parity shows a mismatch in the value of pðsðnÞ2 Þand pðsðnÞ2 Þ , i.e., a mismatch between the predicted parityand actual parity of the block denoted by c ¼ 2. Similarly

another mismatch is seen between pðsðnÞ3 Þ and pðsðnÞ3 Þ , i.e.,between the predicted parity and actual parity of the blockdenoted by c ¼ 3: These two errors will result in the flag toraise an alarm depicting a successful concurrent fault detec-tion. It should be noted that, since there are two erroneousbits in the output, single bit parity prediction based faultdetection will not be able to detect this error. But our multi-ple-bit parity scheme can easily detect this kind of error. Amore detailed investigation towards this direction is pre-sented in Section 4.

3.2 Multiple-Bit Parity Prediction Architecture forl � m

In this section, we present the architecture of the proposedconcurrent fault detection of parallel CRC structure basedon multiple-bit parity prediction. We have already formu-lated multiple-bit parity prediction expressions for l � mwhen l is the number of input message bits in each iterationandm is the number of CRC bits.

Fig. 5a shows the block diagram of the proposed multi bitparity prediction scheme and Fig. 5b shows the parity pre-diction circuit developed in (16). In Fig. 5b each I.S. has oneinput as pðgj;cÞ, when pðgj;cÞ is the parity of the r elements ofjth column of G starting from r:cth element for0 � j � l� 1. In Fig. 5b, the outputs of the I.S. boxes areshown to be connected to the Binary tree of XOR (BTX)block. The number of inputs to the BTX block is equal to thenumber of ones in pðgj;cÞ for 0 � j � l� 1, which is fixed fora given generator polynomial or GðxÞ.

In Fig. 5a, the checksum portion of the design is shownto have been divided into w blocks while c denotes thenumber of the block. Each block consists of r syndromebits. After nth iteration the single bit predicted parity of

all those w blocks range from pðsðnÞ0 Þ to pðsðnÞw�1Þ forming amultiple-bit parity prediction having width w. This multi-ple-bit parity is compared to the actual parity in the Com-parison Unit and any discrepency will raise an alarm. TheComparison Unit includes w 2-input XOR gates and a two

2148 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 7, JULY 2016

Page 7: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

rail parity checker circuit consisting of a w-input ANDgate and another w-input NOR gate. Since this final errorsignal is a single point of failure, the inclusion of a tworail parity check structure and two final error signalsensure detection of any stuck at faults in the comparisonunit. This multiple-bit parity prediction architecture has adefinite advantage. From the mismatch, we can approxi-mately figure out the location of the faulty portion in thecircuit depending upon the location of the erroneousactual parity bit. The mapping between a fault in the par-allel CRC circuit and an error in the actual parity bits willbe discussed in the next section.

Next we present the multiple-bit parity prediction archi-tecture for l ¼ m. Since this diagram corresponds to the casel ¼ m, the internal structure of T Generation Unit andMatrix Multiplication Unit differs from those shown inFig. 3. Hence those details are shown in Fig. 5c. The detailsof the Parity Prediction Unit basically the internal structureof the design of the calculation of each predicted parity bitsbased on (16) corresponding to each block is shown inFig. 5d. Each I.S. in Fig. 5d has one input as pðgj;cÞ, when

pðgj;cÞ is the parity of the r elements of jth column ofG start-

ing from r:cth element for 0 � j � l� 1.The syndrome bits for this case also are divided into w

blocks while c denotes the number of the blocks similar tothe case shown in Fig. 5a. The single bit predicted parity ofall those w blocks after nth iteration are shown to range

from pðsðnÞ0 Þ to pðsðnÞw�1Þ thus forming a multiple-bit parity

prediction having width w. Similar to the concept shown in

Fig. 4, the w -bit predicted parity is compared to the actualw -bit parity and a high value of error ðeÞ indicates the pres-ence of an error. Similar to the previous case, in this case toowe have the advantage of approximately catching the faultyportion of the circuit depending on the location of the erro-neous actual parity bit.

In the next section, the parallel CRC architecture pro-posed in [17] for the case l � m has been extended togenerate the parallel CRC structure for l < m. Then weapply a multiple-bit parity approach on this architectureto achieve concurrent fault detection. Similar to the casedepicted in the previous section, this multiple-bit paritychecking circuit will also compare predicted parity andactual parity of the syndrome bits at the end of each iter-ation and raise an alarm if they do not match as shownin Fig. 4.

3.3 Formulation of Parity Prediction for l < m

Next we consider the last remaining case when the number ofmessage bits in each iteration ðlÞ is less than the number ofCRC bits ðm). First we derive the matrix based parallel CRCformulation based on (12). This is an extension of the idea pro-posed in [17]. Thenwe propose a formulation and then, by thesupport of this formulation, we propose a lemma to design amultiple-bit parity prediction architecture of CRC for l < m.Towards that end, we define a m� ðm� lÞ matrix H, whichwill be required further on in this section. Each element ofmatrix H can be represented as hi;j when 0 � i � m� 1 and0 � j � m� l� 1 and for each ith column only hiþl; i ¼ 1 andthe rest of the elements are zero.

Fig. 5. (a) Block diagram of the proposed multi bit scheme, (b) parity prediction circuit of each block for l > m, (c) details of T generation unit andmatrix multiplication unit for l ¼ m, (d) parity prediction circuit of each block for l ¼ m.

GANGOPADHYAY AND REYHANI-MASOLEH: MULTIPLE-BIT PARITY-BASED CONCURRENT FAULT DETECTION ARCHITECTURE FOR PARALLEL... 2149

Page 8: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

For l < m, we derive from (5), SðnÞðxÞ ¼ TðnÞt ðxÞ

mod GðxÞ; where

TðnÞt ðxÞ ¼ T

ðnÞ1 ðxÞ þ T

ðnÞ2 ðxÞ; (20)

TðnÞ1 ðxÞ ¼ xl

Xm�l�1

j¼0

sðn�1Þj xj; (21)

and

TðnÞ2 ðxÞ ¼ xm

Xl�1

j¼0

�sðn�1Þjþm�l þ b

ðnÞj

�xj: (22)

Using (11), we can derive from (20), (21) and (22),

sðnÞ ¼ HtðnÞ1 þGt

ðnÞ2 ; (23)

when we can express TðnÞ1 ðxÞ ¼ x t

ðnÞ1 ; and T

ðnÞ2 ðxÞ ¼ x t

ðnÞ2 :

Here tðnÞ1 ¼ ½tðnÞ1 0; . . . ; t

ðnÞ1 ðm�lÞ�1�T and t

ðnÞ2 ¼ ½tðnÞ2 0; . . . ;

tðnÞ2 ðm�lÞ�1�T are the column vector. Now from (23),

sðnÞi ¼

Xm�l�1

j¼0

hi;j sðn�1Þj þ

Xl�1

j¼0

gi;j�bðnÞj þ s

ðn�1Þjþm�l

�: (24)

Similar to the previous cases discussed in the lasttwo sections, in the present case too, we extend the above-mentioned formulation to the multiple-bit parity structure.The parallel CRC structure involving l-bit input and m-bitoutput is considered to be splited into w blocks each havingr-bit output. We can safely consider without loss of general-ity that m ¼ w � r. A lemma is proposed using this infor-mation for l < m.

Lemma 2. Let pðsðnÞc Þ 2 f0; 1g be the predicted parity of the cthblock of the syndrome for the nth iteration, pðgj;cÞ 2 f0; 1g bethe parity of the r elements of the jth column of G startingfrom the c:rth element and pðhj;cÞ 2 f0; 1g be the parity of ther elements of the jth column of H starting from the c:rth ele-ment, then

pðsðnÞc Þ ¼Xm�l�1

j¼0

sðn�1Þj pðhj;cÞ þ

Xl�1

j¼0

�bðnÞj þ s

ðn�1Þjþm�l pðgj;cÞ

�: (25)

Proof. Similar to the proof of Lemma 1, using (17), we get

pðsðnÞc Þ¼Prðcþ1Þ�1i¼cr s

ðnÞi ;Now using (24),

pðsðnÞc Þ ¼Xrðcþ1Þ�1

i¼cr

Xm�l�1

j¼0

hi;j sðn�1Þj þ

"

þXl�1

j¼0

gi;j�bðnÞj þ s

ðn�1Þjþm�l

�#: (26)

From (26) we can further derive (25). tuWe will use the above formulation to generate the pro-

posed concurrent fault detection of CRC architecture usingmultiple-bit parity for l < m. The actual parity bits of theCRC syndrome are XORed with the predicted multiple-bitparity-based on (25). If these two sets of parity bits do notmatch with each other, then a flag indicating the occuranceof an error is generated.

3.4 Multiple-Bit Parity Prediction Architecturefor l < m

Next we present the multiple-bit parity prediction architec-ture for l < m. In Fig. 6, the block diagram of the matrix for-mulation based parallel CRC architecture for l < m havebeen shown in details. This circuit is designed from (24).Since this diagram corresponds to the case l < m, the inter-nal structure of CRC Generation Unit greatly differs from Tgeneration unit and Matrix Multiplication Unit as shown inFig. 3 which corresponds to the case l > m.

The block diagram of the proposed multiple-bit parityscheme for l < m is similar to the diagram shown inFig. 5a. Next for l < m the details of the Parity PredictionUnit basically the internal structure of the design of the cal-culation of each predicted parity bit corresponding to eachblock is shown in Fig. 7. This circuit is generated using (25).In Fig. 7 each I.S. has one input as pðgj;cÞ, when pðgj;cÞ is theparity of the r elements of jth column of G starting fromc:rth element for 0 � j � l� 1.

The actual parity bits are compared with the predictedparity bits shown to be generated in Fig. 7 and in presenceof an error, e goes high. Both the predicted and actual parityare w-bit wide forming a multiple-bit parity prediction hav-ing width w. In this case also we have the advantage thatfrom the mismatch, we can approximately figure out which

Fig. 6. Parallel CRC structure based on matrix multiplication for l < m.

Fig. 7. Parity Prediction circuit of each block of the proposed multi bitscheme for l < m.

2150 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 7, JULY 2016

Page 9: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

portion of the circuit has got a fault depending on the loca-tion of the erroneous actual parity bit.

Now we are ready with all the required formulations toanalyze the architecture of multiple-bit parity-based concur-rent fault detection of parallel CRC architecture.

4 ANALYSIS AND SIMULATION RESULTS

A physical defect due to imperfection, flaw, shorts or opensin a circuit is defined as a fault and an deviation from cor-rect result is called error. In other words, faults occured inphysical universe is manifested by errors in informationaluniverse which further leads to failures [30]. In this section,we first discuss the theoretical error coverage analysis alongwith a comparison of detectable errors of our proposedtechnique with other error detection techniques. Then wepropose a lemma for the computation of error coverage ofour proposed scheme. We also present simulation resultssupporting the theoretical value obtained from the lemma.Finally we consider fault models and fault analysis of paral-lel CRC structures based on generator polynomials. At theend of this section, we present the results of the simulationin which first we inject faults into the parallel CRC structureand calculate the number of cases where the injected faultsresult in occured errors. Then we calculate the number ofcases where our proposed scheme has been successful inidentifying the errors. All these results have been presentedin tabular format.

4.1 Error Model

In this section, we present a comparison of the error cover-age among the three methods, namely Duplication withCompariosn, Single bit Parity Check and Proposed Multi bitParity Check. Duplication with Comparison method candetect any error and single bit parity check can detect onlyodd number of bit flips. While our proposed method candetect any odd number of bit flips, it can also detect some ofthe even number of bit flips as well.

In this section, theoretical error coverage has also beenexplained. In the following sections, fault types and faultmodels have been discussed. For m�bit CRC checksum, afault has the effect of flipping any number of bits from thosem bits. An addition of error polynomial with the expected

checksum represents this fault. This error polynomial can

be represented as eðxÞ ¼ Pm�1i¼0 eix

i when ei 2 GF ð2Þ. Andthen SeðxÞ ¼ SðxÞ þ eðxÞ presents the erroneous output.A no error situation takes place when eðxÞ ¼ 0 i.e., ei ¼ 0for 0 � i � m� 1. An error in i�th bit position is manifestedby ei ¼ 1 in the error polynomial because it flips the i�th bitmaking the output erroneous. Since there are total m bits inthe output, there are total 2m � 1 possible errors.

Total number of possible 2-bit error in m-bit CRC syn-

drome is m2

� �. In a single bit parity check scheme of CRC,

2-bit error goes undetected. But our proposed multiple-bit scheme can detect some of the 2-bit errors. Thisscheme cannot detect a 2-bit error if both of the errone-ous bits fall in the same block. So the number of possibleundetectable 2-bit errors for multiple-bit scheme is equalto the product of the number of blocks, i.e., w and num-

ber of possible 2-bit errors in a block, i.e., r2

� �where r is

the number of bits in each block. Hence the number ofpossible detectable 2-bit errors in the proposed multiple-bit parity scheme = total number of possible 2-bit errors

- number of undetectable 2-bit errors = ½ m2

� �� w r2

� �� =

½mðm�rÞ2 �.Similarly, we can calculate the number of 4-bit error

detectable by multiple-bit parity scheme. Total number of

possible 4-bit error in a m-bit CRC syndrome is m4

� �. For a

multiple-bit parity scheme, there can be various ways inwhich erroneous bits can be distributed. The multiple-bitparity scheme cannot detect a 4-bit error in two cases. First,when all of the four erroneous bits are in the same block

which can happen in w r4

� �ways. Second, when two of them

are present in one block and the other two in another block

which can happen in w2

r2

� �ðw� 1Þ r2

� �� �ways, while the rea-

son for the division by 2 is the fact that each pair getsselected twice. So, the number of possible detectable 4-biterrors in the proposed multiple-bit parity scheme = totalnumber of possible 4-bit errors-number of undetectable

4-bit errors ¼ m4

� �� ½w r4

� �þ w2 ðw� 1Þ r

2

� �r2

� �� ¼ m24 ðm� 1Þ½

ðm� 2Þ ðm� 3Þ � ðr� 1Þfr2w� rðw� 4Þ þ 6g�.A comparison of the fault coverage in terms of number of

detectable errors of the above three schemes is shown inTable 2.

TABLE 3Theoretical Complexity Analysis in Terms of Area Overhead and Critical Path Delay

Double Modular Redundancy Proposed Multi bit Parity Check

Number of XOR gates 2ml mlþ wðlþmÞNumber of FFs m mDelay TX þ log2ld eTX þ TX þ log2md eTX TX þ log2ld eTX þ TX þ log2rd eTX þ log2wd eTX

TABLE 2Comparison of Theoretical Fault Coverage in Terms of Detectable Errors

Duplication with Compariosn Single bit Parity Check Proposed Multi bit Parity Check

Single bit Error 100% 100% 100%Odd number of bits in Error 100% 100% 100%Double Bit Errors mðm�1Þ

20 mðm�rÞ

2

Quadruple Bit Errors mðm�1Þðm�2Þðm�3Þ24

0 m24 ðm� 1Þðm� 2Þðm� 3Þ � ðr� 1Þfr2w� rðw� 4Þ þ 6g½ �

GANGOPADHYAY AND REYHANI-MASOLEH: MULTIPLE-BIT PARITY-BASED CONCURRENT FAULT DETECTION ARCHITECTURE FOR PARALLEL... 2151

Page 10: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

4.2 Error Analysis

In this section we propose a lemma for the calculation oferror coverage in terms of number of detectable errors usingour proposed multiple-bit scheme.

Lemma 3. Let m be the degree of CRC generator polynomial andw be the number of parity bits, then error coverage Ec can beexpressed as

Ec ¼ 2m�wð2w � 1Þ � 1

2m � 1: (27)

Proof. For m - bit CRC computation, total number of possi-ble errors can be NTotal ¼ 2m � 1. An error detectionscheme using single bit parity can detect error in pres-ence of any odd number of erroneous bits, but it cannotdetect error when number of erroneous bits is even.However, our proposed multiple-bit parity scheme iscapable of detecting all odd number of errors and someof the even number of errors. For m - bit CRC computa-tion, among all the possible errors, half of them will haveodd number of erroneous bits. So possible odd number

of errors is Nodd ¼ 2m�1. The rest of the errors have evennumber of erroneous bits. So possible even number of

errors is Neven ¼ 2m�1 � 1: Here this �1 corresponds tothe case of zero number of errors which means actuallyno error. The proposed multiple-bit scheme consists of wblocks while each block is characterized by r bits. Similarto the previous reasoining here also we can write that, ineach block total number of errors, possible odd numberof errors and possible even number of errors can be given

by respectively ntotal ¼ 2r � 1, nodd ¼ 2r�1 and neven ¼2r�1 � 1. Now for our proposed scheme, an error will goundetected in case if all the blocks contain either no erroror even number of errors, i.e., only if there is no oddnumber of error in any of the blocks our scheme will failto detect the error. So for our proposed scheme, numberof undetectable error, NUndetectable ¼ 2m�w. So error cover-age, which is the ratio of detectable errors and total num-ber of errors, can be expressed as

Ec ¼ NTotal �NUndetectable

NTotal

¼ ð2m � 1Þ � 2m�w

2m�1¼ 2m�wð2w � 1Þ � 1

2m � 1:

tuWe have simulated our proposed scheme for several fre-

quently used CRC generator polynomials. The simulationhas been performed form ¼ 8,m ¼ 16 andm ¼ 32. The pol-ynomials used for these simulations are mentioned here.

For m ¼ 8, we have used GðxÞ ¼ x8 þ x2 þ xþ 1. For

m ¼ 16, we have used GðxÞ ¼ x16 þ x15 þ x2 þ 1. Finally for

m ¼ 32, we have used GðxÞ ¼ x32 þ x26 þ x23 þ x22þ x16 þx12þ x11 þþx10 þ x8 þ x7 þ x5 þ x4 þ x2 þ xþ 1. For eachof these values ofm, number of parity bits, i.e., w is changedand error coverage value is recorded for each particularvalue of w for corresponding value ofm.

A thorough investigation of the simulation results vali-dates the accuracy of Lemma 3. For this simulation, wehave introduced errors in the CRC checksum output bits

i.e., we made some of the bits erroneous to find out whetherthe proposed method can catch the error. For CRC-8, the

error can occur in eight output bits in total 28 � 1ð Þ ways

and we have injected all 28 � 1ð Þ number of possible errors.But for CRC-16, and for CRC-32, we have introduced ran-dom errors while making sure to include all the smallernumber of errors, i.e., the cases when the number of errorsare one, two, three, four and five.

4.3 Fault Model and Analysis of CRC Based onGenerator Polynomial

In this section, we analyse the possible fault locations andfault propagation through the CRC structure shown inFig. 3. Nowwe consider various types of faults and their cor-responding locations. We can safely assume without loss ofgenerality that the inputs i.e.,BðnÞðxÞ, Sðn�1ÞðxÞ andGmatrixare free from errors. Error due to stuck-at faults can take

place in the calculation of T ðnÞðxÞ and SðnÞðxÞ . Now first we

consider the case when there is an error in T ðnÞðxÞ . This errormay not propagate if the corresponding bit position of G

matrix for a particular bit of SðnÞðxÞ is 0: But this same errorcan flip another output bit. In theMatrixMultiplication Unit,I.S. boxes are followed by XOR gates and any single bit errorin the input of a XOR gate is sure to propagate through theBTX block to reach the output. Hence any error present inthe I.S. box output will propagate to the CRC checksum out-put and will generate an erroneous result. We have imple-mented the fault model in C language to confirm the abovefact. In this simulation, we have introduced several stuck-at-faults in various locations and found out the erroneous bitsto investigate the nature of error propagation depending onthe value of G matrix. We have a detailed discussion aboutthis fault model in this section. Now Fig. 8 shows all possiblestuck-at-fault locations in parallel CRC.

In this figure, we have shown that there are mainly fivepositions for stuck-at-faults. Those four positions have beennamed F.L. 1, F.L. 2, F.L. 3, F.L. 4 and F.L. 5 in the diagram.Now from Fig. 8, it is evident that any fault in F.L. 4 and F.L. 5 affects only the corresponding syndrome bits, but anysingle stuck-at-fault in F.L. 1 and F.L. 2 may affect multiplesyndrome bits though the manifestation of the faultdepends upon the value of the G matrix. Now a single faultin F.L. 4 implies that the particular syndrome bit is stuck at

Fig. 8. Fault locations of parallel CRC structure.

2152 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 7, JULY 2016

Page 11: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

either 1 or 0. So a single fault in F.L.4 makes only one bit ofthe syndrome erroneous. Similarly, a single stuck-at-fault inF.L.3 usually affects a single syndrome bit and can bedetected by the output flag. It can also correspond to multi-ple syndrome bit errors. This can be ensured by arguingthat resource sharing may not have a significant impact inthis case because there is very little scope of resource shar-ing for two syndrome bits in F.L. 3 zone due to the fact thatthe matrix G is in most cases a sparse matrix. The possiblelocations of the stuck-at fault in the output of the flip-flopshave been considered in F.L. 5. An additional design unit,shown in dotted line in Fig. 4, has been added to detect thepresence of single stuck-at fault in the output of the flip-flops. This additional design unit first calculates the w-bitparity of the output of the flip-flops using the same formula-tions provided for the Parity Calculation Unit. Then theseparities are compared with the actual parities of the previ-ous cycle which are calculated in the Parity CalculationUnit and passed through flip-flops. Any mismatch betweenthe w-bit parity of the Additional Unit and that of the previ-ous cycle Parity Calculation Unit can be flagged using anerror signal. Therefore, this error signal detects the presenceof any single stuck-at fault in the output of the flip-flops.

Nowwe discuss the cases of stuck-at-faults in F.L.1 and F.L. 2. Any single stuck-at-fault in F.L. 1 will result in errone-

ous value of the T ðnÞðxÞ in only one bit position i.e., if there is

a stuck-at-fault either in bðnÞi and/or s

ðn�1Þi , then only t

ðnÞi will

be erroneous. Also if there are faults in F.L. 1 simultaneouslyin the locations which are not the inputs of the same XOR

gate, then multiple bits in T ðnÞðxÞ will be erroneous. Now

any single bit error in T ðnÞðxÞ or a single stuck-at-fault in F.L.2 may influence all the syndrome bits depending on the

value of the G matrix. If there is a stuck-at-fault in tðnÞi , then

sðnÞj will be erroneous only if gj;i ¼ 1. This clearly indicates

that for each different CRC generator polynomial, fault prop-agation will be different. First we consider CRC-8-CCITT

polynomial GðxÞ ¼ x8 þ x2 þ xþð 1Þ which has been widelyused in Asynchronous Transfer Mode Header Error Con-trol/Check (ATM HEC), and then we will consider CRC-8

GðxÞ ¼ x8 þ x4 þ x3 þ x2 þ 1� �

polynomial which has AES3

(Audio Engineering Society) as its application area. Thenwe consider some other frequently used polynomials

specifically CRC-16 GðxÞ ¼ x16þð x15 þ x2 þ 1Þ which isused for USB applications. Finally we consider a widelyused CRC-32 polynomial used forMPEG, GZIP and PNG.

4.3.1 CRC-8-CCITT Polynomial

The generator matrix of CRC-8-CCITT polynomial

GðxÞ ¼ x8 þ x2 þ xþ 1ð Þ is given below,

G ¼

1 0 0 0 0 0 1 1 1 01 1 0 0 0 0 1 0 0 11 1 1 0 0 0 1 0 1 00 1 1 1 0 0 0 1 0 10 0 1 1 1 0 0 0 1 00 0 0 1 1 1 0 0 0 10 0 0 0 1 1 1 0 0 00 0 0 0 0 1 1 1 0 0

266666666664

377777777775:

Now we show in Table 4 the erroneous syndrome bitsresulted from a particular stuck-at-fault of T ðnÞðxÞ in F.L. 2.

From Table 4, we see that all the single stuck-at-faults in

T ðnÞðxÞ affect odd number of bits in syndrome, then all of

these single stuck-at-faults of T ðnÞðxÞ result in odd numberof bit errors which can be detected by a single-bit parity pre-diction scheme as well as by our proposed multiple-bit par-ity scheme. But not in all cases we will get to see a situationwhere affected bit numbers are always odd. In the next sec-tion, we will discuss a case involving a single stuck-at-faultof F.L. 2 affecting even number of syndrome bits thus mak-ing the scenario undetectable by single bit parity prediction.

4.3.2 CRC-8 Polynomial

The generator matrix of CRC-8 polynomial GðxÞ ¼ x8 þðx4 þ x3 þ x2 þ 1Þ is given below,

G ¼

1 0 0 0 1 1 1 0 0 00 1 0 0 0 1 1 1 0 01 0 1 0 1 1 0 1 1 01 1 0 1 1 0 0 0 1 11 1 1 0 0 0 1 0 0 10 1 1 1 0 0 0 1 0 00 0 1 1 0 0 0 0 1 00 0 0 1 1 1 0 0 0 1

266666666664

377777777775:

Now we show in Table 5, the erroneous syndrome bitsresulted from a particular stuck-at-fault of T ðnÞðxÞ in F.L. 2.

TABLE 4Fault Analysis of CRC-8-CCITT

Stuck-at-Fault Location Erroneous Syndrome Bits

tðnÞ0 s

ðnÞ0 , s

ðnÞ1 , s

ðnÞ2

tðnÞ1 s

ðnÞ1 , s

ðnÞ2 , s

ðnÞ3

tðnÞ2 s

ðnÞ2 ,s

ðnÞ3 , s

ðnÞ4

tðnÞ3 s

ðnÞ3 , s

ðnÞ4 , s

ðnÞ5

tðnÞ4 s

ðnÞ4 ,s

ðnÞ5 ,s

ðnÞ6

tðnÞ5 s

ðnÞ5 , s

ðnÞ6 , s

ðnÞ7

tðnÞ6 s

ðnÞ0 , s

ðnÞ1 , s

ðnÞ2 , s

ðnÞ6 , s

ðnÞ7

tðnÞ7 s

ðnÞ1 ,s

ðnÞ3 , s

ðnÞ7

tðnÞ8 s

ðnÞ1 ,s

ðnÞ2 ,s

ðnÞ4

tðnÞ9 s

ðnÞ1 ,s

ðnÞ3 ,s

ðnÞ5

TABLE 5Fault Analysis of CRC-8

Stuck-at-Fault Location Erroneous Syndrome Bits

tðnÞ0 s

ðnÞ0 , s

ðnÞ2 , s

ðnÞ3 , s

ðnÞ4

tðnÞ1 s

ðnÞ1 , s

ðnÞ3 , s

ðnÞ4 , s

ðnÞ5

tðnÞ2 s

ðnÞ2 ,s

ðnÞ4 , s

ðnÞ5 ,s

ðnÞ6

tðnÞ3 s

ðnÞ3 , s

ðnÞ5 , s

ðnÞ6 ,s

ðnÞ7

tðnÞ4 s

ðnÞ0 ,s

ðnÞ2 ,s

ðnÞ3 ,s

ðnÞ6 ,s

ðnÞ7

tðnÞ5 s

ðnÞ0 , s

ðnÞ1 , s

ðnÞ2 ,s

ðnÞ7

tðnÞ6 s

ðnÞ0 , s

ðnÞ1 , s

ðnÞ4

tðnÞ7 s

ðnÞ1 ,s

ðnÞ2 , s

ðnÞ5

tðnÞ8 s

ðnÞ2 ,s

ðnÞ3 ,s

ðnÞ6

tðnÞ9 s

ðnÞ3 ,s

ðnÞ4 ,s

ðnÞ7

GANGOPADHYAY AND REYHANI-MASOLEH: MULTIPLE-BIT PARITY-BASED CONCURRENT FAULT DETECTION ARCHITECTURE FOR PARALLEL... 2153

Page 12: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

In Table 5, we see that for tðnÞ4 , t

ðnÞ6 , t

ðnÞ7 , t

ðnÞ8 , t

ðnÞ9 , odd num-

ber of syndrome bits are affected. But for tðnÞ0 , t

ðnÞ1 , t

ðnÞ2 , t

ðnÞ3 ,

tðnÞ5 , even number of syndrome bits are affected resulting

these faults undetectable by single bit parity prediction.But since the erroneous bits are not consecutive bits, allof these errors can be detected by our proposed multi-ple-bit parity prediction scheme as shown in Fig. 5a.using Lemma 1.

4.3.3 CRC-16 Polynomial

Considering the generator matrix of CRC-16 polynomial

GðxÞ ¼ x16 þ x15 þ x2 þ 1ð Þ, now we show in Table 6, the

erroneous syndrome bits resulted from a particular stuck-

at-fault of T ðnÞðxÞ in F.L. 2.We investigate the number of erroneous syndrome

bits resulted from various particular stuck-at-fault of

T ðnÞðxÞ in F.L. 2. Following the value of the G matrix we

find that, all the single stuck-at-faults in T ðnÞðxÞ affectodd number of bits in syndrome. Thus, all of these sin-

gle stuck-at-faults of T ðnÞðxÞ result in odd number of biterrors which can be detected by a single-bit parity pre-diction scheme as well as by our proposed scheme.

4.3.4 CRC-32 Polynomial

Considering the generator matrix of CRC-32 polynomial

ðGðxÞ ¼ x32 þ x26 þ x23 þ x22 þ x16 þ x12þþ x11 þ x10 þ x8 þ x7 þ x5 þ x4 þ x2 þ xþ 1Þ;

we can investigate the number of erroneous syndrome bitsresulted from various particular stuck-at-fault of T ðnÞðxÞ inF.L. 2 following the value of the G matrix. We see that for

tðnÞ0 , t

ðnÞ1 , t

ðnÞ2 , t

ðnÞ3 , t

ðnÞ4 , t

ðnÞ5 , t

ðnÞ16 , t

ðnÞ17 , t

ðnÞ18 , t

ðnÞ24 , t

ðnÞ25 , even number

of syndrome bits are affected resulting these faults undetect-able by single bit parity prediction. But since the erroneousbits are not consecutive bits, all of these errors can be detectedby our proposedmultiple-bit parity prediction scheme.

The above discussion shows the relation between astuck-at-fault and resulting location of erroneous bitsdepending on presence of 1 or 0 in rows and columns of Gmatrix. So if we find a bit erroneous, we can simply back-track using G matrix to approximately locate a possibleregion having stuck-at fault. In this way our proposedscheme also helps to find out an approximate position ofthe faulty portion of the circuit. Further study in this direc-tion may lead to online error correction of CRC.

4.4 Fault Simulation

Now we present the results of the fault injection simulation.In this simulation, first faults are injected into the parallelCRC circuit. Then the resulting number of error occuranceis computed. Finally we calculate the number of errorsdetected by our proposed scheme and tabulate the results.We have coded this simulation using C language and usedthe generator polynomials mentioned before. For CRC-8;CRC-16 and CRC-32, we have introduced all possible singlestuck-at-faults for investigation of error propagation. Forthis purpose we have incorporated the fault model consist-ing of F.L. 1, F.L. 2, F.L. 3, F.L. 4 and F.L. 5 discussed in theprevious section. A detail investigation reveals that the totalnumber of possible single stuck-at-fault in F.L. 2, F.L. 3, F.L.4 and F.L. 5 are respectively l, ðm� lÞ, m� ðl� 1Þ and m.Table 7 gives a detail result of all the possible single stuck-atfault for F.L. 2, F.L. 3, F.L. 4 and F.L. 5, and also providesthe summation of all these results.

The table clearly proves the high error detection capabil-ity of the proposed scheme.

5 COMPLEXITY OVERHEAD AND IMPLEMENTATION

RESULTS

In this section, first we discuss the theoretical complexityof the presented error detection method using tables and

TABLE 6Fault Analysis of CRC-16

Stuck-at-Fault Location Erroneous Syndrome Bits

tðnÞ0 s

ðnÞ0 , s

ðnÞ2 , s

ðnÞ15

tðnÞ1 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ2 ,s

ðnÞ3 ,s

ðnÞ15

tðnÞ2 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ3 ,s

ðnÞ4 ,s

ðnÞ15

tðnÞ3 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ4 ,s

ðnÞ5 ,s

ðnÞ15

tðnÞ4 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ5 ,s

ðnÞ6 ,s

ðnÞ15

tðnÞ5 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ6 ,s

ðnÞ7 ,s

ðnÞ15

tðnÞ6 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ7 ,s

ðnÞ8 ,s

ðnÞ15

tðnÞ7 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ8 ,s

ðnÞ9 ,s

ðnÞ15

tðnÞ8 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ9 ,s

ðnÞ10 ,s

ðnÞ15

tðnÞ9 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ10 ,s

ðnÞ11 ,s

ðnÞ15

tðnÞ10 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ11 ,s

ðnÞ12 ,s

ðnÞ15

tðnÞ11 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ12 ,s

ðnÞ13 ,s

ðnÞ15

tðnÞ12 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ13 ,s

ðnÞ14 ,s

ðnÞ15

tðnÞ13 s

ðnÞ0 ,s

ðnÞ1 ,s

ðnÞ14

tðnÞ14 s

ðnÞ1 ,s

ðnÞ2 ,s

ðnÞ15

tðnÞ15 s

ðnÞ0 ,s

ðnÞ3 ,s

ðnÞ15

TABLE 7Error Detection Capability of the Proposed Scheme for Single-Bit Stuck-At Fault for All Fault Locations

2154 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 7, JULY 2016

Page 13: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

figures. Next using these area overhead and error detec-tion capability, we come to a trade-off of these twoparameters to calculate the particular value of number ofparity bits to achieve high error coverage withouthaving high area overhead for the proposed multiple-bitparity-based concurrent fault detection of parallel CRCstructure. We also present the implementation result ofthe proposed scheme.

A complexity analysis between double modular redun-dancy and the proposed scheme is presented in Table 3.While the parallel CRC implementation based on matrixmultiplication employs m:l number of XOR gates, therequired number of XOR gates for the proposed multiple-bit parity-based concurrent fault detection of CRC structure

is m:lþ w:ðlþmÞ, thus resulting in the overhead of w:ðlþmÞm:l .

If we consider the case when m ¼ l, the overhead becomes2:wm which is 25% for the operating value of w explained later

in this section. The critical path delay comparison has alsobeen summarised in Table 3, where TX represents the delayof a two input XOR gate. The correctness of the proposedscheme has been verified by Verilog coding which alsoproves the validity of the given overhead formulation. InVerilog, we have coded the proposed multiple-bit parityscheme and validated the functionality of the proposedarchitecture. We have added an additional design unit todetect the fault in the output of the flip-flops. The area over-head of this Additional Unit can be expressed aswðr� 1Þ þ w number of XOR gates, w number of flip-flopsand the delay can be expressed as 1þ log2rd eð ÞTX . Sincethis delay is much less compared to the critical path delayof the Parity Prediction Unit, the inclusion of this AdditionalUnit does not have significant impact on timing overhead ofthe overall architecture. We have also implemented thisAdditional Unit using Verilog code.

We have selected three values of w, i.e., 2, 3 and 4 form ¼ 32; as the corresponding error covrerages yield goodresults presented in the previous section and the area over-heads are depicted in Table 8. Similarly, we have imple-mented the proposed scheme for m ¼ 16 and the areaoverhead for w ¼ 2 is shown in Table 8. We have given thearea overhead for the proposed architecture in two separatecases, without the Additional Unit and with the AdditionalUnit. This Additional Unit is very similar to the Parity Cal-culation Unit in terms of architecture. So if we add theAdditional Unit but at the same time remove the Parity Cal-culation Unit, then we will not have this extra area overheaddue to the Additional Unit.

From previous discussions, we see that there is a trade-off between fault coverage in terms of number of detect-able errors and area overhead since increase in number ofparity bits (w) increases both the fault coverage and areaoverhead.

For three different values of m namely m ¼ 64 , m ¼ 32and m ¼ 16, we compare area overhead (in terms of extragates) and fault coverage (in terms of number of detectableerrors) for 2-bit errors for several values of number of paritybits (w). We have chosen 2-bit error because among all theeven number of errors undetectable by single bit parityscheme, this is the most common. For all these calculations,we have considered three different values for l covering allthree cases namely l > m represented by l ¼ 2m, l ¼ m andl < m represented by l ¼ m=2. This analysis helps us tocome into the conclusion that the value of w for which weget a high value of fault coverage while the area overhead isnot very high, i.e., the operating value of w can be safely con-sidered as w ¼ m

8 for l ¼ m. In short, if we use this operating

value of w, we will achieve good fault coverage and at thesame time, area overhead of the design will not be too high.Table 2 shows that fault coverage is independent of l whileTable 3 shows that area overhead is linearly dependent on lwhich gets confirmed from our calculation. For l ¼ m, theoperating value of w is w ¼ m

8 . If l > m, the operating value

ofw is found to be reduced fromw ¼ m8 to take care of linearly

increasing area overhead. So, for l > m, operating value ofwis less than m

8 . Similarly, for l < m, operating value of w is

found to be safely increased fromw ¼ m8 .

6 CONCLUSIONS

In this paper, we have proposed a concurrent fault detectiondesign structure for parallel CRC architecture. Two differ-ent cases, namely l � m and l < m have been consideredand the formulation for the multiple-bit parity-based con-current fault detection scheme has been derived followedby the implementation of the corresponding architecture foreach of them.

The error and fault coverage of the proposed schemehave been analyzed in detail using error and fault models.We have also presented a formulation for the error detectioncapability of the scheme covering all the possible 2m � 1cases where the output can become erroneous. Moreover, athorough analysis of all the possible single stuck-at faultlocations have been discussed. Furthermore, we have inves-tigated the relation between stuck-at fault location and errorpropagation resulting in an erroneous output and the rela-tion between this propagation with the CRC generator poly-nomial. Finally, the fault detection capability and thetheoretical time and area overheads of the proposedschemes have been reported and confirmed by softwaresimulation and ASIC implementation results. We havecoded the proposed scheme in C for software simulationand in Verilog for ASIC implementation purpose. The pro-posed method is shown to be area efficient and at the sametime it has a high fault detection capability. The proposedmultiple-bit parity prediction scheme has also been shown

TABLE 8Complexity Overhead of the Proposed Scheme from ASIC Implementation Results

GANGOPADHYAY AND REYHANI-MASOLEH: MULTIPLE-BIT PARITY-BASED CONCURRENT FAULT DETECTION ARCHITECTURE FOR PARALLEL... 2155

Page 14: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

to have not only the ability to detect a fault, but also to pointout the region of the fault. This potential advantage can fur-ther be utilized while extending the proposed scheme tocorrect the error online in future, i.e., to achieve an errorfree output even in the presence of a hardware fault.

ACKNOWLEDGMENTS

The authors would like to thank the reviewers for theirvaluable comments. This work has been supported byNSERC Discovery Accelerator Supplementary Grantsawarded to Arash Reyhani-Masoleh. The authors also thankCMC for the CAD tools.

REFERENCES

[1] W. Peterson and D. Brown, “Cyclic codes for error detection,”Proc. IRE, vol. 49, no. 1, pp. 228–235, 1961.

[2] ATM Layer Specification, ITU-T Recommendation I.361, 1999.[3] IEEE Standard for Information Technology: Carrier Sense Multiple

Access with Collision Detection (CSMA/CD) Access Method and Physi-cal Layer Specifications, ANSI/IEEE Std 802.3-2005.

[4] Wireless LAN Medium Access Control (MAC) and Physical Layer(PHY) Specifications, ANSI/IEEE Std 802.11-1999.

[5] IEEE Standard for Local and Metropolitan Area Networks: Air Interfacefor Fixed and Mobile Broadband WirelessAccess Systems, ANSI/IEEEStd. 802.16-2004.

[6] C. Kennedy, “High performance hardware and software imple-mentations of the cyclic redundancy check computation,” PhDThesis, University of Western Ontario, London, ON, 2009.

[7] S. Lin and D.J. Costello, Error Control Coding. Englewood Cliffs,NJ: Prentice-Hall, 1983.

[8] T. Ramabadran and S. Gaitonde, “A tutorial on CRCcomputations,” IEEE Micro, vol. 8, no. 4, pp. 62–75, Aug. 1988.

[9] T.-B. Pei and C. Zukowski, “High-speed parallel CRC circuits inVLSI,” IEEE Trans. Commun., vol. 40, no. 4, pp. 653–657, Apr. 1992.

[10] G. Albertengo and R. Sisto, “Parallel CRC generation,” IEEEMicro, vol. 10, no. 5, pp. 63–71, Oct. 1990.

[11] G. Campobello, G. Patane, and M. Russo, “Parallel CRC real-ization,” IEEE Trans. Comput., vol. 52, no. 10, pp. 1312–1319, Oct.2003.

[12] M.-D. Shieh, M.-H. Sheu, C.-H. Chen, and H.-F. Lo, “A systematicapproach for parallel CRC computations,” J. Inf. Sci. Eng., vol. 17,no. 3, pp. 445–461, 2001.

[13] M. Sprachmann, “Automatic generation of parallel CRC circuits,”IEEE Des. Test Comput., vol. 18, no. 3, pp. 108–114, May/Jun. 2001.

[14] C. Cheng and K. K. Parhi, “High-speed parallel CRC implementa-tion based on unfolding, pipelining, and retiming,” IEEE Trans.Circuits Syst. II, Express Briefs, vol. 53, no. 10, pp. 1017–1021, Oct.2006.

[15] C. Cheng and K. K. Parhi, “High speed VLSI architecture for gen-eral linear feedback shift register (LFSR) structures,” in Proc. IEEE43rd Asilomar Conf. Signals, Syst. Comput., 2009, pp. 713–717.

[16] M. Ayinala and K. K. Parhi, “High-speed parallel architectures forlinear feedback shift registers,” IEEE Trans. Signal Process., vol. 59,no. 9, pp. 4459–4469, Sep. 2011.

[17] C. Kennedy and A. Reyhani-Masoleh, “High-speed parallel CRCcircuits,” in Proc. 42nd Asilomar Conf. Signals, Syst. Comput., Oct.2008, pp. 1823–1829.

[18] M. Braun, J. Friedrich, T. Grn, and J. Lembert, “Parallel CRC com-putation in FPGAs” in Proc. 6th Int. Workshop Field-Programm.Logic Appl., 1996, vol. 1142, pp. 156–165.

[19] R. Glaise, “A two-step computation of cyclic redundancy codeCRC-32 for ATM networks,” IBM J. Res. Develop., vol. 41, no. 6,pp. 705–709, 1997.

[20] J. Derby, “High-speed CRC computation using state-space trans-formations,” in Proc. IEEE Global Telecommun. Conf., 2001, vol. 1,pp. 166–170.

[21] P. Reviriego, S. Pontarelli and J. A. Maestro, “Concurrent errordetection for orthogonal latin squares encoders and syndromecomputation,” IEEE Trans. Very Large Scale Integration Syst.,vol. 21, no 12, pp. 2334–2338, Dec. 2013.

[22] K. Namba and F. Lombardi, “A novel scheme for concurrent errordetection of OLS parallel decoders,” in Proc. IEEE Int. Symp. DefectFault Tolerance VLSI Nanotechnol. Syst., 2013, pp. 52–57.

[23] N. A. Touba and E. J. McCluskey, “Logic synthesis of multi-level circuits with concurrent error detection,” IEEE Trans.Comput.-Aided Des. Integrated Circuits Syst., vol. 16, no. 7,pp. 783–789, Jul. 1997.

[24] K. De, C. Natarajan, D. Nair and P. Banerjee, “RSYN: A system forautomated synthesis of reliable multilevel circuits,” IEEE Trans.Very Large Scale Integration Syst., vol. 2, no. 2, pp. 186–195, Jun.1994.

[25] M. Walma, “Pipelined cyclic redundancy check (CRC) calcu-lation,” in Proc. Int. Conf. Comput. Commun. Netw., Aug. 2007,pp. 365–370.

[26] E. Normand, “Single event upset at ground level,” IEEE Trans.Nuclear Sci., vol. 43, no. 6, pp. 2742–2750, Dec. 1996.

[27] D. Larner, “Sun flips bits in chips,” Electron. Times, vol. 878,pp. 72–24, 10 Nov. 1997.

[28] J. F. Ziegler, “Terrestrial cosmic rays intensities,” IBM J. Res.Develop., vol. 42, pp. 117–139, 1998.

[29] C. R. Baumann, “Radiation-induced soft errors in advanced semi-conductor technologies,” IEEE Trans. Device Materials Rel., vol. 5,no. 3, pp. 305–316, Sep. 2005.

[30] B. W. Johnson,Design and Analysis of Fault-Tolerant Digital Systems.Reading, MA, USA: Addison-Wesley, 1989.

[31] S. J. Piestrak, A. Dandache and F. Monteiro, “Designing fault-secure parallel encoders for systematic linear error correctingcodes,” IEEE Trans. Rel., vol. 52, no. 4, pp. 492–500, Dec. 2003.

[32] K. K. Parhi, “Eliminating the fanout bottleneck in parallel longBCH encoders,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51,no. 3, pp. 512–516, Mar. 2004.

[33] X. Zhang and K.K. Parhi, “High-speed architectures for parallellong BCH encoders,” in Proc. ACMGreat Lakes Symp. VLSI, Boston,MA, Apr. 2004, pp. 1–6.

[34] H. Jaber, F. Monteiro, S. J. Piestrak, and A. Dandache, “Design ofparallel fault-secure encoders for systematic cyclic block transmis-sion codes,” Microelectron. J., vol. 40, no. 12, pp. 1686–1697, Dec.2009.

[35] S. Pontarelli, G. C. Cardarilli, M. Re, and A. Salsano, “A novelerror detection and correction technique for RNS based FIR fil-ters,” in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., Oct.2008, pp. 436–444.

[36] G. C. Cardarilli, S. Pontarelli, M. Re, and A. Salsano, “A self check-ing Reed Solomon encoder: Design and analysis,” in Proc. IEEEInt. Symp. Defect Fault Tolerance VLSI Syst., Oct. 2005, pp. 111–119.

[37] G. C. Cardarilli, S. Pontarelli, M. Re, and A. Salsano, “Concurrenterror detection in Reed-Solomon encoders and decoders,” IEEETrans. Very Large Scale Integration Syst., vol. 15, no. 7, pp. 842–846,Jul. 2007.

[38] S. Bayat-Sarmadi and M. A. Hasan, “Concurrent error detection ofpolynomial basis multiplication over extension fields using a mul-tiple-bit parity scheme,” in Proc. IEEE Int. Symp. Defect Fault Toler-ance VLSI Syst., Oct. 2005, pp. 102–110.

[39] D. C. Feldmeier, “Fast software implementation of error detectioncodes,” IEEE/ACM Trans. Netw., vol. 3, no. 6, pp. 640–651, Dec.1995.

2156 IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. 7, JULY 2016

Page 15: New Multiple-Bit Parity-Based Concurrent Fault Detection … · 2019. 7. 21. · Multiple-Bit Parity-Based Concurrent Fault Detection Architecture for Parallel CRC Computation Dipanwita

Dipanwita Gangopadhyay received the BEdegree in electrical and computer engineeringfrom Jadavpur University, Kolkata, India, in 2004,and the MS degree in electrical and computerengineering from the Indian Institute of Technol-ogy (IIT) Madras, Chennai, India, in 2007, and iscurrently working toward the PhD degree with theDepartment of Electrical and Computer Engineer-ing, Western University, London, Ontario. Shewas an engineer with Qualcomm India PrivateLtd, Bangalore, India, from 2007 to 2009.

Arash Reyhani-Masoleh received the BScdegree in electrical and electronic engineeringfrom the Iran University of Science and Technol-ogy in 1989, the MSc degree in electrical andelectronic engineering from the University of Teh-ran in 1991, both with the first rank, and the PhDdegree in electrical and computer engineeringfrom the University of Waterloo in 2001. From1991 to 1997, he was with the Department ofElectrical Engineering, Iran University of Scienceand Technology. From June 2001 to September

2004, he was with the Center for Applied Cryptographic Research,University of Waterloo, where he was awarded a Natural Sciences andEngineering Research Council of Canada (NSERC) postdoctoral fellow-ship in 2002. In October 2004, he joined the Department of Electricaland Computer Engineering, Western University, London, Canada,where he is currently a tenured associate professor. His currentresearch interests include fault-tolerant computing, algorithms and VLSIarchitectures for computations in finite fields, cryptography, and error-control coding. He has been a two-time recipient of NSERC DiscoveryAccelerator Supplement (DAS) award in 2010 and 2015. Currently, heserves as an associate editor for Integration the VLSI Journal (Elsevier).He is a member of the IEEE and the IEEE Computer Society.

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.

GANGOPADHYAY AND REYHANI-MASOLEH: MULTIPLE-BIT PARITY-BASED CONCURRENT FAULT DETECTION ARCHITECTURE FOR PARALLEL... 2157