Fault Tolerant ALU System

Ayon Majumdar, Sahil Nayyar, Jitendra Singh Sengar

School of Electronics and Communication Engineering, Lovely Professional University

[email protected]

Abstract: This paper presents the design of a fault tolerant ALU system using Triple Modular Redundancy (TMR). The ALU is a critical component of a microprocessor and the core component of the central processing unit. It is therefore necessary to make the ALU fault tolerant. Voting logic and a disagreement detector are employed to make the ALU system fault tolerant. The source code was developed in Verilog HDL using Xilinx ISE.

Keywords: fault tolerance, redundancy, TMR, ALU, voting logic

I. INTRODUCTION

When some part of a system fails, a fault tolerant design enables the system to continue its normal operation, possibly at a reduced level, rather than failing completely. The system as a whole does not fail because of the failure of one component, whether in hardware or in software [1]. For example, a motor vehicle carries a spare tire so that it remains drivable when one of its tires is punctured. In the same way, a fault tolerant structure maintains its integrity in spite of failures such as corrosion and fatigue [1].

There are two major types of faults:

1. Permanent faults are due to manufacturing defects, early-life failures and wear-out failures.

2. Temporary faults are present only for a short period of time. They are mostly caused by external disturbances or marginal design parameters.

Permanent faults are hard to avoid, as they arise from manufacturing defects of the system, but the effects of temporary faults can be avoided. So, to protect a system from temporary faults, we make it a fault tolerant system.

II. FAULT TOLERANT SYSTEM

A system may be able to continue its normal operation even when some of its components fail. This property of the system is called fault tolerance [2]. In a fault tolerant system, any decrease in operating quality is proportional to the severity of the failure, whereas in a naively designed system even a small failure can cause a total breakdown [2]. Fault tolerance becomes a substantial design criterion for applications in which the reliability of hardware is crucial. Medical, military and long-range missions are applications in which the fault tolerance of hardware is a key issue [3].

A. Fault Tolerance Requirement

The basic characteristics of a fault tolerant system are [1]:

1. In case of failure, the system should be able to continue its normal operation during the repair process without any interruption.

2. The failure should be isolated to the faulty component instead of propagating to the whole system.

3. Mechanisms for the isolation of faulty components are required for system protection.

B. Deciding Parameters for the System to be Fault Tolerant

Making every component of a system fault tolerant is not an ideal option. The following criteria should be kept in mind when deciding which component should be made fault tolerant:

1. Importance of the component. In a laptop, for example, the microprocessor is the most critical component, so it is more likely to be made fault tolerant than any other component.

2. Probability of failure of the component. If a component is more likely to fail than others, it should be made fault tolerant.

3. Cost of making the component fault tolerant. For example, providing a redundant heat sink for a laptop is too expensive, both economically and in terms of weight and board space.

C. System Level Operation

In hardware fault tolerance, it is required that the faulty part is replaced with a spare one while the system is still in operation. Systems that have a single backup are known as single point tolerant. In such systems, the repair time should be much shorter than the mean time between failures [1].

Suppose the state of system operation is represented as S, where S = 0 means the system operates normally and S = 1 represents system failure. Then S is a function of time t, as shown in Fig. 1 [4].

2012 International Conference on Computing Sciences

978-0-7695-4817-3/12 $26.00 2012 IEEE

DOI 10.1109/ICCS.2012.36


Fig. 1 System Operation and Repair

Suppose the system is in normal operation at t0 = 0, fails at t1, and normal system operation is recovered at t2 by some software modification, reset, or hardware replacement. Similar failure and repair events happen at t3 and t4 [4]. The duration of normal system operation (Tn), for intervals such as t1 - t0 and t3 - t2, is generally assumed to be a random number that is exponentially distributed. This is known as the exponential failure law. Hence, the probability that a system will operate normally until time t, referred to as reliability, is given by:

$P(T_n > t) = e^{-\lambda t}$   (1)

where $\lambda$ is the failure rate [4]. Because a system is composed of a number of components, the overall failure rate for the system is the sum of the individual failure rates ($\lambda_i$) for each of the k components:

$\lambda = \sum_{i=1}^{k} \lambda_i$   (2)

The mean time between failures (MTBF) is given by:

$\mathrm{MTBF} = \int_0^{\infty} e^{-\lambda t}\,dt = \frac{1}{\lambda}$   (3)

Similarly, the repair time (R) is also assumed to obey an exponential distribution and is given by:

$P(R > t) = e^{-\mu t}$   (4)

where $\mu$ is the repair rate [4]. Hence, the mean time to repair (MTTR) is given by:

$\mathrm{MTTR} = \frac{1}{\mu}$   (5)

The fraction of time that a system is operating normally (failure-free) is the system availability and is given by:

$\text{System availability} = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$   (6)

This formula is widely used in reliability engineering; for example, telephone systems are required to have a system availability of 0.9999 (simply called four nines), while high-reliability systems may require seven nines or more [4].
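The relations above can be checked numerically. The following is a minimal sketch, assuming the exponential failure and repair laws described in the text; the function names and all component rates are hypothetical values chosen only for illustration.

```python
import math

def reliability(failure_rate: float, t: float) -> float:
    """Probability of failure-free operation up to time t (exponential failure law)."""
    return math.exp(-failure_rate * t)

def system_failure_rate(component_rates) -> float:
    """Overall failure rate: sum of the individual component failure rates."""
    return sum(component_rates)

def availability(failure_rate: float, repair_rate: float) -> float:
    """Fraction of time the system is failure-free: MTBF / (MTBF + MTTR)."""
    mtbf = 1.0 / failure_rate
    mttr = 1.0 / repair_rate
    return mtbf / (mtbf + mttr)

# Hypothetical example: three components, rates in failures per hour.
rates = [1e-5, 2e-5, 2e-5]
lam = system_failure_rate(rates)   # about 5e-5 failures per hour
mu = 0.5                           # assumed repair rate: one repair per 2 hours
mtbf = 1.0 / lam                   # about 20000 hours
avail = availability(lam, mu)

print(f"failure rate = {lam:.1e} per hour, MTBF = {mtbf:.0f} hours")
print(f"reliability at t = MTBF: {reliability(lam, mtbf):.4f}")
print(f"availability = {avail:.6f}")
```

With these assumed numbers the availability is just above four nines, which shows how strongly the result depends on keeping the repair time short relative to the MTBF.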

Redundancy is the key concept in making a system fault tolerant.

III. REDUNDANCY

The critical components or functions of a system are duplicated, or even triplicated, so as to increase the reliability of the system [5]. This process is called redundancy. For example, in the hydraulic systems of an aircraft, the control system may be triplicated to make it redundant. If there is an error in one component, it is then voted out by the other two components [5]. Thus, the probability of failure of the system as a whole is greatly reduced.

A. Types of Redundancy

The four major forms of redundancy are as follows [5]:

1. Hardware redundancy, for example, DMR and TMR.

2. Information redundancy, for example, error detection and correction methods.

3. Time redundancy, which performs the same operation twice to see whether it produces the same output both times.

4. Software redundancy, such as N-version programming.

B. Functions of Redundancy

Redundancy serves two functions: passive redundancy and active redundancy [5].

Passive redundancy uses excess capacity to reduce the impact of component failures. One common example is increasing the build quality of components that are critical to the device [5].

Active redundancy monitors the performance of each device and eliminates any decline in it. This monitoring is used in voting logic, so the voting logic can be used for fault masking. Because the voting logic is linked to switching, it automatically reconfigures the components [5].

IV. TRIPLE MODULAR REDUNDANCY

It has long been known that the reliability of digital systems can be improved through the use of redundant components, if these additional components are properly employed. The most common redundancy method is Triple Modular Redundancy (TMR), which is explained further in this paper [7].

Triple modular redundancy (TMR) is a fault tolerant form of N-modular redundancy in which three systems perform a process and their results are processed by a voting system to produce a single output [6]. If any one of the three systems fails, the other two systems can correct and mask the fault. If the voter fails, the complete system fails.

The majority voter uses voting logic as shown in Fig. 2.


Fig. 2 Example of Triple Modular Redundancy

In TMR, as shown in Fig. 2, the outputs of all the three modules are compared using the majority voter and the majority are passed as the final output. Suppose two out of three modules have similar outputs; the majority voter can then determine which replication has an error, as a two-to-one vote is observed by the majority voter. After this only two modules are left and the majority voter can switch to dual modular redundancy (DMR).

TMR can be used for N number of replications. The redundant system will not fail if none of the three modules fails, or if exactly one of the three modules fails [7]. It is assumed that the failures of the three modules are independent [7]. Since the two events are mutually exclusive, the reliability R of the redundant system is equal to the sum of the probabilities of these two events [7]. Hence,

$R = R_m^3 + 3R_m^2(1 - R_m) = 3R_m^2 - 2R_m^3$   (7)

The voting logic compares the outputs of all the modules and passes the majority output, i.e. if all three outputs are the same then it becomes the final output, and if two out of three outputs are the same then the two same outputs become the final output. Also, if the two same outputs are erroneous outputs then that erroneous output will become the final output.

V. ARITHMETIC LOGIC UNIT

ALU (Arithmetic logic unit) is a critical component of a microprocessor and is the core component of the central processing unit [8]. ALUs comprise the combinational logic that implements logic operations, such as AND and OR etc., and arithmetic operations, such as ADD and SUBTRACT etc. [8]

Most of a processor's operations are performed by one or more ALUs. All the data is loaded from the input registers into an ALU and the operation to be performed on that data by the ALU is decided by the Control Unit [9]. The output result is stored in output registers. The Control Unit is used to transfer the processed data between the two registers, ALU and memory [9]. An ALU implements a total of 16 functions, i.e. 8 arithmetic functions and 8 logical functions. Most ALUs can perform the following operations:

1. Bitwise logic operations (AND, NOT, OR, XOR, NAND, NOR, XNOR)

2. Integer arithmetic operations

3. Bit-shifting operations.

VI. FAULT TOLERANT ALU SYSTEM

ALU is an essential part of CPU; therefore it is critical to make it fault tolerant rather than any other component.

Fig. 3 Fault Tolerant ALU System

To make the ALU fault tolerant we have used the method of Triple Modular Redundancy. In this method the ALU system implemented is triplicated, each having the same input, thus making it triple mode redundant.

The output of the three ALUs is passed through the Voting Circuit that will compare the outputs and pass the majority output. This means that if any two ALUs are giving the same output, then that output will be passed by the voting circuit and becomes the final output of the whole circuit. In case of all the ALUs giving the same outputs, that output becomes the final output, but in case of all the three ALUs giving different outputs the voting circuit is under a conflict and fails. At this time the final output will be indeterminate.

The Disagreement Detector compares the outputs of all the three ALUs and indicates which ALU is giving a different output, or in general which ALU is the faulty one. Moreover, if all three outputs are the same then it indicates that no ALU is faulty. The disagreement detector fails if any two ALUs become faulty. It will then indicate that the one ALU that is fault free is faulty.


Thus, we have made the ALU system fault tolerant to a great extent, but a limitation remains, because in practice a 100% fault free system cannot be built. We can reduce the level of fault occurrence but cannot eliminate it entirely. The limitation of the above fault tolerant ALU system is that it fails if N-1 systems become faulty. In other words, out of N systems (N being an odd number), if N-1 systems are faulty then our model fails. In the case of the ALU, if any two of the three ALUs fail then the whole model fails.
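The effect of this limitation can be seen by evaluating the TMR reliability expression of Eq. (7) for a few module reliabilities. The following is a minimal sketch, assuming independent module failures and a perfectly reliable voter; the function names and the sample values of Rm are hypothetical.

```python
def tmr_reliability(rm: float) -> float:
    """Reliability of a TMR system: all three modules work, or exactly one fails."""
    return rm**3 + 3 * rm**2 * (1 - rm)

def tmr_reliability_closed_form(rm: float) -> float:
    """Simplified form of the same expression: 3*Rm^2 - 2*Rm^3."""
    return 3 * rm**2 - 2 * rm**3

# Compare a single module with the triplicated system (sample values are assumed).
for rm in (0.99, 0.90, 0.50, 0.40):
    r = tmr_reliability(rm)
    verdict = "better" if r > rm else "not better"
    print(f"Rm = {rm:.2f}  single = {rm:.4f}  TMR = {r:.4f}  ({verdict})")
```

Under these assumptions TMR improves on a single module only when Rm is greater than 0.5. Below that value two simultaneous module failures become likely enough that triplication makes the system less reliable than one module alone.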

A. Result of the ALU Implemented

An 8-bit ALU was implemented in Verilog HDL. It has two input ports, a and b, one output port, out, and one port for the command. The RTL schematic of the ALU is shown along with the simulated output.

Fig. 4 Simulated output of the ALU

The 8-bit ALU implemented has 8 arithmetic and 8 logical functions. Its simulated output, covering all the functions, is shown in Fig. 4, and its RTL schematic is shown in Fig. 5.

The variable command determines which function is executed. If command is 0 then addition is executed, as 0 has been assigned to addition. If command is 8 then logical AND is performed, as 8 has been assigned to it, and so on. The output enable oe determines the availability of the output. When oe is 1 the output is available, and when oe is 0 no output is obtained. So oe is held high by default to receive the output.
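The behaviour of command and oe can be illustrated with a small software model. This is only a sketch in Python, not the Verilog HDL source of the paper. The text states only that command 0 selects addition and command 8 selects logical AND; the codes 1 and 9, the truncation of results to 8 bits, and the use of None for "no output" are assumptions made for illustration.

```python
MASK = 0xFF  # assumed 8-bit data path

def alu(a: int, b: int, command: int, oe: int = 1):
    """Behavioural model of an 8-bit ALU; returns None when the output is disabled."""
    if oe == 0:
        return None              # oe low: no output is obtained
    a &= MASK
    b &= MASK
    if command == 0:             # addition (assignment given in the text)
        result = a + b
    elif command == 1:           # hypothetical assignment: subtraction
        result = a - b
    elif command == 8:           # logical AND (assignment given in the text)
        result = a & b
    elif command == 9:           # hypothetical assignment: logical OR
        result = a | b
    else:
        raise ValueError(f"command {command} is not modelled in this sketch")
    return result & MASK         # assumed: result truncated to 8 bits

print(alu(12, 10, 0))        # addition
print(alu(12, 10, 8))        # logical AND
print(alu(12, 10, 0, oe=0))  # output disabled
```

Only four of the sixteen functions are modelled here, since the remaining command assignments are not listed in the text.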

Below is the RTL schematic of the implemented ALU, showing the blocks for the various functions such as addition, subtraction, multiplication and division.

Fig. 5 RTL Schematic of the ALU


B. Result of Fault Tolerant ALU System

Below is the simulated output of the fault tolerant ALU system designed using Verilog HDL.

Fig. 6 Simulated Output of Fault Tolerant ALU System

The algorithm for the fault tolerant ALU system is as follows:

1. Design an ALU system and then triplicate it to achieve TMR.

2. Design the voting circuit, which compares the three outputs of the ALUs:

a. Let a, b and c be the outputs of the three ALUs and y the majority output passed by the voting circuit.

b. If a = b and a ≠ c then y = a.

c. If b = c and b ≠ a then y = b.

d. If c = a and c ≠ b then y = c.

e. If a = b = c then y = a (equivalently y = b or y = c).

3. Design the disagreement detector, which again compares the outputs of the three ALUs:

a. Let p, q and r be the outputs of the three ALUs.

b. Let u, v and w be the indicators for p, q and r respectively.

c. If p = q and p ≠ r then ALU_3 is faulty; w = 1.

d. If q = r and q ≠ p then ALU_1 is faulty; u = 1.

e. If r = p and r ≠ q then ALU_2 is faulty; v = 1.

f. If p = q = r then no ALU is faulty; u = 0, v = 0 and w = 0.
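Steps 2 and 3 of the algorithm can be sketched in software as follows. This is an illustrative Python model, not the Verilog HDL implementation of the paper, and the function names are hypothetical. The algorithm does not list the case in which all three outputs differ: the voter below returns None there, matching the earlier statement that the final output is indeterminate, while the detector setting all three indicators is purely an assumption of this sketch.

```python
def vote(a, b, c):
    """Majority output y of the three ALU outputs, or None if all three differ."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None  # no two outputs agree: the final output is indeterminate

def disagreement(p, q, r):
    """Return the indicators (u, v, w) for ALU_1, ALU_2 and ALU_3."""
    if p == q == r:
        return (0, 0, 0)  # no ALU is faulty
    if p == q:
        return (0, 0, 1)  # ALU_3 disagrees
    if q == r:
        return (1, 0, 0)  # ALU_1 disagrees
    if r == p:
        return (0, 1, 0)  # ALU_2 disagrees
    return (1, 1, 1)      # assumed convention: no ALU can be trusted

# Example: ALU_2 produces a wrong value.
print(vote(42, 43, 42), disagreement(42, 43, 42))
```

Note that when two ALUs are faulty in the same way, the voter passes the erroneous value and the detector flags the healthy ALU, which is the failure mode described in Section VI.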

Fig. 7 RTL Schematic for Fault Tolerant ALU System


The above schematic shows three ALU modules integrated into a single module, thus exhibiting triple modular redundancy.

The algorithm given above was used to design the fault tolerant ALU system in Verilog HDL. In the voting circuit, a, b and c are the outputs of ALU_1, ALU_2 and ALU_3 respectively. Similarly, in the disagreement detector, p, q and r are the outputs of ALU_1, ALU_2 and ALU_3 respectively.

The simulated output of the fault tolerant ALU system is shown in Fig. 6. There, a and b are the primary inputs (not the ALU outputs named a and b in the algorithm), oe is the output enable and command selects the function of the ALU. The signals out1, out2 and out3 in Fig. 6 represent the outputs of the three ALUs respectively, whereas dout represents the output of the disagreement detector. Also, the indicators u, v and w are represented as x, y and z respectively.

In this simulation, the second ALU module is taken to be faulty, as can be seen in the simulated output in Fig. 6. The function performed by the ALU in this case is addition.
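The simulated scenario, addition with the second ALU faulty, can be reproduced in a small software experiment. The sketch below swaps in a bitwise majority voter, a common hardware form of voting, in place of the word comparison used in the algorithm; the two agree whenever at most one module is faulty. The injected fault (one inverted output bit), the input values and all names are hypothetical and are not taken from Fig. 6.

```python
MASK = 0xFF  # assumed 8-bit data path

def add8(a: int, b: int) -> int:
    """Fault-free 8-bit addition."""
    return (a + b) & MASK

def faulty_add8(a: int, b: int, flip_bit: int = 0) -> int:
    """Adder with an injected fault: one output bit is inverted (hypothetical)."""
    return add8(a, b) ^ (1 << flip_bit)

def bitwise_majority(x: int, y: int, z: int) -> int:
    """Each result bit takes the value held by at least two of the three inputs."""
    return (x & y) | (y & z) | (x & z)

a, b = 25, 17
out1 = add8(a, b)          # ALU_1, fault free: 42
out2 = faulty_add8(a, b)   # ALU_2, faulty: 43
out3 = add8(a, b)          # ALU_3, fault free: 42

voted = bitwise_majority(out1, out2, out3)
flags = tuple(int(out != voted) for out in (out1, out2, out3))

print(f"out1={out1} out2={out2} out3={out3} voted={voted} flags={flags}")
```

The voted result equals the fault-free sum and only the second flag is set, which is the masking behaviour the simulation is meant to demonstrate. Unlike the word comparison, a bitwise voter always produces some value even when all three outputs differ, so it cannot signal the indeterminate case by itself.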

VII. CONCLUSION

Ideal systems that are completely fault tolerant or fail-safe do not exist in the real world. The fault tolerant ALU system therefore has limitations, which can be overcome by replacing the faulty module with a spare one. For this, the system should be optimized so that the mean time between failures (MTBF) is greater than the mean time to repair (MTTR). The faulty module can then be replaced with a spare one before another module fails, while the system continues its normal operation. In addition, the build quality can be increased, alongside other measures, so that the ALU becomes less likely to fail. Thus, the ALU system becomes fault tolerant to a great extent, and achieving sufficient fault tolerance is the major design issue.

REFERENCES

[1] Fault Tolerant Design [Online]. Available: http://www.bgb.gr/storage/

[2] P. J. Denning, "Fault Tolerant Operating Systems," ACM Computing Surveys (CSUR), December 1976.

[3] B. Baykant Alagoz, "Hierarchical Triple-Modular Redundancy (H-TMR) Network For Digital Systems."

[4] Laung Terng Wang, Cheng Wen Wu and Xiaoqing Wen, VLSI Test Principles and Architectures: Design for Testability, The Morgan Kaufmann Series in Systems on Silicon, 2008.

[5] "Redundancy Management Technique for Space Shuttle Computers," IBM Research.

[6] David Ratter, "FPGAs on Mars."

[7] R. E. Lyons and W. Vanderkulk, "The Use of Triple-Modular Redundancy to Improve Computer Reliability."

[8] Samuel Winchenbach and Mohammed Driss, "8 Bit Arithmetic Logic Unit," University of Maine, Orono.

[9] William Stallings, Computer Organization & Architecture: Designing for Performance, 7th ed. Pearson Prentice Hall, 2006.
