Fault Tolerant ALU System
Ayon Majumdar, Sahil Nayyar, Jitendra Singh Sengar
School of Electronics and Communication Engineering, Lovely Professional University
Abstract: This paper presents the design of a fault tolerant ALU
system using Triple Modular Redundancy (TMR). The ALU is a
critical component of a microprocessor and the core component of
the central processing unit, so it is necessary to make the ALU
fault tolerant. Voting logic and a disagreement detector are used
to make the ALU system fault tolerant. The source code was
developed in Verilog HDL using Xilinx ISE.
Keywords: fault tolerance, redundancy, TMR, ALU, voting logic
I. INTRODUCTION
When some part of a system fails, a fault tolerant design enables
the system to continue operating, possibly at a reduced level,
rather than failing completely. The failure of a single component,
whether hardware or software, does not bring down the whole
system [1]. For example, a motor vehicle carries a spare tire so
that it remains drivable when one of the tires is punctured.
Similarly, the integrity of a structure is maintained in spite of
failures such as corrosion and fatigue [1].
There are two major types of faults:
1. Permanent faults are due to manufacturing defects, early-life
failures and wear-out failures.
2. Temporary faults are present only for a short period of time.
They are mostly caused by external disturbances or marginal
design parameters.
Permanent faults are hard to avoid, as they are manufacturing
defects of a system, but the effects of temporary faults can be
avoided. To protect a system from temporary faults, we make it a
fault tolerant system.
II. FAULT TOLERANT SYSTEM
Fault tolerance is the property that enables a system to continue
its normal operation even when some of its components fail [2].
In a fault tolerant system, any decrease in operating quality is
proportional to the severity of the failure, whereas in a naively
designed system even a small failure can cause total breakdown
[2]. Fault tolerance becomes a substantial design criterion for
applications where the reliability of hardware is crucial.
Medical, military and long-range missions are applications in
which the fault tolerance of hardware is a key issue [3].
A. Fault Tolerance Requirement
The basic characteristics of a fault tolerant system are [1]:
1. In case of failure, the system should be able to continue its
normal operation during the repair process without any
interruption.
2. The failure should be isolated to the faulty component
instead of propagating to the whole system.
3. Mechanisms for the isolation of faulty components are
required for system protection.
B. Deciding Parameters for the System to be Fault Tolerant
Making every component of a system fault tolerant is not an ideal
option. The following criteria should be kept in mind when
deciding which components should be made fault tolerant:
1. Importance of the component. For example, in a laptop the
microprocessor is the most critical component, so it is a better
candidate for fault tolerance than any other component.
2. Probability of failure of the component. If a component is
more likely to fail than others, it should be made fault
tolerant.
3. Cost of making the component fault tolerant. For example,
providing a redundant heat sink for a laptop is too expensive,
both economically and in terms of weight and board space.
C. System Level Operation
In hardware fault tolerance, the faulty part must be replaced
with a spare one while the system is still in operation. Systems
that have a single backup are known as single point tolerant. In
such systems, the repair time should be much shorter than the
mean time between failures [1].
Suppose the state of system operation is represented as S, where
S = 0 means the system operates normally and S = 1 represents
system failure. Then S is a function of time t, as shown in
Fig. 1 [4].
2012 International Conference on Computing Sciences
978-0-7695-4817-3/12 $26.00 2012 IEEE
DOI 10.1109/ICCS.2012.36
Fig. 1 System Operation and Repair
Suppose the system is in normal operation at t = 0 (denoted t0),
it fails at t1, and normal system operation is recovered at t2 by
some software modification, reset, or hardware replacement.
Similar failure and repair events happen at t3 and t4 [4]. The
duration of normal system operation (Tn), for intervals such as
t1 - t0 and t3 - t2, is generally assumed to be a random number
that is exponentially distributed. This is known as the
exponential failure law. Hence, the probability that a system
will operate normally until time t, referred to as reliability,
is given by:
$P(T_n > t) = e^{-\lambda t}$    (1)
where $\lambda$ is the failure rate [4]. Because a system is
composed of a number of components, the overall failure rate for
the system is the sum of the individual failure rates
($\lambda_i$) for each of the k components:
$\lambda = \sum_{i=1}^{k} \lambda_i$    (2)
The mean time between failures (MTBF) is given by:
$\mathrm{MTBF} = \int_0^{\infty} e^{-\lambda t}\,dt = \frac{1}{\lambda}$    (3)
Similarly, the repair time (R) is also assumed to obey an
exponential distribution and is given by:
$P(R > t) = e^{-\mu t}$    (4)
where $\mu$ is the repair rate [4]. Hence, the mean time to
repair (MTTR) is given by:
$\mathrm{MTTR} = \frac{1}{\mu}$    (5)
The fraction of time that a system is operating normally
(failure-free) is the system availability and is given by:
$\text{System availability} = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$    (6)
This formula is widely used in reliability engineering; for
example, telephone systems are required to have a system
availability of 0.9999 (simply called four nines), while
high-reliability systems may require seven nines or more [4].
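The reliability and availability relations can be checked numerically. The following Python sketch assumes the standard exponential forms (reliability exp(-λt), MTBF = 1/λ, MTTR = 1/μ, availability = MTBF / (MTBF + MTTR)). The function names and the component failure rates and repair rate are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
import math


def reliability(failure_rate, t):
    """Probability of failure-free operation until time t (exponential failure law)."""
    return math.exp(-failure_rate * t)


def system_failure_rate(component_rates):
    """Overall failure rate: the sum of the individual component failure rates."""
    return sum(component_rates)


def availability(failure_rate, repair_rate):
    """Fraction of time the system operates normally: MTBF / (MTBF + MTTR)."""
    mtbf = 1.0 / failure_rate   # mean time between failures
    mttr = 1.0 / repair_rate    # mean time to repair
    return mtbf / (mtbf + mttr)


# Hypothetical example: three components, rates in failures per hour.
rates = [1e-5, 2e-5, 2e-5]
lam = system_failure_rate(rates)
mu = 0.5  # hypothetical repair rate in repairs per hour (MTTR = 2 hours)

print("system failure rate:", lam)
print("reliability over 1000 h:", reliability(lam, 1000.0))
print("availability:", availability(lam, mu))
```

With these assumed numbers the MTBF is about 20000 hours against an MTTR of 2 hours, so the availability comes out just above the four-nines level mentioned in the text.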
Redundancy is the most critical concept in making a system fault
tolerant.
III. REDUNDANCY
Redundancy is the duplication, or even triplication, of the
critical components or functions of a system in order to increase
its reliability [5]. For example, in the hydraulic systems of an
aircraft, the control system may be triplicated. If there is an
error in one component, it is then voted out by the other two
components [5]. Thus, the probability of failure of the system as
a whole is greatly reduced.
A. Types of Redundancy
The four major forms of redundancy are as follows [5]:
1. Hardware redundancy, for example DMR and TMR.
2. Information redundancy, for example error detection and
correction methods.
3. Time redundancy, which performs the same operation twice to
check that it gives the same output both times.
4. Software redundancy, such as N-version programming.
B. Functions of Redundancy
Redundancy takes two functional forms: passive redundancy and
active redundancy [5].
Passive redundancy uses excess capacity to reduce the impact of
component failures. One common example is increasing the build
quality of components that are critical to the device [5].
Active redundancy monitors the performance of each device and
eliminates any decline in it. This monitoring is used in voting
logic, so voting logic can be used for fault masking. Because the
voting logic is linked to switching, it automatically
reconfigures the components [5].
IV. TRIPLE MODULAR REDUNDANCY
It has long been known that the reliability of digital systems
can be improved through the use of redundant components, provided
these additional components are properly employed. The most
common redundancy method is Triple Modular Redundancy (TMR),
which is explained further in this paper [7].
Triple modular redundancy (TMR) is a fault-tolerant form of
N-modular redundancy in which three systems perform a process and
their results are processed by a voting system to produce a
single output [6]. If any one of the three systems fails, the
other two systems can correct and mask the fault. If the voter
fails, however, the complete system fails.
The majority voter uses voting logic as shown in Fig. 2.
Fig. 2 Example of Triple Modular Redundancy
In TMR, as shown in Fig. 2, the outputs of all three modules are
compared by the majority voter, and the majority value is passed
as the final output. If two of the three modules produce the same
output, the majority voter can determine which replica is in
error, because it observes a two-to-one vote. After this, only
two modules are left and the majority voter can switch to dual
modular redundancy (DMR). TMR can be generalized to N replicas.
The redundant system will not fail if none of the three modules
fails, or if exactly one of the three modules fails [7]. It is
assumed that the failures of the three modules are independent
[7]. Since the two events are mutually exclusive, the reliability
R of the redundant system is equal to the sum of the
probabilities of these two events [7]. Hence, with $R_m$ denoting
the reliability of a single module,
$R = R_m^3 + 3R_m^2(1 - R_m) = 3R_m^2 - 2R_m^3$    (7)
The voting logic compares the outputs of all the modules and
passes the majority output: if all three outputs are the same,
that value becomes the final output, and if two of the three
outputs are the same, the value they share becomes the final
output. Note that if the two matching outputs are both erroneous,
the erroneous value still becomes the final output.
V. ARITHMETIC LOGIC UNIT
The ALU (arithmetic logic unit) is a critical component of a
microprocessor and is the core component of the central
processing unit [8]. ALUs comprise the combinational logic that
implements logic operations, such as AND and OR, and arithmetic
operations, such as ADD and SUBTRACT [8]. Most of a processor's
operations are performed by one or more ALUs. The data is loaded
from the input registers into an ALU, and the operation to be
performed on that data by the ALU is decided by the Control Unit
[9]. The output result is stored in output registers. The Control
Unit is used to transfer the processed data between the
registers, ALU and memory [9]. An ALU
implements a total of 16 functions, i.e. 8 arithmetic functions
and 8 logical functions. Most ALUs can perform the following
operations:
1. Bitwise logic operations (AND, NOT, OR, XOR, NAND, NOR, XNOR)
2. Integer arithmetic operations
3. Bit-shifting operations.
VI. FAULT TOLERANT ALU SYSTEM
The ALU is an essential part of the CPU; therefore it is more
critical to make it fault tolerant than any other component.
Fig. 3 Fault Tolerant ALU System
To make the ALU fault tolerant we have used the method of Triple
Modular Redundancy. In this method the implemented ALU is
triplicated, with each copy receiving the same inputs, thus
making it triple modular redundant.
The outputs of the three ALUs are passed through the Voting
Circuit, which compares the outputs and passes the majority
output. This means that if any two ALUs give the same output,
that output is passed by the voting circuit and becomes the final
output of the whole circuit. If all three ALUs give the same
output, that output becomes the final output, but if all three
ALUs give different outputs, the voting circuit is in conflict
and fails. In that case the final output is indeterminate.
The Disagreement Detector compares the outputs of all three ALUs
and indicates which ALU is giving a different output, that is,
which ALU is the faulty one. If all three outputs are the same,
it indicates that no ALU is faulty. The disagreement detector
fails if any two ALUs become faulty: it will then indicate that
the one ALU that is fault free is faulty.
Thus, the ALU system has been made fault tolerant to a large
degree, but a limitation remains, because in practice no system
can be made 100% fault free. The rate of fault occurrence can be
reduced but not eliminated. The Fault Tolerant ALU System above
fails if N-1 of its N modules (N being an odd number) become
faulty. For the ALU, this means that if any two of the three ALUs
fail, the whole model fails.
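This limitation can be quantified with the TMR reliability expression from Section IV, R = R_m^3 + 3 R_m^2 (1 - R_m), where R_m is the reliability of one module. The Python sketch below evaluates it for a few hypothetical module reliabilities. It assumes independent module failures and a perfect voter, as the derivation does; the function name and sample values are illustrative only.

```python
def tmr_reliability(r_m):
    """Reliability of a TMR system with a perfect voter.

    The system works when all three modules work (r_m**3) or when
    exactly one of the three fails (3 * r_m**2 * (1 - r_m)).
    """
    return r_m**3 + 3 * r_m**2 * (1 - r_m)


# Hypothetical single-module reliabilities, chosen for illustration.
for r_m in (0.4, 0.5, 0.9, 0.99):
    print(f"R_m = {r_m:.2f}  ->  R_TMR = {tmr_reliability(r_m):.6f}")
```

Under these assumptions TMR improves on a single module only when R_m is above 0.5. Below that point, two simultaneous module failures become likely enough that the triplicated system is less reliable than one module alone.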
A. Result of the ALU Implemented
An 8-bit ALU was implemented in Verilog HDL. It has two input
ports, a and b, one output port, out, and one port for the
command input. The RTL schematic of the ALU is shown along with
the simulated output.
Fig. 4 Simulated output of the ALU
The implemented 8-bit ALU has 8 arithmetic and 8 logical
functions. Its simulated output, covering all the functions, is
shown in Fig. 4, and its RTL schematic is shown in Fig. 5.
The variable command determines which function is executed. If
command is 0, addition is executed, as 0 has been assigned to
addition. If command is 8, logical AND is performed, as 8 has
been assigned to it, and so on. The output enable oe determines
the availability of the output: when oe is 1 the output is
available, and when oe is 0 no output is obtained. So oe is held
high by default to receive the output.
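The command and oe behaviour described above can be sketched as a small behavioural model. This is a Python illustration, not the authors' Verilog source. Only the two command codes stated in the text (0 for addition, 8 for AND) are modelled. Truncating results to 8 bits, treating AND as bitwise, and returning None when oe is 0 are assumptions made for this sketch.

```python
WIDTH_MASK = 0xFF  # assumption: results are truncated to the 8-bit datapath


def alu(a, b, command, oe=1):
    """Behavioural sketch of the 8-bit ALU for the two documented commands."""
    if not oe:
        # Assumption: "no output is obtained" is modelled as None.
        return None
    a &= WIDTH_MASK
    b &= WIDTH_MASK
    if command == 0:        # addition (code stated in the text)
        result = a + b
    elif command == 8:      # AND (code stated in the text; assumed bitwise)
        result = a & b
    else:
        # The remaining 14 command codes are not listed in the text.
        raise ValueError(f"command {command} is not documented in the text")
    return result & WIDTH_MASK


print(alu(3, 4, command=0))             # addition
print(alu(0b1100, 0b1010, command=8))   # AND
print(alu(3, 4, command=0, oe=0))       # output disabled
```

Raising an error for undocumented command codes keeps the sketch from guessing at the rest of the function table.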
Below is the RTL schematic of the implemented ALU, showing blocks
for various functions such as addition, subtraction,
multiplication and division.
Fig. 5 RTL Schematic of the ALU
B. Result of Fault Tolerant ALU System
Below is the simulated output of the fault tolerant ALU system
designed using Verilog HDL.
Fig. 6 Simulated Output of Fault Tolerant ALU System
Algorithm for the fault tolerant ALU system is as follows:
1. Design an ALU system and then triplicate it to achieve TMR.
2. Now design the voting circuit, which compares the three
outputs of the ALUs:
   a. Let a, b and c be the outputs of the three ALUs and y the
   majority output passed by the voting circuit.
   b. If a = b and a ≠ c then y = a.
   c. If b = c and b ≠ a then y = b.
   d. If c = a and c ≠ b then y = c.
   e. If a = b = c then y = a (equivalently y = b or y = c).
3. Now design the disagreement detector, which again compares the
outputs of the three ALUs:
   a. Let p, q and r be the outputs of the three ALUs.
   b. Let u, v and w be the fault indicators for p, q and r
   respectively.
   c. If p = q and p ≠ r then ALU_3 is faulty; w = 1.
   d. If q = r and q ≠ p then ALU_1 is faulty; u = 1.
   e. If r = p and r ≠ q then ALU_2 is faulty; v = 1.
   f. If p = q = r then no ALU is faulty; u = 0, v = 0 and
   w = 0.
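Steps 2 and 3 of the algorithm can be sketched as two small functions. This is a Python behavioural illustration, not the Verilog used in the paper. The algorithm does not cover the case where all three outputs differ, so returning None from the voter and raising all three indicators in the detector are assumptions of this sketch.

```python
def majority_vote(a, b, c):
    """Return the majority output y of three ALU outputs (step 2)."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    # Assumption: all three differ, so the output is indeterminate.
    return None


def disagreement_detector(p, q, r):
    """Return the fault indicators (u, v, w) for ALU_1, ALU_2, ALU_3 (step 3)."""
    if p == q == r:
        return (0, 0, 0)      # no ALU is faulty
    if p == q:
        return (0, 0, 1)      # ALU_3 disagrees
    if q == r:
        return (1, 0, 0)      # ALU_1 disagrees
    if r == p:
        return (0, 1, 0)      # ALU_2 disagrees
    # Assumption: with three different outputs, flag every module.
    return (1, 1, 1)


print(majority_vote(5, 7, 5), disagreement_detector(5, 7, 5))
# Two faulty ALUs that agree on a wrong value outvote the healthy one.
print(majority_vote(9, 9, 5), disagreement_detector(9, 9, 5))
```

The second call illustrates the limitation discussed in Section VI: when two modules produce the same wrong value, the voter passes it and the detector flags the fault-free module.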
Fig. 7 RTL Schematic for Fault Tolerant ALU System
The above schematic shows three ALU modules integrated into a
single module, thus exhibiting triple modular redundancy.
The algorithm given above was used to design the fault tolerant
ALU system in Verilog HDL. In the voting circuit, a, b and c are
the outputs of ALU_1, ALU_2 and ALU_3 respectively. Similarly, in
the disagreement detector, p, q and r are the outputs of ALU_1,
ALU_2 and ALU_3 respectively.
The simulated output of the fault tolerant ALU system is shown in
Fig. 6. There, a and b are the primary inputs, oe is the output
enable, and command selects which function of the ALU is
performed. The signals out1, out2 and out3 in Fig. 6 represent
the outputs of the three ALUs respectively, whereas dout
represents the output of the disagreement detector. Also, the
indicators u, v and w are represented as x, y and z respectively.
In this simulation, the second ALU module is taken to be faulty,
as can be seen in the simulated output in Fig. 6. The function
performed by the ALU in this case is addition.
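A scenario of this kind, with addition selected and the second ALU faulty, can be imitated in a few lines. This is a Python sketch under stated assumptions: the fault model (output bit 0 of ALU_2 stuck at 1), the operand values and all function names are hypothetical, and nothing here reproduces the actual waveforms of Fig. 6.

```python
def add8(a, b):
    """Fault-free 8-bit addition (result truncated to 8 bits)."""
    return (a + b) & 0xFF


def faulty_add8(a, b):
    """Hypothetical faulty ALU_2: output bit 0 is stuck at 1."""
    return add8(a, b) | 0x01


def tmr_add(a, b):
    """Run three adders on the same inputs, vote, and flag the odd one out."""
    out1, out2, out3 = add8(a, b), faulty_add8(a, b), add8(a, b)
    if out1 == out2 or out1 == out3:
        voted = out1
    elif out2 == out3:
        voted = out2
    else:
        voted = None  # assumption: no majority means an indeterminate output
    # Simplified indicators (x, y, z): 1 if that ALU differs from the vote.
    flags = (int(out1 != voted), int(out2 != voted), int(out3 != voted))
    return voted, flags


print(tmr_add(2, 4))  # fault is excited: ALU_2 is outvoted and flagged
print(tmr_add(2, 3))  # sum is odd, so the stuck bit does not change the result
```

The second call shows that a fault is visible only for inputs that excite it. For an odd sum the stuck-at-1 bit already has the correct value and all three outputs agree.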
VII. CONCLUSION
Ideal systems that are completely fault tolerant or fail safe do
not exist in the real world. Thus, the fault tolerant ALU system
has limitations, which can be overcome by replacing a faulty
module with a spare one. For this, the system should be designed
so that the mean time between failures (MTBF) is greater than the
mean time to repair (MTTR). The faulty module can then be
replaced with a spare before another module fails, while the
system continues its normal operation. The build quality can also
be increased, alongside other measures, so that the ALU becomes
less likely to fail. Thus, the ALU system becomes fault tolerant
to a great extent, and achieving sufficient fault tolerance is
the major design issue.
REFERENCES
[1] "Fault Tolerant Design" [Online]. Available: http://www.bgb.gr/storage/
[2] P. J. Denning, "Fault Tolerant Operating Systems," ACM Computing Surveys (CSUR), December 1976.
[3] B. Baykant Alagoz, "Hierarchical Triple-Modular Redundancy (H-TMR) Network For Digital Systems."
[4] Laung-Terng Wang, Cheng-Wen Wu and Xiaoqing Wen, VLSI Test Principles and Architectures: Design for Testability, The Morgan Kaufmann Series in Systems on Silicon, 2008.
[5] "Redundancy Management Technique for Space Shuttle Computers," IBM Research.
[6] David Ratter, "FPGAs on Mars."
[7] R. E. Lyons and W. Vanderkulk, "The Use of Triple-Modular Redundancy to Improve Computer Reliability."
[8] Samuel Winchenbach and Mohammed Driss, "8 Bit Arithmetic Logic Unit," University of Maine, Orono.
[9] William Stallings, Computer Organization & Architecture: Designing for Performance, 7th ed. Pearson Prentice Hall, 2006.