Dealing with Multiple Simultaneous Faults in Future Technologies


Embedded Systems Laboratory, Informatics Institute
Federal University of Rio Grande do Sul, Porto Alegre – RS – Brazil

SRC TechCon 2005, Portland, Oregon, USA

Carlos A. L. Lisbôa, Erik Schüler, Luigi Carro


Carlos A. L. Lisbôa SRC TechCon 2005 - October, 26, 2005 - Paper # 20.4 2

Why Multiple Simultaneous Faults?

• Future technologies (2010 and beyond):
  • very small transistors, with fewer electrons forming the channel, making them more susceptible to single event transients (SETs)
  • transient pulses caused by radiation strikes will last longer than the propagation delays of gates
  • devices will be more sensitive to the effects of electromagnetic noise, neutrons, and alpha particles


Single Event Upset Origin

[Figure: how a single event upset originates, illustrated with stored bit patterns]


Why Should One Study Multiple Faults?

Change in paradigm: gates will behave statistically, producing correct outputs only a fraction of the time.
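A simple way to see why statistical gate behavior makes multiple simultaneous faults a real concern: assume, as a simplifying model not taken from the slides, that each of m gates in a block independently produces a wrong output with probability eps. The number of simultaneous faults is then binomially distributed, and for small m·eps the probability of two or more faults grows roughly as (m·eps)²/2. The function name, block size, and error rates below are illustrative.

```python
from math import comb

def prob_at_least_k_faults(k: int, m: int, eps: float) -> float:
    """P(at least k of m gates produce a wrong output at the same time).

    Assumes each gate fails independently with probability eps (binomial model).
    """
    return 1.0 - sum(comb(m, i) * eps**i * (1 - eps) ** (m - i) for i in range(k))

M = 1000  # gates in one block (illustrative value)
for eps in (1e-6, 1e-4, 1e-3):
    single = prob_at_least_k_faults(1, M, eps)
    multiple = prob_at_least_k_faults(2, M, eps)
    print(f"eps={eps:g}: P(>=1 fault)={single:.3g}, P(>=2 faults)={multiple:.3g}")
```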


How to Deal with Multiple Faults?

• New paradigm: multiple simultaneous faults
  • new fault tolerance techniques will be required (TMR will no longer provide enough protection)

• How to deal with this problem?
  • new materials and manufacturing technologies must be developed
  OR
  • new design approaches must be taken


Our bet: new design approaches!
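The claim that TMR will no longer provide enough protection can be quantified with the textbook reliability of TMR under an ideal voter: if each module is correct with probability R, then R_TMR = R³ + 3R²(1 − R) = 3R² − 2R³. The sketch below assumes independent module failures and a fault-free voter (both assumptions, and the function name is hypothetical) and also evaluates the general n-MR majority case. TMR beats a single module only while R > 0.5, and the voter itself is left unprotected.

```python
from math import comb

def nmr_reliability(r: float, n: int = 3) -> float:
    """Probability that a strict majority of n independent modules is correct.

    r is the probability that one module is correct. The voter is assumed
    fault-free, which is exactly the assumption questioned later in the talk.
    """
    k_min = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(k_min, n + 1))

for r in (0.99, 0.9, 0.6, 0.5, 0.4):
    print(f"R={r:.2f}: TMR={nmr_reliability(r):.4f}, 5-MR={nmr_reliability(r, 5):.4f}")
```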


Research Approaches

• Use of stochastic operators

• Use of bit stream operators

• Ensuring voter reliability to use n-MR while dealing with multiple simultaneous faults

• Next steps: 2005 - 2007 time frame


Research Evolution

[Diagram: Stochastic Operators, Bit Stream Operators, and Analog Voter, annotated "OK for some DSP applications", "looking for more speed", "small footprint and fast", "tolerant to multiple faults in n-MR solutions", and "looking for tolerant converter"]


Using Stochastic Operators

• SEU-induced transient errors are random in nature

• Stochastic operators rely on randomness to produce approximate results

• The injection of random faults into the input signals processed by stochastic operators did not noticeably degrade the precision of the results

[Chart: % errors in 1,000 additions with 0, 2, 4, and 8 injected faults. Stochastic adder: 0.1412, 0.2580, 0.1768, 0.2196; conventional adder: 0.0000 with no faults]


• Several application areas, such as DSP, can tolerate approximate values and still produce acceptable outputs
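The fault-injection result on these slides can be illustrated under the unipolar encoding commonly used in stochastic computing (an assumption; the slides do not state the encoding or the fault model of the experiment): a value in [0, 1] is the fraction of 1's in a bit stream. Flipping k bits of an N-bit stream changes its count of 1's by at most k, so the represented value moves by at most k/N, wherever the flips land. Helper names, stream length, and seed are hypothetical.

```python
import random

def stream_value(stream: list[int]) -> float:
    """Unipolar stochastic value: the fraction of 1's in the stream."""
    return sum(stream) / len(stream)

def inject_faults(stream: list[int], k: int, rng: random.Random) -> list[int]:
    """Flip k bits at distinct, randomly chosen positions (k simultaneous faults)."""
    faulty = stream.copy()
    for pos in rng.sample(range(len(faulty)), k):
        faulty[pos] ^= 1
    return faulty

N = 1000
rng = random.Random(2005)
# A stream encoding roughly 0.3: each bit is 1 with probability 0.3.
stream = [1 if rng.random() < 0.3 else 0 for _ in range(N)]
for k in (0, 2, 4, 8):
    delta = abs(stream_value(inject_faults(stream, k, rng)) - stream_value(stream))
    print(f"{k} faults: value shifts by {delta:.3f} (at most {k / N:.3f})")
```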


Using Stochastic Operators

• Benefit: reduced area of the operators

[Figure: stochastic multiplier circuit with example input and output bit streams]

[Figure: stochastic adder circuit with example bit streams S1, S2, and S3 and the output stream Sum]
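The slide shows the circuits but not how they compute. In the common textbook form of stochastic computing, assumed here rather than confirmed by the slides, the multiplier is a single AND gate fed by two independent streams, and the adder is a 2:1 multiplexer whose select input is a random stream with probability 0.5, producing the scaled sum (a + b)/2. Both are far smaller than binary multipliers and adders, which is consistent with the area benefit claimed above. The sketch models those gates bit by bit; it does not try to map the figure's S1, S2, and S3 labels, and all names are hypothetical.

```python
import random

def encode(value: float, length: int, rng: random.Random) -> list[int]:
    """Unipolar stochastic stream: each bit is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def decode(stream: list[int]) -> float:
    """Represented value: fraction of 1's."""
    return sum(stream) / len(stream)

def sc_multiply(a: list[int], b: list[int]) -> list[int]:
    """AND gate on independent streams: P(a AND b) = P(a) * P(b)."""
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a: list[int], b: list[int], select: list[int]) -> list[int]:
    """2:1 multiplexer; with P(select) = 0.5 the output approximates (P(a) + P(b)) / 2."""
    return [y if s else x for x, y, s in zip(a, b, select)]

rng = random.Random(42)
N = 10_000
a, b = encode(0.6, N, rng), encode(0.5, N, rng)
sel = encode(0.5, N, rng)
print(f"product ~ {decode(sc_multiply(a, b)):.3f} (exact 0.300)")
print(f"scaled sum ~ {decode(sc_scaled_add(a, b, sel)):.3f} (exact 0.550)")
```

The results are approximate and improve with stream length, which is why such operators suit applications that can tolerate approximate values.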


How does it work? Come and see the posters!

No free drinks, but the answer to this question is guaranteed!


Using Bit Stream Operators

• Computation principles similar to those of the stochastic adder and multiplier

• Operators can produce bit streams that represent the exact results of the operation

Proposed multiplication algorithm: bit stream product (the count of 1's in the stream equals the product value)

                  F12      F11      F10
         x        F22      F21      F20
                  F20.F12  F20.F11  F20.F10
         F21.F12  F21.F11  F21.F10
F22.F12  F22.F11  F22.F10

Output stream: b48 .. b33 | b32 .. b17 | b16 .. b5 | b4 .. b1 | b0
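The slide gives only the partial-product layout and the 49-bit output b48 .. b0. One reading consistent with those 49 bits (an interpretation, not stated on the slide) is that each partial product F2i.F1j, whose binary weight is 2^(i+j), is copied 2^(i+j) times: the weights of the 3 × 3 partial products sum to (1 + 2 + 4)² = 49, and the ranges b48 .. b33, b32 .. b17, b16 .. b5, b4 .. b1, and b0 hold 16, 16, 12, 4, and 1 bits, matching the weight groups 16, 2 × 8, 3 × 4, 2 × 2, and 1. The count of 1's is then Σ F2i·F1j·2^(i+j) = F1 × F2. Function names are hypothetical, and since bit order does not affect the count, the sketch does not reproduce the exact b48 .. b0 placement.

```python
def bits(value: int, width: int = 3) -> list[int]:
    """bits(v)[i] is bit i of v, so bits(F1) = [F10, F11, F12]."""
    return [(value >> i) & 1 for i in range(width)]

def bit_stream_product(f1: int, f2: int, width: int = 3) -> list[int]:
    """Stream whose count of 1's equals f1 * f2.

    Partial product F2i.F1j (one AND gate) has weight 2**(i + j),
    so it is copied 2**(i + j) times into the stream.
    """
    a, b = bits(f1, width), bits(f2, width)
    stream = []
    for i in range(width):        # bit i of F2
        for j in range(width):    # bit j of F1
            stream += [b[i] & a[j]] * (2 ** (i + j))
    return stream

s = bit_stream_product(5, 6)
print(len(s), sum(s))  # 49 30
```

For w-bit operands this stream has (2^w − 1)² bits, so its size grows quickly with operand width.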


• Redundancy is added to the bit streams so that they can withstand multiple bit flips

Adding robustness to the bit stream through redundancy:

b48 .. b48 (8 times)  b47 .. b47 (8 times)  ...  b0 .. b0 (8 times)  1 1 1 1 0 0 0

Total count of 1's = 8 × product + 4
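The slide fixes the redundancy (each bit repeated 8 times plus padding, so that the count of 1's is 8 × product + 4) but not how the count is converted back. Assuming the converter computes floor(count / 8), which in binary simply drops the three low-order bits of the count, the +4 offset centers the value: any three bit flips, or up to four flips that only clear 1's, still decode to the exact product. This decoder and all helper names are assumptions for illustration, checked here on 3-bit × 3-bit products.

```python
import random

REPEAT = 8                    # each bit of the product stream is copied 8 times
PAD = [1, 1, 1, 1, 0, 0, 0]   # padding bits as drawn on the slide: four extra 1's

def product_stream(f1: int, f2: int, width: int = 3) -> list[int]:
    """Stream whose count of 1's equals f1 * f2 (partial products copied by weight)."""
    stream = []
    for i in range(width):
        for j in range(width):
            bit = ((f2 >> i) & 1) & ((f1 >> j) & 1)  # partial product F2i.F1j
            stream += [bit] * (2 ** (i + j))
    return stream

def add_redundancy(stream: list[int]) -> list[int]:
    """Repeat every bit REPEAT times and append PAD: count of 1's = 8 * product + 4."""
    return [b for b in stream for _ in range(REPEAT)] + PAD

def decode(stream: list[int]) -> int:
    """Assumed converter: floor(count / 8), i.e. the count with its 3 low bits dropped.

    Exact whenever the bit flips change the count by -4 .. +3 in total.
    """
    return sum(stream) >> 3

def flip(stream: list[int], k: int, rng: random.Random) -> list[int]:
    """Inject k simultaneous bit flips at distinct random positions."""
    out = stream.copy()
    for pos in rng.sample(range(len(out)), k):
        out[pos] ^= 1
    return out

robust = add_redundancy(product_stream(5, 6))
print(sum(robust), decode(robust), decode(flip(robust, 3, random.Random(7))))  # 244 30 30
```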


• Conversion of bit streams to binary coded values is delayed as much as possible, and conversion circuits must use TMR or n-MR for protection against faults


• Issues to be further investigated: size of bit streams and area of the conversion circuits


How does it work? Come and see the posters!

No free food, but more information on this subject will be provided!


What is Wrong with TMR?

• TMR protects only against single faults in one of the modules

[Figure: Modules 1, 2, and 3 all produce correct outputs, and the voter output is correct]


[Figure: Module 2 produces a wrong output while Modules 1 and 3 produce correct outputs; the voter output is still correct]


• TMR does not protect against double faults in different modules

[Figure: Modules 1 and 3 produce wrong outputs while Module 2 is correct; the voter output is wrong]
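The failure modes in these figures can be reproduced with the usual digital TMR voter, a bitwise 2-out-of-3 majority function (a standard construction assumed here; the slides do not show the voter's logic). With one faulty module the vote is still correct; when two modules are wrong in the same output bit, as in the double-fault figure, the majority itself is wrong. The names and values are hypothetical.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote: (a AND b) OR (b AND c) OR (a AND c)."""
    return (a & b) | (b & c) | (a & c)

CORRECT = 0b1011_0110  # hypothetical 8-bit result that every module should produce
FAULT = 0b0001_0000    # a transient flipping bit 4 of a module's output

single = majority3(CORRECT, CORRECT ^ FAULT, CORRECT)          # Module 2 wrong
double = majority3(CORRECT ^ FAULT, CORRECT, CORRECT ^ FAULT)  # Modules 1 and 3 wrong
print(single == CORRECT, double == CORRECT)  # True False
```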


• When a single fault occurs in the voter circuit, the voter output may be wrong


[Figure: Modules 1, 2, and 3 all produce correct outputs, but a fault in the voter makes its output uncertain]


Making TMR (n-MR) more reliable

• Known solutions incur area, performance, and/or power penalties
• Deadlock: how can the output generator (the voter) itself be protected?
• Proposed solution:
  • use TMR to cope with single faults in the modules
  • replace the digital voter with an analog voter that
    • uses a comparator to generate the output
    • can tolerate some noise while still producing the correct result
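The slides state only that the analog voter uses a comparator to generate the output and tolerates some noise. One plausible behavioral model, assumed here and not necessarily the authors' circuit, averages the module output voltages and lets the comparator decide against a VDD/2 threshold: with three modules, one wrong input cannot pull the average across the threshold, and moderate noise on the others only shifts it. The 3.3 V supply and all names are assumptions.

```python
VDD = 3.3  # assumed supply voltage in volts (typical for 0.35 µm CMOS)

def analog_vote(inputs_v: list[float], vdd: float = VDD) -> int:
    """Behavioral model of an averaging analog voter (assumed structure).

    The module output voltages are averaged, as a resistive or capacitive
    summing node would do, and a comparator outputs 1 if the average exceeds
    the vdd / 2 threshold, else 0.
    """
    average = sum(inputs_v) / len(inputs_v)
    return 1 if average > vdd / 2 else 0

print(analog_vote([3.3, 3.3, 3.3]))  # 1: all modules correct
print(analog_vote([3.3, 0.0, 3.3]))  # 1: Module 2 stuck at 0
print(analog_vote([2.6, 0.0, 2.9]))  # 1: one faulty module plus noise on the others
```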


The Analog Voter


Injection of faults in the comparator (*)

Minimum Area Comparator

(*) using CMOS 0.35 µm


Electrical Simulation: Multiple Faults (SPICE and CMOS 0.35 µm)


Dealing with Multiple Simultaneous Faults: n-MR

The Analog Voter with 5 Inputs (for 5-MR)

Simulations with injection of 2 simultaneous faults also succeeded
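Going from TMR to 5-MR raises the number of faulty modules a majority vote can mask from one to two, which is what the 2-fault simulations exercise. The digital sketch below, with hypothetical names and a fault-free word-level voter, shows only the counting argument: with n modules, a strict majority survives as long as at most (n − 1) // 2 modules are wrong, even if the faulty ones agree with each other. It models the voting rule, not the authors' analog voter circuit.

```python
from collections import Counter

def nmr_vote(outputs: list[int]) -> int:
    """Word-level majority vote over the outputs of n redundant modules.

    Returns the value produced by more than half of the modules and raises
    ValueError when no strict majority exists.
    """
    value, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(outputs) // 2:
        raise ValueError("no majority among module outputs")
    return value

def max_masked_faults(n: int) -> int:
    """Faulty modules an n-MR system with a fault-free voter can always mask."""
    return (n - 1) // 2

# 5-MR with two faulty modules that even agree on the same wrong value.
outputs = [42, 17, 42, 17, 42]
print(nmr_vote(outputs), max_masked_faults(3), max_masked_faults(5))  # 42 1 2
```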


The Analog Voter... Oops!

Does this work???


The Analog Voter

Let’s see the posters!


Future Work - Short Term (2005-2006)

• use of signal redundancy with other number representation forms, such as Sigma-Delta

• use of the analog voter as an efficient way to implement robust n-MR circuits

• investigate the application of statistical methods and neural networks to the design of fault tolerant circuits with minimum redundancy


Future Work - Long Term (2006-2007)

• use of logic properties to develop signal redundancy with low cost

• apply the developed techniques to actual processors with DSP and VLIW architectures

• discuss the architectural impact of new technologies together with fault tolerance


Research Evolution

[Timeline from previous work (2004-2005) through 2005, 2006, and 2007: Stochastic Operators, Analog Voter, Bit Stream Operators, Sigma-Delta, Logic Properties (low-cost redundancy), DSP / VLIW (application to actual DSP and VLIW processors)]


Questions?

Looking forward to answering them at the poster booth!

(# 20.4)

Contact: [email protected]

Thank You!

No free anything, but a nice chat about these matters will be a pleasure !