![Page 1: Layali Rashid , Karthik Pattabiraman and Sathish …webhost.laas.fr/TSF/WDSN10/WDSN10_files/Slides/WDSN10...Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan Created Date](https://reader030.fdocuments.us/reader030/viewer/2022013003/5f515b4de5f918157102d1b7/html5/thumbnails/1.jpg)
TOWARDS UNDERSTANDING THE EFFECTS OF
INTERMITTENT HARDWARE FAULTS ON PROGRAMS
Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
THE UNIVERSITY OF BRITISH COLUMBIA
Motivation: Why Intermittent Faults?
• Intermittent faults are likely to be a significant concern in future processors
  • Do not persist forever, unlike permanent faults
  • Persist for longer duration than transient faults
  • May impact program more than transient faults
• Assumption:
  • An intermittent fault affects two or more consecutive instructions in the program.
Contributions
• Study the impact of intermittent faults on programs.
• Model the propagation of intermittent faults in programs at the instruction level.
• Validate the model using fault injections.
Motivation: Why Model Error Propagation?
• Fault injection experiments are prohibitively expensive.
  • Intermittent faults vary in location and duration.
  • An order of magnitude slower than modeling.
• Modeling error propagation provides more insights that may help in tolerating faults.
Primary Research Questions
• Do all intermittent faults lead to a program crash?
• How many instructions are executed before the program crashes?
• How many variables are corrupted by the fault before the program crashes?
Approach
Crash Model
Fault Model
Dynamic Dependency Graph
SimpleScalar simulator
Evaluate using FI
Approach
Crash Model
Fault Model
• Decoder
• ALU Unit
• Load/Store Unit
SimpleScalar simulator
Evaluate using FI
Dynamic Dependency Graph
Approach
Fault Model
Crash Model
• Memory address
• Branch/jump address
• Function call address
SimpleScalar simulator
Evaluate using FI
Dynamic Dependency Graph
Approach
Crash Model
Fault Model
A Dynamic Dependency Graph (DDG) is a directed acyclic graph that models the dynamic dependencies between instructions. [Agrawal '90]
SimpleScalar simulator
Evaluate using FI
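Example
Code Fragment              Node
mov R1, #5                 1
mov R2, #6                 2
mov R3, #7                 3
ld R4, R1, Array_Addr      4
ld R5, R2, Array_Addr      5
ld R6, R3, Array_Addr      6
mult R7, R5, R4            7
[Figure: DDG of the code fragment. Nodes 1, 2 and 3 take the constants #5, #6 and #7. Nodes 4, 5 and 6 each take two A-labeled edges, one from node 1, 2 or 3 respectively and one from Array_Addr. Node 7 takes R-labeled edges from nodes 4 and 5.]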
Example
[Figure: the same code fragment and DDG as on the previous slide.]
A node is a value produced by a dynamic instruction.
Example
[Figure: the same code fragment and DDG as on the previous slides.]
The edges represent the instructions' operands:
• A is an address operand.
• R is a regular operand.
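The example DDG described on these slides can be written down as a small data structure. The following is a minimal sketch in Python, not the authors' implementation: the `Node` class, `build_example_ddg` and `is_acyclic_by_construction` are hypothetical names. The A labels on the load operands and the R labels on the `mult` operands are read off the figure residue; the R label on the `mov` constants is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One DDG node: the value produced by one dynamic instruction."""
    node_id: int
    instruction: str
    # Each operand edge is (source, kind). The source is an earlier node id
    # or a string naming a constant input; kind is "A" (address operand) or
    # "R" (regular operand).
    operands: list = field(default_factory=list)


def build_example_ddg():
    """Build the seven-node DDG of the code fragment on the slides."""
    nodes = [
        # Assumption: constants feeding a mov are treated as regular operands.
        Node(1, "mov R1, #5", [("#5", "R")]),
        Node(2, "mov R2, #6", [("#6", "R")]),
        Node(3, "mov R3, #7", [("#7", "R")]),
        # Loads use both the index register and the array base as addresses.
        Node(4, "ld R4, R1, Array_Addr", [(1, "A"), ("Array_Addr", "A")]),
        Node(5, "ld R5, R2, Array_Addr", [(2, "A"), ("Array_Addr", "A")]),
        Node(6, "ld R6, R3, Array_Addr", [(3, "A"), ("Array_Addr", "A")]),
        # The multiply consumes two loaded values as regular operands.
        Node(7, "mult R7, R5, R4", [(5, "R"), (4, "R")]),
    ]
    return {node.node_id: node for node in nodes}


def is_acyclic_by_construction(ddg):
    """True if every node-to-node edge runs from an earlier to a later node."""
    return all(
        src < node.node_id
        for node in ddg.values()
        for src, _kind in node.operands
        if isinstance(src, int)
    )
```

Because a dynamic instruction can only consume values produced before it, numbering nodes in execution order makes the graph acyclic, which `is_acyclic_by_construction` checks.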
DDG Metrics
• Intermittent Propagation Set (IPS): set of program values to which an intermittent fault propagates.
• Crash Distance (CD): number of instructions that execute from the time an intermittent fault occurs until the program crashes (due to the fault).
Example
[Figure: the same code fragment and DDG, with an intermittent error affecting nodes 1 and 2.]
Intermittent Propagation Set (1, 2) = {?}
Crash Distance (1, 2) = ?
Example
[Figure: the same code fragment and DDG, with a transient error at node 1; node 4 is marked as the crash node.]
Transient Propagation Set (1) = {1, 4}
Transient Crash Distance (1) = 4
Example
[Figure: the same code fragment and DDG, with a transient error marked.]
Transient Propagation Set (1) = {1, 4}
Transient Crash Distance (1) = 4
Transient Propagation Set (2) = {2, 5}
Transient Crash Distance (2) = 4
Example
[Figure: the same code fragment and DDG, with an intermittent error affecting nodes 1 and 2; node 4 is marked as the crash node.]
Intermittent Propagation Set (1, 2) = {1, 2, 4}
Crash Distance (1, 2) = 4
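The propagation sets and crash distances in these examples can be reproduced with a short traversal of the DDG. The sketch below is an illustration under stated assumptions, not the authors' model: it assumes that the program crashes at the first node that uses a corrupted value as an address (A) operand, and that the crash distance counts nodes inclusively from the first faulty node to the crash node. Both assumptions were chosen because they match the numbers on the slides. `EXAMPLE_DDG` and `propagate` are hypothetical names.

```python
# DDG of the example fragment: node id -> list of (source, kind) operand
# edges, where kind is "A" (address operand) or "R" (regular operand).
# Assumption: constants feeding a mov are treated as regular operands.
EXAMPLE_DDG = {
    1: [("#5", "R")],
    2: [("#6", "R")],
    3: [("#7", "R")],
    4: [(1, "A"), ("Array_Addr", "A")],
    5: [(2, "A"), ("Array_Addr", "A")],
    6: [(3, "A"), ("Array_Addr", "A")],
    7: [(5, "R"), (4, "R")],
}


def propagate(ddg, fault_nodes):
    """Return (propagation_set, crash_distance) for a fault on fault_nodes.

    crash_distance is None when no corrupted value reaches an address operand.
    """
    start = min(fault_nodes)
    corrupted = set()
    for node_id in sorted(ddg):  # node ids follow execution order
        if node_id < start:
            continue
        if node_id in fault_nodes:
            corrupted.add(node_id)  # the fault corrupts this value directly
        tainted = {kind for src, kind in ddg[node_id] if src in corrupted}
        if "A" in tainted:
            # Assumed crash model: a corrupted address crashes the program.
            corrupted.add(node_id)
            return corrupted, node_id - start + 1
        if tainted:
            corrupted.add(node_id)  # corruption flows through a regular operand
    return corrupted, None


print(propagate(EXAMPLE_DDG, {1}))     # transient fault at node 1
print(propagate(EXAMPLE_DDG, {2}))     # transient fault at node 2
print(propagate(EXAMPLE_DDG, {1, 2}))  # intermittent fault on nodes 1 and 2
```

Under these assumptions the intermittent fault on nodes 1 and 2 crashes at node 4, before node 5 executes, which is why node 5 is in the transient set for node 2 but not in the intermittent set.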
Approach
Crash Model
Fault Model
SimpleScalar simulator
Evaluate using FI
Dynamic Dependency Graph
Experimental Setup
• Evaluating the Model's Accuracy
  • Intermittent fault injections in an instruction-level simulator (SimpleScalar)
  • Measure the difference between the predicted and the actual CD for crashes
• Computation of Intermittent Fault Propagation
  • Construct the DDG of each program.
  • Find the IPS and the CD for each fault.
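The accuracy measurement described above compares, for each injected fault that crashes, the CD predicted by the model with the CD observed in the simulator. A minimal sketch of one way to summarize that comparison follows. The function name `fraction_within` is hypothetical, and the numbers in the usage lines are made up purely for illustration; they are not results from the study.

```python
def fraction_within(predicted, actual, tolerance):
    """Fraction of faults whose predicted CD is within `tolerance` nodes
    of the actual CD."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one fault is required")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(predicted)


# Made-up crash distances for four hypothetical faults (illustration only).
predicted_cd = [4, 4, 12, 250]
actual_cd = [4, 5, 30, 90]

print(fraction_within(predicted_cd, actual_cd, 10))
print(fraction_within(predicted_cd, actual_cd, 100))
```

Reporting the fraction at more than one tolerance, for example 10 and 100 nodes, shows both how often the model is nearly exact and how often it is roughly right.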
Benchmarks
• Preliminary results for two programs: Matrix Multiply and Insertion Sort.
• Each program has about 11,000 static MIPS instructions.
Results: DDG Model Vs. SimpleScalar
• 88% of the expected CDs fall within 10 nodes of the actual ones, and 97% fall within 100 nodes.
Results: CD Absolute values
• 95% of the faults cause the program to crash within 10 nodes of the fault's start.
Results: Effect of Fault Length
Conclusions and Discussion
• We enhanced the Dynamic Dependency Graph to model intermittent fault propagation in programs.
• 88% of the expected faults' CDs fall within 10 nodes of the actual CDs.
• The majority of the intermittent faults cause programs to crash within a few hundred dynamic instructions.
• Discussion
  • Detection of intermittent faults using software-based techniques can be efficient.
  • Diagnosis of intermittent faults is possibly feasible using software-based techniques.
  • Recovery using checkpointing techniques on the order of thousands of instructions will be effective.
THANK YOU
BACKUP SLIDES
Insertion Sort CD
Insertion Sort IPS