FORMAL DIAGNOSIS OF HARDWARE TRANSIENT ERRORS IN PROGRAMS
Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan
THE ELECTRICAL AND COMPUTER ENGINEERING DEPARTMENT
THE UNIVERSITY OF BRITISH COLUMBIA
Contributions
• Software-driven diagnosis of hardware transient errors
– Diagnosis: “isolate the first affected instruction”
• Program-level analysis
– Guarantees on the diagnosis: completeness and accuracy
Why Software-Driven Diagnosis?
• No expensive hardware modifications.
• Minimal software instrumentation.
• Diagnoses faults that manifest only at the program level.
• Direct access to the affected device is not required.
Diagnosis Approach
Transient error → detector triggered → dump file (e.g., failing detector, register file) → error diagnosis → faulty instruction
Diagnosis Approach
Transient error → detector triggered → dump file (e.g., failing detector, register file) → model checking → faulty instruction
Model Checking Using SymPLFIED
• Formal model for analyzing programs [DSN’08]
– Evaluates the effect of transient hardware errors on programs.
• Symbolic error propagation technique
– Represents errors using a single symbol (err) to avoid state-space explosion.
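The symbolic error propagation idea can be sketched in a few lines of Python. This is a minimal illustration, not SymPLFIED's implementation: `ERR`, `sym_add`, `sym_mul` and `sym_gt` are hypothetical names, and treating `err * 0` as `err` is a simplifying, conservative assumption. One symbol stands in for every value a corrupted register could hold (2^32 of them for a 32-bit register); the price is that a comparison involving `err` has no single answer, so execution must fork.

```python
ERR = "err"  # one symbol for "some corrupted value", whatever its bits are


def sym_add(a, b):
    """Arithmetic on a corrupted operand yields a corrupted result."""
    return ERR if ERR in (a, b) else a + b


def sym_mul(a, b):
    # Simplification: err * 0 is also treated as err (conservative).
    return ERR if ERR in (a, b) else a * b


def sym_gt(a, b):
    """A comparison on ERR can go either way, so return every outcome the
    model checker must explore; this is where execution forks."""
    return {False, True} if ERR in (a, b) else {a > b}


print(sym_add(2, 3), sym_mul(ERR, 4), sym_gt(ERR, 1), sym_gt(5, 1))
```

Concrete values behave normally; anything touched by `ERR` stays `ERR`, and only comparisons multiply the number of states.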
Example: Factorial Program
1 movi $2, #1
2 read $1
3 mov $3, $1
4 movi $4, #1
5 loop: setgt $5, $3, $4
6 beq $5, #0, exit
7 mult $2, $2, $3
8 subi $3, $3, #1
9 assert($3 < $1 + 1)
10 beq $0, #0, loop
11 exit: prints "Factorial = "
12 print $2
Here $2 is the result variable, $1 holds the user input, the loop runs while $3 > $4, and the assert on line 9 is the error detector.
Example: Error Propagation
With user input $1 = 5, a transient fault sets $3 = 13 instead of 5. After one loop iteration $2 = 1 × 13 = 13 and $3 = 12, so the assertion $3 < $1 + 1 (12 < 6) fails and the detector is triggered.
Dump file: detector triggered; $1 = 5, $2 = 13, $3 = 12, $4 = 1, $5 = 1
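A concrete replay of this scenario, sketched in Python under the assumption that each instruction behaves as its mnemonic suggests (for example, `setgt` writes 1 or 0, and `beq $0, #0, loop` is always taken). The `fault=(line, value)` parameter is a hypothetical hook that overwrites the destination register the first time that line executes; it is not how SimpleScalar injects faults.

```python
def run_factorial(user_input, fault=None):
    """Run the slide's factorial program on concrete values.

    `fault` is a hypothetical (line, value) pair: the first time that line
    executes, its destination register receives `value` instead of the
    correct result, modelling a transient error. Returns (outcome, registers).
    """
    regs = {}
    injected = False

    def write(line, reg, value):
        nonlocal injected
        if fault is not None and fault[0] == line and not injected:
            value, injected = fault[1], True
        regs[reg] = value

    write(1, 2, 1)                           # 1 movi $2, #1
    write(2, 1, user_input)                  # 2 read $1
    write(3, 3, regs[1])                     # 3 mov $3, $1
    write(4, 4, 1)                           # 4 movi $4, #1
    while True:
        write(5, 5, int(regs[3] > regs[4]))  # 5 loop: setgt $5, $3, $4
        if regs[5] == 0:                     # 6 beq $5, #0, exit
            return "exit", regs              # the program would now print $2
        write(7, 2, regs[2] * regs[3])       # 7 mult $2, $2, $3
        write(8, 3, regs[3] - 1)             # 8 subi $3, $3, #1
        if not regs[3] < regs[1] + 1:        # 9 assert($3 < $1 + 1)
            return "detector", regs          # detector triggered: dump state
        # 10 beq $0, #0, loop is always taken, so iterate again


print(run_factorial(5))
print(run_factorial(5, fault=(3, 13)))
```

`run_factorial(5)` exits with $2 = 120, while `run_factorial(5, fault=(3, 13))` stops at the detector with the same register values as the dump file.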
Example: Error Diagnosis
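In the symbolic model, the transient fault sets $3 = err. Because $5 (line 5) is computed from $3, the branch on line 6 cannot be resolved and both outcomes are explored: False continues to line 7, True goes to exit. On the line-7 path $2 = err; at the detector on line 9, True continues to line 10 and False means the detector is triggered.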
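The forking just described can be reproduced with a small bounded explorer. This Python sketch rests on several assumptions: a hypothetical tuple encoding of the program, injection of `err` into the destination register of the chosen line's first execution only, and a step bound (`max_steps`), because a loop controlled by `err` never terminates on its own. `setgt` on `err` forks into the concrete results 0 and 1, a design choice of this sketch that keeps comparison results concrete; it is not a claim about SymPLFIED's internals.

```python
ERR = "err"  # single symbol standing for any corrupted value

# Hypothetical encoding of the slide's factorial program: line -> instruction.
PROGRAM = {
    1: ("movi", 2, 1),      # movi $2, #1
    2: ("read", 1),         # read $1
    3: ("mov", 3, 1),       # mov $3, $1
    4: ("movi", 4, 1),      # movi $4, #1
    5: ("setgt", 5, 3, 4),  # loop: setgt $5, $3, $4
    6: ("beqz", 5, 11),     # beq $5, #0, exit
    7: ("mult", 2, 2, 3),   # mult $2, $2, $3
    8: ("subi", 3, 3, 1),   # subi $3, $3, #1
    9: ("assert", 3, 1),    # assert($3 < $1 + 1)
    10: ("jump", 5),        # beq $0, #0, loop (always taken)
    11: ("exit",),          # exit: print the result
}


def results_for(ins, regs, user_input):
    """Possible values written to the destination register."""
    op = ins[0]
    if op == "movi":
        return [ins[2]]
    if op == "read":
        return [user_input]
    if op == "mov":
        return [regs[ins[2]]]
    if op == "subi":
        a = regs[ins[2]]
        return [ERR if a == ERR else a - ins[3]]
    a, b = regs[ins[2]], regs[ins[3]]
    if op == "mult":
        return [ERR if ERR in (a, b) else a * b]
    if op == "setgt":
        # A comparison involving ERR can go either way: fork on both results.
        return [0, 1] if ERR in (a, b) else [int(a > b)]
    raise ValueError(f"unknown opcode {op}")


def explore(inject_at, user_input=5, max_steps=60):
    """Distinct (outcome, registers) end states when the first execution of
    line `inject_at` writes ERR to its destination (None = fault-free)."""
    ends = set()
    work = [(1, {}, False, 0)]  # (line, registers, injected yet?, steps)
    while work:
        line, regs, injected, steps = work.pop()
        if steps > max_steps:
            continue  # bound the search: ERR-controlled loops never terminate
        ins = PROGRAM[line]
        op = ins[0]
        if op == "exit":
            ends.add(("exit", tuple(sorted(regs.items()))))
        elif op == "jump":
            work.append((ins[1], regs, injected, steps + 1))
        elif op == "beqz":
            v = regs[ins[1]]
            taken = [True, False] if v == ERR else [v == 0]
            for t in taken:
                work.append((ins[2] if t else line + 1, regs, injected, steps + 1))
        elif op == "assert":
            a, b = regs[ins[1]], regs[ins[2]]
            holds = [True, False] if ERR in (a, b) else [a < b + 1]
            for ok in holds:
                if ok:
                    work.append((line + 1, regs, injected, steps + 1))
                else:
                    ends.add(("detector", tuple(sorted(regs.items()))))
        else:
            values = results_for(ins, regs, user_input)
            if line == inject_at and not injected:
                values, injected = [ERR], True
            for v in values:
                work.append((line + 1, {**regs, ins[1]: v}, injected, steps + 1))
    return ends


for end in sorted(explore(3), key=str):
    print(end)
```

`explore(3)` includes the detector-triggered state with $1 = 5, $2 = err, $3 = err, $4 = 1, $5 = 1, whereas `explore(1)` (which corrupts only $2) ends only at the exit, because the detector checks $3 alone.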
SymPLFIED’s solution: instruction 3 injected; detector triggered; $1 = 5, $2 = err, $3 = err, $4 = 1, $5 = 1
Dump file: detector triggered; $1 = 5, $2 = 13, $3 = 12, $4 = 1, $5 = 1
The crash dump file can be used to identify the faulty instruction.
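Identifying the faulty instruction then amounts to checking which symbolic end states are consistent with the dump. A minimal Python sketch, assuming `err` acts as a wildcard that matches any dumped value and that the outcome (detector vs. exit) must agree exactly. `matches` and `diagnose` are hypothetical names, and the line-8 state is an illustrative contrast derived by hand from the program, not a SymPLFIED result.

```python
ERR = "err"  # wildcard: a corrupted value the model does not pin down


def matches(symbolic_state, dump):
    """A symbolic end state explains a dump if the outcomes agree and every
    concrete register value agrees; ERR matches any dumped value."""
    outcome, regs = symbolic_state
    dump_outcome, dump_regs = dump
    if outcome != dump_outcome:
        return False
    return all(v == ERR or dump_regs.get(r) == v for r, v in regs.items())


def diagnose(solutions, dump):
    """Return the injection points whose predicted end state matches the dump."""
    return sorted({point for point, state in solutions if matches(state, dump)})


# Values from the slides.
dump = ("detector", {1: 5, 2: 13, 3: 12, 4: 1, 5: 1})
symplfied = (3, ("detector", {1: 5, 2: ERR, 3: ERR, 4: 1, 5: 1}))
# Hand-derived contrast: an error in line 8 on the first iteration leaves $2 = 5.
other = (8, ("detector", {1: 5, 2: 5, 3: ERR, 4: 1, 5: 1}))

print(diagnose([symplfied, other], dump))  # [3]
```

With a pure wildcard rule, more than one injection point can explain the same dump, so a diagnosis of this kind may return a set of candidate instructions rather than a single one.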
Experimental Methodology
• Enhance SymPLFIED to diagnose errors.
• Modify the SimpleScalar simulator to inject faults.
• Evaluate on Matrix Multiply and Insertion Sort.
[Flowchart: for each instruction that can trigger a detector, inject a fault at a random bit in SimpleScalar; if the detector is triggered, create a dump file and run error diagnosis; repeat while instructions remain, then stop.]
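The injection loop in the methodology can be sketched as follows. Assumptions: registers are 32-bit two's complement, a single random bit of the chosen instruction's result is flipped, and `run_with_fault` and `diagnose` are hypothetical callbacks standing in for the modified SimpleScalar and the enhanced SymPLFIED. Instructions whose fault does not trigger the detector are simply skipped.

```python
import random


def flip_bit(value, bit, width=32):
    """Flip one bit of a `width`-bit two's-complement register value."""
    mask = (1 << width) - 1
    raw = (value & mask) ^ (1 << bit)
    # Reinterpret the raw bit pattern as a signed integer.
    return raw - (1 << width) if raw >> (width - 1) else raw


def campaign(instructions, run_with_fault, diagnose, width=32, seed=0):
    """One pass of the methodology loop: per instruction, flip a random bit of
    its result; if the detector fires, build a dump and diagnose it.

    `run_with_fault(inst, corrupt)` and `diagnose(dump)` are hypothetical
    hooks standing in for the modified SimpleScalar and enhanced SymPLFIED."""
    rng = random.Random(seed)
    report = {}
    for inst in instructions:
        bit = rng.randrange(width)
        outcome, regs = run_with_fault(inst, lambda v, b=bit: flip_bit(v, b, width))
        if outcome == "detector":  # detector triggered: create a dump file
            report[inst] = diagnose((outcome, dict(regs)))
        # otherwise the fault was masked or undetected; move on
    return report


print(flip_bit(5, 3))
```

As a sanity check, `flip_bit(5, 3)` returns 13, which happens to be the corrupted $3 value in the factorial example.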
Results for Matrix Multiply

| Number of detectors | 1 | 4 | 6 |
|---|---|---|---|
| Faults injected in SimpleScalar (SS) | 167 | 275 | 286 |
| Faults detected in SimpleScalar | 74 | 135 | 150 |
| Diagnosed faults (%) | 100 | 77 | 80 |
| Undiagnosed faults (%) | 0 | 23 | 20 |
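Two quantities that the Matrix Multiply table implies but does not list can be computed directly: the fraction of injected faults that the detectors caught (a derived figure, not one reported on the slides), and the approximate number of diagnosed faults (approximate because the diagnosed percentages are rounded). A small Python sketch:

```python
# (detectors, injected, detected, diagnosed %) from the Matrix Multiply table.
MATRIX_MULTIPLY = [(1, 167, 74, 100), (4, 275, 135, 77), (6, 286, 150, 80)]


def summarize(rows):
    """Per detector count: % of injected faults that were detected, and an
    approximate count of diagnosed faults (the table's percentages are rounded)."""
    out = []
    for detectors, injected, detected, diagnosed_pct in rows:
        coverage_pct = round(100 * detected / injected)
        diagnosed = round(detected * diagnosed_pct / 100)
        out.append((detectors, coverage_pct, diagnosed))
    return out


for detectors, coverage, diagnosed in summarize(MATRIX_MULTIPLY):
    print(f"{detectors} detector(s): {coverage}% detected, ~{diagnosed} diagnosed")
```

This gives detection rates of about 44%, 49% and 52% for 1, 4 and 6 detectors, and roughly 74, 104 and 120 diagnosed faults.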
Results for Matrix Multiply (1)
• The proposed technique diagnoses 77%-100% of the detected errors for the matrix multiply program.
• The remaining errors go undiagnosed because of implementation artifacts in the SymPLFIED tool.
Results for Matrix Multiply (2)
• The number of faults injected in SimpleScalar grows with the number of detectors (167, 275 and 286 for 1, 4 and 6 detectors).
• Adding more detectors increases the diagnosis accuracy (from 77% with 4 detectors to 80% with 6).
Conclusions and Future Work
• Software diagnosis of hardware faults is possible and can be automated using formal techniques.
– Our diagnosis method diagnoses a significant fraction of the detected errors (77%–100% in our experiments) using only a few detectors.
• Future Work
– Investigate improvements with limited hardware support.
– Improve scalability using heuristics.
– Extend to intermittent and permanent faults.
Backup Slides
Related Work
Hardware fault diagnosis techniques fall into four categories:
• Hardware-based techniques
• Probabilistic techniques
• Formal methods
• Periodic-testing techniques
Results for Insertion Sort
| Number of detectors | 1 | 4 | 7 |
|---|---|---|---|
| Faults injected in SimpleScalar (SS) | 11 | 165 | 198 |
| Faults detected in SimpleScalar | 8 | 64 | 83 |
| Diagnosed faults (%) | 100 | 87 | 89 |
| Undiagnosed faults (%) | 0 | 13 | 11 |