Evaluating the Error Resilience of Parallel Programs
Transcript of Evaluating the Error Resilience of Parallel Programs
Slide 1: Title

Evaluating the Error Resilience of Parallel Programs
Bo Fang, Karthik Pattabiraman, Matei Ripeanu (The University of British Columbia); Sudhanva Gurumurthi (AMD Research)
Slide 2: Reliability trends

- The soft error rate per chip will continue to increase [Constantinescu, Micro 03; Shivakumar, DSN 02]
- The ASC Q supercomputer at LANL encountered 27.7 CPU failures per week [Michalak, IEEE Transactions on Device and Materials Reliability 05]
Slide 3: Goal

- Build application-specific, software-based fault-tolerance mechanisms
- Investigate the error-resilience characteristics of applications
- Find the correlation between error-resilience characteristics and application properties
- Scope: GPU (✔, covered by prior work) and CPU
Slide 4: Previous study on GPUs

Understand the error resilience of GPU applications by building a methodology and tool called GPU-Qin [Fang, ISPASS 14].
Slide 5: Operations and resilience of GPU applications

| Operations | Benchmarks | Measured SDC |
|---|---|---|
| Comparison-based | MergeSort | 6% |
| Bit-wise operation | HashGPU, AES | 25%–37% |
| Average-out effect | Stencil, MONTE | 1%–5% |
| Graph processing | BFS | 10% |
| Linear algebra | Transpose, MAT, MRI-Q, SCAN-block, LBM, SAD | 15%–38% |
Slide 6: This paper: OpenMP programs

Thread model and program structure: an OpenMP program divides into five segments — input processing, pre-algorithm, parallel region, post-algorithm, and output processing:

    Begin:
        read_graphics(image_orig);
        image = image_setup(image_orig);
        scale_down(image)
        #pragma omp parallel for shared(var_list1) ...
        scale_up(image)
        write_graphics(image);
    End
Slide 7: Evaluating the error resilience

Naïve approach: randomly choose a thread and an instruction to inject into.
This does not work: it biases the fault-tolerance (FT) design.

Challenges:
- C1: exposing the thread model
- C2: exposing the program structure
Slide 8: Methodology

- Controlled fault injection: inject faults into the master and slave threads separately (C1), integrating the thread ID into the injector.
- Identify the boundaries of the program segments (C2): determine the range of instructions for each segment via a source-code-to-instruction-level mapping.
- Operate at the LLVM IR level (assembly-like, yet captures the program structure).
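A minimal sketch of what the controlled-injection check might look like. All names here are hypothetical illustrations, not LLFI's actual API:

```c
/* Hypothetical injection policy mirroring the controlled fault
   injection above: inject only when the dynamic instruction index
   lies inside the selected segment's range (C2) and the executing
   thread matches the selected target (C1). */
typedef struct {
    long seg_start;   /* first dynamic instruction index of the segment */
    long seg_end;     /* last dynamic instruction index of the segment  */
    int  target_tid;  /* thread to inject into (0 = master thread)      */
} inject_policy;

static int should_inject(const inject_policy *p, long instr_idx, int tid) {
    return instr_idx >= p->seg_start &&
           instr_idx <= p->seg_end &&
           tid == p->target_tid;
}
```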
Slide 9: LLFI and our extension

[Flowchart: at compile time, a fault-injection instruction/register selector instruments the IR code of the program with function calls, producing a profiling executable and a fault-injection executable; at runtime, a custom fault injector decides before each instruction whether to inject.]

Our extensions:
- Add support for structure and thread selection (C1)
- Collect statistics on a per-thread basis (C1)
- Specify the code region and thread to inject into (C1 & C2)
Slide 10: Fault model

- Transient faults, single-bit flips (no extra cost to extend to multiple-bit flips)
- Faults in the arithmetic and logic unit (ALU), the floating-point unit (FPU), and the load-store unit (LSU, memory-address computation only)
- Memory and caches are assumed to be ECC-protected; the processor's control logic (e.g., the instruction-scheduling mechanism) is not considered
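Under this fault model, a single-bit flip amounts to XORing a value with a one-hot mask; a minimal sketch:

```c
#include <stdint.h>

/* Single-bit flip: XOR with a one-hot mask inverts exactly the
   chosen bit. Flipping the same bit twice restores the value. */
static uint32_t flip_bit(uint32_t value, unsigned bit) {
    return value ^ (UINT32_C(1) << bit);
}
```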
Slide 11: Characterization study

- Platform: Intel Xeon CPU X5650 @ 2.67 GHz, 24 hardware cores
- Outcomes of an experiment:
  - Benign: correct output
  - Crash: an exception from the system or the application
  - Silent Data Corruption (SDC): incorrect output
  - Hang: takes considerably longer than normal to finish
- 10,000 fault-injection runs per benchmark (to achieve a < 1% error bar)
Slide 12: Details about the benchmarks

8 applications from the Rodinia benchmark suite:

| Benchmark | Acronym |
|---|---|
| Breadth-first search | bfs |
| LU decomposition | lud |
| K-nearest neighbor | nn |
| Hotspot | hotspot |
| Needleman-Wunsch | nw |
| Pathfinder | pathfinder |
| SRAD | srad |
| Kmeans | kmeans |
Slide 13: Difference in SDC rates

Compare the SDC rates for:
1. Injecting into the master thread only
2. Injecting into a random slave thread

Master threads have higher SDC rates than slave threads (except for lud). Biggest difference: 16% in pathfinder; average: 7.8%.

[Bar chart: SDC rate of the master thread vs. SDC rate of slave threads for bfs, lud, nn, hotspot, nw, pathfinder, srad, and kmeans; y-axis 0%–60%.]

This shows the importance of differentiating between the master and slave threads, both for correct characterization and for selective fault-tolerance mechanisms.
Slide 14: Differentiating between program segments

[Diagram: faults (✖) occur in each of the program segments.]
Slide 15: SDC rates per program segment

Output processing has the highest mean SDC rate, and also the lowest standard deviation.

Mean and standard deviation of SDC rates for each segment:

| | Input processing | Pre-algorithm | Parallel segment | Post-algorithm | Output processing |
|---|---|---|---|---|---|
| Number of applications | 6 | 2 | 7 | 2 | 5 |
| Mean | 28.6% | 20.3% | 16.1% | 27% | 42.4% |
| Standard deviation | 11% | 23% | 14% | 13% | 5% |
Slide 16: Grouping by operations

OpenMP results (this paper):

| Operations | Benchmarks | Measured SDC |
|---|---|---|
| Comparison-based | nn, nw | less than 1% |
| Grid computation | hotspot, srad | 23% |
| Average-out effect | kmeans | 4.2% |
| Graph processing | bfs, pathfinder | 9%–10% |
| Linear algebra | lud | 44% |

GPU results [Fang, ISPASS 14]:

| Operations | Measured SDC |
|---|---|
| Comparison-based | 6% |
| Bit-wise operation | 25%–37% |
| Average-out effect | 1%–5% |
| Graph processing | 10% |
| Linear algebra | 15%–38% |

Comparison-based and average-out operations show the lowest SDC rates in both cases. Linear algebra and grid computation, which have regular computation patterns, give the highest SDC rates.
Slide 17: Future work

- Understand the variance and investigate more applications
- Compare CPUs and GPUs: same applications, same level of abstraction
Slide 18: Conclusion

- Error-resilience characterization of OpenMP programs needs to take the thread model and the program structure into account.
- We find preliminary support for our hypothesis that error-resilience properties correlate with the algorithmic characteristics of parallel applications.

Project website: http://netsyslab.ece.ubc.ca/wiki/index.php/FTHPC
Slide 19: Backup slide

SDC rates converge.