FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background &...
Transcript of FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background &...
ORNL is managed by UT-Battelle for the US Department of Energy
FITL: Extending LLVM for the Transla8on of Fault-Injec8on Direc8ves
November 15, 2015
LLVM HPC @ SC15
http://ft.ornl.gov [email protected]
Joel E. Denny Seyong Lee
Jeffrey S. Vetter
2
Outline • Background&Motivation
• Design&Contributions• Fault-InjectionPragmasforC
• FITLExtensionstoLLVM
• CaseStudies
Background & Mo8va8on
4
Background: Fault Injec8on • Emulatehardwarefaultstostudyapplicationresiliency• Examplecausesoffaults:
– Cosmicrays,alphaparticles– Thermaleffects– Noise
• Typesoffaults:– Permanentfaults:wedonotmodelthis– Transientfaults:modeledassinglebitflips
• Impactoftransientfaults:– Benignfault–noimpactonapplication– Crash–obviousfailureofapplication– Silentdatacorruption(SDC)
5
Mo8va8on: HPC and Resilience • HPCsystemsevolvingtowardexascale:
– Millionsofcomputationalunits,memoryunits,sockets– Newpowerreductionstrategiesneeded– Frequencyofhardwarefaultswillgrowdramatically
• Prediction:– Meantimetofailurewillbetooshortforexistingresiliencysolutions– Checkpoint,restart,hardware-onlysolutionswillnotbesufficient– Resiliencemustbeaddressedinsoftwarestack– Hardwarefaultsexposedtoapplicationlevel
• Needawaytostudyapplicationresilience
6
Mo8va8on: Fault Injector Tool Design • Faultinjectortoolusecases:
– Studyresiliencyofapplicationstodevelopnewresiliencemechanisms– Unittestinganddebuggingframeworkforresiliency
• Designqualities:– Flexibility:whatkindsofhardwarefaultscenarioscanweemulate?– Accuracy:arethesimulationsrealistic?– Usability:howeasyisittoconfigurestudiesandunderstandresults?– Implementability:howeasyisittoimplement?
• Designvariables:– Levelofrepresentationatwhichfaultinjectoroperates– Levelofrepresentationatwhichuserconfiguresfaultinjection
Design & Contribu8ons
8
Design Quality vs. Level of Representa8on
Hardware Assembly Compiler IR Source
Des
ign
Qua
lity
Level of Representation
LLVM IR
Usability &
Implementability
Accuracy &
Flexibility
9
Design Quality vs. Level of Representa8on
Hardware Assembly Compiler IR Source
Des
ign
Qua
lity
Fault Injection
Accuracy &
Flexibility Implementability
Hardware Assembly Compiler IR Source
Des
ign
Qua
lity
User Interface
Usability
LLVM IR Directives
10
Architecture
...
FITL LLVM IR
LLVM IR
Target Binary
OpenARC Other Compilers
FITL Pass
LLVM
C Source +
Fault-Injection Directives
Other Languages +
Fault-Injection Directives
11
Contribu8ons • Asetofnovelfault-injectionpragmasforC
• FITL(Fault-InjectionToolkitforLLVM):– FITLLLVMIR:metadataandintrinsicsthatspecifyfaultinjection– FITLpass:LLVMpassthatlowersFITLLLVMIRtostandardLLVMIR
• AbstractionsthatfacilitatethetranslationofpragmastoFITL
• Fault-injectionstudies:– Includescomparisonwithsource-levelfaultinjector
Fault-Injec8on Pragmas for C
13
Fault-Injec8on Sites • FundamentalconceptatsourcelevelandLLVMIRlevel:
– Source:sitesaredefinedbydirectives– LLVMIR:siteisabasicunitofcodeinsertedtoperformfaultinjection
• Eachsiteisatriple:– Executionposition:positionincodewheresiteisinserted– Target:memoryorinstructionintowhichafaultmightbeinjected– Condition:whetherfaultisinjected
• Sitesaredefinedstatically,conditionisevaluateddynamically
14
Pragma: Pinject + Pdata • Precisespecificationofasinglesitetargetingmemory• Executionposition:exactlywhereftinjectappears:immediatelybeforeprintfcall
• Target:arr[i] • Condition:true
void print_array(double *arr, int n) { for (int i = 0; i < n; ++i) { #pragma openarc resilience { #pragma openarc ftinject ftdata(arr[i:1]) printf(“arr[%d] = %e\n”, i, arr[i]); } } }
15
Pragma: Pregion + Pdata • Randomselectionfromregionofsitesismorerealistic
• Executionpositions:immediatelybeforeeveryinstruction
• Target:arr[i] • Condition:onesiteisrandomlyselected
void print_array(double *arr, int n) { for (int i = 0; i < n; ++i) { #pragma openarc resilience { #pragma openarc ftregion ftdata(arr[i:1]) printf(“arr[%d] = %e\n”, i, arr[i]); } } }
16
Pragma: Pregion + Pkind • Targetscomputationalunitsinsteadofmemory
• Executionpositions:immediatelybeforeeveryinstructiontakingafloatingpointargument
• Targets:thefloatingpointerarguments
• Condition:onesiteisrandomlyselected void print_array(double *arr, int n) { for (int i = 0; i < n; ++i) { #pragma openarc resilience { #pragma openarc ftregion ftkind(floating_arg) printf(“arr[%d] = %e\n”, i, arr[i]); } } }
FITL Extensions to LLVM
18
Basic Block Set • EachpragmaandanyattachedCblockgeneratesasetofLLVMIRinstructions
• CompilerfrontendmustcommunicatetoFITLpasswhichinstructionsbelongtowhichpragma
• Ourchoiceofgranularityisthebasicblock,soeachpragmahasabasicblockset
• Compilerfrontendmightneedtosplitbasicblocks
19
Entry Intrinsic • AbasicblocksetScanhaveanentryintrinsic:
– CalledsoonaftercontrolflowentersS– ConfiguresfunctionalitywithinS– Itsargsencodeclausesofassociatedpragma
Pragma Basic Block Set Kind Entry Intrinsic resilience llvm.fitl.domain llvm.fitl.startDomain
ftregion llvm.fitl.region llvm.fitl.startRegion
ftinject llvm.fitl.desert llvm.fitl.inject
20
Basic Block Set Example #pragma openarc ftregion ftkind(floating_arg) printf("arr[%d] = %e\n", i, arr[i]);
.fitl.region.entryFirstBB: call void @llvm.fitl.startRegion.i64(metadata !4, metadata !"floating_arg”, i8* null, i64 0, i64 0, i64 0) br label %.fitl.region.bodyFirstBB, !llvm.fitl.regionList !7 .fitl.region.bodyFirstBB: %.load3 = load double** %arr.stackedParam %.load4 = load i32* %i %.add = getelementptr double* %.load3, i32 %.load4 %.load5 = load i32* %i %.load6 = load double* %.add %.ret = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([14 x i8]* @.str, i32 0, i32 0), i32 %.load5, double %.load6) br label %.fitl.region.nextBB, !llvm.fitl.regionList !7
!4 = metadata !{metadata !"llvm.fitl.region", metadata !4} !7 = metadata !{metadata !"llvm.fitl.regionList", metadata !4}
basic block set list metadata node
basic block set identifier metadata node
21
FITL Pass • Insertscodeforfault-injectionsitesspecifiedbyFITLbasicblocksetsandentryintrinsics
• LowersFITLLLVMIRtostandardLLVMIR
• MustbeperformedbeforeotherLLVMpasses
• BasicBlockSet:– LLVMlibrarycomponentwedeveloped– Readsbasicblocksetmetadata– Findsentryintrinsiccalls– Generallyusefulfordirective-drivenprogramming
...
FITL LLVM IR
LLVM IR
Target Binary
OpenARC Other Compilers
FITL Pass
LLVM
C Source +
Fault-Injection Directives
Other Languages +
Fault-Injection Directives
Case Studies
23
Experimental Setup • Studiedresilienceoffourapplications:
– Twokernelbenchmarks:JACOBIandMATMUL– TwoRodiniabenchmarks:BFSandNW
• Eachtestflipsonerandomlyselectedbit,targetiseither:– Userdata– LLVMIRinstructionargumentofspecifictype– LLVMIRinstructionresultofspecifictype
• Eachtestrepeated100x• SequentialexecutiononsingleCPUonmachinethathas:
– Twoquad-coreIntelXeonE5520– 12GBRAM– ScientificLinuxVersion6.5
24
Results
JACOBI MATMUL
Rodinia BFS Rodinia NW
25
JACOBI
fault injection into floating point
26
JACOBI • Faultsinfloatmemory:
– Nocrashes:• Datadoesn’taffectcontrolflow• Datadoesn’tincludepointers
– FewSDCs:• Dataoverwritteneveryiteration• Roundofferror
• Faultsinfloatcomputation:– Nocrashes(samereasons)– FewSDCsforprofiled(P)– ManySDCsfornon-profiled(L)
27
Acknowledgements • ContributorsandSponsors
– FutureTechnologiesGroup:http://ft.ornl.gov
– USDepartmentofEnergyOfficeofScience
• DOEARESProject:http://ft.ornl.gov/research/ares