Automatic Synthesis of Safety-Related Software*

— Short Paper —

Johann Schumann
RIACS / NASA Ames

email:[email protected]

Abstract

For specific domains (e.g., data analysis, planning and scheduling, or state estimation), automated program synthesis systems have been developed which are capable of producing hundreds of lines of non-trivial code. However, the potential applicability of an automatic program synthesis system depends not only on the size and quality of the generated code, but also on its ability to be integrated into the overall software process. Therefore, the generation of executable code alone is not enough. In this paper, we describe three techniques which enhance the capabilities of a synthesis tool with respect to the generation of explanations, certificates, and simulation data. The synthesis system encodes enough domain knowledge that the appropriate information can be extracted directly during the synthesis process.

ExplainIt! is a component of the AMPHION/NAV system (synthesis of state estimation software) which automatically generates and displays explanations for each piece of the synthesized code, thus effectively achieving traceability between code and specification.

For safety-relevant applications, software must undergo a rigorous certification process in which it must be demonstrated that certain safety policies are not violated. Traditional formal verification approaches (e.g., with Hoare-style rules) are impractical, because they require large amounts of manual code annotations. In this paper, we discuss an extension of the AUTOBAYES system (synthesis of data analysis programs) for the automatic generation of code annotations which can be handled by a verification condition generator and an automated theorem prover. The speed of this approach compares favorably with that of commercial static analysis tools (e.g., PolySpace).

Finally, we discuss a module of AUTOBAYES which synthesizes code for the generation of artificial data for simulation, experimentation, and testing purposes.

Introduction

Over recent years, the size and complexity of software in safety-related areas have grown tremendously. A major reason for this is that functionality which has traditionally been realized in hardware is now implemented as a program on a general-purpose processor, thus reducing production costs and increasing functionality. Typical application areas range from avionics and process control (e.g., for chemical or nuclear plants) to the car industry. However, the production of reliable, high-quality code for safety-related applications is far from easy. In particular, modern, highly iterative software life cycles (e.g., spiral or use-case based processes) are major cost drivers, because substantial testing, documentation, and certification efforts are necessary for each iteration. For example, flight-critical software (e.g., for position estimation or control of an aircraft) requires rigorous certification by an independent certification authority (e.g., the FAA). This time-consuming, highly manual process, which is defined in standard documents (e.g., DO-178B), prescribes the testing, documentation, and engineering efforts required to guarantee traceability between the specification and the executable binary.

* This paper discusses work done in several synthesis projects at the Automated Software Engineering group (Guillaume Brat, Bernd Fischer, Mike Lowry, John Penix, Tom Pressburger, Phil Oh, Grigore Rosu, Mahadevan Subramaniam, Jeffrey van Baalen, Jonathan Whittle).

Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

An approach which could facilitate the production of such pieces of software is automated program synthesis. Given a high-level specification, an automated program synthesis tool generates executable code which implements the specification. Because rigorous formal logic underlies this approach, the synthesized code is often considered to be “correct-by-construction”.

Deduction-based program synthesis has been around for a long time, and several synthesis systems (e.g., Amphion (Stickel et al. 1994), KIDS (Smith 1990), or Planware (Burstein & Smith 1996)) have been developed over the years. In certain (albeit small) domains, such systems appear capable of producing reasonably good code. However, the usability of such systems in safety-related domains is still rather limited. In fact, they share many severe limitations with state-of-the-art code generators for traditional modeling systems (e.g., MatrixX (MatrixX 2001), ControlShell (ControlShell 2001)). As discussed above, producing a piece of code is not enough. Rather, a code-producing system needs to synthesize the following artifacts:

• well-documented, human-understandable code. Only if a piece of software can be easily understood can manual modifications be applied or can it be subjected to (successful) code reviews.

• traceability information between code and specification, such that all pieces of the code can be related to their origin in the specification.

• support for simulation, animation, and testing. A successful synthesis system needs to be able to produce artificial data which conform to the given specification. State-of-the-art modeling tools (e.g., Simulink/MatLab, ControlShell) are already quite advanced in that respect.

• support for certification (e.g., providing annotations or even proofs).

In this paper, we demonstrate that a program synthesis system encodes enough domain knowledge to support the requirements listed above. We will discuss three extensions to a program synthesis architecture which, in addition to producing executable code, generate detailed documentation/explanations, certificates for the synthesized code with respect to a given safety policy, and test/simulation data, respectively.

The work described in this paper is ongoing. Therefore, these extensions have not been developed for one single program synthesis system, but rather for two tools, namely AMPHION/NAV and AUTOBAYES. AMPHION/NAV (Whittle et al. 2001; Schumann & Robinson 2001) is a tool based on the Amphion system (Stickel et al. 1994) which is capable of automatically synthesizing C/C++ code for state estimation and navigation of aircraft or spacecraft. The domain of AUTOBAYES (Fischer & Schumann 2001; Fischer, Schumann, & Pressburger 2000) is data analysis, using the approach of Bayesian networks. This tool can be used for scientific data analysis (e.g., clustering or classification problems), but it can also synthesize code to model sensors and sensor failures. Both systems are aimed at applications where safety is important, for example, state estimation of Mars rovers or (on-board) scientific data analysis.

Architecture of an Extended Synthesis System

Figure 1 shows the system architecture of a modern, extended program synthesis system. Given a specification, the synthesis system produces executable code. For this core task, domain knowledge in the form of a domain theory is used to guide the synthesis process. The underlying principle of the synthesis engine is of no great importance for the discussion in this paper. For example, the AMPHION/NAV system is based upon deduction-based synthesis (using the first-order theorem prover SNARK), whereas AUTOBAYES uses schema-guided synthesis. However, both systems have in common that they rely on a substantial body of encoded domain knowledge. This domain knowledge, combined with information on how the program was assembled (e.g., a proof), can be used to extend the synthesis system to produce commented code, design documents, test data, and support for rigorous certification. These extensions are described in the following sections.

[Diagram: an input specification (e.g., x ~ N(mu,sigma), max pr(x | mu ...)) is fed to the Program Synthesis System, which uses the domain theory/knowledge to produce synthesized, commented code and a design document. Certification support passes the annotated code to a Certifier, which yields a product certificate (e.g., mem_safety: OK, op_safety: OK); a synthesized simulation program produces data.]

Figure 1: System Architecture for an Extended Program Synthesis System

Explaining Synthesized Code

In the AMPHION/NAV system, most axioms in the domain theory¹ are given as a set of first-order equations. These equations relate the various objects on different abstraction levels. Due to the synthesis process (deductive synthesis) and additional program transformation steps, it is nearly impossible to tell which parts of the synthesized code correspond to which part of the specification, or why the code is structured in a specific way. In a safety-related application environment, traceability between specification and code is of major importance. During manual development of such software, considerable effort is spent on writing detailed documentation on all aspects of the code.

Here, deductive program synthesis can help, because all information relating specification, code, and domain theory is available in the proof produced by the automated theorem prover. The proof, containing hundreds of inference steps, is converted in such a way that it relates the input specification to the final product (C/C++ code). The explanation can thus be seen as a description of the program design “from first principles”. AMPHION/NAV contains the subsystem “ExplainIt!”, which produces explanations for the synthesis task (for details see (Whittle et al. 2001; Schumann & Robinson 2001)). Each axiom of the domain theory is annotated with explanation templates, consisting of plain text and (logical) variables. Whenever an axiom is used in the proof, the variables in its templates are instantiated. In order to obtain the entire explanation, a set of explanation equalities (van Baalen et al. 1998) is generated, which is used to compose the corresponding explanation templates.
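The template mechanism described above can be illustrated with a minimal Python sketch. It assumes that every proof step records the axiom it used, the variable bindings found by the prover, and the program variable it contributed to; the axiom names, template texts, and the `ProofStep`/`explain` names are hypothetical and not taken from ExplainIt!. It only shows template instantiation and grouping per code variable; the composition via explanation equalities is not modeled.

```python
"""Hypothetical sketch of template-based explanation generation.

Assumption: each domain-theory axiom carries a text template whose
placeholders name logical variables; a proof step records which axiom was
used and the bindings (substitution) found by the prover.
"""
from dataclasses import dataclass
from string import Template


@dataclass
class ProofStep:
    axiom: str      # name of the axiom applied in this inference step
    bindings: dict  # logical variable -> instantiated term
    code_var: str   # program variable this step contributed to (hypothetical)


# Hypothetical explanation templates attached to domain-theory axioms.
TEMPLATES = {
    "frame-transform": Template(
        "$target is the vector $source transformed from frame $from_f to frame $to_f."),
    "measurement-model": Template(
        "$target relates the measurement $meas to the state estimate $state."),
}


def explain(steps):
    """Instantiate each used axiom's template and group the texts per code variable."""
    explanations = {}
    for step in steps:
        text = TEMPLATES[step.axiom].substitute(step.bindings)
        explanations.setdefault(step.code_var, []).append(text)
    return explanations
```

Keying the result by program variable is one simple way to support the statement-level linking described below; a real system would also need to track how templates nest when one derived object is explained in terms of others.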

Human readability and understandability of such an explanation are extremely important. However, the target audience is not a logically trained synthesis specialist, but a domain expert or engineer. This means that not only must all evidence of low-level deduction be hidden from the user, but the representation of data should also use the form and vocabulary of the domain. In the domain of AMPHION/NAV, the commonly used data structures are vectors and matrices (as opposed to lists and lists-of-lists in AMPHION/NAV's internal representation). Thus, the explanation of a matrix is best represented in tabular form, as shown in the screen dump in Figure 2. It shows a part of the explanation for a matrix (the “measurement matrix”) which relates the measurements to the current position estimate. Each cell of the table corresponds to a single entry in the matrix. This HTML document is produced from the internal XML representation generated by “ExplainIt!”; translation and formatting are done with XSLT. Hyperlinked HTML documents have the advantage that all statements of the synthesized code can be linked to their explanations. Thus, a simple click on a statement immediately brings up the related documentation. Using XML as a flexible internal document format also enables us to generate printed PDF documentation in a standardized form.

Figure 2: Screen dump of a part of the explanation document

¹ The domain theory for AMPHION/NAV is built on top of the domain theory of the AMPHION system (Stickel et al. 1994) on geometric relationships, coordinate systems, and celestial mechanics.
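The XML-to-HTML step for a matrix explanation can be sketched as follows. ExplainIt! performs this translation with XSLT; since Python's standard library has no XSLT processor, this sketch substitutes a small `xml.etree.ElementTree` transform. The XML element and attribute names (`matrix`, `cell`, `row`, `col`) and the cell texts are invented for illustration and are not ExplainIt!'s actual format.

```python
"""Hypothetical sketch: render a matrix explanation from XML into an HTML table.

A plain ElementTree transform stands in for the XSLT stylesheet; the XML
schema below is invented for illustration.
"""
import xml.etree.ElementTree as ET

EXPLANATION_XML = """
<matrix name="measurement matrix" rows="2" cols="2">
  <cell row="0" col="0">explanation of entry (0,0)</cell>
  <cell row="0" col="1">explanation of entry (0,1)</cell>
  <cell row="1" col="0">explanation of entry (1,0)</cell>
  <cell row="1" col="1">explanation of entry (1,1)</cell>
</matrix>
"""


def matrix_to_html(xml_text):
    """Return an HTML table with one <td> per matrix entry, laid out row by row."""
    matrix = ET.fromstring(xml_text.strip())
    rows, cols = int(matrix.get("rows")), int(matrix.get("cols"))
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for cell in matrix.iter("cell"):
        grid[int(cell.get("row"))][int(cell.get("col"))] = cell.text.strip()

    table = ET.Element("table", border="1")
    caption = ET.SubElement(table, "caption")
    caption.text = matrix.get("name")
    for row in grid:
        tr = ET.SubElement(table, "tr")
        for entry in row:
            td = ET.SubElement(tr, "td")
            td.text = entry
    return ET.tostring(table, encoding="unicode")
```

Keeping the explanation in XML and rendering late is what makes a second output format (such as the PDF documentation mentioned above) a matter of adding another transform rather than changing the explanation generator.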

Certifying Synthesized Code

Code certification is a lightweight approach (as opposed to, e.g., full functional verification) to demonstrate software quality on a formal level. Its basic idea is to produce formal proofs demonstrating that the code satisfies certain quality properties (e.g., memory or operator safety). These proofs can be seen as certificates for the produced code, which can be checked independently by a simple proof checker. Since code certification uses the same underlying technology as Hoare-style program verification, it also requires many detailed annotations (e.g., loop invariants) to make the proofs possible. However, manually adding these annotations to the code is an extremely time-consuming and error-prone task.
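To illustrate why loop invariants are needed even for a simple memory-safety property, here is a small hand-worked Hoare-style example. It is our own illustration, not taken from the AUTOBAYES experiments: the loop follows the FOR-loop pattern shown in Figure 1, and we assume arrays are indexed from 1 to asize(mu) and that the right-hand side e is itself safe. Without the invariant I, the safety obligation for the access mu[i] cannot be discharged.

```latex
% Hypothetical loop in the style of Figure 1: FOR i := 1 TO N DO mu[i] := e
\{\, N \ge 0 \wedge \mathit{asize}(\mathit{mu}) = N \,\}\;
  i := 1;\ \mathbf{while}\ i \le N\ \mathbf{do}\ \mathit{mu}[i] := e;\ i := i + 1\ \mathbf{od}

% Loop invariant that must be supplied as an annotation
I \;\equiv\; 1 \le i \le N + 1 \;\wedge\; \mathit{asize}(\mathit{mu}) = N

% Initialization: I holds after i := 1
N \ge 0 \wedge \mathit{asize}(\mathit{mu}) = N \;\Rightarrow\; I[i := 1]

% Memory safety of the access mu[i] inside the loop body
I \wedge i \le N \;\Rightarrow\; 1 \le i \le \mathit{asize}(\mathit{mu})

% Preservation: the body re-establishes I (writing mu[i] does not change asize(mu))
I \wedge i \le N \;\Rightarrow\; 1 \le i + 1 \le N + 1 \wedge \mathit{asize}(\mathit{mu}) = N
```

Each of the three implications is a proof obligation of the kind a verification condition generator hands to a theorem prover; only the invariant I has to be invented, and that is exactly the annotation that is expensive to write by hand.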

In a certification extension of AUTOBAYES, we address this problem (Whalen, Schumann, & Fischer 2002). AUTOBAYES contains sufficient high-level domain knowledge to generate the required detailed annotations. Because all constraints and information on design decisions are available at synthesis time, detailed and powerful local annotations can be generated easily by AUTOBAYES. A separate propagation algorithm distributes the annotations to all places in the code where they are valid. When annotations were generated by AUTOBAYES, the original 380 lines of commented code grew to more than 2100 lines of code with annotations. This is a clear indication that writing such annotations manually is infeasible.
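The idea of propagating a locally generated annotation can be sketched for the simplest case of straight-line code. This is a hypothetical simplification, not the AUTOBAYES propagation algorithm: we assume an annotation stays valid at every later statement until some statement assigns a variable that the annotation mentions, and we ignore loops and branches entirely. The `Stmt` and `propagate` names are invented for illustration.

```python
"""Hypothetical sketch of annotation propagation over straight-line code.

Assumption (not the AUTOBAYES algorithm): an annotation generated right after
statement `origin` remains valid before each later statement until a
statement assigns one of the variables it mentions. Loops and branches are
ignored.
"""
from dataclasses import dataclass, field


@dataclass
class Stmt:
    text: str
    writes: set = field(default_factory=set)         # variables assigned here
    annotations: list = field(default_factory=list)  # formulas that hold *before* this stmt


def propagate(stmts, origin, formula, mentioned):
    """Attach `formula` to each statement after `origin` while it remains valid."""
    mentioned = frozenset(mentioned)
    for stmt in stmts[origin + 1:]:
        stmt.annotations.append(formula)  # holds before this statement executes
        if stmt.writes & mentioned:
            break  # this statement may invalidate the formula; stop propagating
    return stmts
```

For example, an annotation `asize(mu) = N` generated after `N := asize(mu)` is copied forward past unrelated assignments and stops after the next statement that writes `N`. Even this crude rule shows why annotated code grows so much larger than the original: every annotation is replicated at each point where a prover may need it.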

From this annotated code, a general-purpose verification condition generator (in our case MOPS (Kaiser, Fischer, & Struckmann 2000)) produces a set of proof obligations in first-order logic. The obligations are then processed by the automated theorem prover E-SETHEO (CASC 2001).
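How a verification condition generator turns code into proof obligations can be sketched with the textbook weakest-precondition rule for assignments, wp(v := e, Q) = safe(e) ∧ Q[v := e]. This is a generic illustration, not MOPS: it handles only straight-line assignments, uses an ad-hoc tuple representation for expressions, and assumes 1-based arrays so that every read `mu[i]` contributes the memory-safety condition 1 <= i <= asize(mu).

```python
"""Hypothetical sketch of a tiny verification condition generator (VCG).

Textbook weakest-precondition VCG for straight-line assignments, not MOPS.
Expressions are ints, variable names (str), or tuples (op, *args).
Every array read ("read", a, i) adds 1 <= i <= asize(a) (1-based arrays).
"""


def subst(expr, var, repl):
    """Replace every occurrence of variable `var` in `expr` by `repl`."""
    if expr == var:
        return repl
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(subst(arg, var, repl) for arg in expr[1:])
    return expr


def safety(expr):
    """Collect memory-safety conditions for all array reads inside `expr`."""
    conds = []
    if isinstance(expr, tuple):
        for arg in expr[1:]:
            conds += safety(arg)
        if expr[0] == "read":
            _, arr, idx = expr
            conds.append(("and", ("<=", 1, idx), ("<=", idx, ("asize", arr))))
    return conds


def wp(stmts, post):
    """wp of ("assign", var, expr) statements: safe(e) and Q[var := e], applied backwards."""
    for _, var, expr in reversed(stmts):
        post = subst(post, var, expr)       # Q[var := e]
        for cond in safety(expr):           # safe(e) refers to the pre-state
            post = ("and", cond, post)
    return post


def vc(pre, stmts, post):
    """Verification condition pre => wp(stmts, post), to be sent to a prover."""
    return ("=>", pre, wp(stmts, post))


def show(expr):
    """Pretty-print an expression in infix form."""
    if not isinstance(expr, tuple):
        return str(expr)
    op, *args = expr
    if op == "read":
        return f"{show(args[0])}[{show(args[1])}]"
    if op == "asize":
        return f"asize({show(args[0])})"
    return "(" + f" {op} ".join(show(a) for a in args) + ")"
```

For the two statements `i := k + 1; x := mu[i]` with postcondition `true`, `show(wp(...))` yields `(((1 <= (k + 1)) and ((k + 1) <= asize(mu))) and true)`: the safety condition of the read has been pushed back through the assignment to `i`, which is the kind of first-order obligation a prover such as E-SETHEO then has to discharge.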

In (Whalen, Schumann, & Fischer 2002), we have demonstrated our approach by certifying operator safety and memory safety for a generated iterative data classification program ([...] lines of documented C++ code) without manual annotation of the code. For this example, a total of 69 proof tasks were generated. E-SETHEO could solve 65 of them automatically with a run-time limit of [...] seconds on a 1000 MHz SunBlade workstation. Most of the tasks could be solved in about one second, but several tasks took up to [...] seconds (average time: [...] seconds). The remaining four proof tasks currently require some manual preprocessing, which will be automated in future versions. A comparison with the state-of-the-art commercial static analysis tool PolySpace (PolySpace 2002) showed that our approach achieved better coverage with a substantially shorter run time.

Generation of Simulation and Test Data

Testing and simulation play a vital role in most software development processes. Whereas testing aims at showing that a piece of code works correctly, simulation is often used to demonstrate how the code works and to assess its quality and performance. Therefore, the availability of simulation and test data is of great importance. Setting up a simulation environment manually, however, is usually a very time-consuming and error-prone task. This is especially true when the requirements specifications are modified in rapid succession (e.g., in an iterative life cycle).

With program synthesis, the development of a simulation environment can be very straightforward: we synthesize a program from the given specification which generates test data. The advantages are obvious: we already have a specification, and most of the synthesizer's infrastructure (e.g., symbolic handling, code generation) can be used as is for this task. For AUTOBAYES, we have developed a tool component which can synthesize a program to generate randomized data according to the given specification. This data generator could be implemented in fewer than 200 lines of Prolog code on top of the AUTOBAYES system.
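The idea of synthesizing a data generator from the same specification can be sketched in a few lines. This is a hypothetical Python illustration, not the Prolog component of AUTOBAYES: the dictionary-based spec format, the `SAMPLERS` table, and the function names are all invented, and only independent Gaussian or uniform variables are supported. The synthesizer emits source text for a `generate(rng)` function and compiles it, mirroring the "synthesize a program, then run it" workflow described above.

```python
"""Hypothetical sketch: synthesize a test-data generator from a model spec.

The spec format is invented for illustration (AUTOBAYES has its own
specification language). Each entry maps a variable to a distribution;
the synthesizer emits Python source for a generator and compiles it.
"""
import random

# Hypothetical spec in the spirit of Figure 1: x ~ N(mu, sigma), 5 samples.
SPEC = {
    "n": 5,
    "params": {"mu": 1.0, "sigma": 0.5},
    "vars": {"x": ("gauss", "mu", "sigma")},
}

# Distribution name -> method of random.Random used in the generated code.
SAMPLERS = {"gauss": "rng.gauss", "uniform": "rng.uniform"}


def synthesize_generator(spec):
    """Return Python source text for a function generate(rng) -> dict of lists."""
    lines = ["def generate(rng):"]
    for name, value in spec["params"].items():
        lines.append(f"    {name} = {value!r}")
    lines.append("    data = {}")
    for var, (dist, *args) in spec["vars"].items():
        call = f"{SAMPLERS[dist]}({', '.join(args)})"
        lines.append(f"    data[{var!r}] = [{call} for _ in range({spec['n']})]")
    lines.append("    return data")
    return "\n".join(lines)


def compile_generator(source):
    """Compile the synthesized source and return its generate() function."""
    namespace = {}
    exec(source, namespace)
    return namespace["generate"]


if __name__ == "__main__":
    source = synthesize_generator(SPEC)
    print(source)
    generate = compile_generator(source)
    print(generate(random.Random(0)))
```

Passing an explicitly seeded `random.Random` instance keeps simulation runs reproducible, which is useful when a changed specification should be compared against earlier test data.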

Conclusions

In this paper, we have briefly described three extensions to bare-bones program synthesis technology which can increase the usability of a synthesis tool in safety-related application areas. In the AMPHION/NAV system, a detailed explanation is generated fully automatically and presented in a way suitable for the domain engineer. It fully hides the underlying logic and reasoning system used to synthesize the program. The proof is converted in such a way that it relates the input specification to the final product, thus opening up an entirely new level of traceability between specification and source code.

Explanation and documentation are only one aspect. Current practice in the certification of safety-critical code requires a huge testing effort and lengthy manual code reviews. Automatic certification of synthesized code has the potential to substantially facilitate and accelerate certification. In combination with techniques from proof-carrying code (Necula & Lee 1998), dynamic certification of field-loadable software can be addressed. Here again, we benefit from the fact that the synthesis system encodes enough domain knowledge for the required Hoare-style annotations to be generated automatically. Last, but not least, trying out synthesized code during simulation runs is an important feature for a practically usable system. The test data generator provides immediate feedback on the specification (does it make sense, or are there some obvious bugs?) and helps to navigate through the design space.

All these features are essential ingredients of a modern program synthesis system if it is to have a chance of being used in practice. Bare-bones synthesis power does not help here; it only leads to repeating the same mistakes that have been made with automated theorem provers, which are usually restricted “more by general usability than by raw deductive power”².

² M. Kaufmann in his invited talk during CADE 15, 1998.

References

Burstein, M. B., and Smith, D. 1996. ITAS: A Portable Interactive Transportation Scheduling Tool Using a Search Engine Generated from Formal Specifications. In Proceedings of the 3rd International Conference on AI Planning Systems (AIPS-96), 35–44. AAAI Press.

CASC-JC, 2001. The CASC-JC theorem proving competition. URL: http://www.cs.miams.edu/~tptp/CASC/JC.

ControlShell. 2001. RTI Real-Time Innovations. http://www.rti.com.

Fischer, B., and Schumann, J. 2001. AutoBayes: A system for generating data analysis programs from statistical models. Submitted for publication. Preprint available at http://ase.arc.nasa.gov/people/.. .fischer/papers.html.

Fischer, B.; Schumann, J.; and Pressburger, T. 2000. Generating data analysis programs from statistical models (position paper). In Taha, W., ed., Proc. Intl. Workshop Semantics, Applications, and Implementation of Program Generation, volume 1924 of Lect. Notes Comp. Sci., 212–229. Montreal, Canada: Springer.

Kaiser, T.; Fischer, B.; and Struckmann, W. 2000. Mops: Verifying Modula-2 programs specified in VDM-SL. In Proc. 4th Workshop Tools for System Design and Verification, 163–167.

MatrixX: AutoCode Product Overview. ISI. URL: http://www.isi.com.

Necula, G. C., and Lee, P. 1998. Efficient representation and validation of logical proofs. In Proceedings of the 13th Annual Symposium on Logic in Computer Science (LICS'98), 93–104. IEEE Computer Society Press.

PolySpace technologies. URL: http://www.polyspace.com.

Schumann, J., and Robinson, P. 2001. [] or success is not enough: Current technology and future directions in proof presentation. In Future Trends in Automated Deduction (during IJCAR 2001).

Smith, D. R. 1990. KIDS: A Semiautomatic Program Development System. IEEE Trans. on Software Engineering 16(9):1024–1043.

Stickel, M.; Waldinger, R.; Lowry, M.; Pressburger, T.; and Underwood, I. 1994. Deductive composition of astronomical software from subroutine libraries. In Bundy, A., ed., Proc. 12th International Conference Automated Deduction, volume 814 of Lecture Notes in Artificial Intelligence, 341–355. Springer.

van Baalen, J.; Robinson, P.; Lowry, M.; and Pressburger, T. 1998. Explaining synthesized software. In Thirteenth International Conference on Automated Software Engineering, 240–248. IEEE Computer Society Press.

Whalen, M.; Schumann, J.; and Fischer, B. 2002. Synthesizing certified code. In Proc. ICSE 2002. (submitted).

Whittle, J.; van Baalen, J.; Schumann, J.; Robinson, P.; Pressburger, T.; Penix, J.; Oh, P.; Lowry, M.; and Brat, G. 2001. Amphion/NAV: Deductive Synthesis of State Estimation (short paper). In Proceedings of the 16th Automated Software Engineering Conference 2001 (ASE 2001). IEEE.