A Survey on Theorem Provers in Formal Methods
M. Saqib Nawaz, Moin Malik, Yi Li, Meng Sun and M. Ikram Ullah Lali
Abstract—Mechanical reasoning is a key area of research that lies at the crossroads of mathematical logic and artificial intelligence. The main aim of developing mechanical reasoning systems (also known as theorem provers) was to enable mathematicians to prove theorems by computer programs. However, these tools have evolved over time and now play a vital role in the modeling of and reasoning about complex and large-scale systems, especially safety-critical systems. Technically, mathematical formalisms and automated reasoning-based approaches are employed in theorem provers to perform inferences and to generate proofs. In the literature, there is a shortage of comprehensive documents that provide proper guidance on the choice of theorem provers with respect to their designs, performance, logical frameworks, strengths, differences and application areas. In this work, more than 40 theorem provers are studied in detail and compared to present a comprehensive analysis and evaluation of these tools. Theorem provers are investigated based on various parameters, which include: implementation architecture, logic and calculus used, library support, level of automation, programming paradigm, programming language, differences and application areas.
Index Terms—Mathematical logics, Reasoning, Interactive theorem provers, Automated theorem provers, Proof automation, Survey.
1 INTRODUCTION
Recent developments in Information and Communication Technology (ICT) have made computing systems more and more complex. How much we can rely on these systems depends on their correctness. Bugs or loopholes in a system lead to severe risks that endanger human safety or cause financial loss. In recent times, the number of bugs has increased due to the complex designs of modern systems, driven by market pressure and user requirements. The effort and cost required to correct a bug increase as the gap widens between its introduction and its detection. Table 1, taken from [272], shows the relative costs to fix bugs that are introduced in the requirements phase. A bug introduced in a particular stage of system development is relatively cheap to fix if it is also detected in that stage. It becomes harder and more expensive to fix a bug that is introduced in one stage and detected in a later one. For safety-critical systems, the impact of bugs can be so large as to make a fix effectively mandatory.
TABLE 1: Cost to fix a bug introduced in the requirements phase
Bug found at stage | Relative cost to fix
Requirements | 1 (definition)
Architecture | 3
Design | 5-10
System Test | 10
Post-Release | 10-100
• M. Saqib Nawaz is with the School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China. E-mail: [email protected]
• Moin Malik is with the Department of Computer Science & IT, University of Sargodha, Pakistan. E-mail: [email protected]
• Yi Li and Meng Sun are with the School of Mathematical Sciences, Peking University, Beijing, China. E-mail: {liyi math, sunm}@pku.edu.cn
• M. Ikram Ullah Lali is with the Department of Computer Science, Faculty of Computing and IT, University of Gujrat, Pakistan. E-mail: [email protected]

Testing and verification techniques are used in the system test phase to check system correctness empirically. In testing, a system is tested against its software/hardware requirements [201]. Similarly, simulation provides virtual environments for real events. However, these techniques have some inherent limitations. A program can be tested only against the functional requirements of the system, which may not be refined and may contain ambiguities that lead to inadequate testing. Exhaustive testing of systems is not possible. Moreover, time and budget constraints may affect the testing process. Simulations are also based on assumptions and do not always cover all aspects of the system [77]. Furthermore, neither testing nor simulation can be used efficiently for analyzing continuous or hybrid systems. According to [82]: “Program testing is an effective way to find errors, but it does not guarantee the absence of errors”. Formal methods, on the other hand, verify system correctness formally. Formal methods are “mathematical-based techniques that are used in the modeling, analysis and verification of both the software and hardware systems” [275]. They allow models to be introduced early in the development life-cycle, against which the system can be proved by using appropriate mathematics. As the modeling and analysis are mathematical, 100% accuracy is guaranteed with respect to the formal model [123]. But why do we need formal methods in place of other well-known, widely accepted and easy-to-use techniques such as testing and simulation? To answer this question, we first provide a few examples where testing and simulation failed.
arXiv:1912.03028v1 [cs.SE] 6 Dec 2019

Air France Flight 447 crashed in June 2009, resulting in hundreds of casualties. The investigation found that the probe sensors were unable to measure the speed of the plane accurately, which led to the automatic disengagement of the autopilot. Similarly, in August 2005, Malaysia Airlines Flight 124 landed unexpectedly 18 minutes after taking off due to a fault in its air data inertial reference unit. Two accelerometers supplied data used to control the airspeed of the flight, but one of them failed, which resulted in a sudden rapid climb of almost 4000 feet above the expected altitude without any warning. The investigation revealed that, after the failure of the first accelerometer, the faulty data of the former were still used because of an input anomaly in the software. The fault, which had lain hidden for a decade, was not found in testing because the designers had never assumed that such an event might occur [58].

In June 2009, a Metro train in Washington crashed, killing nine people, including the train operator, and severely injuring about 80 others. The cause of this incident was a design anomaly in the safety signal system: the system signalled that the track ahead was clear while it was still occupied. Other examples are the failure of the London Ambulance Service’s computer-aided dispatch system [217], the Therac-25 [184], and anaesthetic equipment and a respiration monitoring device [189], which resulted in casualties and financial losses. All of the listed accidents might have been avoided if the designs of the systems had been analyzed mathematically.
In contrast to testing, formal methods permit exhaustive investigation and reveal bugs that are missed by testing methods. In such techniques, the actual requirements of the system are translated into formal specifications, which are mathematically verified and describe the actual behavior of the system in real scenarios. The two most popular formal verification methods are model checking and theorem proving. In model checking, a finite model of the system is developed first, whose state space is then explored by the model checker to examine whether a desired property is satisfied in the model or not [28]. Model checking is automatic, fast and effective, and it can be used to check a partial specification of the system. However, model checkers still face the state-space explosion problem [62]: the state space of a model grows exponentially with the number of variables and components used and with the number of distinct values assigned to these variables [204].
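The state-space exploration performed by an explicit-state model checker can be sketched as a breadth-first search that tests an invariant in every reachable state and returns a counterexample trace when the invariant fails. This is a minimal sketch, assuming a toy two-process mutual-exclusion model; the names `check_invariant`, `no_lock` and `with_lock` are hypothetical and do not belong to any particular model checker.

```python
from collections import deque

def check_invariant(initial_states, successors, invariant):
    """Explore every reachable state breadth-first and test `invariant` on each.

    Returns (True, None) if the invariant holds in all reachable states,
    otherwise (False, trace), where trace is a shortest path from an initial
    state to a violating state (a counterexample).
    """
    parent = {s: None for s in initial_states}
    queue = deque(initial_states)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:          # walk back to an initial state
                trace.append(state)
                state = parent[state]
            return False, trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:             # visit each state only once
                parent[nxt] = state
                queue.append(nxt)
    return True, None

STEP = {"idle": "wait", "wait": "crit", "crit": "idle"}

def no_lock(state):
    """Hypothetical protocol: each process moves on without checking any lock."""
    p0, p1 = state
    return [(STEP[p0], p1), (p0, STEP[p1])]

def with_lock(state):
    """Hypothetical protocol: a process enters 'crit' only if the lock is free."""
    procs, holder = list(state[:2]), state[2]
    moves = []
    for i in (0, 1):
        if procs[i] == "wait" and holder is not None:
            continue                          # blocked: the lock is held
        new = procs.copy()
        new[i] = STEP[procs[i]]
        if new[i] == "crit":
            new_holder = i                    # acquire the lock
        elif procs[i] == "crit":
            new_holder = None                 # release the lock
        else:
            new_holder = holder
        moves.append((new[0], new[1], new_holder))
    return moves

def mutex(state):
    """Safety property: the two processes are never both in 'crit'."""
    return not (state[0] == "crit" and state[1] == "crit")

ok, trace = check_invariant([("idle", "idle")], no_lock, mutex)
print(ok, trace)
print(check_invariant([("idle", "idle", None)], with_lock, mutex))
```

With n such processes the global state space already has up to 3**n combinations of local states, which is the exponential growth referred to as state-space explosion.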
Theorem proving, on the other hand, can be used to handle infinite systems. In theorem proving, systems are defined and specified by users in an appropriate mathematical logic. Important or critical properties of the system are then verified by theorem provers. A theorem prover checks whether a statement (goal) can be derived from a logical set of statements (axioms/hypotheses). It can model and verify any system that can be defined with the help of mathematical logic. It is akin to a Computer Algebra System (CAS) because both are used for symbolic computation. However, theorem provers have some advantages over CAS, such as flexibility in logical expressiveness, clearer expression and more rigor. Theorem provers can be further categorized into two main types: Automated Theorem Provers (ATPs) and Interactive Theorem Provers (ITPs). ATPs deal with the development of automated computer programs to prove goals [254]. In contrast, ITPs involve human interaction with the computer in the process of proof search and development; that is why ITPs are also known as proof assistants. Due to the practical limitations of pure automation, interactive proving is the suitable way to formalize “most non trivial theorems in mathematics or computer system correctness” [121]. Theorem provers have been used successfully in various domains such as biomedicine [232], game theory [159], machine learning [147], economics [160], computer science [97], artificial intelligence [268] and self-adaptive systems [271]. Note that the terms theorem provers and mechanical reasoning systems are used interchangeably in this paper.
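In the simplest possible setting, checking whether a goal can be derived from a set of axioms reduces to repeatedly applying modus ponens. The sketch below is a minimal illustration, assuming ground (variable-free) Horn-clause rules written as plain strings; the function name `derivable` and the rule encoding are hypothetical, and real provers work with far richer logics and search strategies.

```python
def derivable(facts, rules, goal):
    """Forward chaining over ground Horn clauses.

    facts: atoms assumed true (axioms/hypotheses).
    rules: (premises, conclusion) pairs, read as
           premise_1 and ... and premise_k -> conclusion.
    Returns True iff `goal` follows by repeated modus ponens.
    """
    known = set(facts)
    changed = True
    while changed:                      # stop when no rule adds a new atom
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return goal in known

axioms = {"man(socrates)"}
rules = [
    (("man(socrates)",), "mortal(socrates)"),
    (("mortal(socrates)", "man(socrates)"), "mortal_man(socrates)"),
]
print(derivable(axioms, rules, "mortal(socrates)"))   # True
print(derivable(axioms, rules, "god(socrates)"))      # False
```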
We believe that a comprehensive review of mechanical reasoning systems is strongly needed. People with little knowledge of them generally think that all systems based on mathematics are similar in nature. However, this is not the case. Each system has different functionality, and selecting the system to use for a formalization effort is not an easy task. Theorem provers are diverse in nature, and the main aim of this work is to demonstrate how different they are. Moreover, the goal is to provide proper guidance to new researchers in formal verification. To substantiate our work, a questionnaire was designed for the evaluation of theorem provers. The questionnaire was then filled in by developers and active researchers of theorem provers. Mechanical reasoning systems are investigated for the following parameters:
• Mathematical logic used in the system,
• Implementation language of the system,
• System type,
• Platform support in the system,
• System category, whether it belongs to ATP or ITP,
• Truth value of the system (binary/fuzzy),
• Calculus (deductive/inductive) of the system,
• Set (ZF/fuzzy) theoretic support in the system,
• Programming paradigm of the system,
• User Interface (UI) of the system,
• Scalability of the system,
• Distributed/multi-threaded,
• Integrated development environment (IDE) support,
• Library support in the system,
• Whether the system satisfies the de Bruijn criterion, and
• Whether the system satisfies the Poincaré principle of automation.
Primarily, this survey is a collection of tables and figures that illustrate various aspects of theorem provers. We received replies from experts/developers of 16 theorem provers. The characteristics of another 27 theorem provers were investigated through online databases and research articles. We also report the top scientists who have proved the most theorems in provers, and the top provers in which the most mathematical theorems have been proved so far. The ATPs that performed best at the CADE ATP System Competition (CASC) are also discussed. CASC is a yearly competition for fully automated ATPs for first-order logic. Moreover, the aforementioned parameters are used to compare the provers and to present their main characteristics and differences. Finally, their applications are discussed along with existing recent work and the potential applications/problems where they can be used.
The rest of the survey is structured as follows. An overview of the historical background of theorem provers in the light of logical frameworks is provided in Section 2, where related work is also discussed. Section 3 elaborates the research methodology, which is based on the systematic literature review in software engineering. Section 4 presents the results of the questionnaire, where the answers of the experts and developers are presented. In Section 5, the top scientists who proved the most theorems and the top provers in which the most theorems have been proved so far are listed, together with the ATPs that performed best at CASC. Finally, theorem provers are compared for the aforementioned criteria. The strengths of the existing theorem provers, an in-depth analysis of their differences and their application areas are discussed in Section 6, along with future research directions. The survey concludes with some remarks in Section 7.
2 BACKGROUND
In this section, the historical background of the logical frameworks used in mechanical reasoning systems is presented. Furthermore, related work on surveys and previous comparisons of theorem provers is discussed.
2.1 Mathematical Logic
Treating mathematics as a formal theory in which all mathematical statements are proved from a small set of basic axioms and inference rules is a long-standing goal. However, formal proofs of theorems require many steps, much effort and a lot of time. Many ancient Greek logicians, mathematicians and philosophers successfully expressed reasoning through syllogism [124]. Syllogism deals with the formalization of deductive reasoning on logical arguments to arrive at a conclusion based on two or more propositions. Leibniz worked on ways to reduce human reasoning to calculations in symbolic logic and embodied such deductive reasoning in mathematics, making it easier to implement in computer programs. Mathematical reasoning provides objectivity and creativity that are hard to find in any other field. In mechanical reasoning, two paths were presented. One was to analyze the human proof creation process and implement it using computational resources. The other was to utilize the work of logicians and transform logical reasoning into a standard form on which algorithms can be based [97]. In the mid-1950s, the relationship of the computer to mathematics emerged in the form of automated reasoning, especially the automation of mathematical proofs [188]. More discussion on formal deduction, LCF (Logic of Computable Functions) and the role of modern type theory in the development of theorem provers can be found in [31], [191].
The emergence of automated reasoning initiated the earliest work on computer-assisted proofs, when the first general-purpose computers became available [121]. The field of computer-supported theorem proving gained attention in the second half of the 20th century. In the 1970s, theorem provers were investigated for the verification of computer systems. Extensive research in this area was done in the late 1980s, when these tools were used in the verification of hardware systems. In the mid-1990s, a floating-point division bug in Intel’s Pentium processor increased the interest in formal methods, and formal hardware verification tools were used by industry in system design in the late 1990s [176]. In 1994, Boyer proposed the QED manifesto [47] for a “computer-based database of all mathematical knowledge” (formalized mathematics). Over the years, the QED manifesto has been adopted by many theorem provers. In recent years, mechanical mathematical proofs for system verification have gained popularity [265]. The future of formal methods looks promising and practical. Big companies such as Google, Facebook, IBM, Intel and Microsoft are now using and conducting research in formal methods.
Theorem provers are fundamentally based on mathematical logics. Among the logics underlying the most popular theorem provers, we take three widely used ones: Propositional Logic, First-Order Logic (FOL) and Higher-Order Logic (HOL). Each logic is discussed next.
2.1.1 Propositional Logic
Propositional logic is used to represent atomic propositions or declarative sentences with the help of Boolean operators such as and, or, not, implication and equivalence. It is also known as the axiomatization of Boolean logic. Logical operators such as conjunction (∧), disjunction (∨), negation (¬), implication (⇒) and bi-implication (⇔) are used to combine propositions into sentences. Truth values (True or False) are assigned to these propositions to evaluate a sentence. Axiomatization of Boolean algebra is also performed through propositional logic.
A strong argument for propositional logic is that it is decidable with the help of truth tables.
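Decidability via truth tables can be made concrete: a formula over n propositional variables has 2^n rows, so checking every row always terminates. The sketch below is a minimal illustration, assuming formulas are represented as Python functions of a variable assignment; the helper names are hypothetical.

```python
from itertools import product

def truth_table(formula, variables):
    """Evaluate `formula` under all 2**n assignments to `variables`."""
    rows = []
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        rows.append((env, formula(env)))
    return rows

def is_tautology(formula, variables):
    return all(result for _, result in truth_table(formula, variables))

def is_satisfiable(formula, variables):
    return any(result for _, result in truth_table(formula, variables))

def implies(a, b):
    return (not a) or b

# Peirce's law ((p => q) => p) => p is a classical tautology.
peirce = lambda e: implies(implies(implies(e["p"], e["q"]), e["p"]), e["p"])
# p and not p has no satisfying assignment.
contradiction = lambda e: e["p"] and not e["p"]

print(is_tautology(peirce, ["p", "q"]))        # True
print(is_satisfiable(contradiction, ["p"]))    # False
```

The procedure is complete but exponential in the number of variables, which is why the search-based SAT procedures discussed next matter in practice.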
Newell, Shaw and Simon developed the first theorem proving program, Logic Theorist, in 1956 [245]. This program proved propositional logic theorems using axioms and inference rules. It worked not only on numeric expressions but also on symbolic formulas, and its proof search was guided by heuristics. It proved 38 out of 52 theorems, and some of the proofs it found were more elegant than the published ones. Another contribution to mechanical reasoning was Davis and Putnam’s semantics-based procedure [78]. It was a decision method for checking whether a formula in Conjunctive Normal Form (CNF) is satisfiable or not; such problems are nowadays called SAT (satisfiability) problems. They used “ground resolution” for proving mathematical formulas in predicate logic. Ground resolution takes two propositional clauses and generates a new propositional clause. For testing Boolean formulas, Davis and Putnam’s procedure applied a series of ground resolution steps to decide the satisfiability of these formulas [95]. Satisfiability of Boolean formulas was also tested with the Davis-Putnam-Logemann-Loveland (DPLL) method [79], a search algorithm based on backtracking. DPLL checks the satisfiability of propositional formulas in CNF and is an improved version of Davis and Putnam’s procedure: it uses backtracking search instead of ground resolution and is much faster. Its semantic search process builds a truth assignment step by step. The notion of NP-completeness [93] was introduced several years after DPLL, when Cook [65] showed that SAT is NP-complete. This motivated the development of SAT solvers for hard problems, which have improved over the last few years at both the algorithmic and the implementation level.
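The DPLL procedure described above combines unit propagation, the pure-literal rule and backtracking over a branching variable. The following is a minimal recursive sketch, assuming CNF clauses are given as lists of non-zero integers (a positive integer for a variable, its negation for the negated literal, as in the common DIMACS convention); it omits the clause learning and heuristics of modern solvers.

```python
def simplify(clauses, literal):
    """Make `literal` true: drop satisfied clauses, delete its negation elsewhere."""
    return [c - {-literal} for c in clauses if literal not in c]

def dpll(clauses, assignment=()):
    """Return a list of true literals satisfying every clause, or None if UNSAT."""
    clauses = [frozenset(c) for c in clauses]
    assignment = list(assignment)
    # Unit propagation: a one-literal clause forces that literal.
    while True:
        if any(len(c) == 0 for c in clauses):
            return None                      # empty clause: conflict
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        assignment.append(lit)
        clauses = simplify(clauses, lit)
    if not clauses:
        return assignment                    # every clause satisfied
    # Pure-literal rule: a literal whose negation never occurs can be made true.
    literals = {l for c in clauses for l in c}
    for lit in literals:
        if -lit not in literals:
            return dpll(simplify(clauses, lit), assignment + [lit])
    # Branch on a literal and backtrack if the first choice fails.
    lit = next(iter(clauses[0]))
    for choice in (lit, -lit):
        result = dpll(simplify(clauses, choice), assignment + [choice])
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
print(dpll([[1, 2], [-1, 2], [-2, 3]]) is not None)              # True (SAT)
print(dpll([[1, 2], [1, -2], [-1, 2], [-1, -2]]))                # None (UNSAT)
```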
2.1.2 First-Order Logic
Propositional logic has limited expressive power because it is based on propositions and cannot express complex behavior. Furthermore, it does not support the variables, functions, relations and quantification needed for complex problems. For example, “Socrates is a man” can be represented in propositional logic, but “all men are mortal” cannot, because it involves quantification. First-Order Logic (FOL) is the extension of propositional logic that allows quantifiers. Predicate logic is the general category to which FOL belongs. A predicate is a two-valued (Boolean) function that maps objects of a given domain to a truth value, and it is used to express a specific quality or property of, or relation between, variables. For example, let Q(n) be the predicate “n is an even integer”. The domain of discourse for this predicate is the set of all integers, and the truth of Q(n) depends on the value of n. Logical operators and quantifiers (universal (∀) and existential (∃)) are used in predicate logic to express problems. It is less challenging to build automated theorem provers based on FOL than on HOL. However, FOL is only semi-decidable: a valid formula (equivalently, a formula whose negation is unsatisfiable) will eventually be proved, but no procedure terminates on every non-valid formula. Skolem and Herbrand first designed a semi-decision procedure in 1920 [141]. Their method was based on unguided proof search and the enumeration of ground terms. This method was useless in practice, e.g., for proving non-trivial theorems. However, it played an important role in implementing theorem provers based on FOL.
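Over a finite domain, quantified statements can simply be evaluated by checking every element, which makes the meaning of ∀ and ∃ concrete. The sketch below assumes a finite range of integers as a stand-in for the (infinite) domain of discourse; over the real domain this evaluation would never terminate, which is why provers search for proofs instead. The helper names `forall` and `exists` are hypothetical.

```python
def Q(n):
    """The predicate from the text: Q(n) holds iff n is an even integer."""
    return n % 2 == 0

def forall(domain, pred):
    return all(pred(x) for x in domain)

def exists(domain, pred):
    return any(pred(x) for x in domain)

domain = range(-10, 11)          # finite stand-in for the integers (assumption)
print(exists(domain, Q))                             # True, e.g. Q(0)
print(forall(domain, Q))                             # False, Q(1) fails
print(forall(domain, lambda n: Q(n) or Q(n + 1)))    # True

# "All men are mortal": forall x. man(x) -> mortal(x), on a toy domain.
men = {"socrates", "plato"}
mortal = {"socrates", "plato", "fido"}
print(forall(men | mortal, lambda x: (x not in men) or (x in mortal)))  # True
```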
Prawitz was the first to develop a general mechanical reasoning system for FOL [39]. FOL provers such as Otter [197] and Setheo [183] are mostly based on resolution and tableaux methods, and they are used in solving puzzles and algebraic problems, in software retrieval and in the verification of protocols. Some theorem provers even support recursive functions, dependent and (co)inductive types, e.g. ACL2 [156], Metamath [198] and E [240]. Such systems are used in mathematics (number theory and set theory), compiler verification, hardware verification and commercial applications. However, automatic proving in practice requires exponential-time algorithms. Therefore, proofs in FOL are generally achieved by reducing the FOL formula to a propositional tautology or Boolean satisfiability problem. In this way, BDD-based tools and DPLL-based SAT solvers can be used to check the formulas automatically. Satisfiability Modulo Theories (SMT) deals with the satisfiability of formulas with respect to some background logical theory [34]. In recent years, SMT solvers have further extended the capabilities of SAT solvers.
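One way to turn a first-order problem into a Boolean satisfiability problem is Herbrand-style grounding: instantiate the universally quantified clauses with constants, treat each ground atom as a propositional variable, and check whether the axioms together with the negated goal are unsatisfiable. The sketch below assumes the hypothetical Socrates problem, a fixed list of constants and a brute-force SAT check; it is an illustration of the idea, not how any particular prover is implemented.

```python
from itertools import product

def ground_clauses(constants):
    """Clause form of a hypothetical problem, instantiated over `constants`.

    Axiom  forall x. man(x) -> mortal(x)  becomes  ~man(c) | mortal(c) per constant c.
    Axiom  man(socrates).  Negated goal  ~mortal(socrates).
    A literal is (predicate, constant, polarity).
    """
    clauses = [[("man", c, False), ("mortal", c, True)] for c in constants]
    clauses.append([("man", "socrates", True)])
    clauses.append([("mortal", "socrates", False)])
    return clauses

def satisfiable(clauses):
    """Brute-force SAT check: every ground atom becomes a Boolean variable."""
    atoms = sorted({(p, c) for clause in clauses for (p, c, _) in clause})
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(any(model[(p, c)] == pol for (p, c, pol) in clause) for clause in clauses):
            return True
    return False

# Unsatisfiable ground set, so mortal(socrates) follows from the axioms.
print(satisfiable(ground_clauses(["socrates"])))           # False
print(satisfiable(ground_clauses(["socrates", "plato"])))  # False
```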
2.1.3 Higher-Order Logic
FOL is more expressive than propositional logic but less expressive than Higher-Order Logic (HOL). FOL quantifies only over individual variables; predicates, propositions and functions cannot be quantified in FOL. For example, a quantifier quantifying over a proposition:
$$ \exists s\,\bigl(R(y) \rightarrow s\bigr) \tag{1} $$
Another example is a quantifier quantifying over a predicate:
$$ \forall S\,\bigl(Q(s) \land \lnot R(s)\bigr) \tag{2} $$
HOL extends FOL by supporting more kinds of quantification. HOL permits predicates to accept arguments that are themselves predicates, and it allows quantification over predicates and functions, which is not the case in FOL. Based on HOL, one can construct a proof environment that is logically sound, supports meta-reasoning, is interactive as well as automated, and is practically implementable. HOL is mostly used in ITPs. Methods used in HOL-based theorem provers include decision procedures, induction, tableaux, rewriting, interaction and many heuristics. They are used in formalizing mathematics and in the verification of programming languages, distributed systems, compilers, and software and hardware systems. Some well-known HOL-based theorem provers are HOL [212], Coq [38], PVS [237] and Agda [211].
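Quantification over predicates can be illustrated on a finite domain, where a predicate is just a function from the domain to truth values and there are exactly 2^|D| of them. The sketch below uses this to evaluate Leibniz equality, the higher-order definition x = y iff ∀P (P(x) ⇒ P(y)). It is a toy under the assumption of a three-element domain with hypothetical helper names; over infinite domains the predicates cannot be enumerated like this in general.

```python
from itertools import product

def all_predicates(domain):
    """Yield every predicate P: domain -> bool (2**len(domain) of them)."""
    domain = list(domain)
    for values in product([False, True], repeat=len(domain)):
        table = dict(zip(domain, values))
        yield lambda x, t=table: t[x]        # bind this table, not the last one

def leibniz_equal(x, y, domain):
    """Second-order equality: for every predicate P, P(x) implies P(y)."""
    return all((not P(x)) or P(y) for P in all_predicates(domain))

D = ["a", "b", "c"]
print(leibniz_equal("a", "a", D))           # True
print(leibniz_equal("a", "b", D))           # False: P = "is a" separates them
print(sum(1 for _ in all_predicates(D)))    # 8
```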
HOL is undecidable, so proving HOL properties cannot be fully automated and human assistance is required. It is more expressive and can be used to prove complex problems and theorems, but it is more challenging to build an ATP or ITP based on HOL. Generally, the proof process in ITPs is as follows. The user first states the property or feature in the form of a theorem, which is called a goal. The user then applies proof commands to solve it. Proof commands may decompose the goal into sub-goals. The proof process is complete once all the sub-goals are solved [185].
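The goal/sub-goal workflow can be mimicked with a toy tactic engine in which the proof state is a list of open goals, each paired with its hypotheses. The sketch below is illustrative only: the formula encoding and the three tactics (`intro`, `split`, `assumption`) are hypothetical simplifications named after common ITP tactics, not the command set of any particular prover.

```python
def apply_tactic(goals, tactic):
    """Apply `tactic` to the first open goal and return the new list of goals.

    Each goal is (hypotheses, formula); formulas are tuples such as
    ("atom", "p"), ("imp", A, B) and ("and", A, B).
    """
    (hyps, goal), rest = goals[0], goals[1:]
    if tactic == "intro" and goal[0] == "imp":
        _, a, b = goal
        return [(hyps | {a}, b)] + rest           # assume A, prove B
    if tactic == "split" and goal[0] == "and":
        _, a, b = goal
        return [(hyps, a), (hyps, b)] + rest      # two sub-goals
    if tactic == "assumption" and goal in hyps:
        return rest                               # goal closed
    raise ValueError(f"tactic {tactic!r} does not apply to goal {goal!r}")

P, Q = ("atom", "p"), ("atom", "q")
theorem = ("imp", P, ("imp", Q, ("and", P, Q)))   # p -> q -> p and q
goals = [(frozenset(), theorem)]
for tactic in ["intro", "intro", "split", "assumption", "assumption"]:
    goals = apply_tactic(goals, tactic)
    print(tactic, "->", len(goals), "open goal(s)")
print("proved" if not goals else "unfinished")
```

A tactic that does not fit the current goal is rejected, which mirrors how an ITP refuses an inapplicable proof command instead of accepting an unsound step.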
2.2 Related Work
Our survey on mechanical reasoning systems is certainly not the first one. Some surveys, detailed discussions and comparisons of theorem provers have been published in the past [23], [113], [117], [118], [121], [191], [275]. A detailed description of proof assistants and their theoretical foundations is given in [31], along with a comparison of nine theorem provers. However, their comparison results take only one and a half pages. Seventeen theorem provers are compared in [274] for three parameters: (i) the library size of each prover, (ii) the strength of the logic used in the prover, and (iii) the level of automation in each prover. Similarly, [46] surveyed Coq, PVS, Mizar, ProofPower-HOL, HOL4, Isabelle/HOL and HOL Light for formally analyzing real-time systems. They also investigated the extended standard libraries that play a main role in proof automation: C-CoRN/MathClasses for Coq, ACL2(r) for ACL2 and the NASA PVS library. The application of theorem provers in economics is discussed in [160], with a focus on two domains: social choice theory and auction theory.
Other comparisons between provers can be found in the literature. However, most of those works compare only two systems. Such works include comparisons between HOL and PVS [112], HOL and Isabelle [9], NuPRL and Nqthm [35], Coq and HOL [139], HOL and ALF [8], and Isabelle/HOL and Coq [277]. Some works have also studied how to adapt proofs to different systems [108], [202], [213]. In [99], the Reentrant Readers-Writers problem is first modeled in the UPPAAL model checker, where a possible deadlock scenario was found. The authors further converted the UPPAAL model to PVS and checked the PVS model for an arbitrary number of processes. Moreover, the SPIN model checker is used in [100] for the modeling and analysis of the Reentrant Readers-Writers problem. The Promela model is converted to a PVS specification, and the correctness of the model is then verified.
3 RESEARCH METHODOLOGY
The core of a Systematic Literature Review (SLR) is its set of research questions. The right, meaningful questions must be asked during the review process, and the scope of the research must be pinpointed. Following the guidelines of [162], we have structured the research questions with the support of the PIOC (Population, Intervention, Outcome, and Context) standard for applying the SLR process in software engineering. The PIOC for this work is presented in Table 2.
TABLE 2: PIOC for this work
Population | Mechanical reasoning systems
Intervention | Theorem proving approach
Outcome | Comprehensive document, evaluation and comparison
Context | Developers and experts from industry and researchers from academia
Efforts were made to collect evidence on recent research in the development of mechanical reasoning tools. For this purpose, we designed the research questions presented in Table 3. We sent our questionnaire to a number of theorem prover developers, experts and forums. The names and answers of the domain experts who responded are presented in Section 4.
3.1 Keyword Retrieval for Search Strings
Keywords (question elements) were extracted from the relevant research articles that we studied. These keywords are used to retrieve more information related to the development of mechanical reasoning tools from electronic databases, and they are presented in Table 4. Alternative words for the keywords were found using synonyms and a thesaurus, and these alternatives are also used in the search process. Keywords are linked together with Boolean OR, and search strings are constructed by linking the four OR lists with Boolean AND.
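The construction of the search strings (synonyms joined by OR, the four OR lists joined by AND) can be sketched as follows. The four keyword groups used here are hypothetical examples drawn loosely from Table 4, not the exact lists used in this review, and the quoting convention for multi-word phrases is an assumption that varies between digital libraries.

```python
def or_group(terms):
    """Join alternative keywords with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def search_string(groups):
    """Link the OR lists together with AND."""
    return " AND ".join(or_group(g) for g in groups)

# Hypothetical keyword groups, one per question element.
groups = [
    ["theorem prover", "proof assistant"],
    ["interactive theorem proving", "automated theorem proving"],
    ["formal verification", "formal methods"],
    ["survey", "comparison"],
]
print(search_string(groups))
```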
Online databases, journals and conferences related to mechanical reasoning tools are used for comparison, analysis and evaluation. Seven electronic sources relevant to software engineering are identified in [48]. However, in the last few years, many new and well-known libraries have been developed, especially in computer science, so it may also be necessary to consider other sources. The search strings were used on 10 digital libraries: (i) DBLP, (ii) IEEE Xplore, (iii) ACM Digital Library, (iv) Springer Link, (v) Science Direct, (vi) CiteSeerX, (vii) Scopus, (viii) Inspec, (ix) EI Compendex, and (x) Web of Science.

TABLE 3: Research questions from our questionnaire
Research questions about system general information
• What are the names of the people who contributed to the system?
• When (date, year) did the system first appear on the market?
• What is the latest version of the system?
• When was the system last updated?
• What is the address of the web page for accessing it online?
• What are the unique features of the system?
• What are the success stories of the system?
Research questions about system category information
• What is the type of the theorem prover (ATP/ITP)?
• What is the type of the system w.r.t. reasoning (mathematical)?
• What is the type of the system w.r.t. logic (FOL or HOL)?
• What is the type of the system w.r.t. truth values (binary/fuzzy)?
• What is the type of the system w.r.t. calculus (deductive/inductive)?
• Is set-theoretic support (ZF/fuzzy) available in the system?
Research questions on system programming framework
• What is the paradigm (functional/imperative/other) of the system?
• What is the programming language (C/C++/Java/Other) of the system?
• What is the user interface (GUI/CLI) of the system?
• What is the scalability (distributed/multi-threaded) of the system?
• Is library support available in the system?

TABLE 4: Keywords from mechanical reasoning surveys
Survey | Keywords
Armstrong et al. [19] | Formal tools, software and hardware correctness, provably correct design, theorem provers, Satisfiability Modulo Theory, abstract models, Event-B, digital systems, formal methods, model correctness.
Mackenzie [188] | Mathematical proofs, interactive theorem proving, automatic theorem proving, mathematical logic, mathematical reasoning, formal logic, proof automation, classical and constructive logic, machine intelligence.
Azurat & Prasetya [25] | Theorem prover, proof checker, formal verification, programming logic, formal representation, HOL, PVS, automatic proof generation, Coq.
Harrison [117] | Logic and program meaning, mathematical logic, symbolic manipulation, algebraic interpretation, formal languages, software engineering.
Harrison et al. [121] | Formal proof, interactive theorem provers, proof goals, semi-automated mathematics, Automath, Coq, NuPRL, Agda, Logic of Computable Functions, HOL, PVS, proof language, proof automation.
Boldo et al. [46] | Proof assistant, formalization, proof libraries, interactive theorem provers, PVS, Coq, HOL4, Isabelle/HOL, ProofPower-HOL, HOL Light, proof automation.
Wiedijk [274] | Proof assistants, proof kernel, logical framework, decidable types, dependent types, de Bruijn criterion, Isabelle, Theorema, HOL, Coq, Metamath, PVS, Nuprl, Otter, Alfa, Mizar, ACL2.
Maric [191] | Decision procedures, proof search, theorem provers, software correctness, interactive theorem provers survey, formal deduction, proof checking, logical frameworks, SAT solvers, SMT solvers, Poincaré principle.
Hales [113] | Computer proofs, proof assistant, small proof kernel, logical framework, proof tactics, first-order automated reasoning, mathematical proof, theorem provers.
Avigad & Harrison [23] | Axiomatic set theory, mathematical proof, calculus of reasoning, formalized mathematics, formal verification, interactive theorem proving, Poincaré conjecture, formal proof systems.
Barendregt & Geuvers [31] | Proof checking, mathematical logic, type theory and type checking, type systems, predicate logic, higher-order logic proof development, proof assistants, Coq, Agda, NuPRL, HOL, Isabelle, Mizar, PVS, ACL2.
4 PROVERS AND THEIR CHARACTERISTICS
In this section, the answers of the developers and experts who responded to our questionnaire are presented. We present the answers in the order in which we received them. In this way, we wish to express our gratitude to our respondents.
Matt Kaufmann (ACL2): Matt Kaufmann is a senior research scientist at the Department of Computer Science, University of Texas at Austin. Matt provided information about “ACL2” [156]. The main authors are M. Kaufmann and J. Moore; however, several others have also made significant contributions. The first public release of ACL2 was version 1.9 in 1994. The latest version is 8.2, last updated in May 2019. Basically, it is a monolithic system, but the applicative style of programming often makes it straightforward to use pieces of the system. It is generally classified as an ITP; however, it takes automation seriously, and in that sense it shares characteristics with ATPs. Its logic is FOL with induction, and it is written mostly in the ACL2 language, an applicative language extending a non-trivial subset of Common Lisp. Its UI is typically Emacs-based. ACL2 is a cross-platform tool: it runs on Linux, MacOS and Windows. It also runs on top of 6 different Common Lisp implementations. The input format of ACL2 is s-expressions, though output can be produced programmatically. Web address: cs.utexas.edu/users/moore/acl2. ACL2 has been scaled to large applications, recently at Centaur and Oracle. Its users seem pretty happy with its readability, but others might be put off by the s-expression format. Interoperability between different provers is limited, though there has been work [108], [109] that connects ACL2 and HOL4. ACL2 is first-order, but it is still quite expressive because of its support for recursive definitions. Several capabilities allow it to do some things that might be considered higher-order in nature: macros, a proof technique called functional instantiation, and oracle-apply. ACL2(p) [227] supports parallelism for execution and proof, and other infrastructure supports parallelism at the level of collections of files. Run-time assertions are supported, and many debugging tools are available for program execution and proof.
ACL2 can often emulate other logics by formalizing their proof theories. It is extensible or programmable by the user via rule classes, and more directly via meta rules and clause-processors. ACL2 may be the only ITP that presents a single logic for both its programming language (providing efficient execution) and its theorem prover (including definitions and theorems to prove). There is a large library of “Community Books”, developed by users over many years and in daily use. There are users in academia, government and industry.
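ACL2 input is written as s-expressions. The minimal Python sketch below shows how such input breaks down into nested lists. It assumes only parentheses and whitespace-separated atoms, with no strings, comments or quote syntax. It is not ACL2's reader, and the `app` definition is hypothetical sample input.

```python
import re


def tokenize(text: str) -> list:
    """Split s-expression text into '(' and ')' tokens and atoms."""
    return re.findall(r"[()]|[^\s()]+", text)


def parse(tokens: list):
    """Consume one s-expression from the front of tokens; lists become Python lists."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens and tokens[0] != ")":
            form.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing ')'
        return form
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token  # an atom (symbol or numeral), kept as a string


def read_sexpr(text: str):
    """Parse exactly one s-expression from text."""
    tokens = tokenize(text)
    form = parse(tokens)
    if tokens:
        raise SyntaxError("trailing input after the first form")
    return form


# Hypothetical ACL2-style definition, used only as sample input.
definition = "(defun app (x y) (if (consp x) (cons (car x) (app (cdr x) y)) y))"
form = read_sexpr(definition)  # form[2] is the parameter list ['x', 'y']
```

Because code and data share the same nested-list shape, programs can build and inspect definitions directly. This is one reason the s-expression format suits an applicative system.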
Stephen Schulz (E): Stephen Schulz is the next person who responded to our questionnaire. Stephen designed and developed the “E” theorem prover [240]. The first public release of E was version 0.2 in 1998. The latest version is 2.4, last updated in October 2019. E was originally developed at TU Munich, but it is now maintained and extended at DHBW Stuttgart, Germany. It is open-source/free software licensed under the GNU GPL Version 2. E is an ATP for full FOL with equality: first-order problems are reduced to clause normal form, and a saturation procedure based on the equational superposition calculus is applied. The main user community of the system is mathematicians. The web address is www.eprover.org. E has won several CADE ATP System Competitions and ranks well. With respect to reasoning, E is mathematical; with respect to logic, it is classical FOL; and with respect to truth value, it is binary. The calculus used in E is deductive. Set-theoretic support is available, but only on the logical level via an axiomatization of ZF. The programming paradigm is imperative: E is developed purely in C and provides a CLI (command-line interface). E is officially distributed as source files and supports Linux, Mac OS, FreeBSD, Solaris, and Windows with Cygwin. It is not a multi-threaded system, it has a mixed (modular + monolithic) architecture, and it has its own library. Proofs can be generated in TPTP-3, PCL2 and Graphviz formats. Some input code is generated automatically from test data. The system has no dedicated proof kernel, but it has explicit proof objects. Any standard text editor can be used for input files. E has been combined with other systems (Waldmeister, LEO-II, Vampire, Z3, etc.) in Isabelle's Sledgehammer tool [223] to increase the level of proof automation.
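Saturation provers such as E are typically organized around a given-clause loop. The loop repeatedly selects one clause and derives all inferences between it and the clauses already processed. It stops when the empty clause (a contradiction) appears or when no new clauses arise. The minimal Python sketch below shows only this control structure. As a simplifying assumption, it uses propositional binary resolution instead of E's equational superposition calculus. It has no term orderings, simplification or heuristics. Clauses are sets of non-zero integers, where -n denotes the negation of atom n (an assumed DIMACS-like encoding).

```python
def resolvents(c1: frozenset, c2: frozenset) -> set:
    """All non-tautological binary resolvents of two propositional clauses."""
    out = set()
    for lit in c1:
        if -lit in c2:
            resolvent = (c1 - {lit}) | (c2 - {-lit})
            # Skip tautologies such as {p, -p}: they never contribute to a refutation.
            if not any(-x in resolvent for x in resolvent):
                out.add(resolvent)
    return out


def saturate(clauses) -> bool:
    """Given-clause loop; True if the empty clause (a contradiction) is derived."""
    unprocessed = [frozenset(c) for c in clauses]
    processed = set()
    while unprocessed:
        unprocessed.sort(key=len)      # naive heuristic: pick the shortest clause
        given = unprocessed.pop(0)
        if not given:
            return True                # empty clause: the input is unsatisfiable
        if given in processed:
            continue
        processed.add(given)
        # Generate all inferences between the given clause and processed clauses.
        for other in processed:
            for new in resolvents(given, other):
                if new not in processed and new not in unprocessed:
                    unprocessed.append(new)
    return False                       # saturated without a contradiction
```

For example, `saturate([{1}, {-1, 2}, {-2}])` refutes p, p → q, ¬q, while `saturate([{1, 2}, {-1}])` saturates without finding a contradiction.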
Makarius Wenzel (Isabelle): Makarius Wenzel provided information about “Isabelle” [222], originally published by L. Paulson (Cambridge, UK). Many people have contributed to Isabelle over the last 30 years. It was released in 1986, and its pure logical framework first appeared in 1989. The latest version of the system was released in June 2019. Isabelle grew out of university research projects, but it is of industrial quality, or even beyond that, because it is not subject to the constraints imposed by a market economy. The full distribution uses add-on tools with various standard open-source licenses: LGPL, GPL, etc. The web address is isabelle.in.tum.de. Isabelle's unique feature is a huge integrated environment for interactive and automated theorem proving. It is like a word processor for formal logic, with specifications and proofs. The main user community consists of people interested in formal logic and formalized mathematics, and people doing proofs about software and hardware. Isabelle is in fact a multiplicity of ITP and ATP systems. With respect to reasoning, the system is mathematical. With respect to logic, it is mostly HOL, but users can also do something else if they really want to. With respect to truth value, it is mostly classical logic/Boolean. The programming paradigm is purely functional (ML and Scala), and the UI is a full-scale IDE. It supports multiple operating systems, such as Linux, Windows and Mac, and is available for both 32-bit and 64-bit architectures. For scalability, it supports classic shared-memory workstations with many cores. Isabelle is highly modular, to the extent that it is hard to tell where it starts and ends and what its true structure actually is.
It provides a code generation facility for SML, OCaml, Haskell and Scala. Isabelle has a small proof kernel, following the classic “LCF approach”, but with many add-ons and reforms over the decades. It is based on λ-calculus with simple types and natural deduction. Moreover, it supports inductive recursion and has very powerful derived principles for inductive sets, predicates, and primitive and general recursive functions.
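In the LCF approach, theorems belong to an abstract type whose values can be created only by a few trusted inference rules. Every tactic or automated tool must therefore ultimately go through the kernel. The Python sketch below imitates this idea for a toy implicational logic. The names `Thm`, `assume`, `discharge`, `modus_ponens` and `identity` are hypothetical and do not reflect Isabelle's ML interface. Unlike ML's type abstraction, Python can only guard construction with a private token.

```python
from dataclasses import dataclass

# Toy implicational logic: an atom is a str, and A -> B is the tuple ("->", A, B).
_KERNEL_TOKEN = object()  # private capability held only by the kernel rules


@dataclass(frozen=True)
class Thm:
    """A sequent 'hyps |- concl'. Only the kernel rules below may create one."""
    hyps: frozenset
    concl: object
    token: object = None

    def __post_init__(self):
        if self.token is not _KERNEL_TOKEN:
            raise PermissionError("theorems can only be built by kernel rules")


def assume(a) -> Thm:
    """Kernel rule ASSUME:  {a} |- a."""
    return Thm(frozenset({a}), a, _KERNEL_TOKEN)


def discharge(a, th: Thm) -> Thm:
    """Kernel rule DISCH:  from  G |- b  infer  G - {a} |- a -> b."""
    return Thm(th.hyps - {a}, ("->", a, th.concl), _KERNEL_TOKEN)


def modus_ponens(th_ab: Thm, th_a: Thm) -> Thm:
    """Kernel rule MP:  from  G |- a -> b  and  D |- a  infer  G u D |- b."""
    c = th_ab.concl
    if not (isinstance(c, tuple) and c[0] == "->" and c[1] == th_a.concl):
        raise ValueError("modus ponens does not apply")
    return Thm(th_ab.hyps | th_a.hyps, c[2], _KERNEL_TOKEN)


# A derived rule outside the kernel: it can only combine kernel rules,
# so any theorem it returns is sound provided the three rules above are.
def identity(a) -> Thm:
    return discharge(a, assume(a))
```

Calling `identity("p")` yields a theorem with no hypotheses and conclusion `("->", "p", "p")`, while `Thm(frozenset(), "q")` raises `PermissionError`.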
Thierry Lecomte (Atelier B): Thierry Lecomte works as a director at ClearSy. “Atelier B” was developed under his supervision. Atelier B implements the B method [4] and offers a theorem prover. It was first released in 1994. The latest version is 4.5.1, last updated in May 2018. Atelier B is an interactive rule-based theorem prover, combined with an interactive, dedicated tableau method. The logic of the system is classical FOL and the truth value is traditional Boolean. The calculus type is deductive, and it supports ZF set theory. The programming paradigm is imperative and the programming language is similar to Prolog. The web address is clearsy.com/en/our-tools/atelier-b. The UI of Atelier B is graphical, and it operates on various operating systems. It also provides support for Linux-based clusters. Its architecture is monolithic, and inheritance and library support are not available. Atelier B (a CASE tool) provides C and Ada code generation. It has no small proof kernel but has proof objects for more than 130 axioms. Mathematical rules (transformation, rewriting, hypothesis generation) can be added by users, but the prover is not programmable by users. Its syntax is inspired by Haskell, ML, Java, C, C++ and Prolog. Infix/postfix/mixfix operators are supported, as are Unicode, binary and ASCII coding schemes. Native support for the B language is the unique feature of the system. A rich tactic language is available, so proofs need not be written by hand. The system does not support inductive recursion.
David Crocker (Escher Verifier): David Crocker serves on the MISRA C++ working group. He developed the “Escher” verifier [53]. Its latest version is 6.10.02, last updated in 2015. It is an industrial product with a commercial license. The web address is eschertech.com/products/ecv.php. One unique feature of Escher Verifier is that, when a proof is not found, it sometimes suggests missing preconditions, assertions, invariants, etc. in the model or software being verified. The main user community is the defense industry. It falls in the ATP category. The reasoning type of the system is mathematical, and the logical type is a combination of FOL, SOL and some HOL. With respect to truth value, it is mostly binary, but triadic where necessary. The calculus of the system is deductive. The programming paradigm is mostly functional, but imperative in speed-critical parts. The programming language is C++. There is no direct interface to the theorem prover; however, a GUI is available for the verification tools that use it. It runs on Windows and Linux. System scalability is limited to a single thread, and the architecture is monolithic and standalone. It has neither a small proof kernel nor editor support. It is not extensible or programmable by its users. Moreover, it does not support constructive logic.
Norman Megill (Metamath): Norman Megill is the next respondent to the questionnaire. He is the originator of “Metamath” [198]. There are 34 other contributors who helped to extend the system. It was first introduced in 1993. The latest version is 0.131, last updated in June 2016. It is an independent development by Norman. The web page is us.metamath.org. The license type of the system is GPL. A user-supplied FOL scheme is the unique feature of the system. It is an ITP and is also used as a proof checker. The reasoning type of the system is mathematical, and the logical type is FOL; HOL is also possible but has not been developed yet. The truth value of the system is binary, and deductive calculus is used. There are 12 independent verifiers available in C, Java, C#, Lua, Mathematica, Julia, Rust, Python, Haskell, C++ and JavaScript. Metamath supports both a CLI and a GUI, and it runs on almost all operating systems. The architecture of the system varies according to the environment. Metamath also displays comprehensive error messages for debug output and runtime assertions. The library of the system contains over 20,000 theorems covering results in logic, algebra, set and group theory, topology, analysis, Hilbert spaces and quantum logic. It is a standalone system, and no code generation facility is available. It is extensible and programmable by the user. Metamath has a small proof kernel. Its human readability with respect to syntax is unique compared with other systems. A unification process is used for pattern matching. Argument handling is implicit, and the system is lightweight. Metamath supports inductive recursion and does not allow writing non-terminating programs. The system is easy to learn, but advanced proofs require experience with the library and with reasoning.
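Unification, used for pattern matching as noted above, finds a substitution that makes two expressions syntactically identical. The Python sketch below is a generic first-order unifier with an occurs check, not Metamath's implementation. It assumes that variables are strings beginning with "?" and that compound expressions are tuples whose first element is the operator.

```python
def is_var(t) -> bool:
    return isinstance(t, str) and t.startswith("?")


def walk(t, subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst) -> bool:
    """True if variable v occurs in t under subst (prevents cyclic bindings)."""
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False


def unify(a, b, subst=None):
    """Return a most general unifier extending subst, or None if none exists."""
    subst = dict(subst or {})
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            if occurs(x, y, subst):
                return None
            subst[x] = y
        elif is_var(y):
            if occurs(y, x, subst):
                return None
            subst[y] = x
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x) == len(y)):
            stack.extend(zip(x[1:], y[1:]))  # unify arguments pairwise
        else:
            return None  # clash between different constants or operators
    return subst


def resolve(t, subst):
    """Apply subst to t completely."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(arg, subst) for arg in t[1:])
    return t
```

For instance, matching the pattern `("->", "?P", "?Q")` against `("->", "a", ("->", "b", "a"))` binds `?P` to `a` and `?Q` to `b -> a`.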
Frank Pfenning (Twelf): Frank Pfenning works as a professor in the Computer Science Department, Carnegie Mellon University, USA, and is the creator of “Twelf” [226]. C. Schürmann also contributed to the system. It was released publicly in January 1999. The latest version is 1.7.1, last updated in January 2015. The website address is twelf.org. Twelf is an ITP, distributed under the simplified BSD license. The unique features of the system are meta-theorem proving for programming languages and logics. It is mainly developed for the academic community. With respect to logic, the system is based on type theory; with respect to truth value, it is intuitionistic. Twelf is built on deductive calculus and is developed in Standard ML. The UI of the system is a CLI, and it runs on almost all operating systems. It may be scalable, but it is not a distributed or multi-threaded system. Twelf supports an IDE and has its own libraries. The system is extensible and programmable by users. It has no tactic language, and proofs are written by hand. Twelf has been used to formalize many different logics and programming languages (examples are included with the distribution).
Ulf Norell (Agda): Ulf Norell works as a principal research engineer at the University of Gothenburg, Sweden. He developed the “Agda” system [211]. The latest version is 2.6.0.1, last updated in May 2019. The web address is wiki.portal.chalmers.se/agda/pmwiki.php. The license is BSD-like. Dependent types are the unique feature, and academia is the main user community of the system. Its main success story is that it has become a popular research language. It is an ITP based on functional programming. With respect to logic, the system is intuitionistic HOL; with respect to truth value, it is binary; and it is built on inductive calculus. It supports constructive type theory. The system is developed in Haskell, and its UI is graphical. Agda runs on all popular operating systems. It is scalable, but it is not used in distributed or multi-threaded environments. It supports an IDE and has its own proof kernel and library. It is extensible and programmable by users and has a tactic language for proof writing. An important aspect of Agda is its reliance on Unicode. Its standard library is under constant development and includes many useful definitions and theorems about basic mathematical structures.
Adam Naumowicz (Mizar): Adam Naumowicz works at the computer science institute of the University of Bialystok, Poland. He gave information on “Mizar” [110], which was publicly announced in November 1973. The latest version is 8.1.09, last updated in June 2019. The web address is mizar.org. Andrzej Trybulec is the founder, and Mizar is developed at the University of Bialystok. The system is free for any non-commercial purpose. Its unique features are a user-friendly input language based on natural language and a large library of formalized mathematics. Mathematicians, computer scientists and students are the main user community. Mizar is an ITP based on syllogism or mathematical statements. With respect to logic, the system is FOL with schemes (statements with free second-order variables), and it is based on binary truth values. It is based on deductive calculus, and ZF set-theoretic support is available. The programming paradigm is declarative, and the system is developed in Object Pascal. Its interface is a CLI, and it runs on almost all operating systems. It is scalable, but not suitable for distributed or multi-threaded environments. It supports an IDE and has its own library and proof kernel. However, it is not extensible, and no tactic language is available for proof writing. The Mizar Mathematical Library (MML) contains approximately 10,000 formal definitions and 52,000 lemmas and theorems.
Michael Norrish (HOL): Michael Norrish has been working as a principal research engineer at the Australian National University. He talked about the “HOL” theorem prover [212]. It was publicly released in January 1985. The latest version is Kananaskis-13, last updated in August 2019. The web address is hol-theorem-prover.org. HOL was developed at Cambridge University. Four tools now belong to the HOL family: HOL4 [247], HOL Light [120], ProofPower [20] and HOL Zero [7]. Other tools in the HOL family are developed jointly by Cambridge University, Data61, CSIRO and Chalmers University of Technology. The license type of HOL is BSD. It is an ITP based on syllogism or mathematical statements. HOL is the logical framework of the system, and it is based on binary truth values. The system is based on deductive calculus and does not support set theory directly, but it has a set-theoretic model. The programming paradigm is functional, and the system is developed in SML. The UI is command line. It runs on all common operating systems. It is scalable, but not suitable for distributed or multi-threaded environments. It does not support an IDE, but it has its own proof kernel and library. The system is extensible and supports a tactic language for proof writing.
Jonathan Sterling (RedPRL): Jonathan Sterling is a graduate research assistant at the School of Computer Science, Carnegie Mellon University, and the creator of “RedPRL” [250]. The web address is redprl.org. The license is MIT. The unique features of RedPRL are higher-dimensional types, support for strict equality and tactic scripts, refinement of dependent proofs, and functional extensionality. The main user community is researchers in homotopy type theory. It is an ITP based on syllogism or mathematical statements. Type theory is the logical framework, and it is based on intuitionistic truth values. The system is based on deductive calculus. The programming paradigm is functional, and it is developed in Standard ML. A Visual Studio Code extension is the UI of the system. It runs on almost all major operating systems. RedPRL is not suitable for distributed or multi-threaded environments, but it has its own IDE. It has no library, but it has its own proof kernel. RedPRL is extensible by the user, and its syntax is inspired by the Nuprl programming language [13]. Tactic language support is also available for proof writing.
Oleg Okhotnikov (Class & Int Proof Checker): Yuri Vtorushin and Oleg Okhotnikov implemented the “Class and Int proof checker”. It was first publicly announced in October 2007. The latest versions of the system are Class 2.0 and Int 2.0, last updated in November 2017. The web address is class-int.narod.ru/. Automated proof search for natural reasoning and support for iterative equalities are the unique features of the system. The system is mainly developed for students and teachers. Vtorushin and Okhotnikov use the Class and Int programs in seminars with students in courses such as “Mathematical Logics and Algorithm Theory” and “Artificial Intelligence”. It is an ATP based on syllogism or mathematical statements. It is based on FOL, supports binary truth values and is built on deductive calculus. Set-theoretic support via the axiomatic method is available. The programming paradigm is declarative, and the system is developed in C++. It supports only the Windows operating system and has a CLI. It is scalable and also supports distributed and multi-threaded environments. The system supports an IDE and has its own proof kernel. It is not extensible by the user, and its syntax is inspired by Mizar and SAD. A tactic language is also available for proof writing.
Hans de Nivelle (Geo): Hans de Nivelle, from the School of Science and Technology, Nazarbayev University, Kazakhstan, developed the “Geo” [61] prover. It was first released in August 2015, and the latest version is Geo2016C. It is an ATP for FOL that is based on graph theory and supports partial classical logic (PCL), with 3-valued logic as its truth value. The calculus of the system is based on geometric resolution, and it is developed in C++. It supports a CLI and runs only on Mac. The system takes geometric formulas and FOL formulas as input, where FOL formulas are transformed into geometric formulas. During proof search, it looks for a model of the geometric formulas through backtracking. Its main success story is its continued existence. The license type is GNU GPL, Version 3. The system is scalable, but it is not designed for distributed or multi-threaded environments, and it has no IDE. Library support is not available, but the system has its own proof kernel. The syntax of the system is inspired by the TPTP language; it is not extensible by the user and has no tactic support. The web page of the system is http://www.ii.uni.wroc.pl/~nivelle/software/geo III/CASC.html.
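The backtracking model search described above can be pictured with the minimal Python sketch below. It covers only the ground (variable-free) case. Each rule states that if all atoms in its body hold, then at least one alternative in its head must hold, and an empty head means “false”. When a rule is violated, the search tries each alternative in turn and backtracks on failure. Geometric resolution and existential quantifiers are not modeled, and the rule encoding is an assumption made for this example.

```python
def find_model(rules, model=frozenset()):
    """Backtracking model search for ground geometric rules.

    Each rule is a pair (body, head): body is a set of atoms and head is a list
    of alternatives, each a set of atoms. An empty head stands for 'false'.
    Returns a frozenset of atoms satisfying every rule, or None if none exists.
    """
    for body, head in rules:
        body_holds = set(body) <= model
        head_holds = any(set(alt) <= model for alt in head)
        if body_holds and not head_holds:
            # Violated rule: try each alternative, backtracking on failure.
            for alt in head:
                result = find_model(rules, model | set(alt))
                if result is not None:
                    return result
            return None  # every alternative failed (or the head is 'false')
    return model  # no rule is violated, so this is a model


rules = [
    (set(), [{"p"}, {"q"}]),  # true -> p or q
    ({"p"}, []),              # p -> false
    ({"q"}, [{"r"}]),         # q -> r
]
model = find_model(rules)
```

Here the search first tries p, fails on the second rule, backtracks, and ends with the model {q, r}. If no model exists (`None`), the rule set is unsatisfiable, which is how a conjecture can be proved by refuting its negation.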
Hugo Herbelin (Coq): Hugo Herbelin works as a researcher at INRIA, France. He talked about the “Coq” system [38]. It was first released in May 1989. The latest version is 8.10.1, last updated in October 2019. The web address of the system is coq.inria.fr. It is developed by INRIA and academic partners. LGPL 2.1 is the license type. The unique features of the system are an expressive logic and programming language, program extraction, an elaborate certification language, proof techniques, and a tactic language that allows users to define proof methods. The main uses of the system are teaching, formalization of mathematics and certified programming. The specification language of Coq is called Gallina (based on the Calculus of Inductive Constructions), which allows users to write their specifications by developing theories. Coq follows the Curry-Howard isomorphism [248] and uses the Calculus of Inductive Constructions [67] to formalize programs, properties and proofs. The Curry-Howard isomorphism provides a direct relation between programming and proving: proofs in a given subset of mathematics are exactly programs in a particular language. This means that one can use a programming paradigm to encode propositions and their proofs. Coq is an ITP and supports various decision and semi-decision procedures, which produce proof terms that a kernel checks for validity. The logical framework of the system is based on HOL and λ-calculus, and it is built on both inductive and deductive calculus. Different ways to represent sets are available in the system. The programming paradigm of the system is functional, and it is developed in OCaml. The system supports both a graphical interface and a CLI and runs on almost all operating systems. The system is scalable, but it is not designed for distributed or multi-threaded environments. The system has an IDE, library support is available, and it has its own proof kernel. The syntax of the system is inspired by ML, and it is extensible by users.
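To illustrate the Curry-Howard reading described above, the short example below is written in Lean 4 rather than Coq; the corresponding Gallina term would look similar. The proposition p ∧ q → q ∧ p is a type, and its proof is a function that swaps the components of a pair. The second example applies the same program to ordinary data.

```lean
-- A proof of p ∧ q is a pair of proofs; a proof of an implication is a function.
example (p q : Prop) : p ∧ q → q ∧ p :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩

-- The same program shape on data: swapping the components of a pair.
example {α β : Type} : α × β → β × α :=
  fun ⟨a, b⟩ => ⟨b, a⟩
```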
Clark Barrett (CVC4): Clark Barrett is the last respondent to the questionnaire and provided information about CVC4 [32], developed at Stanford University and the University of Iowa. CVC4 is an ATP for SMT problems and was first released in December 2014. The latest version is 1.7, last updated in April 2019. The web address is http://cvc4.cs.stanford.edu, and the license type is BSD 3-clause. CVC4 is based on the DPLL(T) calculus [33]. With respect to reasoning it is mathematical, and it is based on standard many-sorted FOL, with limited support for HOL. It also supports finite sets. The main user community of the system is people interested in program analysis. The programming paradigm is logical, and the system is developed in C++. CVC4 provides a CLI, offers APIs for C, C++, Java and Python, and runs on Mac and Windows. The system is modular, provides no support for distributed computing, and offers limited support for multi-threading. CVC4 offers solvers for separation logic, sets and relations, in which models assign every formula either true or false. Debug output and run-time assertions are supported, whereas code generation is not. The input language of CVC4 is SMT-LIB, which is inspired by LISP. It also supports the CVC input language, which is more human-readable than SMT-LIB. Limited support is available for inductive reasoning. CVC4 is used as the main engine in the Altran SPARK toolset and at GE and Google. CVC4 has come first in various divisions of the Satisfiability Modulo Theories Competition (SMT-COMP), CASC and the SyGuS (Syntax-Guided Synthesis) competition.
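In DPLL(T), a DPLL-style SAT search over the Boolean structure of a formula cooperates with theory solvers. The minimal Python sketch below shows only the propositional DPLL core: unit propagation plus case splitting with backtracking. The theory solver “T”, clause learning and the heuristics of real SMT solvers are omitted. Clauses are sets of non-zero integers, where -n denotes the negation of atom n (an assumed DIMACS-like encoding).

```python
def unit_propagate(clauses, assignment):
    """Repeatedly assign literals forced by unit clauses.

    Returns the extended assignment (a set of true literals), or None on conflict.
    """
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assignment for lit in clause):
                continue  # clause already satisfied
            unassigned = [lit for lit in clause if -lit not in assignment]
            if not unassigned:
                return None  # every literal is false: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # forced (unit) literal
                changed = True
    return assignment


def dpll(clauses, assignment=frozenset()):
    """Return a satisfying set of literals, or None if the clauses are unsatisfiable."""
    assignment = unit_propagate(clauses, assignment)
    if assignment is None:
        return None
    atoms = {abs(lit) for clause in clauses for lit in clause}
    free = [a for a in sorted(atoms) if a not in assignment and -a not in assignment]
    if not free:
        return assignment  # all atoms assigned without conflict
    atom = free[0]  # naive branching choice
    for choice in (atom, -atom):  # case split, backtracking on failure
        result = dpll(clauses, assignment | {choice})
        if result is not None:
            return result
    return None
```

In an SMT solver, each candidate assignment would also be checked for consistency in the background theory, for example by a solver for linear arithmetic. A theory conflict would then be turned into a new clause.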
Table 5 summarizes the main characteristics of the theorem provers for which we received answers from experts/developers. The answers for PVS were provided by the authors of this paper, as they have worked with PVS in the past.
We also developed a layout for the survey questionnaire, which is listed in Table 6. Mnemonic codes are used to represent the headings. These abbreviations are: CLang = Computational Language, 1st Rel = First Release, Ind/Uni/Inde = Industry/University/Independent, Prog.P = Programming Paradigm, LV = Latest Version, LT = License Type, UI = User Interface, OS = Operating System, Lib = Library, CG = Code Generation, Ed = Editor, Ext = Extendable, I/O = Input/Output, TType = Tool Type, CLogic = Computational Logic, TV = Truth Value, ST = Set Theory, App.Areas = Application Areas and Eval = Evaluation. We filled in the layout for 27 more theorem provers, collecting the data from various resources such as electronic databases, research articles and dissertations. The complete details for each system are listed in Appendix A.
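TABLE 5: Main characteristics of 16 theorem provers
| Characteristics | ACL2 | E | Isabelle | Atelier B | Escher | Metamath | Twelf | Agda | Mizar | HOL | RedPRL | Class & Int | Geo | Coq | PVS | CVC4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| System Type | TP | TP | TP | TP | TP | TP | TP | TP | TP | TP | TP | TP | TP | TP | TP | TP for SMT |
| Theorem Prover Category | ITP | ATP | ATP+ITP | ITP | ATP | ITP | ITP | ITP | ITP | ITP | ITP | ATP | ATP | ITP | ITP | ATP |
| System Based on | Syllogism | Syllogism | Syllogism | Syllogism | Syllogism | Syllogism | LF | FP | Syllogism | Syllogism | Syllogism | Syllogism | GT | DP | Syllogism | DPLL |
| Logic Used | FOL | FOL | HOL | FOL | FOL+HOL | FOL+HOL | TT | HOL | FOL | HOL | TT | FOL | PCL | HOTT | HOL | FOL |
| System's Truth Value | Binary | Binary | Binary | Binary | Binary+Triadic | Binary | Intuitionistic | Binary | Binary | Binary | Intuitionistic | Binary | 3-valued | Binary | Binary | Binary |
| Calculus | Inductive | Deductive | Ded+Ind | Deductive | Deductive | Deductive | Deductive | Inductive | Deductive | Deductive | Deductive | Deductive | Deductive | Ded+Ind | Deductive | Deductive |
| Set Theoretic Support | No | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes | Yes |
| Programming Paradigm | Func | Imp | Func | Imp | Func+Imp | Func | LP | Func | Decl | Func | Func | Decl | Decl | Func | Func+OO | Logical |
| System Architecture | Modular | Mod+Mono | Modular | Monolithic | Monolithic | Mod+Mono | Monolithic | Modular | Modular | Modular | Modular | Modular | Monolithic | Modular | Modular | Modular |
| Programming Language | ACL2 | C | ML+Scala | Prolog | C++ | MM | SML | Haskell | Pascal | SML | SML | C++ | C++ | OCaml | CLisp | C++ |
| User Interface | CLI | CLI | GUI | GUI | CLI+GUI | CLI+GUI | CLI | GUI | CLI | CLI | GUI | CLI | CLI | CLI+GUI | GUI | CLI |
| Platform Support | Cross | Cross | Cross | Cross | Win+Linux | Cross | Cross | Cross | Cross | Cross | Cross | Windows | Mac | Cross | Mac+Linux | Mac+Win |
| Scalability | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | No |
| Multi-threaded | Yes | No | Yes | No | No | Yes | No | No | No | No | No | Yes | No | No | Yes | Yes |
| IDE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No |
| Library Support | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Programmability | Yes | No | Yes | No | No | Yes | Yes | Yes | No | Yes | Yes | No | No | No | Yes | No |
| Tactic Language Support | Yes | No | Yes | Yes | No | Yes | No | Yes | No | Yes | Yes | Yes | No | Yes | Yes | No |
Abbreviations: TP = theorem prover; Syllogism = syllogism or mathematical statements; LF = logical framework; FP = functional programming; GT = graph theory; DP = decision procedures; DPLL = Davis-Putnam-Logemann-Loveland; TT = type theory; HOTT = higher-order type theory; PCL = partial classical logic; Ded+Ind = deductive and inductive; Func = functional; Imp = imperative; Decl = declarative; LP = logic programming; OO = object-oriented; Mod+Mono = modular and monolithic; MM = Metamath; CLisp = Common Lisp; Cross = cross-platform.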
TABLE 6: Systematic literature review design
| Theorem provers | Fields |
|---|---|
| General | Name, Contributor, 1st Rel, Ind/Uni/Inde |
| Implementation | CLang, Prog.P, LV, LT, UI, OS, Lib, CG, Ed, Ext |
| Logico-Math | TType, CLogic, TV, ST, Calculus, Proof Kernel |
| Others | App. Areas, Eval, Unique Features |
5 COMPARISON
In this section, we first list the scientists who contributed most to the formalization of mathematical theorems. We also present the theorem provers in which most of these theorems have been proved. Next, we describe the top systems (from 1996 to 2019) in the CADE ATP System Competition. Finally, we present the comparison of more than 40 provers for the parameters mentioned in Section 1.
5.1 Top Scientists and Theorem Provers
The efficiency and power of theorem provers are generally evaluated by the number of theorems they have proved from the list of the top hundred theorems (available at: http://www.cs.ru.nl/~freek/100/). The people who contributed most to verifying these theorems are presented in Figure 1. John Harrison, currently working at Intel, proved 84 theorems using HOL (particularly HOL Light) and Isabelle. Rob Arthan (second in the list) also used the HOL family of theorem provers for his proofs. The theorem provers that proved the most theorems from the top hundred list are, in order:
HOL Light (86) → Isabelle (81) → Metamath (71) → Coq (69) → Mizar (69) → ProofPower (43) → Nqthm/ACL2 (18) → PVS (16) → NuPRL/MetaPRL (8)
HOL Light's performance is outstanding: it is at the top of the list, having formalized and proved 86 theorems. Isabelle, another powerful tool, is second. Metamath, Coq, Mizar and ProofPower are also computationally strong tools and play a vital role in the formalization of the top hundred theorems. Two systems of the HOL family (HOL Light and ProofPower) are included in the list.
Fig. 1: Theorems proved by scientists in ITPs
5.2 Best FOL Theorem Provers at CASC
Each year, the performance of FOL-based ATPs is assessed in the CADE ATP System Competition (CASC), which started in 1996. CASC consists of various divisions, categorized by the type of problems and the characteristics of the systems. There are two major divisions. The first is the competition division, which ranks the reasoning systems. The second is the demonstration division, which enables systems to demonstrate their potential without being ranked. These divisions are further divided into subdivisions on the basis of problem categories. The competition division is open to automated reasoning systems that meet its requirements, and each system selected for it attempts all the problems of the division. Its subdivisions are: THF, THN, TFA, TFN, FOF, FNT, CNF, SAT, EPR, SLH (changed to UEQ in 2015 and back to SLH in 2018) and LTB. More details on the subdivisions can be found in [251]. In Figure 2, these divisions are shown on the horizontal axis, and the vertical axis represents the competition year. Some tools compete in only one division, while others are tested in several divisions and show excellent results. Figure 2 sketches the overall results for each division, and arrows in the figure show consecutive wins in a particular division. For example, Vampire [235] has performed best in the FOF division from 2002 to 2019, and Satallax [49] has come first in the THF division for the last seven years. Vampire has also topped the TFA, FOF, FNT and EPR divisions. Similarly, iProver [170] dominated the EPR division from 2008 to 2014 and from 2016 to 2018. These provers perform best at CASC for the following reasons:
1) Sound theoretical foundations,
2) Thorough tuning and testing,
3) Huge implementation efforts, and
4) Understanding of how to optimize for the competition.
5.3 Provers Comparative Analysis Results
This subsection presents the comparative analysis of theorem provers for various parameters. These include the theorem prover category, mechanical reasoning system type, logical framework, truth value, calculus of the system, set-theoretic support, programming paradigm and programming language of the system, UI, system scalability, distributed/multi-threaded support, IDE and library support. The results for each parameter are presented next.
5.3.1 Theorem Prover Category
Figure 3a shows the categories of theorem provers, such as ATPs, ITPs, geometric theorem provers, decision procedures and theory generators. The major portion is shared by ATPs (44%) and ITPs (31%). Systems working as both ATP and ITP take 11%. Systems that work as a theory generator, as an ATP and model generator, or as an automated geometric theorem prover together share 6%. Systems that work as decision procedures and ATPs for SMT problems take 8%.
Fig. 2: Top ATPs at CASC (divisions THF, THN, TFA, TFN, FOF, FNT, CNF, SAT, EPR, SLH/UEQ and LTB on the horizontal axis; competition years 1996 to 2019 on the vertical axis)
5.3.2 Mechanical Reasoning System Type
Figure 3b shows what the theorem provers are grounded on: syllogism or mathematical statements, or various logic theories. 54.5% of the systems are based on syllogism or mathematical statements, while each of the other types takes 9.1%.
5.3.3 Logical Framework
Figure 3c shows the logical frameworks of theorem provers. Most systems (59%) are FOL-based, and 16% are based on HOL. Systems based on graph theory, dynamic modal logic, FOL/HOL, pure classical logic, equational logic, type theory and higher-order type theory collectively share 25%.
5.3.4 Truth Value and System Calculus
Figure 3d shows that 86% of theorem provers are based on Boolean (binary) logic, while 7% are based on intuitionistic logic. Systems with triadic (3-valued) truth values share 2%, and systems with both binary and triadic values take 5%.
Figure 3e shows that 52% of the systems are based on deductive calculus, while 10% are based on inductive calculus. Theorem provers based on both inductive and deductive calculus take 5%. Systems based on Euclidean and differential geometry, first-order predicate calculus, fixed-point co-induction, λ-calculus, sequent calculus, tableau calculus, instantiation calculus, hyper-tableau calculus and typed λ-calculus account for 3%, 3%, 3%, 5%, 5%, 8%, 2%, 2% and 2%, respectively.
5.3.5 Set Theoretic Support
Figure 3f shows that only 30% of theorem provers provide set-theoretic support, while 56% do not support set theory. Systems that support Horn theory, swinging type theory and ZF set theory take 2% each. Systems that support Quine's set theory and B-Method set theory take 3% each.
5.3.6 Programming Paradigm and Programming Language
Figure 3g presents the programming paradigms of the systems, such as functional, imperative and declarative. The programming paradigm of 23% of the theorem provers is functional, while systems combining functional, imperative and object-oriented paradigms take 16%. Systems belonging to the logic programming paradigm take 9%. Provers that are both procedural and object-oriented make up 7%, and 12% of the systems belong to the declarative paradigm. 2% of the systems combine functional, concurrent and object-oriented paradigms. Systems that are functional and imperative take 5%, and theorem provers that are functional and procedural also take 5%. Systems that belong only to the concurrent programming paradigm take 2%, and systems that are both functional and modular take 2%.
Figure 3h presents an overview of the programming languages used to develop theorem provers: OCaml (20%), C/C++ (17%), Common Lisp (10%), Java (10%), Prolog (10%), SML (10%), Haskell (7%), Mathematica (5%), Pascal (3%), Metamath (2%), Perl (2%), Scala (2%), and ML and Scala (2%).
5.3.7 User Interface and Operating System
Most theorem provers are available with a CLI only (54%), while 30% of the systems are available with a GUI only and 16% provide both a CLI and a GUI.
As shown in Figure 3i, 51% of theorem provers are cross-platform. 15% of the systems support Linux, Mac and Windows. 3% of the systems run on all Unix-like operating systems. Systems that run only on Windows account for 3%, only on Unix 5%, only on Linux 5%, and only on Mac 2%. Systems that support both Unix and Linux account for 2%. Systems that run on Linux, Unix, Windows and Mac account for 5%. Systems supporting Linux, Solaris and Mac account for 2%, and systems that support only Linux and Solaris account for 2%.
5.3.8 Distributed/Multi-threaded, IDE and Library Support
Only 17% (8 out of 43) of theorem provers support distributed or multi-threaded environments, while 83% of the systems do not. Further, our results show that all of the systems can be scaled to meet future needs. On the other hand, 65% (30 out of 43) of the systems support an IDE. Furthermore, 56% (26 out of 43) of the systems have their own libraries, while 44% do not.
5.4 The de Bruijn Criterion
According to the de Bruijn criterion, “the correctness of the mathematics in the system should be guaranteed by a small checker” [30]. This means that a system has a ‘proof kernel’ (also called a proof checker) through which all the mathematics is filtered. Table 7 shows whether the 15 theorem provers for which we received answers from experts/developers have a small proof kernel (+ stands for yes and - for no). The proof kernels of the other 27 theorem provers are shown in Appendix A; the majority of them have no proof kernel. HOL Light, by contrast, has an extremely small proof kernel containing only several hundred lines of OCaml.
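To make the de Bruijn criterion concrete, the following is a minimal sketch of an LCF-style kernel in Python, assuming a toy propositional logic with implication only. Theorems are values of a type `Thm` that only the three rule functions are meant to construct, so trusting any derived theorem reduces to trusting these few lines. The rule names ASSUME, DISCH and MP echo HOL's, but the code is a hypothetical illustration, not HOL Light's actual set of primitive rules; Python also cannot truly hide the private token, a job that an ML abstract type does in LCF-style systems.

```python
from dataclasses import dataclass

# Formulas: an atom is a string; ("->", A, B) is the implication A -> B.

# Private capability object. Only the rule functions below pass it to Thm.
# Python cannot truly hide it; in ML an abstract type `thm` enforces this.
_KERNEL = object()


@dataclass(frozen=True)
class Thm:
    hyps: frozenset   # assumptions G in the sequent G |- concl
    concl: object     # the proved formula
    token: object     # must be the kernel's private capability

    def __post_init__(self):
        if self.token is not _KERNEL:
            raise PermissionError("theorems can only be created by kernel rules")


def assume(a):
    """Rule ASSUME:  A |- A."""
    return Thm(frozenset([a]), a, _KERNEL)


def intro(a, th):
    """Rule DISCH:  from G |- B infer G - {A} |- A -> B."""
    return Thm(th.hyps - {a}, ("->", a, th.concl), _KERNEL)


def mp(th_ab, th_a):
    """Rule MP:  from G1 |- A -> B and G2 |- A infer G1 u G2 |- B."""
    c = th_ab.concl
    if not (isinstance(c, tuple) and c[0] == "->" and c[1] == th_a.concl):
        raise ValueError("MP: antecedent does not match")
    return Thm(th_ab.hyps | th_a.hyps, c[2], _KERNEL)
```

For example, `intro("A", intro(("->", "A", "B"), mp(assume(("->", "A", "B")), assume("A"))))` yields a theorem with no hypotheses whose conclusion is A -> ((A -> B) -> B); any attempt to build a `Thm` directly with another token is rejected.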
In some ITPs (e.g., Coq and Agda), the proof kernel also checks the correctness of proof-objects generated by other tools included in the whole system. For an ITP with proof-objects, the proof-script for a statement (theorem or lemma) contains a list of tactics/strategies that the proof assistant needs in order to verify the validity of the statement. Running the proof-script generates and stores a term, the proof-object, which can be checked by a simple proof kernel. The reliability of the whole system therefore depends on the soundness of the proof-objects and the proof kernel. Even if someone has doubts about the validity of certain statements, or if some parts of the system contain bugs, the proof-object for a given statement and the proof kernel can be used to verify the statement locally within the corresponding logical system.

TABLE 7: de Bruijn Criterion

| System | De Bruijn criterion | Proof-object |
|---|---|---|
| ACL2 | - | - |
| E | - | + |
| Isabelle | + | x |
| Atelier B | - | - |
| Escher | - | - |
| Metamath | + | + |
| Twelf | - | - |
| Agda | + | + |
| Mizar | - | - |
| HOL | + | x |
| RedPRL/Nuprl | + | x |
| Class & Int | + | + |
| Geo | + | + |
| Coq | + | + |
| PVS | - | - |
HOL, Isabelle and Nuprl belong to the class of ITPs that have a proof kernel but no proof-objects. In such systems, the proof-script is considered a non-standard proof-object (shown with x in Table 7). These systems translate the proof-script into a proof-object, which requires some system-specific preprocessing, and the trustworthiness of the translation is verified with the proof kernel. For ITPs with no proof kernel (e.g., PVS, Mizar and ACL2), there is (as yet) no way to obtain a highly reliable proof-object, so one has to trust these systems for the correctness of the statements they accept. The advantage of these systems is generally their larger automated deduction facilities and user-friendliness [31].
5.5 The Poincare Principle and Automation
For theorem provers, one important aspect is the automation of trivial tasks [274]: a user should not be required to explain all the details of a calculation to the theorem prover. A theorem prover satisfies the Poincare principle (formulated in [29]) if it can automatically prove the correctness of calculations. For example, 3 + 4 = 7 holds by computation and should not have to be justified by long chains of logical inferences. Table 8 lists whether each prover satisfies the Poincare principle.
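The following sketch, assuming Peano numerals encoded as Python tuples and the two defining equations of addition used as rewrite rules, shows what "holding by computation" means: the checker accepts 3 + 4 = 7 because both sides reduce to the same normal form, without any chain of logical inferences. The encoding and the function names are illustrative assumptions, not how any surveyed prover represents numbers.

```python
# Peano numerals: ("Z",) is zero, ("S", n) is the successor of n,
# and ("add", a, b) is an unevaluated sum.
Z = ("Z",)


def S(n):
    return ("S", n)


def num(k):
    """Build the numeral S(S(...S(Z))) with k successors."""
    t = Z
    for _ in range(k):
        t = S(t)
    return t


def evaluate(t):
    """Normalize t with the rewrite rules  add(Z, m) -> m  and
    add(S n, m) -> S(add(n, m)).  Returns (normal form, rewrite steps)."""
    if t[0] == "Z":
        return t, 0
    if t[0] == "S":
        n, k = evaluate(t[1])
        return S(n), k
    if t[0] == "add":
        a, ka = evaluate(t[1])
        b, kb = evaluate(t[2])
        steps = ka + kb
        depth = 0
        while a[0] == "S":      # one add(S n, m) -> S(add(n, m)) step per successor
            a = a[1]
            depth += 1
            steps += 1
        steps += 1              # final add(Z, m) -> m step
        for _ in range(depth):
            b = S(b)
        return b, steps
    raise ValueError(f"unknown term: {t!r}")


def holds_by_computation(lhs, rhs):
    """Accept lhs = rhs iff both sides reach the same normal form."""
    return evaluate(lhs)[0] == evaluate(rhs)[0]
```

Here `evaluate(("add", num(3), num(4)))` returns the numeral for 7 after four rewrite steps, and `holds_by_computation` simply compares normal forms, which is the kind of check a prover satisfying the Poincare principle performs internally.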
Another important feature of a theorem prover is whether it enables its users to write programs that solve proof problems algorithmically. HOL, PVS, Coq and Isabelle offer this kind of user automation. Strong built-in automation (proof tactics, decision and search procedures, induction automation, etc.) is another important aspect of a theorem prover. ACL2 and PVS have the most powerful built-in automation, followed by HOL and Isabelle, and then by Mizar and Coq. The following order ranks the reliability of ITPs, from most reliable on the left to least reliable on the right:
Agda, Coq, Metamath, Nuprl, HOL, Isabelle, Twelf, Mizar, PVS, ACL2.
Fig. 3: Theorem provers comparisons. (a) Theorem provers category; (b) Mechanical reasoning system type; (c) Logical framework of systems; (d) Truth value of the systems; (e) Calculus of the systems; (f) Set theoretic support; (g) System programming paradigm; (h) System programming language; (i) Operating system.
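TABLE 8: The Poincare principle

| System | Poincare principle | User automation |
|---|---|---|
| ACL2 | + | + |
| E | - | + |
| Isabelle | + | + |
| Atelier B | + | + |
| Escher | - | - |
| Metamath | - | - |
| Twelf | - | - |
| Agda | - | - |
| Mizar | - | - |
| HOL | + | + |
| RedPRL/Nuprl | + | - |
| Class & Int | + | - |
| Geo | - | + |
| Coq | + | + |
| PVS | + | + |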
Agda is placed first because it uses only predicative logic. The middle places are occupied by Nuprl, HOL and Isabelle because of their non-standard proof-objects. Finally, the least reliable systems are those that do not work with proof-objects. The order for built-in automation in ITPs is the opposite, with ACL2 and PVS at the top.
6 STRENGTHS, ANALYSIS, APPLICATIONS AND FUTURE DIRECTIONS
In this section, we present the strengths of theorem provers, an in-depth analysis of their differences, and their applications. Some future research directions based on recent work are also discussed. Some theorem provers, such as Geo and the Class & Int proof checker, are omitted from this section, while some other well-known provers, such as Nuprl, Vampire, Prover9/iProver and MaLARea, are included.
6.1 ITPs
ACL2's main strengths are state-of-the-art prover heuristics, robust engineering and extensive hyperlinked documentation. Among the ITPs considered here, ACL2 uses FOL rather than a higher-order logic. Only two ITPs, Isabelle and ACL2, offer a parallel proof checking facility. ACL2 also supports program extraction (also called code generation) by translating specifications in the ACL2 language to Common Lisp. Because ACL2 is built around a real programming language, theories can also be developed and executed in it. Users cannot construct inductive types, but the powerful built-in induction scheme in ACL2 allows users to define their own recursive functions. The inference engine is based on the waterfall design of the Nqthm prover [219]. ACL2 has been used extensively to verify hardware and software designs at AMD, Centaur, Oracle, Intel and IBM, and to prove separation kernel properties at Rockwell Collins [137]. Moreover, ACL2 has also been used successfully in processor modeling [91], digital systems [157], programming languages [200], asynchronous circuits [59] and concurrency [199]. ACL2(ml) [125] uses machine learning to assist users in the proof process. In the past, some work has been done on integrating SAT solvers into ACL2 [224], [255]. However, this has raised new issues because of the solvers' support for a wide range of domains, including real numbers and uninterpreted functions. The x86isa library [104] in ACL2 offers a formal model for reasoning about x86 machine-code programs. Adding features such as exception and interrupt handling and extended I/O capabilities to the current x86isa library would enable reasoning about real system code.
Atelier B offers a framework that automatically proves and reviews user-added mathematical proof rules. Proof obligations in Atelier B contain traceability information that helps in locating modeling errors, and the model editor allows navigation of models and operations [177]. The B-method allows one to develop many models of the same system through refinement. However, one is required to explain the B model while proving theorems, and the proof obligation generator may generate small proof obligations that need to be discharged. Atelier B is used in the development of safety-critical systems [178] and communication protocols [145]. Moreover, the B-method has also been used successfully in the development of safety-critical parts in two railway projects [5], [27] and of the byte code verifier of the Java Card [57].
The main strength of Metamath is that it uses the minimum possible framework required to express mathematics and its proofs. Unlike most ITPs, Metamath's verification engine makes no assumption about the underlying logic, and the main verification algorithm is very simple (essentially nothing more than substitution of expressions for variables, enhanced with checks for conflicts among distinct variables). Weaker logics, such as quantum or intuitionistic logic, can be handled in Metamath with different sets of axioms. Proofs in Metamath are generally very long, but they are completely transparent, and the Metamath database contains over 30,000 human-readable formal proofs. In the proof development process, users prove a theorem/lemma interactively within the program, which then writes the proof to the source. A definition of models of Metamath-style formal systems is provided in [54] and demonstrated on propositional calculus examples. Hilbert space and quantum logic have been developed in Metamath from mathematical foundations and used in the verification of some new results in these fields. Metamath has been used in the formalization of Dirichlet's theorem and Selberg's proof of the prime number theorem [55]. An algorithm that converts HOL Light proofs into Metamath is presented in [56].
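Metamath's verification algorithm, substitution of expressions for variables, fits in a few dozen lines. The following Python sketch assumes a small hand-written database in the spirit of Metamath's set.mm: a proof is a reverse-Polish sequence of labels, floating hypotheses (`f`) fix the substitution, and essential hypotheses (`e`) must match after substitution. It omits the distinct-variable checks, scoping rules and compressed proofs of the real verifier, and the database layout is an assumption made for illustration.

```python
# Database entries (a tiny fragment in the spirit of set.mm):
#   ("f", tokens)               floating hypothesis, e.g. "wff ph"
#   ("e", tokens)               essential hypothesis, e.g. "|- ph"
#   ("a", [hyp labels], concl)  axiom or previously proved assertion
DB = {
    "wph": ("f", "wff ph".split()),
    "wps": ("f", "wff ps".split()),
    "wi": ("a", ["wph", "wps"], "wff ( ph -> ps )".split()),
    "ax-1": ("a", ["wph", "wps"], "|- ( ph -> ( ps -> ph ) )".split()),
    "min": ("e", "|- ph".split()),
    "maj": ("e", "|- ( ph -> ps )".split()),
    "ax-mp": ("a", ["wph", "wps", "min", "maj"], "|- ps".split()),
    "a1i.1": ("e", "|- ph".split()),   # hypothesis of the theorem being proved
}


def substitute(tokens, sub):
    """Simultaneously replace every variable token by its assigned token list."""
    out = []
    for tok in tokens:
        out.extend(sub.get(tok, [tok]))
    return out


def verify(proof, claim, db=DB):
    """Run an RPN proof (list of labels); True if it ends with exactly `claim`."""
    stack = []
    for label in proof:
        if db[label][0] in ("f", "e"):
            stack.append(db[label][1])      # hypotheses push their statement
            continue
        _, hyps, concl = db[label]
        if len(stack) < len(hyps):
            raise ValueError(f"stack underflow at {label}")
        args = stack[len(stack) - len(hyps):]
        del stack[len(stack) - len(hyps):]
        sub = {}
        # Floating hypotheses fix the substitution: "wff ph" vs "wff <expr>".
        for hyp, arg in zip(hyps, args):
            kind, htoks = db[hyp]
            if kind == "f":
                typecode, var = htoks
                if arg[0] != typecode:
                    raise ValueError(f"{label}: type mismatch for {var}")
                sub[var] = arg[1:]
        # Essential hypotheses must match exactly after substitution.
        for hyp, arg in zip(hyps, args):
            kind, htoks = db[hyp]
            if kind == "e" and substitute(htoks, sub) != arg:
                raise ValueError(f"{label}: hypothesis {hyp} does not match")
        stack.append(substitute(concl, sub))
    return len(stack) == 1 and stack[0] == claim
```

With this fragment, the proof `wph wps wph wi a1i.1 wph wps ax-1 ax-mp` derives |- ( ps -> ph ) from the hypothesis |- ph, while swapping the arguments of `ax-1` makes `ax-mp` fail its `maj` check.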
Twelf's strengths lie in representing deductive systems with side conditions and judgments with contexts. It offers an environment for experimenting with encodings and for verifying their meta-theoretic properties. It also provides a module system for organizing large developments. Twelf implements λProlog [89], and its logic is very close to the Edinburgh Logical Framework (LF) [226]. Twelf specifications can be executed through a search procedure, which means that Twelf can also be used as a logic programming language. Twelf has been used in proving the safety of the Standard ML programming language [179], in a typed assembly language system [69], in a foundational Proof-Carrying Code system [18], in cut-elimination proofs for classical and intuitionistic logic [225], for specifying and validating logic morphisms [242] and in the construction of a safety policy for the IA-32 architecture [70].
The main feature of Agda is the interactive construction of programs and proofs with meta-variables and placeholders. Unlike other ITPs that work with proof-scripts, Agda acts as a structure editor that supports term construction: users edit the proof-object by focusing on a hole and executing one of the operations (tactics) applicable to that hole. It is the only ITP that offers a functional programming language with dependent types. Moreover, strictly positive inductive and inductive-recursive data types are supported in Agda. The Emacs interface for Agda allows incremental development of programs [15]. Agda has been used to formally verify railway interlocking systems [155], in web client application development [143], for a fully certified merge sort [66], in hardware description and verification [92], in formalizing Type-Logical Grammars [168], in verifying Curry programs [17] and in the formalization of Valiant's Algorithm [37]. In [94], the automated theorem prover Waldmeister [129] is integrated into Agda to facilitate proof automation. Similarly, another tool for automated theorem proving in Agda is proposed in [186].
Mizar gained popularity because of its huge repository of formalized mathematical knowledge, the Mizar Mathematical Library (MML), which has been used in developing AI/ATP methods to solve conjectures in mathematics [260]. Over the years, the syntax of Mizar has been improved and simplified, and the Mizar language now contains a rich set of logical quantifiers and connectives. The evolution of Mizar in its first 30 years is presented in [194]. In Mizar, proofs are written in a declarative way and developed according to the rules of the Jaskowski style of natural deduction [142]. This characteristic influenced the building of similar declarative proof layers on top of several other systems, such as the Mizar mode for HOL [115], the Isar language for Isabelle [270], Mizar-Light for HOL Light [273] and the declarative proof language (DPL) for Coq [68]. Mizar does not support the Poincare principle, yet it has some automated deduction and a set of tactics that are very user-friendly. Its main unique feature is that the proof-script is close to an ordinary mathematical proof. Apart from the large MML, Mizar is used in the development of rigorous mathematics, in hardware/software verification and in mathematical education [110]. Some recent work in Mizar includes the formalization of Pell's equation [6], Nominative Algorithmic Algebra [169] and bounded functions for cryptology [214]. In [151], a suite of AI/ATP systems is developed on the Mizar library, which contains approximately 58,000 theorems; the 14 strongest methods, executed in parallel, proved 40.6% of the theorems. Moreover, an independent certification mechanism for Mizar based on the Isabelle framework has been developed [148], and in [149] the Mizar environment is emulated inside Isabelle.
HOL, whose original version is now referred to as HOL/88, was built upon the Cambridge LCF approach [221] for the purpose of hardware verification. It has influenced the development of other well-known ITPs such as Isabelle/HOL, HOL Zero, ProofPower and HOL Light. In HOL, computations that involve recursion can become quite lengthy and complex when converted to proof-objects. Thus, proof-objects are not stored, only proof-scripts, which is why HOL uses non-standard proof-objects. Users of HOL generally work inside the implementation language. As HOL is fully programmable, various other means of interacting with it have also been developed. HOL Light is the most widely used prover of this family, and it is probably the only prover that represents the LCF approach in its purest form. Its logic is based on simple type theory with polymorphic type variables. The terms in HOL Light are those of the simply typed λ-calculus, with just two primitive types: bool (Booleans) and ind (individuals). HOL has been used extensively for formal verification projects in industry. HOL provers are widely used in the formalization of mathematical theorems [23], hardware design [2], [116], communication protocol verification [122], programs [119] and control system analysis [232]. Similarly, HOL(y)Hammer offers machine-learning-based premise selection and automated reasoning for both HOL Light and HOL4 [151]. Recent work in HOL includes the formalization of quaternions [96], linear analog circuits [257], the process algebra CCS [258] and metric spaces [190]. A library for combinational circuits is developed in [244]. Moreover, a formalized Lambek calculus has been ported from Coq to HOL4 in [259] with some new theorems and improvements. A technique based on the A* algorithm is presented in [101] to automate the selection of tactics and tactic sequences in HOL4.
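The simply typed terms of HOL Light can be illustrated with a small type checker. The sketch below mirrors HOL Light's four term constructors (Var, Const, Comb, Abs), in which variables and constants carry their own types, but it is written in Python, uses only bool, ind and function types, and omits polymorphic type variables. For instance, equality, which is polymorphic in HOL Light, appears here only as a hypothetical constant instantiated at ind.

```python
# Types: ("bool",), ("ind",) and ("fun", dom, cod) for function types.
BOOL = ("bool",)
IND = ("ind",)


def fun(dom, cod):
    return ("fun", dom, cod)


# Terms, Church-style: variables and constants carry their own types.
def Var(name, ty):
    return ("var", name, ty)


def Const(name, ty):
    return ("const", name, ty)


def Comb(f, x):          # application f x
    return ("comb", f, x)


def Abs(v, body):        # abstraction \v. body
    return ("abs", v, body)


def type_of(tm):
    """Compute the type of a term, raising TypeError if it is ill-typed."""
    tag = tm[0]
    if tag in ("var", "const"):
        return tm[2]
    if tag == "comb":
        fty, xty = type_of(tm[1]), type_of(tm[2])
        if fty[0] != "fun" or fty[1] != xty:
            raise TypeError(f"cannot apply {fty} to {xty}")
        return fty[2]
    if tag == "abs":
        if tm[1][0] != "var":
            raise TypeError("an abstraction must bind a variable")
        return fun(tm[1][2], type_of(tm[2]))
    raise TypeError(f"unknown term: {tm!r}")
```

For example, with `x = Var("x", IND)` and `eq = Const("=", fun(IND, fun(IND, BOOL)))`, the term `Abs(x, Comb(Comb(eq, x), x))` (that is, \x. x = x) has type ind -> bool, whereas `Comb(x, x)` is rejected because x is not a function.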
Nuprl [64], which inspired the development of RedPRL, is a proof development system based on Computational Type Theory. Nuprl has evolved significantly over the years and can now handle logics whose inference rules can be declared in a sequent style. It follows the LCF approach, and its type theory includes less common features such as subtypes, very-dependent function types and quotient type constructors. Type checking in Nuprl is undecidable, as subtypes can be defined with arbitrary types, whereas in PVS a type-checking algorithm automates all the simpler type-checking tasks. Moreover, the computation language is untyped, and judgments are not decidable either, because the Poincare principle is assumed not only for intensional equality but also for extensional equality. Users can interact with Nuprl only through structural editors, in which proofs can be edited and viewed as proof trees. Nuprl is used in mathematics formalization, and hundreds of theorems have been proved in the system [11]. Nuprl is also used in protocol verification [41], hardware and software specification and verification [1], [181], reasoning about functional programs [135], the design of reliable networks [175] and the development of demonstrably correct distributed systems [40]. Some work has also been done on integrating Nuprl with other systems. In [12], PVS is integrated into Nuprl to enable users to access PVS from the Nuprl environment. A new semantics is provided in [134] to embed the logic of the HOL prover inside Nuprl. Similarly, Nuprl's meta-theory is formalized in Coq [16], which is later used in the Nuprl proof of the validity of Brouwer's Bar Induction principle [228].
Coq also follows the LCF approach and is probably the most developed ITP after HOL Light. The logic used in Coq is very expressive and can define rich mathematical objects. Moreover, Coq can explicitly manipulate proof-objects obtained from proof-scripts, which makes the integrity of the system dependent on the correctness of the type-checking algorithm. As Coq is based on constructive foundations, it has two basic meta-types (also called sorts): Prop (the type of logical propositions) and Set (the type of other types, e.g., Booleans, naturals, subsets, etc.) [277]. This allows Coq to distinguish between terms that represent proofs and terms that represent programs. A program extractor can also be used to synthesize and extract verified programs (in OCaml, Haskell or Scheme) from their formal specifications [182]. Coq uses two languages for proofs: Gallina (a pure functional programming language) for writing specifications and Ltac (a procedural language) for manipulating the proof process. The main success stories of the system are the formalization of a fully certified C compiler [44], [180], a disjoint-set data structure [69], two-way finite automata [83], a multiplier circuit [220] and a coordination language [185]. The main mathematical formalizations done in Coq include the Feit-Thompson theorem [107], the four-color theorem [106], the three gap theorem [195], real analysis [45] and the theory of algebra [102]. A new approach presented in [256] directly generates provably safe C code from Coq. Recently, Coq has been used in the formal verification of dynamical systems [63], password quality checkers [90], security protocols [218], the QWIRE quantum circuit language [230], complex data structure invariants [146], component connectors [133], [279] and the control function of an inverted pendulum [236]. Moreover, a plug-in (called SMTCoq) is developed in [86] to integrate external SMT solvers into Coq. An IDE for integrating Coq projects into Eclipse has also been developed [88].
Compared to other ITPs, PVS is based on classical simple type theory and works without proof-objects, which allows all kinds of rewriting for numeric as well as symbolic equalities. It offers theory interpretation, dependent types, predicate subtyping, powerful decision procedures and LaTeX support for specifications and proofs, and it is user-friendly thanks to its highly expressive specification language and powerful built-in automated deduction. It is also integrated with outside systems such as a BDD-based model checker, and it serves as a back-end verification tool for computer algebra and code verification systems [191]. During proof construction, PVS builds a graphical proof tree in which the remaining proof obligations are at the leaves. Each node in the tree represents a sequent, and each branch is a (sub)proof goal that follows from its offspring branches by a proof step. The PVS prover is based on the sequent calculus, where each proof goal is a sequent consisting of a sequence of formulas called antecedents and a sequence of formulas called consequents. The type system of PVS is not algorithmically decidable, and theorem proving may be required to establish the type-consistency of a PVS specification; the proof obligations generated for this purpose are called type-correctness conditions (TCCs). PVS is used in hardware and software verification [161], [215], verification of concurrency problems [127], file system verification [128], control systems [266], cryptographic protocols [24], microprocessor verification [249], real-time systems [243], formalization of integral calculus [51] and medical devices [193]. Some recent work in PVS includes the specification of a multi-window user interface [246], formalization of component connectors [206], [208], analysis of distributed cognition systems [192], genetic algorithm operators [205] and cloud services [207]. PVS along with its libraries is translated to the OMDoc/MMT framework in [167]; the proposed translation provides a universal representation format for theorem prover libraries. Similarly, [103] enables PVS to export proof certificates that can be verified externally. Moreover, a denotational semantics for Answer Set Programming (ASP) is encoded in [10], and fundamental properties of ASP are proved with the PVS theorem prover. Some of the differences between ten well-known proof assistants are listed in Table 9.
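TABLE 9: Comparison of proof-assistants

| ITPs | rel | T | dep. T | dec. T | state. R | rpif | LLib |
|---|---|---|---|---|---|---|---|
| ACL2 | - | - | - | - | + | + | + |
| Isabelle | ++ | + | - | + | + | + | + |
| Metamath | + | - | - | - | + | + | + |
| Twelf | + | + | + | + | - | - | - |
| Agda | +++ | + | + | + | - | - | + |
| Mizar | + | + | + | + | + | + | + |
| HOL | ++ | + | - | + | + | - | + |
| Nuprl | ++ | + | + | - | - | - | - |
| Coq | + | + | + | + | + | - | + |
| PVS | - | + | + | - | + | - | + |

rel: reliability, T: typed, dep. T: dependent types, dec. T: decidable types, state. R: statements about R, rpif: readable proof input files, LLib: large library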
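The PVS discussion above describes proof construction as a tree of sequents whose open leaves are the remaining proof obligations. As a rough, hypothetical analogue, the following Python sketch builds such a tree for classical propositional sequents (antecedents |- consequents) using invertible sequent-calculus rules. PVS's logic, rules and proof commands are far richer; the formula encoding and rule labels here are assumptions made for illustration.

```python
# Formulas: atoms are strings; ("~", A), ("&", A, B), ("|", A, B), ("->", A, B).

def prove(ante, cons):
    """Build a proof tree (rule, sequent, children) for the sequent ante |- cons.
    Leaves are closed 'axiom' nodes or 'open' goals that still need a proof."""
    seq = (tuple(ante), tuple(cons))
    if any(isinstance(a, str) and a in cons for a in ante):
        return ("axiom", seq, [])
    # Decompose the first compound antecedent, if any (left rules).
    for i, f in enumerate(ante):
        if isinstance(f, tuple):
            rest = ante[:i] + ante[i + 1:]
            if f[0] == "~":
                return ("~L", seq, [prove(rest, cons + [f[1]])])
            if f[0] == "&":
                return ("&L", seq, [prove(rest + [f[1], f[2]], cons)])
            if f[0] == "|":
                return ("|L", seq, [prove(rest + [f[1]], cons), prove(rest + [f[2]], cons)])
            if f[0] == "->":
                return ("->L", seq, [prove(rest, cons + [f[1]]), prove(rest + [f[2]], cons)])
    # Otherwise decompose the first compound consequent (right rules).
    for i, f in enumerate(cons):
        if isinstance(f, tuple):
            rest = cons[:i] + cons[i + 1:]
            if f[0] == "~":
                return ("~R", seq, [prove(ante + [f[1]], rest)])
            if f[0] == "&":
                return ("&R", seq, [prove(ante, rest + [f[1]]), prove(ante, rest + [f[2]])])
            if f[0] == "|":
                return ("|R", seq, [prove(ante, rest + [f[1], f[2]])])
            if f[0] == "->":
                return ("->R", seq, [prove(ante + [f[1]], rest + [f[2]])])
    return ("open", seq, [])    # only atoms left and no axiom: an unproved leaf


def open_goals(tree):
    """Collect the sequents at open leaves, i.e. the remaining proof obligations."""
    rule, seq, children = tree
    if rule == "open":
        return [seq]
    return [g for child in children for g in open_goals(child)]
```

Peirce's law ((p -> q) -> p) -> p yields a tree with no open leaves, whereas `prove([], [("->", "p", "q")])` leaves the single obligation p |- q, much as an unfinished PVS proof leaves goals at its leaves.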
6.2 ATPs
Isabelle is built around a relatively small core on which many theories are implemented, such as classical FOL, constructive type theory, intuitionistic natural deduction and ZFC. Its meta-logic is based on a fragment of intuitionistic simple type theory that includes basic types and function types, and its terms are those of the typed λ-calculus. Only the type prop (proposition) is defined by the meta-logic, and formulas are terms of type prop. The meta-logic supports implication, universal quantification and equality, and its inference rules are provided in natural-deduction style [191]. For proofs, Isabelle combines HOL for writing specifications with Isar as the language for describing proof procedures. Isabelle/HOL [210] is the most widely used instance of Isabelle nowadays. Isabelle offers a rich infrastructure for high-level proof schemes, and during theory development, structured and unstructured proofs can be mixed freely. It is important to note that both HOL and Isabelle use non-standard proof-objects in the form of tactics for equational reasoning. This makes formalization relatively easy in both systems, but it has the disadvantage that the proof-objects cannot be used to inspect proof details. In principle, both systems can be modified to create and store proof-objects.
HP used Isabelle in the design and analysis of the Runway bus of the HP 9000 line of servers [52]. The L4.verified project at NICTA used Isabelle to prove the functional correctness of the seL4 microkernel [163]. Moreover, Isabelle has been used successfully in proving the correctness of security protocols [50], formalizing the Java programming language [267], proving the soundness and completeness of Java virtual machine code [164] and verifying properties of programming language semantics [165]. A list of research projects that use Isabelle can be found at: isabelle.in.tum.de/community/projects. Recent work in Isabelle includes the verification of Ethereum smart contract bytecode [14], verification of the asymptotic time complexity of imperative programs [278], and formalization of the Akra-Bazzi method [84], the theoretical foundations of deep learning [36], Green's theorem [3], and Markov chains and Markov decision processes with discrete time and discrete state spaces [132].
For the last 15 years, the E theorem prover has consistently participated in CASC in more than one category (it won the SLH division at CASC-27 in 2019). The semantics of E is purely declarative, and its unique internal features are shared terms with cached rewriting, folding feature vector indexing and fingerprint indexing. The unique features visible to users are advanced and highly flexible search heuristics. E's main strengths are the generation of proof-objects, automatic problem analysis and support for the TPTP standard for answers [241]. E has been used successfully in reasoning over large ontologies [229], software verification [231] and certification [81]. One of the main limitations of ATPs is the lack of a mechanism that allows previous proofs to guide the proof search for a new conjecture. In this regard, E is extended in [140] with various new clause selection strategies based on the similarity of a clause to the conjecture. The use of watchlists (also known as hint lists) in large E theories is explored in [105] to improve proof automation.
Escher Verifier (the successor of Perfect Developer [71]) is based on the Verified Design-by-Contract paradigm [72], which is inspired by Hoare logic and weakest preconditions. It performs static analysis and formal verification of C programs by checking for out-of-bounds array indexing, arithmetic overflow, null-pointer dereferencing and other undefined behavior. It extends the C language with additional keywords and constructs that are needed to express program specifications and to strengthen the C type system [53]. A theorem prover based on term rewriting and FOL is used for verification. A unique feature of the tool is that it provides hints in cases where the prover is unable to verify the code automatically [114]. Escher Verifier has been used in the verification of C programs [74], compilers [73] and formal analysis of web applications [75].
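Since the paradigm behind Escher Verifier builds on Hoare logic and weakest preconditions, the following minimal Python sketch shows the textbook weakest-precondition rule for straight-line assignments, wp(x := e, Q) = Q[e/x], with sequencing handled by processing the assignments from last to first. It assumes a tuple encoding of expressions and is a generic illustration, not Escher Verifier's implementation.

```python
import operator

# Expressions: variable names (str), integer constants, or (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "<": operator.lt, "<=": operator.le, "==": operator.eq}


def subst(e, var, repl):
    """e[repl/var]: replace every occurrence of variable var in e by repl."""
    if isinstance(e, str):
        return repl if e == var else e
    if isinstance(e, int):
        return e
    op, a, b = e
    return (op, subst(a, var, repl), subst(b, var, repl))


def wp(program, post):
    """Weakest precondition of a list of assignments [(var, expr), ...]:
    wp(x := e, Q) = Q[e/x] and wp(S1; S2, Q) = wp(S1, wp(S2, Q))."""
    for var, expr in reversed(program):
        post = subst(post, var, expr)
    return post


def evaluate(e, state):
    """Evaluate an expression in a state mapping variable names to integers."""
    if isinstance(e, str):
        return state[e]
    if isinstance(e, int):
        return e
    op, a, b = e
    return OPS[op](evaluate(a, state), evaluate(b, state))
```

For `x := x + 1; y := x * 2` and the postcondition 4 < y, the computed precondition is 4 < (x + 1) * 2, which holds for x = 2 but not for x = 1. A verifier of this kind would then ask its prover to show that the stated precondition implies this formula.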
Vampire [235] is an ATP for FOL based on the equational superposition calculus, and it is one of the best ATPs at CASC. Unique features of Vampire include the generation of interpolants and the implementation of a limited resource strategy (LRS). Moreover, Vampire implements symbol elimination, which is used to automatically find first-order program properties. It has a special mode for working with very large knowledge bases and can answer queries to them according to TPTP standards. On a multi-core system, Vampire can perform several proof attempts in parallel [173]. The strength of ATPs such as Vampire, E and SPASS in proving theorems from the MML is presented in [262]. Some work on adding arithmetic to Vampire is done in [171]. Vampire is used in [111] to automate soundness proofs of type systems, and it is also used for program analysis and for proving properties of loops with arrays [172]. Cheap SAT solving, through the Avatar architecture [233], plays an important role in the success of Vampire. In general, Vampire is well suited to the domain of type soundness proofs. However, its performance depends heavily on the size of the chosen axiom set and on the concrete form of the individual axioms.
Prover9 [196], the successor of the Otter prover, is a resolution-based automated prover for equational logic and FOL. The main strength of Prover9 is that it is paired with Mace4: the user gives formulas, Prover9 attempts to find a proof, and if no proof is found, Mace4 looks for a counterexample. Similarly, Prooftrans can be used to transform Prover9 proofs into more detailed proofs, simplify justifications, renumber and expand proof steps, produce proofs in XML format, generate hints to guide subsequent searches, and produce proofs for input to proof checkers such as IVY. Prover9 has been used in the analysis of cryptographic-key assignment schemes [239] and in the verification of the Alloy specification language [187] and of access control policies [238]. Moreover, some tasks in combinatorics on words [131] and a geometric procedure [216] have been formalized in Prover9, along with proofs of theorems in Tarskian geometry [264]. Similarly, iProver [170] is based on an instantiation framework for FOL called Inst-Gen [98]. iProver combines first-order reasoning with ground reasoning with the help of the SAT solver MiniSat [85]. The main strengths of iProver are reasoning with large theories, a predicate elimination procedure as a preprocessing technique, EPR-based k-induction with counterexamples, model representation with first-order definitions in term algebra, and proof extraction for resolution as well as instantiation. More details on iProver and other ATPs can be found in Appendix A.
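Resolution-based provers in the Otter/Prover9 tradition organize proof search as a given-clause loop: repeatedly pick an unprocessed clause, move it to the processed set, and add its resolvents with processed clauses, until the empty clause (a contradiction) appears. The following Python sketch assumes ground (propositional) clauses and a smallest-clause-first selection heuristic; real first-order provers add unification, ordering restrictions, subsumption and clause weighting, none of which is shown here.

```python
# Clauses are frozensets of non-zero integers: 3 means atom 3, -3 its negation.

def resolvents(c1, c2):
    """All non-tautological binary resolvents of two ground clauses."""
    out = []
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            if not any(-l in r for l in r):   # drop tautologies such as p | ~p
                out.append(frozenset(r))
    return out


def refute(clauses, max_steps=10_000):
    """Given-clause saturation: True if the empty clause is derived
    (the clause set is unsatisfiable), False if it saturates without it."""
    unprocessed = [frozenset(c) for c in clauses]
    processed = set()
    for _ in range(max_steps):
        if not unprocessed:
            return False
        unprocessed.sort(key=len)          # heuristic: smallest clause first
        given = unprocessed.pop(0)
        if not given:
            return True                    # empty clause: contradiction found
        if given in processed:
            continue
        processed.add(given)
        for other in list(processed):      # resolve the given clause with processed ones
            for r in resolvents(given, other):
                if r not in processed:
                    unprocessed.append(r)
    raise RuntimeError("step limit reached")
```

To prove q from p and p -> q, the negated goal ~q is added and the set is refuted: `refute([{1}, {-1, 2}, {-2}])` returns True, while without ~q the set is satisfiable and the loop saturates, returning False.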
Table 10 compares, on selected features, the well-known ATPs that perform best at CASC. SonTPTP (Systems on TPTP) is an online interface for ATPs that users can use to run ATPs on the TPTP (Thousands of Problems for Theorem Provers) library or on their own problems written in the TPTP language [252]. It is important to point out that MaLARea [261] is not an ATP. It is a simple meta-system that combines several ATPs (E, SPASS, Vampire, etc.) with a machine-learning-based component (the SNoW system). MaLARea interleaves the ATPs with learning: it first runs them (in cycles) on the problems and then learns from the successful proofs. The learned information is used to limit the set of axioms provided to the ATPs in the next cycle. In CASC-J9 (2018), MaLARea came first in the LTB division.
ATPs can be integrated with ITPs (through hammers [43]) to automate proofs in the interactive proof development process. Hammers use ATPs to automatically find proofs for user-defined proof goals. They combine learning from previous proofs, translation of the goals into the logics of the ATPs, and reconstruction of the proofs found for the goals.
TABLE 10: Comparison of famous ATPs

| ATP | Type | ILang | Lib | SS | Web service |
|---|---|---|---|---|---|
| E | SB-FOP | C | - | + | SonTPTP |
| Vampire | SB-FOP | C++ | + | - | SonTPTP |
| Prover9/Otter | RB-FOP | C | + | - | SonTPTP |
| SPASS | SB-FOP | Java/C | - | + | SonTPTP |
| Satallax | TB-HOP | OCaml | - | + | SonTPTP |
| iProver | IB-FOP | OCaml | - | + | SonTPTP |
| LEO-II | RB-HOP | OCaml | + | + | SonTPTP |
| MaLARea | MS-ATP | Perl | + | - | - |

ILang: implementation language, SS: standalone system, FOP: first-order prover, SB-FOP: superposition-based FOP, RB-FOP: resolution-based FOP, TB-FOP: tableau-based FOP, TB-HOP: tableau-based higher-order prover, IB-FOP: instantiation-based FOP, RB-HOP: resolution-based HOP, SonTPTP: Systems on TPTP, MS-ATP: meta-system for ATP.
Similarly, SAT/SMT solvers can be used in ITPs by first translating the goals into the fragment supported by a SAT/SMT solver and passing them to it. The solver then solves the translated goal without human intervention or guidance. Table 11 lists some of the hammers and SAT/SMT solvers that have been developed for and integrated with ITPs. Waldmeister [129] is integrated with Agda in [94], but no SAT/SMT solver has yet been integrated with Agda. Moreover, PVS employs the Yices SMT solver as an oracle [237], but it has not yet been integrated with any ATP. Some newer theorem provers that aim to fill the gap between interactive and automated theorem proving, such as Lean [22], offer APIs to access features of SMT solvers (CVC4, Z3) and ATPs.
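Once a goal has been translated, a SAT solver decides it by checking whether the negated goal is unsatisfiable. The following Python sketch assumes the negated goal is already in conjunctive normal form (clauses as lists of signed integers) and implements plain DPLL with unit propagation and branching; production solvers such as MiniSat add conflict-driven clause learning and many other refinements that are not shown here.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict: variable -> bool) or None if unsat.
    Clauses are lists of non-zero ints; -v denotes the negation of variable v."""
    assignment = dict(assignment or {})
    # Unit propagation: simplify under the assignment until no unit clause remains.
    while True:
        simplified = []
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                       # clause already satisfied
            rest = [l for l in clause if abs(l) not in assignment]
            if not rest:
                return None                    # every literal false: conflict
            simplified.append(rest)
        clauses = simplified
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0       # a unit clause forces its literal
    if not clauses:
        return assignment                      # all clauses satisfied
    # Branch on the first unassigned variable, trying True and then False.
    var = abs(clauses[0][0])
    for value in (True, False):
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

For example, the negation of the goal (p & (p -> q)) -> q gives the clauses [[1], [-1, 2], [-2]]; `dpll` returns None for them, so the goal is valid. For a satisfiable set it returns a model instead, which an integration could report back as a counterexample.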
TABLE 11: ATPs and SAT/SMT solvers for ITPs

| ITP | Hammers | SAT/SMT solvers |
|---|---|---|
| Isabelle/HOL | Sledgehammer [223] | Yices in Isabelle/HOL [87] |
| HOL Light/HOL4 | HOLyHammer [152] | SMT solvers for HOL4 [269] |
| Mizar | MizAR [153] | MiniSAT for Mizar [203] |
| Coq | Hammer for Coq [76] | SMTCoq [86] |
| ACL2 | ATPs for ACL2 [144] | Smtlink for ACL2 [224] |
6.3 Some Future Directions
Research is active in both ITPs and ATPs. Despite great progress over the last three decades, general-purpose FOL-based theorem provers are still unable to directly determine the satisfiability of a first-order formula. The SMT problem asks whether a first-order formula is satisfiable in some logical (background) theory. One well-known SMT solver is CVC4 [32]. Because FOL is undecidable, SMT solvers may not terminate on some problems. In such cases, users would like to know why the solver failed. Developing tools that allow developers and users to help an SMT solver finish some proofs is an interesting research area.
Currently, most SMT solvers report “unknown” when they are unable to prove the unsatisfiability of quantified formulas [234]. A main research direction is to enable SMT solvers to return countermodels when they fail to prove the unsatisfiability of quantified formulas over domains ranging from integers and inductive datatypes to free sorts. Popular SMT solvers (CVC4, Yices, Z3, etc.) generally work sequentially. Another research area is parallelizing SMT solving to make better use of multi-core hardware; PZ3 [60], a parallel solver based on Z3, is one example of this kind of parallelization. A further important area is the development of tools that integrate SMT solvers into ITPs. Such tools raise the level of automation by offering safe tactics/strategies that discharge proof goals automatically with the help of external solvers. The SMT solvers for ITPs listed in Table 11 are examples of this.
One of the main challenges for ATPs is reasoning with large ontologies, which are increasingly dominant as large knowledge bases. Some techniques for reasoning with large theories are based on different axiom selection methods [130], [253]. Machine learning is also used for axiom selection, where knowledge from previous proofs is used to prove a new conjecture [261], [263]. An abstraction-refinement framework, in which the axiom selection and reasoning phases are interleaved, can also be used for reasoning with large ontologies, as shown recently in [126].
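Trigger-based axiom selection, the idea behind SInE [130], can be sketched in a simplified form. The sketch assumes that each axiom has already been reduced to the set of non-logical symbols it mentions. A symbol "triggers" an axiom if it is among that axiom's least frequent symbols, with "least frequent" relaxed by a tolerance factor. Selection then spreads from the conjecture's symbols through the triggered axioms, up to a depth limit. The function name, its parameters and the toy signature are hypothetical, and the refinements of the published algorithm are omitted.

```python
from collections import Counter
from typing import Dict, Set


def sine_select(axioms: Dict[str, Set[str]], conjecture: Set[str],
                tolerance: float = 1.0, max_depth: int = 3) -> Set[str]:
    """Select axioms reachable from the conjecture's symbols through triggers.

    `axioms` maps each axiom name to the set of non-logical symbols it uses."""
    # occ[s]: the number of axioms in which symbol s occurs.
    occ = Counter(s for symbols in axioms.values() for s in symbols)

    # Symbol s triggers axiom a if s is among a's rarest symbols, where
    # "rarest" is relaxed by the tolerance factor (tolerance >= 1).
    triggers: Dict[str, Set[str]] = {}
    for name, symbols in axioms.items():
        if not symbols:
            continue
        rarest = min(occ[s] for s in symbols)
        for s in symbols:
            if occ[s] <= tolerance * rarest:
                triggers.setdefault(s, set()).add(name)

    selected: Set[str] = set()
    frontier = set(conjecture)  # symbols triggered at the current depth
    seen = set(conjecture)
    for _ in range(max_depth):
        new_axioms: Set[str] = set()
        for s in frontier:
            new_axioms |= triggers.get(s, set()) - selected
        if not new_axioms:
            break
        selected |= new_axioms
        # Symbols of the newly selected axioms are triggered at the next depth.
        frontier = {s for a in new_axioms for s in axioms[a]} - seen
        seen |= frontier
    return selected


# Hypothetical toy signature: two small "theories" that share no symbols.
example_axioms = {
    "a1": {"subset", "member"},
    "a2": {"member", "union"},
    "a3": {"union", "empty"},
    "a4": {"prime", "nat"},
    "a5": {"nat", "succ"},
}
```

With the default tolerance, the conjecture symbol `subset` selects a1 and then, through `member`, a2. Axiom a3 is left out because `union` is not its rarest symbol, but raising `tolerance` to 2.0 admits it. The unrelated axioms a4 and a5 are never selected.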
Theorem provers have several limitations: they are often not fast enough, their logic is inconvenient as a scripting language, and the majority of them do not support graphics and visualization tools. Similarly, ITPs require heavy interaction between the user and the proof assistant, which consumes a lot of time. IDEs for theorem provers, especially ITPs, can substantially facilitate the creation of large proofs. However, very few provers are equipped with full-fledged IDEs. Future work in this direction includes: (i) making the provers faster and more efficient, and (ii) developing IDEs and integrating them with different provers. This would make these tools more acceptable in the industrial sector. In ITPs, users apply tactics that reduce a goal to simpler and smaller subgoals. Another interesting area is the development of strategies/tactics using tactic languages, such as HITAC [21] and Ltac [80], which allow users to elaborate proof strategies and combine tactics.
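The Lean 4 example below shows tactics reducing a goal to simpler subgoals and being combined into a strategy. Lean is used only because it is compact; it is not one of the tactic languages cited above.

```lean
-- `constructor` splits the goal `p ∧ q` into two subgoals, `p` and `q`,
-- each of which is then closed by `exact`.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor
  · exact hp
  · exact hq

-- The combinator `<;>` runs `assumption` on every subgoal produced by
-- `constructor`, combining two tactics into a single proof strategy.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor <;> assumption
```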
ITPs also lack interoperability with other proof assistants and related tools, which means that tool support cannot easily be shared between ITPs. Translating ITPs into a universal format is needed to avoid duplicated effort in developing systems, their libraries and supporting peripheral tools. Similarly, some work [99], [100] has been done on integrating model checking with theorem proving. However, this integration is difficult, as it involves mapping between the models and the mathematics involved in the analysis of the systems.
The vision of the QED manifesto [47] is to develop a universal, computer-based database of all mathematical knowledge. This knowledge would be formalized in logic and supported by proofs that can be validated mechanically. As shown in previous sections, theorem provers are diverse, with radically different foundations. On the one hand, using various provers offers a diverse experience, which helps one better understand the strengths and weaknesses of each. On the other hand, the effort and overhead needed to learn even one prover effectively lead researchers to stick to a single system, which results in duplication of similar work. A way of sharing work and knowledge among provers would not only be appealing but would also make provers more powerful and practical. One feasible approach is to import theorems from one prover into another.
Theorems are transported between systems by translating their libraries, as done in [158], [174], [213]. The main challenge in sharing theorems is ensuring a meaningful semantic match between the provers, meaning that their logics, definitions, types, treatment of functions, etc. are compatible with each other. Isabelle's Sledgehammer [223] illustrates one way of integrating different automation tools. However, Sledgehammer has a number of limitations, such as unsound translation, a primitive relevance filter and low performance on higher-order problems. Integrating ITPs with ATPs still requires much research into interfacing approaches. One of the main challenges is a sound and reliable translation between different languages and logics. Another major concern is interpreting ATP output back in the ITP environment.
ITPs have large corpora of computer-understandable reasoning knowledge [42], [121] in the form of libraries. These corpora can play an important role in artificial-intelligence-based methods such as concept matching, structure formation and theory exploration. Rapid progress in machine learning and data mining has made it possible to apply these learning techniques to such corpora. Recent works [101], [105], [147], [154], [209] indicate uses in guiding proof search, in proof automation and in developing proof tactics/strategies. Such machine learning systems can also be combined with ATPs on large ITP libraries to build AI-based feedback loops. Another interesting area is using evolutionary algorithms [26] in ITPs to find and evolve proofs. Effective search mechanisms for formal proofs are still lacking, and we believe that evolutionary algorithms are well suited to this task because they are designed for search and optimization problems. Moreover, combining evolutionary/heuristic algorithms (as the program (proof) generator) with ITPs (as the proof verifier) to find formal proofs of new conjectures automatically is also worth pursuing. Some initial work can be found in [136], [276], where a genetic algorithm (GA) is used with Coq to find formal proofs of theorems automatically.
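The generate-and-verify loop suggested above can be sketched on a toy problem: an evolutionary algorithm proposes candidate proofs, and a checker accepts only valid ones. Everything here is a hypothetical stand-in. The "calculus" rewrites the number 1 using the rules `inc`, `dbl` and `nop`, and `verify` plays the role of the ITP kernel. The sketch is not the method of [136] or [276]. It only shows the division of labour between a heuristic generator and a sound checker.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical toy calculus: a "proof" is a fixed-length sequence of rule names
# that must rewrite the axiom value 1 into the goal value.
RULES: Dict[str, Callable[[int], int]] = {
    "inc": lambda n: n + 1,  # n -> n + 1
    "dbl": lambda n: 2 * n,  # n -> 2n
    "nop": lambda n: n,      # no-op, so the effective proof length can vary
}
AXIOM = 1


def run(proof: List[str]) -> int:
    n = AXIOM
    for rule in proof:
        n = RULES[rule](n)
    return n


def verify(proof: List[str], goal: int) -> bool:
    """Stand-in for the ITP kernel: replays the proof and checks the result."""
    return run(proof) == goal


def fitness(proof: List[str], goal: int) -> int:
    """Higher is better; 0 exactly when the proof verifies."""
    return -abs(run(proof) - goal)


def evolve(goal: int, length: int = 6, pop_size: int = 60, generations: int = 200,
           mutation_rate: float = 0.2, seed: int = 0) -> Optional[List[str]]:
    """Genetic search: elitist selection, one-point crossover, per-gene mutation."""
    rng = random.Random(seed)
    names = list(RULES)
    pop = [[rng.choice(names) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, goal), reverse=True)
        if verify(pop[0], goal):
            return pop[0]  # only verified candidates are reported as proofs
        elite = pop[: pop_size // 5]  # the best 20% survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [rng.choice(names) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda p: fitness(p, goal))
    return best if verify(best, goal) else None
```

The fitness function only guides the search; soundness comes entirely from `verify`. A poorly tuned GA can therefore fail to find a proof, in which case `evolve` returns None, but it can never return an invalid one.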
A single machine-learning-based tool was developed in [151] that can be used with every ITP on one condition. Both the language and its corresponding library must be available in a universal format, so that they can easily be fed into the common selection algorithm. However, such a universal format is generally infeasible and expensive for many applications, because it is very hard to build a universal format that offers a good trade-off between universality and simplicity. Even if such a format is available, implementing the export of a library into it is laborious.
One such universal format for formal knowledge is the OMDoc/MMT framework [166], which has been used to translate the Mizar [138], HOL Light [150] and PVS [167] libraries. This work has made these libraries accessible to a wide range of OMDoc-based tools. It would be interesting to translate the logics and libraries of other well-known ITPs into OMDoc/MMT for formal mathematics and knowledge management. Translating ITPs into one universal format would also make machine-learning-based premise selection for provers much easier. Moreover, flexible alignments [149] between libraries can guide the developers of different provers in approximately translating content across libraries and in reusing notations. For example, one prover's content could be displayed in a form that looks familiar to users of another prover.
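Once libraries share a common format, a single learning-based premise selector could serve any prover. Below is a minimal sketch of one simple approach, k-nearest-neighbour ranking. Proved theorems whose symbols overlap most with the new goal vote for the premises their proofs used, with weight equal to their Jaccard similarity to the goal. The library format, the function names and the example theorems are hypothetical, and this is not the algorithm of [151].

```python
from typing import Dict, List, Set, Tuple

# Hypothetical library format: theorem name -> (symbols in its statement,
# premises used in its proof). A universal format would supply this for any ITP.
Library = Dict[str, Tuple[Set[str], Set[str]]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_premises(library: Library, goal_symbols: Set[str],
                 k: int = 2, top: int = 3) -> List[str]:
    """Rank premises used by the k proved theorems most similar to the goal."""
    neighbours = sorted(
        library.items(),
        key=lambda item: (-jaccard(goal_symbols, item[1][0]), item[0]),
    )[:k]
    scores: Dict[str, float] = {}
    for _, (symbols, premises) in neighbours:
        weight = jaccard(goal_symbols, symbols)
        for premise in premises:
            # A premise used by several similar theorems accumulates more weight.
            scores[premise] = scores.get(premise, 0.0) + weight
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top]


example_library: Library = {
    "add_comm": ({"add", "nat"}, {"add_zero", "add_succ"}),
    "mul_comm": ({"mul", "nat"}, {"mul_zero", "mul_succ", "add_comm"}),
    "append_nil": ({"append", "list"}, {"list_induct"}),
}
```

For a goal mentioning `add`, `nat` and `zero`, the two nearest theorems are `add_comm` and `mul_comm`. The premises `add_succ` and `add_zero` therefore rank first, followed by `add_comm`.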
7 CONCLUSION
Mechanical reasoning systems have been actively developed since the birth of modern logic. These state-of-the-art tools are now used to prove complicated mathematical theorems and to verify large computer systems. This work presents a comprehensive survey of mechanical reasoning systems (both ITPs and ATPs). The main characteristics, strengths, differences and application areas of the systems are investigated, and some future research directions based on recent work are discussed. In summary, we find that formalization with theorem provers has not yet become mature enough to be adopted in the working style of the broader mathematical community. It still needs: (i) better library support for proof automation, (ii) better ways of sharing knowledge between proof systems, (iii) incorporation and verification of computation, (iv) better means to store and search background facts, (v) improved interfaces and integration among ATPs, ITPs and SAT/SMT solvers, and (vi) better support for machine learning and data mining techniques for proof guidance and automation. A wide research community is currently working on these problems. It is optimistically estimated that mechanically formalized and verified mathematics will be commonplace by the middle of this century.
Some of the judgments we have made about theorem provers in this survey may be subjective. We hope that this survey will provide a quick and easy guide to the work of theorem provers for interested users, including people without a strong mathematical background. We apologize in advance to the respective developers or owners for any misrepresentation of their systems. We would be happy to be informed via e-mail about any lapses: [email protected].
ACKNOWLEDGMENTS
This work has been supported by the National Natural Science Foundation of China under grant nos. 61772038, 61532019 and 61272160, and by the Guangdong Science and Technology Department (grant no. 2018B010107004).
REFERENCES
[1] M. Aagaard and M. Leeser. Verifying a logic synthesis tool in Nuprl: A case study in software verification. In Proceedings of International Conference on Computer Aided Verification, pages 69–81, 1992.
[2] A. T. Abdel-Hamid, S. Tahar, and J. Harrison. Enabling hardware verification through design changes. In Proceedings of the International Conference on Formal Engineering Methods, pages 459–470, 2002.
[3] M. Abdulaziz and L. C. Paulson. An Isabelle/HOL formalisation of Green’s theorem. In Proceedings of International Conference on Interactive Theorem Proving, pages 3–19, 2016.
[4] J. R. Abrial. The B-book: Assigning programs to meanings. Cambridge University Press, 2005.
[5] J. R. Abrial. Formal methods: Theory becoming practice. Journal of Universal Computer Science, 13(5):619–628, 2007.
[6] M. Acewicz and K. Pak. Formalization of Pell’s equations in the Mizar system. In Proceedings of the Federated Conference on Computer Science and Information Systems, pages 223–226, 2017.
[7] M. Adams. Introducing HOL Zero. In Proceedings of 3rd International Congress on Mathematical Software, pages 142–143. Springer, 2010.
[8] S. Agerholm, I. Beylin, and P. Dybjer. A comparison of HOL and ALF formalizations of a categorical coherence theorem. In Proceedings of 9th International Conference on Theorem Proving in Higher Order Logics, pages 17–32, 1996.
[9] S. Agerholm and M. Gordon. Experiments with ZF set theory in HOL and Isabelle. In Proceedings of 8th International Conference on Higher Order Logic Theorem Proving and its Applications, pages 32–45. Springer, 1995.
[10] F. Aguado, P. Ascariz, P. Cabalar, G. Perez, and C. Vidal. Verification for ASP denotational semantics: A case study using the PVS theorem prover. Logic Journal of the IGPL, 25(2):195–213, 2015.
[11] S. Allen, R. Constable, R. Eaton, C. Kreitz, and L. Lorigo. The Nuprl open logical environment. In Proceedings of 17th International Conference on Automated Deduction, pages 170–176, 2000.
[12] S. F. Allen, M. Bickford, R. Constable, R. Eaton, and C. Kreitz. A Nuprl–PVS connection: Integrating libraries of formal mathematics. Technical Report TR2003-1889, Cornell University, USA, 2003.
[13] S. F. Allen, M. Bickford, R. L. Constable, R. Eaton, C. Kreitz, L. Lorigo, and E. Moran. Innovations in computational type theory using Nuprl. Journal of Applied Logic, 4(4):428–469, 2006.
[14] S. Amani, M. Begel, M. Bortin, and M. Staples. Towards verifying Ethereum smart contract bytecode in Isabelle/HOL. In Proceedings of the 7th ACM SIGPLAN International Conference on Certified Programs and Proofs, pages 66–77, 2018.
[15] A. Bove, P. Dybjer, and U. Norell. A brief overview of Agda – A functional language with dependent types. In Proceedings of 22nd International Conference on Theorem Proving in Higher Order Logics, pages 73–78, 2009.
[16] A. Anand and V. Rahli. Towards a formally verified proof assistant. In Proceedings of International Conference on Interactive Theorem Proving, pages 27–44, 2014.
[17] S. Antoy, M. Hanus, and S. Libby. Proving non-deterministic computations in Agda. In Proceedings of WLP’15/’16/WFLP’16, pages 180–195, 2016.
[18] A. W. Appel. Foundational proof-carrying code. In Proceedings of 16th Annual Symposium on Logic in Computer Science, pages 247–256, 2001.
[19] R. C. Armstrong, R. J. Punnoose, M. H. Wong, and J. R. Mayo. Survey of existing tools for formal verification. Technical report, Sandia National Laboratories, USA, 2014.
[20] R. Arthan. ProofPower–SLRP user guide. Technical report, Lemma 1 Limited, 2005.
[21] D. Aspinall, E. Denney, and C. Luth. A tactic language for Hiproofs. In Proceedings of International Conference on Intelligent Computer Mathematics, pages 339–354, 2008.
[22] J. Avigad, L. de Moura, and S. Kong. Theorem proving in Lean, release 3.4.0, 2019.
[23] J. Avigad and J. Harrison. Formally verified mathematics. Communications of the ACM, 57(4):66–75, 2014.
[24] M. Ayala-Rincon and Y. S. Rego. Formalization in PVS of balancing properties necessary for proving security of the Dolev-Yao cascade protocol model. Journal of Formalized Reasoning, 6(1):31–61, 2013.
[25] A. Azurat and W. Prasetya. A survey on embedding programming logics in a theorem prover. Technical report UU-CS-2002-007, Utrecht University, Netherlands, 2002.
[26] T. Back. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996.
[27] F. Badeau and A. Amelot. Using B as a high level programming language in an industrial project: Roissy VAL. In Proceedings of the International Conference of B and Z Users, pages 334–354, 2005.
[28] C. Baier, J. P. Katoen, and K. G. Larsen. Principles of model checking. MIT Press, 2008.
[29] H. Barendregt. The impact of the lambda calculus in logic and computer science. Bulletin of Symbolic Logic, 3(2):181–215, 1997.
[30] H. Barendregt and E. Barendsen. Autarkic computations in formal proofs. Journal of Automated Reasoning, 28(3):321–336, 2002.
[31] H. Barendregt and H. Geuvers. Proof-assistants using dependent type systems. In Handbook of Automated Reasoning (in 2 volumes), pages 1149–1238. 2001.
[32] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds, and C. Tinelli. CVC4. In Proceedings of 23rd International Conference on Computer Aided Verification, pages 171–177, 2011.
[33] C. Barrett, R. Nieuwenhuis, A. Oliveras, and C. Tinelli. Splitting on demand in SAT modulo theories. In Proceedings of 13th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning, pages 512–526, 2006.
[34] C. W. Barrett, R. Sebastiani, S. A. Seshia, and C. Tinelli. Satisfiability modulo theories. In Handbook of Satisfiability, pages 825–885. 2009.
[35] D. Basin and M. Kaufmann. The Boyer-Moore prover and Nuprl: An experimental comparison. In Logical Frameworks, 1991.
[36] A. Bentkamp, J. C. Blanchette, and D. Klakow. A formal proof of the expressiveness of deep learning. In Proceedings of 8th International Conference on Interactive Theorem Proving, pages 46–64, 2017.
[37] J. P. Bernardy and P. Jansson. Certified context-free parsing: A formalisation of Valiant’s algorithm in Agda. Logical Methods in Computer Science, 12, 2016.
[38] Y. Bertot and P. Casteran. Interactive theorem proving and program development – Coq’Art: The calculus of inductive constructions. Springer, 2013.
[39] W. Bibel. Research perspectives for logic and deduction. In Proceedings of Reasoning, Action and Interaction in AI Theories and Systems, pages 25–43, 2006.
[40] M. Bickford and D. Guaspari. A programming logic for distributed systems. Technical report, ATC-NY, 2005.
[41] M. Bickford, C. Kreitz, R. van Renesse, and X. Liu. Proving hybrid protocols correct. In Proceedings of 14th International Conference on Theorem Proving in Higher Order Logics, pages 105–120, 2001.
[42] J. C. Blanchette, M. P. L. Haslbeck, D. Matichuk, and T. Nipkow. Mining the archive of formal proofs. In Proceedings of International Conference on Intelligent Computer Mathematics, pages 3–17, 2015.
[43] J. C. Blanchette, C. Kaliszyk, L. C. Paulson, and J. Urban. Hammering towards QED. Journal of Formalized Reasoning, 9(1):101–148, 2016.
[44] S. Blazy, Z. Dargaye, and X. Leroy. Formal verification of a C compiler front-end. In Proceedings of 2006 International Symposium on Formal Methods, pages 460–475, 2006.
[45] S. Boldo, C. Lelay, and G. Melquiond. Coquelicot: A user-friendly library of real analysis for Coq. Technical report, INRIA, France, 2013.
[46] S. Boldo, C. Lelay, and G. Melquiond. Formalization of real analysis: A survey of proof assistants and libraries. Mathematical Structures in Computer Science, 26(7):1196–1233, 2016.
[47] R. Boyer. The QED manifesto. In Proceedings of 12th International Conference on Automated Deduction, pages 238–251, 1994.
[48] P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software, 80(4):571–583, 2007.
[49] C. Brown. Satallax: An automatic higher-order prover. In Proceedings of 6th International Joint Conference on Automated Reasoning, pages 111–117. Springer, 2012.
[50] D. F. Butin. Inductive analysis of security protocols in Isabelle/HOL with applications to electronic voting. PhD thesis, 2012.
[51] R. W. Butler. Formalization of the integral calculus in the PVS theorem prover. Journal of Formalized Reasoning, 2(1):1–26, 2009.
[52] J. Camilleri. A hybrid approach to verifying liveness in a symmetric multiprocessor. In 10th International Conference on Theorem Proving in Higher Order Logics, pages 49–67, 1997.
[53] J. Carlton and D. Crocker. Escher Verification Studio Perfect Developer and Escher C Verifier. Industrial Use of Formal Methods: Formal Verification, pages 155–193, 2013.
[54] M. Carneiro. Models for Metamath. Technical report, The Ohio State University, USA, 2015.
[55] M. Carneiro. Formalization of the prime number theorem and Dirichlet’s theorem. In Joint Proceedings of the FM4M, MathUI, and ThEdu Workshops, pages 10–13, 2016.
[56] M. M. Carneiro. Conversion of HOL Light proofs into Metamath. Journal of Formalized Reasoning, 9:187–200, 2016.
[57] L. Casset. Development of an embedded verifier for Java Card byte code using formal methods. In Proceedings of Formal Methods Europe, pages 20–309, 2002.
[58] R. N. Charette. Automated to death. IEEE Spectrum, 15, 2009.
[59] C. K. Chau, W. A. Hunt, M. Roncken, and I. E. Sutherland. A framework for asynchronous circuit modeling and verification in ACL2. In Proceedings of 13th International Haifa Conference on Hardware and Software: Verification and Testing, pages 3–18, 2017.
[60] X. Cheng, M. Zhou, X. Song, M. Gu, and J. Sun. Parallelizing SMT solving: Lazy decomposition and conciliation. Artificial Intelligence, 257:127–157, 2018.
[61] S. J. Chou. Geo prover – A geometry theorem prover developed at UT. In Proceedings of the 8th International Conference on Automated Deduction, pages 679–680. Springer, 1986.
[62] E. M. Clarke, W. Klieber, M. Novacek, and P. Zuliani. Model checking and the state explosion problem. In Tools for Practical Software Verification, pages 1–30. 2012.
[63] C. Cohen and D. Rouhling. A formal proof in Coq of LaSalle’s invariance principle. In Proceedings of International Conference on Interactive Theorem Proving, pages 148–163, 2017.
[64] R. L. Constable, S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, P. Panangaden, J. T. Sasaki, and S. F. Smith. Implementing Mathematics with the Nuprl proof development system. Prentice Hall, 1986.
[65] S. A. Cook. The complexity of theorem-proving procedures. In Proceedings of the 3rd Annual Symposium on Theory of Computing, pages 151–158. ACM, 1971.
[66] E. Copello, A. Tasistro, and B. Bianchi. Case of (quite) painless dependently typed programming: Fully certified merge sort in Agda. In Proceedings of Brazilian Symposium on Programming Languages, pages 62–76, 2014.
[67] T. Coquand and G. Huet. The calculus of constructions. Information and Computation, 76(2–3):95–120, 1988.
[68] P. Corbineau. A declarative language for the Coq proof assistant. In Proceedings of International Workshop on Types for Proofs and Programs, pages 69–84, 2007.
[69] K. Crary. Toward a foundational typed assembly language. In Proceedings of International Symposium on the Principles of Programming Languages, pages 198–212, 2003.
[70] K. Crary and S. Sarkar. Foundational certified code in the Twelf metalogical framework. ACM Transactions on Computational Logic, 9(3):1–26, 2008.
[71] D. Crocker. Perfect Developer: A tool for object-oriented formal specification and refinement. In Tools Exhibition Notes at Formal Methods Europe, 2003.
[72] D. Crocker. Safe object-oriented software: The verified design-by-contract paradigm. In Proceedings of 12th Safety-Critical Systems Symposium, pages 19–41, 2004.
[73] D. Crocker. Verifying compilers for financial applications. In Grand Challenge 6 workshop of Formal Methods, 2005.
[74] D. Crocker and J. Carlton. Verification of C programs using automated reasoning. In Proceedings of 5th International Conference on Software Engineering and Formal Methods, pages 1–8, 2007.
[75] D. Crocker and J. H. Warren. Generating commercial web applications from precise requirements and formal specifications. In 1st International Workshop on Automated Specification and Verification of Web Sites, pages 1–6, 2005.
[76] L. Czajka and C. Kaliszyk. Hammer for Coq: Automation for dependent type theory. Journal of Automated Reasoning, 61(1-4):423–453, 2018.
[77] E. Davis and G. Marcus. The scope and limits of simulation in automated reasoning. Artificial Intelligence, 233:60–72, 2016.
[78] M. Davis. The prehistory and early history of automated deduction. In Automation of Reasoning 1: Classical Papers on Computational Logic 1957-1966, pages 1–28. 1983.
[79] M. Davis, G. Logemann, and D. Loveland. A machine program for theorem-proving. Communications of the ACM, 5(7):394–397, 1962.
[80] D. Delahaye. A tactic language for the system Coq. In Proceedings of 7th International Conference on Logic for Programming and Automated Reasoning, pages 85–95, 2000.
[81] E. Denney, B. Fischer, and J. Schumann. An empirical evaluation of automated theorem provers in software certification. International Journal on Artificial Intelligence Tools, 15(1):81–108, 2006.
[82] E. W. Dijkstra. Cooperating sequential processes. In The Origin of Concurrent Programming, pages 65–138. 1968.
[83] C. Doczkal and G. Smolka. Two-way automata in Coq. In Proceedings of International Conference on Interactive Theorem Proving, pages 151–166, 2016.
[84] M. Eberl. Proving divide and conquer complexities in Isabelle/HOL. Journal of Automated Reasoning, 58(4):483–508, 2017.
[85] N. Een and N. Sorensson. An extensible SAT-solver. In Proceedings of 6th International Conference on Theory and Applications of Satisfiability Testing, pages 502–518, 2003.
[86] B. Ekici, A. Mebsout, C. Tinelli, C. Keller, G. Katz, A. Reynolds, and C. W. Barrett. SMTCoq: A plug-in for integrating SMT solvers into Coq. In Proceedings of 29th International Conference on Computer Aided Verification, pages 126–133, 2017.
[87] L. Erkök and J. Matthews. Using Yices as an automated solver in Isabelle/HOL. In Proceedings of Automated Formal Methods, pages 3–13, 2008.
[88] A. Faithfull, J. Bengtson, E. Tassi, and C. Tankink. Coqoon - An IDE for interactive proof development in Coq. International Journal on Software Tools for Technology Transfer, 20(2):125–137, 2018.
[89] A. P. Felty, E. L. Gunter, J. Hannan, D. Miller, G. Nadathur, and A. Scedrov. Lambda-Prolog: An extended logic programming language. In Proceedings of 9th International Conference on Automated Deduction, pages 754–755, 1988.
[90] J. F. Ferreira, S. A. Johnson, A. Mendes, and P. J. Brooke. Certified password quality - A case study using Coq and Linux pluggable authentication modules. In Proceedings of International Conference on Integrated Formal Methods, pages 407–421, 2017.
[91] A. Flatau, M. Kaufmann, D. Reed, D. Russinoff, E. Smith, and R. Sumners. Formal verification of microprocessors at AMD. In Proceedings of Designing Correct Circuits, 2002.
[92] J. P. P. Flor, W. Swierstra, and Y. Sijsling. Pi-Ware: Hardware description and verification in Agda. In 21st International Conference on Types for Proofs and Programs, pages 1–27, 2015.
[93] L. Fortnow. The status of the P versus NP problem. Communications of the ACM, 52(9):78–86, 2009.
[94] S. Foster and G. Struth. Integrating an automated theorem prover into Agda. In Proceedings of NASA Formal Methods Symposium, pages 116–130, 2011.
[95] J. Franco and J. Martin. A history of satisfiability. Volume 185 of Frontiers in Artificial Intelligence and Applications. IOS Press, 2009.
[96] A. Gabrielli and M. Maggesi. Formalizing basic quaternionic analysis. In Proceedings of International Conference on Interactive Theorem Proving, pages 225–240, 2017.
[97] J. H. Gallier. Logic for Computer Science: Foundations of automatic theorem proving. Courier Dover Publications, 2015.
[98] H. Ganzinger and K. Korovin. New directions in instantiation-based theorem proving. In Proceedings of 18th Symposium on Logic in Computer Science, pages 55–64, 2003.
[99] B. V. Gastel, L. Lensink, S. Smetsers, and M. van Eekelen. Reentrant readers-writers: A case study combining model checking with theorem proving. In Proceedings of International Workshop on Formal Methods for Industrial Critical Systems, pages 85–102, 2008.
[100] B. V. Gastel, L. Lensink, S. Smetsers, and M. van Eekelen. Deadlock and starvation free reentrant readers–writers: A case study combining model checking with theorem proving. Science of Computer Programming, 76(2):82–99, 2011.
[101] T. Gauthier, C. Kaliszyk, and J. Urban. TacticToe: Learning to reason with HOL4 tactics. In Proceedings of 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, pages 125–143, 2017.
[102] H. Geuvers, R. Pollack, F. Wiedijk, and J. Zwanenburg. A constructive algebraic hierarchy in Coq. Journal of Symbolic Computation, 34(4):271–286, 2002.
[103] F. Gilbert. Proof certificates in PVS. In Proceedings of International Conference on Interactive Theorem Proving, pages 262–268, 2017.
[104] S. Goel. The x86isa books: Features, usage, and future plans. In Proceedings of 14th International Workshop on the ACL2 Theorem Prover and its Applications, pages 1–17, 2017.
[105] Z. Goertzel, J. Jakubuv, S. Schulz, and J. Urban. ProofWatch: Watchlist guidance for large theories in E. In Proceedings of 9th International Conference on Interactive Theorem Proving, pages 270–288, 2018.
[106] G. Gonthier. Formal proof of the Four-Color theorem. Notices of the American Mathematical Society, 55:1382–1393, 2008.
[107] G. Gonthier, A. Asperti, J. Avigad, Y. Bertot, C. Cohen, F. Garillot, S. L. Roux, A. Mahboubi, R. O’Connor, S. O. Biha, I. Pasca, L. Rideau, A. Solovyev, E. Tassi, and L. Thery. A machine-checked proof of the odd order theorem. In Proceedings of 4th International Conference on Interactive Theorem Proving, pages 163–179, 2013.
[108] M. J. C. Gordon, W. A. Hunt, M. Kaufmann, and J. Reynolds. An embedding of the ACL2 logic in HOL. In Proceedings of the 6th International Workshop on the ACL2 Theorem Prover and its Applications, pages 40–46. ACM, 2006.
[109] M. J. C. Gordon, J. Reynolds, W. A. Hunt, and M. Kaufmann. An integration of HOL and ACL2. In Proceedings of the 6th International Conference on Formal Methods in Computer-Aided Design, pages 153–160, 2006.
[110] A. Grabowski, A. Kornilowicz, and A. Naumowicz. Mizar in a nutshell. Journal of Formalized Reasoning, 3(2):153–245, 2010.
[111] S. Grewe, S. Erdweg, and M. Mezini. Using Vampire in soundness proofs of type systems. Technical report, 2016.
[112] D. Griffioen and M. Huisman. A comparison of PVS and Isabelle/HOL. In Proceedings of 11th International Conference on Theorem Proving in Higher Order Logics, pages 123–142, 1998.
[113] T. C. Hales. Mathematics in the age of the Turing machine. Turing’s legacy. Technical report, University of Pittsburgh, 2013.
[114] M. R. Harbach. Methods and tools for the formal verification of software. Master’s thesis, Technical University Wien, 2011.
[115] J. Harrison. A Mizar mode for HOL. In Proceedings of 9th International Conference on Theorem Proving in Higher Order Logics, pages 203–220, 1996.
[116] J. Harrison. Floating-point verification. Journal of Universal Computer Science, 13(5):629–638, 2007.
[117] J. Harrison. A short survey of automated reasoning. In Proceedings of International Conference on Algebraic Biology, pages 334–349, 2007.
[118] J. Harrison. Formal proof theory and practice. Notices of the American Mathematical Society, 55(11):1395–1406, 2008.
[119] J. Harrison. Handbook of Practical Logic and Automated Reasoning. Cambridge University Press, 2009.
[120] J. Harrison. HOL Light: An overview. In Proceedings of 22nd International Conference on Theorem Proving in Higher Order Logics, pages 60–66. Springer, 2009.
[121] J. Harrison, J. Urban, and F. Wiedijk. History of interactive theorem proving. In Computational Logic, volume 9, pages 135–214, 2014.
[122] O. Hasan and S. Tahar. Performance analysis and functional verification of the stop-and-wait protocol in HOL. Journal of Automated Reasoning, 42(1):1–33, 2009.
[123] O. Hasan and S. Tahar. Formal verification methods. In Encyclopedia of Information Science and Technology, Third Edition, pages 7162–7170. IGI Global, 2015.
[124] J. V. Heijenoort. Historical development of modern logic. Logica Universalis, pages 1–11, 2012.
[125] J. Heras and E. Komendantskaya. ACL2(ml): Machine-learning for ACL2. In Proceedings of 12th International Workshop on ACL2 Theorem Prover and its Applications, pages 61–75, 2014.
[126] J. C. L. Hernandez and K. Korovin. Towards an abstraction-refinement framework for reasoning with large theories. In Proceedings of IWIL Workshop and LPAR Short Presentations, pages 119–123, 2017.
[127] W. Hesselink and M. IJbema. Starvation-free mutual exclusion with semaphores. Formal Aspects of Computing, 25:947–969, 2013.
[128] W. Hesselink and M. I. Lali. Formalizing a hierarchical file system. Formal Aspects of Computing, 24:27–44, 2012.
[129] T. Hillenbrand, A. Buch, R. Vogt, and B. Lochner. Waldmeister: High performance equational deduction. Journal of Automated Reasoning, 18:265–270, 1997.
[130] K. Hoder and A. Voronkov. Sine qua non for large theory reasoning. In Proceedings of 23rd International Conference on Automated Deduction, pages 299–314, 2011.
[131] S. Holub and R. Veroff. Formalizing a fragment of combinatorics on words. In Proceedings of 13th International Conference on Computability in Europe, pages 24–31, 2017.
[132] J. Holzl. Markov chains and Markov decision processes in Isabelle/HOL. Journal of Automated Reasoning, 59(3):345–387, 2017.
[133] W. Hong, M. S. Nawaz, X. Zhang, Y. Li, and M. Sun. Using Coq for formal modeling and verification of timed connectors. In Proceedings of Software Engineering and Formal Methods: SEFM 2017 Collocated Workshops, Revised Selected Papers, pages 558–573.
[134] D. J. Howe. Semantic foundations for embedding HOL in Nuprl. In Proceedings of International Conference on Algebraic Methodology and Software Technology, pages 85–101, 1996.
[135] D. J. Howe. Reasoning about functional programs in Nuprl. In Proceedings of Functional Programming, Concurrency, Simulation and Automated Reasoning, pages 145–164, 2005.
[136] S. Huang and Y. Chen. Proving theorems by using evolutionary search with human involvement. In Proceedings of Congress on Evolutionary Computation, pages 1495–1502, 2017.
[137] W. A. Hunt, M. Kaufmann, J. S. Moore, and A. Slobodova. Industrial hardware and software verification with ACL2. Philosophical Transactions A, 375:20150399, 2017.
[138] M. Iancu, M. Kohlhase, F. Rabe, and J. Urban. The Mizar mathematical library in OMDoc: Translation and applications. Journal of Automated Reasoning, 50(2):191–202, 2013.
[139] L. Jakubiec, S. Coupet-Grimal, and P. Curzon. A comparison of the Coq and HOL proof systems for specifying hardware. Short Presentations at 10th International Conference on Theorem Proving in Higher Order Logics, 63:78, 1997.
[140] J. Jakubuv and J. Urban. Extending E prover with similarity based clause selection strategies. In Proceedings of 9th International Conference on Intelligent Computer Mathematics, pages 151–156, 2016.
[141] P. Janicic. Automated reasoning: Some successes and new challenges. In Proceedings of the 22nd Central European Conference on Information and Intelligent Systems, 2011.
[142] S. Jaskowski. On the rules of supposition in formal logic. Studia Logica, 1, 1934.
[143] A. Jeffrey. Dependently typed web client applications. In Proceedings of International Symposium on Practical Aspects of Declarative Languages, pages 228–243, 2013.
[144] S. J. C. Joosten, C. Kaliszyk, and J. Urban. Initial experiments with TPTP-style automated theorem provers on ACL2 problems. In Proceedings of 12th International Workshop on the ACL2 Theorem Prover and its Applications, pages 77–85, 2014.
[145] J. Julliand, B. Legeard, T. Machicoane, B. Parreaux, and B. Tatibouet. Specification of an integrated circuit card protocol application using the B method and linear temporal logic. In Proceedings of International Conference of B Users, pages 273–292, 1998.
[146] J. Kaiser, B. Pientka, and G. Smolka. Relating System F and Lambda2: A case study in Coq, Abella and Beluga. In Proceedings of 2nd International Conference on Formal Structures for Computation and Deduction, pages 21:1–21:19, 2017.
[147] C. Kaliszyk, F. Chollet, and C. Szegedy. HolStep: A machine learning dataset for higher order logic theorem proving. In Proceedings of 5th International Conference on Learning Representations, 2017.
[148] C. Kaliszyk and K. Pak. Progress in the independent certification of Mizar mathematical library in Isabelle. In Proceedings of 12th Federated Conference on Computer Science and Information Systems, pages 227–236, 2017.
[149] C. Kaliszyk, K. Pak, and J. Urban. Towards a Mizar environment for Isabelle: Foundations and language. In Proceedings of the Conference on Certified Programs and Proofs, pages 58–65, 2016.
[150] C. Kaliszyk and F. Rabe. Towards knowledge management for HOL Light. In Proceedings of International Conference on Intelligent Computer Mathematics, pages 357–372, 2014.
[151] C. Kaliszyk and J. Urban. Hol(y)hammer: Online ATP service for HOL Light. Mathematics in Computer Science, 9(1):5–22, 2015.
[152] C. Kaliszyk and J. Urban. Hol(y)hammer: Online ATP service for HOL Light. Mathematics in Computer Science, 9(1):5–22, 2015.
[153] C. Kaliszyk and J. Urban. MizAR 40 for Mizar 40. Journal of Automated Reasoning, 55(3):245–256, 2015.
[154] C. Kaliszyk, J. Urban, H. Michalewski, and M. Olsak. Reinforcement learning of theorem proving. In Proceedings of Annual Conference on Neural Information Processing Systems, pages 8836–8847, 2018.
[155] K. Kanso. Agda as a platform for the development of verified railway interlocking systems. PhD thesis, 2012.
[156] M. Kaufmann and J. S. Moore. An industrial strength theorem prover for a logic based on Common Lisp. IEEE Transactions on Software Engineering, 23(4):203–213, 1997.
[157] M. Kaufmann and J. S. Moore. ACL2 and its applications to digital system verification. In Proceedings of the International Conference on Design and Verification of Microprocessor Systems for High-Assurance Applications, 2010.
[158] C. Keller and B. Werner. Importing HOL Light into Coq. In Proceedings of International Conference on Interactive Theorem Proving, pages 307–322, 2010.
[159] M. Kerber, C. Lange, and C. Rowat. Formal representation and proof for cooperative games. In Symposium on Mathematical Practice and Cognition II. Society for the Study of Artificial Intelligence and Simulation of Behaviour, pages 15–18, 2012.
[160] M. Kerber, C. Lange, and C. Rowat. An introduction to mechanized reasoning. Journal of Mathematical Economics, 66:26–39, 2016.
[161] T. Kim, D. Stringer-Calvert, and S. Cha. Formal verification of functional properties of an SCR-style software requirements specification using PVS. In Proceedings of 8th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 205–220, 2002.
[162] B. Kitchenham, H. Al-Khilidar, M. A. Babar, M. Berry, K. Cox, J. Keung, F. Kurniawati, M. Staples, H. Zhang, and L. Zhu. Evaluating guidelines for reporting empirical software engineering studies. Empirical Software Engineering, 13(1):97–121, 2008.
[163] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. seL4: Formal verification of an OS kernel. In Proceedings of 22nd Symposium on Operating Systems Principles, pages 200–220, 2009.
[164] G. Klein and T. Nipkow. Verified lightweight bytecode verification. Concurrency and Computation: Practice and Experience, 13:1133–1151, 2001.
[165] G. Klein and T. Nipkow. Applications of interactive proof to data flow analysis and security. In Software Systems Safety, pages 77–134, 2014.
[166] M. Kohlhase. OMDoc - An Open Markup Format for Mathematical Documents [version 1.2], volume 4180 of LNCS. Springer, 2006.
[167] M. Kohlhase, D. Müller, S. Owre, and F. Rabe. Making PVS accessible to generic services by interpretation in a universal format. In Proceedings of 8th International Conference on Interactive Theorem Proving, pages 319–335, 2017.
[168] W. Kokke. Formalising type-logical grammars in Agda. In Proceedings of 1st Workshop on Type Theory and Lexical Semantics, 2015.
[169] A. Kornilowicz, A. Kryvolap, M. Nikitchenko, and I. Ivanov. Formalization of the nominative algorithmic algebra in Mizar. In Proceedings of 38th International Conference on Information Systems Architecture and Technology, pages 176–186, 2017.
[170] K. Korovin. iProver - an instantiation-based theorem prover for first-order logic. In Proceedings of 4th International Joint Conference on Automated Reasoning, pages 292–298, 2008.
[171] K. Korovin and A. Voronkov. Solving systems of linear inequalities by bound propagation. In Proceedings of 23rd International Conference on Automated Deduction, pages 269–383, 2011.
[172] L. Kovacs and A. Voronkov. Finding loop invariants for programs over arrays using a theorem prover. In Proceedings of the International Conference on Fundamental Approaches to Software Engineering, pages 470–485, 2009.
[173] L. Kovacs and A. Voronkov. First-order theorem proving and Vampire. In Proceedings of 25th International Conference on Computer Aided Verification, pages 1–35, 2013.
[174] A. Krauss and A. Schropp. A mechanized translation from higher-order logic to set theory. In Proceedings of International Conference on Interactive Theorem Proving, pages 323–338, 2010.
[175] C. Kreitz. Building reliable, high-performance networks with the Nuprl proof development system. Journal of Functional Programming, 14:21–68, 2004.
[176] T. Kropf. Introduction to formal hardware verification. Springer, 2013.
[177] T. Lecomte, D. Deharbe, E. Prun, and E. Mottin. Applying a formal method in industry: A 25-year trajectory. In Proceedings of 20th Brazilian Symposium on Formal Methods, pages 70–87, 2017.
[178] T. Lecomte, T. Servat, and G. Pouzancre. Formal methods in safety-critical railway systems. In 10th Brazilian Symposium on Formal Methods, 2007.
[179] D. Lee, K. Crary, and R. Harper. Towards a mechanized metatheory of Standard ML. In Proceedings of 34th Symposium on the Principles of Programming Languages, pages 173–184, 2007.
[180] X. Leroy. A formally verified compiler back-end. Journal of Automated Reasoning, 43:363–446, 2009.
[181] M. Lesser. Using Nuprl for the verification and synthesis of hardware. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 339(1652):49–68, 1992.
[182] P. Letouzey. Extraction in Coq: An overview. In Proceedings of 4th Conference on Computability in Logic and Theory of Algorithms, pages 359–369, 2008.
[183] R. Letz, J. Schumann, S. Bayerl, and W. Bibel. SETHEO: A high-performance theorem prover. Journal of Automated Reasoning, 8(2):183–212, 1992.
[184] N. G. Leveson. System safety and computers. Addison Wesley, 1995.
[185] Y. Li and M. Sun. Modeling and verification of component connectors in Coq. Science of Computer Programming, 113:285–301, 2015.
[186] F. Lindblad and M. Benke. A tool for automated theorem proving in Agda. In Proceedings of International Workshop on Types for Proofs and Programs, pages 154–169, 2004.
[187] N. Macedo and A. Cunha. Automatic unbounded verification of Alloy specifications with Prover9. CoRR, abs/1209.5773, 2012.
[188] D. Mackenzie. The automation of proof: A historical and sociological exploration. IEEE Annals of the History of Computing, 17(3):7–29, 1995.
[189] J. Mackie and I. Sommerville. Failures of healthcare systems. In Proceedings of the 1st Dependability IRC Workshop, pages 1–8, 2000.
[190] M. Maggesi. A formalization of metric spaces in HOL Light. Journal of Automated Reasoning, 60(2):237–254, 2018.
[191] F. Maric. A survey of interactive theorem proving. Zb. Rad, 18:173–223, 2015.
[192] P. Masci, P. Curzon, D. Furniss, and A. Blandford. Using PVS to support the analysis of distributed cognition systems. Innovations in Systems and Software Engineering, 11:113–130, 2015.
[193] P. Masci, Y. Zhang, P. Jones, P. Curzon, and H. Thimbleby. Formal verification of medical device user interfaces using PVS. In Proceedings of the International Conference on Fundamental Approaches to Software Engineering, pages 200–214, 2014.
[194] R. Matuszewski and P. Rudnicki. Mizar: The first 30 years. Mechanized Mathematics and Its Applications, 4:3–24, 2005.
[195] M. Mayero. The three gap theorem (Steinhaus conjecture). Technical report, INRIA, France, 2006.
[196] W. McCune. Prover9 and Mace4. Available at: cs.unm.edu/~mccune/prover9, 2005–2010.
[197] W. W. McCune. Otter 3.0 reference manual and guide. Technical report, ANL-94/6, Argonne National Laboratory, 1994.
[198] N. Megill. Metamath: A Computer Language for Pure Mathematics. Lulu Press, USA, 2007.
[199] J. S. Moore. A mechanical analysis of program verification strategies. Formal Methods in System Design, 14:213–228, 1999.
[200] J. S. Moore. Proving theorems about Java-like byte code. In Correct System Design, 2000.
[201] G. J. Myers, T. Badgett, and C. Sandler. The art of software testing, Third Edition. John Wiley & Sons Publishers, 2011.
[202] P. Naumov, M.-O. Stehr, and J. Meseguer. The HOL/NuPRL proof translator. Proceedings of 14th International Conference on Theorem Proving in Higher Order Logic, 2152:329–345, 2001.
[203] A. Naumowicz. SAT-enhanced Mizar proof checking. In Proceedings of 7th International Conference on Intelligent Computer Mathematics, pages 449–452, 2014.
[204] M. S. Nawaz, M. I. Lali, and S. Meng. Formal modeling, analysis and verification of Black White Bakery algorithm. In 9th International Conference on Intelligent Human-Machine Systems and Cybernetics, pages 407–410. IEEE, 2017.
[205] M. S. Nawaz and M. Sun. A formal design model for genetic algorithms operators and its encoding in PVS. In Proceedings of 2nd International Conference on Big Data and Internet of Things, pages 2186–190, 2018.
[206] M. S. Nawaz and M. Sun. Reo2PVS: Formal specification and verification of component connectors. In Proceedings of 30th International Conference on Software Engineering and Knowledge Engineering, pages 391–396, 2018.
[207] M. S. Nawaz and M. Sun. Using PVS for modeling and verifying cloud services and their composition. In Proceedings of 6th International Conference on Advanced Cloud and Big Data, pages 41–46, 2018.
[208] M. S. Nawaz and M. Sun. Using PVS for modeling and verification of probabilistic connectors. In Proceedings of International Conference on Fundamentals of Software Engineering, pages 61–76, 2019.
[209] M. S. Nawaz, M. Sun, and P. Fournier-Viger. Proof guidance in PVS with sequential pattern mining. In Proceedings of International Conference on Fundamentals of Software Engineering, pages 45–60, 2019.
[210] T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL - A proof assistant for higher-order logic, volume 2283 of LNCS. Springer, 2002.
[211] U. Norell. Dependently typed programming in Agda. In Proceedings of the 4th International Workshop on Types in Language Design and Implementation, pages 1–2. ACM, 2009.
[212] M. Norrish. Formalising C in HOL. PhD thesis, Computer Laboratory, University of Cambridge, 1998.
[213] S. Obua and S. Skalberg. Importing HOL into Isabelle/HOL. In Proceedings of 3rd International Joint Conference on Automated Reasoning, pages 298–302, 2006.
[214] H. Okazaki and Y. Futa. Formalization of polynomially bounded and negligible functions using the computer-aided proof-checking system Mizar. In Proceedings of 9th Conference on Intelligent Computer Mathematics, pages 117–131, 2016.
[215] S. Owre, J. M. Rushby, N. Shankar, and M. K. Srivas. A tutorial on using PVS for hardware verification. In Proceedings of 2nd International Conference on Theorem Provers in Circuit Design - Theory, Practice and Experience, pages 258–279, 1994.
[216] R. Padmanabhan and R. Veroff. A geometric procedure with Prover9. In Automated Reasoning and Mathematics - Essays in Memory of William W. McCune, pages 139–150, 2013.
[217] D. Page. Report of the inquiry into the London Ambulance Service, 1993.
[218] H. M. Palombo, H. Zheng, and J. Ligatti. POSTER: Towards precise and automated verification of security protocols in Coq. In Proceedings of the 2017 Conference on Computer and Communications Security, pages 2567–2569, 2017.
[219] P. Papapanagiotou and J. D. Fleuriot. The Boyer-Moore waterfall model revisited. CoRR, abs/1808.03810, 2018.
[220] C. Paulin-Mohring. Circuits as streams in Coq: Verification of a sequential multiplier. In Proceedings of the International Workshop on Types for Proofs and Programs, selected papers, pages 216–230, 2005.
[221] L. C. Paulson. Logic and Computation: Interactive Proof with Cambridge LCF. Cambridge University Press, 1987.
[222] L. C. Paulson. Isabelle: A generic theorem prover. Springer, 1994.
[223] L. C. Paulson and J. Blanchette. Three years of experience with Sledgehammer, a practical link between automated and interactive theorem provers. In Invited talk at 8th International Workshop on the Implementation of Logics, 2010.
[224] Y. Peng and M. R. Greenstreet. Extending ACL2 with SMT solvers. In Proceedings of 13th International Workshop on the ACL2 Theorem Prover and Its Applications, pages 61–77, 2015.
[225] F. Pfenning. Structural cut elimination. In Proceedings of 10th Annual Symposium on Logic in Computer Science, pages 156–166, 1995.
[226] F. Pfenning and C. Schurmann. System description: Twelf - A meta-logical framework for deductive systems. In Proceedings of International Conference on Automated Deduction, pages 202–206, 1999.
[227] D. L. Rager. Adding parallelism capabilities to ACL2. In Proceedings of the 6th International Workshop on the ACL2 Theorem Prover and its Applications, pages 90–94. ACM, 2006.
[228] V. Rahli and M. Bickford. Coq as a metatheory for Nuprl with bar induction. Technical report, Cornell University, 2015.
[229] D. Ramachandran, P. Reagan, and K. Goolsbery. First-orderized ResearchCyc: Expressivity and efficiency in a common-sense ontology. In Proceedings of AAAI Workshop on Contexts and Ontologies: Theory, Practice and Applications, 2005.
[230] R. Rand, J. Paykin, and S. Zdancewic. QWIRE practice: Formal verification of quantum circuits in Coq. In Proceedings of 14th International Conference on Quantum Physics and Logic, pages 119–132, 2017.
[231] S. Ranise and D. Deharbe. Applying light-weight theorem proving to debugging and verifying pointer programs. In Proceedings of 4th International Workshop on First-Order Theorem Proving, pages 109–119, 2009.
[232] A. Rashid, O. Hasan, U. Siddique, and S. Tahar. Formal reasoning about systems biology using theorem proving. PLoS ONE, 12(7):e0180179, 2017.
[233] G. Reger, M. Suda, and A. Voronkov. Playing with AVATAR. In Proceedings of the 25th International Conference on Automated Deduction, pages 399–415, 2015.
[234] A. Reynolds, C. Tinelli, and C. Barrett. Constraint solving for finite model finding in SMT solvers. Theory and Practice of Logic Programming, 17(4):516–558, 2017.
[235] A. Riazanov and A. Voronkov. Vampire 1.1 (system description). In Proceedings of 1st International Joint Conference on Automated Reasoning, pages 376–380, 2001.
[236] D. Rouhling. A formal proof in Coq of a control function for the inverted pendulum. In Proceedings of the 7th International Conference on Certified Programs and Proofs, pages 28–41, 2018.
[237] J. M. Rushby. Tutorial: Automated formal methods with PVS, SAL, and Yices. In 4th International Conference on Software Engineering and Formal Methods, page 262, 2006.
[238] K. E. Sabri. Automated verification of role-based access control policies constraints using Prover9. CoRR, abs/1503.07645, 2015.
[239] K. E. Sabri and R. Khedri. A generic algebraic model for the analysis of cryptographic-key assignment schemes. In Proceedings of 5th International Symposium on Foundations and Practice of Security, pages 62–77, 2012.
[240] S. Schulz. E - A brainiac theorem prover. AI Communications, 15(2,3):111–126, 2002.
[241] S. Schulz, S. Cruanes, and P. Vukmirovic. Faster, higher, stronger: E 2.3. In Proceedings of 27th International Conference on Automated Deduction, pages 495–507, 2019.
[242] C. Schurmann and M. Stehr. An executable formalization of the HOL/Nuprl connection in the meta-logical framework Twelf. In Proceedings of International Conference on Logic for Programming Artificial Intelligence and Reasoning, pages 150–166, 2006.
[243] N. Shankar. Verification of real-time systems using PVS. In Proceedings of the International Conference on Computer Aided Verification, pages 280–291, 1993.
[244] S. Shiraz and O. Hasan. A library for combinational circuit verification using the HOL theorem prover. IEEE Transaction on CAD of Integrated Circuits and Systems, 37(2):512–516, 2018.
[245] J. Siekmann and G. Wrightson. Automation of Reasoning: 2: Classical Papers on Computational Logic 1967–1970. Springer, 2012.
[246] K. Singh and B. Auernheimer. Formal specification of Multi-Window user interface in PVS. In Proceedings of International Conference on Human-Computer Interaction, pages 144–149, 2016.
[247] K. Slind and M. Norrish. A brief overview of HOL4. In Proceedings of 21st International Conference on Theorem Proving in Higher Order Logics, pages 28–32, 2008.
[248] M. H. Sorensen and P. Urzyczyn. Lectures on the Curry-Howard Isomorphism. Elsevier, 2006.
[249] M. K. Srivas and S. P. Miller. Applying formal verification to the AAMP5 microprocessor: A case study in the industrial use of formal methods. Formal Methods in System Design, 8(2):153–188, 1996.
[250] J. Sterling, D. Gratzer, V. Rahli, D. Morrison, E. Akentyev, and A. Tosun. RedPRL - The people's refinement logic, available at: http://www.redprl.org/, 2016.
[251] G. Sutcliffe. The CADE ATP system competition - CASC. AI Magazine, 37(2):99–101, 2016.
[252] G. Sutcliffe. The TPTP problem library and associated infrastructure - From CNF to TH0, TPTP v6.4.0. Journal of Automated Reasoning, 59(4):483–502, 2017.
[253] G. Sutcliffe and Y. Puzis. SRASS - A semantic relevance axiom selection system. In Proceedings of 21st International Conference on Automated Deduction, pages 295–310, 2007.
[254] G. Sutcliffe and C. Suttner. Evaluating general purpose automated theorem proving systems. Artificial Intelligence, 131(1):39–54, 2001.
[255] S. Swords and J. Davis. Bit-blasting ACL2 theorems. In Proceedings of 10th International Workshop on the ACL2 Theorem Prover and its Applications, pages 84–102, 2011.
[256] A. Tanaka, R. Affeldt, and J. Garrigue. Safe low-level code generation in Coq using monomorphization and monadification. Journal of Information Processing, 26:54–72, 2018.
[257] S. H. Taqdees and O. Hasan. Formally verifying transfer functions of linear analog circuits. IEEE Design & Test, 34(5):30–37, 2017.
[258] C. Tian. A formalization of the process algebra CCS in HOL4. CoRR, abs/1705.07313, 2017.
[259] C. Tian. Formalized Lambek calculus in higher order logic (HOL4). CoRR, abs/1705.07318, 2017.
[260] J. Urban. Translating Mizar for first order theorem provers. In Proceedings of 2nd International Conference on Mathematical Knowledge Management, pages 203–215, 2003.
[261] J. Urban. MaLARea: A metasystem for automated reasoning in large theories. In Proceedings of 21st Workshop on Empirically Successful Automated Reasoning in Large Theories, 2007.
[262] J. Urban, K. Hoder, and A. Voronkov. Evaluation of automated theorem proving on the Mizar mathematical library. In Proceedings of International Congress on Mathematical Software, pages 155–166, 2010.
[263] J. Urban, G. Sutcliffe, P. Pudlak, and J. Vyskocil. MaLARea SG1 - Machine learner for automated reasoning with semantic guidance. In Proceedings of 4th International Conference on Automated Reasoning, pages 441–456, 2008.
[264] J. Urban and R. Veroff. Experiments with state-of-the-art automated provers on problems in Tarskian geometry. In Proceedings of 11th International Workshop on the Implementation of Logics, pages 122–126, 2015.
[265] J. Urban and J. Vyskocil. Theorem proving in large formal mathematics as an emerging AI field. In Automated Reasoning and Mathematics: Essays in Memory of William McCune, pages 240–257, 2013.
[266] J. Vitt and J. Hooman. Assertional specification and verification using PVS of the steam boiler control system. In Proceedings of the International Conference Formal Methods for Industrial Applications, pages 453–472, 1995.
[267] D. von Oheimb. Hoare logic for Java in Isabelle/HOL. Concurrency and Computation: Practice and Experience, 13(13):1173–1214, 2001.
[268] H. Wang. Computer theorem proving and artificial intelligence. Computational Logic, pages 63–75, 1990.
[269] T. Weber. SMT solvers: New oracles for the HOL theorem prover. International Journal on Software Tools for Technology Transfer, 13(5):419–429, 2011.
[270] M. Wenzel and F. Wiedijk. A comparison of Mizar and Isar. Journal of Automated Reasoning, 29(3–4):389–411, 2002.
[271] D. Weyns, M. U. Iftikhar, D. G. de la Iglesia, and T. Ahmad. A survey of formal methods in self-adaptive systems. In Proceedings of 5th International Conference on Computer Science & Software Engineering, pages 67–79, 2012.
[272] N. White, S. Matthews, and R. Chapman. Formal verification: Will the seedling ever flower? Philosophical Transactions A, 375:20150402, 2017.
[273] F. Wiedijk. Mizar Light for HOL Light. In Proceedings of International Conference on Theorem Proving in Higher Order Logic, pages 378–393, 2001.
[274] F. Wiedijk. The seventeen provers of the world: Foreword by Dana S. Scott, volume 3600. Springer, 2006.
[275] J. Woodcock, P. G. Larsen, J. Bicarregui, and J. S. Fitzgerald. Formal methods: Practice and experience. ACM Computing Surveys, 41(4):1–36, 2009.
[276] L. A. Yang, J. P. Liu, C. H. Chen, and Y.-P. Chen. Automatically proving mathematical theorems with evolutionary algorithms and proof assistants. In Proceedings of Congress on Evolutionary Computation, pages 4421–4428, 2016.
[277] A. Yushkovskiy. Comparison of two theorem provers: Isabelle/HOL and Coq. In Proceedings of the Seminar in Computer Science (CS-E4000), pages 1–17, 2017.
[278] B. Zhan and M. P. L. Haslbeck. Verifying asymptotic time complexity of imperative programs in Isabelle. In Proceedings of 9th International Joint Conference on Automated Reasoning, pages 532–548, 2018.
[279] X. Zhang, W. Hong, Y. Li, and M. Sun. Reasoning about connectors using Coq and Z3. Science of Computer Programming, 170:27–44, 2019.
APPENDIX
System details of 27 theorem provers.
General
Name | HR | ICS | Analytica | Zenon
Contributor | Simon Colton | Jean-Christophe Filliatre | E. Clarke & X. Zhao | R. Bonichon, D. Delahaye & D. Doligez
1st Rel | 2002 | 2002 | 1990 | 2007
Ind/Uni/Ind | University of Edinburgh | SRI International, USA | Carnegie Mellon University | Independent
Implementation
CLang | Java | OCaml | Mathematica | OCaml
Prog.P | Functional, Concurrent | Functional, Imperative, OO | Procedural, Functional, OO | Functional, Imperative, OO
LV | HR2.0 | Yices | Analytica2 | 0.8.2
LT | Open source | Open source | FreeBSD | BSD and MIT
UI | CLI | CLI | GUI | CLI
OS | Linux, Windows | Linux, Solaris, MAC | Cross | Cross
Lib | API | No | Standard library | Standard library
CG | No | No | No | No
Ed | Yes | Yes | Yes | No
Ext | Yes | Yes | Yes | Yes
I/O | No | No | TeX files | No
Logico-Math
TType | Theorem generator | Decision procedure | ATP | ATP (Algebraic specification & proof system)
CLogic | FOL | FOL with equality & quantifier-free | FOL | FOL (polymorphic, with equality)
TV | Binary | Binary | Binary | Binary
ST | No | No | No | B-Method set theory
Calculus | Inductive | Deductive | Deductive | Deduction modulo (Tableau method)
Proof Kernel | No | No | Yes | No
Others
App. Areas | Produce large number of theorems for testing ATP systems | Embedded in application to provide deductive services | Symbolic computation system | Used in focal environment, Object oriented algebra specification
Eval | Zariski Specification | NASA, part of PVS | Policy analysis | TPTP Category SET (227 out of 462), SEU (110 out of 900)
Unique Features | Machine Learning | API for proof search and symbolic simulation | Translated to OMDoc framework | Produce low level proof directly for Coq
General
Name | Yarrow | Watson | KRHyper/E-KRHyper/Hyper | iProver
Contributor | Jan Zwanenburg | M. Randall Holmes | Bjorn Pelzer | Konstantin Korovin
1st Rel | 1997 | 2006 | 2007 | 2008
Ind/Uni/Ind | Eindhoven University | Boise State University | Koblenz University | University of Manchester
Implementation
CLang | Haskell | Standard ML | OCaml | OCaml
Prog.P | Functional, Modular | Functional and Imperative | Functional, Imperative, OO | Functional, Imperative, OO
LV | V1.20 | 0.8.2 | 1.4 | V0.99
LT | FreeBSD | BSD and MIT | GNU GPL | GNU GPL
UI | GUI & CLI | CLI | CLI | CLI
OS | Unix, Linux | Linux | Windows and Unix | Linux
Lib | Fudget for graphical interface | No | No | No
CG | No | No | No | No
Ed | Yes | No | Yes | Yes
Ext | Yes | Yes | Yes | Yes
I/O | Polymorphic | No | TPTP-supported Protein format | No
Logico-Math
TType | ITP | ITP | ATP and model generator | ATP
CLogic | Constructive | HOL | FOL with equality | FOL
TV | Binary | Binary | Binary | Binary
ST | No | Quine's set theory | No | No
Calculus | Typed λ-calculus | Typed λ-calculus | Hypertableau calculus | Instantiation calculus
Proof Kernel | Yes | No | No | No
Others
App. Areas | Representation environment for logics & programming languages | Software verification, model checking, education & mathematics | Embedded in knowledge representation systems | Hardware verification and finite modeling
Eval | Formalized type theory | TPTP category | Solved 74% of the subset of TPTP | CASC-26 EPR division winner
Unique Features | Experiment context for testing pure type system | Type free HOL support | Used for description logic problems | Modular combination of propositional and instantiational reasoning
General
Name | JAPE | E-Darwin | leanCoP | LEO-II
Contributor | Richard Bornat | Baumgartner | Jens Otten | C. Benzmuller, F. Theiss, N. Sultana
1st Rel | 1996 | 2005 | 2003 | 2012
Ind/Uni/Ind | Queen Mary, University of London | Koblenz University | University of Oslo | Freie University Berlin and Cambridge University
Implementation
CLang | Java | OCaml | Prolog | OCaml
Prog.P | Object oriented (OO) | Functional, imperative, OO | Logic programming | Functional, Imperative, OO
LV | v7-d15 | 1.5 | 2.1 | 1.7 (2015)
LT | GNU GPL | GNU GPL | GNU GPL | BSD
UI | GUI | CLI | CLI | CLI
OS | Linux, Mac | Unix, Windows | Windows, Unix, Linux, Mac | Unix, Windows
Lib | No | No | No | Yes
CG | No | No | No | No
Ed | Yes | Yes | Yes | Yes
Ext | Yes | Yes | Yes | Yes
I/O | No | Input TPTP or TME format | leanCoP or TPTP syntax | TPTP THF language
Logico-Math
TType | Proof assistant | ATP | ATP | ATP+ITP
CLogic | FOL | FOL clausal with equality | First-order intuitionistic | HOL
TV | Binary | Binary | Binary | Binary
ST | Yes | No | No | No
Calculus | Deductive & Sequent calculus | Model evaluation | Connection/tableau calculus | Resolution by unification and equality (RUE)
Proof Kernel | No | No | No | No
Others
App. Areas | Used as a proof assistant and implement JAPE theories | Encrypt and solve problems | Formalization | Cooperation with first-order ATP
Eval | Teaching purpose tool | EPR winner at CASC-20 and CASC-J3 | Third in FOF division at CASC-22 | CASC-J5 winner in THF division
Unique Features | Forward reasoning and logic encoding | Back-jumping and dynamic back-tracking | Program can be easily modified for specific task or application due to its compact code | Cooperative Proof Search
General
Name | MaLARea | Muscadet | Princess | Satallax
Contributor | Josef Urban | Dominique Pastre | Philipp Rummer | Chad E. Brown
1st Rel | 2007 | 2003 | 2008 | 2010
Ind/Uni/Ind | Charles University | University Paris Descartes | Uppsala University | Saarland University
Implementation
CLang | Perl | SWI-Prolog | Scala | OCaml
Prog.P | Functional, Imperative, OO | Logic programming | OO, functional, concurrent | Functional, imperative, OO
LV | 0.5 | 4.5 | V2.1 | 3.0
LT | GPL2 | BSD | LGPL | BSD
UI | CLI | CLI | CLI | CLI
OS | Linux | Unix, Linux | Linux | Linux
Lib | No | No | No | No
CG | No | No | No | No
Ed | Yes | Yes | Yes | Yes
Ext | Yes | Yes | Yes | Yes
I/O | No | No | Native language, SMT | No
Logico-Math
TType | ATP | Knowledge-base TP | Theorem prover | ATP
CLogic | HOL | Second-order logic | FOL | HOL
TV | Binary | Binary | Binary | Binary
ST | No | No | Modular linear integer arithmetic | Church's simple type theory
Calculus | Deductive | Natural deduction | Free variable tableau, Constrained sequent calculus | Tableau calculus
Proof Kernel | No | No | No | No
Others
App. Areas | Learning and reasoning system for proving in large formal libraries | Topological linear spaces, cellular automata | Software verification and model checking | Formalization
Eval | CASC-24 LTB division winner | Winner of CASC-JC at IJCAR 2001 | Solved problems from QF-LIA category of SMT Library | CASC-26 THF division winner
Unique Features | Utilize ATP as core system with AI techniques | Efficient for the problems containing too many axioms and formulas | Solved quantified modulo linear integer arithmetic | Semantic embedding and cut simulation
General
Nam
eSP
ASS
Cor
alD
ISC
OU
NT
DO
RIS
Con
trib
utor
Chr
isto
phe
Wei
denb
ach
Ala
nB
undy
,G
raha
mSt
eel
and
Mon
ika
Mai
dlJo
rgD
enzi
nger
Joha
nB
os
1stR
el19
9920
0619
9719
98In
d/U
ni/I
nde
Max
Plan
ckIn
stitu
tefo
rC
ompu
ter
Scie
nce
Uni
vers
ityof
Edi
nbur
ghU
nive
rsity
ofC
alga
ryU
nive
rsity
ofE
dinb
urgh
Implementation
CL
ang
Java
Bui
lton
SPA
SSth
eore
mpr
over
CPr
olog
Prog
.PFu
nctio
nal
Func
tiona
lPr
oced
ural
Log
icpr
ogra
mm
ing
LV3.
920
082.
0D
OR
IS20
01LT
Free
BSD
Free
BSD
Free
BSD
Free
BSD
UI
CL
I+G
UI
GU
IC
LI
CL
IO
SW
indo
ws,
MA
C,L
inux
Cro
ssL
inux
,Sol
aris
Cro
ssPl
atfo
rmL
ibY
esY
esY
esN
oC
GN
oN
oN
oN
oE
dY
esY
esY
esY
esE
xtY
esY
esN
oY
esI/
ON
oN
oE
(Uni
vers
alIm
plic
atio
n)N
o
Logico-Math
TTy
peA
TP
Indu
ctiv
eth
eore
mpr
over
Dis
trib
uted
equa
tiona
lTP
TP+
Sem
antic
anal
yzer
CL
ogic
FOL
FOL
FOL
FOL
TV
Bin
ary
Bin
ary
Bin
ary
Bin
ary
STN
oN
oN
oN
oC
alcu
lus
Supe
rpos
ition
calc
ulus
Tabl
eau
calc
ulus
Pure
unit
equa
lity
Lam
bda
calc
ulus
Proo
fKer
nel
No
No
Yes
No
Others
App. Areas | Analysis of security protocols, collision-avoidance protocols | Cryptographic security protocol analysis, finding attacks on faulty security protocols | Machine learning | Computational semantics, covering various linguistic phenomena
Eval | Discovered new attacks on the Asokan-Ginzboorg protocol | Discovered new attacks on the Taghdiri and Jackson improved protocol | Competition entrant (DISCOUNT/GL) | Study of the behavior of ROB's algorithm
Unique Features | ATP with equality | Finds counterexamples to inductive conjectures | Based on the teamwork method for knowledge-based distribution | Translates English text into discourse representation structures
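SPASS is listed with the superposition calculus. On ground clauses without equality, superposition reduces to (ordered) resolution, so a propositional resolution saturation loop gives a rough feel for refutation by saturation. This is a simplified sketch under that assumption (no literal ordering, no equality reasoning, tautology deletion as the only redundancy rule); it is not SPASS's algorithm, and the integer literal encoding and function names are chosen only for the example:

```python
from itertools import combinations

# Clauses are frozensets of integer literals; -n is the negation of n
# (a DIMACS-style encoding assumed for this sketch).

def resolvents(c1, c2):
    """All binary resolvents of two clauses, with tautologies dropped."""
    out = set()
    for lit in c1:
        if -lit in c2:
            r = (c1 - {lit}) | (c2 - {-lit})
            if not any(-l in r for l in r):   # skip clauses containing p and not p
                out.add(frozenset(r))
    return out

def unsatisfiable(clauses):
    """Saturate under resolution; True iff the empty clause is derived."""
    known = {frozenset(c) for c in clauses}
    if frozenset() in known:
        return True                           # empty clause given as input
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True               # empty clause: refutation found
                new.add(r)
        if new <= known:
            return False                      # saturated without contradiction
        known |= new
```

Because only finitely many clauses exist over a fixed set of atoms, the loop always terminates; a real saturation prover instead uses a given-clause loop with orderings and aggressive redundancy elimination to keep the clause set small.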
General
Name | Getfol | Goedel | Expander | Geometry Expert
Contributor | Fausto Giunchiglia | Johan Belinfante | Peter Padawitz | Xiao-Shan Gao
1st Rel | 1994 | 2005 | 2007 | 1998
Ind/Uni/Ind | University of Trento | Georgia Institute of Technology | Dortmund University | Key Laboratory of China
Implementation
C Lang | Common Lisp | Mathematica | O'Haskell (extension of Haskell) | Java
Prog. P | Metareflective, OO, functional | Functional, procedural | Concurrent programming | Functional
LV | 2.001 | 2014 | Expander | MMP/Geometer
LT | FreeBSD | FreeBSD | FreeBSD | GNU General Public License
UI | GUI | GUI | GUI | GUI
OS | Unix | Cross-platform | Cross-platform | Cross-platform
Lib | Yes | No | Yes | No
CG | Yes | No | No | No
Ed | Yes | Yes | Yes | Yes
Ext | Yes | Yes | Yes | Yes
I/O | Proof-script | No | No | No
Logico-Math
T Type | ITP | ATP | ATP | Automatic Geometric TP
C Logic | FOL | FOL | FOL | Dynamic logic models
TV | Binary | Binary | Binary | Binary
ST | No | ZF set theory | Swinging types | No
Calculus | Natural deduction | Natural deduction | Narrowing & fixed-point co-induction | Euclidean and differential geometry
Proof Kernel | No | No | No | No
Others
App. Areas | Various data structures, real-world embedding | Deriving new theorems for automated reasoning | Testing algebraic data types and functional logic programs | Teaching geometry, algebra and physics in schools in China
Eval | Entrant at CADE-17 | 393 examples of QAIF in 91 seconds | Superconcentrators | Wu's method implemented
Unique Features | Meta-theory implementation | Reduces the number of proof steps | Interactive term rewriting, graph transformation, several representations of formal expressions | Automated geometric diagram construction
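Expander lists interactive term rewriting among its unique features. As a small, hypothetical illustration of plain (non-interactive) rewriting to normal form, the sketch below normalizes Peano-arithmetic terms with the two rules add(0, y) -> y and add(s(x), y) -> s(add(x, y)). The term encoding, the rule set and the helper names `rewrite_step`, `normalize` and `num` are assumptions made for this example, not Expander's own:

```python
# Terms: "0", ("s", t), ("add", t1, t2). Hypothetical encoding for this sketch.

def rewrite_step(t):
    """Apply one rule at the root of t, or return None if no rule matches."""
    if isinstance(t, tuple) and t[0] == "add":
        x, y = t[1], t[2]
        if x == "0":                               # add(0, y) -> y
            return y
        if isinstance(x, tuple) and x[0] == "s":   # add(s(x), y) -> s(add(x, y))
            return ("s", ("add", x[1], y))
    return None

def normalize(t):
    """Innermost strategy: normalize the arguments first, then the root."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(normalize(a) for a in t[1:])
    r = rewrite_step(t)
    return t if r is None else normalize(r)

def num(n):
    """Build the Peano numeral s^n(0)."""
    t = "0"
    for _ in range(n):
        t = ("s", t)
    return t
```

For instance, `normalize(("add", num(2), num(3)))` yields `num(5)`. An interactive system differs mainly in letting the user choose which rule to apply at which position instead of fixing a strategy in advance.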
General
Name | DTP | Z/EVES | Graffiti
Contributor | Don Geddis | Mark Saaltink | Epstein
1st Rel | 1995 | 1997 | 1986
Ind/Uni/Ind | Stanford University | University of Kent | University of Houston
Implementation
C Lang | Common Lisp | Common Lisp | C++
Prog. P | Metareflective, OO, functional | Metareflective, OO, functional, procedural | Procedural, OO, statically typed, type checking
LV | 3.0 | 2.4.1 | GRAFFITI
LT | FreeBSD | FreeBSD | FreeBSD
UI | CLI | GUI/CLI | GUI
OS | Cross-platform | Windows, Unix, Linux, Mac | Unix
Lib | Yes (Epilog) | Yes | Yes
CG | No | Yes | No
Ed | Yes | Yes | Yes
Ext | Yes | Yes | Yes
I/O | KIF (Knowledge Interchange Format) | LaTeX | No
Logico-Math
T Type | Model elimination theorem prover | ATP+ITP | Decision procedure
C Logic | FOL | FOL | Graph theory
TV | Binary | Binary | Binary
ST | Horn theories | ZF set theory, axiomatic set theory | Yes
Calculus | First-order predicate calculus | λ-calculus | Deductive
Proof Kernel | No | No | No
Others
App. Areas | Black-box inference engine for various machine learning programs | General theorem prover | Used in chemistry and mathematics (graph theory) for making conjectures
Eval | Specification | Barbara Project
Unique Features | Domain-independent control of inference | Theorem prover, syntax and type checker, domain checker | Pedagogic tool
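DTP is listed with Horn theories. For the propositional Horn case, goal-directed backward chaining is short enough to sketch. This is a simplified illustration under that assumption, not DTP's model-elimination procedure; the `(head, body)` rule format and the function name `provable` are hypothetical:

```python
# Each rule is (head, [body atoms]); a fact is a rule with an empty body.

def provable(goal, rules, visiting=frozenset()):
    """Backward chaining over propositional Horn clauses.

    `visiting` holds the goals already on the current proof path, so cyclic
    rules such as d :- d fail finitely instead of recursing forever.
    """
    if goal in visiting:
        return False
    for head, body in rules:
        if head == goal and all(provable(b, rules, visiting | {goal}) for b in body):
            return True
    return False

rules = [
    ("a", []),          # fact a
    ("b", ["a"]),       # b :- a
    ("c", ["b", "a"]),  # c :- b, a
    ("d", ["d"]),       # d :- d (cyclic, never provable)
    ("e", ["f"]),       # e :- f, with no rule for f
]
```

Here `provable("c", rules)` succeeds, while `d` and `e` are not provable. The loop check does not lose completeness in the propositional case, because any provable atom has a proof tree in which no atom repeats along a path.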