Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the...

21
Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant Hans-J¨ org Schurr 1 , Mathias Fleury 2,3 , and Martin Desharnais 4 1 University of Lorraine, CNRS, Inria, and LORIA, Nancy, France [email protected] 2 Johannes Kepler University Linz, Linz, Austria [email protected] 3 Max-Planck Institute f¨ ur Informatik, Saarland Informatics Campus, Saarbr¨ ucken, Germany 4 Universit¨ at der Bundeswehr M¨ unchen, M¨ unchen, Germany [email protected] Abstract. We present a reliable and fast reconstruction of proofs gen- erated by the SMT solver veriT in Isabelle. The fine-grained proof format makes the reconstruction simple and efficient. For typical proof steps, such as arithmetic reasoning and skolemization, our reconstruction is able to avoid expensive search. By skipping proof steps which are not relevant for Isabelle, it is possible and improves the performance of proof checking. Our method increases the success rate of Sledgeham- mer and is as robust as the existing Z3 integration. We provide a detailed evaluation of the reconstruction time for each rule. This indicates, that the runtime is influenced by both simple rules that appear very often and common complex rules. Keywords: automatic theorem provers · proof assistants · proof verification 1 Introduction Proof assistants are used in verification, formal mathematics, and other areas to provide trustworthy, machine-checkable formal proofs of theorems. Proof au- tomation reduces the burden of proof on users, thereby allowing them to focus on the core of their arguments instead of technical details. A successful approach to automation is to invoke an external automatic theorem prover, such as a Sat- isfiability Modulo Theories (SMT) solver [10], and to reconstruct the generated proof using the proof assistant’s inference kernel. Typically, such invocations will be generated by a “hammer”, like Sledgehammer for Isabelle [13]. Invoked on a proof goal, this tool heuristically selects facts from the background theory and uses external solvers to filter out facts not needed to discharge the goal. A proof by an external prover is not trusted and Sledgehammer subse- quently attempts to find the fastest trusted method that can recreate the proof. This is often a call to an automated theorem prover for which a proof recon- struction method exists. Isabelle’s smt proof method is such a procedure. It

Transcript of Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the...

Page 1: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofsin a Proof Assistant

Hans-Jorg Schurr1 , Mathias Fleury2,3 , and Martin Desharnais4

1 University of Lorraine, CNRS, Inria, and LORIA, Nancy, [email protected]

2 Johannes Kepler University Linz, Linz, [email protected]

3 Max-Planck Institute fur Informatik, Saarland Informatics Campus, Saarbrucken,Germany

4 Universitat der Bundeswehr Munchen, Munchen, [email protected]

Abstract. We present a reliable and fast reconstruction of proofs gen-erated by the SMT solver veriT in Isabelle. The fine-grained proofformat makes the reconstruction simple and efficient. For typical proofsteps, such as arithmetic reasoning and skolemization, our reconstructionis able to avoid expensive search. By skipping proof steps which are notrelevant for Isabelle, it is possible and improves the performance ofproof checking. Our method increases the success rate of Sledgeham-mer and is as robust as the existing Z3 integration. We provide a detailedevaluation of the reconstruction time for each rule. This indicates, thatthe runtime is influenced by both simple rules that appear very often andcommon complex rules.

Keywords: automatic theorem provers · proof assistants ·proof verification

1 Introduction

Proof assistants are used in verification, formal mathematics, and other areasto provide trustworthy, machine-checkable formal proofs of theorems. Proof au-tomation reduces the burden of proof on users, thereby allowing them to focuson the core of their arguments instead of technical details. A successful approachto automation is to invoke an external automatic theorem prover, such as a Sat-isfiability Modulo Theories (SMT) solver [10], and to reconstruct the generatedproof using the proof assistant’s inference kernel. Typically, such invocations willbe generated by a “hammer”, like Sledgehammer for Isabelle [13]. Invokedon a proof goal, this tool heuristically selects facts from the background theoryand uses external solvers to filter out facts not needed to discharge the goal.

A proof by an external prover is not trusted and Sledgehammer subse-quently attempts to find the fastest trusted method that can recreate the proof.This is often a call to an automated theorem prover for which a proof recon-struction method exists. Isabelle’s smt proof method is such a procedure. It

Page 2: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

2 Schurr, Fleury, and Desharnais

translates the current goal to the smt-lib format [9], runs an SMT solver, parsesthe proof, and replays it through Isabelle’s kernel. The tactic was originallydeveloped for the SMT solver Z3 [16, 32].

The SMT solver CVC4 [8] is one of the best solvers on Sledgehammergenerated problems [12], but currently does not produce proofs for problems withquantifiers. To reconstruct its proofs, Sledgehammer mostly uses the smt proofmethod based on Z3. Since CVC4 uses more elaborate quantifier instantiationtechniques then Z3, a problem provable for CVC4 might be unprovable forZ3. veriT [17] (Sect. 2) supports these techniques and generates proofs andreconstructing its proofs increases Sledgehammer’s success rate.

Here, we extend the smt tactic to support the SMT solver veriT (Sect. 3).We devised an early prototype [6] of the extension to validate the proof rulesfor skolemization. We also published progress towards making it efficient andreliable [23] (Sect. 3).

To simplify the reconstruction, we extend veriT to produce more detailedproofs. Instead of a single rule simplifying the term with a combination of propo-sitional, arithmetic, and quantifier reasoning, veriT now gives several detailedsteps to simplify checking. Similarly, it provides more information to avoid search,like for linear arithmetic and for normalizations performed automatically. veriTproofs contain idioms whose steps are more efficiently checked at once. Therefore,we have added a step skipping mode that combines (Sect. 4).

Because a typical theory file may contain many calls to the smt method, usersexpect the entire method, viz. goal translation, proof search, and reconstruction, tobe fast. The execution time of the goal translation depends on the size of the goaland is independent of the solver. Multiple options influence the execution time ofan SMT solver. To fine-tune veriT’s search procedure we selected four differentcombinations of options, or strategies, by generating typical problems and selectingoptions with complementary performance on these problems. Upon reconstuction,Sledgehammer compares these strategies and suggests the best performingone to the user. We then evaluate the reconstruction with Sledgehammer onan extensive benchmark. The resulting tactic is as efficient as the existing one forZ3 and improves reconstruction, in particular for CVC4. We also study the timerequired to reconstruct each kind of rule to see how to improve reconstruction.Many simple rules occur very often, showing the importance of step skipping(Sect. 5).

Finally, we discuss related work (Sect. 6). We integrated the work presentedhere into Isabelle version 2021-RC3 and it should be active in the releaseexpected February 2021; i.e., Sledgehammer can also suggest veriT.

2 veriT and Proofs

veriT is an open source, CDCL(T ) SMT solver. In proof-production mode, itsupports the theories of uninterpreted functions with equality, linear real andinteger arithmetic, and quantifiers. To support quantifiers veriT uses quantifierinstantiation and extensive preprocessing. It supports smt-lib [9] as input.

Page 3: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 3

A proof produced by veriT is an indexed list of steps resembling the smt-lib format. Each step has a conclusion term and is annotated with a rule, a listof premises, and optionally some rule-dependent arguments. veriT distinguishes90 different rules [37]. Subproofs enable the use of local assumptions or of anadditional context to reason about binders, e.g., to express preprocessing stepslike transformation under quantifiers. The conclusion of rules with contexts arealways equalities. The context models a substitution into the free variables of theterm on the left-hand side of the equality. Consider the following proof extract:

(assume a0 (exists (x A) (P x))

(anchor :step t1 :args (:= x vr))

(step t1.t1 (cl (= x vr)) :rule cong)

(step t1.t2 (cl (= (f x) (f vr))) :rule cong)

(step t1 (cl (= (exists (x A) (f vr))

(exists (vr A) (f vr))) :rule bind)

The assume command is used to assert assumptions or repeat them from theinput problem. Subproofs start with the anchor command which introduces acontext. Semantically, the context is a shorthand for a lambda abstraction ofthe free variable and an application of the substituted term. Here the contextis x ÞÑ vr and the step t1.t1 means Bpλx. xq vr “ vr. The step is proven byapplying congruence (rule cong). Then congruence is applied again (step t1.t2)to prove that pλx. f xq vr “ f vr and step t1 concludes the α-renaming.

Internally veriT records proofs in an eager fashion. During proof searcheach module appends steps into an array. Once the proof is completed, veriTperforms some cleanup before printing the proof. First, a pruning phase removesduplicates and branches of the proof not connected to the root K. Second, amerge phase detects shared conclusions and removes the duplicates. The final passprepares the data structures for the optional term sharing via name annotationsand the explicit definition of skolem functions.

3 Overview of the veriT-Powered smt Tactic

Isabelle is a generic proof assistant based on an intuitionistic logic framework,Pure, and is almost always only used parameterized with a logic. In this work weuse only Isabelle/HOL, the parameterization of Isabelle with higher-orderlogic with rank-1 (top level) polymorphism. Isabelle adheres to the LCF [24]tradition. Only inferences that go through the small kernel are allowed. Tacticsare programs that prove a goal by using only the kernel. This tradition alsomeans that external tools, like SMT solvers, are not trusted.

Nevertheless, external tools are successfully used. Either they provide informa-tion, such as unsat cores, which are useful to find relevant facts and pass themto internal tactics which can prove a goal (preplay), or they provide a detailedproof which is subsequently reconstructed. In Isabelle/HOL the Sledge-hammer tool implements the former, while tactics like smt implement the latterapproach. The focus of our work is the smt tactic – our reconstruction procedure

Page 4: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

4 Schurr, Fleury, and Desharnais

translates proofs generated by veriT into calls to the kernel –, but we alsoextended Sledgehammer such that it also suggests the new tactic to users.

The smt tactic translates the current goal to the smt-lib format [9], runsan SMT solver, parses the proof, and replays it through Isabelle’s kernel. Tochoose the existing smt tactic the user calls (smt (z3)) to use Z3 and (smt

(verit)) to use veriT. We will refer to them as z-smt and v-smt. The proofformats of Z3 and veriT are so different that reconstruction is not shared.

The first step negates the proof goal to allow proof by contradiction and toencode the goal into the first-order logic supported by veriT and Z3. ThenveriT is called to find a proof.

The second step parses the found proof and encodes it as a directed acyclicgraph with K as the only conclusion. This graph may contain subproofs with localassumptions. The proof structure is slightly amended to simplify reconstruction.

The third step converts the smt-lib terms to typed Isabelle terms. It alsoreverses most of the encoding used to convert higher-order into first-order terms.

The fourth step finally traverses the proof graph, checks that all input asser-tions, as repeated in the proof output, match their Isabelle counterpart andthen reconstruct the proof step-by-step using the kernel’s primitives.

4 Tuning the Reconstruction

In this section, we describe the changes we made to the reconstruction in order toimprove efficiency. Most changes are backed up by a change in veriT to produceproofs with more details. We split a rule which collected various preprocessingsimplifications into small and well-defined rules (Sect. 4.1). Previously, veriT im-plicitly normalized every generated step; e.g., repeated literals were immediatelydeleted. It now produces proofs for this (Sect. 4.2). Finally, the linear-arithmeticsteps contains coefficients which allow Isabelle to reconstruct the step withoutrelying on its limited arithmetic automation (Sect. 4.3). On the Isabelle side,the reconstruction selectively decodes the first-order encoding to preserve thestructure of terms (Sect. 4.4). To improve the performance of the reconstruction,it skips some proof steps (Sect. 4.5).

4.1 Preprocessing Rules

During preprocessing, veriT performs simplifications on the operator levelwhich are often akin to simple calculations. For example, it replaces the terma ˆ 0 ˆ fpxq by the constant 0. Previously, the rule connect equiv collectedall those simplifications in a single rule. Furthermore, no list of the operationsperformed by this rule existed and we had to fall back on an experimentallycreated list of automatic tactics for efficiency.

By studying the source code of veriT, we create a list of all simplificationsemitted by this rule and split it into 17 new rules: one rule per arithmeticoperator, one to replace boolean operators such as xor with their definition, andone to replace n-ary operator applications with binary applications. The example

Page 5: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 5

above now produces a prod simplify step with the conclusion aˆ 0ˆ fpxq “ 0.This is a compromise with having one rule for every possible simplification. Ourapproach has the benefit that the fallback to automated tactics utilized by thereconstruction can catch simplifications added to veriT in the future until thereconstruction is adapted.

On the Isabelle side, the reconstruction becomes easier and faster, becausewe can direct the search instead of trying automated tactics that can also workon other parts of the formula. We use the simplifier to reconstruct some of thoserules. For example, the simplifier handles the numeral manipulations of the prodsimplify rule and we restrict it to only use arithmetic lemmas.

Moreover, since we know the performed transformations, we can ignore someparts of the terms by abstracting or replacing them by constants. Because termsare smaller, the search is more directed, and we are less likely to hit the search-depth limitation of Isabelle’s auto tactic as before. Overall, the reconstructionis more robust and easier to debug.

4.2 Implicit Steps

The internal API of veriT’s proof module performs automatic normalizationof each recorded proof step. Namely, it removes repeated literals and doublenegations from literals and simplifies clauses that contain complementary literalsto J. While documented, the normalization was not proof producing.

We extend the normalization code such that it now generates proof steps.For the removal of duplicated literals and the detection of clauses with comple-mentary literals we add a step that has the original clause as premises and thesimplified clause as conclusion. To remove a double negation t we introducethe tautology t _ t and resolve it with the original clause. Since the APIhides the normalization of steps, our changes do not affect any other part ofveriT. The solver also prunes steps concluding J.

On the Isabelle side, the reconstruction becomes more regular with fewerspecial cases. For example, the rule ite1 corresponds to the simplification rulepif b then Q else P q ùñ b _ P . It can also produce duplicates in the specificcase pif P then Q else P ) ùñ P _ P . The reconstruction can apply therule and the duplicates are now removed in a subsequent step. In both cases,abstraction over the literals is possible. The reconstruction used to deal withthis by first generating the conclusion of the theorem and then running thesimplifier to match veriT’s conclusion. The extra step improves the reliabilityof the reconstruction, because the tactic is only applied to constants and is notdistracted by the potentially complex structure of P .

We also improve the proof and reconstruction of quantifier instantiation steps.The instantiation scheme conflicting instances only works on clauses and implic-itly clausified terms before instantiation. We introduce an explicit quantified-clausification rule qnt cnf that is issued before instantiating. While this rule isnot detailed, knowing when clausification is needed improves reconstruction. Theclausifiction is also shared between instantations of the same term.

Page 6: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

6 Schurr, Fleury, and Desharnais

4.3 Arithmetic Reasoning

veriT supports linear real and integer arithmetic. The real arithmetic solverrelies on the simplex method and follows the implementation in yices [19]. Theinteger arithmetic solver is a restricted branch and bound method.

The la generic rule is the core proof rule generated by the arithmetic solver.It produces a tautological clause of linear inequalities and equations. A la

generic step is created whenever the arithmetic solver determines that thecurrent propositional model is unsatisfiable in the theory of linear arithmetic.The justification of the step is a list of coefficients such that the linear combinationis a trivially contradictory inequality after simplification (e.g., 0 ě 1). Farkaslemma guarantees the existence of such coefficients for the reals. The simplexalgorithm calculates the coefficients during normal operation.

veriT also strengthens inequalities on integer variables before adding them tothe simplex tableau. For example, the inequality 2x ă 3 becomes x ď 1, if x is anintegers. Rational coefficients for integer variables express strengthening: 2x ă 3multiplied by 1

2 gives x ď 1. The reconstruction must replay this strengthening.

For example, the linear arithmetic proof step 1 ă x_ 2x ă 3 looks like:

(step t11 (cl (< 1 x) (< (* 2 x) 3))

:rule la_generic :args (1 (div 1 2)))

To replay linear arithmetic steps, Isabelle can use the tactic linarith asused for Z3 proofs. It searches the coefficients necessary to verify the lemma.The reconstruction used it previously [23], but the tactic can only find integercoefficients and fails if strengthening is required. Now the rule is a mechanicallycheckable certificate.

The reconstruction of a la generic step starts with a lemma of the formŽ

i ci where each ci is either an equality or an inequality. Before starting thereconstruction, it abstracts over the non-arithmetic parts. Then it transforms thelemma into the equivalent formulation c1 ùñ ¨ ¨ ¨ ùñ cn ùñ K and removesall negations (e.g., replacing a ď b by b ą a).

After that, the method combines the theorems. As the first step, it multipliesthe equation by the corresponding coefficient, based on Isabelle theorems. Forexample, for integers, for the equation A ă B and the rational coefficient p

q (with

p ą 0 and q ą 0), it strengthens the equation and multiplies by p to get

pˆ pAdiv qq ` pˆ pif Bmod q “ 0 then 1 else 0q ď pˆ pB div qq.

The if-then-else term pif Bmod q “ 0 then 1 else 0q corresponds to the strength-ening. If Bmod q “ 0, the result is an equation of the form A1 ` 1 ď B1, i.e.,A1 ă B1. No strengthening is required for the corresponding theorem over reals.

Finally, we can combine all the equations by summing them while being carefulwith the equalities that can appear. We simplify the resulting (in)equality usingIsabelle’s simplifier to derive K.

Page 7: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 7

4.4 Selective Decoding of the First-order Encoding

One example of a reconstruction issue purely on the Isabelle side that showsthe interplay of the higher-order encoding is the rule eq congruent pred. Thisrule expresses congruence on a predicate: pt1 “ u1q _ ¨ ¨ ¨ _ ptn “ unq _P pt1, . . . , tnq Ø P pu1, . . . , unq. For example, veriT can produce the theoremf ‰ f 1 _ x ‰ x1 _ apppf, xq Ø apppf 1, x1q, where app is a function introducedby the first-order encoding to encode function application. The reconstructionassumes f “ f 1, then identifies the arguments, and finally rewrites them in theleft side to produce apppf, xq Ø apppf 1, xq.

However, Isabelle β-expands all terms implicitly, changing the term structure;for example, when f and f 1 are lambda function like f :“ λx. x “ a andf 1 :“ λx. a “ x. After unfolding all constructs which encode higher-order termsand after β-expansion, we cannot identify the arguments f and f 1 anymore inapppf, xq Ø apppf 1, xq, i.e., px “ aq Ø pa “ y1q _ pλx. x “ aq ‰ pλx. a “ x1q _

px ‰ x1q. Keeping the encoding simplifies the reconstruction of eq congruent

pred steps by avoiding special cases, makes the code easier to test, and makes thereconstruction tactic easier to implement and maintain. Our earlier prototypehad a special case to detect lambda functions, but the code was very involved.

4.5 Skipping Steps

A major difference between the proofs generated by Z3 and veriT is the numberof steps. The latter produces more detailed proofs, but this can be detrimental toperformance. The or rule only returns its premise, but accounts for more than 1%of the reconstruction. A more involved example is skolemization from Dv. P v: Itis one step in Z3 and at least five in veriT—first pDv. P vq “ P pεv. P vq (withsubproof of at least 2 steps), ppDv. P vq ‰ P pεv. P vqq_ pDv. P vq_P pεv. P vqand finally P pεv. P vq by resolution.

The current reconstruction skips two kinds of steps. First, it replaces everyusage of the or rule by its premise. Second, it skips the renaming of boundvariables. veriT treats @x. P x and @y. P y as two different terms and producesa detailed proof of the conversion. Isabelle, on the other hand, uses de Brujnindices and variable names are irrelevant. Hence, we replace bind steps of theform p@x. P xq ðñ p@y. P yq by a single application of reflexivity. Since veriTinitially renames all variables to canonical names, this eliminates many steps.

We can also replay multiple steps at once. veriT generates the idiom “equivpos2; th resolution” for each skolemization and variable renaming. Step skip-ping replaces it by a single step which we replay using a specialized theorem.

On proof with quantifiers, step skipping can remove more than half of thesteps—only two steps remain in the skolemization example above—but it hasthe drawback that the reconstruction is no longer a full proof checker. While thisis not a problem for Isabelle users, errors in the proof output of veriT couldin theory remain undiscovered.

Page 8: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

8 Schurr, Fleury, and Desharnais

Table 1. Options corresponding to the different veriT strategies

Name Options

default (no option)del insts --index-sorts --index-fresh-sorts --ccfv-breadth --inst-deletion

--index-SAT-triggers --inst-deletion-loops --inst-deletion-track-varccfv SIG --triggers-new --index-SIG --triggers-sel-rm-specificccfv insts --triggers-new --index-sorts --index-fresh-sorts --triggers-sel-rm-specific

--triggers-restrict-combine --inst-deletion-loops --index-SAT-triggers--inst-deletion-track-vars --ccfv-index=100000 --ccfv-index-full=1000--inst-sorts-threshold=100000 --ematch-exp=10000000 --inst-deletion

best --triggers-new --index-sorts --index-fresh-sorts --triggers-sel-rm-specific

5 Evaluation

During development we routinely tested our proof reconstruction to find bugs. Asa side effect, we produced SMT-LIB files corresponding to the calls. We measurethe performance of veriT with various options on them and select five differentstrategies (Sect. 5.1). After that we evaluate the repartition of the tactics calledby Sledgehammer (Sect. 5.2), the impact of the rules (Sect. 5.3), and of stepskipping (Sect. 5.4).

We performed the training on a computer with two Intel Xeon Gold 6130CPUs with a total of 32 cores and 192 GB of RAM and all Isabelle experimentswith Isabelle2021-RC2 on a computer with an AMD EPYC 7702 CPU with 64cores amounting to 256 threads and 2 TB RAM.

5.1 Strategies

We test the reconstruction by using Isabelle’s mirabelle tool. It reads theoriesand automatically runs Sledgehammer [12] on all proof steps. Sledgeham-mer selects facts from the background theory and calls various automatic provers(here the SMT solvers CVC4, veriT, and Z3 and the superposition proverE [35]) to filter facts and then attempts to find a trusted tactic that can provethe goal from the filtered facts. It chooses the fastest tactic.

We call mirabelle on some Isabelle files from the theories we consider withonly veriT as fact-filter backend. This produces SMT files typical for the kindsof problems Isabelle users want to solve and a series of calls to v-smt. Forfailing v-smt calls three cases are possible: veriT does not find a proof, thereconstruction works but takes more time than allocated, or it fails. In the lastcase we fixed the reconstruction.

veriT exposes a wide range of options to fine-tune the proof search. Inthe SMT competition it utilizes a scheduler which runs the solver with differentcombinations of options (strategies) on short timeouts. To find strategies whichwork well on typical Sledgehammer problems, we determine which problemsare solved by each strategy within a two second timeout. We then choose thestrategy which solves the most benchmarks and three strategies which together

Page 9: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 9

solves the most benchmarks. We also keep the default strategy. For this trainingwe uses the theories from HOL-Library (an extended standard library containingvarious developments), a formalization of Green’s theorem [1, 2], of the PrimeNumber Theorem [21], and of the KBO ordering [11].

The strategies are shown in Tab. 1 and mostly differ in the instantiationschemes. The strategy del insts uses instance deletion [5] and uses a breadth-firstalgorithm to find conflicting instances. All other strategies utilize an implementa-tion of triggers which use trigger inference techniques [27]. The strategy ccfv SIGuses a different indexing method for instantiation. It also restricts enumerativeinstantiation [33], because the options “--index-sorts --index-fresh-sorts” are notused. The strategy ccfv insts increases some thresholds. Finally, the strategy bestutilizes a subset of the options used by the other strategies.

We have also considered using a scheduler as used in the SMT competitionin Isabelle. The advantage of this approach is that we do not need to selectthe strategy on the Isabelle side. However, it would make v-smt unreliable.A problem solved by only one strategy just before the end of its time slice canbecome unprovable on slower hardware. Issues with z-smt timeouts have beenreported on the Isabelle mailing list, for example due to an antivirus delayingthe startup [25].

5.2 Sledgehammer Experiments

To measure the performance of the v-smt tactic we ran mirabelle on the full HOL-Library, the theory Prime Distribution Elementary (PDE) [20], a formalizationof an executable resolution prover (RP) [34], and a formalization of the Simplexalgorithm used within SMT solvers [28]. We extended Sledgehammer’s proofpreplay to try all veriT strategies and added instrumentation for the time of alltried tactics. For each theory, we ran mirabelle multiple times using the provers E,CVC4, veriT, and Z3 as fact-filter backend, both with and without v-smt as aproof-preplay tactic. Sledgehammer and provers are mostly non-deterministicprograms. To reduce the variance between the different mirabelle runs, we usethe deterministic MePo fact filter [31] and not the better performing MaSh [26]that uses machine learning (and depends on previous runs) and underuse thehardware to minimize contention. We use the default timeouts of 30 secondsfor the fact-filtering and 1 second for the proof preplay. This is similar to theJudgment Day experiments [15].

Success Rate. Users are not interested in which tactics are used to prove a goal,but only in how often Sledgehammer succeeds. When running it, there arethree possible outcomes: it (i) finds a one-liner, (ii) detects that all one-linerstimeout, but finds a proof with intermediate steps, or (iii) finds no working proof(the solver is an oracle). A failure indicates that no tactic was able to reconstructthe proof. We did not ask Sledgehammer to produce the more detailed Isarproofs, but still got some.

Tab. 2 gathers the results with and without v-smt as proof-preplay tactic forSledgehammer. We can see not only that the number of failures significantly

Page 10: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

10 Schurr, Fleury, and Desharnais

Table 2. Outcome of Sledgehammer calls showing when the reconstruction finds asingle tactic to replay the proof (one-line, OL), needs intermediate steps (Isar), or fails

HOL-Library Simplex RP PDEProver OL Isar Fail OL Isar Fail OL Isar Fail OL Isar Fail

with v-smt:CVC4 7761 ` 0 73 599 ` 0 5 1104 ` 0 5 1080 ` 0 10E 7897 ` 0 186 659 ` 0 20 1056 ` 0 14 1039 ` 0 13veriT 6982 ` 12 24 477 ` 3 0 1001 ` 1 2 997 ` 2 3Z3 7276 ` 19 22 522 ` 1 0 1064 ` 1 4 1025 ` 2 3

without v-smt:CVC4 7635 ` 0 217 597 ` 0 16 1111 ` 0 14 1076 ` 0 19E 7810 ` 0 253 653 ` 0 26 1048 ` 0 19 1038 ` 0 14veriT 6874 ` 57 78 470 ` 8 0 993 ` 6 5 983 ` 6 11Z3 7136 ` 55 62 492 ` 4 5 1024 ` 3 7 1023 ` 2 5

decreases for CVC4 showing that the additional instantiations techniques notsupported by Z3 are helpful, but also that overall, the number of failures decreaseswhen Sledgehammer can call v-smt. This table includes the goals solved bybuilt-in Isabelle tactics. These tactics are less powerful but faster than smt.

In the cases of veriT and Z3, if the reconstruction were fast and perfect,v-smt or z-smt could replay every proof as a one-liner and there would beno failure at all. veriT and Z3 have a similar failure rate, showing that ourreconstruction is as reliable as the one from Z3.

We can confirm the observation from [12] that producing Isar proofs improvesthe efficiency of Sledgehammer. While CVC4 cannot produce proofs, gener-ation of Isar proof works better for Z3 than for veriT. Isar reconstruction forveriT proof requires inlining assumptions and goals and replacing the equiva-lence introducing skolem constants in a separate step. This makes the generatedproofs unnatural (the goals in each steps can be long) and more likely to timeout.

Overall, these results show that our proof reconstruction enables Sledge-hammer to successfully preplay more proofs. For the user, this means thatenabling v-smt as a proof preplay tactic increases the number of goals that canbe fully automatically proved.

Repartition. When resorting to a proof using a smt tactic, Sledgehammer triesall available tactics and strategies in parallel and suggests the fastest, successfultactic to the user. Tab. 3 shows the distribution of the 3252 proofs by smtsuggested by Sledgehammer in our experiment with v-smt. In total, z-smtwas selected for approximately 60% of the calls and v-smt for the remaining40%. Of the five strategies, best outperforms the others, but the remaining fourare selected a comparable number of time. These results suggest that we weresuccessful in selecting complementary strategies.

Overall, these results show that the performance of our proof reconstructionis competitive with that of z-smt. For the user, this means that enabling v-smtas a proof preplay tactic improves the performance of their proofs.

Page 11: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 11

Table 3. Tactic used to reconstruct proofs produced by Mirabelle

Tactic StrategyTheories

Total HOL-Library PDE RP Simplex

z-smt 1979 1566 169 92 152v-smt (combined) 1273 988 144 73 68

best 487 405 38 23 21ccfv SIG 269 196 41 18 14

v-smt del insts 212 165 24 10 13ccfv threshold 159 126 14 13 6default 148 98 27 9 14

5.3 Performance

To better understand what the key rules of our reconstruction are, we recordedthe time used to reconstruct each rule. Fig. 1 shows the distribution of the timespent on some rules.5 We remove the slowest and fastest 5% of the applications,because garbage collection can trigger at any moment and even trivial rulescan take a long time. Fig. 2 gives the sum of all reconstruction times over allproofs. parsing is the time required to parse and convert the veriT proof intoIsabelle terms.

Overall, there are two kinds of rules: (1) direct application of a sequence oftheorems (e.g., equiv pos2 corresponds to theorems of the form pa Ø bq _ a_ b), and (2) calls to full-blown tactics (like qnt simplify).

First, direct application of theorems (e.g., in cong) are usually fast, but theyoccur so often that the cumulative time is significant. For example, cong onlyneeds to unfold assumption and apply reflexivity. However, this still takes asignificant amount of time.

Second, full-blown tactics (like qnt cnf) are the slowest rules are the onesrequiring full-blown tactics. For qnt cnf (CNF under quantifiers, see Sect. 4.2),we did not write specialize tactics, but rely on Isabelle’s tableau based blasttactic. This rule is rather slow, but is rarely used. It is similar for the rule la

generic: it is slow on average, but searching the coefficients to replay the proofcan take even more time.

We can also see that the time required to check the simplification steps thatwere formerly combined into the connect equiv rule is not significant anymore.

We have performed the same experiments with the reconstruction of theSMT solver Z3. Unlike for veriT, we do not have the amount of time requiredfor parsing. The results are shown in Figs. 3 and 4. The rule distribution isvery different. The nnf-neg and nnf-pos rules are the slowest rules and takea huge amount of time in the worst case. However, the more coarse quantifierinstantiation steps is faster than the one produced by veriT. We suspect thatreconstruction is faster because the rule, which is only an implication withoutchoice terms, is easier to check.

5 Note to reviewers: Figs. 5 and 6 of the appendix show the graph with all rules.

Page 12: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

12 Schurr, Fleury, and Desharnais

Table 4. Reconstruction of Sledgehammer with and without step skipping for veriT

Tactic StrategyStep skipping No step skipping

CVC4 E veriT Z3 CVC4 E veriT Z3

z-smt 672 274 458 575 673 274 441 584v-smt combined 468 160 302 343 364 114 254 237

best 192 62 106 127 98 34 60 53ccfv SIG 93 26 34 71 64 21 66 49

v-smt del insts 70 36 52 54 58 27 39 45ccfv threshold 63 20 33 43 69 14 34 43default 50 16 34 48 75 18 55 47

5.4 Step Skipping Performance

We have repeated the experiments from Sect. 5.2 without step skipping (Tab. 4).Overall, we can see that except for more v-smt calls are generated, validing howimportant step skipping is. The number of Z3 calls generated does not changemuch, indicating that v-smt is not replacing z-smt as much as it is completingit on those problems. This is particularly visible for CVC4.

Due to space constrains we cannot give detailed timings, but the most strikingexample of useless rule is the or rule. veriT distinguishes between clausesand disjunctions and uses or steps to deconstruct a disjunction into a clause.Isabelle uses only clauses and the rule translates to returning the singularelement of the premise. However, in our tests it used 1.1% of the overall time,more than the complicated la generic rule overall.

6 Related Work

The state-of-the-art SMT solvers CVC4 [8], Z3 [32], and veriT [17] produceproofs. CVC4 does not record quantifier reasoning in the proof, whereas Z3uses some macro rules. Proofs from SMT solvers have also been used to generateunsatisfiability cores [18], and interpolants [30]. They are also useful to debugthe solver itself, since unsound steps often point to the origin of bugs. Our workalso relates to systems like Dedukti [4] that focuses on translating proof steps,not on replaying them.

Proof reconstruction has been implemented in various systems, includingCVC4 proofs in HOL Light [29], Z3 in HOL4 and Isabelle/HOL [16],and veriT [3] and CVC4 [22] in Coq. Only veriT produces detailed proofsfor preprocessing and skolemization. SMTCoq currently supports veriT’s oldversion 1 of the proof output which has different rules, does not support detailedskolemization rules, and is implemented in the 2016 version of veriT, which hasworse performance. SMTCoq also supports bitvectors and arrays.

The reconstruction of Z3 proofs in HOL4 and Isabelle/HOL is one ofthe most advanced and well tested. It is regularly used by Isabelle users.The Z3 proof reconstruction succeeds in more than 90% of Sledgehammer

Page 13: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 13

0 5 10 15 20 25 30 35

congresolution

or-simplifyth-resolutionand-simplify

forall-instcontraction

comp-simplifybool-simplify

eq-simplifylia-generic

sko-exsko-forall

connective-defqnt-cnf

ite-introonepointac-simp

prod-simplifysum-simplify

bfun-elimminus-simplify

la-genericparsing

time (ms) (lower is better)

Fig. 1. Timing, sorted by median, of a subset of veriT’s rules. From left to right, thelower whisker marks the 5th percentile, the lower box line the first quartile, the middleof the box the median, the upper box line the third quartile, and the upper whisker the95th percentile.

0 2 4 6 8 10 12 14

congresolution

or-simplifyth-resolutionand-simplify

forall-instcontraction

comp-simplifybool-simplify

eq-simplifylia-generic

sko-exsko-forall

connective-defqnt-cnf

ite-introonepointac-simp

prod-simplifysum-simplify

bfun-elimminus-simplify

la-genericparsing

Percentage of time

Fig. 2. Total percentage spent on each rule for the SMT solver veriT in the sameorder as Fig. 1. This graph maps the rules already shown in Fig. 1 to the total amountof time. The slowest rule are th resolution (14.5%), parsing (10.7%), and cong (9.6%).

Page 14: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

14 Schurr, Fleury, and Desharnais

0 1 2 3 4 5 6 7 8

intro-defsk

assertedapply-def

iff-trueiff-false

mp„symm

hypothesismp

commutativitytrans

reflrewrite

elim-unusedand-elim

monotonicitylemma

def-axiomnnf-pos

unit-resolutionnot-or-elimquant-inst

quant-intropull-quant

nnf-negth-lemma-arith

time (ms) (lower is better)

Fig. 3. Timing of some of Z3’s rules sorted by median. From left to right, the lowerwhisker marks the 5th percentile, the lower box line the first quartile, the middle of thebox the median, the upper box line the third quartile, and the upper whisker the 95thpercentile. nnf-neg’s 95th percentile is 33 ms. nnf-pos’s 95th percentile is 88 ms.

0 10 20 30 40 50

assertedintro-def

skapply-def

iff-trueiff-false

mp„symm

hypothesismp

commutativitytrans

reflrewrite

elim-unusedand-elim

monotonicitylemma

def-axiomnnf-pos

unit-resolutionnot-or-elimquant-inst

quant-intropull-quant

nnf-negth-lemma-arith

Percentage of time

Fig. 4. Total amount of time per rule for the SMT solver Z3.

Page 15: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 15

benchmarks [12, Section 9] and is efficient (an older version of Z3 was used).Performance numbers are reported [14, 16] not only for problems generated byproof assistants (including Isabelle), but also for preexisting smt-lib filesfrom the smt-lib library. Unfortunately, the code to read smt-lib problemswas never included in the standard Isabelle distribution and is now lost.6

The performance study by Bohme [14, Sect. 3.4] uses version 2.15 of the SMTsolver Z3, whereas we use version 4.4.0 which currently ships with Isabelle.Since version 2.15, the proof format changed slighty (e.g., th-lemma-arith wasintroduced), fulfilling some of the wishes expressed by Bohme and Weber [16]to simplify reconstruction. Surprisingly, the nnf rules do not appear among thefive rules that used the most runtime. Instead, the th-lemma and rewrite werethe slowest rule. Similarly to veriT, the cong rule was among the most used(without accounting for the most time), but it does not appear in our tests.

The SMT solver CVC4 follows a different philosophy from veriT and Z3:it produces proofs in a logical framework with side conditions [36]. The outputcan contain programs to check certain rules. The proof format is flexible, butcurrently CVC4 does not generate proofs for quantifiers.

7 Conclusion

We presented an efficient reconstruction of proofs generated by a modern SMTsolver in an interactive theorem prover. Our improvements address reconstructionchallenges for proof steps of typical inferences performed by SMT solvers.

By studying the time required to replay each rule, we were able to comparethe reconstruction for two different proof formats with different design directions.The very detailed proof format of veriT makes the reconstruction easier toimplement and allows for more specialization of the tactics. Our experimentsindicate that for systems with strong internal automation, like Isabelle, thefine grained proof format is competitive with the coarse format of Z3 and avoidsslow reconstruction steps. However, the cost of reconstructing the big number ofsteps can be higher then using automated tactics on coarse steps. Post-processing,like our step skipping, however, can offset this downside.

Our work is integrated into Isabelle and should be part of the next release.Sledgehammer suggests v-smt if it the fastest tactic; so users profit withoutaction required on their side. We plan to improve the reconstruction of the slowestrules and remove inconsistencies in the proof format. The developers of the SMTsolver CVC4 are currently rewriting the proof generation and plan to supporta similar proof format. We hope to be able to reuse the current reconstructionby only adding new rules. Generating and reconstructing proofs from the veriTversion with higher-order logic [7] could also improve the usefulness of veriTon Isabelle problems. The current proof rule should be sufficient to expressproofs in the more expressive logic.

6 via private communication with the authors

Page 16: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

16 Schurr, Fleury, and Desharnais

Acknowledgment We would like to thank Haniel Barbosa for his support with theimplementation in veriT. We also thank Haniel Barbosa, Jasmin Blanchette,Pascal Fontaine, and the reviewers for many fruitful discussions and suggestingmany textual improvements. The first author has received funding from theEuropean Research Council (ERC) under the European Union’s Horizon 2020research and innovation program (grant agreement No. 713999, Matryoshka).The second author is supported by the LIT AI Lab funded by the State of UpperAustria. The training presented in this paper was carried out using the Grid’5000testbed, supported by a scientific interest group hosted by Inria and includingCNRS, RENATER and several Universities as well as other organizations (seehttps://www.grid5000.fr).

References

1. Abdulaziz, M., Paulson, L.C.: An Isabelle/HOL formalisation of Green’s theorem.Archive of Formal Proofs (Jan 2018),https://isa-afp.org/entries/Green.html, formal proof development

2. Abdulaziz, M., Paulson, L.C.: An Isabelle/HOL formalisation of Green’s theorem.Journal of Automated Reasoning 63(3), 763–786 (Nov 2019).https://doi.org/10.1007/s10817-018-9495-z

3. Armand, M., Faure, G., Gregoire, B., Keller, C., Thery, L., Werner, B.: A modularintegration of SAT/SMT solvers to Coq through proof witnesses. In: Jouannaud,J.P., Shao, Z. (eds.) CPP 2011. LNCS, vol. 7086, pp. 135–150. Springer BerlinHeidelberg (2011). https://doi.org/10.1007/978-3-642-25379-9 12

4. Assaf, A., Burel, G., Cauderlier, R., Delahaye, D., Dowek, G., Dubois, C., Gilbert,F., Halmagrand, P., Hermant, O., Saillard, R.: Expressing theories in theλπ-calculus modulo theory and in the dedukti system. In: TYPES: Types forProofs and Programs. Novi SAd, Serbia (May 2016)

5. Barbosa, H.: Efficient instantiation techniques in SMT (work in progress). CEURWorkshop Proceedings, vol. 1635, pp. 1–10. CEUR-WS.org (Jul 2016),http://ceur-ws.org/Vol-1635/#paper-01

6. Barbosa, H., Blanchette, J.C., Fleury, M., Fontaine, P.: Scalable fine-grainedproofs for formula processing. Journal of Automated Reasoning (Jan 2019).https://doi.org/10.1007/s10817-018-09502-y

7. Barbosa, H., Reynolds, A., Ouraoui, D.E., Tinelli, C., Barrett, C.W.: ExtendingSMT solvers to higher-order logic. In: Fontaine, P. (ed.) CADE 27. LNCS, vol.11716, pp. 35–54. Springer International Publishing (2019).https://doi.org/10.1007/978-3-030-29436-6 3

8. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanovic, D., King, T.,Reynolds, A., Tinelli, C.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV2011. LNCS, vol. 6806, pp. 171–177. Springer Berlin Heidelberg (2011).https://doi.org/10.1007/978-3-642-22110-1 14

9. Barrett, C., Fontaine, P., Tinelli, C.: The SMT-LIB Standard: Version 2.6. Tech.rep., Department of Computer Science, The University of Iowa (2017), available atwww.SMT-LIB.org

10. Barrett, C.W., Tinelli, C.: Satisfiability modulo theories. In: Clarke, E.M.,Henzinger, T.A., Veith, H., Bloem, R. (eds.) Handbook of Model Checking, pp.305–343. Springer International Publishing, Cham (2018).https://doi.org/10.1007/978-3-319-10575-8 11

Page 17: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 17

11. Becker, H., Blanchette, J.C., Waldmann, U., Wand, D.: Formalization ofKnuth–Bendix orders for lambda-free higher-order terms. Archive of FormalProofs (Nov 2016), https://isa-afp.org/entries/Lambda_Free_KBOs.html,formal proof development

12. Blanchette, J.C., Bohme, S., Fleury, M., Smolka, S.J., Steckermeier, A.:Semi-intelligible Isar proofs from machine-generated proofs. Journal of AutomatedReasoning 56(2), 155–200 (2016). https://doi.org/10.1007/s10817-015-9335-3

13. Blanchette, J.C., Bohme, S., Paulson, L.C.: Extending Sledgehammer with smtsolvers. In: Bjørner, N., Sofronie-Stokkermans, V. (eds.) CADE 23. LNCS,vol. 6803, pp. 116–130. Springer Berlin Heidelberg (2011).https://doi.org/10.1007/978-3-642-22438-6 11

14. Bohme, S.: Proving Theorems of Higher-Order Logic with SMT Solvers. Ph.D.thesis, Technische Universitat Munchen (2012),http://mediatum.ub.tum.de/node?id=1084525

15. Bohme, S., Nipkow, T.: Sledgehammer: Judgement day. In: Giesl, J., Hahnle, R.(eds.) IJCAR 2010. pp. 107–121. Springer Berlin Heidelberg (2010).https://doi.org/10.1007/978-3-642-14203-1 9

16. Bohme, S., Weber, T.: Fast LCF-style proof reconstruction for Z3. In: Kaufmann,M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 179–194. Springer BerlinHeidelberg (2010). https://doi.org/10.1007/978-3-642-14052-5 14

17. Bouton, T., de Oliveira, D.C.B., Deharbe, D., Fontaine, P.: veriT: An open,trustable and efficient SMT-solver. In: Schmidt, R.A. (ed.) CADE 22. LNCS,vol. 5663, pp. 151–156. Springer Berlin Heidelberg (2009).https://doi.org/10.1007/978-3-642-02959-2 12

18. Deharbe, D., Fontaine, P., Guyot, Y., Voisin, L.: SMT solvers for Rodin. In:Derrick, J., Fitzgerald, J.A., Gnesi, S., Khurshid, S., Leuschel, M., Reeves, S.,Riccobene, E. (eds.) ABZ 2012. LNCS, vol. 7316, pp. 194–207. Springer BerlinHeidelberg (Jun 2012). https://doi.org/10.1007/978-3-642-30885-7 14

19. Dutertre, B., de Moura, L.: Integrating simplex with DPLL(T). Tech. rep., SRIInternational (May 2006),http://www.csl.sri.com/users/bruno/publis/sri-csl-06-01.pdf

20. Eberl, M.: Elementary facts about the distribution of primes. Archive of FormalProofs (Feb 2019),https://isa-afp.org/entries/Prime_Distribution_Elementary.html, formalproof development

21. Eberl, M., Paulson, L.C.: The prime number theorem. Archive of Formal Proofs(Sep 2018), https://isa-afp.org/entries/Prime_Number_Theorem.html, formalproof development

22. Ekici, B., Mebsout, A., Tinelli, C., Keller, C., Katz, G., Reynolds, A., Barrett,C.W.: SMTCoq: A plug-in for integrating SMT solvers into Coq. In: Majumdar,R., Kuncak, V. (eds.) CAV 2017. LNCS, vol. 10427, pp. 126–133. SpringerInternational Publishing (2017). https://doi.org/10.1007/978-3-319-63390-9 7

23. Fleury, M., Schurr, H.: Reconstructing veriT proofs in Isabelle/HOL. In: Reis, G.,Barbosa, H. (eds.) PxTP 2019. EPTCS, vol. 301, pp. 36–50 (2019).https://doi.org/10.4204/EPTCS.301.6

24. Gordon, M.J.C., Milner, R., Wadsworth, C.P.: Edinburgh LCF: A MechanisedLogic of Computation, LNCS, vol. 78. Springer Berlin Heidelberg (1979).https://doi.org/10.1007/3-540-09724-4

25. Immler, F.: Re: [isabelle] Isabelle2019-RC2 sporadic smt failures. Email (May2019), https://lists.cam.ac.uk/pipermail/cl-isabelle-users/2019-May/msg00130.html

Page 18: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

18 Schurr, Fleury, and Desharnais

26. Kuhlwein, D., Blanchette, J.C., Kaliszyk, C., Urban, J.: Mash: Machine learningfor Sledgehammer. In: ITP. LNCS, vol. 7998, pp. 35–50. Springer (2013)

27. Leino, K.R.M., Pit-Claudel, C.: Trigger selection strategies to stabilize programverifiers. In: Chaudhuri, S., Farzan, A. (eds.) CAV 2016. LNCS, vol. 9779, pp.361–381. Springer International Publishing (2016).https://doi.org/10.1007/978-3-319-41528-4 20

28. Maric, F., Spasic, M., Thiemann, R.: An incremental simplex algorithm withunsatisfiable core generation. Archive of Formal Proofs (Aug 2018),https://isa-afp.org/entries/Simplex.html, formal proof development

29. McLaughlin, S., Barrett, C., Ge, Y.: Cooperating theorem provers: A case studycombining HOL-Light and CVC Lite. Electronic Notes in Theoretical ComputerScience 144(2), 43–51 (2006). https://doi.org/10.1016/j.entcs.2005.12.005

30. McMillan, K.L.: Interpolants from Z3 proofs. In: FMCAD 2011. pp. 19–27.FMCAD Inc, Austin, Texas (2011)

31. Meng, J., Paulson, L.C.: Lightweight relevance filtering for machine-generatedresolution problems. J. Appl. Log. 7(1), 41–57 (2009)

32. de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R.,Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer BerlinHeidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3 24

33. Reynolds, A., Barbosa, H., Fontaine, P.: Revisiting enumerative instantiation. In:Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10806, pp. 112–131.Springer International Publishing (2018).https://doi.org/10.1007/978-3-319-89963-3 7

34. Schlichtkrull, A., Blanchette, J.C., Traytel, D., Waldmann, U.: Formalization ofBachmair and Ganzinger’s ordered resolution prover. Archive of Formal Proofs(Jan 2018), https://isa-afp.org/entries/Ordered_Resolution_Prover.html,formal proof development

35. Schulz, S.: E - a brainiac theorem prover. AI Communications 15(2-3), 111–126(2002), http://content.iospress.com/articles/ai-communications/aic260

36. Stump, A., Oe, D., Reynolds, A., Hadarean, L., Tinelli, C.: SMT proof checkingusing a logical framework. Formal Methods in System Design 42(1), 91–118 (Feb2013). https://doi.org/10.1007/s10703-012-0163-3

37. The veriT Team and Contributors: Proofonomicon: A reference of the veriT proofformat. Software Documentation (2020),https://verit.loria.fr/documentation/proofonomicon.pdf, last Accessed:September 2020

Page 19: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 19

8 Appendix

Figure 5 ist the complete version of Figure 1. Figure 6 is the complete version ofFigure 2.

Page 20: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

20 Schurr, Fleury, and Desharnais

0 5 10 15 20 25 30 35

falseeq-reflexive

choiceequiv1

not-simplifyequiv-neg2

la-disequalityequiv-neg1

–local-inputrefl

equiv-pos1and

transequiv2implies

ite2not-equiv1

ite1not-or

not-equiv2equiv-simplify

–theory-resolution2implies-neg1implies-neg2

qnt-rm-unusedresolution

implies-simplifycong

ite-simplifyeq-congruent

ite-neg2and-negand-pos

eq-congruent-predimplies-pos

or-posite-neg1not-not

equiv-pos2not-andite-pos1

qnt-simplifyite-pos2

inputbind

eq-transitiveor-neg

subproof–normalized-input

or-simplifyth-resolutionand-simplify

forall-instcontraction

comp-simplifybool-simplify

eq-simplifylia-generic

sko-exsko-forall

connective-defqnt-cnf

ite-introonepointac-simp

prod-simplifysum-simplify

bfun-elimminus-simplify

la-genericparsing

time (ms) (lower is better)

Fig. 5. Timing of veriT rules sorted by median. From left to right, the lower whiskermarks the 5th percentile, the lower box line the first quartile, the middle of the box themedian, the upper box line the third quartile, and the upper whisker the 95th percentile.

Page 21: Reliable Reconstruction of Fine-Grained Proofs in a Proof ... · We compare our procedure with the existing Z3 integration and show similar levels of robustness while solving more

Reliable Reconstruction of Fine-Grained Proofs in a Proof Assistant 21

0 2 4 6 8 10 12 14

falseeq-reflexive

choiceequiv1

not-simplifyequiv-neg2

la-disequalityequiv-neg1

–local-inputrefl

equiv-pos1and

transequiv2implies

ite2not-equiv1

ite1not-or

equiv-simplifynot-equiv2

–theory-resolution2implies-neg1implies-neg2

qnt-rm-unusedresolution

implies-simplifycong

ite-simplifyeq-congruent

ite-neg2and-negand-pos

eq-congruent-predimplies-pos

or-posite-neg1not-not

equiv-pos2not-andite-pos1

qnt-simplifyite-pos2

inputbind

eq-transitiveor-neg

subproof–normalized-input

or-simplifyth-resolutionand-simplify

forall-instcontraction

comp-simplifybool-simplify

eq-simplifylia-generic

sko-exsko-forall

connective-defqnt-cnf

ite-introonepointac-simp

prod-simplifysum-simplify

bfun-elimminus-simplify

la-genericparsing

Percentage of time

Fig. 6. Percentage of the total time spent per rule for the SMT solver veriT.