Consequence Generation, Interpolants, and Invariant Discovery
Ken McMillan
Cadence Berkeley Labs
Automated abstraction
• Abstraction means throwing away information about a system that is not needed to prove a given property.
• Automated abstraction has become a key element in the practice of model checking.
  – Verification of sequential circuits
    • without abstraction, up to about 100 registers
    • with abstraction, > 200,000 registers!
  – Software model checking
    • without abstraction, finite-state only
    • with abstraction, infinite-state, > 100,000 lines of code
Note: we are talking about very shallow properties!
Predicate Abstraction
• Terminology: a safety invariant is an inductive invariant that implies the safety condition of a program.
• Given a set of atomic predicates P, construct the strongest inductive invariant of a program expressible as a Boolean combination of P.
  – That is, we are restricted to the language of Boolean combinations of P.
• Example: let P = {i=j, x=y}

    x = i; y = j;
    while (x != 0) { x--; y--; }
    if (i == j) assert y == 0;

  strongest inductive invariant: $i = j \Rightarrow x = y$
But where do the predicates come from?
[Graf & Saïdi]
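A quick way to see why $i = j \Rightarrow x = y$ is a safety invariant is to test the three usual conditions (initiation, consecution, safety) by brute force. The Python sketch below does this over a small integer range; the range `R` and the helper `is_safety_invariant` are illustrative assumptions, and a bounded test like this is a sanity check, not a proof.

```python
from itertools import product

# Small integer range for the brute-force check (an assumption: a real tool
# would discharge these conditions with a decision procedure instead).
R = range(-4, 5)

def is_safety_invariant(inv):
    """Check initiation, consecution and safety of inv(i, j, x, y) on R."""
    # Initiation: after "x = i; y = j;" the invariant must hold.
    init = all(inv(i, j, i, j) for i, j in product(R, R))
    # Consecution: one loop iteration "x--; y--;" under the guard x != 0.
    step = all(inv(i, j, x - 1, y - 1)
               for i, j, x, y in product(R, R, R, R)
               if x != 0 and inv(i, j, x, y))
    # Safety: at loop exit (x == 0), "if (i == j) assert y == 0" must pass.
    safe = all(y == 0
               for i, j, y in product(R, R, R)
               if i == j and inv(i, j, 0, y))
    return init and step and safe

# The strongest inductive invariant over P = {i=j, x=y}: i = j => x = y
print(is_safety_invariant(lambda i, j, x, y: i != j or x == y))  # True
# x = y alone is not established by the initialization when i != j
print(is_safety_invariant(lambda i, j, x, y: x == y))            # False
```

The same function rejects the trivial candidate `True`, since it is inductive but does not imply the assertion.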
Iterative refinement
• CounterExample Guided Abstraction Refinement (CEGAR)
  – Diagnostic information is an abstract counterexample.
  – Refinement adds information sufficient to refute the counterexample.
  – In the infinite-state case, this refinement loop can diverge.
This talk is concerned with avoiding divergence of the refinement loop, thus guaranteeing a limited kind of completeness.
[Diagram: Verify Abstraction either reports "safe" or passes diagnostic information to Refine Abstraction, which produces a new abstraction to verify.]
Completeness of abstraction
• An abstraction is a restricted language L.
  – Example: predicate abstraction (without refinement)
    • L is the language of Boolean combinations of predicates in P
    • We try to compute the strongest inductive invariant of a program in L
• An abstraction refinement heuristic chooses a sequence of sublanguages $L_0 \subseteq L_1 \subseteq \cdots$ from a broader language L.
  – Example: predicate abstraction with refinement
    • L is the set of quantifier-free first-order formulas (QF)
    • $L_i$ is characterized by a set of atomic predicates $P_i$
• Completeness relative to a language:
An abstraction refinement heuristic is complete for language L iff it always eventually chooses a sublanguage $L_i \subseteq L$ containing a safety invariant whenever L contains a safety invariant.
Where do abstractions come from?
• Existing methods are based on the idea of generalizing from the proof of particular cases.
Heuristic: information that is used to prove a particular case is likely to be useful in the general case.
• Examples
  • Prove all executions of just k steps are safe
  • Prove a particular program path is safe
  • Refute a particular "abstract counterexample"
Example path: x=i, y=j → [x!=0] x--, y-- → [x==0] [i==j] [y!=0] → Error!
Structured Proofs
• A sequence of formulas assigned to the states of a program path, such that
  – each is a postcondition of its predecessor
  – it starts with True and ends with False
• Example: a path that executes our program loop once.

    x = i; y = j;
    while (x != 0) { x--; y--; }
    if (i == j) assert y == 0;

Path constraints (in SSA form!), interleaved with the proof:
  True
      $x_1 = i_0,\ y_1 = j_0$
  $i_0 = j_0 \Rightarrow x_1 = y_1$
      $x_1 \neq 0,\ x_2 = x_1 - 1,\ y_2 = y_1 - 1$
  $i_0 = j_0 \Rightarrow x_2 = y_2$
      $x_2 = 0,\ i_0 = j_0,\ y_2 \neq 0$
  False
Extract predicates from proof: P = {i=j, x=y}
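The renaming into SSA form used above is mechanical. The sketch below assumes a toy string syntax (`[cond]` for an assumed guard, `v = expr` for an assignment) and a hypothetical helper `to_ssa`; it numbers every variable from version 0 and bumps the version at each assignment, which reproduces the constraint names on the slide.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def to_ssa(path):
    """Rename a straight-line program path into SSA form.

    A statement is either "[cond]" (an assumed guard) or "v = expr" (an
    assignment). Every variable starts at version 0; an assignment reads the
    current versions on its right-hand side, then creates a fresh version of
    its left-hand side.
    """
    version = {}

    def current(match):
        name = match.group(0)
        return f"{name}{version.get(name, 0)}"

    out = []
    for stmt in path:
        if stmt.startswith("[") and stmt.endswith("]"):
            out.append(IDENT.sub(current, stmt[1:-1]))
        else:
            lhs, rhs = (s.strip() for s in stmt.split("=", 1))
            rhs = IDENT.sub(current, rhs)           # read old versions first
            version[lhs] = version.get(lhs, 0) + 1  # then bump the target
            out.append(f"{lhs}{version[lhs]} = {rhs}")
    return out

path = ["x = i", "y = j", "[x!=0]", "x = x-1", "y = y-1",
        "[x==0]", "[i==j]", "[y!=0]"]
print(to_ssa(path))
# ['x1 = i0', 'y1 = j0', 'x1!=0', 'x2 = x1-1', 'y2 = y1-1', 'x2==0', 'i0==j0', 'y2!=0']
```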
Good proofs and bad proofs
• Bad example: refute the path using "weakest precondition"

    x = i; y = j;
    while (x != 0) { x--; y--; }
    if (i == j) assert y == 0;

Path and proof:
  True
      x=i, y=j
  $i = j \wedge x = 1 \Rightarrow y = 1$
      [x!=0] x--, y--
  $i = j \wedge x = 0 \Rightarrow y = 0$
      [x==0] [i==j] [y!=0] → Error!
  False
Extract predicates from proof: P = {i=j, x=0, y=0, x=1, y=1, ...}
As we unwind the loop further, these predicates diverge...
Two questions
• How to generate structured proofs?
  – Proofs generated by decision procedures will not be structured.
  – Solution: we can rewrite unstructured proofs into structured ones.
• How to guarantee completeness?
  – Bad proofs lead to divergence.
  – Solution: a structured prover.
Interpolation Lemma (Craig, 57)
• Notation: $L(\Sigma)$ is the set of FO formulas over the symbols of $\Sigma$.
• If $A \wedge B = \mathit{false}$, there exists an interpolant $A'$ for $(A, B)$ such that:
    $A \Rightarrow A'$
    $A' \wedge B = \mathit{false}$
    $A' \in L(A) \cap L(B)$
• Example: $A = p \wedge q$, $B = \neg q \wedge r$, $A' = q$
• Interpolants from proofs
  – In certain quantifier-free theories, we can obtain an interpolant for a pair A, B from a refutation in linear time [TACAS05].
  – In particular, we can handle linear arithmetic, uninterpreted functions, and arrays.
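For propositional logic, the strongest interpolant for $(A, B)$ is $A$ with its non-shared variables existentially quantified. The truth-table sketch below (helper names are hypothetical) computes it for the slide's example and checks that it refutes $B$. It is exponential and only illustrates the definition; it is not the linear-time, proof-based extraction cited above.

```python
from itertools import product

def strongest_interpolant(A, a_vars, b_vars):
    """Truth table of (exists A-local vars. A) over the variables shared with B."""
    shared = sorted(set(a_vars) & set(b_vars))
    local = sorted(set(a_vars) - set(b_vars))
    table = {}
    for s in product([False, True], repeat=len(shared)):
        env = dict(zip(shared, s))
        table[s] = any(A({**env, **dict(zip(local, bits))})
                       for bits in product([False, True], repeat=len(local)))
    return shared, table

def refutes(shared, table, B, b_vars):
    """True iff A' (given by its truth table) together with B is unsatisfiable."""
    for bits in product([False, True], repeat=len(b_vars)):
        env = dict(zip(b_vars, bits))
        if table[tuple(env[v] for v in shared)] and B(env):
            return False
    return True

# Slide example: A = p and q, B = (not q) and r; expected A' = q
def A(e):
    return e["p"] and e["q"]

def B(e):
    return not e["q"] and e["r"]

shared, table = strongest_interpolant(A, ["p", "q"], ["q", "r"])
print(shared, table)                          # ['q'] {(False,): False, (True,): True}
print(refutes(shared, table, B, ["q", "r"]))  # True
```

$A \Rightarrow A'$ holds by construction, and $A'$ mentions only $q$, the one symbol common to $A$ and $B$.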
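Interpolants for sequences
• Let $A_1 \ldots A_n$ be a sequence of formulas.
• A sequence $A'_0 \ldots A'_n$ is an interpolant for $A_1 \ldots A_n$ when
  – $A'_0 = \mathit{True}$
  – $A'_{i-1} \wedge A_i \Rightarrow A'_i$, for $i = 1..n$
  – $A'_n = \mathit{False}$
  – and finally, $A'_i \in L(A_1 \ldots A_i) \cap L(A_{i+1} \ldots A_n)$
[Diagram: $\mathit{True} \xrightarrow{A_1} A'_1 \xrightarrow{A_2} A'_2 \xrightarrow{A_3} \cdots \xrightarrow{A_k} \mathit{False}$, each arrow an implication]
In other words, the interpolant is a structured refutation of $A_1 \ldots A_n$.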
Structured proofs are interpolants
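Path: x=i, y=j → [x!=0] x--, y-- → [x==0] [i==j] [y!=0]
  True
      $x_1 = i_0,\ y_1 = j_0$
  $i_0 = j_0 \Rightarrow x_1 = y_1$
      $x_1 \neq 0,\ x_2 = x_1 - 1,\ y_2 = y_1 - 1$
  $i_0 = j_0 \Rightarrow x_2 = y_2$
      $x_2 = 0,\ i_0 = j_0,\ y_2 \neq 0$
  False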
1. Each formula implies the next
2. Each is over common symbols of prefix and suffix
3. Begins with true, ends with false
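The three conditions can be spot-checked mechanically. This sketch tests condition 1 ($A'_{i-1} \wedge A_i \Rightarrow A'_i$) for the interpolant of the path above by enumerating a small integer domain; the domain `DOM` is an assumption that makes this a sanity check rather than a proof. Conditions 2 and 3 are checked by inspecting the formulas, not by the code.

```python
from itertools import product

VARS = ["i0", "j0", "x1", "y1", "x2", "y2"]
DOM = range(-2, 3)  # small finite domain (assumption): a sanity check, not a proof

# SSA path constraints A_1, A_2, A_3 (loop body executed once)
A = [
    lambda e: e["x1"] == e["i0"] and e["y1"] == e["j0"],
    lambda e: e["x1"] != 0 and e["x2"] == e["x1"] - 1 and e["y2"] == e["y1"] - 1,
    lambda e: e["x2"] == 0 and e["i0"] == e["j0"] and e["y2"] != 0,
]

# Candidate interpolant A'_0 .. A'_3 from the slide
ITP = [
    lambda e: True,
    lambda e: e["i0"] != e["j0"] or e["x1"] == e["y1"],  # i0 = j0 => x1 = y1
    lambda e: e["i0"] != e["j0"] or e["x2"] == e["y2"],  # i0 = j0 => x2 = y2
    lambda e: False,
]

def is_sequence_interpolant(A, itp):
    """Check that A'_{i-1} and A_i imply A'_i, for i = 1..n, on every point of DOM."""
    for vals in product(DOM, repeat=len(VARS)):
        e = dict(zip(VARS, vals))
        for i in range(1, len(A) + 1):
            if itp[i - 1](e) and A[i - 1](e) and not itp[i](e):
                return False
    return True

print(is_sequence_interpolant(A, ITP))  # True
```

Replacing either middle formula with True makes the check fail, because the following step can then no longer be refuted.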
Abstraction refinement procedure
[Diagram: SSA sequence → Prover → proof → Interpolation → structured proof → Extract predicates]
(idea: R. Jhala)
Enforcing completeness
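[Diagram: lattice of sublanguages $L_0 \subseteq L_1 \subseteq L_2 \subseteq \cdots \subseteq L$, with the predicates x=y and x=0, x=1, x=2, ... arranged across the levels]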
1. Stratify L into finite languages $L_0 \subseteq L_1 \subseteq \cdots$
2. Refute each counterexample at the lowest possible level
If a safety invariant exists in $L_k$, then we never exit $L_k$. Since this is a finite language, abstraction refinement must converge.
Restriction Language Example
• Difference-bound formulas
  – Let $L_k$ be the Boolean combinations of constraints of the form $x \leq y + c$ or $x \leq c$, where $|c| \leq k$.
• Restrict the interpolants to $L_0$
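Path constraints: $x_1 = i_0,\ y_1 = j_0$; then $x_1 \neq 0,\ x_2 = x_1 - 1,\ y_2 = y_1 - 1$; then $x_2 = 0,\ i_0 = j_0,\ y_2 \neq 0$
L0-restricted: True, $i_0 = j_0 \Rightarrow x_1 = y_1$, $i_0 = j_0 \Rightarrow x_2 = y_2$, False
not L0-restricted: True, $i_0 = j_0 \wedge x_1 = 1 \Rightarrow y_1 = 1$, $i_0 = j_0 \wedge x_2 = 0 \Rightarrow y_2 = 0$, False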
Restriction forces us to generalize!
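To see that each $L_k$ is finite over a fixed set of variables, one can enumerate its atoms. The sketch below uses an illustrative tuple encoding, `(x, y, c)` for $x \leq y + c$ and `(x, None, c)` for $x \leq c$, and writes $x_1 = 1$ over the integers as $x_1 \leq 1 \wedge \neg(x_1 \leq 0)$; these encodings and the function names are assumptions. Over n variables this gives $n(n-1)(2k+1) + n(2k+1)$ atoms, hence only finitely many formulas up to equivalence.

```python
from itertools import permutations

def atoms(k, variables):
    """Atomic predicates of L_k over `variables` (illustrative encoding).

    (x, y, c) stands for x <= y + c and (x, None, c) for x <= c, with |c| <= k.
    """
    cs = range(-k, k + 1)
    diff = {(x, y, c) for x, y in permutations(variables, 2) for c in cs}
    bound = {(x, None, c) for x in variables for c in cs}
    return diff | bound

def in_L(k, used_atoms, variables):
    """True iff every atom a formula uses lies in L_k."""
    return set(used_atoms) <= atoms(k, variables)

V = ["i0", "j0", "x1", "y1", "x2", "y2"]
# i0 = j0 => x1 = y1 only needs bounds with constant 0
good = [("i0", "j0", 0), ("j0", "i0", 0), ("x1", "y1", 0), ("y1", "x1", 0)]
# x1 = 1 over the integers: x1 <= 1 and not (x1 <= 0)
bad = [("x1", None, 1), ("x1", None, 0)]
print(in_L(0, good, V), in_L(0, bad, V), in_L(1, bad, V))  # True False True
print(len(atoms(1, ["x", "y"])))                           # 12
```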
Consequence finders
• A consequence finder takes as input a set of hypotheses $\Gamma$ and returns a set of consequences of $\Gamma$.
• Consequence finder R is complete for L-generation iff, for any $\phi \in L$,
    $\Gamma \models \phi$ implies $R(\Gamma) \cap L \models \phi$
That is, the consequence finder need not generate all consequences of $\Gamma$ in L, but the generated L-consequences must imply all others.
[McIlraith & Amir, 2001]
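For propositional clauses, saturation under resolution is a classic complete consequence finder: every entailed non-tautological clause is subsumed by a derived clause. The sketch below (function names hypothetical, clauses encoded DIMACS-style as frozensets of integers) takes L to be the clauses over a chosen set of variables and tests the definition above by brute force on one small input. Checking clauses suffices, because every formula in L is equivalent to a conjunction of clauses in L.

```python
from itertools import combinations, product

def resolution_closure(clauses):
    """Saturate propositional clauses under binary resolution.

    Clauses are frozensets of non-zero ints (DIMACS style: -v means "not v").
    Tautological resolvents are discarded.
    """
    closed = set(clauses)
    changed = True
    while changed:
        changed = False
        for c1, c2 in combinations(list(closed), 2):
            for lit in c1:
                if -lit in c2:
                    res = (c1 - {lit}) | (c2 - {-lit})
                    if not any(-x in res for x in res) and res not in closed:
                        closed.add(res)
                        changed = True
    return closed

def consequences_in(gamma, vocab):
    """R(gamma) restricted to L, where L = clauses over the variables in vocab."""
    return {c for c in resolution_closure(gamma)
            if all(abs(lit) in vocab for lit in c)}

def entails(gamma, clause, variables):
    """Brute force: every model of gamma (over variables) satisfies clause."""
    def sat(c, m):
        return any(m[abs(lit)] == (lit > 0) for lit in c)
    for bits in product([False, True], repeat=len(variables)):
        m = dict(zip(variables, bits))
        if all(sat(c, m) for c in gamma) and not sat(clause, m):
            return False
    return True

def complete_for(gamma, vocab, variables):
    """Test the definition on one input: each clause over vocab entailed by
    gamma must also be entailed by consequences_in(gamma, vocab)."""
    found = consequences_in(gamma, vocab)
    vs = sorted(vocab)
    for signs in product([-1, 0, 1], repeat=len(vs)):
        clause = frozenset(s * v for s, v in zip(signs, vs) if s)
        if entails(gamma, clause, variables) and not entails(found, clause, variables):
            return False
    return True

p, q, r = 1, 2, 3
gamma = {frozenset({-p, q}), frozenset({-q, r}), frozenset({p})}
print(consequences_in(gamma, {r}))          # {frozenset({3})}
print(complete_for(gamma, {r}, [p, q, r]))  # True
```

The closure also contains clauses such as $q$ and $\neg p \vee r$; they are simply filtered out because they fall outside L.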
Split prover
Divide the prover into a sequence of communicating consequence finders...
  $R_1 \leftrightarrow R_2 \leftrightarrow R_3 \leftrightarrow \cdots \leftrightarrow R_n$
• Each $R_i$ knows just $A_i$
• $R_i$ and $R_{i+1}$ exchange only facts in $L(A_1 \ldots A_i) \cap L(A_{i+1} \ldots A_n)$
• $\prod_i R_i$ is the composition of the $R_i$'s
Theorem: If each $R_i$ is complete for $L(A_{i+1} \ldots A_n)$-generation, then $\prod_i R_i$ is complete for refutation [McIlraith & Amir, 2001].
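L-restricted split prover
• In the L-restricted composition, $\prod^L_i R_i$, the provers can exchange only formulas in L.
  $R_1 \overset{L}{\leftrightarrow} R_2 \overset{L}{\leftrightarrow} R_3 \overset{L}{\leftrightarrow} \cdots \overset{L}{\leftrightarrow} R_n$
Theorem: If each $R_i$ is complete for $L \cap L(A_{i+1} \ldots A_n)$-generation, then $A_1 \ldots A_n$ has an L-interpolant exactly when it is refuted by $\prod^L_i R_i$.
Moreover, the refutation generated by $\prod^L_i R_i$ induces an L-interpolant.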
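An L-restricted split prover for propositional clauses can be sketched as follows. Each $R_i$ is resolution saturation over $A_i$ plus the facts received from $R_{i-1}$; it forwards only derived clauses over the symbols of $A_{i+1} \ldots A_n$ that also pass an `in_L` filter. This sketch only passes facts forward, treats L as a predicate on clauses, and uses hypothetical names (`saturate`, `split_refute`, `in_L`). When the empty clause is found, the forwarded messages form the induced interpolant, one clause set per position.

```python
from itertools import combinations

def saturate(clauses):
    """Resolution closure of propositional clauses (frozensets of ints)."""
    closed = set(clauses)
    changed = True
    while changed:
        changed = False
        for c1, c2 in combinations(list(closed), 2):
            for lit in c1:
                if -lit in c2:
                    res = (c1 - {lit}) | (c2 - {-lit})
                    if not any(-x in res for x in res) and res not in closed:
                        closed.add(res)
                        changed = True
    return closed

def symbols(clauses):
    return {abs(lit) for c in clauses for lit in c}

def split_refute(parts, in_L=lambda clause: True):
    """Forward pass of an L-restricted split prover over A_1 .. A_n.

    R_i saturates A_i plus the facts received from R_{i-1}, then sends R_{i+1}
    the derived clauses that use only symbols of A_{i+1} .. A_n and satisfy
    in_L. If the empty clause is derived, returns the messages (message i,
    counting from 1, is A'_i as a clause set, ending with False); else None.
    """
    messages, inbox = [], set()
    for i, part in enumerate(parts):
        closed = saturate(set(part) | inbox)
        if frozenset() in closed:
            return messages + [{frozenset()}]   # empty clause: refuted
        later = symbols(c for p in parts[i + 1:] for c in p)
        inbox = {c for c in closed if symbols([c]) <= later and in_L(c)}
        messages.append(inbox)
    return None

p, q, r = 1, 2, 3
parts = [
    [frozenset({p}), frozenset({-p, q})],   # A_1: p, p -> q
    [frozenset({-q, r})],                   # A_2: q -> r
    [frozenset({-r})],                      # A_3: not r
]
print(split_refute(parts))
# [{frozenset({2})}, {frozenset({3})}, {frozenset()}]  i.e. q, r, False
print(split_refute(parts, in_L=lambda c: q not in symbols([c])))
# None
```

In the second call L excludes q. Since q is the only symbol shared between $A_1$ and the rest, no L-interpolant exists here, and the restricted prover correspondingly fails to refute.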
L-restricted interpolants
• That is, if we can build complete consequence generators for some restriction language L, we have a complete procedure to generate L-restricted interpolants.
[Diagram: SSA sequence → L-restricted split prover ($\prod^L_i R_i$) → split proof → Interpolation → L-restricted interpolant (a structured proof)]
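Complete abstraction heuristic
• Given finite languages $L_0 \subseteq L_1 \subseteq \cdots$ where $\bigcup_i L_i = \mathrm{QF}$...
[Flowchart: start with P = {} and k = 0. Run predicate abstraction (PredAbs) with P; if it succeeds, conclude "safe". Otherwise some program path is not refutable with P. If the path has an $L_k$-restricted interpolant, add the atomic predicates (APs) of the interpolant to P and run PredAbs again; if not, set k = k+1.]
Theorem: This procedure is complete for QF invariants. That is, if a safety invariant exists in QF, we conclude "safe".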
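The control flow of the flowchart can be sketched as a loop over three callbacks. `pred_abs`, `is_feasible`, and `restricted_interpolant` are hypothetical stand-ins for the real components; the feasibility check (reporting a genuine error path) and the `max_k` cap are additions not shown on the slide. k only increases when the current path has no $L_k$-restricted interpolant and is never reset.

```python
def refine_loop(pred_abs, is_feasible, restricted_interpolant, max_k=10):
    """Sketch of the complete refinement loop (all callbacks hypothetical).

    pred_abs(P)                      -> None if predicate abstraction with
                                        predicates P proves safety, else an
                                        abstract counterexample path
    is_feasible(path)                -> True if the path is a real error
    restricted_interpolant(path, k)  -> atomic predicates of an L_k-restricted
                                        interpolant for path, or None
    """
    P, k = set(), 0
    while True:
        path = pred_abs(P)
        if path is None:
            return "safe", P, k
        if is_feasible(path):            # not on the slide: report real bugs
            return "bug", path, k
        atoms = restricted_interpolant(path, k)
        while atoms is None:             # no L_k-interpolant: go up one level
            k += 1
            if k > max_k:                # safety cap for the sketch only
                return "gave up", path, k
            atoms = restricted_interpolant(path, k)
        P |= atoms                       # k is never reset to 0


# Toy stand-ins: the abstraction succeeds once "x<=y+1" is a predicate, and
# the spurious path only has an interpolant at level k >= 1.
def pred_abs(P):
    return None if "x<=y+1" in P else "spurious path"

def is_feasible(path):
    return False

def restricted_interpolant(path, k):
    return {"x<=y+1"} if k >= 1 else None

print(refine_loop(pred_abs, is_feasible, restricted_interpolant))
# ('safe', {'x<=y+1'}, 1)
```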
Proof idea
• Let $\pi$ be a program path ending in an error location.
• Let $\phi \in L_k$ be a safety invariant of the program.
• Then True, $\phi, \ldots, \phi$, False (with $n-1$ copies of $\phi$) is an $L_k$-interpolant for $\pi$.
• Thus, the split prover must find an $L_k$-interpolant.
• Moreover, this interpolant must contain an AP not in P
  – (else predicate abstraction would have refuted the path with P)
• Thus, we must add some AP in $L_k$ to P at each iteration.
• This must terminate, since $L_k$ is a finite language
  – (over the program variables)
Building a split prover
• First you have to choose your hierarchy $L_0, L_1, \ldots$
• We will consider QF formulas with
  – integer difference bound constraints (e.g., $x \leq y + c$)
  – equality and uninterpreted functions
  – restricted use of array operations "select" and "store"
• These are sufficient to prove simple properties of programs with arrays.
• Our restriction language $L_k$ will be determined by
  – the finite set $C_D$ of allowed constants c in $x \leq y + c$
  – the finite set $C_B$ of allowed constants c in $x \leq c$
  – the bound $b_f$ on the depth of nesting of function symbols
For a finite vocabulary, $L_k$ is finite, and every formula is included in some $L_k$.
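A membership test for $L_k$ can be sketched directly from the three parameters $C_D$, $C_B$, and $b_f$. Here terms are nested tuples such as `("f", "x")`, atoms use an illustrative tagged-tuple encoding, and array operations like `select` are counted as ordinary function symbols for the depth bound; all of these encodings and names are assumptions.

```python
def depth(term):
    """Nesting depth of function symbols; variables and constants have depth 0."""
    if isinstance(term, tuple):          # ("f", arg1, arg2, ...)
        return 1 + max((depth(a) for a in term[1:]), default=0)
    return 0

def in_Lk(atom, CD, CB, bf):
    """Is this atom allowed in L_k with constant sets CD, CB and depth bound bf?"""
    kind = atom[0]
    if kind == "diff":                   # ("diff", s, t, c): s <= t + c
        _, s, t, c = atom
        return c in CD and depth(s) <= bf and depth(t) <= bf
    if kind == "bound":                  # ("bound", s, c): s <= c
        _, s, c = atom
        return c in CB and depth(s) <= bf
    if kind == "eq":                     # ("eq", s, t): s = t
        _, s, t = atom
        return depth(s) <= bf and depth(t) <= bf
    raise ValueError(f"unknown atom kind: {kind}")

CD, CB, bf = {0}, {0, 1}, 1
print(in_Lk(("eq", ("select", "a", "i"), "x"), CD, CB, bf))  # True
print(in_Lk(("eq", ("f", ("f", "x")), "y"), CD, CB, bf))     # False: depth 2
print(in_Lk(("diff", "x", "y", 2), CD, CB, bf))              # False: 2 not in CD
print(in_Lk(("bound", "x", 1), CD, CB, bf))                  # True
```

Raising k in this scheme would mean enlarging $C_D$, $C_B$, or $b_f$; with a finite vocabulary each choice admits finitely many atoms.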
Lazy architecture
• Note: the propositional part of the refutation has non-local steps, but the generated interpolant is still L-restricted, because propositional interpolation rules don’t introduce new atomic predicates.
[Diagram: the SAT solver passes satisfying minterms to the ground decision procedure (an L-restricted split prover); refutations return to the SAT solver as constraints, and split refutations go to interpolation.]
Prover architecture
• The lazy approach means the split prover must refute only minterms.
• Convexity: a theory is convex if Horn consequences are complete
  – In the convex case, provers only exchange literals [Nelson & Oppen, 1980]
• Simple proof rules for complete unit consequence finding in $L_k$
  – For EUF, use a Shostak-style canonizer
    • order terms appropriately, to generate the desired consequences
  – For linear arithmetic, use Fourier-Motzkin
    • need weakening rule: $x \leq y + c \rightarrow x \leq y + d$, if $c < d$
    • in case of a non-convexity, split cases in the SAT solver
    • integers and array store operations introduce non-convexities
• Multiple theories handled by hierarchical decomposition
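[Diagram: each prover in the chain decomposed into an equality/uninterpreted-function component (=, f) and an arithmetic component (≤, +)]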
These and other optimizations can result in a relatively efficient prover...
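For difference bounds, a Fourier-Motzkin step that eliminates the shared variable y from $x \leq y + c$ and $y \leq z + d$ yields $x \leq z + (c + d)$. The sketch below (names hypothetical) saturates a set of such unit facts, uses the weakening rule to raise each derived constant to the smallest allowed value, and drops facts that cannot be weakened into the allowed set. It therefore only finds refutations whose intermediate facts stay in the restricted language.

```python
def restricted_closure(constraints, allowed):
    """Unit consequence finding for difference bounds, restricted to L.

    constraints: set of (x, y, c) meaning x <= y + c, with x != y.
    allowed: finite set of constants that derived facts may use.
    Returns None if a contradiction x <= x + e with e < 0 is derived,
    otherwise the set of derived facts.
    """
    def weaken(c):
        # Weakening rule: x <= z + c implies x <= z + d for any d > c.
        ups = [d for d in allowed if d >= c]
        return min(ups) if ups else None

    facts = set(constraints)
    changed = True
    while changed:
        changed = False
        for (x, y, c) in list(facts):
            for (y2, z, d) in list(facts):
                if y2 != y:
                    continue
                if x == z:                  # cycle back to x
                    if c + d < 0:
                        return None         # refuted: x <= x + (negative)
                    continue
                w = weaken(c + d)           # Fourier-Motzkin step on y
                if w is not None and (x, z, w) not in facts:
                    facts.add((x, z, w))
                    changed = True
    return facts

# A cycle whose constants sum to -1 is unsatisfiable ...
cyc = {("x", "y", -2), ("y", "z", 1), ("z", "x", 0)}
print(restricted_closure(cyc, {0}) is None)              # False: no L0 refutation
print(restricted_closure(cyc, set(range(-2, 3))) is None)  # True: refuted
```

With only the constant 0 allowed, the fact $y \leq x + 1$ needed to close the cycle cannot be kept, so the restricted prover does not find a refutation; allowing constants in $[-2, 2]$ keeps it.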
Performance comparison
• Refuting counterexamples for the two hardest Windows device driver examples in the Blast benchmark set.
• Compare split prover against a Nelson-Oppen style prover, same SAT solver
Some "trivial" benchmarks

    main() {
      char x[*], z[*];
      int from, to, i, j, k;
      i = from; j = 0;
      while (x[i] != 0 && i < to) {
        z[j] = x[i];
        i++; j++;
      }
      /* prove strlen(z) >= j */
      assert !(k >= 0 && k < j && z[k] == 0);
    }

example: substring copy
Results
example  SatAbs  Magic  Blast  Blast (new)
simple loop X X
array copy X
two loops X X
array fill (increment) X
array fill (fixed size) X X
zero fill X X
scan for zero X X
string overflow X X
string concat (size) X
string concat (ovfl) X X slow
string copy X X
substring (size) X X
substring (ovfl) X X
X = refine fail, = bug, = diverge, TO = timeout, = verified safe
Summary
• An abstraction refinement heuristic is complete for language L if it is guaranteed to find a safety invariant whenever one exists in L.
• Existing PA heuristics are incomplete and diverge on trivial programs.
• CEGAR can be made complete by...
  – stratifying L into a hierarchy of finite sublanguages $L_0, L_1, \ldots$
  – refuting counterexamples with an $L_k$-restricted split prover
  – staying at the lowest possible level of the hierarchy
• A split prover can be made efficient enough to use in practice
  – (at least for some useful theories)
• Theoretical completeness can actually lead to improved practical performance.
Abstraction in infinite lattices
• In a lattice of infinite height, fixed point computation may not converge $\Rightarrow$ widening $\Rightarrow$ incompleteness
[Diagram: the infinite chain $x \geq 0$, $x \geq 1$, $x \geq 2$, ... in L, with L stratified into $L_0 \subseteq L_1 \subseteq L_2 \subseteq \cdots$]
By stratifying L into a sequence of lattices of finite height, we avoid widening and are guaranteed to find a fixed point proving a given property.
(Though not the least fixed point)
Pure interpolant approach
• Progressively unfold the program, computing interpolants...
[Diagram: successively longer unfoldings of the program, each running from I to F, with one, two, three, ... intermediate points labeled T]
• If the program has a safety invariant in QF, the interpolants must eventually contain a safety invariant.
  – This gives an alternative to predicate abstraction [CAV06].
  – Note, this is neither a fixed point iteration nor a parameterized approach.
Quantified invariants
• Even very simple properties often require quantified invariants.
• This can be handled by a method called "indexed predicate abstraction"
  – Predicates can contain "index variables" that are implicitly quantified
  – Computes the strongest quantified inductive invariant expressible as a Boolean combination of the given atomic predicates
• To obtain completeness in this case, we need to restrict the number of quantifiers in $L_k$ (else $L_k$ is not finite, and we may diverge)
• Questions:
  – Is there a resolution strategy that is complete for consequence generation with a restricted number of free variables?
  – Can we extend to richer theories, including, e.g., transitive closure?