SAT/SMT Solvers and Their Applications - IIT Bombay · 2017. 3. 2. · CFDVS 2017 Ashutosh Gupta...

CFDVS 2017 · Ashutosh Gupta · TIFR, India

SAT/SMT Solvers and Their Applications

Ashutosh Gupta

TIFR, India

Compile date: 2017-03-01

http://creativecommons.org/licenses/by-nc-sa/4.0/

http://www.tcs.tifr.res.in/~agupta/



Logic is the backbone of formal methods

Differential equations are the calculus of electrical engineering.

Logic is the calculus of formal methods.

Logic provides tools to define and manipulate computational objects.




Topic 1.1

SAT problem




Example: SAT problem

Let x, y be rational variables.

Choose values of x and y such that the following formula holds true.

x + y = 3

We say {x ↦ 1, y ↦ 2} |= x + y = 3.

Commentary: We are not calling x and y rational numbers. They are not numbers. They are symbols that can hold numbers.




Example: SAT problem(contd.)

Let x, y be rational variables.

Choose values of x and y such that the following formula holds true.

x + y = 3 ∧ y > 10 ∧ x > 0 (theory formulas: easy)

x + y = 3 ∧ y > 10 ∧ (x > 0 ∨ x < −4) (quantifier-free formulas: hard)

∀y . x + y = 3 ∧ y > 10 ∧ (x > 0 ∨ x < −4) (quantified formulas: impossible)

Commentary: The above are increasingly harder classes of satisfiability problems.
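A quick hand derivation of each formula's status, assuming x and y range over the rationals as above:

```latex
\begin{align*}
& x + y = 3 \land y > 10 \;\Rightarrow\; x = 3 - y < 3 - 10 = -7 \\
& \text{(1) } x > 0 \text{ contradicts } x < -7, \text{ so the first formula is unsat} \\
& \text{(2) } x < -7 < -4, \text{ so the second is sat, e.g. } \{x \mapsto -8,\ y \mapsto 11\} \\
& \text{(3) the body must hold for every } y, \text{ but } y = 0 \text{ violates } y > 10, \text{ so no } x \text{ works}
\end{align*}
```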




Solvers for Quantifier-free formulas

We will look at satisfiability solvers for quantifier-free formulas that consist of

- theory atoms, and
- Boolean structure.

Example 1.1

x + y = 3 ∧ y > 10 ∧ (x > 0 ∨ x < −4)

Here x + y = 3, y > 10, x > 0, and x < −4 are the theory atoms, and the connectives ∧ and ∨ form the Boolean structure.




A comment on theories

Theory is a technical name for the subject of interest.

- Rationals
- Integers
- Reals
- Floats
- Arrays
- Chairs
- Cartoons

Let us stick to rational/integer arithmetic in this talk.

Theory is a very general concept.




Propositional formulas

Propositional formulas are a special case, where the theory atoms are Boolean variables.

Example 1.2

Let p1, p2, p3 be Boolean variables.

p1 ∧ ¬p2 ∧ (p3 ∨ p2)

A satisfying assignment:

{p1 ↦ 1, p2 ↦ 0, p3 ↦ 1} |= p1 ∧ ¬p2 ∧ (p3 ∨ p2)
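To make the definition concrete, here is a minimal brute-force satisfiability check in Python. It is an illustrative sketch only (exponential in the number of variables), and the name brute_force_sat is our own.

```python
from itertools import product

def brute_force_sat(variables, formula):
    """Return a satisfying assignment {var: 0/1}, or None if there is none.

    Tries all 2^n assignments, so it is only usable for tiny formulas."""
    for values in product((0, 1), repeat=len(variables)):
        m = dict(zip(variables, values))
        if formula(m):
            return m
    return None

# Example 1.2: p1 ∧ ¬p2 ∧ (p3 ∨ p2)
def example_1_2(m):
    return m["p1"] == 1 and m["p2"] == 0 and (m["p3"] == 1 or m["p2"] == 1)

print(brute_force_sat(["p1", "p2", "p3"], example_1_2))  # {'p1': 1, 'p2': 0, 'p3': 1}
```

Real SAT solvers avoid this enumeration; the rest of the talk is about how.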




A bit of jargon

- Solvers for quantifier-free propositional formulas are called SAT solvers.
- Solvers for quantifier-free formulas in other theories are called SMT solvers (SMT = satisfiability modulo theories).




Topic 1.2

SAT problems are everywhere




SAT problems

Every field of science and technology encounters SAT problems over quantifier-free formulas.

A few are listed here:

- Hardware verification and design assistance: almost all hardware/EDA companies have their own SAT solver
- Planning: many resource allocation problems can be converted to SAT
- Security: analysis of crypto algorithms
- Solving hard problems, e.g., the travelling salesman problem
- Sampling/counting




Example: Solving Sudoku using SAT solvers

Example 1.3

- Variables: $v_{i,j,k} \in \mathbb{B}$ for $i, j, k \in \{1, \dots, 9\}$. If $v_{i,j,k} = 1$, the cell in column $i$ and row $j$ contains $k$.
- Each cell holds exactly one value: $\sum_{k=1}^{9} v_{i,j,k} = 1$ for $i, j \in \{1, \dots, 9\}$
- Each value is used exactly once in each row: $\sum_{i=1}^{9} v_{i,j,k} = 1$ for $j, k \in \{1, \dots, 9\}$
- Each value is used exactly once in each column: $\sum_{j=1}^{9} v_{i,j,k} = 1$ for $i, k \in \{1, \dots, 9\}$
- Each value is used exactly once in each 3 × 3 box: $\sum_{r=1}^{3} \sum_{s=1}^{3} v_{3i+r,\,3j+s,\,k} = 1$ for $i, j \in \{0, 1, 2\}$, $k \in \{1, \dots, 9\}$
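The four families of constraints can be generated mechanically. A sketch in Python, assuming a variable $v_{i,j,k}$ is represented by the tuple (i, j, k) with i the column, j the row, and k the value; the function name sudoku_groups is ours. Each returned group lists 9 variables of which exactly one must be true.

```python
from itertools import product

def sudoku_groups():
    """Return the 'exactly one of these is true' groups of Example 1.3."""
    r9 = range(1, 10)
    groups = []
    for i, j in product(r9, r9):                        # each cell: one value
        groups.append([(i, j, k) for k in r9])
    for j, k in product(r9, r9):                        # each value once per row
        groups.append([(i, j, k) for i in r9])
    for i, k in product(r9, r9):                        # each value once per column
        groups.append([(i, j, k) for j in r9])
    for bi, bj, k in product(range(3), range(3), r9):   # each value once per 3x3 box
        groups.append([(3 * bi + r, 3 * bj + s, k)
                       for r in range(1, 4) for s in range(1, 4)])
    return groups

groups = sudoku_groups()
print(len(groups))  # 324, i.e. 4 families of 81 groups
```

The pre-filled cells of a concrete puzzle would be added as unit clauses on top of these groups.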




Encoding $x_1 + \dots + x_k = 1$

- At least one $x_i$ is true: $(x_1 \lor \dots \lor x_k)$
- No two $x_i$ are both true: $(\neg x_i \lor \neg x_j)$ for $1 \le i < j \le k$
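A minimal sketch of this pairwise encoding in Python, assuming DIMACS-style integer literals (v for $x_v$, −v for $\neg x_v$); the helper names are ours. For k variables it produces 1 + k(k−1)/2 clauses, e.g., 37 for k = 9.

```python
from itertools import combinations

def exactly_one(xs):
    """Pairwise CNF encoding of x_1 + ... + x_k = 1 (clauses are lists of ints)."""
    at_least_one = [list(xs)]
    at_most_one = [[-a, -b] for a, b in combinations(xs, 2)]
    return at_least_one + at_most_one

def satisfies(clauses, true_vars):
    """Check a total assignment given as the set of variables that are true."""
    return all(any((lit > 0) == (abs(lit) in true_vars) for lit in c)
               for c in clauses)

print(exactly_one([1, 2, 3]))  # [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
```

The pairwise scheme is the simplest choice; its clause count grows quadratically in k.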




SMT problem in bug detection

Example 1.4

Consider the program

int foo(int x, int y) {
    int u = x + y;
    int z;
    if (u != 1)
        z = 2;
    else
        z = u + 1;
    u = y / z; // avoid divide by 0
    return u;
}

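The following formula in quantifier-free linear integer arithmetic encodes the program behaviors that reach the division with z = 0:

u = x + y ∧ (u ≠ 1 ∧ z = 2 ∨ u = 1 ∧ z = u + 1) ∧ z = 0

If the above formula is sat, the program has a bug. Here z = 2 on both branches, so the formula is unsat and foo never divides by zero.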

Detailed presentation will be given on Tuesday
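A bounded brute-force check in Python that mirrors foo and looks for inputs reaching the division with z = 0. This is only a sanity check over a small input range (the bound 50 is an arbitrary choice, and the function names are ours); establishing the result for all integers is exactly what the SMT query does.

```python
from itertools import product

def z_at_division(x, y):
    """Value of z when foo reaches u = y / z (mirrors the program above)."""
    u = x + y
    if u != 1:
        z = 2
    else:
        z = u + 1
    return z

def find_division_by_zero(bound):
    """Return inputs (x, y) in [-bound, bound]^2 that give z = 0, or None."""
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if z_at_division(x, y) == 0:
            return (x, y)
    return None

print(find_division_by_zero(50))  # None: z is 2 on both branches
```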




Topic 1.3

Rise of Solvers




Rise of SAT/SMT solvers

SAT solving is theoretically known to be a hard problem (SAT is NP-complete).

However, this did not stop researchers from attempting to build practical solvers.

- In the early 2000s, stable and scalable SAT/SMT solvers started appearing, e.g., zChaff and Yices.
- SAT/SMT competitions became a driving force behind their ever-increasing efficiency.
- The formal methods community quickly realized their potential.
- Z3, one of the leading SMT solvers, alone has 3000+ citations (about 375 per year) as of June 2016.




Efficiency of SAT solvers over the years

Source: http://satsmt2014.forsyte.at/files/2014/07/SAT-introduction.pdf

Cactus plot:

- Y-axis: run time (up to the time-out)
- X-axis: number of solved problems
- Color: a competing solver
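How such a plot is built from raw data, as a sketch: one common convention (assumed here) gives each solver the points (k, time of its k-th fastest solved instance), dropping instances that hit the time-out. The run times below are made up for illustration, and cactus_points is our own name.

```python
def cactus_points(times, timeout):
    """Cactus-plot points for one solver; None or > timeout means unsolved."""
    solved = sorted(t for t in times if t is not None and t <= timeout)
    return [(k, t) for k, t in enumerate(solved, start=1)]

# Hypothetical per-instance run times in seconds (None = timed out).
print(cactus_points([3.0, None, 0.5, 12.0, 7.5], timeout=10.0))
# [(1, 0.5), (2, 3.0), (3, 7.5)]
```

A curve that reaches further right (more solved problems) and stays lower (less time) indicates a better solver.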





SAT technology: a quiet revolution

Impact is enormous.

Probably one of the greatest achievements of the first decade of this century.

All verification tools depend on these solvers.




Topic 1.4

SAT solver




Some terminology

- Propositional variables are also referred to as atoms.
- A literal is either an atom or its negation.
- A clause is a disjunction of literals.
- A formula is in CNF if it is a conjunction of clauses.

Example 1.5

- p is an atom but ¬p is not.
- ¬p and p both are literals.
- p ∨ ¬p ∨ p ∨ q is a clause.
- ¬p and p both are in CNF.
- (p ∨ ¬q) ∧ (r ∨ ¬q) ∧ ¬r is in CNF.
- (p ∨ ¬q) ∧ ((r ∧ ¬p) ∨ ¬q) ∧ ¬r is not in CNF.

Definition 1.1 Let atoms(F) denote the set of atoms appearing in F.
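In code, a CNF is commonly stored as a list of clauses, each a list of literals. A sketch in Python using the DIMACS-style convention (an assumption, not from the slides): atom p_i is the integer i and ¬p_i is −i.

```python
def atoms(f: list[list[int]]) -> set[int]:
    """atoms(F): the atoms appearing in F, with signs stripped."""
    return {abs(lit) for clause in f for lit in clause}

# (p ∨ ¬q) ∧ (r ∨ ¬q) ∧ ¬r, numbering p = 1, q = 2, r = 3
f = [[1, -2], [3, -2], [-3]]
print(sorted(atoms(f)))  # [1, 2, 3]
```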




Partial model

Definition 1.2 For a CNF F, a partial model m is an ordered partial map from atoms(F) to B.

Example 1.6

The partial models m1 = {x ↦ 0, y ↦ 1} and m2 = {y ↦ 1, x ↦ 0} are not the same.
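One natural representation, assumed here, is a Python list of (atom, value) pairs kept in assignment order; that order is what later lets a solver backtrack by truncating the list.

```python
# Ordered partial models as lists of (atom, value) pairs, in assignment order.
m1 = [("x", 0), ("y", 1)]
m2 = [("y", 1), ("x", 0)]

print(m1 == m2)              # False: the assignment order differs
print(dict(m1) == dict(m2))  # True: as plain maps they agree
```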




Some notation

Before presenting the solvers, let us define some notation.

Under a partial model m (where m(¬x) = 1 − m(x)),

A literal ℓ is true if m(ℓ) = 1, and ℓ is false if m(ℓ) = 0. Otherwise, ℓ is undefined.

A clause C is true if there is ℓ ∈ C s.t. ℓ is true, and C is false if each ℓ ∈ C is false. Otherwise, C is undefined.

A CNF F is true if each C ∈ F is true, and F is false if there is C ∈ F s.t. C is false. Otherwise, F is undefined.
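These three-valued definitions translate directly into code. A sketch in Python, assuming integer literals (p_i is i, ¬p_i is −i) and a dict from atoms to 0/1 for m; None plays the role of "undefined", and the function names are ours.

```python
from typing import Optional

Model = dict[int, int]   # atom -> 0/1; missing atoms are undefined

def lit_value(lit: int, m: Model) -> Optional[bool]:
    v = m.get(abs(lit))
    if v is None:
        return None                      # undefined
    return (v == 1) if lit > 0 else (v == 0)

def clause_value(clause: list[int], m: Model) -> Optional[bool]:
    vals = [lit_value(l, m) for l in clause]
    if True in vals:
        return True                      # some literal is true
    if all(v is False for v in vals):
        return False                     # every literal is false
    return None

def cnf_value(f: list[list[int]], m: Model) -> Optional[bool]:
    vals = [clause_value(c, m) for c in f]
    if all(v is True for v in vals):
        return True                      # every clause is true
    if False in vals:
        return False                     # some clause is false
    return None
```

For example, with f = [[1, -2], [3, -2], [-3]], cnf_value(f, {3: 1}) is False, because the clause ¬r is already false.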




Unit clause and unit literal

Definition 1.3 C is a unit clause under m if one literal ℓ ∈ C is undefined and the rest are false. ℓ is called the unit literal.
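A sketch of detecting a unit literal, under the same assumed integer-literal representation (p_i is i, ¬p_i is −i, m maps atoms to 0/1); the function name find_unit_literal is ours.

```python
from typing import Optional

def find_unit_literal(clause: list[int], m: dict[int, int]) -> Optional[int]:
    """Return the unit literal of clause under m, or None if it is not unit."""
    undefined = []
    for lit in clause:
        v = m.get(abs(lit))
        if v is None:
            undefined.append(lit)
        elif (v == 1) == (lit > 0):
            return None                 # a true literal: the clause is true, not unit
    # every defined literal is false; unit iff exactly one literal is undefined
    return undefined[0] if len(undefined) == 1 else None

# c2 = ¬p1 ∨ p3 ∨ p5 under {p1 ↦ 1, p5 ↦ 0}
print(find_unit_literal([-1, 3, 5], {1: 1, 5: 0}))  # 3
```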




DPLL (Davis-Putnam-Logemann-Loveland)

Algorithm 1.1: DPLL(F, m)
Input: CNF F, partial model m
if F is true under m then
    return sat
if F is false under m then
    return unsat
if ∃ unit literal x under m then
    return DPLL(F, m[x ↦ 1])
if ∃ unit literal ¬x under m then
    return DPLL(F, m[x ↦ 0])
choose an undefined x;
if DPLL(F, m[x ↦ 0]) == sat then
    return sat
else
    return DPLL(F, m[x ↦ 1])

Backtracking happens at a conflict: when F becomes false, the call returns unsat and the caller tries the other value of its decision.
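A direct, unoptimized Python transcription of Algorithm 1.1, as a sketch. Assumptions: integer literals (p_i is i, ¬p_i is −i), m is a dict from atoms to 0/1, and the function returns a model instead of the word sat (None for unsat). The two unit-literal steps are merged: a unit literal of either sign is made true.

```python
from typing import Optional

def lit_value(lit: int, m: dict[int, int]) -> Optional[bool]:
    """True/False for a defined literal, None if its atom is unassigned."""
    v = m.get(abs(lit))
    return None if v is None else ((v == 1) == (lit > 0))

def dpll(f: list[list[int]], m: dict[int, int]) -> Optional[dict[int, int]]:
    """Algorithm 1.1: return a model (sat) or None (unsat)."""
    vals = [[lit_value(l, m) for l in c] for c in f]
    if all(True in cv for cv in vals):
        return m                                    # F is true under m
    if any(all(v is False for v in cv) for cv in vals):
        return None                                 # F is false: conflict
    for c, cv in zip(f, vals):
        undefined = [l for l, v in zip(c, cv) if v is None]
        if True not in cv and len(undefined) == 1:  # unit clause
            u = undefined[0]
            return dpll(f, {**m, abs(u): 1 if u > 0 else 0})
    x = next(abs(l) for c in f for l in c if abs(l) not in m)  # an undefined atom
    result = dpll(f, {**m, x: 0})
    return result if result is not None else dpll(f, {**m, x: 1})
```

Each recursive call copies the model, which keeps backtracking trivial but is far too slow for real instances; the optimizations that follow address this.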




Example: Branching and backtracking in DPLL

Example 1.7

c1 = (¬p1 ∨ p2)

c2 = (¬p1 ∨ p3 ∨ p5)

c3 = (¬p2 ∨ p4)

c4 = (¬p3 ∨ ¬p4)

c5 = (p1 ∨ p5 ∨ ¬p2)

c6 = (p2 ∨ p3)

c7 = (p2 ∨ ¬p3)

c8 = (p6 ∨ ¬p5)

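Run (decision variables are branched on; propagated variables carry the clause that forced them):

p6 ↦ 0 (decision)
p5 ↦ 0 (propagated by c8)
p1 ↦ 1 (decision)
p3 ↦ 1 (propagated by c2)
p2 ↦ 1 (propagated by c1)
p4 ↦ 1 (propagated by c3)
p3 ↦ 0 (propagated by c4): conflict

Backtrack to the last decision and try p1 ↦ 0.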

Exercise 1.1 Complete the DPLL run.




Optimizations

There are various optimizations in implementing DPLL.

We will discuss only four optimizations:

- clause learning
- 2-watched literals
- variable ordering
- restarts




Topic 1.5

Clause learning




Clause learning

As we decide and propagate, we may construct a data structure that allows us to backtrack efficiently.

Definition 1.4 (implication graph)

An implication graph is a labeled directed graph (N, E), where

- N contains the true literals and a conflict node to denote contradiction, and
- E = {(ℓ1, ℓ2) | ¬ℓ1 ∈ clause(ℓ2)}, where clause(ℓ) is the clause due to which unit propagation made ℓ true. Note: for decision literals, clause(ℓ) is undefined.

Note: this is not the same definition as the implication graph for 2-SAT!

We also annotate each node with its decision level (e.g., ¬p@3), i.e., the number of decisions after which the variable was assigned.
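A sketch of computing the edge set E from the solver's trail. Assumptions: each trail entry is (literal, decision level, reason clause), with reason None for decisions, and literals are integers (p_i is i, ¬p_i is −i). Edges into the conflict node are taken from the falsified clause, as in Example 1.8; that detail is our reading, not part of Definition 1.4. The function names are ours.

```python
def implication_edges(trail):
    """E = {(l1, l2) | ¬l1 ∈ clause(l2)} for the literals on the trail.

    trail: list of (literal, decision level, reason clause or None)."""
    edges = set()
    for lit, _level, reason in trail:
        if reason is None:
            continue                          # decisions have no incoming edges
        for other in reason:
            if other != lit:
                edges.add((-other, lit))      # ¬other is a true literal that forced lit
    return edges

def conflict_edges(falsified_clause):
    """Edges into the conflict node from the true negations of the falsified clause."""
    return {(-lit, "conflict") for lit in falsified_clause}

# Example 1.8 with p_i as i: the trail up to the conflict on c4
c1, c2, c3, c4, c8 = [-1, 2], [-1, 3, 5], [-2, 4], [-3, -4], [6, -5]
trail = [(-6, 1, None), (-5, 1, c8), (-7, 2, None), (1, 3, None),
         (3, 3, c2), (2, 3, c1), (4, 3, c3)]
print(sorted(implication_edges(trail)))  # [(-6, -5), (-5, 3), (1, 2), (1, 3), (2, 4)]
print(sorted(conflict_edges(c4)))        # [(3, 'conflict'), (4, 'conflict')]
```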




Example: implication graph

Example 1.8

c1 = (¬p1 ∨ p2)

c2 = (¬p1 ∨ p3 ∨ p5)

c3 = (¬p2 ∨ p4)

c4 = (¬p3 ∨ ¬p4)

c5 = (p1 ∨ p5 ∨ ¬p2)

c6 = (p2 ∨ p3)

c7 = (p2 ∨ ¬p3 ∨ p7)

c8 = (p6 ∨ ¬p5)

Note: Modified example

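Run:

p6 ↦ 0 (decision)
p5 ↦ 0 (propagated by c8)
p7 ↦ 0 (decision)
p1 ↦ 1 (decision)
p3 ↦ 1 (propagated by c2)
p2 ↦ 1 (propagated by c1)
p4 ↦ 1 (propagated by c3)
p3 ↦ 0 (propagated by c4): conflict

Implication graph

Nodes: ¬p6@1, ¬p5@1, ¬p7@2, p1@3, p3@3, p2@3, p4@3, conflict
Edges: ¬p6 → ¬p5 (c8); ¬p5 → p3 (c2); p1 → p3 (c2); p1 → p2 (c1); p2 → p4 (c3); p3 → conflict (c4); p4 → conflict (c4)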




Conflict clause

In the case of a conflict, we traverse the implication graph backwards to find the set of decisions that caused the conflict.

The disjunction of the negations of these decisions is called the conflict clause.

Example 1.9 In the implication graph of Example 1.8, going backwards from the conflict node through p3@3, p4@3, p2@3, and ¬p5@1 reaches the decisions p1@3 and ¬p6@1; the decision ¬p7@2 is not reached.

Conflict clause: p6 ∨ ¬p1




Clause learning

Clause learning heuristics:

- add the conflict clause to the input clauses, and
- backtrack to the second-last decision involved in the conflict, and proceed like DPLL.
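A sketch of the decision-based analysis described above, computing both the conflict clause and the backtrack level (the second-highest decision level among the decisions involved). Assumptions: the trail is a list of (literal, decision level, reason clause) with reason None for decisions, and literals are integers (p_i is i, ¬p_i is −i). Modern solvers usually learn a first-UIP clause instead; this follows the simpler scheme on the slides, and analyze_conflict is our own name.

```python
def analyze_conflict(trail, falsified_clause):
    """Return (conflict clause, backtrack level) by walking the graph backwards."""
    info = {lit: (level, reason) for lit, level, reason in trail}
    decisions, seen = set(), set()
    stack = [-lit for lit in falsified_clause]      # true literals feeding the conflict
    while stack:
        lit = stack.pop()
        if lit in seen:
            continue
        seen.add(lit)
        _level, reason = info[lit]
        if reason is None:
            decisions.add(lit)                      # reached a decision
        else:
            stack.extend(-other for other in reason if other != lit)
    clause = sorted(-d for d in decisions)          # negations of the decisions
    levels = sorted({info[d][0] for d in decisions}, reverse=True)
    backtrack_level = levels[1] if len(levels) > 1 else 0   # second-last decision
    return clause, backtrack_level

# Example 1.8 with p_i as i
c1, c2, c3, c4, c8 = [-1, 2], [-1, 3, 5], [-2, 4], [-3, -4], [6, -5]
trail = [(-6, 1, None), (-5, 1, c8), (-7, 2, None), (1, 3, None),
         (3, 3, c2), (2, 3, c1), (4, 3, c3)]
print(analyze_conflict(trail, c4))  # ([-1, 6], 1): learn ¬p1 ∨ p6, backtrack to level 1
```

After backtracking to level 1, only ¬p6 remains assigned, so the learned clause is unit and immediately forces p1 ↦ 0.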




Benefits of adding conflict clauses:

1. Prunes away the search space
2. Records past work of the SAT solver
3. Enables many other heuristics without much complication

We will see them shortly.

Example 1.10

In the previous example, we made the decisions m(p6) = 0, m(p7) = 0, and m(p1) = 1.

We learned the conflict clause p6 ∨ ¬p1.

Adding this clause to the input clauses means that:

1. m(p6) = 0, m(p7) = 1, and m(p1) = 1 will never be tried, and
2. m(p6) = 0 and m(p1) = 1 will never occur simultaneously.

The impact of clause learning was so profound that some people call the optimized algorithm CDCL (conflict-driven clause learning) instead of DPLL.




CDCL as an algorithm

Algorithm 1.2: CDCL
Input: CNF F
AddClauses(F); m := UnitPropagation(); dl := 0; dstack := λx.0;
do
    // backtracking
    while ∃x {x ↦ 0, x ↦ 1} ⊆ m do
        if dl = 0 then return unsat;
        (C, dl) := AnalyzeConflict(m);
        m.resize(dstack(dl)); AddClauses({C}); m := UnitPropagation();
    // Boolean decision
    if m is partial then
        dstack(dl) := m.size();
        dl := dl + 1; m := Decide(); m := UnitPropagation();
while m is partial or ∃x {x ↦ 0, x ↦ 1} ⊆ m;
return sat

Here dl stands for the decision level, and dstack records the history for backtracking.

- AddClauses(Cs): adds Cs to the current set of problem clauses
- UnitPropagation(): applies unit propagation and extends m as much as possible
- Decide(): chooses an undefined variable in m and assigns it a Boolean value
- AnalyzeConflict(): returns a conflict clause learned using the implication graph and a decision level for backtracking
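A compact Python sketch of Algorithm 1.2, assuming integer literals (p_i is i, ¬p_i is −i) and naive clause scanning (no watched literals, no heuristics). Instead of dstack, each trail entry records its decision level, and backtracking pops entries above the target level. Conflict analysis is the decision-based scheme described earlier, and Decide() simply sets the lowest-numbered unassigned atom to false. All names are ours.

```python
from typing import Optional

Clause = list[int]

def cdcl(clauses: list[Clause], n_vars: int) -> Optional[dict[int, bool]]:
    """Return a model {atom: bool} if the CNF over atoms 1..n_vars is sat, else None."""
    clauses = [list(dict.fromkeys(c)) for c in clauses]   # drop duplicate literals
    trail: list[tuple[int, int, Optional[Clause]]] = []   # (literal, level, reason)
    value: dict[int, bool] = {}                           # atom -> assigned value
    level = 0                                             # dl: current decision level

    def is_true(lit: int) -> bool:
        return value.get(abs(lit)) == (lit > 0)

    def assign(lit: int, reason: Optional[Clause]) -> None:
        value[abs(lit)] = lit > 0
        trail.append((lit, level, reason))

    def propagate() -> Optional[Clause]:
        """UnitPropagation(): extend the trail; return a falsified clause on conflict."""
        changed = True
        while changed:
            changed = False
            for c in clauses:
                if any(is_true(l) for l in c):
                    continue
                undefined = [l for l in c if abs(l) not in value]
                if not undefined:
                    return c                    # all literals false: conflict
                if len(undefined) == 1:
                    assign(undefined[0], c)     # unit clause: make its literal true
                    changed = True
        return None

    def analyze(conflict: Clause) -> tuple[Clause, int]:
        """AnalyzeConflict(): negated decisions behind the conflict + backtrack level."""
        info = {lit: (lv, reason) for lit, lv, reason in trail}
        decisions, seen, stack = set(), set(), [-l for l in conflict]
        while stack:
            lit = stack.pop()
            if lit in seen:
                continue
            seen.add(lit)
            _, reason = info[lit]
            if reason is None:
                decisions.add(lit)
            else:
                stack.extend(-l for l in reason if l != lit)
        levels = sorted({info[d][0] for d in decisions}, reverse=True)
        return [-d for d in decisions], (levels[1] if len(levels) > 1 else 0)

    def backtrack(to_level: int) -> None:
        while trail and trail[-1][1] > to_level:
            lit, _, _ = trail.pop()
            del value[abs(lit)]

    while True:
        conflict = propagate()
        if conflict is not None:
            if level == 0:
                return None                     # conflict before any decision: unsat
            learned, level = analyze(conflict)
            backtrack(level)
            clauses.append(learned)             # unit at the new level
            continue
        free = [v for v in range(1, n_vars + 1) if v not in value]
        if not free:
            return dict(value)                  # total and conflict-free: sat
        level += 1
        assign(-free[0], None)                  # Decide(): try the atom as false
```

Usage: cdcl([[1, 2], [-1, 2], [1, -2], [-1, -2]], 2) returns None, while the clauses of Example 1.7 yield a model.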




Topic 1.6

Other heuristics




Other heuristics

Now we will discuss other heuristics that may improve the performance of SAT solvers:

- 2-watched literals
- pure literals
- variable ordering
- restarts
- learned clause deletion
- cache-aware implementation

Commentary: Clause learning is an algorithmic change. The optimizations above are mostly clever data structures and implementation techniques.




2-watched literals

This data structure optimizes unit propagation.

Observation: to decide whether a clause is ready for unit propagation, we need to look at only two literals that are not false.

For each clause we choose two literals and we call them watched literals.

In a clause,

- if both watched literals are non-false, the clause is not a unit clause;
- if one of the two becomes false, we look for another non-false literal to watch in its place;
- if we cannot find one, the clause is a unit clause, provided the other watched literal is undefined.

Exercise 1.2 Why may this scheme speed up CDCL?




Example: 2-watched literals

Example 1.11

Consider the clause p1 ∨ p2 ∨ ¬p3 ∨ ¬p4 in a formula with other variables and clauses. Suppose we initially watch p1 and p2 in the clause. Notation: * marks the watched literals, and ☺ marks a step with no work to be done!

Initially:        p1* ∨ p2* ∨ ¬p3 ∨ ¬p4     m = {}
...
Assign p1 = 0:    p1 ∨ p2* ∨ ¬p3* ∨ ¬p4     m = {. . . , p1 ↦ 0}
Assign p2 = 1:    p1 ∨ p2* ∨ ¬p3* ∨ ¬p4     m = {. . . , p1 ↦ 0, p2 ↦ 1}  ☺
Backtrack to p1:  p1 ∨ p2* ∨ ¬p3* ∨ ¬p4     m = {. . . }  ☺
Assign p4 = 1:    p1 ∨ p2* ∨ ¬p3* ∨ ¬p4     m = {. . . , p4 ↦ 1}  ☺

The benefit: often no work to be done!
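A simplified Python sketch of the watch bookkeeping, assuming integer literals (p_i is i, ¬p_i is −i) and clauses of at least two literals (a real solver handles unit clauses separately). Each clause keeps its two watched literals in positions 0 and 1; assigning a literal visits only the clauses watching its negation. Backtracking only clears values and leaves the watches untouched. The class name WatchedClauses is ours.

```python
from typing import Optional

class WatchedClauses:
    """Two-watched-literal bookkeeping; positions 0 and 1 of a clause are watched."""

    def __init__(self, clauses: list[list[int]]):
        self.clauses = [list(c) for c in clauses]
        self.value: dict[int, bool] = {}
        self.watches: dict[int, list[int]] = {}   # literal -> indices of watching clauses
        for i, c in enumerate(self.clauses):
            if len(c) < 2:
                raise ValueError("this sketch expects clauses with >= 2 literals")
            for lit in c[:2]:
                self.watches.setdefault(lit, []).append(i)

    def is_true(self, lit: int) -> bool:
        return self.value.get(abs(lit)) == (lit > 0)

    def is_false(self, lit: int) -> bool:
        return self.value.get(abs(lit)) == (lit < 0)

    def assign(self, lit: int) -> list[tuple[int, Optional[int]]]:
        """Make lit true; visit only the clauses watching the new false literal -lit.

        Returns (clause index, unit literal) pairs; a unit literal of None
        reports a clause whose literals are all false (a conflict)."""
        self.value[abs(lit)] = lit > 0
        false_lit = -lit
        found, keep = [], []
        for i in self.watches.get(false_lit, []):
            c = self.clauses[i]
            if c[0] == false_lit:
                c[0], c[1] = c[1], c[0]             # put the false watch in position 1
            for k in range(2, len(c)):
                if not self.is_false(c[k]):         # found a replacement watch
                    c[1], c[k] = c[k], c[1]
                    self.watches.setdefault(c[1], []).append(i)
                    break
            else:                                   # no replacement: keep watching
                keep.append(i)
                if self.is_false(c[0]):
                    found.append((i, None))         # conflict
                elif not self.is_true(c[0]):
                    found.append((i, c[0]))         # unit clause
        self.watches[false_lit] = keep
        return found

    def unassign(self, atom: int) -> None:
        """Backtracking only clears the value; the watches stay valid."""
        del self.value[atom]
```

Replaying Example 1.11 with WatchedClauses([[1, 2, -3, -4]]): assign(-1) moves the watch from p1 to ¬p3, and the later steps (assign(2), backtracking via unassign, assign(4)) visit no clause at all.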




Topic 1.7

SMT solver




SMT solver

We will now solve quantifier-free formulas in some theory.

Example 1.12

- f(x) ≈ g(h(x, y)) is a formula in QF_EUF (quantifier-free equality with uninterpreted functions).
- x > 0 ∨ y + x ≈ 3.5z is a formula in QF_LRA (quantifier-free linear real arithmetic).




CDCL(T)

CDCL solves (i.e., checks satisfiability of) quantifier-free propositional formulas.

CDCL(T) solves quantifier-free formulas in a theory T. It

- separates the Boolean and theory reasoning,
- proceeds like CDCL, and
- needs the support of a T-solver DP_T, i.e., a decision procedure for conjunctions of literals of T.

Tools built using CDCL(T) are called satisfiability modulo theories solvers (SMT solvers).




Boolean encoder

For a formula F, let a Boolean encoder e be a partial map from atoms(F) to fresh Boolean variables.

For a term t, let e(t) denote the term obtained by replacing each atom a by e(a) if e(a) is defined.

Example 1.13

Let F = x < 2 ∨ (y > 0 ∨ x ≥ 2) and e = {x < 2 ↦ x1, y > 0 ↦ x2}. Reading x ≥ 2 as ¬(x < 2), we get e(F) = x1 ∨ (x2 ∨ ¬x1).

Definition 1.5 For a partial model m of e, let e⁻¹(m) ≜ {e⁻¹(x) | x ↦ 1 ∈ m} ∪ {¬e⁻¹(x) | x ↦ 0 ∈ m}.
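A sketch of e and e⁻¹ in Python. Assumptions: theory atoms are strings, formulas are nested tuples such as ("or", a, b) and ("not", a), and a negated literal in e⁻¹(m) is written ("not", a); the names encode and decode_model are ours.

```python
def encode(formula, e):
    """e(F): replace each theory atom a by e(a) wherever e(a) is defined."""
    if isinstance(formula, str):
        return e.get(formula, formula)
    op, *args = formula
    return (op, *(encode(arg, e) for arg in args))

def decode_model(m, e):
    """e⁻¹(m): the theory literals for a partial Boolean model m (var -> 0/1)."""
    inverse = {var: atom for atom, var in e.items()}
    return [inverse[x] if v == 1 else ("not", inverse[x])
            for x, v in m.items() if x in inverse]

e = {"x < 2": "x1", "y > 0": "x2"}
F = ("or", "x < 2", ("or", "y > 0", ("not", "x < 2")))   # x ≥ 2 written as ¬(x < 2)
print(encode(F, e))                         # ('or', 'x1', ('or', 'x2', ('not', 'x1')))
print(decode_model({"x1": 0, "x2": 1}, e))  # [('not', 'x < 2'), 'y > 0']
```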




CDCL(T)

Algorithm 1.3: CDCL(T)
Input: CNF F, Boolean encoder e
AddClauses(e(F)); m := UnitPropagation(); dl := 0; dstack := λx.0;
do
    // backtracking
    while ∃x {x ↦ 0, x ↦ 1} ⊆ m do
        if dl = 0 then return unsat;
        (C, dl) := AnalyzeConflict(m);   // clause learning
        m.resize(dstack(dl)); AddClauses({C}); m := UnitPropagation();
    // Boolean decision
    if m is partial then
        dstack(dl) := m.size();
        dl := dl + 1; m := Decide(); m := UnitPropagation();
    // theory propagation
    if ∀x {x ↦ 0, x ↦ 1} ⊈ m then
        (Cs, dl′) := TheoryDeduction(⋀ e⁻¹(m));
        if dl′ < dl then { dl := dl′; m.resize(dstack(dl)); }
        AddClauses(e(Cs)); m := UnitPropagation();
while m is partial or ∃x {x ↦ 0, x ↦ 1} ⊆ m;
return sat

Here dl stands for the decision level, dstack records the history for backtracking, and TheoryDeduction returns a clause set and a decision level.




Theory propagation

TheoryDeduction looks at the atoms assigned so far and checks

- whether they are mutually unsatisfiable, and
- if not, whether other literals of F are implied by the current assignment.

Any implementation must meet the following goals:

- Correctness: the Boolean model is consistent with T
- Termination: unsat partial models are never repeated




TheoryDeduction

TheoryDeduction solves a conjunction of literals and returns a set of clauses and a decision level.

(Cs, dl′) := TheoryDeduction(⋀ e⁻¹(m))

Cs may contain clauses of the form

(⋀ L) ⇒ ℓ

where ℓ ∈ lits(F) ∪ {⊥} and L ⊆ e⁻¹(m). Note: the RHS need not be a single literal.




Requirements for TheoryDeduction

The output of TheoryDeduction must satisfy the following conditions:

- If ⋀ e⁻¹(m) is unsat in T, then Cs must contain a clause with ℓ = ⊥.
- If ⋀ e⁻¹(m) is sat, then dl′ = dl. Otherwise, dl′ is the decision level immediately after which the unsatisfiability occurred (made precise shortly).




Example: CDCL(T)

Consider F = (x = y ∨ y = z) ∧ (y ≠ z ∨ z = u) ∧ (z = x).

With e = {x = y ↦ x1, y = z ↦ x2, z = u ↦ x3, z = x ↦ x4}, e(F) = (x1 ∨ x2) ∧ (¬x2 ∨ x3) ∧ x4.

After AddClauses(e(F)); m := UnitPropagation(): m = {x4 ↦ 1}

After m := Decide(): m = {x4 ↦ 1, x2 ↦ 0}

After m := UnitPropagation(): m = {x4 ↦ 1, x2 ↦ 0, x1 ↦ 1}

After (Cs, dl′) := TheoryDeduction(x = y ∧ y ≠ z ∧ z = x): Cs = {x ≠ y ∨ y = z ∨ z ≠ x}, dl′ = 1, e(Cs) = {¬x1 ∨ x2 ∨ ¬x4}

After AddClauses(e(Cs)); m := UnitPropagation(): m = {x4 ↦ 1, x2 ↦ 0, x1 ↦ 1, x1 ↦ 0} ← conflict
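The theory check in this example involves only equalities and disequalities between variables. For such conjunctions, union-find is a standard decision procedure: merge the classes of all equalities, then report unsat iff some disequality joins two variables of the same class. A sketch in Python (the literal format and the function name are our assumptions):

```python
def eq_conjunction_sat(literals):
    """Decide a conjunction of literals (a, "=", b) / (a, "!=", b) over variables."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for a, op, b in literals:
        if op == "=":
            parent[find(a)] = find(b)       # merge the two equivalence classes
    return all(find(a) != find(b) for a, op, b in literals if op == "!=")

print(eq_conjunction_sat([("x", "=", "y"), ("y", "!=", "z"), ("z", "=", "x")]))  # False
```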




Topic 1.8

Theory propagation implementation




Theory propagation implementation: incremental theory solver

Typically, theory propagation is implemented using incremental/online solvers.

An incremental/online solver DP_T

- takes input constraints as a sequence of literals,
- maintains a data structure that defines the solver state and the satisfiability of the constraints seen so far, and
- provides a stack-like interface:
  - push(ℓ): adds literal ℓ to the "constraint store"
  - pop(): removes the last pushed literal from the store
  - checkSat(): checks satisfiability of the current store
  - unsatCore(): returns the set of literals that caused unsatisfiability

Note: We assume that push and pop call checkSat() at the end of their execution. Therefore, explicit calls to checkSat() are not necessary. However, practical tools allow users to choose the policy of calling checkSat(): lazy vs. eager.
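A naive Python sketch of this interface for equality literals. Assumptions: literals are tuples (a, "=", b) or (a, "!=", b); check_sat is called explicitly (the lazy policy); and check_sat re-solves the whole store with union-find, so the class offers the stack interface without being truly incremental. unsat_core uses deletion-based minimization, which yields a minimal (not necessarily smallest) core. The name NaiveEqStore is ours.

```python
class NaiveEqStore:
    """push/pop/check_sat/unsat_core over equality and disequality literals."""

    def __init__(self):
        self.store = []                      # the "constraint store", used as a stack

    def push(self, lit):
        self.store.append(lit)

    def pop(self):
        self.store.pop()

    @staticmethod
    def _sat(lits):
        """Union-find: unsat iff some disequality joins two equal variables."""
        parent = {}

        def find(a):
            parent.setdefault(a, a)
            while parent[a] != a:
                a = parent[a]
            return a

        for a, op, b in lits:
            if op == "=":
                parent[find(a)] = find(b)
        return all(find(a) != find(b) for a, op, b in lits if op == "!=")

    def check_sat(self):
        return self._sat(self.store)

    def unsat_core(self):
        """Drop each literal whose removal keeps the rest unsat (assumes unsat store)."""
        core, i = list(self.store), 0
        while i < len(core):
            trial = core[:i] + core[i + 1:]
            if self._sat(trial):
                i += 1                       # literal i is needed
            else:
                core = trial                 # literal i is redundant
        return core
```

For instance, after pushing x = y, u = v, and y ≠ x, check_sat() is False and unsat_core() returns the two literals over x and y; a single pop() makes the store satisfiable again.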




Theory propagation implementation

Algorithm 1.4: TheoryDeduction
Input: set of literals Ls
Read-only input: m partial model, dstack decision depths, dl current decision level
foreach ℓ ∈ Ls do
    DP_T.push(ℓ)
if DP_T.checkSat() == unsat then
    Ls′ := DP_T.unsatCore();   // minimize clause
    dl′ := max{dl′′ | ∃ℓ ∈ Ls′, i. dstack(dl′′) < i ∧ m[i] = (e(ℓ) ↦ _)};
    return ({¬⋀Ls′}, dl′)
else
    // implied clauses
    Cs := ∅;
    foreach ℓ ∈ Lits(F) do
        DP_T.push(¬ℓ);
        if DP_T.checkSat() == unsat then
            Ls′ := DP_T.unsatCore();   // ℓ is called an implied literal and ¬ℓ ∈ Ls′
            Cs := Cs ∪ {¬⋀Ls′};
        DP_T.pop();
    return (Cs, dl)
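A Python sketch of Algorithm 1.4 for equality literals. Assumptions: literals are tuples (a, "=", b) or (a, "!=", b); a non-incremental union-find check with a deletion-based core stands in for DP_T; dl′ is taken as the highest decision level among the core's literals, which is our reading of the dl′ line; and literals of F that are already assigned are skipped, since pushing their negation would only yield a tautology. Clauses are lists of literals read as disjunctions, and all names are ours.

```python
def negate(lit):
    """¬(a = b) is (a != b) and vice versa."""
    a, op, b = lit
    return (a, "!=" if op == "=" else "=", b)

def is_sat(lits):
    """Union-find decision procedure for conjunctions of (dis)equalities."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            a = parent[a]
        return a

    for a, op, b in lits:
        if op == "=":
            parent[find(a)] = find(b)
    return all(find(a) != find(b) for a, op, b in lits if op == "!=")

def unsat_core(lits):
    """Deletion-based minimal unsat subset of an unsat conjunction."""
    core, i = list(lits), 0
    while i < len(core):
        trial = core[:i] + core[i + 1:]
        if is_sat(trial):
            i += 1                       # literal i is needed
        else:
            core = trial                 # literal i is redundant
    return core

def theory_deduction(assigned, level_of, dl, lits_f):
    """assigned: ⋀e⁻¹(m) as a list; level_of: literal -> decision level; lits_f: Lits(F)."""
    if not is_sat(assigned):
        core = unsat_core(assigned)
        return [[negate(l) for l in core]], max(level_of[l] for l in core)
    cs = []
    for lit in lits_f:
        if lit in assigned or negate(lit) in assigned:
            continue                     # already decided
        trial = assigned + [negate(lit)]
        if not is_sat(trial):            # lit is implied by the current assignment
            cs.append([negate(l) for l in unsat_core(trial)])
    return cs, dl
```

For example, with x = y and y = z assigned and x = z among the literals of F, the sketch returns the implied clause x ≠ y ∨ y ≠ z ∨ x = z.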




Topic 1.9

Indian research in SAT/SMT solving




Indian research in SAT/SMT solving

There is limited research activity in this field in India.




Solvers for verification tools are like engines for cars.

One must learn to build engines if one wants to build cars.




What should we do?

We need to build an ecosystem for the field:

- funding for backend research
- concretely defined projects
- more interaction between users and researchers
- support for start-ups in related areas
- no expectation of finished products from academia
- support for promising individuals


