Catching Bugs in Software Rajeev Alur Systems Design Research Lab University of Pennsylvania alur

Catching Bugs in Software

Rajeev Alur

Systems Design Research Lab

University of Pennsylvania

www.cis.upenn.edu/~alur/

Software Reliability

Software bugs are pervasive

Bugs can be expensive

Bugs can cost lives

Bulk of development cost is in validation, testing, bug fixes

Old problem that just won’t go away

Many approaches and decades of research

Systematic testing

Programming languages technology (e.g. types)

Formal methods (specification and verification)

Grand challenge for computer science: Tools for designing “correct” software

Correctness is formalized as a mathematical claim to be proved or falsified rigorously, always with respect to the given specification

A brief history of formal verification

1. Structured programs; Hoare logic; 1969

2. Network protocols; State-space search; 1990

3. Cache coherency protocols; Symbolic search; 1995

4. Device drivers; Automated abstraction; 2001

[Figure: a verifier takes software/model and a correctness specification as inputs and answers either Yes/proof or No/bug.]

1. Program Verification

Hoare logic for formalizing correctness of structured programs (late 1960s)

Typical examples: sorting, graph algorithms

Specification for sorting

Permute(A,B): array B is a permutation of elements in array A

Sorted(A): for 0<i<n, A[i]<=A[i+1]

Function sort is correct if the following holds

{True} B := sort(A) {Permute(A,B) & Sorted(B)}

Provides calculus for pre/post conditions of structured programs
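The sorting specification above can be turned into an executable check. This is a minimal sketch in Python: the names `permute`, `is_sorted` and `satisfies_sort_spec` are illustrative and not from the slides, and running the check on one input is testing, not a Hoare-logic proof.

```python
from collections import Counter

def permute(a, b):
    """Permute(A,B): b holds exactly the elements of a, with multiplicity."""
    return Counter(a) == Counter(b)

def is_sorted(a):
    """Sorted(A): every adjacent pair is in non-decreasing order."""
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))

def satisfies_sort_spec(sort, a):
    """Check {True} B := sort(A) {Permute(A,B) & Sorted(B)} on one input."""
    b = sort(list(a))
    return permute(a, b) and is_sorted(b)
```

A candidate that returns its input unchanged fails `Sorted`, and one that drops elements fails `Permute`, so both halves of the postcondition are needed.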

BubbleSort (A : array[1..n] of int) {
  B = A : array[1..n] of int;
  for (i=0; i<n; i++) {
    { Permute(A,B); Sorted(B[n-i+1..n]); for 0<k<=n-i and n-i<k’<=n, B[k]<=B[k’] }
    for (j=1; j<n-i; j++) {
      { Permute(A,B); Sorted(B[n-i+1..n]); for 0<k<=n-i and n-i<k’<=n, B[k]<=B[k’]; for 0<k<j, B[k]<=B[j] }
      if (B[j]>B[j+1]) swap(B,j,j+1)
    }
  };
  return B;
}

Sample Proof: Bubble Sort

BubbleSort (A : array[1..n] of int) {
  B = A : array[1..n] of int;
  for (i=0; i<n; i++) {
    for (j=1; j<n-i; j++) {
      if (B[j]>B[j+1]) swap(B,j,j+1)
    }
  };
  return B;
}

Key to proof: Finding suitable loop invariants
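To make the role of loop invariants concrete, here is a sketch of bubble sort in Python with the invariants checked at run time. It uses 0-based indexing, so the index ranges differ from the 1-based slide code, and the function name `bubble_sort_checked` is an assumption made for this illustration. Assertions only test the invariants on the inputs tried; a proof must show they hold for all inputs.

```python
from collections import Counter

def bubble_sort_checked(a):
    b = list(a)
    n = len(b)
    for i in range(n):
        # Outer invariant: b is a permutation of a, the last i elements are
        # sorted, and each of them is >= every element before them.
        assert Counter(a) == Counter(b)
        assert all(b[k] <= b[k + 1] for k in range(n - i, n - 1))
        assert all(b[k] <= b[k2] for k in range(n - i) for k2 in range(n - i, n))
        for j in range(n - i - 1):
            # Inner invariant: b[j] is a maximum of b[0..j].
            assert all(b[k] <= b[j] for k in range(j))
            if b[j] > b[j + 1]:
                b[j], b[j + 1] = b[j + 1], b[j]
    return b
```

When the outer loop ends with i = n, the outer invariant says that all n elements are sorted and that b is a permutation of a, which is exactly the postcondition.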

Program Verification

Powerful mathematical logic (e.g. first-order logic, higher-order logics) needed for formalization

Automation extremely difficult

Finding proof decomposition requires great expertise

Alive and well, but not booming

Contemporary theorem provers: HOL, PVS, ACL2 provide decision procedures and tactics for decomposition

Main applications: Microprocessor verification, Correctness of JVM…

2. Protocol Analysis

Automated analysis of finite-state protocols

Network protocols, Distributed algorithms

Great progress in the last 20 years

Protocol modeled as communicating finite-state processes

Correctness specified using temporal logic

Verification performed automatically to reveal errors

Highly optimized state-space search techniques

Model checker SPIN from Bell Labs

ACM Software Systems award (2001)

Success in finding high-quality bugs in real systems (NASA space shuttle, Lucent’s Pathstar switch)

Example: X.21 Communication Protocol

State-space Explosion !!

Analysis is basically a reachability problem in a graph

Nodes are states, where each state gives values of all the variables of all the communicating processes

An edge represents execution of a single action of one of the processes (asynchronous communication)

Size of graph grows exponentially in the number of bits required for state encoding, but…

Graph is constructed only incrementally, on-the-fly

Clever hashing and state compaction techniques

Many techniques for exploiting structure: symmetry, data independence, partial order reduction …

Millions of states can be explored quickly to reveal bugs

Great flexibility in modeling

Abstract many details, simplify

Scale down parameters (buffer size, number of network nodes…)
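The on-the-fly search described above can be sketched on a toy example. The protocol here is a hypothetical, deliberately broken two-process mutual exclusion scheme invented for this illustration (it is not X.21 and not a SPIN model): each process first tests the other's flag and only later sets its own. States are generated only when reached, and a hash table of visited states keeps each one from being explored twice.

```python
from collections import deque

def successors(state):
    """Yield states reachable by one action of one process (interleaving)."""
    pcs, flags = state
    for p in (0, 1):
        other = 1 - p
        new_pcs, new_flags = list(pcs), list(flags)
        if pcs[p] == 0 and not flags[other]:   # test: the other flag is down
            new_pcs[p] = 1
        elif pcs[p] == 1:                      # set own flag, enter critical section
            new_flags[p] = True
            new_pcs[p] = 2
        elif pcs[p] == 2:                      # leave critical section
            new_flags[p] = False
            new_pcs[p] = 0
        else:
            continue                           # process p is blocked
        yield (tuple(new_pcs), tuple(new_flags))

def find_bug(initial, is_bad):
    """Breadth-first search; returns a shortest trace to a bad state, or None."""
    parent = {initial: None}                   # hash table of visited states
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_bad(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

INITIAL = ((0, 0), (False, False))
trace = find_bug(INITIAL, lambda s: s[0] == (2, 2))   # both in critical section
```

The returned trace is a counterexample in which both processes pass the test before either sets its flag. Real model checkers add state compaction and reductions such as partial order reduction on top of this basic loop.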

3. Symbolic Model Checking

Constraint-based analysis of Boolean systems

Cache coherency protocols, Memory controllers,…

Active in the past 12 years

Symbolic Boolean representations (propositional formulas, BDDs) used to encode system dynamics

Correctness specified using temporal logic CTL

Fix-point computation over state sets

Highly optimized memory management

Model checker SMV from CMU

ACM Kanellakis Theory and Practice Award (1999)

Success in finding high-quality bugs in hardware applications (VHDL/Verilog code)

Cache consistency: Gigamax

Real design of a distributed multiprocessor

Similar successes: IEEE Futurebus+ standard, IBM/Intel/Motorola…

Deadlock found using SMV

[Figure: Gigamax architecture, with labels M, P, UIC, Cluster bus and Global bus.]

Read-shared/read-owned/write-invalid/write-shared/…

Symbolic Reachability Problem

Model variables X = {x1, …, xn}

Each var is of finite type, say, boolean

Initialization: I(X), a condition over X

Update: T(X,X’)

How new vars X’ are related to old vars X as a result of executing one step of the program

Target set: F(X)

Computational problem: Can F be satisfied starting with I by repeatedly applying T?

Graph Search problem

Symbolic Solution

Data type: region to represent state-sets

R := I(X)

Repeat

If R intersects F report “yes”

else if R contains Post(R) report “no”

else R := R union Post(R)

Post(R(X))= (Exists X. R(X) and T(X,X’))[X’ -> X]

Operations needed: union, intersection, test for inclusion/emptiness, projection, renaming
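The fixpoint loop above can be sketched in Python with plain sets standing in for the region data type. This is an assumption made for illustration: a real symbolic model checker would represent R, I, T and F as BDDs or formulas rather than enumerated sets, but the control structure is the same. The function names `post` and `reachable` are illustrative.

```python
def post(region, transitions):
    """Post(R): states reachable from R in exactly one step of T."""
    return {t for (s, t) in transitions if s in region}

def reachable(init, transitions, target):
    """True iff some target state is reachable from init by repeatedly applying T."""
    region = set(init)
    while True:
        if region & target:            # R intersects F
            return True
        successors = post(region, transitions)
        if successors <= region:       # R contains Post(R): fixpoint reached
            return False
        region |= successors           # R := R union Post(R)

# Example: a 3-bit counter that adds 2 (mod 8) at every step, starting from 0.
T = {(x, (x + 2) % 8) for x in range(8)}
```

With this T, `reachable({0}, T, {6})` holds, while `reachable({0}, T, {5})` does not, because only even values are ever reached. The loop uses exactly the operations listed above: union, intersection, inclusion test, and the image computation that projection and renaming implement.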

Binary Decision Diagrams

Popular representations for Boolean functions

Key properties:

Canonical!

Size depends on choice of ordering of variables

Operations such as union/intersection are efficient

[Figure: BDD over variables a, b, c, d with terminal nodes 0 and 1.]

Function: (a and b) or (c and d)

Like a decision graph

No redundant nodes

No isomorphic subgraphs

Variables tested in fixed order
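The two reduction rules (no redundant nodes, no isomorphic subgraphs) can be sketched as a small reduced ordered BDD builder in Python. The class `BDD` and its methods are hypothetical names for this illustration. The builder enumerates all assignments, so it is exponential and only suitable for tiny examples; production BDD packages build diagrams with an apply operation instead.

```python
class BDD:
    """Tiny reduced ordered BDD manager; node ids 0 and 1 are the terminals."""

    def __init__(self, order):
        self.order = list(order)
        self.unique = {}   # (var, low, high) -> node id; shares isomorphic subgraphs

    def mk(self, var, low, high):
        if low == high:                 # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.unique) + 2
        return self.unique[key]

    def build(self, f, level=0, env=None):
        """Build the BDD of Boolean function f (a callable on a dict of values)."""
        env = env or {}
        if level == len(self.order):
            return 1 if f(env) else 0
        var = self.order[level]
        low = self.build(f, level + 1, {**env, var: False})
        high = self.build(f, level + 1, {**env, var: True})
        return self.mk(var, low, high)

f = lambda e: (e["a"] and e["b"]) or (e["c"] and e["d"])
# The same function written differently (De Morgan form).
g = lambda e: not ((not e["a"] or not e["b"]) and (not e["c"] or not e["d"]))

good = BDD("abcd")
root_f = good.build(f)
root_g = good.build(g)      # canonical: same node id as root_f
bad = BDD("acbd")
bad.build(f)
```

For (a and b) or (c and d), the ordering a, b, c, d yields 4 internal nodes (`len(good.unique)`), while a, c, b, d yields 6 (`len(bad.unique)`), which illustrates how size depends on the variable ordering.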

Symbolic Search Techniques

Size of BDDs can explode during search, and is quite unpredictable

Years of research leading to plethora of heuristics

Significant industrial interest

In-house groups: Cadence, Synopsys, IBM, NEC…

Commercial model checkers/verification consultants

Recent focus: SAT solvers

Checking whether F can be reached within k steps can be formulated as satisfiability of a propositional formula with nk variables

Extremely fast solvers such as zChaff (from Princeton) can solve problems with 1000 vars fast!

SAT + BDD can be combined to great effect
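The bounded reachability formula can be sketched as follows. The unrolled formula is I(X0) and T(X0,X1) and … and T(X(k-1),Xk) and (F(X0) or … or F(Xk)), over one copy of the n state bits per step. In this Python sketch, brute-force enumeration stands in for a SAT solver such as zChaff; that substitution, the function name `bmc`, and the 2-bit counter example are assumptions made purely for illustration.

```python
from itertools import product

def bmc(n, k, init, trans, target):
    """Look for an assignment to the unrolled state bits that satisfies
    I(X0), T(Xi, Xi+1) for every i < k, and F(Xi) for some i <= k."""
    for bits in product([False, True], repeat=n * (k + 1)):
        states = [bits[i * n:(i + 1) * n] for i in range(k + 1)]
        if (init(states[0])
                and all(trans(states[i], states[i + 1]) for i in range(k))
                and any(target(s) for s in states)):
            return states              # satisfying assignment = trace to F
    return None

# Example system: a 2-bit counter, least significant bit first.
def value(s):
    return s[0] + 2 * s[1]

init = lambda s: value(s) == 0
trans = lambda s, t: value(t) == (value(s) + 1) % 4
target = lambda s: value(s) == 3
```

Here `bmc(2, 3, init, trans, target)` finds the trace 0, 1, 2, 3, while `bmc(2, 2, init, trans, target)` returns `None` because the target needs three steps. A `None` answer only rules out traces up to the bound k.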

4. Software Model Checking via Abstraction

Can we apply model checking to C programs?

SPIN approach is fine for analyzing models, but constructing models is expensive, and models have no relation to code

Given a program P, build an abstract finite-state (Boolean) model A such that set of behaviors of P is a subset of those of A (conservative abstraction)

Basic ideas around for a while, but all components put together effectively only recently by Microsoft Research team in the project SLAM

Shown to be effective on Windows device drivers, Linux source code (about 10K lines of code)

Program Abstraction

int x, y;

if x>0 { … y:=x+1 … }
else { … y:=x+1 … }

bool bx, by;

if bx { … by:=true … }
else { … by:={true,false} … }

Predicate Abstraction

bx: x>0; by : y>0
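The abstract assignments to by in the example can be reproduced with a small Python sketch. It enumerates a finite sample of integer values, which is an assumption made for illustration only: sampling is not sound in general, and real tools derive the abstract update with a theorem prover. The function name `abstract_assign` is hypothetical.

```python
def abstract_assign(branch_pred, update, post_pred, samples):
    """Set of values the post-predicate can take after `update`, over the
    sampled concrete values that satisfy the branch predicate."""
    return {post_pred(update(x)) for x in samples if branch_pred(x)}

samples = range(-5, 6)
# Branch bx (x > 0), statement y := x + 1, predicate by (y > 0).
then_values = abstract_assign(lambda x: x > 0, lambda x: x + 1, lambda y: y > 0, samples)
# Branch not bx (x <= 0), same statement and predicate.
else_values = abstract_assign(lambda x: x <= 0, lambda x: x + 1, lambda y: y > 0, samples)
```

In the then-branch only `True` is possible, matching by:=true. In the else-branch both values occur (x = 0 gives y = 1, x = -1 gives y = 0), matching the nondeterministic by:={true,false}.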

do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();

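Verification Example

Does this code obey the locking spec?

[Figure: Specification automaton with states Unlocked, Locked and Error. Acq moves Unlocked to Locked and Rel moves Locked to Unlocked; Acq in Locked or Rel in Unlocked leads to Error.]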
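The locking specification can be sketched as a small monitor in Python. The transition table is an assumption reconstructed from the state and edge labels (Unlocked, Locked, Error, Acq, Rel), and `run_spec` is a hypothetical helper name.

```python
TRANSITIONS = {
    ("Unlocked", "Acq"): "Locked",
    ("Locked", "Rel"): "Unlocked",
    ("Unlocked", "Rel"): "Error",   # release without holding the lock
    ("Locked", "Acq"): "Error",     # acquire while already holding the lock
}

def run_spec(events):
    """Run the locking automaton over a sequence of Acq/Rel events."""
    state = "Unlocked"
    for event in events:
        # Any pair missing from the table (only Error states) stays in Error.
        state = TRANSITIONS.get((state, event), "Error")
    return state
```

One execution of the loop with a pending request followed by one without produces Acq, Rel, Acq, Rel and ends in Unlocked. A path that releases inside the `if` and then leaves the loop produces Acq, Rel, Rel and ends in Error; whether such a path exists in the real program is the verification question.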


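Initial Abstraction

do {
  KeAcquireSpinLock();
  if(*){
    KeReleaseSpinLock();
  }
} while (*);
KeReleaseSpinLock();

Model checking boolean program using BDDs

[Figure: path through the boolean program annotated with lock states U (unlocked), L (locked) and E (error).]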


Feasibility Analysis

do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();

Is error path feasible in C program?

Requires theorem prover for constraint propagation

[Figure: error path through the program annotated with lock states U, L and E.]


Predicate Discovery

do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;            b = true;
  if(request){
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;                      b = b ? false : *;
  }
} while (nPackets != nPacketsOld);   !b
KeReleaseSpinLock();

Add new predicate to boolean program

New techniques

b : (nPacketsOld == nPackets)

[Figure: error path annotated with lock states U, L and E.]

Revised Abstraction


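do {
  KeAcquireSpinLock();
  b = true;
  if(*){
    KeReleaseSpinLock();
    b = b ? false : *;
  }
} while ( !b );
KeReleaseSpinLock();

b : (nPacketsOld == nPackets)

[Figure: paths through the refined boolean program annotated with lock states U and L and with the values b and !b.]

Model checking refined boolean program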
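The effect of the new predicate can be sketched by exploring both boolean programs exhaustively in Python. The encoding as program counters 0 to 5 and the function names are assumptions made for this illustration, and explicit enumeration replaces the BDD-based search a tool like SLAM would use.

```python
def step_lock(lock, event):
    """Locking spec: Acq when locked, or Rel when unlocked, is an error."""
    if lock == "E":
        return "E"
    if event == "Acq":
        return "L" if lock == "U" else "E"
    return "U" if lock == "L" else "E"      # event == "Rel"

def successors(state, use_predicate):
    pc, lock, b = state
    if pc == 0:      # KeAcquireSpinLock(); b = true;
        return [(1, step_lock(lock, "Acq"), True)]
    if pc == 1:      # if (*)
        return [(2, lock, b), (3, lock, b)]
    if pc == 2:      # KeReleaseSpinLock(); b = b ? false : *;
        new_lock = step_lock(lock, "Rel")
        return [(3, new_lock, nb) for nb in ([False] if b else [False, True])]
    if pc == 3:      # while (!b) when refined, while (*) in the initial abstraction
        if not use_predicate:
            return [(0, lock, b), (4, lock, b)]
        return [(4, lock, b)] if b else [(0, lock, b)]
    if pc == 4:      # KeReleaseSpinLock();
        return [(5, step_lock(lock, "Rel"), b)]
    return []        # pc == 5: end of program

def error_reachable(use_predicate):
    """Depth-first search over (pc, lock state, b) for a state with lock == E."""
    start = (0, "U", True)
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        if state[1] == "E":
            return True
        for nxt in successors(state, use_predicate):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

Under these assumptions `error_reachable(False)` is true (the initial abstraction has an error path) and `error_reachable(True)` is false: once the loop condition is tied to b, a release inside the `if` forces another iteration, so the lock is reacquired before the final release.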

Abstraction Based Techniques

Tools for verifying source code combine many techniques

Program analysis techniques such as slicing

Abstraction

Model checking

Refinement from counter-examples

New challenges for model checking (beyond finite-state reachability analysis)

Recursion gives pushdown control

Pointers, dynamic creation of objects, inheritance…

A very active and emerging research area

Research in Formal Methods

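[Figure: a verifier takes a model (of the software) and a correctness specification, and produces a proof or a bug. Research topics are grouped around each part.]

Verifier: Decision procedures; Algorithms engineering; Automated abstraction; Compositional analysis

Correctness specification: Temporal logics; Automata; From requirements to specs

Model: Modeling languages; Hierarchy, recursion; Real-time, Hybrid; Stochastic

Bridging the gap between software and model: Model extraction; Model-based design: from models to code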

Current Research Projects

Foundations

Analysis of context-free models

Stochastic hybrid systems

Decision problems for timed automata

Algorithms Engineering

Combining SAT, BDDs, Abstraction

Symbolic solutions to games

Model-based design

From hybrid automata to embedded software

From state-machine models to Java card policies

Software verification for Java classes

Classical Model Checking

Both model M and specification S are regular (finite-state)

M as a generator of all possible behaviors

S as an acceptor of “good” behaviors (verification is language inclusion of M in S) or as an acceptor of “bad” behaviors (verification is checking emptiness of intersection of M and S)

Typical specifications (using automata or temporal logic)

Safety: Always not ( both P1 and P2 have write-exclusive copy)

Liveness: Always (if P1 requests, eventually it gets response)

Robustness of theory of regular languages helps in many ways

M can be product of several components (closure under intersection)

For liveness properties, one needs to consider automata over infinite words, but corresponding theory of omega-regular languages is well developed and well understood
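For safety properties over finite behaviors, checking emptiness of the intersection of M with an acceptor S of "bad" behaviors amounts to a search in their product. A minimal Python sketch follows; the automata encodings, the function name `find_bad_word`, and the acq/rel example are assumptions for illustration, and liveness would need automata over infinite words, which this sketch does not cover.

```python
from collections import deque

def find_bad_word(m_init, m_delta, s_init, s_delta, s_bad):
    """Search the product of model M and bad-behavior acceptor S.
    m_delta: state -> list of (letter, next state)
    s_delta: (state, letter) -> next state (deterministic, may be partial)
    Returns a word of M accepted by S, or None if the intersection is empty."""
    start = (m_init, s_init)
    parent = {start: None}
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        m_state, s_state = pair
        if s_state in s_bad:
            word = []
            while parent[pair] is not None:
                pair, letter = parent[pair]
                word.append(letter)
            return word[::-1]
        for letter, m_next in m_delta.get(m_state, []):
            s_next = s_delta.get((s_state, letter))
            if s_next is None:
                continue                    # S has no move: prefix cannot become bad
            nxt = (m_next, s_next)
            if nxt not in parent:
                parent[nxt] = (pair, letter)
                queue.append(nxt)
    return None

# S accepts behaviors that contain two acquires without a release in between.
S_DELTA = {("q0", "acq"): "q1", ("q0", "rel"): "q0",
           ("q1", "rel"): "q0", ("q1", "acq"): "bad"}
M_OK = {0: [("acq", 1)], 1: [("rel", 0)]}
M_BUGGY = {0: [("acq", 1)], 1: [("rel", 0), ("acq", 1)]}
```

The search returns `None` for `M_OK` and the word `["acq", "acq"]` for `M_BUGGY`. Because the product of two finite-state machines is again finite-state, the search always terminates.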

Recursive State Machines

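[Figure: a recursive state machine with components A1, A2 and A3. Each component has entry points and exit points, and contains boxes (superstates) that refer to other components.]

Boolean Programs

main() { bool y; … x = P(y); … z = P(x); … }
bool P(u: bool) { … return Q(u); }
bool Q(w: bool) { if … else return P(~w) }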

Model Checking of Recursive Models

Control-flow requires stack, so model M defines a context-free language

Algorithms exist for checking regular specifications against context-free models

Emptiness of pushdown automata is solvable

Intersection (product) of a regular language and a context-free language is context-free

But, checking context-free spec against a context-free model is undecidable!

Context-free languages are not closed under intersection

Inclusion as well as emptiness of intersection undecidable

Are Context-free Specs Interesting?

Classical Hoare-style pre/post conditions

If p holds when procedure A is invoked, q holds upon return

Total correctness: every invocation of A terminates

Integral part of emerging standard JML

Stack inspection properties (security/access control)

If a variable x is being accessed, procedure A must be in the call stack

Above requires matching of calls with returns, or finding unmatched calls

Recall: Language of words over [, ] such that brackets are well matched is not regular, but context-free

Caret for Context-free Specifications

Caret: Temporal Logic of Calls and Returns [AEM03]

Context-free extension of Pnueli’s Linear Temporal Logic LTL

Allows specification of pre/post conditions

Allows specification of stack inspection properties

Main result: Checking Caret specifications against a context-free model is decidable

Polynomial in the size of the model and exponential in the size of formula (as in case of classical model checking)

Proof technique: Product of pushdown model M and Caret specification S is again a pushdown automaton

Key to success: The notion of calls and returns is the same for M as well as S

Caret Definition

Interpreted over “structured” words in which positions are marked with calls { and returns }

p {q {r p r q {p p p} r q} p p

Caret provides classical temporal operators such as Next and Always

[Figure: the word above with marks at the positions where q’ = Next(q) and p’ = Always(p or q) hold.]

Caret Abstract Operators

Abstract versions of operators jump from a call to the matching return

p {q {r p r q {p p p} r q} p p

Sample specification: pre/post:

Always( p & call -> abstract-next q )

[Figure: the word above with marks at the positions where q’ = abstract-next(q) and p’ = abstract-always(p or q) hold.]

Visibly Pushdown Languages [AM03]

Subclass of context-free languages that is suitable for program analysis / algorithmic verification

Alphabet is structured: Symbols are tagged with calls and returns

A visibly pushdown automaton’s moves are constrained by input

If current symbol is a call, it must push

If current symbol is a return it must pop

Else it can only update control state

Class of languages defined by these automata is very robust

Closed under union, intersection, complement, Kleene-*.

Emptiness, inclusion, equivalence decidable

Alternative characterizations: Embeddings of regular tree languages, Monadic Second Order theory with a binary matching predicate

Caret is a subset of visibly pushdown languages
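The input-driven stack discipline can be sketched as a simulator for a deterministic visibly pushdown automaton in Python. This is a simplified sketch: it rejects unmatched returns and requires an empty stack at the end, whereas the general definition also covers words with pending calls and unmatched returns. The function name and the transition-table encoding are assumptions for illustration.

```python
def vpa_accepts(word, calls, returns, delta_call, delta_ret, delta_int,
                start, accepting):
    """Run a deterministic visibly pushdown automaton on `word`.
    The input symbol alone decides the stack action."""
    state, stack = start, []
    for a in word:
        if a in calls:                       # call symbol: must push
            move = delta_call.get((state, a))
            if move is None:
                return False
            state, pushed = move
            stack.append(pushed)
        elif a in returns:                   # return symbol: must pop
            if not stack:
                return False                 # unmatched return (rejected here)
            state = delta_ret.get((state, a, stack.pop()))
        else:                                # internal symbol: control state only
            state = delta_int.get((state, a))
        if state is None:
            return False
    return state in accepting and not stack

# Example: well-matched words over call "{", return "}" and internals p, q.
CALLS, RETURNS = {"{"}, {"}"}
D_CALL = {(0, "{"): (0, "x")}
D_RET = {(0, "}", "x"): 0}
D_INT = {(0, "p"): 0, (0, "q"): 0}

def well_matched(word):
    return vpa_accepts(word, CALLS, RETURNS, D_CALL, D_RET, D_INT, 0, {0})
```

Since two such automata reading the same word always push and pop at the same positions, they can be run in lockstep with a shared stack shape, which is the intuition behind closure under intersection.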

Synthesis of Behavioral Interfaces

Behavioral type of a class specifies the allowed sequences of method calls

Type for a file class may be (open; (read+open)*; close)*

Can we synthesize this type automatically?

Given source code for the class implementation

Construct a regular language over the method calls so that a particular exception is never raised

This is useful for compositional verification also: behavioral interface is a suitable abstraction of the class

Proposed route (ongoing project)

Use abstraction to get a finite-state model

Solve a symbolic game to get the most general strategy for invoking methods to keep the abstract model “safe”

Extract interface type from the game solution

Behavioral Interface

public Object next() { … lastRet = cursor++; … }
public Object prev() { … lastRet = cursor; … }
public void remove() { if (lastRet==-1) throw new IllegalExc(); … lastRet = -1; … }
public void add(Object o) { … lastRet = -1; … }

AbstractList.ListItr

[Figure: behavioral interface automaton with states Start, Unsafe and Safe, and transitions labeled add, next, prev and remove.]
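The behavioral interface implied by the code fragments above can be sketched as a two-state checker in Python. It tracks only the predicate lastRet != -1, assumes lastRet starts at -1, and ignores any other exception the elided code might raise. The function name `allowed` and the variable `has_last` are hypothetical, and the states are not meant to reproduce the figure's Start/Unsafe/Safe labeling.

```python
def allowed(calls):
    """True if the call sequence never invokes remove() while lastRet == -1."""
    has_last = False                 # abstracts the predicate lastRet != -1
    for method in calls:
        if method in ("next", "prev"):
            has_last = True          # lastRet is set to a cursor position
        elif method == "add":
            has_last = False         # lastRet = -1
        elif method == "remove":
            if not has_last:
                return False         # IllegalExc would be thrown
            has_last = False         # lastRet = -1
        else:
            raise ValueError(f"unknown method: {method}")
    return True
```

For example, next followed by remove is allowed, but remove as the first call, two removes in a row, or remove directly after add are not.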

Game in Abstracted Program

[Figure: game graph of the abstracted program, with Player0 moves labeled next and prev.]

From black states, Player0 gets to choose the input method call

From purple states, Player1 gets to choose a path in the abstract program till call returns

Objective for Player0: Ensure error states (from which exception can be raised) are avoided

Winning strategy: Correct sequences of method calls
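A safety game of this kind can be solved by a fixpoint computation. The Python sketch below uses explicit sets where the project would use symbolic representations. It assumes full information and that Player0 must always move. The state names model a hypothetical iterator abstraction ("U": remove would fail, "S": remove is safe) and are not taken from the slides.

```python
def solve_safety_game(player0, player1, edges, error):
    """Greatest fixpoint: states from which Player0 can avoid `error` forever.
    Returns (winning region, most general strategy for Player0)."""
    win = (set(player0) | set(player1)) - set(error)
    changed = True
    while changed:
        changed = False
        for s in list(win):
            succ = edges.get(s, [])
            if s in player0:
                ok = any(t in win for t in succ)   # Player0 picks one safe move
            else:
                ok = all(t in win for t in succ)   # Player1 may pick any move
            if not ok:
                win.remove(s)
                changed = True
    strategy = {s: [t for t in edges.get(s, []) if t in win]
                for s in win if s in player0}
    return win, strategy

PLAYER0 = {"U", "S"}                       # Player0 chooses the method call
PLAYER1 = {"U.next", "U.remove", "U.add",  # Player1 resolves the call
           "S.next", "S.remove", "S.add", "err"}
EDGES = {
    "U": ["U.next", "U.remove", "U.add"],
    "S": ["S.next", "S.remove", "S.add"],
    "U.next": ["S"], "U.remove": ["err"], "U.add": ["U"],
    "S.next": ["S"], "S.remove": ["U"], "S.add": ["U"],
}
win, strategy = solve_safety_game(PLAYER0, PLAYER1, EDGES, {"err"})
```

The strategy keeps every move that stays inside the winning region, so it is the most general one: from "U" it allows next and add but not remove, and from "S" it allows all three calls.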

Challenges

Techniques for generating finite-state abstractions

How to solve large games symbolically?

In fact, a partial information game (Player0 should choose the next method call only based on values returned so far)

How to construct an understandable behavioral type from the winning strategy?

Abstraction refinement

If Player0 does not invoke any method, exceptions can never be raised

How to refine the current abstraction based on quality of current behavioral type?

Integrating all these into a working tool
