Catching Bugs in Software
Rajeev Alur
Systems Design Research Lab
University of Pennsylvania
www.cis.upenn.edu/~alur/
Software Reliability
Software bugs are pervasive
Bugs can be expensive
Bugs can cost lives
Bulk of development cost is in validation, testing, bug fixes
Old problem that just won’t go away
Many approaches and decades of research
Systematic testing
Programming languages technology (e.g. types)
Formal methods (specification and verification)
Grand challenge for computer science: Tools for designing “correct” software
Correctness is formalized as a mathematical claim to be proved or falsified rigorously, always with respect to the given specification
A brief history of formal verification
1. Structured programs; Hoare logic; 1969
2. Network protocols; State-space search; 1990
3. Cache coherency protocols; Symbolic search; 1995
4. Device drivers; Automated abstraction; 2001
[Figure: a verifier takes a software/model and a correctness specification as inputs and answers either Yes (with a proof) or No (with a bug)]
1. Program Verification
Hoare logic for formalizing correctness of structured programs (late 1960s)
Typical examples: sorting, graph algorithms
Specification for sorting
Permute(A,B): array B is a permutation of elements in array A
Sorted(A): for 0<i<n, A[i]<=A[i+1]
Function sort is correct if the following holds
{True} B := sort(A) {Permute(A,B) & Sorted(B)}
Provides calculus for pre/post conditions of structured programs
Sample Proof: Bubble Sort
BubbleSort (A : array[1..n] of int) {
  B = A : array[1..n] of int;
  for (i=0; i<n; i++) {
    for (j=1; j<n-i; j++) {
      if (B[j]>B[j+1]) swap(B,j,j+1)
    }
  };
  return B;
}
Loop invariants
Outer loop: Permute(A,B); Sorted(B[n-i,n]); for 0<k<=n-i and n-i<k’<=n, B[k]<=B[k’]
Inner loop: the outer-loop invariant, and for 0<k<j, B[k]<=B[j]
Key to proof: Finding suitable loop invariants
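The loop invariants can also be checked at run time. The following is a minimal Python sketch, not a proof: it uses 0-based indexing instead of the slide's 1-based arrays, and the helper names `is_sorted` and `bubble_sort_checked` are illustrative assumptions.

```python
from collections import Counter

def is_sorted(xs):
    """Sorted(xs): every element is <= its right neighbour."""
    return all(xs[k] <= xs[k + 1] for k in range(len(xs) - 1))

def bubble_sort_checked(a):
    """Bubble sort that asserts the loop invariants on every iteration."""
    b = list(a)
    n = len(b)
    for i in range(n):
        # Outer invariant: b is a permutation of a, the last i slots are
        # sorted, and everything before them is <= everything in them.
        assert Counter(b) == Counter(a)
        assert is_sorted(b[n - i:])
        assert all(x <= y for x in b[:n - i] for y in b[n - i:])
        for j in range(n - i - 1):
            # Inner invariant: b[j] is a maximum of b[0..j].
            assert all(b[k] <= b[j] for k in range(j))
            if b[j] > b[j + 1]:
                b[j], b[j + 1] = b[j + 1], b[j]
    return b
```

Passing assertions on a few inputs only tests the invariants; a Hoare-style proof shows they hold for all inputs.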
Program Verification
Powerful mathematical logic (e.g. first-order logic, Higher-order logics) needed for formalization
Automation extremely difficult
Finding proof decomposition requires great expertise
Alive and well, but not booming
Contemporary theorem provers: HOL, PVS, ACL2 provide decision procedures and tactics for decomposition
Main applications: Microprocessor verification, Correctness of JVM…
2. Protocol Analysis
Automated analysis of finite-state protocols
Network protocols, Distributed algorithms
Great progress in the last 20 years
Protocol modeled as communicating finite-state processes
Correctness specified using temporal logic
Verification performed automatically to reveal errors
Highly optimized state-space search techniques
Model checker SPIN from Bell Labs
ACM Software System Award (2001)
Success in finding high-quality bugs in real systems (NASA space shuttle, Lucent’s Pathstar switch)
Example: X.21 Communication Protocol
State-space Explosion!!
Analysis is basically a reachability problem in a graph
Nodes are states, where each state gives values of all the variables of all the communicating processes
An edge represents execution of a single action of one of the processes (asynchronous communication)
Size of graph grows exponentially with the number of bits required for state encoding, but…
Graph is constructed only incrementally, on-the-fly
Clever hashing and state compaction techniques
Many techniques for exploiting structure: symmetry, data independence, partial order reduction …
Millions of states can be explored quickly to reveal bugs
Great flexibility in modeling
Abstract many details, simplify
Scale down parameters (buffer size, number of network nodes…)
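On-the-fly search can be sketched in a few lines of Python. The protocol below is a hypothetical, deliberately flawed mutual-exclusion scheme (each process tests the other's flag and only later sets its own); it is not from the talk, and the names `successors` and `find_bug` are assumptions. States are generated only when reached, and a hash table remembers visited states.

```python
from collections import deque

def successors(state):
    """One asynchronous step: exactly one of the two processes moves."""
    pcs, flags = state
    for p in (0, 1):
        other = 1 - p
        new_pcs, new_flags = list(pcs), list(flags)
        if pcs[p] == 0:            # idle: wait until the other flag is down
            if flags[other]:
                continue           # blocked, no move for this process
            new_pcs[p] = 1
        elif pcs[p] == 1:          # raise own flag and enter critical section
            new_flags[p] = True
            new_pcs[p] = 2
        else:                      # leave critical section
            new_flags[p] = False
            new_pcs[p] = 0
        yield (tuple(new_pcs), tuple(new_flags))

def find_bug(initial, succ, is_bad):
    """Breadth-first search; returns a shortest trace to a bad state, or None."""
    parent = {initial: None}       # hash table of visited states
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        if is_bad(s):
            trace = []
            while s is not None:
                trace.append(s)
                s = parent[s]
            return trace[::-1]
        for t in succ(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None

initial = ((0, 0), (False, False))
trace = find_bug(initial, successors, lambda s: s[0] == (2, 2))
```

Here `trace` is a counterexample in which both processes pass the test before either raises its flag. Tools such as SPIN add state compaction and reductions on top of this basic loop.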
3. Symbolic Model Checking
Constraint-based analysis of Boolean systems
Cache coherency protocols, Memory controllers,…
Active in the past 12 years
Symbolic Boolean representations (propositional formulas, BDDs) used to encode system dynamics
Correctness specified using temporal logic CTL
Fix-point computation over state sets
Highly optimized memory management
Model checker SMV from CMU
ACM Kanellakis Theory and Practice Award (1999)
Success in finding high-quality bugs in hardware applications (VHDL/Verilog code)
Cache consistency: Gigamax
Real design of a distributed multiprocessor
Similar successes: IEEE Futurebus+ standard, IBM/Intel/Motorola…
Deadlock found using SMV
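[Figure: Gigamax architecture. Clusters of processors (P) and memories (M) sit on cluster buses, which are connected through UICs to a global bus. Bus commands: read-shared, read-owned, write-invalid, write-shared, …]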
Symbolic Reachability Problem
Model variables X = {x1, …, xn}
Each var is of finite type, say, boolean
Initialization: I(X), a condition over X
Update: T(X,X’)
How new vars X’ are related to old vars X as a result of executing one step of the program
Target set: F(X)
Computational problem: Can F be satisfied starting with I by repeatedly applying T?
Graph Search problem
Symbolic Solution
Data type: region to represent state-sets
R := I(X)
Repeat
  If R intersects F report “yes”
  else if R contains Post(R) report “no”
  else R := R union Post(R)
Post(R(X)) = (Exists X. R(X) and T(X,X’))[X’ -> X]
Operations needed: union, intersection, test for inclusion/emptiness, projection, renaming
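The fix-point loop can be sketched in Python with explicit sets standing in for the region data type. This is an assumption made for brevity: a symbolic checker would represent regions as BDDs or formulas and implement the same operations on them. The names `post` and `reachable` and the 3-bit counter example are illustrative.

```python
def post(region, trans):
    """Post(R): all states reachable from R in exactly one step of T."""
    return frozenset(t for (s, t) in trans if s in region)

def reachable(init, trans, target):
    """Can a state in target be reached from init by repeatedly applying trans?"""
    r = frozenset(init)
    while True:
        if r & target:          # R intersects F
            return True
        nxt = post(r, trans)
        if nxt <= r:            # R contains Post(R): fix point reached
            return False
        r = r | nxt             # R := R union Post(R)

# A 3-bit counter (values 0..7) that steps by 2 modulo 8.
trans = {(x, (x + 2) % 8) for x in range(8)}
```

Starting from 0, the counter reaches 6 but never an odd value, so `reachable({0}, trans, {6})` holds while `reachable({0}, trans, {5})` does not.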
Binary Decision Diagrams
Popular representations for Boolean functions
Key properties:
Canonical!
Size depends on choice of ordering of variables
Operations such as union/intersection are efficient
[Figure: BDD for the function (a and b) or (c and d), with decision nodes a, b, c, d and terminal nodes 0 and 1]
Like a decision graph
No redundant nodes
No isomorphic subgraphs
Variables tested in fixed order
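The two reduction rules and the effect of variable ordering can be illustrated with a small Python sketch. It builds a reduced ordered BDD by brute-force Shannon expansion, which is exponential and only suitable for tiny examples; real packages use an Apply operation with caching instead. The class name `BDD` and its methods are assumptions made for illustration.

```python
class BDD:
    """Tiny reduced ordered BDD; node ids 0 and 1 are the terminals."""

    def __init__(self, order):
        self.order = order
        self.unique = {}                  # (var, low, high) -> node id

    def mk(self, var, low, high):
        if low == high:                   # rule 1: no redundant nodes
            return low
        key = (var, low, high)
        if key not in self.unique:        # rule 2: no isomorphic subgraphs
            self.unique[key] = len(self.unique) + 2
        return self.unique[key]

    def build(self, f, level=0, env=None):
        """Expand f on the variables in order and return the root node id."""
        env = env or {}
        if level == len(self.order):
            return int(bool(f(env)))
        var = self.order[level]
        low = self.build(f, level + 1, {**env, var: False})
        high = self.build(f, level + 1, {**env, var: True})
        return self.mk(var, low, high)

def f(e):
    return (e["a"] and e["b"]) or (e["c"] and e["d"])

def g(e):  # the same function written differently (De Morgan)
    return not ((not e["a"] or not e["b"]) and (not e["c"] or not e["d"]))

good = BDD(["a", "b", "c", "d"])
root_f = good.build(f)
size_good = len(good.unique)              # number of decision nodes
root_g = good.build(g)                    # canonical: same root as f

bad = BDD(["a", "c", "b", "d"])
bad.build(f)
size_bad = len(bad.unique)
```

With the ordering a, b, c, d the function needs 4 decision nodes; with a, c, b, d it needs 6. Because the representation is canonical, `g` yields the same root node as `f`.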
Symbolic Search Techniques
Size of BDDs can explode during search, and is quite unpredictable
Years of research leading to plethora of heuristics
Significant industrial interest
In-house groups: Cadence, Synopsys, IBM, NEC…
Commercial model checkers/verification consultants
Recent focus: SAT solvers
Checking whether F can be reached within k steps can be formulated as satisfiability of a propositional formula with nk variables
Extremely fast solvers such as zChaff (from Princeton) can solve problems with 1000 vars fast!
SAT + BDD can be combined to great effect
4. Software Model Checking via Abstraction
Can we apply model checking to C programs?
SPIN approach is fine for analyzing models, but constructing models is expensive, and models have no relation to code
Given a program P, build an abstract finite-state (Boolean) model A such that set of behaviors of P is a subset of those of A (conservative abstraction)
Basic ideas around for a while, but all components put together effectively only recently by Microsoft Research team in the project SLAM
Shown to be effective on Windows device drivers, Linux source code (about 10K lines of code)
Program Abstraction
Concrete program:
int x, y;
if x>0 { … y:=x+1 … } else { … y:=x+1 … }
Abstract Boolean program:
bool bx, by;
if bx { … by:=true … } else { … by:={true,false} … }
Predicate Abstraction
bx: x>0; by : y>0
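The abstract assignments can be reproduced with a small Python sketch. It estimates the abstraction by sampling concrete values, which is an assumption made to keep the example self-contained: sampling is not sound in general, and tools such as SLAM query a theorem prover instead. The function name `abstract_assignment` is illustrative.

```python
def abstract_assignment(update, pre_pred, post_pred, samples):
    """For each truth value of the predicate before the assignment, collect
    the truth values the predicate on the result can take afterwards."""
    result = {True: set(), False: set()}
    for x in samples:
        result[pre_pred(x)].add(post_pred(update(x)))
    return result

# y := x + 1, abstracted with bx: x > 0 and by: y > 0
table = abstract_assignment(
    update=lambda x: x + 1,
    pre_pred=lambda x: x > 0,
    post_pred=lambda y: y > 0,
    samples=range(-50, 51),
)
```

When bx holds, by is always true; when bx is false, by may be either value (x = 0 gives y = 1, x = -1 gives y = 0), which matches by := {true,false} in the else branch.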
Verification Example
Does this code obey the locking spec?
do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if (request) {
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();
[Figure: Specification automaton with states Unlocked, Locked, and Error, and transitions labeled Acq and Rel]
Initial Abstraction
Model checking boolean program using BDDs
do {
  KeAcquireSpinLock();
  if (*) {
    KeReleaseSpinLock();
  }
} while (*);
KeReleaseSpinLock();
[Figure: program points annotated with the lock states U, L, and E found during the search]
Feasibility Analysis
Is error path feasible in C program?
Requires theorem prover for constraint propagation
do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;
  if (request) {
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;
  }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock();
[Figure: program points annotated with the lock states U, L, and E along the error path]
Predicate Discovery
Add new predicate to boolean program
New techniques
b : (nPacketsOld == nPackets)
do {
  KeAcquireSpinLock();
  nPacketsOld = nPackets;                 /* b = true */
  if (request) {
    request = request->Next;
    KeReleaseSpinLock();
    nPackets++;                           /* b = b ? false : * */
  }
} while (nPackets != nPacketsOld);        /* !b */
KeReleaseSpinLock();
[Figure: program points annotated with the lock states U, L, and E]
Revised Abstraction
Model checking refined boolean program
b : (nPacketsOld == nPackets)
do {
  KeAcquireSpinLock();
  b = true;
  if (*) {
    KeReleaseSpinLock();
    b = b ? false : *;
  }
} while (!b);
KeReleaseSpinLock();
[Figure: program points annotated with the lock states U and L and with the value of b (b or !b)]
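The two boolean programs can be checked against the locking spec by explicit search, as in the Python sketch below. It uses plain enumeration rather than BDDs, and the program-counter encoding and the names `successors` and `error_reachable` are assumptions made for illustration. With `track_b=False` it models the initial abstraction; with `track_b=True` it models the refined one.

```python
from collections import deque

ERROR = "ERROR"

def successors(state, track_b):
    """One step of the boolean program; state = (pc, lock_held, b)."""
    pc, locked, b = state
    if pc == 0:   # KeAcquireSpinLock(): acquiring a held lock is an error
        return [ERROR] if locked else [(1, True, b)]
    if pc == 1:   # b = true
        return [(2, locked, True if track_b else b)]
    if pc == 2:   # if (*): take the branch or skip it
        return [(3, locked, b), (5, locked, b)]
    if pc == 3:   # KeReleaseSpinLock(): releasing a free lock is an error
        return [(4, False, b)] if locked else [ERROR]
    if pc == 4:   # b = b ? false : *
        if not track_b:
            return [(5, locked, b)]
        return [(5, locked, False)] if b else [(5, locked, False), (5, locked, True)]
    if pc == 5:   # while (!b) in the refined program, while (*) initially
        if track_b:
            return [(6, locked, b)] if b else [(0, locked, b)]
        return [(0, locked, b), (6, locked, b)]
    if pc == 6:   # final KeReleaseSpinLock()
        return [(7, False, b)] if locked else [ERROR]
    return []     # pc == 7: program end

def error_reachable(track_b):
    """Search all reachable states for a violation of the locking spec."""
    start = (0, False, False)
    seen, queue = {start}, deque([start])
    while queue:
        for t in successors(queue.popleft(), track_b):
            if t == ERROR:
                return True
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return False
```

The initial abstraction reaches the error state (for example, by looping back and acquiring the lock twice), while the abstraction refined with b does not.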
Abstraction Based Techniques
Tools for verifying source code combine many techniques
Program analysis techniques such as slicing
Abstraction
Model checking
Refinement from counter-examples
New challenges for model checking (beyond finite-state reachability analysis)
Recursion gives pushdown control
Pointers, dynamic creation of objects, inheritance…
A very active and emerging research area
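Research in Formal Methods
[Figure: a verifier takes a model and a correctness specification and produces a proof or a bug; the model is related to the software. Research topics are attached to each part of the diagram:]
Verifier: Decision procedures; Algorithms engineering; Automated abstraction; Compositional analysis
Specification: Temporal logics; Automata; From requirements to specs
Model: Modeling languages; Hierarchy, recursion; Real-time, Hybrid; Stochastic
Bridging the gap (software and model): Model extraction; Model-based design: from models to code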
Current Research Projects
Foundations
Analysis of context-free models
Stochastic hybrid systems
Decision problems for timed automata
Algorithms Engineering
Combining SAT, BDDs, Abstraction
Symbolic solutions to games
Model-based design
From hybrid automata to embedded software
From state-machine models to Java card policies
Software verification for Java classes
Classical Model Checking
Both model M and specification S are regular (finite-state)
M as a generator of all possible behaviors
S as an acceptor of “good” behaviors (verification is language inclusion of M in S) or as an acceptor of “bad” behaviors (verification is checking emptiness of intersection of M and S)
Typical specifications (using automata or temporal logic)
Safety: Always not ( both P1 and P2 have write-exclusive copy)
Liveness: Always (if P1 requests, eventually it gets response)
Robustness of theory of regular languages helps in many ways
M can be product of several components (closure under intersection)
For liveness properties, one needs to consider automata over infinite words, but corresponding theory of omega-regular languages is well developed and well understood
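For safety properties, the "acceptor of bad behaviors" view reduces to reachability in the product of M and S. The Python sketch below assumes a deterministic, complete monitor for S over finite words and uses a lock discipline (acq and rel must alternate) as the example; the names `intersects_bad`, `LOCK_SPEC`, `GOOD`, and `BUGGY` are illustrative. Liveness properties would need automata over infinite words, which this sketch does not cover.

```python
from collections import deque

def intersects_bad(m_init, m_trans, s_init, s_trans, s_bad):
    """True iff some behavior of model M drives monitor S into a bad state.

    m_trans: model state -> list of (label, next model state)
    s_trans: (monitor state, label) -> next monitor state
    """
    start = (m_init, s_init)
    seen, queue = {start}, deque([start])
    while queue:
        m, s = queue.popleft()
        if s in s_bad:
            return True
        for label, m_next in m_trans.get(m, []):
            pair = (m_next, s_trans[(s, label)])   # product step
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False

# Monitor for bad behaviors: U = unlocked, L = locked, E = error.
LOCK_SPEC = {
    ("U", "acq"): "L", ("U", "rel"): "E",
    ("L", "rel"): "U", ("L", "acq"): "E",
    ("E", "acq"): "E", ("E", "rel"): "E",
}

GOOD = {0: [("acq", 1)], 1: [("rel", 0)]}                # alternates correctly
BUGGY = {0: [("acq", 1)], 1: [("rel", 0), ("acq", 1)]}   # may acquire twice
```

The intersection is empty for `GOOD` (the model is safe) and nonempty for `BUGGY`.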
Recursive State Machines
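[Figure: recursive state machine with components A1, A2, A3; each component has entry points and exit points and contains boxes (superstates) that refer to other components]
Boolean Programs
main() { bool y; … x = P(y); … z = P(x); … }
bool P(u: bool) { … return Q(u); }
bool Q(w: bool) { if … else return P(~w) }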
Model Checking of Recursive Models
Control-flow requires stack, so model M defines a context-free language
Algorithms exist for checking regular specifications against context-free models
Emptiness of pushdown automata is solvable
Product of a regular language and a context-free language is context-free
But, checking context-free spec against a context-free model is undecidable!
Context-free languages are not closed under intersection
Inclusion as well as emptiness of intersection undecidable
Are Context-free Specs Interesting?
Classical Hoare-style pre/post conditions
If p holds when procedure A is invoked, q holds upon return
Total correctness: every invocation of A terminates
Integral part of emerging standard JML
Stack inspection properties (security/access control)
If a variable x is being accessed, procedure A must be in the call stack
Above requires matching of calls with returns, or finding unmatched calls
Recall: Language of words over [, ] such that brackets are well matched is not regular, but context-free
Caret for Context-free Specifications
Caret: Temporal Logic of Calls and Returns [AEM03]
Context-free extension of Pnueli’s Linear Temporal Logic LTL
Allows specification of pre/post conditions
Allows specification of stack inspection properties
Main result: Checking Caret specifications against a context-free model is decidable
Polynomial in the size of the model and exponential in the size of formula (as in case of classical model checking)
Proof technique: Product of pushdown model M and Caret specification S is again a pushdown automaton
Key to success: The notion of calls and returns is the same for M as well as S
Caret Definition
Interpreted over “structured” words in which positions are marked with calls { and returns }
p {q {r p r q {p p p} r q} p p
Caret provides classical temporal operators such as Next and Always
[Figure: positions of the word marked q’ where q’ = Next(q) holds, and marked p’ where p’ = Always(p or q) holds]
Caret Abstract Operators
Abstract versions of operators jump from a call to the matching return
p {q {r p r q {p p p} r q} p p
Sample specification: pre/post:
Always( p & call -> abstract-next q )
[Figure: positions of the word marked q’ where q’ = abstract-next(q) holds, and marked p’ where p’ = abstract-always(p or q) holds]
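The abstract successor relation can be computed with a stack, as in the Python sketch below. It covers finite words only and follows the usual definition: the abstract successor of a call is its matching return, and the abstract successor of any other position is the next position unless that position is a return. The word encoding as (kind, propositions) pairs and the function names are assumptions made for illustration.

```python
def abstract_successors(kinds):
    """Map each position to its abstract successor (None if undefined)."""
    succ = [None] * len(kinds)
    stack = []
    for i, kind in enumerate(kinds):
        if kind == "call":
            stack.append(i)
        elif kind == "ret" and stack:
            succ[stack.pop()] = i          # call jumps to its matching return
    for i, kind in enumerate(kinds):
        if kind != "call" and i + 1 < len(kinds) and kinds[i + 1] != "ret":
            succ[i] = i + 1                # stay inside the current context
    return succ

def holds_pre_post(word, p, q):
    """Check Always(p & call -> abstract-next q) on a finite structured word."""
    succ = abstract_successors([kind for kind, _ in word])
    for i, (kind, props) in enumerate(word):
        if kind == "call" and p in props:
            if succ[i] is None or q not in word[succ[i]][1]:
                return False
    return True

word = [
    ("int", {"p"}), ("call", {"p"}), ("int", {"r"}), ("call", set()),
    ("int", set()), ("ret", set()), ("ret", {"q"}), ("int", set()),
]
```

In `word`, the call at position 1 satisfies p and its matching return at position 6 satisfies q, so the pre/post specification holds.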
Visibly Pushdown Languages [AM03]
Subclass of context-free languages that is suitable for program analysis / algorithmic verification
Alphabet is structured: Symbols are tagged with calls and returns
A visibly pushdown automaton’s moves are constrained by input
If current symbol is a call, it must push
If current symbol is a return it must pop
Else it can only update control state
Class of languages defined by these automata is very robust
Closed under union, intersection, complement, Kleene-*.
Emptiness, inclusion, equivalence decidable
Alternative characterizations: Embeddings of regular tree languages, Monadic Second Order theory with a binary matching predicate
Caret is a subset of visibly pushdown languages
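The input-driven stack discipline can be sketched as a small deterministic simulator in Python. This is a simplification, not the definition from [AM03]: it rejects unmatched returns and requires an empty stack at the end, and the function name `run_vpa` and the bracket example are illustrative assumptions.

```python
def run_vpa(word, start, accepting, call_t, ret_t, int_t):
    """Run a deterministic visibly pushdown automaton on tagged symbols.

    word:   list of (kind, symbol) with kind in {"call", "ret", "int"}
    call_t: (state, symbol) -> (next state, pushed stack symbol)
    ret_t:  (state, symbol, popped stack symbol) -> next state
    int_t:  (state, symbol) -> next state
    """
    state, stack = start, []
    for kind, sym in word:
        if kind == "call":                       # a call must push
            if (state, sym) not in call_t:
                return False
            state, pushed = call_t[(state, sym)]
            stack.append(pushed)
        elif kind == "ret":                      # a return must pop
            if not stack:
                return False
            key = (state, sym, stack.pop())
            if key not in ret_t:
                return False
            state = ret_t[key]
        else:                                    # internal: control state only
            if (state, sym) not in int_t:
                return False
            state = int_t[(state, sym)]
    return state in accepting and not stack

# Well-matched words over two kinds of brackets, with internal symbol x.
CALLS = {("ok", "("): ("ok", "("), ("ok", "["): ("ok", "[")}
RETS = {("ok", ")", "("): "ok", ("ok", "]", "["): "ok"}
INTS = {("ok", "x"): "ok"}

def tag(text):
    kinds = {"(": "call", "[": "call", ")": "ret", "]": "ret"}
    return [(kinds.get(c, "int"), c) for c in text]
```

Because the tag of each input symbol decides whether the stack grows or shrinks, two such automata reading the same word always have stacks of the same height, which is what makes product constructions possible.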
Synthesis of Behavioral Interfaces
Behavioral type of a class specifies the allowed sequences of method calls
Type for a file class may be (open; (read+open)*;close)*
Can we synthesize this type automatically?
Given source code for the class implementation
Construct a regular language over the method calls so that a particular exception is never raised
This is useful for compositional verification also: behavioral interface is a suitable abstraction of the class
Proposed route (ongoing project)
Use abstraction to get a finite-state model
Solve a symbolic game to get the most general strategy for invoking methods to keep the abstract model “safe”
Extract interface type from the game solution
Behavioral Interface
AbstractList.ListItr
public Object next() { … lastRet = cursor++; … }
public Object prev() { … lastRet = cursor; … }
public void remove() { if (lastRet==-1) throw new IllegalExc(); … lastRet = -1; … }
public void add(Object o) { … lastRet = -1; … }
[Figure: synthesized interface automaton with states Start, Unsafe, and Safe, and transitions labeled add, next, remove, and prev]
Game in Abstracted Program
[Figure: game graph of the abstracted program, with edges labeled by method calls such as next and prev]
From black states, Player0 gets to choose the input method call
From purple states, Player1 gets to choose a path in the abstract program till call returns
Objective for Player0: Ensure error states (from which an exception can be raised) are avoided
Winning strategy: Correct method sequence calls
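For the ListItr example, a very small instance of this idea can be sketched in Python. The abstraction tracks a single predicate, lastRet != -1, and assumes that next and prev always leave lastRet different from -1; under that assumption each method is deterministic in the abstract model, so Player1 has no real choices and the game degenerates to a reachability computation. The names `METHODS` and `synthesize_interface` are hypothetical, and this is not the tool described in the talk.

```python
# Abstract effect of each method on the predicate "lastRet != -1".
# None means the call may raise the exception from that abstract state.
METHODS = {
    "next":   lambda valid: True,                       # lastRet = cursor++
    "prev":   lambda valid: True,                       # lastRet = cursor
    "remove": lambda valid: False if valid else None,   # throws if lastRet == -1
    "add":    lambda valid: False,                      # lastRet = -1
}

def synthesize_interface(init):
    """Return the safe transitions (state, method) -> next state."""
    interface, todo, seen = {}, [init], {init}
    while todo:
        state = todo.pop()
        for name, effect in METHODS.items():
            nxt = effect(state)
            if nxt is None:
                continue               # Player0 must never choose this call
            interface[(state, name)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return interface

interface = synthesize_interface(False)   # initially lastRet == -1
```

The result allows remove only from the state where the predicate holds, and add or remove lead back to the state where it does not.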
Challenges
Techniques for generating finite-state abstractions
How to solve large games symbolically?
In fact, a partial information game (Player0 should choose the next method call only based on values returned so far)
How to construct an understandable behavioral type from the winning strategy?
Abstraction refinement
If Player0 does not invoke any method, exceptions can never be raised
How to refine the current abstraction based on quality of current behavioral type?
Integrating all these into a working tool