

1

Revisiting Difficult Constraints

if (hash(x) == hash(y)) {

...

}

How do we cover this code?

Suppose we’re running (DART, SAGE, SMART, CUTE, SPLAT, etc.) – we get here, but hash(x) != hash(y). Can we solve for hash(x) == hash(y)?

Concrete values won’t help us much – we still have to solve for hash(x) == C1 or for hash(y) == C2...

Any ideas?


2

Today

A brief “digression” on causality and philosophy (of science)

Fault localization & error explanation
• Renieris & Reiss: Nearest Neighbors
• Jones & Harrold: Tarantula
• How to evaluate a fault localization
• PDGs (+ BFS or ranking)

• Solving for a nearest run (not really testing)


3

Causality

When a test case fails we start debugging

We assume that the fault (what we’re really after) causes the failure
• Remember RIP (Reachability, Infection, Propagation)?

What do we mean when we say that
• “A causes B”?


4

Causality

We don’t know

Though it is central to everyday life – and to the aims of science
• A real understanding of causality eludes us to this day
• Still no non-controversial way to answer the question “does A cause B”?


5

Causality

Philosophy of causality is a fairly active area, going back to Aristotle and (in more modern approaches) Hume
• General agreement that a cause is something that “makes a difference” – if the cause had not been, then the effect wouldn’t have been
• One theory that is rather popular with computer scientists is David Lewis’ counterfactual approach
• Probably because it (like probabilistic or statistical approaches) is amenable to mathematical treatment and automation


6

Causality (According to Lewis)

For Lewis (roughly – I’m conflating his counterfactual dependency and causal dependency):
• A causes B (in world w) iff
• In all possible worlds that are maximally similar to w, and in which A does not take place, B also does not take place


7

Causality (According to Lewis)

Causality does not depend on
• B being impossible without A
• Seems reasonable: when asking “Was Larry slipping on the banana peel causally dependent on Curly dropping it?” we don’t consider worlds in which new circumstances (Moe dropping a banana peel) are introduced


8

Causality (According to Lewis)

Many objections to Lewis in the literature
• e.g., that the cause precedes the effect in time seems not to be required by his approach

One objection is not a problem for our purposes
• Distance metrics (how similar is world w to world w’?) are problematic for “worlds”
• Counterfactuals are tricky
• Not a problem for program executions
• There may be details to handle, but no one has in-principle objections to asking how similar two program executions are
• Or philosophical problems with multiple executions (no run is “privileged by actuality”)


9

Causality (According to Lewis)

[Diagram: Did A cause B in this program execution? Compare d, the distance to the nearest run in which neither A nor B occurs, with d’, the distance to the nearest run in which A does not occur but B still does. Yes if d < d’; no if d > d’.]


10

Formally

A predicate e is causally dependent on a predicate c in an execution a iff:

1. c(a) ∧ e(a)

2. ∃b . (¬c(b) ∧ ¬e(b) ∧ (∀b’ . (¬c(b’) ∧ e(b’)) ⇒ (d(a, b) < d(a, b’))))
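Read informally: e is causally dependent on c in run a when some run b lacking both c and e is closer to a than every run b’ that lacks c but still exhibits e – removing the cause in the most similar possible way also removes the effect.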


11

What does this have to do with automated debugging??

A fault is an incorrect part of a program

In a failing test case, some fault is reached and executes
• Causing the state of the program to be corrupted (error)
• This incorrect state is propagated through the program (propagation is a series of “A causes B”s)
• Finally, bad state is observable as a failure – caused by the fault


12

Fault Localization

Fault localization, then, is:
• An effort to automatically find (one of the) causes of an observable failure
• It is inherently difficult because there are many causes of the failure that are not the fault
• We don’t mind seeing the chain of cause and effect reaching back to the fault
• But the fact that we reached the fault at all is also a cause!


13

Enough!

Ok, let’s get back to testing and some methods for localizing faults from test cases
• But – keep in mind that when we localize a fault, we’re really trying to automate finding causal relationships
• The fault is a cause of the failure


14

Lewis and Fault Localization

Causality:
• Generally agreed that explanation is about causality [Ball, Naik, Rajamani], [Zeller], [Groce, Visser], [Sosa, Tooley], [Lewis], etc.

Similarity:
• Also often assumed that successful executions that are similar to a failing run can help explain an error [Zeller], [Renieris, Reiss], [Groce, Visser], etc.
• This work was not based on Lewis’ approach – it seems that this point about similarity is just an intuitive understanding most people (or at least computer scientists) share


15

Distance and Similarity

We already saw this idea at play in one version of Zeller’s delta-debugging
• Trying to find the one change needed to take a successful run and make it fail
• Most similar thread schedule that doesn’t cause a failure, etc.

Renieris and Reiss based a general fault localization technique on this idea – measuring distances between executions
• To localize a fault, compare the failing trace with its nearest neighbor according to some distance metric


16

Renieris and Reiss’ Localization

Basic idea (over-simplified)
• We have lots of test cases
• Some fail
• A much larger number pass
• Pick a failure
• Find the most similar successful test case
• Report differences as our fault localization

“nearest neighbor”


17

Renieris and Reiss’ Localization

Collect spectra of executions, rather than the full executions
• For example, just count the number of times each source statement executed
• Previous work on using spectra for localization basically amounted to set difference/union – for example, find features unique to (or lacking in) the failing run(s)
• Problem: many failing runs have no such features – many successful test cases have R (and maybe I) but not P!
• Otherwise, localization would be very easy


18

Renieris and Reiss’ Localization

Some obvious and not so obvious points to think about
• Technique makes intuitive sense
• But what if there are no successful runs that are very similar?
• Random testing might produce runs that all differ in various accidental ways
• Is this approach over-dependent on test suite quality?


19

Renieris and Reiss’ Localization

Some obvious and not so obvious points to think about
• What if we minimize the failing run using delta-debugging?
• Now lots of differences with original successful runs just due to length!
• We could produce a very similar run by using delta-debugging to get a 1-change run that succeeds (there will actually be many of these)
• Can still use Renieris and Reiss’ approach – because delta-debugging works over the inputs, not the program behavior, spectra for these runs will be more or less similar to the failing test case


20

Renieris and Reiss’ Localization

Many details (see the paper):
• Choice of spectra
• Choice of distance metric
• How to handle equal spectra for failing/passing tests?

Basic idea is nonetheless straightforward


21

The Tarantula Approach

Jones, Harrold (and Stasko): Tarantula

Not based on distance metrics or a Lewis-like assumption

A “statistical” approach to fault localization

Originally conceived of as a visualization approach: produces a picture of all source in the program, colored according to how “suspicious” it is
• Green: not likely to be faulty
• Yellow: hrm, a little suspicious
• Red: very suspicious, likely fault


22

The Tarantula Approach


23

The Tarantula Approach

How do we score a statement in this approach? (where do all those colors come from?)

Again, assume we have a large set of tests, some passing, some failing

“Coverage entity” e (e.g., statement)
• failed(e) = # tests covering e that fail
• passed(e) = # tests covering e that pass
• totalfailed, totalpassed = what you’d expect


24

The Tarantula Approach

How do we score a statement in this approach? (where do all those colors come from?)

suspiciousness(e) = (failed(e) / totalfailed) / ((failed(e) / totalfailed) + (passed(e) / totalpassed))
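As a concrete illustration, here is a minimal sketch in C (the function and its names are ours, not part of the Tarantula tool; it assumes totalfailed and totalpassed are nonzero):

double suspiciousness(int failed_e, int passed_e,
                      int totalfailed, int totalpassed) {
    /* fractions of failing and passing tests that cover entity e */
    double f = (double) failed_e / totalfailed;
    double p = (double) passed_e / totalpassed;
    if (f + p == 0.0) return 0.0;  /* e covered by no test at all */
    return f / (f + p);
}

For example, an entity covered by 1 of 1 failing tests and 1 of 5 passing tests scores (1/1) / (1/1 + 1/5) ≈ 0.83.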


25

The Tarantula Approach

Not very suspicious: appears in almost every passing test and almost every failing test

Highly suspicious: appears much more frequently in failing than passing tests

suspiciousness(e) = (failed(e) / totalfailed) / ((failed(e) / totalfailed) + (passed(e) / totalpassed))


26

The Tarantula Approach

suspiciousness(e) = (failed(e) / totalfailed) / ((failed(e) / totalfailed) + (passed(e) / totalpassed))

Simple program to compute the middle of three inputs, with a fault.

mid() {
  int x, y, z, m;
1   read (x, y, z);
2   m = z;
3   if (y < z)
4     if (x < y)
5       m = y;
6     else if (x < z)
7       m = y;
8   else
9     if (x > y)
10      m = y;
11    else if (x > z)
12      m = x;
13  print (m);
}


27

The Tarantula Approach

suspiciousness(e) = (failed(e) / totalfailed) / ((failed(e) / totalfailed) + (passed(e) / totalpassed))

mid() {
  int x, y, z, m;
1   read (x, y, z);
2   m = z;
3   if (y < z)
4     if (x < y)
5       m = y;
6     else if (x < z)
7       m = y;
8   else
9     if (x > y)
10      m = y;
11    else if (x > z)
12      m = x;
13  print (m);
}

Run some tests: (3,3,5) (1,2,3) (3,2,1) (5,5,5) (5,3,4) (2,1,3)

Look at whether they pass or fail
Look at coverage of entities
Compute suspiciousness using the formula

Suspiciousness of statements 1-13: 0.5, 0.5, 0.5, 0.63, 0.0, 0.71, 0.83, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5

Fault is indeed most suspicious!
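To see where the 0.83 comes from: (2,1,3) is the one failing test (it prints 1 instead of the middle value 2), and statement 7 is covered by that failing test plus exactly one of the five passing tests, (3,3,5). So suspiciousness(7) = (1/1) / (1/1 + 1/5) ≈ 0.83, the highest score in the program.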


28

The Tarantula Approach

Obvious benefits:
• No problem if the fault is reached in some successful test cases
• Doesn’t depend on having any successful tests that are similar to the failing test(s)
• Provides a ranking of every statement, instead of just a set of nodes – directions on where to look next
• Numerical, even – how much more suspicious is X than Y?
• The pretty visualization may be quite helpful in seeing relationships between suspicious statements
• Is it less sensitive to accidental features of random tests, and to test suite quality in general?
• What about minimized failing tests here?


29

Tarantula vs. Nearest Neighbor

Which approach is better?
• Once upon a time:
• Fault localization papers gave a few anecdotes of their technique working well, showed it working better than another approach on some example, and called it a day
• We’d like something more quantitative (how much better is this technique than that one?) and much less subjective!


30

Evaluating Fault Localization Approaches

Fault localization tools produce reports

We can reduce a report to a set (or ranking) of program locations

Let’s say we have three localization tools which produce
• A big report that includes the fault
• A much smaller report, but the actual fault is not part of it
• Another small report, also not containing the fault

Which of these is the “best” fault localization?


31

Evaluating a Fault Localization Report

Idea (credit to Renieris and Reiss):
• Imagine an “ideal” debugger, the perfect programmer
• Starts reading the report
• Expands outwards from nodes (program locations) in the report to associated nodes, adding those at each step
• If a variable use is in the report, looks at the places it might be assigned
• If code is in the report, looks at the condition of any ifs guarding that code
• In general, follows program (causal) dependencies
• As soon as a fault is reached, recognizes it!


32

Evaluating a Fault Localization Report

Score the reports according to
• How much code the ideal debugger would read, starting from the report
• Empty report: score = 0
• Every line in the program: score = 0
• Big report, containing the bug? mediocre score (0.4)
• Small report, far from the bug? bad score (0.2)
• Small report, “near” the bug? good score (0.8)
• Report is the fault: great score (0.9)


33

Evaluating a Fault Localization Report

Breadth-first search of the Program Dependency Graph (PDG), starting from the fault localization report:
• Terminate the search when a real fault is found
• Score is the proportion of the PDG that is not explored during the breadth-first search
• Score near 1.00 = report includes only faults
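A minimal sketch of this scoring in C (our illustration of the Renieris/Reiss measure, not their implementation; the adjacency-matrix layout and all names are ours):

#include <string.h>

#define MAX_NODES 1024

/* adj[u][v] != 0 iff the PDG has an edge between u and v */
double localization_score(int n, const int adj[][MAX_NODES],
                          const int report[], int report_len,
                          const int is_fault[]) {
    int visited[MAX_NODES] = {0};
    int frontier[MAX_NODES], flen = 0;
    int explored = 0, found = 0;

    for (int i = 0; i < report_len; i++) {  /* layer 0: the report itself */
        frontier[flen++] = report[i];
        visited[report[i]] = 1;
    }
    while (flen > 0 && !found) {
        for (int i = 0; i < flen; i++) {    /* "read" the whole current layer */
            explored++;
            if (is_fault[frontier[i]]) found = 1;
        }
        if (found) break;
        int next[MAX_NODES], nlen = 0;      /* grow the next BFS layer */
        for (int i = 0; i < flen; i++)
            for (int v = 0; v < n; v++)
                if (adj[frontier[i]][v] && !visited[v]) {
                    visited[v] = 1;
                    next[nlen++] = v;
                }
        memcpy(frontier, next, nlen * sizeof(int));
        flen = nlen;
    }
    /* score = fraction of the PDG the ideal debugger never had to read */
    return found ? (double)(n - explored) / n : 0.0;
}

Note how the edge cases on the previous slide fall out: an empty report never reaches a fault (score 0), and a report containing every line explores the whole PDG (also score 0).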


34

Details of Evaluation Method (PDG)

[PDG diagram: 12 total nodes]


35

Details of Evaluation Method (PDG)

[PDG diagram: 12 total nodes; the fault and the report are marked]


36

Details of Evaluation Method (PDG)

[PDG diagram: 12 total nodes; the report + 1 layer of BFS]


37

Details of Evaluation Method (PDG)

[PDG diagram: the report + 1 layer of BFS – STOP: real fault discovered]


38

Details of Evaluation Method (PDG)

[PDG diagram: 8 of 12 nodes not covered by the BFS: score = 8/12 ~= 0.67]


39

Details of Evaluation Method (PDG)

[PDG diagram: 12 total nodes; a second example, with the fault and the report marked]


41

Details of Evaluation Method (PDG)

[PDG diagram: the report + 2 layers of BFS]


42

Details of Evaluation Method (PDG)

[PDG diagram: the report + 3 layers of BFS]


43

Details of Evaluation Method (PDG)

[PDG diagram: the report + 4 layers of BFS – STOP: real fault discovered]


44

Details of Evaluation Method (PDG)

[PDG diagram: 0 of 12 nodes not covered by the BFS: score = 0/12 ~= 0.00]


45

Details of Evaluation Method (PDG)

[PDG diagram: fault = report; 11 of 12 nodes not covered by the BFS: score = 11/12 ~= 0.92]


46

Evaluating a Fault Localization Report

Caveats:
• Isn’t a misleading report (a small number of nodes, far from the bug) actually much worse than an empty report?
• “I don’t know” vs.
• “Oh, yeah man, you left your keys in the living room somewhere” (when in fact your keys are in a field in Nebraska)
• Nobody really searches a PDG like that!
• Not backed up by user studies to show high scores correlate to users finding the fault quickly from the report


47

Evaluating a Fault Localization Report

Still, the Renieris/Reiss scoring has been widely adopted by the testing community and some model checking folks
• Best thing we’ve got, for now


48

Evaluating Fault Localization Approaches

So, how do the techniques stack up?

Tarantula seems to be the best of the test suite based techniques
• Next best is the Cause Transitions approach of Cleve and Zeller (see their paper), but it sometimes uses programmer knowledge
• Two different Nearest-Neighbor approaches are next best
• Set-intersection and set-union are worst

For details, see the Tarantula paper


49

Evaluating Fault Localization Approaches

Tarantula got scores at the 0.99-or-higher level 3 times more often than the next best technique

Trend continued at every ranking – Tarantula was always the best approach

Also appeared to be efficient:
• Much faster than the Cause-Transitions approach of Cleve and Zeller
• Probably about the same as the Nearest Neighbor and set-union/intersection methods


50

Evaluating Fault Localization Approaches

Caveats:
• Evaluation is over the Siemens suite (again!)
• But Tarantula has done well on larger programs
• Tarantula and Nearest Neighbor might both benefit from larger test suites produced by random testing
• Siemens is not that many tests, done by hand


51

Another Way to Do It

Question:
• How good would the Nearest Neighbors method be if our test suite contained all possible executions (the universe of tests)?
• We suspect it would do much better, right?
• But of course, that’s ridiculous – we can’t check for distance to every possible successful test case!
• Unless our program can be model checked
• Leads us into next week’s topic, in a roundabout way: testing via model checking


52

Explanation with Distance Metrics

Algorithm (very high level):
1. Find a counterexample trace (model checking term for “failing test case”)
2. Encode the search for a maximally similar successful execution under a distance metric d as an optimization problem
3. Report the differences (Δs) as an explanation (and a localization) of the error


53

Implementation #1

CBMC Bounded Model Checker for ANSI-C programs:
• Input: C program + loop bounds
• Checks for various properties:
  • assert statements
  • Array bounds and pointer safety
  • Arithmetic overflow
• Verifies within given loop bounds
• Provides counterexample if property does not hold
• Now provides error explanation and fault localization


54

Given a counterexample:

4: a = 5
5: b = 4
6: c = -4
7: a = 2
8: a = 1
9: a = 6
10: a = 4
11: c = 9
12: c = 10
13: a = 10
14: assert (a < 4);


55

(the same counterexample)

produce a successful execution that is as similar as possible (under a distance metric)


56

produce a successful execution that is as similar as possible (under a distance metric):

Counterexample: 4: a = 5; 5: b = 4; 6: c = -4; 7: a = 2; 8: a = 1; 9: a = 6; 10: a = 4; 11: c = 9; 12: c = 10; 13: a = 10; 14: assert (a < 4);

Successful execution: 4: a = 5; 5: b = -3; 6: c = -4; 7: a = 2; 8: a = 1; 9: a = 6; 10: a = 4; 11: c = 9; 12: c = 3; 13: a = 3; 14: assert (a < 4);


57

and examine the necessary differences:

5: b = 4 vs. b = -3
12: c = 10 vs. c = 3
13: a = 10 vs. a = 3


58

and examine the necessary differences: these are the Δs


59

and examine the necessary differences: these are the causes


60

and the localization – lines 5, 12, and 13 are likely bug locations.


61

Explanation with Distance Metrics

How it’s done:

[Pipeline diagram: P + spec → Model checker]

First, the program (P) and specification (spec) are sent to the model checker.


62

Explanation with Distance Metrics

How it’s done:

[Pipeline diagram: P + spec → Model checker → C]

The model checker finds a counterexample, C.


63

Explanation with Distance Metrics

How it’s done:

[Pipeline diagram: P + spec → Model checker → C → BMC/constraint generator]

The explanation tool uses P, spec, and C to generate (via Bounded Model Checking) a formula whose solutions are executions of P that are not counterexamples.


64

Explanation with Distance Metrics

How it’s done:

[Pipeline diagram: the BMC/constraint generator now also emits S]

Constraints are added to this formula for an optimization problem: find a solution that is as similar to C as possible, by the distance metric d. The formula + optimization problem is S.


65

Explanation with Distance Metrics

How it’s done:

[Pipeline diagram: S → Optimization tool → ¬C]

An optimization tool (PBS, the Pseudo-Boolean Solver) finds a solution to S: an execution of P that is not a counterexample, and is as similar as possible to C. Call this execution ¬C.


66

Explanation with Distance Metrics

How it’s done:

[Pipeline diagram: C and ¬C are compared, yielding the Δs]

Report the differences (Δs) between C and ¬C to the user: explanation and fault localization.


67

Explanation with Distance Metrics

The metric d is based on Static Single Assignment (SSA) form (plus loop unrolling)
• A variation on SSA, to be precise

CBMC model checker (bounded model checker for C programs) translates an ANSI C program into a set of equations

An execution of the program is just a solution to this set of equations


68

“SSA” Transformation

Original program:

int main () {
  int x, y;
  int z = y;
  if (x > 0)
    y--;
  else
    y++;
  z++;
  assert (y == z);
}

After the transformation:

int main () {
  int x0, y0;
  int z0 = y0;
  y1 = y0 - 1;
  y2 = y0 + 1;
  guard1 = x0 > 0;
  y3 = guard1 ? y1 : y2;
  z1 = z0 + 1;
  assert (y3 == z1);
}
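In SSA terms, the guarded ternary assignment to y3 acts as the φ-function at the join point after the if: it selects y1 or y2 according to which branch ran, so every variable is assigned exactly once.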


69

Transformation to Equations

int main () {
  int x0, y0;
  int z0 = y0;
  y1 = y0 - 1;
  y2 = y0 + 1;
  guard1 = x0 > 0;
  y3 = guard1 ? y1 : y2;
  z1 = z0 + 1;
  assert (y3 == z1);
}

(z0 == y0 ∧
 y1 == y0 - 1 ∧
 y2 == y0 + 1 ∧
 guard1 == (x0 > 0) ∧
 y3 == (guard1 ? y1 : y2) ∧
 z1 == z0 + 1 ∧
 y3 == z1)


70

Transformation to Equations

(same program and equations as on the previous slide)

Uninitialized variables in CBMC are unconstrained inputs.


71

Transformation to Equations

(same program and equations)

CBMC (1) negates the assertion


72

Transformation to Equations

(z0 == y0 ∧
 y1 == y0 - 1 ∧
 y2 == y0 + 1 ∧
 guard1 == (x0 > 0) ∧
 y3 == (guard1 ? y1 : y2) ∧
 z1 == z0 + 1 ∧
 y3 != z1)

(assertion is now negated)


73

Transformation to Equations

(same negated equations)

then (2) translates to SAT and uses a fast solver to find a counterexample


74

Execution Representation

Remove the assertion to get an equation for any execution of the program (take care of loops by unrolling):

(z0 == y0 ∧
 y1 == y0 - 1 ∧
 y2 == y0 + 1 ∧
 guard1 == (x0 > 0) ∧
 y3 == (guard1 ? y1 : y2) ∧
 z1 == z0 + 1)


75

Execution Representation

Execution represented by assignments to all variables in the equations.

Counterexample: x0 == 1, y0 == 5, z0 == 5, y1 == 4, y2 == 6, guard1 == true, y3 == 4, z1 == 6


76

Execution Representation

Execution represented by assignments to all variables in the equations.

Counterexample: x0 == 1, y0 == 5, z0 == 5, y1 == 4, y2 == 6, guard1 == true, y3 == 4, z1 == 6

Successful execution: x0 == 0, y0 == 5, z0 == 5, y1 == 4, y2 == 6, guard1 == false, y3 == 6, z1 == 6


77

The Distance Metric d

d = number of changes (Δs) between two executions

Counterexample: x0 == 1, y0 == 5, z0 == 5, y1 == 4, y2 == 6, guard1 == true, y3 == 4, z1 == 6

Successful execution: x0 == 0, y0 == 5, z0 == 5, y1 == 4, y2 == 6, guard1 == false, y3 == 6, z1 == 6


78

The Distance Metric d



79

The Distance Metric d



80

The Distance Metric d

d = number of changes (Δs) between two executions

Counterexample: x0 == 1, y0 == 5, z0 == 5, y1 == 4, y2 == 6, guard1 == true, y3 == 4, z1 == 6

Successful execution: x0 == 0, y0 == 5, z0 == 5, y1 == 4, y2 == 6, guard1 == false, y3 == 6, z1 == 6

d = 3 (x0, guard1, and y3 differ)


81

The Distance Metric d

3 is the minimum possible distance between the counterexample and a successful execution

d = 3


82

The Distance Metric d

To compute the metric, add a new SAT variable for each potential change:

Δx0 == (x0 != 1)
Δy0 == (y0 != 5)
Δz0 == (z0 != 5)
Δy1 == (y1 != 4)
Δy2 == (y2 != 6)
Δguard1 == !guard1
Δy3 == (y3 != 4)
Δz1 == (z1 != 6)

New SAT variables


83

The Distance Metric d

And minimize the sum of the Δ variables (treated as 0/1 values): a pseudo-Boolean problem

minimize Δx0 + Δy0 + Δz0 + Δy1 + Δy2 + Δguard1 + Δy3 + Δz1
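As a sanity check, the whole optimization can be mimicked by brute force on this toy program (our illustration only – the real tool hands the pseudo-Boolean problem to PBS rather than enumerating; the input window is arbitrary):

#include <stdio.h>
#include <limits.h>

int main(void) {
    int best = INT_MAX;
    /* enumerate the inputs in a small window; x0 and y0 are the only inputs */
    for (int x0 = -8; x0 <= 8; x0++)
        for (int y0 = -8; y0 <= 8; y0++) {
            /* the SSA equations, executed directly */
            int z0 = y0, y1 = y0 - 1, y2 = y0 + 1;
            int guard1 = x0 > 0;
            int y3 = guard1 ? y1 : y2;
            int z1 = z0 + 1;
            if (y3 != z1) continue;   /* keep only successful executions */
            /* d = number of values differing from the counterexample */
            int d = (x0 != 1) + (y0 != 5) + (z0 != 5) + (y1 != 4)
                  + (y2 != 6) + (guard1 != 1) + (y3 != 4) + (z1 != 6);
            if (d < best) best = d;
        }
    printf("minimum d = %d\n", best);  /* prints 3: x0, guard1, y3 change */
    return 0;
}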


84

The Distance Metric d

An SSA-form oddity:
• The distance metric can compare values from code that doesn’t run in either execution being compared
• This can be the determining factor in which of two traces is most similar to a counterexample
• Counterintuitive but not necessarily incorrect: simply extends comparison to all hypothetical control flow paths


85

Explanation with Distance Metrics

Algorithm (lower level):
1. Find a counterexample using Bounded Model Checking (SAT)
2. Create a new problem: SAT for a successful execution + constraints for minimizing distance to the counterexample (fewest changes)
3. Solve this optimization problem using a pseudo-Boolean solver (PBS) (= 0-1 ILP)
4. Report the differences (Δs) to the user as an explanation (and a localization) of the error


86

Explanation with Distance Metrics

[Pipeline diagram, with the tools named: the model checker is CBMC, the BMC/constraint generator is explain, and the optimization tool is PBS. P + spec → CBMC → C; explain builds S; PBS solves S for ¬C; the Δs between C and ¬C are reported.]


87

Explanation with Distance Metrics

Details are hidden behind a Graphical User Interface (GUI) that keeps SAT and the distance metrics away from users

GUI automatically highlights likely bug locations, presents changed values

Next slides: GUI in action + a teaser for experimental results


88

[GUI screenshot]


89

[GUI screenshot]


90

Explaining Abstract Counterexamples


91

Explaining Abstract Counterexamples

The first implementation presents differences as changes in concrete values, e.g.:
• “In the counterexample, x is 14. In the successful execution, x is 18.”

Which can miss the point:
• What really matters is whether x is less than y
• But y isn’t mentioned at all!


92

Explaining Abstract Counterexamples

If the counterexample and successful execution were abstract traces, we’d get variable relationships and generalization for “free”

Abstraction should also make the model checking more scalable
• This is why abstraction is traditionally used in model checking, in fact


93

Model Checking + Abstraction

In abstract model checking, the model checker explores an abstract state space

In predicate abstraction, states consist of predicates that are true in a state, rather than concrete values:
• Concrete: x = 12, y = 15, z = 0
• Abstract: x < y, z != 1
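A tiny sketch of the idea in C (our illustration; the struct names are ours, and the two predicates are just the ones from the example above):

typedef struct { int x, y, z; } Concrete;
typedef struct { int x_lt_y, z_ne_1; } Abstract;

Abstract abstract_state(Concrete s) {
    /* evaluate each tracked predicate on the concrete state */
    Abstract a = { s.x < s.y, s.z != 1 };
    return a;   /* x = 12, y = 15, z = 0 maps to (true, true) */
}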


94

Model Checking + Abstraction

In abstract model checking, the model checker explores an abstract state space.

In predicate abstraction, states consist of predicates that are true in a state, rather than concrete values:
• Concrete: x = 12, y = 15, z = 0
• Abstract: x < y, z != 1

Potentially represents many concrete states


95

Model Checking + Abstraction

Conservative predicate abstraction preserves all erroneous behaviors in the original system

Abstract “executions” now potentially represent a set of concrete executions

Must check the execution to see if it matches some real behavior of the program: abstraction adds behavior


96

Implementation #2

MAGIC Predicate Abstraction Based Model Checker for C programs:
• Input: C program
• Checks for various properties:
  • assert statements
  • Simulation of a specification machine
• Provides counterexample if property does not hold
• Counterexamples are abstract executions – that describe real behavior of the actual program
• Now provides error explanation and fault localization


97

Model Checking + Abstraction

Predicates & counterexample are produced by the usual Counterexample-Guided Abstraction Refinement framework.

Explanation will work as in the first case presented, except:
• The explanation will be in terms of control flow differences and
• Changes in predicate values.


98

MAGIC Overview

[CEGAR diagram: P and Spec feed Abstraction, which produces an abstract model for Verification. If verification succeeds: Spec Holds. If a counterexample is found, ask “Counterexample Real?” – if yes, emit the abstract counterexample; if no (spurious counterexample), Abstraction Refinement produces new predicates and the loop repeats.]


99

MAGIC Overview

[The same CEGAR diagram, highlighting its two outputs of interest: the new predicates and the abstract counterexample]


100

Model Checking + Abstraction

Explain an abstract counterexample that represents (at least one) real execution of the program

Explain with another abstract execution that:
• Is not a counterexample
• Is as similar as possible to the abstract counterexample
• Also represents real behavior


101

(the concrete counterexample and successful execution from before)

Abstract rather than concrete traces: represent more than one execution

Automatic generalization


102

Abstract counterexample: 4: a >= 4; 5: b > 2; 6: c < 7; 7: a >= 4; 8: a <= 4; 9: a >= 4; 10: a <= 4; 11: c >= 7; 12: c >= 7; 13: a >= 4; 14: assert (a < 4);

Abstract successful execution: 4: a >= 4; 5: b <= 2; 6: c < 7; 7: a > 4; 8: a <= 4; 9: a > 4; 10: a <= 4; 11: c >= 9; 12: c < 7; 13: a < 3; 14: assert (a < 4);

Abstract rather than concrete traces: represent more than one execution

Automatic generalization


103

Automatic generalization:

c >= 7: c = 7, c = 8, c = 9, c = 10…
c < 7: c = 6, c = 5, c = 4, c = 3…


104

Relationships between variables (step 12 is c >= a in the counterexample and c < a in the successful execution):

c >= a: c = 7, a = 7; c = 9, a = 6…
c < a: c = 7, a = 10; c = 3, a = 4…


105

An Example

1  int main () {
2    int input1, input2, input3;
3    int least = input1;
4    int most = input1;
5    if (most < input2)
6      most = input2;
7    if (most < input3)
8      most = input3;
9    if (least > input2)
10     most = input2;
11   if (least > input3)
12     least = input3;
13   assert (least <= most);
14 }


106

An Example

(the same program, with different statements highlighted)


107

An Example

(the same program, with different statements highlighted)


108

An Example

Value changed (line 2): input3#0 from 2147483615 to 0
Value changed (line 12): least#2 from 2147483615 to 0
Value changed (line 13): least#3 from 2147483615 to 0


109

An Example

Not very obvious what this means…

Value changed (line 2): input3#0 from 2147483615 to 0
Value changed (line 12): least#2 from 2147483615 to 0
Value changed (line 13): least#3 from 2147483615 to 0


110

An Example

Control location deleted (step #5): 10: most = input2
Predicate changed (step #5): was: most < least  now: least <= most
Predicate changed (step #5): was: most < input3  now: input3 <= most
------------------------
Predicate changed (step #6): was: most < least  now: least <= most
Action changed (step #6): was: assertion_failure


111

An Example

Here, on the other hand:

(the same output, annotated on the next two slides)


112

An Example

Here, on the other hand:
• Line with error indicated (10: most = input2)
• Avoid the error by not executing line 10


113

An Example

Predicates show how the change in control flow affects the relationship of the variables


114

Explaining Abstract Counterexamples

Implemented in the MAGIC predicate abstraction-based model checker

MAGIC represents executions as paths of states, not in SSA form

The new distance metric resembles traditional metrics from string or sequence comparison:
• Insert, delete, replace operations
• State = PC + predicate values


115

Explaining Abstract Counterexamples

Same underlying method as for concrete explanation

Revise the distance metric to account for the new representation of program executions

[Pipeline diagram: as before, but the model checker is MAGIC, the constraint generator is MAGIC/explain, and the optimization tool is still PBS]


116

CBMC vs. MAGIC Representations

CBMC: SSA assignments – input1#0 == 0, input2#0 == -1, input3#0 == 0, least#0 == 0, most#0 == 0, guard0 == true, guard1 == false, least#1 == 0

MAGIC: states & actions – a path of states s0, s1, s2, s3 connected by actions


117

CBMC vs. MAGIC Representations

CBMC: SSA assignments – as above

MAGIC: states & actions – each state records a control location and predicate values, e.g.:
Control location: Line 5
Predicates: input1 > input2, least == input1, ...


118

A New Distance Metric

[Diagram: one execution as states s0…s3, another as states s’0…s’4]

Must determine which states to compare: there may be a different number of states in the two executions

Make use of the literature on string/sequence comparison & metrics


119

Alignment

[Diagram: states s0…s3 at control locations 1, 5, 7, 9; states s’0…s’4 at control locations 1, 3, 7, 8, 11]

1. Only compare states with matching control locations


120

Alignment

[The same alignment diagram, with states at matching control locations linked]


121

Alignment

[The same alignment diagram, with another candidate alignment shown]


122

Alignment

[Alignment diagram]

2. Alignments must be unique


123

Alignment

[Alignment diagram]


124

Alignment

[Alignment diagram]

3. Don’t cross over other alignments


125

Alignment

[Alignment diagram]


126

A New Distance Metric

In sum: much like the traditional metrics used to compare strings, except the alphabet is over control locations, predicates, and actions
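For intuition, here is a Levenshtein-style sketch in C of such a metric (our stand-in for what MAGIC’s explain mode actually encodes as a pseudo-Boolean problem; the types, the trace-length bound, and the unit costs are our choices):

typedef struct { int loc; unsigned preds; } State;  /* one predicate per bit */

static int min3(int a, int b, int c) {
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* replace cost: 0 for identical states, 1 otherwise; states at different
   control locations never match, so aligning them costs a full change */
static int subst_cost(State a, State b) {
    return (a.loc == b.loc && a.preds == b.preds) ? 0 : 1;
}

/* classic dynamic-programming edit distance over state sequences
   (insert, delete, replace operations, each of cost 1) */
int trace_distance(const State *x, int n, const State *y, int m) {
    static int d[64][64];               /* assumes traces of < 64 states */
    for (int i = 0; i <= n; i++) d[i][0] = i;
    for (int j = 0; j <= m; j++) d[0][j] = j;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++)
            d[i][j] = min3(d[i-1][j] + 1,                            /* delete  */
                           d[i][j-1] + 1,                            /* insert  */
                           d[i-1][j-1] + subst_cost(x[i-1], y[j-1])); /* replace */
    return d[n][m];
}

The dynamic-programming table automatically respects the alignment rules above: each state is matched at most once, and matches never cross.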


127

A New Distance Metric

Encoded using BMC and pseudo-Boolean optimization as in the first case, with variables for alignment and for control, predicate, and action differences


128

Explaining Abstract Counterexamples

Concrete (CBMC)                          Abstract (MAGIC)
One execution                            (Potentially) many executions
Changes in values                        Changes in predicates
Always a real execution                  May be spurious – may need to iterate/refine
Execution as SSA values                  Execution as path & states
Counterintuitive metric                  Intuitive metric
No alignment problem                     Must consider alignments: which states to compare?
BMC to produce PBS problem (CBMC)        BMC to produce PBS problem (MAGIC)


129

Results


130

Results: Overview

Produces good explanations for numerous interesting case studies:
• µC/OS-II RTOS microkernel (3K lines)
• OpenSSL code (3K lines)
• Fragments of the Linux kernel
• TCAS Resolution Advisory component
• Some smaller, “toy” linear temporal logic property examples

µC/OS-II, SSL, some TCAS bugs precisely isolated: report = fault


131

Results: Quantitative Evaluation

Very good scores by the Renieris & Reiss method for evaluating fault localization:
• Measures how much source code the user can avoid reading thanks to the localization; 1 is a perfect score

For the SSL and µC/OS-II case studies, scores of 0.999

Other examples (almost) all in the range 0.720-0.993


132

Results: Comparison

Scores were generally much better than Nearest Neighbor – when it could be applied at all
• Much more consistent
• The testing-based methods of Renieris and Reiss occasionally worked better
• But they also gave useless (score 0) explanations much of the time

Scores a great improvement over the counterexample traces alone


133

Results: Comparison

Scores and times for various localization methods. The best score for each program is highlighted. (* = alternative scoring method for large programs)

Program   | Explain score/time | JPF score/time | n-c score | n-s score | CBMC score
TCAS 1    | 0.91 / 4           | 0.87 / 1521    | 0.00      | 0.58      | 0.41
TCAS 11   | 0.93 / 7           | 0.93 / 5673    | 0.13      | 0.13      | 0.51
TCAS 31   | 0.93 / 7           | - / -          | 0.00      | 0.00      | 0.46
TCAS 40   | 0.88 / 6           | 0.87 / 30482   | 0.83      | 0.77      | 0.35
TCAS 41   | 0.88 / 5           | 0.30 / 34      | 0.58      | 0.92      | 0.38
uCOS-ii   | 0.99 / 62          | - / -          | -         | -         | 0.97
uCOS-ii*  | 0.81 / 62          | - / -          | -         | -         | 0.00


134

Results: MAGIC

No program required iteration to find a non-spurious explanation: a good abstraction was already discovered

Program               | score | time | CE length
mutex-n-01.c (lock)   | 0.79  | 0.04 | 6
mutex-n-01.c (unlock) | 0.99  | 0.04 | 6
pci-n-01.c            | 0.78  | 0.07 | 9
pci-rec-n-01.c        | 0.72  | 0.09 | 8
SSL-1                 | 0.99  | 8.07 | 29
SSL-2                 | 0.99  | 3.45 | 52
uCOS-ii               | 0.00  | 0.76 | 19


135

Results: Time

Time to explain is comparable to model checking time
• No more than 10 seconds for abstract explanation (except when it didn’t find one at all…)
• No more than 3 minutes for concrete explanations


136

Results: Room for Improvement

Concrete explanation worked better than abstract in some cases
• When the SSA-based metric produced smaller optimization constraints

For the TCAS examples, user assistance was needed in some cases
• Assertion of the form (A implies B)
• The first explanation “explains” by showing how A can fail to hold
• Easy to get a good explanation – force the model checker to assume A


137

Conclusions: Good News

Counterexample explanation and fault localization can provide good assistance in locating errors

The model checking approach, when it can be applied (usually not to large programs or programs with complex data structures), may be the most effective

But Tarantula is the real winner, unless model checking starts scaling better


138

Future Work?

The ultimate goal: testing tool or model checker fixes our programs for us – automatic program repair!

That’s not going to happen, I think

But we can try (and people are doing just that, right now)


139

Model Checking and Scaling

Next week we’ll look at a kind of “model checking” that doesn’t involve building SAT equations or producing an abstraction
• We’ll run the program and backtrack execution
• Really just an oddball form of testing
• Can’t do “stupid SAT-solver tricks” like using PBS to produce great fault localizations, but has some other benefits