Symbolic Execution

Amal Khalil & Juergen Dingel

CISC836: Models in Software Development: Methods, Techniques, and Tools

Winter 2015

Outline

• Overview of Classical Symbolic Execution
  – How it works
  – Applications of symbolic execution
  – Challenges of symbolic execution

• Modern Symbolic Execution Techniques: Combining concrete and symbolic execution
  – Concolic Testing
  – Execution-Generated Testing (EGT)

• KLEE Demo

Motivation

• Testing is a practical way of verifying programs.

• Manual testing is difficult: it requires knowledge of the code and constant maintenance.

• Random testing is easy to perform but ineffective.
  – It does not guarantee full coverage of all program paths.

• Symbolic execution can systematically explore a large number of program paths.
  – It is commonly used to drive the testing process and hence achieve higher path coverage.

Symbolic Execution

• A program analysis technique that executes programs parametrically on symbolic inputs in order to derive precise characterizations of their properties and execution paths.

• First introduced in the 1970s by Lori A. Clarke [1976] and James C. King [1976] for program testing.

• Since 2003, considerable research effort has been devoted to improving the effectiveness, efficiency, and applicability of the traditional technique [Yang et al. 2014].

• Examples of symbolic execution tools:
  – jCUTE, JPF (Java)
  – KLEE (LLVM IR for C/C++)
  – Pex (.NET Framework)

How does Symbolic Execution work?

• The main idea is to substitute program inputs with symbolic values and then execute the program parametrically such that:

– The values of all program variables are computed as symbolic expressions over the symbolic input values;

– The execution can proceed along any feasible path.

How does Symbolic Execution work?

• The result of symbolically executing a program is a tree-based structure called a symbolic execution tree (SET).
  – The nodes of a SET represent the symbolic program states and the edges represent the transitions between these states.
  – Each symbolic program state consists of the set of program variables and their symbolic valuations, a program location, and a path constraint (PC), which is the conjunction of all the logical constraints collected over the program variables to reach that program location.
    • Decision procedures and SMT solvers are used to check the satisfiability of each path constraint (PC).
    • The set of path constraints computed by symbolic execution is used to enable various analysis, verification, and testing tasks.
  – The paths of a SET characterize all the distinct execution paths of a program.

Constraints, Decision Procedures, and SMT Solvers

• Constraints
  – X > Y ∧ Y+X ≤ 10 (X and Y are called free variables)
  – A solution of the constraint is a set of assignments, one for each free variable, that makes the constraint true.
  – {X = 3, Y = 2} is a solution but {X = 6, Y = 5} is not.
  – Types of constraints
    • Linear constraint (e.g., X > Y ∧ Y+X ≤ 10)
    • Non-linear constraint (e.g., X * Y < 100, X % 3 ∧ Y > 10, and (X >> 3) < Y)
    • Use of function symbols (e.g., f(X) > 10 ∧ (forall a. f(a) = a + 10))

• A decision procedure is a tool that can decide if a constraint is satisfiable.
  – In general, checking constraint satisfiability is undecidable.

• A constraint solver is a tool that finds satisfying assignments for a constraint, if it is satisfiable.

Note: This page is taken from Saswat Anand’s slides on Symbolic Execution, 2009. http://www.cc.gatech.edu/~harrold/6340/cs6340_fall2009/Slides/SymExClass-09.pdf
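As a minimal sketch of the terms on this slide, the C fragment below encodes the linear constraint X > Y ∧ Y+X ≤ 10 as a predicate and pairs it with a toy brute-force search. The names constraint_holds and solve_by_enumeration are illustrative assumptions, not part of any real solver; actual decision procedures and SMT solvers reason symbolically rather than enumerating values.

```c
#include <stdbool.h>

/* The linear constraint from the slide: X > Y and Y + X <= 10. */
bool constraint_holds(int x, int y) {
    return x > y && y + x <= 10;
}

/* Hypothetical toy "solver": exhaustively searches [lo, hi] x [lo, hi]
 * and stores the first satisfying assignment it finds. Returns false if
 * no assignment in that range satisfies the constraint. */
bool solve_by_enumeration(int lo, int hi, int *x_out, int *y_out) {
    for (int x = lo; x <= hi; x++) {
        for (int y = lo; y <= hi; y++) {
            if (constraint_holds(x, y)) {
                *x_out = x;
                *y_out = y;
                return true;
            }
        }
    }
    return false;
}
```

Calling constraint_holds(3, 2) and constraint_holds(6, 5) reproduces the slide's solution and non-solution. A false result from solve_by_enumeration only means that nothing was found in the searched range, which is weaker than a proof of unsatisfiability.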

Example #1

int foo (int x, int y){
1: if (x > y)
2:   x = x - y;
3: else
4:   x = y - x;
5: if (x > 0)
6:   x++;
7: else
8:   x--;
9: return x; }

Symbolic execution tree:

Loc: 1  x: X, y: Y  PC: true
  1: if(x>y) - then
  Loc: 2  x: X, y: Y  PC: X>Y
    2: x = x - y;
    Loc: 5  x: X-Y, y: Y  PC: X>Y
      5: if(x>0) - then
      Loc: 6  x: X-Y, y: Y  PC: X>Y ∧ X-Y>0
        6: x++;
        Loc: 9  x: X-Y+1, y: Y  PC: X>Y ∧ X-Y>0
      5: if(x>0) - else
      Loc: 8  x: X-Y, y: Y  PC: X>Y ∧ X-Y<=0
        Unsatisfiable PC >> Infeasible path
  1: if(x>y) - else
  Loc: 4  x: X, y: Y  PC: X<=Y
    4: x = y - x;
    Loc: 5  x: Y-X, y: Y  PC: X<=Y
      5: if(x>0) - then
      Loc: 6  x: Y-X, y: Y  PC: X<=Y ∧ Y-X>0
        6: x++;
        Loc: 9  x: Y-X+1, y: Y  PC: X<=Y ∧ Y-X>0
      5: if(x>0) - else
      Loc: 8  x: Y-X, y: Y  PC: X<=Y ∧ Y-X<=0
        8: x--;
        Loc: 9  x: Y-X-1, y: Y  PC: X<=Y ∧ Y-X<=0

Path: 1, 2, 5, 6, 9  Test inputs: x: 7, y: 5
Path: 1, 4, 5, 6, 9  Test inputs: x: 3, y: 9
Path: 1, 4, 5, 8, 9  Test inputs: x: 1, y: 1
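The three test inputs can be checked by running an instrumented copy of foo. The sketch below is only an illustration: foo_traced and count_infeasible_hits are hypothetical helper names, and the exhaustive loop samples a small input range instead of proving that the fourth path is infeasible.

```c
#include <string.h>

/* foo() from Example #1, instrumented to record the executed line numbers
 * in `path` (a caller-supplied buffer of at least 16 bytes). */
int foo_traced(int x, int y, char *path) {
    strcpy(path, "1");
    if (x > y) { strcat(path, ",2"); x = x - y; }
    else       { strcat(path, ",4"); x = y - x; }
    strcat(path, ",5");
    if (x > 0) { strcat(path, ",6"); x++; }
    else       { strcat(path, ",8"); x--; }
    strcat(path, ",9");
    return x;
}

/* Counts inputs in [lo, hi] x [lo, hi] that follow the path 1,2,5,8,9,
 * whose path constraint (X>Y and X-Y<=0) is unsatisfiable. */
int count_infeasible_hits(int lo, int hi) {
    int hits = 0;
    char path[16];
    for (int x = lo; x <= hi; x++) {
        for (int y = lo; y <= hi; y++) {
            foo_traced(x, y, path);
            if (strcmp(path, "1,2,5,8,9") == 0) hits++;
        }
    }
    return hits;
}
```

Each input derived from a satisfiable path constraint drives execution down exactly the path that produced the constraint, while no sampled input reaches the path with the unsatisfiable constraint.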

Example #1

int foo (int x, int y){
1: if (x > y)
2:   x = x - y;
3: else
4:   x = y - x;
5: if (x >= 0)
6:   x++;
7: else
8:   x--;
9: return x; }

Symbolic execution tree:

Loc: 1  x: X, y: Y  PC: true
  1: if(x>y) - then
  Loc: 2  x: X, y: Y  PC: X>Y
    2: x = x - y;
    Loc: 5  x: X-Y, y: Y  PC: X>Y
      5: if(x>=0) - then
      Loc: 6  x: X-Y, y: Y  PC: X>Y ∧ X-Y>=0
        6: x++;
        Loc: 9  x: X-Y+1, y: Y  PC: X>Y ∧ X-Y>=0
      5: if(x>=0) - else
      Loc: 8  x: X-Y, y: Y  PC: X>Y ∧ X-Y<0
        Unsatisfiable PC >> Infeasible path
  1: if(x>y) - else
  Loc: 4  x: X, y: Y  PC: X<=Y
    4: x = y - x;
    Loc: 5  x: Y-X, y: Y  PC: X<=Y
      5: if(x>=0) - then
      Loc: 6  x: Y-X, y: Y  PC: X<=Y ∧ Y-X>=0
        6: x++;
        Loc: 9  x: Y-X+1, y: Y  PC: X<=Y ∧ Y-X>=0
      5: if(x>=0) - else
      Loc: 8  x: Y-X, y: Y  PC: X<=Y ∧ Y-X<0
        Unsatisfiable PC >> Infeasible path

Both else branches at line 5 have unsatisfiable PCs, so line 8 (x--) is “Dead Code”.

Applications of Symbolic Execution

• Test case generation
• Infeasible path detection
• Invariant checking
• Bug finding
• Program equivalence checking
• Regression analysis
• Others

Example #2 - [Cadar & Sen 2013]

void testme_inf(int N) {
1: int sum = 0;
2: while (N > 0) {
3:   sum = sum + N;
4:   N = sym_input();
5: } }

SS1 - Loc: 1  N: N1  PC: true
  1: sum = 0;
SS2 - Loc: 2  N: N1, sum: 0  PC: true
  2: while (N>0) - true
SS3 - Loc: 3  N: N1, sum: 0  PC: N1>0
  3: sum = sum + N;
SS4 - Loc: 4  N: N1, sum: N1  PC: N1>0
  4: N = sym_input();
SS5 - Loc: 2  N: N2, sum: N1  PC: N1>0
  2: while (N>0) - true
SS6 - Loc: 3  N: N2, sum: N1  PC: N1>0 ∧ N2>0

Example #2 - [Cadar & Sen 2013]

void testme_inf(int N) {
1: int sum = 0;
2: while (N > 0) {
3:   sum = sum + N;
4:   N = sym_input();
5: } }

SS4 - Loc: 4  N: N1, sum: N1  PC: N1>0
  4: N = sym_input();
SS5 - Loc: 2  N: N2, sum: N1  PC: N1>0
  2: while (N>0) - true
SS6 - Loc: 3  N: N2, sum: N1  PC: N1>0 ∧ N2>0
  3: sum = sum + N;
SS7 - Loc: 4  N: N2, sum: N1+N2  PC: N1>0 ∧ N2>0
  4: N = sym_input();
SS8 - Loc: 2  N: N3, sum: N1+N2  PC: N1>0 ∧ N2>0
  2: while (N>0) - true
SS9 - Loc: 3  N: N3, sum: N1+N2  PC: N1>0 ∧ N2>0 ∧ N3>0

Example #2 - [Cadar & Sen 2013]

void testme_inf(int N) {
1: int sum = 0;
2: while (N > 0) {
3:   sum = sum + N;
4:   N = sym_input();
5: } }

SS7 - Loc: 4  N: N2, sum: N1+N2  PC: N1>0 ∧ N2>0
  4: N = sym_input();
SS8 - Loc: 2  N: N3, sum: N1+N2  PC: N1>0 ∧ N2>0
  2: while (N>0) - true
SS9 - Loc: 3  N: N3, sum: N1+N2  PC: N1>0 ∧ N2>0 ∧ N3>0
  3: sum = sum + N;
SS10 - Loc: 4  N: N3, sum: N1+N2+N3  PC: N1>0 ∧ N2>0 ∧ N3>0
  4: N = sym_input();
SS11 - Loc: 2  N: N4, sum: N1+N2+N3  PC: N1>0 ∧ N2>0 ∧ N3>0
  2: while (N>0) - true
SS12 - Loc: 3  N: N4, sum: N1+N2+N3  PC: N1>0 ∧ N2>0 ∧ N3>0 ∧ N4>0

Challenges of Symbolic Execution

• Path explosion problem
  – The number of feasible paths in a program can grow exponentially with the size of the program and can even be infinite for programs with unbounded loops and recursion.
  – Proposed solutions:
    • Set an upper bound on the number of loop iterations;
    • Summarize loop effects;
    • Use abstraction criteria (e.g., subsumption) to prune redundant paths and reduce the state space;
    • Use path-finding heuristics to achieve user-defined coverage criteria;
    • Divide the program into independent parts and run symbolic execution on each part in parallel.
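The first proposed solution, bounding the number of loop iterations, can be sketched for the loop `while (N > 0) { sum = sum + N; N = sym_input(); }` used in Example #2. The function name enumerate_bounded_paths and its max_iters parameter are assumptions made for illustration; the code only prints the path constraints by hand and does not implement a symbolic executor.

```c
#include <stdio.h>

/* Prints the path constraint and the symbolic value of `sum` for every
 * path through the loop that exits after at most `max_iters` iterations.
 * N1, N2, ... denote the successive symbolic inputs. Returns the number
 * of paths printed. */
int enumerate_bounded_paths(int max_iters, FILE *out) {
    int paths = 0;
    for (int k = 0; k <= max_iters; k++) {
        /* Path that executes the loop body k times and then exits. */
        fprintf(out, "PC:");
        for (int i = 1; i <= k; i++)
            fprintf(out, " N%d>0 ^", i);
        fprintf(out, " N%d<=0   sum:", k + 1);
        if (k == 0)
            fprintf(out, " 0");
        for (int i = 1; i <= k; i++)
            fprintf(out, "%sN%d", i == 1 ? " " : "+", i);
        fprintf(out, "\n");
        paths++;
    }
    return paths;
}
```

Without the bound the enumeration would never terminate, because every additional iteration adds one more exiting path. With the bound, the exploration is finite but says nothing about executions that iterate more often.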

Example #2 - [Cadar & Sen 2013]

Solution #1: Set max-depth = 2

void testme_inf(int N) {
1: int sum = 0;
2: while (N > 0) {
3:   sum = sum + N;
4:   N = sym_input();
5: } }

SS1 - Loc: 1  N: N1  PC: true
  1: sum = 0;
SS2 - Loc: 2  N: N1, sum: 0  PC: true
  2: while (N>0) - false
    Loc: 5  N: N1, sum: 0  PC: N1<=0
  2: while (N>0) - true
SS3 - Loc: 3  N: N1, sum: 0  PC: N1>0
  3: sum = sum + N;
SS4 - Loc: 4  N: N1, sum: N1  PC: N1>0
  4: N = sym_input();
SS5 - Loc: 2  N: N2, sum: N1  PC: N1>0
  2: while (N>0) - false
    Loc: 5  N: N2, sum: N1  PC: N1>0 ∧ N2<=0
  2: while (N>0) - true
SS6 - Loc: 3  N: N2, sum: N1  PC: N1>0 ∧ N2>0

Example #2 - [Cadar & Sen 2013]

Solution #2: Subsumption

SS4 - Loc: 4  N: N1, sum: N1  PC: N1>0
  4: N = sym_input();
SS5 - Loc: 2  N: N2, sum: N1  PC: N1>0
  2: while (N>0) - false
    SS9 - Loc: 5  N: N2, sum: N1  PC: N1>0 ∧ N2<=0
  2: while (N>0) - true
SS6 - Loc: 3  N: N2, sum: N1  PC: N1>0 ∧ N2>0
  3: sum = sum + N;
SS7 - Loc: 4  N: N2, sum: N1+N2  PC: N1>0 ∧ N2>0
  4: N = sym_input();
SS8 - Loc: 2  N: N3, sum: N1+N2  PC: N1>0 ∧ N2>0
  2: while (N>0) - true
    Loc: 3  N: N3, sum: N1+N2  PC: N1>0 ∧ N2>0 ∧ N3>0 (labelled SS9 in the unbounded tree)

SS8 is subsumed by SS5:

Concretization of SS5: (N, sum) = {([-∞, +∞], 1), ([-∞, +∞], 2), ([-∞, +∞], 3), …}

Concretization of SS8: (N, sum) = {([-∞, +∞], 2), ([-∞, +∞], 3), ([-∞, +∞], 4), …}

Example #2 - [Cadar & Sen 2013]

Solution #2: Subsumption

From the state at Loc: 2 with N: N1, sum: 0, PC: true:
  2: while (N>0) - false
    SS10 - Loc: 5  N: N1, sum: 0  PC: N1<=0
  2: while (N>0) - true
SS3 - Loc: 3  N: N1, sum: 0  PC: N1>0
  3: sum = sum + N;
SS4 - Loc: 4  N: N1, sum: N1  PC: N1>0
  4: N = sym_input();
SS5 - Loc: 2  N: N2, sum: N1  PC: N1>0
  2: while (N>0) - false
    SS9 - Loc: 5  N: N2, sum: N1  PC: N1>0 ∧ N2<=0
  2: while (N>0) - true
SS6 - Loc: 3  N: N2, sum: N1  PC: N1>0 ∧ N2>0
  3: sum = sum + N;
SS7 - Loc: 4  N: N2, sum: N1+N2  PC: N1>0 ∧ N2>0
  4: N = sym_input();
SS8 - Loc: 2  N: N3, sum: N1+N2  PC: N1>0 ∧ N2>0
  Subsumed by SS5, so exploration stops here.

Concretization of SS8 ⊆ Concretization of SS5:

Concretization of SS5: (N, sum) = {([-∞, +∞], 1), ([-∞, +∞], 2), ([-∞, +∞], 3), …}

Concretization of SS8: (N, sum) = {([-∞, +∞], 2), ([-∞, +∞], 3), ([-∞, +∞], 4), …}

Challenges of Symbolic Execution

• Inability to solve very complex and non-linear constraints
  – Proposed solutions:
    • Use concretization (e.g., Concolic Symbolic Execution);
    • Perform constraint simplification.

• Inability to handle external library calls
  – Proposed solutions:
    • Use concretization (e.g., Concolic Symbolic Execution);
    • Provide models to simulate/abstract the behavior of such external modules.

Example #3 - [Cadar & Sen 2013]

• Complex constraints

void testme(int x, int y) {
1: int z = (y*y)%50;
2: if (z == x) {
3:   if (x > y+10) {
4:     abort(); //ERROR
5:   }
6: } }

Loc: 1  x: X, y: Y  PC: true
  1: int z = (y*y)%50;
Loc: 2  x: X, y: Y, z: (Y*Y)%50  PC: true

SE cannot handle symbolic value of z! >> Stuck!

• External system/library calls

void testme(int x, int y) {
1: int z = F(y);
2: if (z == x) {
3:   if (x > y+10) {
4:     abort(); //ERROR
5:   }
6: } }

Loc: 1  x: X, y: Y  PC: true
  1: int z = F(y);
Loc: 2  x: X, y: Y, z: F(Y)  PC: true

Concolic Symbolic Execution

• Novelty: Simultaneous concrete and symbolic execution
  – DART: Directed Automated Random Testing [Godefroid et al. 2005]
  – Execution-Generated Testing (EGT) [Cadar et al. 2005]

• “Replace symbolic expression by concrete value when symbolic expression becomes unmanageable (e.g. non-linear).”

Overview of DART

• Example #3 - [Cadar & Sen 2013]

• Random testing alone is ineffective.
  – Probability of reaching abort() is extremely low!

• Solution?
  – Combine random testing & symbolic execution (twofold benefit).
    • Improve test coverage of random testing
    • Alleviate some of the imprecision in SE

/* simple driver exercising testme() */
int main(){
  int inp1 = random();
  int inp2 = random();
  testme(inp1, inp2);
  return 0;
}

void testme(int x, int y) {
1: int z = 2 * y;
2: if (z == x) {
3:   if (x > y + 10)
4:     abort(); //ERROR
5: } }

Example #3 - [Cadar & Sen 2013]

void testme(int x, int y) {
1: int z = 2 * y;
2: if (z == x) {
3:   if (x > y + 10)
4:     abort(); //ERROR
5: } }

Test inputs: x = 22, y = 7  Path: 1, 2, 5
Loc: 1  x: X, y: Y  PC: true
  1: int z = 2*y;
Loc: 2  x: X, y: Y, z: 2*Y  PC: true
  2: if(z==x) - false
Loc: 5  x: X, y: Y, z: 2*Y  PC: 2*Y!=X
Solve: 2*Y==X  Solution: x=2, y=1

Test inputs: x = 2, y = 1  Path: 1, 2, 3, 5
  2: if(z==x) - true
Loc: 3  x: X, y: Y, z: 2*Y  PC: 2*Y==X
  3: if(x>y+10) - false
Loc: 5  x: X, y: Y, z: 2*Y  PC: 2*Y==X ∧ X<=Y+10
Solve: 2*Y==X ∧ X>Y+10  Solution: x=30, y=15

Test inputs: x = 30, y = 15  Path: 1, 2, 3, 4
  3: if(x>y+10) - true
Loc: 4  x: X, y: Y, z: 2*Y  PC: 2*Y==X ∧ X>Y+10
Abort >> ERROR
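The run, negate, solve, re-run cycle traced on this slide can be sketched in C. This is a simplified illustration under stated assumptions and not the DART implementation: run_testme, solve, and dart_search are hypothetical names, abort() is replaced by a flag so the driver can continue, and the "solver" is a brute-force search over a small range.

```c
#include <stdbool.h>

/* Branch outcomes observed during one concrete run of testme(). */
typedef struct {
    bool eq_taken;  /* z == x was true */
    bool aborted;   /* x > y + 10 was true, i.e. abort() would be reached */
} Trace;

/* testme() from the slide, with abort() replaced by a flag. */
Trace run_testme(int x, int y) {
    Trace t = { false, false };
    int z = 2 * y;
    if (z == x) {
        t.eq_taken = true;
        if (x > y + 10)
            t.aborted = true;
    }
    return t;
}

/* Toy stand-in for a constraint solver: searches y in [-100, 100] with
 * x = 2*y, so 2*Y==X always holds. If require_gt is set, the result must
 * also satisfy X>Y+10. */
bool solve(bool require_gt, int *x, int *y) {
    for (int cy = -100; cy <= 100; cy++) {
        int cx = 2 * cy;
        if (!require_gt || cx > cy + 10) {
            *x = cx;
            *y = cy;
            return true;
        }
    }
    return false;
}

/* Directed search: run concretely, negate the last branch condition of
 * the observed path, solve, and re-run with the new inputs. Returns the
 * number of runs needed to reach the abort, or -1 if it is not reached
 * within max_runs. */
int dart_search(int x, int y, int max_runs) {
    for (int run = 1; run <= max_runs; run++) {
        Trace t = run_testme(x, y);
        if (t.aborted)
            return run;
        /* Outside the first branch: ask for 2*Y==X. Inside it: keep
         * 2*Y==X and additionally ask for X>Y+10. */
        if (!solve(t.eq_taken, &x, &y))
            return -1;
    }
    return -1;
}
```

Starting from the slide's first input (x = 22, y = 7), the search needs three runs, matching the three test inputs on the slide, although the toy solver may return different satisfying values than x=2, y=1 and x=30, y=15.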

Example #3 - [Cadar & Sen 2013]

void testme(int x, int y) {
1: int z = (y*y)%50; //int z = F(y);
2: if (z == x) {
3:   if (x > y + 10)
4:     abort(); //ERROR
5: } }

Test inputs: x = 22, y = 7  Path: 1, 2, 5
Loc: 1  x: X, y: Y  PC: true
  1: int z = (y*y)%50;
Loc: 2  x: X, y: 7, z: 49  PC: true
  2: if(z==x) - false
Loc: 5  x: X, y: 7, z: 49  PC: 49!=X
Solve: 49==X  Solution: x=49, y=7

Test inputs: x = 49, y = 7  Path: 1, 2, 3, 4
  2: if(z==x) - true
Loc: 3  x: X, y: 7, z: 49  PC: 49==X
  3: if(x>y+10) - true
Loc: 4  x: X, y: 7, z: 49  PC: 49==X ∧ X>17
Abort >> ERROR
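The concretization step on this slide can be sketched as follows. The helper names opaque, testme_aborts, and concretize_and_solve are assumptions for illustration: opaque stands in for any operation the solver cannot handle (the slide's (y*y)%50 or an external call F(y)), and abort() is replaced by a return value.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an operation the solver cannot reason about. */
int opaque(int y) {
    return (y * y) % 50;
}

/* testme() from the slide; returns true where abort() would be called. */
bool testme_aborts(int x, int y) {
    int z = opaque(y);
    if (z == x) {
        if (x > y + 10)
            return true;
    }
    return false;
}

/* Concolic step: keep y at its concrete value, replace z by the concrete
 * result of opaque(y), and solve the now-trivial constraint z == X by
 * setting x to that value. */
void concretize_and_solve(int y_concrete, int *x_next, int *y_next) {
    int z_concrete = opaque(y_concrete);
    *x_next = z_concrete;
    *y_next = y_concrete;
}
```

With y fixed at 7, the step yields x = 49, which reaches the error as on the slide. The price of concretization is that y can no longer be varied by the solver, so a less fortunate concrete y may leave the inner branch unreachable in that run.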

KLEE Demo

KLEE LLVM Execution Engine [Cadar et al. 2008]

https://klee.github.io/

References

• [1] King, James C. "Symbolic execution and program testing." Communications of the ACM 19.7 (1976): 385-394.

• [2] Clarke, Lori A. "A system to generate test data and symbolically execute programs." IEEE Transactions on Software Engineering 3 (1976): 215-222.

• [3] Khurshid, Sarfraz, Corina S. Păsăreanu, and Willem Visser. "Generalized symbolic execution for model checking and testing." Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 2003. 553-568.

• [4] Godefroid, Patrice, Nils Klarlund, and Koushik Sen. "DART: directed automated random testing." ACM Sigplan Notices. Vol. 40. No. 6. ACM, 2005.

• [5] Cadar, Cristian, and Dawson Engler. "Execution generated test cases: How to make systems code crash itself." Model Checking Software. Springer Berlin Heidelberg, 2005. 2-23.

• [6] Cadar, Cristian, Daniel Dunbar, and Dawson R. Engler. "KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs." OSDI. Vol. 8. 2008.

• [7] Cadar, Cristian, and Koushik Sen. "Symbolic execution for software testing: three decades later." Communications of the ACM 56.2 (2013): 82-90.

• [8] Yang, Guowei, et al. "Directed incremental symbolic execution." ACM Transactions on Software Engineering and Methodology (TOSEM) 24.1 (2014): 3.
