A Survey on Dynamic Symbolic Execution for Automatic Test Generation
Jan. 6 2014 PQE
Hyunmin Seo
Motivation
• Testing is a practical way to verify software
• Testing accounts for more than 50% of total software development costs [Tassey ’02]
• Effective, efficient and scalable automatic testing is required [Bounimova ‘13, Kim ‘12]
Outline
• Automatic Test Generation
  – Random Testing
  – Combinatorial Testing
  – Search-Based Testing
  – Symbolic Execution-Based Testing
  – Dynamic Symbolic Execution
• Challenges in DSE (SE)
  – Imprecision
  – Constraint Solving
  – Path Explosion
Random Testing
• Random Testing
  – Randomly generate test inputs
• Adaptive Random Testing (ART)
  – Spread test cases evenly over the input domain [Chen ’04]
  – Failure-causing inputs form contiguous regions [White ’80, Chan ’96]
• Feedback-Directed Random Testing
  – Randoop [Pacheco ’07]
  – Unit testing
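The idea of spreading test cases evenly can be sketched as follows. This is a minimal illustration of one common ART variant (fixed-size candidate set); the slides do not say which variant is meant, and the function name, the integer input domain, and the candidate count are assumptions made for the example.

```python
import random

def adaptive_random_tests(n_tests, lo, hi, n_candidates=10, seed=0):
    """Pick n_tests integers in [lo, hi], each far from the tests chosen so far."""
    rng = random.Random(seed)
    tests = [rng.randint(lo, hi)]          # the first test is purely random
    while len(tests) < n_tests:
        candidates = [rng.randint(lo, hi) for _ in range(n_candidates)]
        # Keep the candidate whose nearest already-chosen test is farthest away.
        best = max(candidates, key=lambda c: min(abs(c - t) for t in tests))
        tests.append(best)
    return tests

tests = adaptive_random_tests(n_tests=8, lo=0, hi=1000)
print(sorted(tests))
```

Plain random testing corresponds to `n_candidates=1`; larger candidate sets trade extra distance computations for a more even spread.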
Random Testing Summary
• One of the most fundamental and well-studied approaches [Hamlet ’94, Loo ’88]
  – Many variations
• Pros
  – Efficient, scalable
  – No source code requirement
• Cons
  – Low coverage [Burnim ’08]
Combinatorial Testing
• Find a subset of input parameters satisfying a certain property [Cohen ‘13]
• Mathematical property
Example configuration options:
• Vertical Ruler: Visible, Invisible
• Ruler Units: Inches, Centimeters, Points, Picas
• Default View: Normal, Slide, Outline
• SS Navigation: Pop-up, None
• End with Black: Yes, No
• Always Mirror: Yes, No
• Warn Before: Yes, No
Total # of configuration settings = 2*4*3*2*2*2*2 = 384
N-way Covering Array
• A subset of configurations in which every combination of values of any N factors appears at least once [Cohen ’13]
2-Way Covering Array
CA(12; 2, 2^5 3^1 4^1)

No  Vertical Ruler  Ruler Units  Default View  SS Navigation  End with Black  Always Mirror  Warn Before
1   Visible         Centimeters  Outline       Pop-up         No              No             Yes
2   Invisible       Inches       Outline       Pop-up         No              No             No
3   Invisible       Centimeters  Slide         None           Yes             Yes            Yes
4   Visible         Picas        Outline       Pop-up         Yes             Yes            No
5   Invisible       Centimeters  Normal        Pop-up         Yes             Yes            No
6   Visible         Points       Outline       None           Yes             No             Yes
7   Invisible       Points       Slide         Pop-up         No              No             No
8   Invisible       Picas        Slide         Pop-up         No              Yes            Yes
9   Invisible       Points       Normal        None           No              Yes            No
10  Visible         Inches       Normal        None           Yes             No             Yes
11  Visible         Inches       Slide         Pop-up         No              Yes            Yes
12  Invisible       Picas        Normal        None           Yes             No             No
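The 2-way property of the 12-row array can be checked mechanically. The sketch below re-enters the factors and rows from the slides and lists any value pair that no row covers; the helper name `uncovered_pairs` is an assumption made for this example.

```python
from itertools import combinations, product
from math import prod

factors = {
    "Vertical Ruler": ["Visible", "Invisible"],
    "Ruler Units": ["Inches", "Centimeters", "Points", "Picas"],
    "Default View": ["Normal", "Slide", "Outline"],
    "SS Navigation": ["Pop-up", "None"],
    "End with Black": ["Yes", "No"],
    "Always Mirror": ["Yes", "No"],
    "Warn Before": ["Yes", "No"],
}

# The 12 rows of the covering array, one configuration per line.
rows = [line.split() for line in """
Visible Centimeters Outline Pop-up No No Yes
Invisible Inches Outline Pop-up No No No
Invisible Centimeters Slide None Yes Yes Yes
Visible Picas Outline Pop-up Yes Yes No
Invisible Centimeters Normal Pop-up Yes Yes No
Visible Points Outline None Yes No Yes
Invisible Points Slide Pop-up No No No
Invisible Picas Slide Pop-up No Yes Yes
Invisible Points Normal None No Yes No
Visible Inches Normal None Yes No Yes
Visible Inches Slide Pop-up No Yes Yes
Invisible Picas Normal None Yes No No
""".strip().splitlines()]

def uncovered_pairs(rows, factors):
    """Return every (factor, factor, value pair) that appears in no row."""
    names = list(factors)
    missing = []
    for i, j in combinations(range(len(names)), 2):
        seen = {(r[i], r[j]) for r in rows}
        for pair in product(factors[names[i]], factors[names[j]]):
            if pair not in seen:
                missing.append((names[i], names[j], pair))
    return missing

total = prod(len(values) for values in factors.values())
print(total, len(rows), uncovered_pairs(rows, factors))
```

Twelve configurations cover every pair of values, compared with 384 for exhaustive testing.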
Combinatorial Testing Summary
• Research directions
  – How to find the minimum-size array
    • Greedy [Tung ’00, Colbourn ’04]
    • Meta-heuristics [Cohen ’03, Stardom ’01]
  – Application to different domains
    • Software product lines [McGregor ’01, Perrouin ’10]
• Pros
  – Systematic testing with mathematical properties [Cohen ’13]
  – Sample configurations to be tested [Qu ’08]
• Cons
  – Too many combinations for program inputs
Search-Based Testing
• A branch of search-based software engineering (SBSE) in which meta-heuristics are used to guide the search [McMinn ’04]
• Typical process
  – Start with a random input
  – Search nearby locations for a better solution
  – Evaluate with a fitness function
  – Update the current solution with a better solution
  – Search is guided by meta-heuristics
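The typical process above can be sketched with the simplest meta-heuristic, hill climbing. The fitness function, the target predicate `x == 42`, and the integer input range are hypothetical choices for illustration only.

```python
import random

def hill_climb(fitness, start, lo, hi, max_steps=10_000):
    """Minimise fitness over integers in [lo, hi] by moving to the better neighbour."""
    current = start
    for _ in range(max_steps):
        neighbours = [n for n in (current - 1, current + 1) if lo <= n <= hi]
        best = min(neighbours, key=fitness)
        if fitness(best) >= fitness(current):
            break                      # local optimum: no neighbour improves
        current = best
    return current

# Hypothetical goal: find an input that makes the branch `x == 42` true.
def fitness(x):
    return abs(x - 42)                 # 0 means the branch is taken

rng = random.Random(1)
start = rng.randint(0, 1000)           # start with a random input
solution = hill_climb(fitness, start, 0, 1000)
print(start, solution)
```

This fitness landscape has a single optimum, so hill climbing always succeeds; on landscapes with local optima, simulated annealing or genetic algorithms are the usual alternatives.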
Meta-Heuristics [McMinn ’11]
[Figure: fitness value plotted over the input domain for (a) hill climbing, (b) simulated annealing, (c) genetic algorithm]
Search-Based Testing Example [McMinn ’11]
Input: a string
count: the number of digits in the string
[Figure: control flow graph of three nested conditions, each with a TRUE and a FALSE edge; the Target lies on the all-TRUE path]
    if (count >= 4)
        if (count <= 10)
            if (checksum % 10 == checkdigit)
                Target
Paths shown: π1, π2 (count = 20), π3 (count = 11)
Fitness Function
• Combination of approach level and branch distance
• Approach level
  – The number of the target’s control-dependent nodes not executed by the current input
• Branch distance [Tracey ’98]

Element   Value
Boolean   if TRUE then 0 else K
a = b     if abs(a-b) = 0 then 0 else abs(a-b) + K
a ≠ b     if abs(a-b) ≠ 0 then 0 else K
a < b     if a-b < 0 then 0 else (a-b) + K
a ≤ b     if a-b ≤ 0 then 0 else (a-b) + K
a > b     if b-a < 0 then 0 else (b-a) + K
a ≥ b     if b-a ≤ 0 then 0 else (b-a) + K
a ∨ b     min ( cost(a), cost(b) )
a ∧ b     cost (a) + cost (b)
!a        move negation inward and propagate
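The relational rows of the branch distance table translate directly into code. In this sketch K is set to 1, which is an assumption (the table only requires a constant), and the two sample inputs are the `count = 20` and `count = 11` cases from the earlier example evaluated against `count <= 10`.

```python
K = 1  # constant added when a predicate is false; the value 1 is an assumption

def branch_distance(op, a, b):
    """Distance from making `a op b` true, following the table above."""
    if op == "==":
        return 0 if a == b else abs(a - b) + K
    if op == "!=":
        return 0 if a != b else K
    if op == "<":
        return 0 if a < b else (a - b) + K
    if op == "<=":
        return 0 if a <= b else (a - b) + K
    if op == ">":
        return 0 if a > b else (b - a) + K
    if op == ">=":
        return 0 if a >= b else (b - a) + K
    raise ValueError(f"unsupported operator: {op}")

d20 = branch_distance("<=", 20, 10)   # count = 20 against count <= 10
d11 = branch_distance("<=", 11, 10)   # count = 11 against count <= 10
print(d20, d11)
```

Both inputs miss the same branch, but the smaller distance for `count = 11` tells the search that it is closer to the target.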
Search-Based Testing Summary
• A branch of SBSE
  – Different search heuristics
  – Different domains [Harman ’13]
• Pros
  – Guide the execution toward a specific branch
  – Non-functional testing (e.g., longest execution time) [Wegener ’98]
• Cons
  – Search space challenge
  – Design of fitness functions [Arcuri ’10]
Symbolic Execution-Based Testing
• Use symbolic values to represent program variables and path conditions [King ‘76, Clarke ‘76]
• Find precise constraints for each execution path and generate test inputs by solving the constraints
Symbolic Execution
    x = sym_input();
    y = sym_input();
    z = sym_input();
    a = x + y;
    if (z > a)
        b = x - y;
    else
        b = 2 * y;
    ...

Symbolic state at the end of each branch:

Var   PC: s3 > s1+s2   PC: s3 <= s1+s2
x     s1               s1
y     s2               s2
z     s3               s3
a     s1 + s2          s1 + s2
b     s1 - s2          2s2
Test Generation
Path conditions → SMT solver → Test inputs
π1 : PC1  →  x = 1, y = 2, ...
π2 : PC2  →  x = 1, y = 5, ...
π3 : PC3  →  x = -5, y = 0, ...
...
πn : PCn  →  x = …, y = …
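The step from path conditions to test inputs can be sketched for the two-branch example (a = x + y; if z > a ...). The path conditions are written by hand as Python predicates, and an exhaustive search over a small integer range stands in for the SMT solver; both are simplifications assumed for this illustration.

```python
from itertools import product

# Path conditions of the two paths, over the symbolic inputs (s1, s2, s3).
path_conditions = {
    "then-branch": lambda s1, s2, s3: s3 > s1 + s2,
    "else-branch": lambda s1, s2, s3: s3 <= s1 + s2,
}

def solve(pc, domain=range(-2, 3)):
    """Stand-in for an SMT solver: first assignment satisfying pc, or None."""
    for s1, s2, s3 in product(domain, repeat=3):
        if pc(s1, s2, s3):
            return s1, s2, s3
    return None

def program(x, y, z):
    """Concrete version of the program under test; reports the branch taken."""
    a = x + y
    if z > a:
        return "then-branch", x - y
    return "else-branch", 2 * y

# One test input per path: no two inputs exercise the same path.
tests = {name: solve(pc) for name, pc in path_conditions.items()}
for name, inputs in tests.items():
    print(name, inputs, program(*inputs))
```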
Symbolic Execution-Based Testing Summary
• Pros
  – No redundant inputs taking the same path
  – High coverage
• Cons
  – Low efficiency
  – Depends on constraint solving techniques
  – External library calls
  – State explosion
  – Imprecision
Limitations of SE
    void foo(int x, int y) {
        if (external(x) == y) {
            // branch 1
        }
        else if (hash(x) > y) {
            // branch 2
        }
    }
→ external(): no source code available
→ hash(): complex arithmetic
Dynamic Symbolic Execution
• Perform symbolic execution dynamically along the execution path of a concrete input [DART ’05, CUTE ’05, PEX ’08]
• Apply concretization
  – External library calls
  – Complex constraints
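DSE
PC = pc1 ∧ pc2 ∧ pc3 ∧ … ∧ pcn
[Figure: execution tree with branch conditions pc1 to pc4 and paths π1, π2, π3; negating one branch condition of an explored path gives the path condition of a new path]
PC’ = pc1 ∧ pc2 ∧ !pc3
PC’’ = pc1 ∧ !pc2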
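The loop of running concretely, recording branch conditions, negating one, and solving for a new input can be sketched as follows. The program under test, its predicates P1 and P2, and the brute-force `solve` (a stand-in for an SMT solver) are all hypothetical; real tools such as those cited instrument the program rather than relying on hand-written predicates.

```python
from itertools import product

P1 = lambda x, y: x > 3          # pc1
P2 = lambda x, y: x + y == 10    # pc2

def program(x, y, trace):
    """Program under test; each branch appends (predicate, outcome) to trace."""
    def branch(pred):
        taken = pred(x, y)
        trace.append((pred, taken))
        return taken
    if branch(P1):
        if branch(P2):
            return "deep"
        return "mid"
    return "shallow"

def solve(constraints, domain=range(-20, 21)):
    """Brute-force stand-in for an SMT solver over two integer inputs."""
    for x, y in product(domain, repeat=2):
        if all(pred(x, y) == want for pred, want in constraints):
            return x, y
    return None

def dse(seed):
    worklist, results = [seed], {}
    while worklist:
        inputs = worklist.pop()
        trace = []
        outcome = program(*inputs, trace)          # concrete run
        path = tuple(taken for _, taken in trace)
        if path in results:
            continue                               # path already explored
        results[path] = (inputs, outcome)
        # Keep the prefix and negate branch i: PC' = pc1 ∧ ... ∧ !pci
        for i in range(len(trace)):
            flipped = trace[:i] + [(trace[i][0], not trace[i][1])]
            new_inputs = solve(flipped)
            if new_inputs is not None:
                worklist.append(new_inputs)
    return results

results = dse(seed=(0, 0))
for path, (inputs, outcome) in results.items():
    print(path, inputs, outcome)
```

Starting from the seed (0, 0), which only reaches the shallow path, the loop derives inputs for the other two paths by negating recorded branch conditions.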
Benefits
• Based on symbolic execution
  – No redundant inputs taking the same path
  – High coverage
• Reach deep program states by starting from well-formed, user-provided input
• Use concrete values to overcome limitations
  – External library calls
  – Complicated constraints
• Many tools
  – CREST, CUTE, JCUTE, PEX, SAGE, EXE, KLEE
Comparison
Technique           Source code requirement  Etc.
Random              No
Combinatorial       No                       Combine with other techniques
Search-Based        Yes/No                   Non-functional testing
Symbolic Execution  Yes
DSE                 Yes                      Concretization
[The Efficiency and Coverage columns held graphical ratings that are not preserved in this transcript]
Imprecision
• Arises when symbolic execution cannot represent the exact semantics of the program [Elkarablieh ’09]
  – Example: modeling a 4-byte integer with a mathematical integer
• Imprecision may manifest as divergence [Godefroid ’08]
Divergence
[Figure: execution tree with branch conditions pc1 to pc5 and the target path condition pc1 ∧ pc2 ∧ !pc3]
Proposed solutions
• Integer size, bit operations
  – BitVector [SAGE ’08]
• Symbolic pointer dereferencing
  – Array theory of SMT solvers [Elkarablieh ’09]
• Floating-point operations
  – Combined static and dynamic analysis [Godefroid ’10]
• Interaction with environment
  – Modeling [KLEE ’08]
  – Reporting [Xiao ’11]
BitVector
• Use bit-vectors in SMT solvers
  – Fixed-size integers
  – Bit operations on integer variables
    • a & b
    • a << 4
• Slower than integer arithmetic
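Why fixed-size modeling matters can be seen from a property that holds for mathematical integers but not for 32-bit ones. The sketch below emulates 32-bit arithmetic with a mask, assuming two's-complement wraparound; the helper names are made up for the example.

```python
MASK32 = 0xFFFFFFFF

def to_int32(value):
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def add_int32(a, b):
    """32-bit addition with wraparound (an assumption about the target machine)."""
    return to_int32(a + b)

INT_MAX = 2**31 - 1

# "x + 1 > x" is always true for mathematical integers ...
math_result = INT_MAX + 1 > INT_MAX
# ... but false at INT_MAX for 32-bit integers, where the sum wraps around.
machine_result = add_int32(INT_MAX, 1) > INT_MAX

print(math_result, machine_result, add_int32(INT_MAX, 1))
```

A symbolic executor that models `int` as a mathematical integer would consider the false branch of `x + 1 > x` infeasible and never generate `x = INT_MAX`.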
Symbolic Pointer Dereferencing
• Symbolic values are used to calculate the addresses of pointer values
  – Array index, e.g. a[S0]
Symbolic Pointer Dereferencing Example [Elkarablieh ’09]
    void single_array(BYTE x, BYTE y) {
        BYTE *a = new BYTE[4];
        a[0] = x;
        a[1] = 0;
        a[2] = 1;
        a[3] = 2;

        if (a[x] == a[y] + 2)
            assert(false);

        delete [] a;
    }

Values in the first concrete run (Con), the symbolic state (Sym), and the second concrete run (Con):

      Con  Sym  Con
x     0    S0   2
y     1    S1   1
a[0]  0    S0   2
a[1]  0    0    0
a[2]  1    1    1
a[3]  2    2    2
a[x]  0    S0   1
a[y]  0    0    0

Evaluation of a[x] == a[y] + 2:
• First concrete run: 0 != 0 + 2
• Symbolic: S0 != 0 + 2
• Second concrete run: 1 != 0 + 2
Array Theory of SMT Solver [Elkarablieh ’09]
• Constraints on the symbolic array reads in the single_array example:
  – a[x] : 0 ≤ x ≤ 3 ∧ a[x] ∈ {0, 1, 2}
  – a[y] : 0 ≤ y ≤ 3 ∧ a[y] ∈ {0, 1, 2, x}
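Treating both array reads as depending on the index values makes the assertion reachable. The sketch below ports the array example to Python and enumerates all in-bounds index pairs; the enumeration is a stand-in for what a solver with array theory would decide, and the function name is an assumption.

```python
def single_array_hits_assert(x, y):
    """Python port of the example: True when `assert(false)` would be reached."""
    a = [x, 0, 1, 2]          # a[0] = x; a[1] = 0; a[2] = 1; a[3] = 2
    return a[x] == a[y] + 2

# Enumerate the in-bounds indexes allowed by 0 <= x <= 3 and 0 <= y <= 3.
triggering = [(x, y) for x in range(4) for y in range(4)
              if single_array_hits_assert(x, y)]
print(triggering)
```

The inputs tried in the table (x = 0 and x = 2, with y = 1) both miss the assertion; modeling a[x] only by the concretely read cell cannot produce the input found here.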
Floating Point Operation [Godefroid ’10]
• FP code should only perform memory-safe data processing
  – Payload of an image or video file
• Non-FP code should deal with buffer allocations and memory address computations
• Lightweight local path-insensitive “may” analysis + precise “must” dynamic analysis
Interaction With Environment
• Modeling [KLEE ’08]
  – System calls
  – int fd = open(argv[1], O_RDONLY);
• Precise identification and reporting [Xiao ’11]
Imprecision Summary
Reason                          Proposed Solutions
Fixed-size integer              BitVector [SAGE ’08]
Symbolic pointer dereferencing  Array theory [Elkarablieh ’09]
Floating-point operations       Combined static and dynamic analysis [Godefroid ’10]
Interaction with environment    Modeling [KLEE ’08]; precise identification and report [Xiao ’11]

Remaining Challenges: Precise reasoning about floating points, Interaction with Environment, External Library Calls, Concurrent programs
Constraint Solving
• Path constraints must be solved to obtain test inputs
• The major bottleneck
  – Solving can take a long time
  – Some constraints cannot be solved
Proposed Solutions
• Optimization [KLEE ’08]
  – Expression rewriting
  – Implied value concretization
  – Irrelevant constraint elimination
  – Constraint caching
• Meta-heuristic based constraint solving [Borges ’12, Souza ’11, Lakhotia ’10]
• Hybrid approach [Garg ’13]
Optimization
• Irrelevant constraint elimination [KLEE ‘08]
• Constraint Caching [KLEE ‘08]
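Both optimizations can be sketched on constraints represented as (variables, text) pairs. This only illustrates the ideas; it is not KLEE's implementation, and the representation, the function names, and the stub solver with its hard-coded answer are assumptions.

```python
def relevant_constraints(constraints, target_vars):
    """Keep constraints sharing variables, directly or transitively, with target_vars."""
    relevant_vars = set(target_vars)
    selected, remaining = [], list(constraints)
    changed = True
    while changed:
        changed = False
        for constraint in remaining[:]:
            variables, _ = constraint
            if variables & relevant_vars:
                relevant_vars |= variables
                selected.append(constraint)
                remaining.remove(constraint)
                changed = True
    return selected

cache = {}

def cached_solve(constraints, solver):
    """Call the solver only for constraint sets that were not solved before."""
    key = frozenset(text for _, text in constraints)
    if key not in cache:
        cache[key] = solver(constraints)
    return cache[key]

path_condition = [
    (frozenset({"x"}), "x > 3"),
    (frozenset({"x", "y"}), "x + y == 10"),
    (frozenset({"z"}), "z != 0"),          # unrelated to a branch on y
]
kept = relevant_constraints(path_condition, {"y"})

solver_calls = []
def stub_solver(constraints):
    solver_calls.append(constraints)
    return {"x": 4, "y": 6}                # hard-coded model, for illustration only

first = cached_solve(kept, stub_solver)
second = cached_solve(kept, stub_solver)   # answered from the cache
print([text for _, text in kept], len(solver_calls))
```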
Meta-Heuristic Approach
• SMT solvers may not support
  – Non-linear constraints
  – Floating-point expressions
  – Very complex constraints
• Use meta-heuristic approaches [Borges ’12, Souza ’11, Lakhotia ’10]
Hybrid Approach [Garg ’13]
• Apply concretization first and solve it quickly with an off-the-shelf SMT solver
• If divergence occurs, use ICP (Interval Constraint Propagation) to solve the constraints
Constraint Solving Summary
Target                  Proposed Solutions
Time overhead           Irrelevant constraint elimination, constraint caching [KLEE ’08]
Complex constraints     Meta-heuristic approach [Borges ’12, Souza ’11, Lakhotia ’10]
Non-linear constraints  ICP [Garg ’13]

Remaining Challenges: Floating points, Complex constraints, Non-linear constraints
Path Explosion
• The number of paths in a program can grow exponentially with the number of branches in the program
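The growth is easy to reproduce for the worst case of n sequential, independent if-statements, where every combination of outcomes is a distinct path. The function name is an assumption; real programs may have fewer feasible paths, or unboundedly many once loops are involved.

```python
from itertools import product

def count_paths(n_branches):
    """Enumerate every true/false outcome combination of n independent branches."""
    return sum(1 for _ in product([True, False], repeat=n_branches))

counts = {n: count_paths(n) for n in (1, 2, 5, 10)}
print(counts)
```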
[Figure: execution tree with branch conditions pc1 to pc4 and paths π1, π2, π3; the negated prefixes pc1 ∧ pc2 ∧ !pc3 and pc1 ∧ !pc2 each open a new path]
Proposed Solutions
• Pruning redundant paths
  – RWset [Boonstoppel ’08]
  – Interpolation [Jaffar ’13]
• Function summary
  – Compositional [Godefroid ’07, ’10]
  – Demand-driven compositional [Anand ’08]
• Search heuristics
  – CFG [Burnim ’08]
  – Generational [Godefroid ’08]
  – CarFast [Park ’12]
  – Hybrid [Majumdar ’07]
Pruning Redundant Paths
• RWset [Boonstoppel ’08]
  – If an execution reaches a program point in the same state as some previous execution, it will produce the same results
  – If two states differ only in program values that are not subsequently read, the two states will produce the same results
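The second observation can be sketched on a toy program: a chain of n if/else statements where branch i sets a variable `flag_i`. In this simplified model the set of variables that are read later (`live_vars`) is supplied by hand, whereas the RWset technique derives it from the program; all names are hypothetical.

```python
def explore(n_branches, live_vars, prune=True):
    """Count the program states visited while exploring a chain of n branches."""
    visited = set()
    count = 0
    stack = [(0, {})]                       # (program point, variable values)
    while stack:
        point, state = stack.pop()
        # Two states at the same point are equivalent if they agree on live variables.
        key = (point, tuple(sorted((v, state[v]) for v in state if v in live_vars)))
        if prune and key in visited:
            continue                        # same results as an earlier execution
        visited.add(key)
        count += 1
        if point < n_branches:
            for value in (0, 1):
                stack.append((point + 1, {**state, f"flag_{point}": value}))
    return count

without_pruning = explore(10, live_vars={"flag_0"}, prune=False)
with_pruning = explore(10, live_vars={"flag_0"}, prune=True)
print(without_pruning, with_pruning)
```

When only `flag_0` is read later, states that differ in the other flags are merged, so the number of visited states grows linearly instead of exponentially.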
Pruning Redundant Paths
• Interpolant [Jaffar ’13]
  – A succinct representation of the core reason why a branch cannot be covered
Interpolant Example [Jaffar ’13]
[Figure: execution tree with an UNSAT branch; full interpolant: x < 3z + 2]
Function Summary
• A function summary is a disjunction of formulas pre_w ∧ post_w, one per explored path w of the function [Godefroid ’07, ’10]
• pre_w is a conjunction of constraints on the inputs to the function
• post_w, the effect, is a conjunction of constraints on the outputs of the function
Function Summary
[Figure: a caller with N paths invokes foo(x, y); assume foo has 10 execution paths]
• Without summary: N paths before the call, N × 10 paths after it
• With summary: N paths before the call, N paths after it
Search Heuristics
• Prioritize branches and explore relevant branches only
[Figure: exploration order under (a) DFS, (b) BFS, (c) heuristic search]
• Coverage-optimized
  – CFG-directed [Burnim ’08]
  – CarFast [Park ’12]
  – Generational [Godefroid ’08]
  – Hybrid [Majumdar ’07]
• Patch-optimized
  – KATCH [Cadar ’13]
CFG-Directed Search [Burnim ’08]
[Figure: execution tree with branch conditions pc1 to pc4 along path π1]
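One way to read CFG-directed search is: among the branches that could be negated, prefer the one whose target is closest, in the control flow graph, to code that is still uncovered. The sketch below computes that distance with a breadth-first search; the graph, the node names, and the scoring are hypothetical and simpler than the heuristic in the cited work.

```python
from collections import deque

def distance_to_uncovered(cfg, start, covered):
    """BFS over the CFG: number of edges from start to the nearest uncovered node."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node not in covered:
            return dist
        for succ in cfg.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append((succ, dist + 1))
    return float("inf")                     # no uncovered code reachable

def pick_branch_to_flip(cfg, candidates, covered):
    """Choose the candidate branch target closest to uncovered code."""
    return min(candidates, key=lambda n: distance_to_uncovered(cfg, n, covered))

# Hypothetical CFG: node -> successors.
cfg = {
    "entry": ["b1_true", "b1_false"],
    "b1_true": ["b2_true", "b2_false"],
    "b1_false": ["exit"],
    "b2_true": ["exit"],
    "b2_false": ["target"],
    "target": ["exit"],
    "exit": [],
}
covered = {"entry", "b1_true", "b1_false", "b2_true", "exit"}

choice = pick_branch_to_flip(cfg, ["b1_false", "b2_false"], covered)
print(choice, distance_to_uncovered(cfg, "entry", covered))
```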
Limitations of Search Heuristics
• Do not consider how the execution reached a branch
• Do not handle non-symbolic path constraints
  – pc = 3 > 0
  – pc’ = !(3 > 0) = 3 ≤ 0 = UNSAT
Guiding Execution Toward a Branch
[Figure: execution tree with an UNSAT branch]
Path Explosion Summary
Approach                 Proposed Solutions
Pruning redundant paths  RWset [Boonstoppel ’08]; interpolation [Jaffar ’13]
Function summary         Compositional [Godefroid ’07, ’10]; demand-driven compositional [Anand ’08]
Search heuristics        CFG-directed [Burnim ’08]; generational [Godefroid ’08]; CarFast [Park ’12]; hybrid [Majumdar ’07]; KATCH [Cadar ’13]

Remaining Challenges: Better Search Strategies, Guiding execution toward a specific branch
Conclusion
• DSE is a promising automatic test generation technique that achieves high coverage
• DSE relies on symbolic execution and constraint solving
• Challenges
  – Imprecision, constraint solving, path explosion
  – GUI application testing, concurrent programs, object creation problem
Challenges and Proposed Solutions
Challenge           Approach                        Proposed Solutions
Imprecision         Integer size                    BitVector [SAGE ’08]
                    Symbolic pointer dereferencing  Array theory [Elkarablieh ’09]
                    Floating points                 Combined static and dynamic analysis [Godefroid ’10]
                    Environments                    Modeling [KLEE ’08]; precise identification and report [Xiao ’11]
Constraint Solving  Optimization                    Irrelevant constraint elimination, constraint caching [KLEE ’08]
                    Meta-heuristics                 [Borges ’12, Souza ’11, Lakhotia ’10]
                    Hybrid                          ICP [Garg ’13]
Path Explosion      Pruning redundant paths         RWset [Boonstoppel ’08]; interpolation [Jaffar ’13]
                    Function summary                Compositional [Godefroid ’07, ’10]; demand-driven compositional [Anand ’08]
                    Search heuristics               CFG-directed [Burnim ’08]; generational [Godefroid ’08]; CarFast [Park ’12]; KATCH [Cadar ’13]; hybrid [Majumdar ’07]