Symbolic Execution And KLEE
Overview of work done by Dawson Engler's group at Stanford (EGT/EXE/KLEE)

by Shauvik Roy Choudhary
http://cc.gatech.edu/~shauvik

Some slides adapted from the EXE and KLEE presentations + slides from Saswat
2
Old research area, but still active
First introduced in 1975 (source: Saswat); seminal 1976 paper by James King, IBM T.J. Watson
Very active area of research, e.g.:
EGT / EXE / KLEE [Stanford]
DART [Bell Labs]
CUTE [UIUC]
SAGE, Pex [MSR Redmond]
Vigilante [MSR Cambridge]
BitScope [Berkeley/CMU]
CatchConv [Berkeley]
JPF [NASA Ames]
Spring 2005: DART, EGT
Fall 2005: CUTE
Fall 2006: EXE
Fall 2008: KLEE
3
Symbolic Execution
Symbolic execution refers to executing a program with symbolic values as its inputs.
Unlike concrete execution, in symbolic execution the program can take any feasible path (limitation: the constraint solver).
During symbolic execution, the program state consists of symbolic values for some memory locations plus a path condition.
The path condition is a conjunction of constraints on the symbolic input values.
A solution of the path condition is a test input that covers the respective path.
4
Implementation of Symbolic Execution
Transformation approach: transform the program into another program that operates on symbolic values, such that executing the transformed program is equivalent to symbolically executing the original program. Difficult to implement, but portable; suitable for Java, .NET.
Instrumentation approach: callback hooks are inserted into the program, so that symbolic execution runs in the background during normal execution. Easy to implement for C. (CUTE, KLEE)
Customized runtime approach: customize the runtime (e.g., the JVM) to support symbolic execution. Applicable to Java, .NET; difficult to implement; flexible, but not portable. (JPF)
5
Limitations of Symbolic Execution
Limited by the power of the constraint solver: cannot handle non-linear and very complex constraints.
Does not scale when the number of paths is large (subject of ongoing research in this area).
Source code, or an equivalent (e.g., Java class files), is required for precise symbolic execution.
EGT & EXE
Slides based on D. Engler’s slides
Generic features of systems code: baroque interfaces, tricky inputs, a rat's nest of conditionals. An enormous undertaking to hit with manual testing.
Random "fuzz" testing. Charm: no manual work. But blind generation makes it hard to hit errors triggered by narrow input ranges, and also hard to hit errors that require structured input.
This talk: a simple trick to finesse both problems.
Goal: find many bugs in systems code.
8
EGT: Execution Generated Testing [SPIN'05]
Basic idea: use the code itself to construct its input!
Basic algorithm: symbolic execution + constraint solving.
Run the code on symbolic inputs, initial value = "anything".
As the code observes its inputs, it tells us what values they can be. At conditionals that use symbolic input, fork:
on the true branch, add the constraint that the input satisfies the check; on the false branch, that it does not.
Then solve the constraints to generate concrete inputs, and re-run the code on them.
HOW TO MAKE SYSTEM CODE CRASH ITSELF!
9
The toy example
Initialize x to be "any int".
The code will run 3 times, once per path.
Solve the constraints at each termination to get our 3 test cases.
10
The big picture
Implementation prototype:
Do source-to-source transformation using CIL. Use the CVCL decision procedure to solve constraints, then re-run the code on the concrete values. Robustness: use mixed symbolic and concrete execution.
Ways to look at what's going on:
Grammar extraction: turn the code inside out, from input consumer to input generator.
Sort-of Heisenberg effect: observations perturb symbolic inputs into increasingly concrete ones. A more definite observation = a more definite perturbation.
11
Mixed execution
Basic idea: given an operation:
If all of its operands are concrete, just do it. If any are symbolic, add a constraint.
If the current constraints are impossible, stop. If the current path causes something to blow up, solve & emit. If the current path calls an unmodelled function, solve & call. If the program exits, solve & emit.
How to track? Use variable addresses to determine whether a value is symbolic or concrete.
Note: symbolic assignment is not destructive; it creates a new symbol.
12
Example transformation: "+"
Each variable v has v.concrete and v.symbolic fields.
If v is concrete, its symbol = <invalid>, and vice versa.
14
Results
Mutt: versions <= 1.4 have a buffer overflow (OSDI paper).
Input size 4; took 34 minutes to generate 458 tests with 98% statement coverage.
printf (3 implementations: Pintos, gccfast, embedded). Made format strings symbolic. Two bugs:
Incorrect grouping of integers.
Incorrect handling of plus flags ("%" followed by space).
15
More: WsMP3 server case study
2000 LOC. Technique: make recv() input symbolic.
Found a known security hole + 2 new bugs: a network-controlled infinite loop and a buffer overflow.
16
EXE: EXecution generated Executions [CCS'06]
Same ideas as EGT.
Main contributions: a more practical tool:
Can test any code path. Generates actual attacks.
Constraint solver: STP, a decision procedure for bitvectors and arrays. If solvable, passes constraints to MiniSAT. Four times less code than CVCL and an order of magnitude faster. Array optimizations (substitution, refinement, simplification).
AUTOMATICALLY GENERATING INPUTS OF DEATH !
The mechanics
The user marks the input to treat symbolically.
Compile with the EXE compiler, exe-cc. It uses CIL to:
insert checks around every expression: if the operands are all concrete, run as normal; otherwise, add it as a constraint;
insert fork calls wherever a symbolic value could cause multiple actions.
./a.out: forks at each decision point. When a path terminates, use STP to solve its constraints. A path terminates when: (1) it exits, (2) it crashes, (3) EXE detects an error.
Re-run the concrete input through uninstrumented code.
Isn't this exponentially expensive?
Only fork on symbolic branches. Most code is concrete (linear).
Loops? Heuristics. Default: DFS. Linear processes with chained depth. Can get stuck. "Best first" search: choose a branch, backtrack to the point that will run the code hit the fewest times. Can do better...
However: happy to let it run for weeks as long as it keeps generating interesting test cases. The competition is manual and random testing.
Mixed execution
Basic idea: given an expression (e.g., a dereference, an ALU op):
If all of its operands are concrete, just do it. If any are symbolic, add it as a constraint.
If the current constraints are impossible, stop. If the current path hits an error or exit(), solve + emit. If it calls uninstrumented code: do the call, or solve and then do the call.
Example: "x = y + z". If y and z are both concrete, execute, and record x = concrete. Otherwise add the constraint "x = y + z" and record x = symbolic.
Result: most code runs concretely; a small slice deals with symbolics. Robust: does not need all source code (e.g., the OS); just run it.
Limits
Missed constraints:
If code calls into asm, or CIL cannot eat the file.
STP cannot do div/mod: constrain the divisor to be a power of 2, then use shift and mask respectively.
Cannot handle **p where "p" is symbolic: must concretize *p. (Note: **p is still symbolic.)
Stops a path if it cannot solve the constraints; can get lost in exponentials.
Missing: no symbolic function pointers; symbolics passed to varargs are not tracked; no floating point; long long support is erratic.
21
EXE Results
Berkeley Packet Filter: two buffer overflow exploits.
udhcpd (a well-tested user-level DHCP server): five memory errors.
PCRE (Perl Compatible Regular Expressions): many out-of-bounds writes leading to an abort in glibc on free().
Disks of death (file systems): four bugs in the ext2 & ext3 file systems; a null pointer dereference in JFS.
KLEE
Thanks to Cristian Cadar for the slides
24
Writing Systems Code Is Hard
Code complexity: tricky control flow, complex dependencies, abusive use of pointer operations.
Environmental dependencies: code has to anticipate all possible interactions, including malicious ones.
25
KLEE [OSDI 2008, Best Paper Award]
Based on symbolic execution and constraint solving techniques.
Automatically generates high coverage test suites
– Over 90% on average on ~160 user-level apps
Finds deep bugs in complex systems programs
– Including higher-level correctness ones
26
Toy Example

int bad_abs(int x) {
  if (x < 0)
    return -x;
  if (x == 1234)
    return -x;
  return x;
}

[Execution tree: the branch x < 0 forks into TRUE (x < 0: return -x) and
FALSE (x >= 0); the FALSE side forks again on x == 1234 into TRUE
(return -x) and FALSE (return x). Solving the three path conditions
yields the test inputs x = -2, x = 1234, and x = 3, emitted as
test1.out, test2.out, and test3.out.]
27
KLEE Architecture

[Diagram: C code is compiled by LLVM into LLVM bytecode, which KLEE
interprets together with a symbolic environment; KLEE queries the
constraint solver (STP) on path conditions such as x < 0 and x == 1234,
producing the concrete test inputs x = -2, x = 1234, and x = 3.]
28
Outline
Motivation
Example and Basic Architecture
Scalability Challenges
Experimental Evaluation
29
Three Big Challenges
Motivation
Example and Basic Architecture
Scalability Challenges
– Exponential number of paths
– Expensive constraint solving
– Interaction with environment
Experimental Evaluation
30
Exponential Search Space
Naïve exploration can easily get “stuck”
Use search heuristics:
Coverage-optimized search
– Select the path closest to an uncovered instruction
– Favor paths that recently hit new code
Random path search
– See [KLEE – OSDI'08]
31
Three Big Challenges
Motivation
Example and Basic Architecture
Scalability Challenges
– Exponential number of paths
– Expensive constraint solving
– Interaction with environment
Experimental Evaluation
32
Constraint Solving
Dominates runtime
– Inherently expensive (NP-complete)
– Invoked at every branch
Two simple and effective optimizations:
– Eliminating irrelevant constraints
– Caching solutions
Dramatic speedup on our benchmarks
33
Eliminating Irrelevant Constraints
In practice, each branch usually depends on a small number of variables.

Current constraints: x + y > 10, z & -z = z
Branch query (from code: if (x < 10) { ... }): is x < 10 feasible?

Only constraints transitively related to x (here, x + y > 10) need to be
sent to the solver; z & -z = z is irrelevant and can be eliminated.
34
Caching Solutions
The static set of branches produces lots of similar constraint sets.

Cached set:  { 2*y < 100,  x > 3,  x + y > 10 }  with solution x = 5, y = 15

Subset   { 2*y < 100,  x + y > 10 }:
  eliminating constraints cannot invalidate the solution, so x = 5, y = 15 still works.
Superset { 2*y < 100,  x > 3,  x + y > 10,  x < 10 }:
  adding constraints often does not invalidate the solution; here x = 5, y = 15 also satisfies x < 10.

Implemented with the UBTree data structure [Hoffmann and Koehler, IJCAI '99].
35
Dramatic Speedup
[Graph: time (s) vs. executed instructions (normalized), aggregated over
73 applications, comparing Base, Irrelevant Constraint Elimination,
Caching, and Irrelevant Constraint Elimination + Caching; the combined
optimizations give a dramatic speedup.]
36
Three Big Challenges
Motivation
Example and Basic Architecture
Scalability Challenges
– Exponential number of paths
– Expensive constraint solving
– Interaction with environment
Experimental Evaluation
37
Environment: Calling Out Into OS
int fd = open("t.txt", O_RDONLY);

If all arguments are concrete, forward the call to the OS.

int fd = open(sym_str, O_RDONLY);

Otherwise, provide models that can handle symbolic files. The goal is to explore all possible legal interactions with the environment.
38
Environmental Modeling
// actual implementation: ~50 LOC
ssize_t read(int fd, void *buf, size_t count) {
  exe_file_t *f = get_file(fd);
  ...
  memcpy(buf, f->contents + f->off, count);
  f->off += count;
  ...
}

Plain C code run by KLEE
– Users can extend/replace the environment without any knowledge of KLEE internals
Currently: effective support for symbolic command-line arguments, files, links, pipes, ttys, environment vars
39
Does KLEE work?
Motivation
Example and Basic Architecture
Scalability Challenges
Evaluation
– Coverage results
– Bug finding
– Crosschecking
40
GNU Coreutils Suite
Core user-level apps installed on many UNIX systems
89 stand-alone apps (i.e., excluding wrappers) in v6.10:
– File system management: ls, mkdir, chmod, etc.
– Management of system properties: hostname, printenv, etc.
– Text file processing: sort, wc, od, etc.
– ...
Variety of functions, different authors, intensive interaction with the environment
Heavily tested, mature code
41
Coreutils ELOC (incl. called lib)
[Histogram: number of applications vs. executable lines of code (ELOC).]
42
Methodology
Fully automatic runs:
Run KLEE one hour per utility, generating test cases.
Run the test cases on the uninstrumented version of the utility.
Measure line coverage using gcov.
– Coverage measurements are not inflated by potential bugs in our tool.
43
High Line Coverage (Coreutils, non-lib, 1h/utility = 89 h)
[Graph: ELOC coverage (%) for the 89 apps, sorted by KLEE coverage.]
Overall: 84%. Average: 91%. Median: 95%. 16 apps at 100%.
44
Beats 15 Years of Manual Testing
[Graph: KLEE coverage minus manual test suite coverage, apps sorted by
the difference. Average per utility: KLEE 91%, manual 68%.]
Note: the manual tests also check correctness.
45
Busybox Suite for Embedded Devices
[Graph: ELOC coverage (%) for the 72 apps, sorted by KLEE coverage.]
Overall: 91%. Average: 94%. Median: 98%. 31 apps at 100%.
46
Busybox – KLEE vs. Manual
[Graph: KLEE coverage minus manual coverage for the 72 apps, sorted by
the difference. Average per utility: KLEE 94%, manual 44%.]
47
Does KLEE work?
Motivation
Example and Basic Architecture
Scalability Challenges
Evaluation
– Coverage results
– Bug finding
– Crosschecking
48
GNU Coreutils Bugs
Ten crash bugs
– More crash bugs than in approximately the last three years combined
– KLEE generates actual command lines exposing the crashes
49
md5sum -c t1.txt
mkdir -Z a b
mkfifo -Z a b
mknod -Z a b p
seq -f %0 1
pr -e t2.txt
tac -r t3.txt t3.txt
paste -d\\ abcdefghijklmnopqrstuvwxyz
ptx -F\\ abcdefghijklmnopqrstuvwxyz
ptx x t4.txt
t1.txt: \t \tMD5(
t2.txt: \b\b\b\b\b\b\b\t
t3.txt: \n
t4.txt: A
Ten command lines of death
50
Does KLEE work?
Motivation
Example and Basic Architecture
Scalability Challenges
Evaluation
– Coverage results
– Bug finding
– Crosschecking
51
Finding Correctness Bugs
KLEE can prove asserts on a per-path basis
– Constraints have no approximations
– An assert is just a branch, and KLEE proves the feasibility/infeasibility of each branch it reaches
– If KLEE determines that the false side of an assert is infeasible, the assert has been proven on the current path
52
Crosschecking
Assume f(x) and f'(x) implement the same interface:
1. Make input x symbolic
2. Run KLEE on assert(f(x) == f'(x))
3. For each explored path:
   a) KLEE terminates w/o error: the paths are equivalent
   b) KLEE terminates w/ error: a mismatch was found

Coreutils vs. Busybox:
1. UNIX utilities should conform to IEEE Std 1003.1
2. Crosschecked pairs of Coreutils and Busybox apps
3. Verified paths, found mismatches
53
Mismatches Found

Input                  Busybox                   Coreutils
tee "" <t1.txt         [infinite loop]           [terminates]
tee -                  [copies once to stdout]   [copies twice]
comm t1.txt t2.txt     [doesn't show diff]       [shows diff]
cksum /                "4294967295 0 /"          "/: Is a directory"
split /                "/: Is a directory"
tr                     [duplicates input]        "missing operand"
[ 0 '<' 1 ]            "binary op. expected"
tail -2l               [rejects]                 [accepts]
unexpand -f            [accepts]                 [rejects]
split -                [rejects]                 [accepts]

t1.txt: a   t2.txt: b   (no newlines!)
54
KLEE: Effective Testing of Systems Programs
KLEE can effectively:
– Generate high coverage test suites: over 90% on average on ~160 user-level applications
– Find deep bugs in complex software, including higher-level correctness bugs via crosschecking
55
KLEE DEMO
Tool available at http://klee.llvm.org/
Experiments / tool examples: isLower(), RegExp
More experimentation
Discussion: Questions / Ideas?
Thanks for listening!