Elsa/Oink/Cqual++: Open-Source Static Analysis for C++
Scott McPeak, Daniel Wilkerson
work with Rob Johnson
CodeCon 2006
Goals
• Build extensible infrastructure to
• Find certain categories of bugs
  – Exhaustively, within some constraints
• At compile time
• In real-world C and C++ programs
• Using composable analyses
Components
• Elkhound: Generalized LR Parser Generator
• Elsa: C++ Parser
• Oink: Whole-program dataflow
• Cqual++: Type qualifier analysis
Elkhound: GLR Parser Generator
• GLR eliminates the pain of LALR(1)
  – Unbounded lookahead
  – Allows ambiguous grammars!
• 10x faster than other GLR implementations
  – Novel combination of GLR and LALR(1)
• User-defined disambiguation
  – Early: during parsing
  – Late: after generating AST w/ambiguities
Example: ‘>’ ambiguity
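Correct: new C < 3 > + 4 > + 5 ;
  Type = C < 3 >, whose template argument is the Expr 3
Incorrect: new C < 3 > + 4 > + 5 ;
  Type = C < 3 > + 4 >, whose template argument would be the Expr 3 > + 4, which contains an unparenthesized ‘>’ symbol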
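The rule behind "Correct" is the C++ one that the first non-nested ‘>’ closes a template argument list, so a greater-than comparison used as a template argument has to be parenthesized. A minimal sketch, where the template C is a hypothetical stand-in for the slide's C and simply records its argument:

```cpp
// Hypothetical stand-in for the slide's template C: it records its argument.
template <int N>
struct C {
    static constexpr int value = N;
};

// The first unparenthesized '>' ends the argument list, so the argument is 3.
constexpr int plain = C<3>::value;

// To pass a comparison as the argument it must be parenthesized;
// (3 > 4) is false, which converts to 0.
constexpr int compared = C<(3 > 4)>::value;

static_assert(plain == 3, "argument list closed at the first '>'");
static_assert(compared == 0, "parenthesized comparison is one argument");
```

Both checks happen at compile time, so the file only needs to compile.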
Example: Type vs. Variable
• In C & C++, sometimes hard to tell whether a name refers to a type or a variable
(a) & (b) parses as Expr & Expr (bitwise AND of a and b)
or
(a) & (b) parses as (Type) Expr (cast of &(b) to type a)
int a;                                  // hidden
class C {
  int f(int b) { return (a) & (b); }
  typedef int a;                        // visible
};
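A compilable variant of the same situation, showing both readings of (a) & (b) side by side. This is a sketch, not the slide's code: the typedef is changed to std::uintptr_t (an assumption, so that the pointer-to-integer cast is valid on 64-bit targets), and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

int a = 6;  // a variable named a at namespace scope

// Here a is the variable, so (a) & (b) is a bitwise AND.
int as_expression(int b) { return (a) & (b); }

struct C {
    // Member function bodies see the whole class, so a is the typedef
    // declared below and (a) & (b) casts the address of b to type a.
    std::uintptr_t as_cast(int b) { return (a) & (b); }
    typedef std::uintptr_t a;  // hides the variable ::a inside C
};
```

Calling as_expression(3) gives 6 & 3, while C().as_cast(3) gives the address of its parameter as an integer.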
Elsa: Extensible C++ Front-end
• Parses ANSI C++ with GNU extensions
• Uses GLR to handle the ambiguities
• Extensible components:
  – flex lexer
  – Elkhound parser
  – AST defined with custom tool
  – Type checker
The Elsa Block Diagram
preproc’d source → Lexer → token stream → Parser → possibly ambiguous AST → Type Checker → annotated unambiguous AST → Post Process → final AST
No lexer feedback hack!
Extending the Syntax
• ANSI or GNU? Both!
  – Declarative language
  – Extend simply by concatenating
ANSI Base:
nonterm ConditionalExp {
  -> Exp {...}
  -> Exp "?" Exp ":" Exp {...}
}
GNU Extension:
nonterm ConditionalExp {
  -> Exp "?" ":" Exp {...}
}
Declarative Abstract Syntax
class Statement (SourceLoc loc) {
  -> S_compound(ASTList<Statement> stmts);
  -> S_if(Condition cond, Statement thenBranch, Statement elseBranch);
  -> S_while(Condition cond, Statement body);
  // ...
}
superclass name: Statement
superclass ctor parameter: SourceLoc loc
subclass names: S_compound, S_if, S_while
subclass ctor parameter: e.g. Condition cond
subclass ctor list parameter: ASTList<Statement> stmts
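To make the superclass/subclass labels concrete, here is a rough hand-written sketch of the kind of C++ class hierarchy such a declaration describes. It is not the AST tool's actual output: SourceLoc and Condition are placeholders, std::vector of std::unique_ptr stands in for ASTList, and make_empty_loop is a made-up helper.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct SourceLoc { int line; };      // placeholder for the real SourceLoc
struct Condition { bool value; };    // placeholder for the real Condition

// Superclass: holds the superclass ctor parameter.
struct Statement {
    SourceLoc loc;
    explicit Statement(SourceLoc l) : loc(l) {}
    virtual ~Statement() = default;
};

// Subclass with a list parameter (stand-in for ASTList<Statement>).
struct S_compound : Statement {
    std::vector<std::unique_ptr<Statement>> stmts;
    S_compound(SourceLoc l, std::vector<std::unique_ptr<Statement>> s)
        : Statement(l), stmts(std::move(s)) {}
};

// Subclass with ordinary ctor parameters.
struct S_while : Statement {
    Condition cond;
    std::unique_ptr<Statement> body;
    S_while(SourceLoc l, Condition c, std::unique_ptr<Statement> b)
        : Statement(l), cond(c), body(std::move(b)) {}
};

// Builds the tree for "while (cond) { }" at made-up source lines.
std::unique_ptr<Statement> make_empty_loop() {
    std::unique_ptr<Statement> body(new S_compound(SourceLoc{2}, {}));
    return std::unique_ptr<Statement>(
        new S_while(SourceLoc{1}, Condition{true}, std::move(body)));
}
```

The declarative form states each of these classes in one line; the sketch shows the boilerplate it saves.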
Extending the Abstract Syntax
• ANSI or GNU? Both!
  – Declarative language
  – Extend simply by concatenating
ANSI Base:
class Statement {
  -> S_decl(Declaration decl);
  -> S_expr(Expression expr);
  -> S_if(...);
  -> S_for(...);
}
GNU Extension:
class Statement {
  -> S_function(Function f);    // GNU nested functions
}
Semantic Analysis
• Disambiguate
• Compute types
• Resolve overloading
• Insert implicit conversions
• Instantiate templates
Disambiguation
Ambiguous syntax example: return (x)(y);
S_return
  expr: E_cast
    type: TypeId x
    expr: E_variable y
  ambiguity link (from E_cast): E_funCall
    func: E_variable x
    arg: E_variable y
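Both readings of return (x)(y); are valid C++, and which one applies depends only on what x names in scope. A small sketch with two hypothetical namespaces, one per reading:

```cpp
#include <cassert>

namespace call_reading {
    int x(int v) { return v + 1; }       // x names a function
    int f(int y) { return (x)(y); }      // E_funCall: calls x with argument y
}

namespace cast_reading {
    typedef long x;                      // x names a type
    long f(int y) { return (x)(y); }     // E_cast: converts y to type x
}
```

The token sequence in the two f bodies is identical; only the type checker's knowledge of x picks the surviving alternative.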
Lowered Output: Simplified C++
• Original or Lowered output can be printed
• Lowering always done:
  – Templates are instantiated
  – Implicit type conversions inserted
• Lowering optionally done:
  – Implicit member functions created
  – Implicit ctor/dtor calls inserted
C++ or XML, In and Out
C++ or XML → Elsa → C++ or XML
First pass renders to a canonical form. Serialization commutes with lowering.
Cqual++: Dataflow
• Dataflow Analysis on Type Qualifiers
• Successor to Cqual: Jeff Foster, Alex Aiken
char $tainted *getenv();
void printf(char $untainted *fmt, ...);
int main() {
  char *x = getenv("foo");
  printf(x);
}
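One way to picture the check is as reachability in a flow graph: an error is reported when a $tainted source can reach an $untainted sink. The following is a toy model under that assumption, not Cqual++'s implementation; FlowGraph and the node names are invented for the sketch.

```cpp
#include <cassert>
#include <vector>

// Toy flow graph: edges[n] lists the nodes that the value at node n flows into.
struct FlowGraph {
    std::vector<std::vector<int>> edges;
    explicit FlowGraph(int nodes) : edges(nodes) {}
    void flow(int from, int to) { edges[from].push_back(to); }

    // Depth-first search: can a value at src reach dst?
    bool reaches(int src, int dst) const {
        std::vector<bool> seen(edges.size(), false);
        std::vector<int> stack(1, src);
        while (!stack.empty()) {
            int n = stack.back();
            stack.pop_back();
            if (n == dst) return true;
            if (seen[n]) continue;
            seen[n] = true;
            for (int next : edges[n]) stack.push_back(next);
        }
        return false;
    }
};

// Node numbers for the slide's example (names are this sketch's own).
enum Node : int { GETENV_RESULT, X, PRINTF_FMT, NODE_COUNT };

bool slide_example_has_taint_error() {
    FlowGraph g(NODE_COUNT);
    g.flow(GETENV_RESULT, X);   // char *x = getenv("foo");
    g.flow(X, PRINTF_FMT);      // printf(x);
    // A $tainted source reaching an $untainted sink is the reported error.
    return g.reaches(GETENV_RESULT, PRINTF_FMT);
}
```

Only getenv and printf carry annotations; the qualifier on x is inferred from the edges.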
Feature: Polymorphic Dataflow
int f(int x) {return x;}
int main() {
  int $tainted t = ...;
  int a = f(t);
  int $untainted u = f(3);
}
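Why polymorphism matters here: if both calls share one copy of f's parameter and return value, the tainted t appears to flow into the untainted u. Giving each call site its own copy of f's nodes removes that false report. The sketch below models this with a toy edge list; copying per call site is just one simple way to get the effect and is not claimed to be how Cqual++ does it.

```cpp
#include <cassert>
#include <utility>
#include <vector>

typedef std::vector<std::pair<int, int>> Edges;  // (from, to) flow edges

// Simple fixed-point reachability over an edge list.
bool reaches(const Edges& edges, int node_count, int src, int dst) {
    std::vector<bool> marked(node_count, false);
    marked[src] = true;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& e : edges) {
            if (marked[e.first] && !marked[e.second]) {
                marked[e.second] = true;
                changed = true;
            }
        }
    }
    return marked[dst];
}

// Node names are this sketch's own; F_*_2 is the second copy of f's nodes.
enum Node : int { T, A, THREE, U, F_PARAM, F_RET, F_PARAM_2, F_RET_2, NODE_COUNT };

// Monomorphic: both calls share one copy of f's parameter and return nodes.
Edges monomorphic() {
    return { {F_PARAM, F_RET},
             {T, F_PARAM}, {F_RET, A},          // a = f(t)
             {THREE, F_PARAM}, {F_RET, U} };    // u = f(3)
}

// Polymorphic: each call site gets its own copy of f's nodes.
Edges polymorphic() {
    return { {F_PARAM, F_RET}, {T, F_PARAM}, {F_RET, A},
             {F_PARAM_2, F_RET_2}, {THREE, F_PARAM_2}, {F_RET_2, U} };
}
```

With the shared copy, T reaches U (a false positive); with per-call copies, T still reaches A but no longer reaches U.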
Feature: “Funky Qualifiers”: Fake Function Bodies
char $_1_2 *strcat(char $_1_2 *dest,
                   const char $_1 *src);
int main() {
  char $tainted *x;
  char $untainted *y;
  strcat(y, x);
}
{1} ⊆ {1,2}
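Reading the annotation as a subset relation, the digits in a funky qualifier name form a set, and data may flow from one qualifier to another when the first set is contained in the second. So src ($_1) flows into dest and the return value ($_1_2), which is how the tainted x ends up in the untainted y. A minimal sketch of that assumed rule, with made-up names:

```cpp
#include <algorithm>
#include <cassert>
#include <set>

typedef std::set<int> Indices;  // $_1 is modeled as {1}, $_1_2 as {1, 2}

// Assumed rule: a value may flow from qualifier `from` to qualifier `to`
// when from's index set is a subset of to's index set.
bool may_flow(const Indices& from, const Indices& to) {
    // std::includes(A, B) is true when sorted range B is contained in A.
    return std::includes(to.begin(), to.end(), from.begin(), from.end());
}
```

Under this rule may_flow({1}, {1, 2}) holds but the reverse does not, matching a strcat that writes src into dest and never dest into src.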
Feature: Separate Compilation for Scalability
• “Compile” each file to a dataflow graph
  – only flow behavior between external symbols matters
  – compress by finding smaller graph with same flow behavior; typically saves factor of 12
• “Link” each graph
  – AST is gone at linking so we save even more space
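A sketch of what "same flow behavior between external symbols" can mean: drop every internal node and keep an edge between two external nodes exactly when one could reach the other in the full graph. This restricted transitive closure is the simplest graph with that property, not necessarily the smallest, and it is not claimed to be the compression the tool uses; all names are invented.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

typedef std::vector<std::pair<int, int>> Edges;  // (from, to) flow edges

// Nodes reachable from start by following one or more edges.
std::set<int> reachable_from(const Edges& edges, int start) {
    std::set<int> seen;
    std::vector<int> work(1, start);
    while (!work.empty()) {
        int n = work.back();
        work.pop_back();
        for (const auto& e : edges) {
            if (e.first == n && seen.insert(e.second).second) {
                work.push_back(e.second);
            }
        }
    }
    return seen;
}

// Keep only external nodes; add a -> b whenever b was reachable from a.
Edges compress(const Edges& edges, const std::set<int>& externals) {
    Edges out;
    for (int a : externals) {
        std::set<int> r = reachable_from(edges, a);
        for (int b : externals) {
            if (a != b && r.count(b)) out.push_back(std::make_pair(a, b));
        }
    }
    return out;
}
```

For a file whose external symbol 0 flows to external symbol 1 through two internal temporaries, the compressed graph is the single edge 0 -> 1.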
Non-Feature: Cqual++ Is Not Flow-Sensitive
q = p;
... time passes ...
p->s = read_from_network();
use_in_untrusting_way(p->s);
// does p == q still??
q->s = "innocuous";
use_in_trusting_way(p->s);    // $tainted??
What Exactly Is ‘Data-Flow’?
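#include <stdlib.h>   // malloc
#include <string.h>   // strlen

char *launderString(char *in) {
  int len = strlen(in);
  char *out = (char *)malloc(len+1);   // cast so this also compiles as C++
  for (int i=0; i<len; ++i) {
    out[i] = 0;
    for (int j=0; j<8; ++j)
      if (in[i] & (1<<j)) out[i] |= (1<<j);   // each bit is copied via a branch, never assigned from in[]
  }
  out[len] = '\0';
  return out;
}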
Application: Finding Format-String Vulnerabilities
• printf() is an interpreter
• the format string is a program
  – %n writes the number of bytes output so far to the memory pointed to by the arg
  – ex: printf("stuff%n", p) means *p = 5
• if no argument p, printf() writes through some pointer on the stack
  – do not allow untrusted data in first arg to printf
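The "format string is a program" point can be seen without touching %n. In the sketch below (function names are made up), the same bytes are handled once as the format and once as plain data; only the first is interpreted. A harmless %% directive is used so the behavior is well defined. Compilers may warn about the first function, which is exactly the pattern being flagged.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Unsafe pattern: the data is used as the format string, so printf-style
// directives inside it are interpreted.
std::string format_as_program(const char* data) {
    char buf[64];
    std::snprintf(buf, sizeof buf, data);
    return buf;
}

// Safe pattern: a constant format string; the data is only an argument.
std::string format_as_data(const char* data) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s", data);
    return buf;
}
```

Given the input 100%%, the first function collapses the directive to a single percent sign while the second returns the input unchanged. With untrusted input containing %n or %s, the first form is where the vulnerability lies.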
Application: Finding User-Kernel Vulnerabilities
• Kernel must check user pointers are valid
  – must point to memory mapped into user process’s address space
  – otherwise could manipulate the kernel data
• This is also a dataflow/taint analysis
Rob’s Cqual Linux User-Kernel Results
• 2.4.20, full config, 7 bugs, 275 false pos.
• 2.4.23, full config, 6 bugs, 264 false pos.
• including other trials on same kernels:
  – found 17 different security vulnerabilities
  – found bugs missed by other tools and manually
  – all but one bug confirmed exploitable
  – significant “bug churn” across kernel versions
Linus’s “Sparse” Tool for User-Kernel Vulnerabilities
• Linus also has a tool using type qualifiers
  – it requires manual annotation of every var
• In contrast, Cqual++ infers the qualifiers
  – only sources and sinks need be annotated
  – and any “sanitizer” functions:
• Linus says this “is not the C way”
  – ok, he can write all the annotations
Future Application: Finding Character-Set Confusions
• Microsoft confusing ASCII and UCS2
• Mozilla has 20-ish different character sets
• they should only flow together through conversion functions
• if array sizes differ, confusions can be a security hole too
Oink Vision: Composable Analysis Tools
• Compilers refuse to compile bugs
  – well, some classes of bugs
  – and you may have to wait until tomorrow morning to find out
• Correctness analysis is expected as part of any compiler toolchain
• The analyses are composable and extensible