Generating Analyses for Detecting Faults in Path Segments

Wei Le* and Mary Lou Soffa, University of Virginia

*currently with Rochester Institute of Technology


Page 1: Generating Analyses for Detecting Faults in Path Segments


Page 2: Generating Analyses for Detecting Faults in Path Segments


Motivation

• Static analysis: an integral part of fault detection

– High code coverage

– No executables required

– Finds faults early, when they are cheaper to fix

Page 3: Generating Analyses for Detecting Faults in Path Segments


Challenges of Current Static Analysis

• Precision: many false positives and little support for diagnosis

• Scalability: manual annotations are sometimes required

• Generality: hard-coded heuristics; new tools are needed for different types of faults

It is important to achieve all three

Page 4: Generating Analyses for Detecting Faults in Path Segments


Precision: Path-Sensitive Analyses

• Heuristics-based: ESP [das02] (relies on assumptions specific to typestate faults)

• Summary-based: Saturn [xie07] (lacks interprocedural path-sensitivity)

• Partial exploration of the state space: Prefix [bush00]

All three are exhaustive analyses based on the structure of the program

Page 5: Generating Analyses for Detecting Faults in Path Segments

Framework: Athena

Automatically generates analyses from specifications. The generated analyses are:

• precise: few false positives and rich diagnostic information

– interprocedural, path-sensitive analysis

– reports the path segments of a fault

• scalable: covers only the code relevant to the fault

– demand-driven analysis

• general: data- and control-centric faults; liveness and safety properties

– a specification technique and a generation algorithm

Page 6: Generating Analyses for Detecting Faults in Path Segments


Faults

• Commonality of faults (enables generality)

– Violations are always observable at certain statements

– We can construct constraints that express the violations

• Locality of a fault (enables scalability)

– Only the path segments relevant to the fault need to be analyzed

– Only a limited number of statements on those paths contribute to the fault

– Fault locality holds for a variety of faults

Page 7: Generating Analyses for Detecting Faults in Path Segments

Athena: Components

• Specification Language and Specification Repository

• Path-Sensitive, Demand-Driven Template: precision and scalability of the analyses

• Parser/Analyzer and Generator: generate the analyses

Page 8: Generating Analyses for Detecting Faults in Path Segments

Athena: Workflow

Step 1: Specifying Faults: the definition of a fault and the information for detecting it, stored in the Specification Repository

Step 2: Generating Analysis: the Parser/Analyzer turns the spec into syntax trees; the Generator turns the trees into code modules, which are plugged into the Demand-Driven Template to form the analyzer for the spec

Step 3: Analyzing programs with the generated analysis. Each path segment is classified as:

• Infeasible

• Safe

• Faulty (severity, root cause)

• Don't-know

Page 9: Generating Analyses for Detecting Faults in Path Segments

Components I: Specification and Language

• Spec: <program point, constraints> pairs that define the fault, and <program point, actions> pairs for detecting it

• Language: attributes and operators on attributes

• Attributes: abstractions of program objects, e.g., Len(s)

• Operators: comparison (>, <), computation (+, −), command (:=)

Page 10: Generating Analyses for Detecting Faults in Path Segments

Grammar of the Language

Specification → Vars VarList DefineFault FaultSigList DetectFault DetectSigList

VarList → Var*

Var → VarType namelist;

VarType → Vbuffer | Vint | Vany | Vptr | ...

FaultSigList → FaultSigItem <or FaultSigItem>*

DetectSigList → DetectSigItem <or DetectSigItem>* | #include <ExistentSpec>

FaultSigItem → CodeSignature ProgramPoint S-Constraint Condition | CodeSignature ProgramPoint L-Constraint Condition

DetectSigItem → CodeSignature ProgramPoint Update Action

ProgramPoint → $LangSyntax$ | Condition | $LangSyntax$ && Condition

Condition → Attribute Comparator Attribute | !Condition | [Condition] | Condition && Condition | Condition || Condition

Action → Attribute := Attribute | ^ Condition | [Action] | Action && Action | Action || Action | Condition → Action

Attribute → PrimitiveAttribute(var, ...) | Constant | !Attribute | ¬Attribute | [Attribute] | Attribute ° Attribute | Attribute Op Attribute | min(Attribute, Attribute) | [Attribute, Attribute]

PrimitiveAttribute → Size | Len | Value | MatchOperand | TMax | TMin | ...

Constant → 0 | true | false | ...

Comparator → = | … | > | < | …

Op → + | − | * | …


Page 12: Generating Analyses for Detecting Faults in Path Segments

Buffer Overflow Specification

Vars Vbuffer a, b; Vint d; Vany e;

DefineFault CodeSignature $strcpy(a,b)$ S_Constraint Len(b) ≤ Size(a)

or CodeSignature $memcpy(a,b,d)$ S_Constraint Min(Len(b),Value(d)) ≤ Size(a)

DetectFault CodeSignature $strcpy(a,b)$ Update Len(a):=Len(b)

or CodeSignature $d=strlen(b)$ Update Value(d):=Len(b)
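To make the meaning of these entries concrete, here is a minimal C sketch that checks the DefineFault constraints and applies the DetectFault updates on concrete attribute values. It assumes the constraints read Len(b) ≤ Size(a) and Min(Len(b), Value(d)) ≤ Size(a), matching the size(buf) >= len(str) form used elsewhere in the deck. The BufferFacts type and all function names are hypothetical. The generated analyses reason about these attributes symbolically along paths, not on concrete values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical concrete stand-ins for the spec's buffer attributes. */
typedef struct {
    size_t size;  /* Size(buf): bytes allocated */
    size_t len;   /* Len(buf): string length, in the spec's own convention */
} BufferFacts;

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* DefineFault for strcpy(a,b): safe iff Len(b) <= Size(a). */
bool check_strcpy(const BufferFacts *a, const BufferFacts *b) {
    assert(a != NULL && b != NULL);
    return b->len <= a->size;
}

/* DefineFault for memcpy(a,b,d): safe iff Min(Len(b), Value(d)) <= Size(a). */
bool check_memcpy(const BufferFacts *a, const BufferFacts *b, size_t d) {
    assert(a != NULL && b != NULL);
    return min_size(b->len, d) <= a->size;
}

/* DetectFault for strcpy(a,b): Update Len(a) := Len(b). */
void update_on_strcpy(BufferFacts *a, const BufferFacts *b) {
    assert(a != NULL && b != NULL);
    a->len = b->len;
}

/* DetectFault for d = strlen(b): Update Value(d) := Len(b); returns the new Value(d). */
size_t update_on_strlen(const BufferFacts *b) {
    assert(b != NULL);
    return b->len;
}
```

Each DefineFault entry becomes a predicate over attributes, and each DetectFault entry becomes an assignment to an attribute. That split is what lets the same specification drive both raising and updating queries.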

Page 13: Generating Analyses for Detecting Faults in Path Segments

Component II: Demand-Driven Template

• Formulates fault detection problems as queries about program facts, e.g., relations among variables

• Scalability demonstrated for buffer overflow detection [le08]

Page 14: Generating Analyses for Detecting Faults in Path Segments

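Demand-Driven Template: Example

Specification: buffer overflow; program point: buffer access; constraint: size(buf) >= len(str)

Program (figure): x[10] = '0'; bar(); s = (char*)malloc(80); branch strlen(t) < 8; strcpy(s,t); strcat(x,t)

• Raise: at strcpy(s,t), the query size(s) >= len(t) is raised

• Propagate: along the "yes" branch of strlen(t) < 8, the query becomes size(s) >= len(t), len(t) < 8

• Update: at s = (char*)malloc(80), it becomes 80 >= len(t), len(t) < 8

• Evaluate: len(t) < 8 implies 80 >= len(t), so the query is resolved as safe

Template loop: Raise Queries → Propagate Queries → Update Queries → Evaluate Queries; if the query is not resolved (no), propagation continues; if it is resolved (yes), the result is reported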
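The query resolution in this example can be written as a short derivation. It assumes that the update at s = (char*)malloc(80) sets size(s) to 80 and that len(t) is an integer, so len(t) < 8 is the same as len(t) ≤ 7:

```latex
\begin{align*}
\text{raise at } \texttt{strcpy(s,t)}:&\quad \mathit{size}(s) \ge \mathit{len}(t)\\
\text{propagate (``yes'' edge of } \mathit{strlen}(t) < 8\text{)}:&\quad \mathit{size}(s) \ge \mathit{len}(t) \,\wedge\, \mathit{len}(t) \le 7\\
\text{update at } \texttt{s = (char*)malloc(80)}:&\quad 80 \ge \mathit{len}(t) \,\wedge\, \mathit{len}(t) \le 7\\
\text{evaluate}:&\quad \mathit{len}(t) \le 7 < 80 \;\Rightarrow\; \text{safe}
\end{align*}
```

Only the statements on this path that touch size(s) or len(t) take part, which is the locality the demand-driven approach relies on.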

Page 15: Generating Analyses for Detecting Faults in Path Segments

Demand-Driven Template

Program → Raise Queries → Propagate Queries → Update Queries → Evaluate Queries (propagation repeats until the query is resolved)

• Rules for propagating queries

– Interprocedural, path-sensitive, context-sensitive

– Handle branches, loops, calls, and infeasible paths

• Evaluating queries (integer constraints)

– Algebra rules and inequalities

– Integer constraint solver
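As a simplified stand-in for query evaluation (not the algebra rules or the integer constraint solver named above), the sketch below bounds each side of a query such as size(p) >= len(t) by an integer interval. It then maps the outcome onto the four path classifications: infeasible, safe, faulty, don't-know. The Interval type and all function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Closed interval [lo, hi] of values a symbolic term may take;
   lo > hi means the collected constraints contradict each other. */
typedef struct { long lo, hi; } Interval;

/* Mirrors the path classifications: infeasible, safe, faulty, don't-know. */
typedef enum { EVAL_INFEASIBLE, EVAL_SAFE, EVAL_FAULTY, EVAL_DONT_KNOW } Eval;

/* A string length with no known upper bound. */
Interval unbounded_len(void) { Interval r = { 0, LONG_MAX }; return r; }

/* Add the constraint "x < k" (e.g., from a branch); assumes k > LONG_MIN. */
Interval meet_lt(Interval x, long k) {
    if (x.hi > k - 1) x.hi = k - 1;
    return x;
}

/* Evaluate the query "lhs >= rhs" using only the interval bounds. */
Eval eval_ge(Interval lhs, Interval rhs) {
    if (lhs.lo > lhs.hi || rhs.lo > rhs.hi) return EVAL_INFEASIBLE;
    if (lhs.lo >= rhs.hi) return EVAL_SAFE;   /* holds for every value in range */
    if (lhs.hi < rhs.lo) return EVAL_FAULTY;  /* fails for every value in range */
    return EVAL_DONT_KNOW;                    /* ranges overlap: undecided */
}
```

For example, with Size(p) = 10 and a branch Len(t) < 10, meet_lt narrows Len(t) to [0, 9] and eval_ge reports safe. Intervals cannot relate two unknown terms to each other, which is one reason a real constraint solver is more precise.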

Page 16: Generating Analyses for Detecting Faults in Path Segments

Components III: Parser and Code Generator

Page 17: Generating Analyses for Detecting Faults in Path Segments

Parsing Specification (YACC)

Specification: CodeSignature $strcpy(a,b)$ S_Constraint Len(b) ≤ Size(a)

Parsed form: CodeSignature: GetOp(s) = strcpy; S_Constraint: Size(Src1(s)) ≥ Len(Src2(s))

Tree A (CodeSignature): = with children GetOp and strcpy

Tree B (S_Constraint): ≥ with children Size º Src1 and Len º Src2

• Leaf: attribute

• Non-leaf: operator
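The tree shape described here (leaves are attributes, internal nodes are operators) can be illustrated with a small C sketch that evaluates such a tree on one statement. The parsing step itself (YACC in Athena) is not shown. The Statement and Node types, the leaf kinds, and eval are hypothetical. Composed attributes such as Size(Src1(s)) are collapsed into single leaves for brevity instead of using separate º nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical concrete view of strcpy(a,b): Src1(s) = a, Src2(s) = b. */
typedef struct { long size, len; } Buffer;
typedef struct { Buffer src1, src2; } Statement;

/* Leaves: attributes (each already composed, e.g. Size(Src1(s))). */
typedef enum { ATTR_SIZE_SRC1, ATTR_LEN_SRC2, ATTR_CONST } Attr;
/* Non-leaves: operators. */
typedef enum { OP_LEAF, OP_GE, OP_EQ, OP_ADD } Op;

typedef struct Node {
    Op op;
    Attr attr;                /* meaningful only when op == OP_LEAF */
    long constant;            /* meaningful only when attr == ATTR_CONST */
    const struct Node *lhs, *rhs;
} Node;

/* Evaluate a tree bottom-up: leaves read attributes, operators combine them. */
long eval(const Node *n, const Statement *s) {
    assert(n != NULL && s != NULL);
    switch (n->op) {
    case OP_LEAF:
        switch (n->attr) {
        case ATTR_SIZE_SRC1: return s->src1.size;
        case ATTR_LEN_SRC2:  return s->src2.len;
        case ATTR_CONST:     return n->constant;
        }
        break;
    case OP_GE:  return eval(n->lhs, s) >= eval(n->rhs, s);
    case OP_EQ:  return eval(n->lhs, s) == eval(n->rhs, s);
    case OP_ADD: return eval(n->lhs, s) + eval(n->rhs, s);
    }
    return 0;  /* not reached for well-formed trees */
}
```

Building tree B as an OP_GE node over the two leaves and evaluating it returns 1 when Size(Src1(s)) ≥ Len(Src2(s)) holds for the statement and 0 otherwise.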

Page 18: Generating Analyses for Detecting Faults in Path Segments

Code Generation

[Tree for the code signature: = (GetOp, strcpy)]

Construct a function that implements the semantics of the tree, based on the semantics of its operators:

bool IsStrcpy(statement t) { return strcmp(GetOp(t), "strcpy") == 0; }

Create an instance of the call:

IsStrcpy(n)

Find the function that implements the semantics of each leaf attribute:

const char *GetOp(statement t) { C_Syntax(t); return t.opcode; }
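Because the generator's output is itself C source text, the emission step for this tree can be sketched as follows. It assumes the tree = (GetOp, strcpy) is already parsed and that GetOp returns the opcode name as a C string. emit_opcode_predicate is a hypothetical helper. The emitted names (IsStrcpy, GetOp, statement) follow the slide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emit C source for a predicate implementing the tree "= (GetOp, <opcode>)":
   leaf GetOp -> call GetOp(t); leaf <opcode> -> string literal;
   operator "=" on strings -> strcmp(...) == 0.
   Returns the emitted length, or -1 if `out` is too small. */
int emit_opcode_predicate(char *out, size_t cap, const char *fn_name, const char *opcode) {
    assert(out != NULL && fn_name != NULL && opcode != NULL);
    int n = snprintf(out, cap,
                     "bool %s(statement t) { return strcmp(GetOp(t), \"%s\") == 0; }",
                     fn_name, opcode);
    return (n >= 0 && (size_t)n < cap) ? n : -1;
}
```

Calling emit_opcode_predicate(buf, sizeof buf, "IsStrcpy", "strcpy") produces the IsStrcpy definition shown on this slide. A call site such as IsStrcpy(n) is then emitted separately where the template needs it.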

Page 19: Generating Analyses for Detecting Faults in Path Segments

Generating Analysis

• The DefineFault entry (CodeSignature $strcpy(a,b)$ S_Constraint Len(b) ≤ Size(a)) generates the code module if (isnode(s)) q = raiseQ(s), which is plugged into the Raise Queries step

• The DetectFault entry (CodeSignature $strcpy(a,b)$ Update Len(a):=Len(b)) generates the code module if (isnode(s)) updateQ(q), which is plugged into the Update Queries step

• Pipeline: spec → Parser/Analyzer → syntax trees → Generator → code modules → Demand-Driven Template (Raise, Propagate, Update, Evaluate Queries, repeated until resolved) → analyzer for the spec
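A minimal sketch of how generated modules could plug into a fixed template, using a table of function pointers. The toy statement kinds (K_MALLOC, K_LEN_LT, K_STRCPY), the Query layout, the hook names, and run_template are all hypothetical, not Athena's interfaces. Only the Modules table would change from one specification to another. For brevity, the example update hook also treats malloc and a length test as updates, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy IR; every name here is hypothetical. Variables are small ints. */
typedef enum { K_OTHER, K_MALLOC, K_LEN_LT, K_STRCPY } Kind;
typedef struct { Kind kind; int dst, src; long n; } Stmt;

typedef enum { V_SAFE, V_DONT_KNOW, V_UNRESOLVED } Verdict;
/* Query "size(buf) >= len(str)" plus the facts collected so far. */
typedef struct { int buf, str; bool has_size, has_len; long size, len_max; } Query;

/* Hooks filled in by generated code modules. */
typedef struct {
    bool (*is_raise_node)(const Stmt *);     /* from the DefineFault CodeSignature */
    Query (*raiseQ)(const Stmt *);
    void (*updateQ)(Query *, const Stmt *);  /* from DetectFault Update actions */
    Verdict (*evaluateQ)(const Query *);
} Modules;

/* Fixed template: raise at each fault site, propagate backward, update, evaluate. */
Verdict run_template(const Stmt *path, size_t n, const Modules *m) {
    Verdict result = V_SAFE;
    for (size_t i = 0; i < n; i++) {
        if (!m->is_raise_node(&path[i])) continue;
        Query q = m->raiseQ(&path[i]);
        Verdict v = V_UNRESOLVED;
        for (size_t j = i; j-- > 0 && v == V_UNRESOLVED;) {  /* statements before i */
            m->updateQ(&q, &path[j]);
            v = m->evaluateQ(&q);
        }
        if (v != V_SAFE) result = V_DONT_KNOW;  /* undecided or unresolved: don't-know */
    }
    return result;
}

/* Example modules for the strcpy buffer-overflow spec. */
static bool is_strcpy(const Stmt *s) { return s->kind == K_STRCPY; }

static Query raise_strcpy(const Stmt *s) {
    Query q = { s->dst, s->src, false, false, 0, 0 };  /* size(dst) >= len(src) */
    return q;
}

static void update_strcpy(Query *q, const Stmt *s) {
    if (s->kind == K_MALLOC && s->dst == q->buf && !q->has_size) {
        q->has_size = true;
        q->size = s->n;                    /* size(buf) := n */
    } else if (s->kind == K_LEN_LT && s->src == q->str && !q->has_len) {
        q->has_len = true;
        q->len_max = s->n - 1;             /* len(str) < n */
    } else if (s->kind == K_STRCPY && s->dst == q->str && !q->has_len) {
        q->str = s->src;                   /* Len(a) := Len(b) */
    }
}

static Verdict eval_strcpy(const Query *q) {
    if (!q->has_size || !q->has_len) return V_UNRESOLVED;
    return q->size >= q->len_max ? V_SAFE : V_DONT_KNOW;
}

const Modules STRCPY_MODULES = { is_strcpy, raise_strcpy, update_strcpy, eval_strcpy };
```

On the strcpy part of the earlier example path (s = (char*)malloc(80), then the "yes" edge of strlen(t) < 8, then strcpy(s,t)), run_template returns V_SAFE. If the allocation is shrunk to 4 bytes, it returns V_DONT_KNOW.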

Page 20: Generating Analyses for Detecting Faults in Path Segments

Experimental Setup: Athena (analyzes C/C++/C#), implemented with YACC, Phoenix and Disolver

Research question 1: Can we generate analyses for detecting different faults?

– Experiments: buffer overflow, integer fault, null-pointer dereference, memory leak

– Benchmarks: bugbench, ffmpeg, putty, apache

– Evaluation metrics: detection rate, false positives, false negatives, diagnostic info, scalability

Research question 2: Are the generated analyses comparable with manually customized detectors?

– Experiments: comparison with memory leak detectors and Saturn

– Benchmarks: SPEC CPU-INT 2000

Page 21: Generating Analyses for Detecting Faults in Path Segments

Can We Generate Analyses for Different Faults?

• Detection: 84 faults of four types in 9 benchmarks, 68 of them new

• False positives/negatives: 18 false positives; 3 faults missed

• Path segments: generally involve 1–4 procedures; at most 35 procedures

• Scalability: apache (268.9k) in 4 hours; ffmpeg (48.1k) in 2.3 hours

New faults: many are located along the same paths, where dynamic tools would halt

Main sources of imprecision: infeasible paths and pointers

Locality helped achieve scalability; without such guidance, manual inspection is hard

Code complexity matters; generality does cost some scalability, but the analyses remain scalable

Page 22: Generating Analyses for Detecting Faults in Path Segments

Comparable with Manually Customized Detectors?

Heuristics designed to suppress false positives can hurt the detection rate

Memory leaks:

Tool | Leak | FP

Athena | 53 | 6

NoPaths [Orlovich06] | 3 | 29

ValueGraph [Jeffery07] | 38 | 6

Null-pointer dereferences:

Tool | Null-p | FP | Finish

Athena | 9 | 3 | 9/12

Saturn [xie07] | 7 | 44 | 5/12

Saturn:

• Lacks interprocedural path-sensitivity

• Relies on heuristics for applying consistency rules

Page 23: Generating Analyses for Detecting Faults in Path Segments

Related Work

• Static fault detection: type-based, model checking, data-flow analysis

• Path-sensitive fault detection: Prefix, Metal, ESP, Archer, Saturn, Calysto – exhaustive static analyses

– Athena is demand-driven, and more precise, scalable and general

• Slicing and other demand-driven analyses

– Athena is the first to use demand-driven analysis to compute the path segments of faults

Page 24: Generating Analyses for Detecting Faults in Path Segments

Conclusions

Athena generates demand-driven, path-based, symbolic analyses for detecting specified faults:

• Faults develop along paths but exhibit locality, so a demand-driven, path-based analysis is more precise and scalable

• The specification maps fault detection problems to constraints on program objects at program points

• Specifying different faults requires only a limited set of attributes; the expressive power comes from composing them

Page 25: Generating Analyses for Detecting Faults in Path Segments

Thank you and Questions?

Page 26: Generating Analyses for Detecting Faults in Path Segments

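Fault Detection and Branch Analysis

Program (figure): p[10]; scanf("%s", t); i = strlen(t); if (i < 10) strcpy(p,t);

1. At strcpy(p,t), fault detection raises the query Size(p) ≥ Len(t)

2. Branch analysis raises Value(i) < 10 for the "yes" edge of i < 10

3. At i = strlen(t), Value(i) < 10 becomes Len(t) < 10; the branch is feasible, and the fault query becomes Len(t) < 10, Size(p) ≥ Len(t)

4. At scanf("%s", t), t is marked IsEntry(t): Len(t) < 10, IsEntry(t), Size(p) ≥ Len(t)

5. At p[10], Size(p) = 10: Len(t) < 10, IsEntry(t), 10 ≥ Len(t) → [Safe]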
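The step from Value(i) < 10 to Len(t) < 10 applies the DetectFault update Value(d) := Len(b) for d = strlen(b), moving backward over the statement. Below is a minimal C sketch of that substitution. The representation of constraints as Attr(var) < bound, and the names Term, LessThan, Update and apply_update, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical representation of "Attr(var) < bound", e.g. Value(i) < 10. */
typedef enum { ATTR_VALUE, ATTR_LEN, ATTR_SIZE } AttrKind;
typedef struct { AttrKind attr; char var; } Term;   /* e.g. { ATTR_VALUE, 'i' } */
typedef struct { Term lhs; long bound; } LessThan;  /* lhs < bound */

/* An Update action "target := source", e.g. Value(i) := Len(t) at i = strlen(t). */
typedef struct { Term target, source; } Update;

static bool same_term(Term a, Term b) { return a.attr == b.attr && a.var == b.var; }

/* Backward substitution: a constraint on the target after the statement
   holds exactly when the same constraint holds on the source before it. */
LessThan apply_update(LessThan c, Update u) {
    if (same_term(c.lhs, u.target)) c.lhs = u.source;
    return c;
}
```

Constraints on other terms, such as Size(p), pass through unchanged, which is why only the branch query is rewritten at i = strlen(t).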