Generating Analyses for Detecting Faults in Path Segments

Post on 02-Jan-2016

Wei Le* and Mary Lou Soffa, University of Virginia

*currently with Rochester Institute of Technology

Motivation

• Static analysis: an integral part of fault detection

– High code coverage

– No executables required

– Find faults early, so cheaper to fix

Challenges of Current Static Analysis

• Precision: many false positives and little support for diagnosis

• Scalability: manual annotations sometimes required

• Generality: hardcoded heuristics, new tools for different types of faults

Important to achieve all three

Precision: Path-Sensitive Analyses

• Heuristics based: ESP [das02] (based on an assumption of typestate fault)

• Summary based: Saturn [xie07] (lacks interprocedural path-sensitivity)

• Partially exploring the state space: Prefix [bush00]

All three are exhaustive analyses based on the structure of a program.

Framework: Athena

Automatically generate analyses from specifications:

• precise: low false positives and rich diagnostic info
– interprocedural path-sensitive analysis
– reports path-segments of a fault

• scalable: only covers code relevant to the fault
– demand-driven analysis

• general: data- and control-centric, liveness and safety
– a specification technique and a generation algorithm

Faults

• Commonality of the faults - Generality

– The violations are always observable at certain statements

– We are able to construct constraints to express violations

• Locality of a fault - Scalability

– Only the segments along the paths that are relevant to the fault

– Only a limited number of statements on the paths that contribute to the fault

– Fault locality holds for a variety of faults

Athena: Components

[Figure: component diagram. Labels: Specification Language, Specification Repository, Parser, Analyzer, Generator, Path-Sensitive Demand-Driven Template. Outputs: syntax trees, code modules. Captions: "Generate Analyses", "Precision and Scalability of the Analyses".]

Athena: Workflow

Step 1: Specifying Faults

Step 2: Generating Analysis

Step 3: Analyzing programs with Generated Analysis

[Figure: workflow diagram. Labels: Spec, Parser, syntax trees, Generator, Demand-Driven Template, code modules, Analyzer for the Spec, Program, Generated Analysis.]

Path Classification: each path segment is classified as Infeasible, Safe, Faulty (with severity and root cause), or Don't-know.

Components I: Specification and Language

• Spec:
– Definition of a fault: <program point, constraints>
– Information for detecting the fault: <program point, actions>

• Language: attributes and operators on attributes

• Attributes: abstractions on program objects, e.g. len(s)

• Operators: comparison (>, <), computation (+, -), command (:=)

Grammar of the Language

Specification → Vars VarList DefineFault FaultSigList DetectFault DetectSigList
VarList → Var*
Var → VarType namelist;
VarType → Vbuffer | Vint | Vany | Vptr | ...
FaultSigList → FaultSigItem <or FaultSigItem>*
DetectSigList → DetectSigItem <or DetectSigItem>* | # include <ExistentSpec>
FaultSigItem → CodeSignature ProgramPoint S-Constraint Condition | CodeSignature ProgramPoint L-Constraint Condition
DetectSigItem → CodeSignature ProgramPoint Update Action
ProgramPoint → $LangSyntax$ | Condition | $LangSyntax$ && Condition
Condition → Attribute Comparator Attribute | !Condition | [Condition] | Condition && Condition | Condition || Condition
Action → Attribute := Attribute | ^ Condition | [Action] | Action && Action | Action || Action | Condition → Action
Attribute → PrimitiveAttribute(var, ...) | Constant | !Attribute | ¬Attribute | [Attribute] | Attribute ° Attribute | Attribute Op Attribute | min(Attribute, Attribute) | [Attribute, Attribute]
PrimitiveAttribute → Size | Len | Value | MatchOperand | TMax | TMin | ...
Constant → 0 | true | false | ...
Comparator → = | > | < | ...
Op → + | − | * | ...

Buffer Overflow Specification

Vars Vbuffer a, b; Vint d; Vany e;

DefineFault
CodeSignature $strcpy(a,b)$
S_Constraint Len(b) ≤ Size(a)
or
CodeSignature $memcpy(a,b,d)$
S_Constraint Min(Len(b), Value(d)) ≤ Size(a)

DetectFault
CodeSignature $strcpy(a,b)$
Update Len(a) := Len(b)
or
CodeSignature $d=strlen(b)$
Update Value(d) := Len(b)
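The two safety constraints in the buffer overflow specification can be read as plain integer checks. The following is a minimal sketch, not Athena's generated code. It assumes the comparator lost from the slide text is ≤ (consistent with the query size(buf) >= len(str) used elsewhere in the deck), that the memcpy bound is Value(d), and that Len, Size and Value have already been resolved to concrete integers. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Smaller of two sizes; mirrors Min(...) in the memcpy constraint. */
static size_t min_size(size_t x, size_t y) {
    return x < y ? x : y;
}

/* S_Constraint for $strcpy(a,b)$: Len(b) <= Size(a) (comparator assumed). */
bool strcpy_constraint_holds(size_t len_b, size_t size_a) {
    return len_b <= size_a;
}

/* S_Constraint for $memcpy(a,b,d)$: Min(Len(b), Value(d)) <= Size(a) (assumed form). */
bool memcpy_constraint_holds(size_t len_b, size_t value_d, size_t size_a) {
    return min_size(len_b, value_d) <= size_a;
}
```

For example, strcpy_constraint_holds(7, 80) is true, while memcpy_constraint_holds(100, 64, 32) is false because the copy would move 64 units into a 32-unit buffer.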

Component II: Demand-Driven Template

• Formulate fault detection problems into queries about program facts, e.g., variable relations

• Scalable: Buffer overflow detection [le08]

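Example: demand-driven query resolution

Fault: buffer overflow. Program point: buffer access. Query: size(buf) >= len(str).

[Figure: control-flow graph with numbered nodes 1-6 containing the statements s = (char*)malloc(80), the branch strlen(t) < 8 (yes/no), strcpy(s,t), strcat(x,t), bar(), and x[10] = '0'.]

Query raised at strcpy(s,t): size(s) >= len(t)

Query after the yes edge of the branch strlen(t) < 8: size(s) >= len(t) && len(t) < 8

Resolution at s = (char*)malloc(80): 80/8 >= len(t) && len(t) < 8 : safe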
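The backward resolution in the example can be sketched in C as follows. This is an illustration under simplifying assumptions, not the Athena implementation: it handles a single straight-line path, assumes s and t are not reassigned between the statements, and compares the allocated capacity directly with the largest possible len(t) (the slide writes the resolved bound as 80/8; how the size is measured there is not stated). All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { ST_MALLOC, ST_BRANCH_LEN_LT, ST_STRCPY, ST_OTHER } StmtKind;

typedef struct {
    StmtKind kind;
    long value;   /* allocated capacity, or the bound in "strlen(t) < bound" */
} Stmt;

/* Query "size(s) >= len(t)" plus the facts collected so far. */
typedef struct {
    bool raised;       /* set once the strcpy has been seen */
    bool size_known;
    long size;         /* size(s) */
    bool len_bounded;
    long len_max;      /* largest possible len(t) */
} Query;

/* Walk the path backward from its end; return true if the query is proven. */
bool path_proven_safe(const Stmt *path, size_t n) {
    Query q = { false, false, 0, false, 0 };
    for (size_t i = n; i-- > 0; ) {
        const Stmt *s = &path[i];
        if (!q.raised) {
            /* Raise the query at the buffer access. */
            if (s->kind == ST_STRCPY) q.raised = true;
            continue;
        }
        /* Update the query with the nearest fact about each side. */
        if (s->kind == ST_BRANCH_LEN_LT && !q.len_bounded) {
            q.len_bounded = true;
            q.len_max = s->value - 1;
        } else if (s->kind == ST_MALLOC && !q.size_known) {
            q.size_known = true;
            q.size = s->value;
        }
        /* Evaluate as soon as both sides are resolved. */
        if (q.size_known && q.len_bounded) return q.size >= q.len_max;
    }
    return false;  /* unresolved: not proven safe */
}
```

With the path malloc(80), strlen(t) < 8 (yes), strcpy(s,t), the walk resolves len(t) <= 7 and size(s) = 80 and reports the path as safe. Without the branch, the query stays unresolved.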

Demand-Driven Template

[Figure: flowchart of the template applied to a program. Stages: Raise Queries, Propagate Queries, Update Queries, Evaluate Queries, with a yes/no decision.]

• Rules for propagating queries
– Interprocedural, path-sensitive, context-sensitive
– Branch, loop, call, infeasible path

• Evaluating queries (integer constraints)
– Algebra rules, inequalities
– Integer constraint solver
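One way to picture query evaluation over inequalities is with integer ranges. The sketch below substitutes a simple interval comparison for the integer constraint solver named on the slide, so it is a stand-in technique and not what Athena does. The three results loosely mirror the Safe, Faulty and Don't-know classifications; the names are hypothetical.

```c
#include <assert.h>

typedef enum { EVAL_SAFE, EVAL_FAULTY, EVAL_DONT_KNOW } EvalResult;

/* Closed integer interval [lo, hi] of values a quantity may take. */
typedef struct {
    long lo;
    long hi;
} Interval;

/* Evaluate the query size >= len when each side is only known as a range. */
EvalResult eval_size_ge_len(Interval size, Interval len) {
    assert(size.lo <= size.hi && len.lo <= len.hi);
    if (size.lo >= len.hi) return EVAL_SAFE;    /* holds for every pair of values */
    if (size.hi < len.lo)  return EVAL_FAULTY;  /* fails for every pair of values */
    return EVAL_DONT_KNOW;                      /* depends on the actual values */
}
```

A buffer of exactly 80 with a length in [0, 7] evaluates to EVAL_SAFE; a buffer of 4 with a length of at least 8 evaluates to EVAL_FAULTY; overlapping ranges give EVAL_DONT_KNOW.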

Components III: Parser and Code Generator

Parsing Specification (YACC)

Specification:
CodeSignature $strcpy(a,b)$
S_Constraint Len(b) ≤ Size(a)

Parsed form:
CodeSignature: GetOp(s) = strcpy
S_Constraint: Size(Src1(s)) ≥ Len(Src2(s))

[Figure: syntax trees A (CodeSignature) and B (S_Constraint). Tree A: root =, children GetOp and strcpy. Tree B: children Size ° Src1 and Len ° Src2.]

• Leaf: attribute

• Non-leaf: operator

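Code Generation

[Figure: syntax tree of the code signature: root =, children GetOp and strcpy.]

• Construct a function that implements the semantics of the tree based on the semantics of operators

#include <stdbool.h>
#include <string.h>

/* True when statement t is a call to strcpy (assumes the opcode is stored as a string). */
bool IsStrcpy(statement t) { return strcmp(GetOp(t), "strcpy") == 0; }

• Create the instance of the call

IsStrcpy(n)

• Find the function that implements the semantics of leaf attributes

/* Returns the opcode of statement t; statement and C_Syntax are defined elsewhere in the analyzer. */
const char *GetOp(statement t) { C_Syntax(t); return t.opcode; }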
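The rule "leaf: attribute, non-leaf: operator" can be illustrated with a small tree evaluator for the code signature tree (= with children GetOp and strcpy). This sketch interprets the tree at run time instead of emitting C source as the generator on the slide does, and the Statement and Node types are hypothetical stand-ins for the analyzer's own types.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the analyzer's statement type. */
typedef struct {
    const char *opcode;  /* e.g. "strcpy" */
} Statement;

typedef enum { NODE_EQ, NODE_GETOP, NODE_CONST } NodeKind;

/* Syntax tree node: leaves are attributes or constants, inner nodes are operators. */
typedef struct Node {
    NodeKind kind;
    const char *text;          /* constant text for NODE_CONST, else NULL */
    const struct Node *left;   /* operands for NODE_EQ, else NULL */
    const struct Node *right;
} Node;

/* Leaf semantics: GetOp reads the statement's opcode, a constant yields its text. */
static const char *eval_leaf(const Node *n, const Statement *s) {
    return n->kind == NODE_GETOP ? s->opcode : n->text;
}

/* Operator semantics: '=' compares the values of its two leaf children. */
bool eval_signature(const Node *n, const Statement *s) {
    if (n->kind != NODE_EQ || !n->left || !n->right) return false;
    const char *l = eval_leaf(n->left, s);
    const char *r = eval_leaf(n->right, s);
    return l && r && strcmp(l, r) == 0;
}
```

Evaluating the tree = (GetOp, "strcpy") against a statement whose opcode is "strcpy" returns true, which matches what the generated IsStrcpy function checks.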

Generating Analysis

Code modules generated from the specification are plugged into the Demand-Driven Template (Raise Queries, Propagate Queries, Update Queries, Evaluate Queries).

• From the fault definition
CodeSignature $strcpy(a,b)$
S_Constraint Len(b) ≤ Size(a)
Code module generated: if(isnode(s)) q = raiseQ(s)

• From the detection information
CodeSignature $strcpy(a,b)$
Update Len(a) := Len(b)
Code module generated: if(isnode(s)) updateQ(q)

Experimental Setup

Athena (analyzes C/C++/C#), implemented with YACC, Phoenix and Disolver

Research question 1: Can we generate analyses for detecting different faults?
– Experiments: buffer overflow, integer fault, null-pointer dereference, memory leak
– Benchmarks: bugbench, ffmpeg, putty, apache

Research question 2: Comparable with manually customized detectors?
– Experiments: memory leak detectors, Saturn
– Benchmarks: SPEC CPU-INT 2000

Evaluation metrics: detection rate, false positives, false negatives, diagnostic info, scalability

Can We Generate Analyses for Different Faults?

• Detection: 84 faults of four types from 9 benchmarks, 68 new

• False positives/negatives: 18 false positives, 3 faults missed

• Path segments: generally relevant to 1-4 procedures; maximum 35 procedures

• Scalability: apache (268.9 k): 4 hours; ffmpeg (48.1 k): 2.3 hours

Observations:

• New faults: many located along the same paths; dynamic tools would halt

• Main source of imprecision: infeasible paths and pointers

• Locality helped achieve the scalability; without guidance, manual inspection is hard

• Code complexity matters; generality does compromise scalability, but the analysis is still scalable

Comparable with Manually Customized Detectors?

Heuristics designed for suppressing false positives may hurt the detection rate.

Memory leak detection:

Detector                  Leak   FP
Athena                    53     6
NoPaths [Orlovich06]      3      29
ValueGraph [Jeffery07]    38     6

Null-pointer dereference detection:

Detector          Null-p   FP   Finish
Athena            9        3    9/12
Saturn [xie07]    7        44   5/12

• Lack interprocedural path-sensitivity

• Heuristics of applying consistency rules

Related Work

• Static fault detection: type based, model checking, data flow analysis

• Path-sensitive fault detection: Prefix, Metal, ESP, Archer, Saturn, Calysto (static analysis based on exhaustive exploration)
– Athena is demand-driven, more precise, scalable and general

• Slicing and other demand-driven analyses
– Athena is the first to use demand-driven analysis for computing path segments of faults

Conclusions

Athena generates demand-driven, path-based, symbolic analyses for detecting specified faults:

• Faults develop along paths but manifest locality, so demand-driven, path-based analysis is more precise and scalable

• A specification provides a way of mapping fault detection problems to constraints on program objects at program points

• The attributes required to specify different faults are limited; the expressive power comes from composing the attributes

Thank you and Questions?

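[Figure: control-flow graph with numbered nodes 1-5 containing p[10], scanf(%s, t), i = strlen(t), the branch i < 10 (yes edge), and strcpy(p,t).]

Branch analysis: Value(i) < 10 becomes Len(t) < 10; the branch is feasible.

Fault detection (query raised at strcpy(p,t) and propagated backward):

Size(p) ≥ Len(t)

Len(t) < 10 && Size(p) ≥ Len(t)

Len(t) < 10 && IsEntry(t) && Size(p) ≥ Len(t)

Len(t) < 10 && IsEntry(t) && 10 ≥ Len(t) [Safe]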