ANALYSIS OF PROG. LANG. PROGRAM ANALYSIS Instructors: Crista Lopes Copyright © Instructors. 1.

Posted on 14-Dec-2015


1

ANALYSIS OF PROG. LANG.
PROGRAM ANALYSIS
Instructors: Crista Lopes

Copyright © Instructors.

2

Motivation(s)

Where do you see program analysis (PA) in your everyday life?

How does PA “work”? What is PA, anyway?

3

Auto-completion

4

Pre-compilation error detection

Example: a missing parenthesis

5

How do you know ...

int a;                     // declared at top level

void increment_a() {
    a++;
}

while (true) {
    String a = "hello";    // declared inside the loop body
    increment_a();
}

This “a” is not that “a”

6

How do you remember ...

int a;                     // declared at top level

void increment_a() {
    a++;
}

while (true) {
    String a = "hello";    // declared inside the loop body
    increment_a();
}

Wait, what’s the type of “a” again?

“a” is of type int (FYI...)
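Tools answer both questions with a symbol table built from the program's scopes. The sketch below is a minimal Python illustration, not how any particular IDE does it: it assumes static (lexical) scoping and models each scope as a dictionary from names to type names; `SymbolTable` and its methods are hypothetical names.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a declared name to its type."""

    def __init__(self):
        self.scopes = [{}]  # scopes[0] is the global (top-level) scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        self.scopes[-1][name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outward, so an inner declaration
        # shadows an outer declaration of the same name.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared name: {name}")


table = SymbolTable()
table.declare("a", "int")                # int a;

# increment_a is declared at top level, so under static scoping its body
# is resolved against the global scope, not against the loop's scope.
table.enter_scope()
type_in_increment = table.lookup("a")    # the global a
table.exit_scope()

table.enter_scope()                      # body of while (true) { ... }
table.declare("a", "String")             # String a = "hello";
type_in_loop = table.lookup("a")         # the local a shadows the global one
table.exit_scope()

print(type_in_increment, type_in_loop)   # int String
```

Under dynamic scoping the lookup inside `increment_a` would follow the call chain instead and find the `String`; the deck's example assumes the usual static rule.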

7

Outline

- Introduction/motivations
- Program representation
  - AST
  - 3-address code
- Control flow analysis
- Data flow

8

Intermediate Representation (IR)
- Initial point
- Abstract Syntax Tree
  - Abstract vs. concrete syntax
  - Parse tree vs. abstract syntax tree
- Three-address codes

9

IR-1 Starting Point

Source code → (parsing, lexical analysis) → Intermediate representation → (code generation, optimization) → Target code → (code execution)

Analyze the IR: perform analysis on it, then use the resulting information for applications.

10

IR-2. Abstract Syntax Tree (AST): Concrete vs. Abstract Syntax

Concrete syntax shows structure and is language-specific

Abstract syntax shows only the structure

Representations: a parse tree represents concrete syntax; an abstract syntax tree represents abstract syntax

11

IR-2. Example: Grammar

Example: a := b+c (Language 1); a = b+c; (Language 2)

Grammar for Language 1:
  stmtlist → stmt | stmt stmtlist
  stmt     → assign | if-then | …
  assign   → ident “:=” ident binop ident
  binop    → “+” | “-” | …

Grammar for Language 2:
  stmtlist → stmt “;” | stmt “;” stmtlist
  stmt     → assign | if-then | …
  assign   → ident “=” ident binop ident
  binop    → “+” | “-” | …

12

IR-2. Example: Parse Tree

Parse tree for a:=b+c
  stmtlist
    stmt
      assign
        ident → a
        :=
        ident → b
        binop → “+”
        ident → c

Parse tree for a=b+c;
  stmtlist
    stmt
      assign
        ident → a
        =
        ident → b
        binop → “+”
        ident → c
    “;”

13

IR-2 Example: Abstract Syntax Tree

Example
1. a:=b+c
2. a=b+c;

Abstract Syntax Tree for 1 and 2:
  assign
    a
    add
      b
      c
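To see that the two concrete syntaxes share one abstract syntax, here is a minimal hand-written parser sketch in Python for just the `assign` rule of each grammar. It handles only single assignments (no `if-then` or other elided alternatives), and the tuple encoding of AST nodes and the function names are assumptions for illustration.

```python
import re

TOKEN = re.compile(r"\s*(:=|[A-Za-z_]\w*|[=+\-;])")
BINOPS = {"+": "add", "-": "sub"}


def tokenize(src):
    src = src.strip()
    pos, tokens = 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {src[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_assign(src, assign_op, terminator=None):
    """assign -> ident assign_op ident binop ident [terminator]"""
    toks = tokenize(src)
    expected_len = 5 if terminator is None else 6
    if len(toks) != expected_len or toks[1] != assign_op or toks[3] not in BINOPS:
        raise SyntaxError(f"not an assignment: {src!r}")
    if terminator is not None and toks[5] != terminator:
        raise SyntaxError(f"missing {terminator!r}")
    # Concrete tokens such as ':=', '=' and ';' are dropped: only the
    # structure (target, operator, operands) survives in the AST.
    return ("assign", toks[0], (BINOPS[toks[3]], toks[2], toks[4]))


ast1 = parse_assign("a:=b+c", ":=")        # Language 1
ast2 = parse_assign("a=b+c;", "=", ";")    # Language 2
print(ast1)          # ('assign', 'a', ('add', 'b', 'c'))
print(ast1 == ast2)  # True
```

The two parsers differ only in which concrete tokens they expect; once those are checked and discarded, both produce the same tree.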

14

IR-3. Three Address Code

General form: x = y op z
More generally, a quadruple (operator, operand1, operand2, result), with at most 3 slots besides the operator
May include temporary variables
Examples:
  Assignment
    Binary          x := y op z            (op, y, z, x)
    Unary           x := op y              (op, y, _, x)
    Copy            x := y                 (_, y, _, x)
  Jumps
    Unconditional   goto L                 (goto, L, _, _)
    Conditional     if x relop y goto L    (relop, x, y, L)
  …
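Lowering an expression tree into three-address code gives every interior node of the tree its own result slot, using temporaries for intermediate values. A minimal Python sketch, assuming AST nodes are nested `(op, left, right)` tuples and temporaries are named `t1`, `t2`, …; it emits quadruples in the (operator, operand1, operand2, result) form above.

```python
import itertools


def to_quads(expr, target):
    """Lower a nested (op, left, right) expression into quadruples
    (op, arg1, arg2, result) whose final result is stored in `target`."""
    quads = []
    temps = itertools.count(1)

    def emit(node, dest=None):
        if not isinstance(node, tuple):
            return node                  # a variable or constant is used as-is
        op, left, right = node
        arg1, arg2 = emit(left), emit(right)
        # Interior nodes get a fresh temporary, except the root, which is
        # written straight into the target variable.
        result = dest if dest is not None else f"t{next(temps)}"
        quads.append((op, arg1, arg2, result))
        return result

    if emit(expr, target) != target:     # expr was a single leaf
        quads.append(("_", expr, "_", target))   # copy: x := y
    return quads


# x := a + b * c
for quad in to_quads(("+", "a", ("*", "b", "c")), "x"):
    print(quad)
# ('*', 'b', 'c', 't1')
# ('+', 'a', 't1', 'x')
```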

15

IR-3. Example: Three Address Code

if a > 10 then x = y + z
else x = y - z

1. if a > 10 goto 4
2. x = y - z
3. goto 5
4. x = y + z
5. …
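The translation can be checked by executing the numbered code. A minimal interpreter sketch in Python, using the quadruple forms listed earlier: `(relop, x, y, L)` for conditional jumps, `("goto", L, "_", "_")` for unconditional ones, and `(op, y, z, x)` for binary assignments. Statement 5 is elided in the example, so here falling past statement 4 simply ends execution (an assumption).

```python
import operator

ARITH = {"+": operator.add, "-": operator.sub}
RELOP = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def run(program, env):
    """Execute numbered quadruples (op, arg1, arg2, result) starting at 1."""
    def val(v):
        return env[v] if isinstance(v, str) else v   # names vs. constants

    pc = 1
    while pc in program:
        op, a1, a2, res = program[pc]
        if op == "goto":
            pc = a1
        elif op in RELOP:
            pc = res if RELOP[op](val(a1), val(a2)) else pc + 1
        else:
            env[res] = ARITH[op](val(a1), val(a2))
            pc += 1
    return env


# if a > 10 then x = y + z else x = y - z
program = {
    1: (">", "a", 10, 4),      # if a > 10 goto 4
    2: ("-", "y", "z", "x"),   # x = y - z
    3: ("goto", 5, "_", "_"),  # goto 5
    4: ("+", "y", "z", "x"),   # x = y + z
}
print(run(program, {"a": 11, "y": 7, "z": 2})["x"])  # 9
print(run(program, {"a": 3, "y": 7, "z": 2})["x"])   # 5
```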

16

Analysis Levels

Local: within a single basic block or statement

Intraprocedural: within a single procedure, function, or method

Interprocedural: across procedure boundaries (procedure calls, shared globals, etc.)

Intraclass: within a single class

Interclass: across class boundaries

…

17

Outline

- Introduction/motivations
- Program representation
- Control flow analysis
  - Computing control flow (analysis and representation)
  - Search and traversals
  - Applications
- Data flow

18

Computing Control Flow (example)

Procedure AVG
S1   count = 0
S2   fread(fptr, n)
S3   while (not EOF) do
S4     if (n < 0)
S5       return (error)
       else
S6       nums[count] = n
S7       count++
       endif
S8     fread(fptr, n)
     endwhile
S9   avg = mean(nums, count)
S10  return (avg)

[Figure: CFG of AVG with nodes entry, S1 to S10, and EXIT]

19

CF1: Control Flow (Basic Blocks)

A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, with no halt or possibility of branching except at the end

A basic block may or may not be maximal

For compiler optimizations, maximal blocks are desirable

For software engineering tasks, basic blocks that each represent one source code statement are often used
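Maximal basic blocks can be recovered from a statement-level CFG: a node starts a new block if it has several predecessors, or if its single predecessor has several successors; otherwise it extends its predecessor's block. A Python sketch for procedure AVG. The edge list is our own reading of the procedure's control flow (the slide's figure is not reproduced), and `maximal_basic_blocks` is a hypothetical helper.

```python
from collections import defaultdict

# Statement-level CFG of procedure AVG (our reading of S1-S10).
EDGES = [
    ("entry", "S1"), ("S1", "S2"), ("S2", "S3"),
    ("S3", "S4"), ("S3", "S9"),                 # while (not EOF): body / exit
    ("S4", "S5"), ("S4", "S6"),                 # if (n < 0): then / else
    ("S5", "EXIT"),                             # return (error)
    ("S6", "S7"), ("S7", "S8"), ("S8", "S3"),   # back to the loop test
    ("S9", "S10"), ("S10", "EXIT"),
]


def maximal_basic_blocks(edges, order):
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)

    def is_leader(n):
        # n continues its predecessor's block only if n is that node's sole
        # successor and n has no other way in.
        if len(preds[n]) != 1:
            return True
        (p,) = preds[n]
        return p == "entry" or len(succs[p]) != 1

    blocks = []
    for n in order:
        if is_leader(n):
            block, cur = [n], n
            # Extend along the single-entry, single-exit chain.
            while len(succs[cur]) == 1 and not is_leader(succs[cur][0]):
                cur = succs[cur][0]
                block.append(cur)
            blocks.append(block)
    return blocks


order = [f"S{i}" for i in range(1, 11)]
for block in maximal_basic_blocks(EDGES, order):
    print(block)
# ['S1', 'S2'], ['S3'], ['S4'], ['S5'], ['S6', 'S7', 'S8'], ['S9', 'S10']
```

For software engineering uses, the alternative mentioned above is simply one block per statement, i.e. `[[s] for s in order]`.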


21

CF1: Computing Control Flow

Input: a list of program statements in some form
Output: a list of CFG nodes and edges
Procedure:
- Construct basic blocks
- Create entry and exit nodes; create edge (entry, B1); create edge (Bk, exit) for each Bk that represents an exit from the program
- Add a CFG edge from Bi to Bj if Bj can immediately follow Bi in some execution, i.e., either
  - there is a conditional or unconditional goto from the last statement of Bi to the first statement of Bj, or
  - Bj immediately follows Bi in program order and Bi does not end in an unconditional goto
- Label edges that represent conditional transfers of control
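One common way to construct the basic blocks is the "leader" method: a block starts at the first statement, at every jump target, and right after every jump. A Python sketch applying it, plus the edge rules above, to the three-address if-then-else example. The `(kind, target)` statement encoding is an assumption, and statement 5 (elided in the example) is treated as an ordinary statement.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: kind is "cond" (conditional goto), "goto"
# (unconditional goto) or "op" (any other statement); target is the
# statement number jumped to, or None.
Stmt = Tuple[str, Optional[int]]


def build_cfg(code: Dict[int, Stmt]):
    lines = sorted(code)
    # 1. Leaders: the first statement, every jump target, and every
    #    statement that immediately follows a jump.
    leaders = {lines[0]}
    for i, n in enumerate(lines):
        kind, target = code[n]
        if kind in ("cond", "goto"):
            leaders.add(target)
            if i + 1 < len(lines):
                leaders.add(lines[i + 1])
    # 2. Each basic block runs from a leader up to the next leader.
    blocks: List[List[int]] = []
    for n in lines:
        if n in leaders:
            blocks.append([])
        blocks[-1].append(n)
    name = {b[0]: f"B{k + 1}" for k, b in enumerate(blocks)}
    # 3. Edges: entry -> B1, jump edges, fall-through edges, and an edge
    #    to exit; conditional transfers are labelled "T"/"F".
    edges = [("entry", "B1", "")]
    for k, b in enumerate(blocks):
        src = f"B{k + 1}"
        kind, target = code[b[-1]]
        if kind in ("cond", "goto"):
            edges.append((src, name[target], "T" if kind == "cond" else ""))
        if kind != "goto":  # control can fall through to the next block
            nxt = f"B{k + 2}" if k + 1 < len(blocks) else "exit"
            edges.append((src, nxt, "F" if kind == "cond" else ""))
    return blocks, edges


# 1. if a>10 goto 4   2. x = y-z   3. goto 5   4. x = y+z   5. ...
code = {1: ("cond", 4), 2: ("op", None), 3: ("goto", 5), 4: ("op", None), 5: ("op", None)}
blocks, edges = build_cfg(code)
print(blocks)  # [[1], [2, 3], [4], [5]]
for edge in edges:
    print(edge)
```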

22

CF2: Search and Ordering

Many ways to visit the nodes in the graph:
- Depth First Search: visits the descendants of a node before visiting any of its siblings
- Breadth First Search: all of a node’s immediate descendants are processed before any of their unprocessed children
- Preorder traversal: a node is processed before its descendants
- Postorder traversal: a node is processed after its descendants

23

CF2: Search and Ordering (cont’d) (DFS)

One DFS of the CFG: 1, 3, 4, 6, 7, 8, 10, back to 8, 9, back to 8, 7, 6, 4, 5, back to 4, 3, 1, 2, back to 1

The number assigned to a node during DFS is its depth first number

Depth first ordering of nodes is the reverse of the order in which nodes are last visited in DFS

For the DFS above, nodes are visited 1,3,4,6,7,8,10,8,9,8,7,6,5,4,3,1,2,1

Depth first ordering is 1,2,3,4,5,6,7,8,9,10

[Figure: the CFG being searched, with nodes 1, 2, and S3 to S10]
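These orders can be computed with a recursive DFS that records when each node is first and last visited; the depth-first ordering is the reverse of the last-visit (post)order. A Python sketch, run on the AVG procedure's CFG as we read it from its code (an assumption; the slide's node numbering comes from a figure that is not reproduced). Following the slide's example, where the ordering is 1, 2, …, 10, nodes are numbered by their position in the depth-first ordering.

```python
def dfs_orders(succs, start):
    """Return (preorder, postorder, depth-first ordering) of the nodes
    reachable from start; depth-first ordering = reverse postorder."""
    visited, pre, post = set(), [], []

    def visit(n):
        visited.add(n)
        pre.append(n)               # first visit
        for s in succs.get(n, []):
            if s not in visited:
                visit(s)
        post.append(n)              # last visit: every descendant is done

    visit(start)
    return pre, post, list(reversed(post))


AVG = {
    "entry": ["S1"], "S1": ["S2"], "S2": ["S3"],
    "S3": ["S4", "S9"], "S4": ["S5", "S6"], "S5": ["EXIT"],
    "S6": ["S7"], "S7": ["S8"], "S8": ["S3"],
    "S9": ["S10"], "S10": ["EXIT"],
}
pre, post, dfo = dfs_orders(AVG, "entry")
dfn = {n: i + 1 for i, n in enumerate(dfo)}   # depth first numbers
print(dfo)
# ['entry', 'S1', 'S2', 'S3', 'S9', 'S10', 'S4', 'S6', 'S7', 'S8', 'S5', 'EXIT']
```

The result depends on the order in which successors are tried; visiting `S9` before `S4` would yield a different, equally valid ordering.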

24

CF: Types of Edges

The depth-first representation of a CFG is its depth-first spanning tree plus the edges that are not part of the tree, so every edge is either a tree edge or another edge

Three kinds of edges:
- Advancing (forward) edges: go from a node to one of its proper descendants in the tree; these include tree edges
- Back edges: go from a node to one of its ancestors in the tree
- Cross edges: connect nodes such that neither is an ancestor of the other
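All of these edge kinds can be told apart during a single DFS by tracking which nodes are still on the recursion stack and their preorder numbers. A Python sketch; the graph is our reading of the eight-node CFG used in the testing example on the next slide (edges taken from its branch and path lists), not a figure from this slide.

```python
def classify_edges(succs, start):
    """Run a DFS from start and label every edge u -> v reachable from it
    as 'tree', 'forward', 'back' or 'cross'."""
    pre, on_stack, kinds = {}, set(), {}

    def visit(u):
        pre[u] = len(pre)          # preorder (discovery) number
        on_stack.add(u)            # u is an ancestor of everything visited now
        for v in succs.get(u, []):
            if v not in pre:
                kinds[(u, v)] = "tree"
                visit(v)
            elif v in on_stack:
                kinds[(u, v)] = "back"      # v is an ancestor of u
            elif pre[u] < pre[v]:
                kinds[(u, v)] = "forward"   # v is an already-finished descendant of u
            else:
                kinds[(u, v)] = "cross"     # neither is an ancestor of the other
        on_stack.remove(u)

    visit(start)
    return kinds


G = {1: [2, 3], 2: [4], 3: [4], 4: [5, 8], 5: [6, 7], 6: [7], 7: [4], 8: []}
kinds = classify_edges(G, 1)
for edge, kind in kinds.items():
    print(edge, kind)
# 7 -> 4 is a back edge, 5 -> 7 a forward edge, 3 -> 4 a cross edge;
# the remaining seven edges are tree edges.
```

The back edge 7 -> 4 is what marks the loop in this graph, which is why back edges matter for loop detection.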

25

Applications of Control Flow

- Complexity: pointers to refactoring
- Testing: branch, path, basis path
  - Branch: must test 1-2, 1-3, 4-5, 4-8, 5-6, 5-7
  - Path: infinitely many paths, due to the loop
  - Basis path: a set of paths which covers all the edges at least once, e.g., 1,2,4,8 and 1,3,4,5,6,7,4,8 (edge 5-7 still needs a path such as 1,3,4,5,7,4,8)
- Program understanding: recover program structure
- Impact analysis
- …

[Figure: CFG with nodes 1 to 8]
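The branch and path lists can be checked mechanically. A Python sketch, assuming the eight-node graph implied by those lists. It computes McCabe's cyclomatic complexity V(G) = E - N + 2 (our addition, not from the slide), which gives the number of paths in a basis of linearly independent paths, and reports which edges a set of paths leaves uncovered.

```python
G = {1: [2, 3], 2: [4], 3: [4], 4: [5, 8], 5: [6, 7], 6: [7], 7: [4], 8: []}
EDGES = {(u, v) for u, vs in G.items() for v in vs}

# McCabe's cyclomatic complexity for a connected, single-entry CFG:
# the size of a basis set of linearly independent paths.
complexity = len(EDGES) - len(G) + 2


def uncovered(paths):
    """Edges of G that none of the given paths traverses."""
    covered = {(p[i], p[i + 1]) for p in paths for i in range(len(p) - 1)}
    return EDGES - covered


paths = [[1, 2, 4, 8], [1, 3, 4, 5, 6, 7, 4, 8]]
print(complexity)                                   # 4
print(uncovered(paths))                             # {(5, 7)}
print(uncovered(paths + [[1, 3, 4, 5, 7, 4, 8]]))   # set()
```

Three paths already cover every edge, while V(G) = 4 says a full basis of independent paths has four members, so edge coverage and basis-path coverage are different criteria.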

26

Outline

- Introduction/motivations
- Program representation
- Control flow
- Data flow
  - Introduction
  - Reaching definitions

27

Data flow - Introduction

- Flow of various data throughout the program
- Obtained from the AST or CFG
- Used in software engineering tasks

Exact solutions to most data flow problems are undecidable:
- May depend on input
- May depend on the outcome of a conditional statement
- May depend on termination of a loop

Thus we compute approximations of the exact solution

28

Data flow - Introduction

Some approximations “overestimate” the solution:
- They contain the actual information plus some spurious information, but do not omit any actual information
- Conservative and safe approach

Some approximations “underestimate” the solution:
- They may not contain all the information of the actual solution
- Unsafe

Research challenge: providing safe but precise information in an efficient way

Uses of data flow: compiler optimization requires conservative analysis; software engineering tasks may only need unsafe information

29

Data flow – Compiler Optimization

Common subexpression elimination

[Figure: CFG containing the statements c = a+b, d = a+b, and e = a+b]

30

Data flow – Compiler Optimization

Common subexpression elimination

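Need to know available expressions: which expressions have already been computed, on every path to this statement, with no later change to their operands

[Figure: the CFG after elimination: c = a+b becomes t = a+b; c = t, d = a+b becomes t = a+b; d = t, and e = a+b becomes e = t]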
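Available expressions is a forward "must" analysis: an expression is available on entry to a block only if it is available at the end of every predecessor. A Python sketch, assuming our reading of the figure: c = a+b and d = a+b lie on two different paths that join at e = a+b (block names, the encoding, and the helpers are hypothetical).

```python
# Each block is a list of (target, expression); an expression is (op, x, y).
BLOCKS = {
    "B1": [("c", ("+", "a", "b"))],    # one path
    "B2": [("d", ("+", "a", "b"))],    # the other path
    "B3": [("e", ("+", "a", "b"))],    # the join point
}
PREDS = {"B1": [], "B2": [], "B3": ["B1", "B2"]}   # B1, B2 follow entry
ALL_EXPRS = {expr for stmts in BLOCKS.values() for _, expr in stmts}


def gen_kill(stmts):
    """e_gen: expressions computed in the block and not invalidated later;
    e_kill: expressions reading any variable the block assigns."""
    gen, assigned = set(), set()
    for target, expr in stmts:
        gen.add(expr)
        gen = {e for e in gen if target not in e[1:]}   # operand changed
        assigned.add(target)
    kill = {e for e in ALL_EXPRS if assigned & set(e[1:])}
    return gen, kill


def available_expressions(blocks, preds):
    gk = {b: gen_kill(stmts) for b, stmts in blocks.items()}
    ins = {b: set() for b in blocks}
    # Must-analysis: start optimistic (everything available) and shrink.
    outs = {b: set(ALL_EXPRS) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            ps = preds[b]
            # Intersection over ALL predecessors; nothing is available at entry.
            ins[b] = set.intersection(*(outs[p] for p in ps)) if ps else set()
            gen, kill = gk[b]
            new_out = gen | (ins[b] - kill)
            if new_out != outs[b]:
                outs[b], changed = new_out, True
    return ins, outs


ins, outs = available_expressions(BLOCKS, PREDS)
print(ins["B3"])   # {('+', 'a', 'b')}: e = a + b can reuse an earlier result
```

Because a+b is available at the join only through two different computations, the rewrite stores both in the same temporary t, as the slide's transformed code does.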

31

Data Flow - Compiler Optimization

Register (de)allocation: when assigning memory locations to registers, if the value in a register (i.e., a memory location) is not used again, there is no need to keep it in a register

Is R2 needed after this statement?

R1 = R2 + 10

Need to know “live variables”: which variables are still used after the current line
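Live variables are computed backwards: a variable is live before a statement if the statement uses it, or if it is live afterwards and the statement does not redefine it. A Python sketch over a single basic block; the second statement and the set of variables live at the end of the block are hypothetical, chosen only to answer the question for R1 = R2 + 10.

```python
def live_after(stmts, live_at_exit):
    """Backward live-variable analysis over one basic block.
    stmts: list of (defined_var, used_vars); returns, for each statement,
    the set of variables live immediately after it."""
    after = [set() for _ in stmts]
    live = set(live_at_exit)
    for i in range(len(stmts) - 1, -1, -1):
        after[i] = set(live)
        defined, used = stmts[i]
        # live before = uses | (live after - definition)
        live = (live - {defined}) | set(used)
    return after


block = [
    ("R1", ["R2"]),   # R1 = R2 + 10
    ("R3", ["R1"]),   # R3 = R1 * 2   (hypothetical next statement)
]
after = live_after(block, live_at_exit={"R3"})
print("R2" in after[0])   # False: R2's register can be reused after line 1
```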

32

Data Flow - Compiler Optimization

Suppose every assignment that reaches this statement assigns 5 to c:

a = c + 10   // needs 3 registers

Then the right-hand side c + 10 can be replaced by 15:

a = 15       // needs 2 registers

But: need to know reaching definitions: which definition(s) of variable c reach this statement
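Given the reaching definitions of c at the statement, the folding decision is a simple check. A Python sketch, assuming definitions are identified by `(variable, label)` pairs and that a separate table records which definitions assign a literal constant; the labels `d1`, `d2`, `d3` and the helper name are hypothetical.

```python
def constant_at_use(var, reaching_defs, const_of):
    """Value of `var` at a use if every reaching definition of `var`
    assigns the same constant; otherwise None (cannot fold)."""
    values = {const_of.get(d) for d in reaching_defs if d[0] == var}
    if len(values) == 1 and None not in values:
        return values.pop()
    return None


# Definitions are (variable, label); const_of lists those assigning a literal.
const_of = {("c", "d1"): 5, ("c", "d2"): 5}    # d1: c = 5, d2: c = 5
reaching = {("c", "d1"), ("c", "d2")}           # both reach  a = c + 10

c = constant_at_use("c", reaching, const_of)
print(f"a = {c + 10}" if c is not None else "a = c + 10")   # a = 15

# If some path instead reaches with d3: c = read(), folding is not allowed.
print(constant_at_use("c", {("c", "d1"), ("c", "d3")}, const_of))   # None
```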

33

Data Flow - Sw Eng Tasks

Data-flow testing: suppose that a statement assigns a value, but the use of that value is never executed under test

a = c + 10
d = a + y

(a is never used on this path)

Need to know definition-use pairs: links between definition(s) and use(s) of a variable (or a memory location)

34

Data Flow - Sw Eng Tasks

Debugging: suppose that ‘a’ has an incorrect value in the statement, e.g., because of integer overflow

a = c + y
d = a + y

Need data dependence information: some statements produce erroneous values, others are affected by those values

35

Data flow - Example

Compute the flow of data throughout the program:
- Where does the assignment to i in statement 1 reach?
- Where does the expression computed in statement 2 reach?
- Which uses of variables are reachable from the end of B1?
- Is the value of variable i live after statement 2?

B1:  1. i = 2
     2. k = i + 1
B2:  3. i = 1
B3:  4. k = k + 1
B4:  5. k = k - 4

[Figure: CFG over basic blocks B1, B2, B3, B4]

36

Reaching definitions analysis

Definition = statement where a variable is assigned a value (e.g. input statement, assignment statement)

A definition of ‘a’ reaches a point ‘p’ if there exists a control flow path in the CFG from the definition to ‘p’ with no other definitions of ‘a’ on the path

Such a path may exist in the graph yet never be executable: an infeasible path


37

Reaching definitions analysis

What are the definitions in the program?
- Of variable i:
- Of variable k:

Which basic blocks do these definitions reach (at the beginning of the block)?
- Def 1 reaches:
- Def 2 reaches:
- Def 3 reaches:
- Def 4 reaches:
- Def 5 reaches:


38

Reaching definitions analysis

What are the definitions in the program?
- Of variable i: 1, 3
- Of variable k: 2, 4, 5

Which basic blocks do these definitions reach (at the beginning of the block)?
- Def 1 reaches: B2
- Def 2 reaches: B1, B2, B3
- Def 3 reaches: B1, B3, B4, exit
- Def 4 reaches: B4
- Def 5 reaches: exit


39

Reaching definitions analysis

Method:

Compute two kinds of basic information (within each block):
- GEN[B]: the set of definitions generated within B
- KILL[B]: the set of definitions that, if they reach the point before B, won’t reach the end of B

Compute two other sets by propagation:
- IN[B]: the set of definitions that reach the beginning of B
- OUT[B]: the set of definitions that reach the end of B
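GEN and KILL depend only on each block's own statements. A Python sketch for the example program, assuming each definition is numbered as in the listing and mapped to its block and the variable it defines (`DEFS` and `gen_kill` are hypothetical names).

```python
# Example program: definition number -> (block, variable defined)
DEFS = {1: ("B1", "i"), 2: ("B1", "k"), 3: ("B2", "i"), 4: ("B3", "k"), 5: ("B4", "k")}


def gen_kill(defs):
    """GEN[B]: the last definition in B of each variable B defines.
    KILL[B]: every other definition (anywhere) of a variable B defines."""
    gen, kill = {}, {}
    for b in sorted({blk for blk, _ in defs.values()}):
        last_def = {}
        for d in sorted(defs):              # program order within the block
            blk, var = defs[d]
            if blk == b:
                last_def[var] = d           # a later definition overrides
        gen[b] = set(last_def.values())
        kill[b] = {d for d, (_, var) in defs.items()
                   if var in last_def and d not in gen[b]}
    return gen, kill


gen, kill = gen_kill(DEFS)
for b in sorted(gen):
    print(b, "GEN", sorted(gen[b]), "KILL", sorted(kill[b]))
# B1 GEN [1, 2] KILL [3, 4, 5]
# B2 GEN [3] KILL [1]
# B3 GEN [4] KILL [2, 5]
# B4 GEN [5] KILL [2, 4]
```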


40

Reaching definitions analysis

Block | GEN | KILL  | Init IN | Init OUT | IN  | OUT
B1    | 1,2 | 3,4,5 | --      | 1,2      | 2,3 | 1,2
B2    | 3   | 1     | --      | 3        | 1,2 | 2,3
B3    | 4   | 2,5   | --      | 4        | 2,3 | 3,4
B4    | 5   | 2,4   | --      | 5        | 3,4 | 3,5


41

Iterative Data-Flow analysis algorithm

Algorithm for Reaching Definitions
Input:  CFG with GEN[B], KILL[B] for all B
Output: IN[B], OUT[B] for all B

begin RD
  IN[B] = empty, OUT[B] = GEN[B] for all B
  change = true
  while change do begin
    change = false
    for each B do begin
      IN[B] = union of OUT[P] over all predecessors P of B
      OLDOUT = OUT[B]
      OUT[B] = GEN[B] union (IN[B] - KILL[B])
      if (OUT[B] != OLDOUT) then change = true
    end for
  end while
end RD
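The RD algorithm translates almost line for line into Python. GEN and KILL below are taken from the example's table; the predecessor lists (B1 → B2, B2 → B1, B2 → B3, B3 → B4) are our reconstruction, chosen to match every IN set in that table. The entry node adds no definitions, so it is omitted.

```python
GEN = {"B1": {1, 2}, "B2": {3}, "B3": {4}, "B4": {5}}
KILL = {"B1": {3, 4, 5}, "B2": {1}, "B3": {2, 5}, "B4": {2, 4}}
PREDS = {"B1": ["B2"], "B2": ["B1"], "B3": ["B2"], "B4": ["B3"]}
BLOCKS = ["B1", "B2", "B3", "B4"]


def reaching_definitions(blocks, preds, gen, kill):
    inn = {b: set() for b in blocks}
    out = {b: set(gen[b]) for b in blocks}      # OUT[B] = GEN[B] initially
    change = True
    while change:
        change = False
        for b in blocks:
            # IN[B] = union of OUT[P] over all predecessors P of B
            inn[b] = set().union(*(out[p] for p in preds[b]))
            old_out = out[b]
            out[b] = gen[b] | (inn[b] - kill[b])
            if out[b] != old_out:
                change = True
    return inn, out


inn, out = reaching_definitions(BLOCKS, PREDS, GEN, KILL)
for b in BLOCKS:
    print(b, "IN", sorted(inn[b]), "OUT", sorted(out[b]))
```

In this example the second pass changes nothing, so the loop stops with exactly the IN and OUT columns of the table.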

42

Tools

- Eclipse JDT/AST (APIs to construct, traverse and manipulate ASTs): http://www.vogella.de/articles/EclipseJDT/article.html
- Sourcerer: http://sourcerer.ics.uci.edu/index.html
- Crystal (Data Analysis Framework, mostly for academic purposes): http://code.google.com/p/crystalsaf/wiki/Installation

43

Mandatory Reading List

Representation and Analysis of Software – Rep-Analysis.pdf

Crystal Notes – CrystalTutorialNotes.pdf, CrystalTutorial.ppt

Eclipse JDT - AST - http://www.vogella.de/articles/EclipseJDT/article.html

44

More (optional) Reading List

Principles of Program Analysis, Nielson and Hankin

Invariant Detection using Daikon – daikon.pdf

More optional readings available at Program Analysis course material at CMU http://www.cs.cmu.edu/~aldrich/courses/15-819M/