Program Analysis and Transformation

Post on 31-Dec-2015


Program Analysis and Transformation

Apr 19, 2023 COSC6431 2

Program Analysis

• Extracting information, in order to present abstractions of, or answer questions about, a software system

• Static Analysis: Examines the source code

• Dynamic Analysis: Examines the system as it is executing


What are we looking for?

• Depends on our goals and the system

– In almost any language, we can find out information about variable usage

– In an OO environment, we can find out which classes use other classes, which are a base of an inheritance structure, etc.

– We can also find potential blocks of code that can never be executed in running the program (dead code)

– Typically, the information extracted is in terms of entities and relationships


Entities

• Entities are individuals that live in the system, and attributes associated with them. Some examples:

– Classes, along with information about their superclass, their scope, and ‘where’ in the code they exist.

– Methods/functions and what their return type or parameter list is, etc.

– Variables and what their types are, and whether or not they are static, etc.


Relationships

• Relationships are interactions between the entities in the system. Relationships include:

– Classes inheriting from one another.

– Methods in one class calling the methods of another class, and methods within the same class calling one another.

– A method referencing an attribute.
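A minimal sketch of extracting such entities and relationships, assuming Python source as the subject language and Python's standard ast module as the parser (the slides themselves target C/C++ and Java). The sample classes and the function name extract_facts are hypothetical, and only calls of the form self.method() are resolved.

```python
import ast

# Hypothetical subject program, held as a string so the sketch is self-contained.
SOURCE = '''
class Shape:
    def area(self):
        return 0

class Triangle(Shape):
    def area(self):
        return self.half() * 2

    def half(self):
        return 1
'''

def extract_facts(source):
    """Return (relation, source_entity, target_entity) triples."""
    facts = []
    tree = ast.parse(source)
    for cls in [n for n in tree.body if isinstance(n, ast.ClassDef)]:
        # Inheritance: one fact per base class written as a simple name.
        for base in cls.bases:
            if isinstance(base, ast.Name):
                facts.append(("inherit", cls.name, base.id))
        for fn in [n for n in cls.body if isinstance(n, ast.FunctionDef)]:
            caller = f"{cls.name}.{fn.name}"
            for node in ast.walk(fn):
                # Simplification: only self.method() calls are recorded.
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Attribute)
                        and isinstance(node.func.value, ast.Name)
                        and node.func.value.id == "self"):
                    facts.append(("call", caller, f"{cls.name}.{node.func.attr}"))
    return facts

facts = extract_facts(SOURCE)
for fact in facts:
    print(*fact)
```

A real extractor would also need name resolution across classes and files, which this sketch deliberately leaves out.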


Information format

• Many different formats in use

• Simple but effective: RSF

inherit TRIANGLE SHAPE

• TA is an extension of RSF that includes a schema

$INSTANCE SHAPE Class

• GXL is an XML-like extension of TA

– Blow-up factor of 10 or more makes it rather cumbersome
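As a rough illustration of how simple the RSF triple format is to consume, the sketch below loads whitespace-separated triples into a dictionary of binary relations. Only the fact inherit TRIANGLE SHAPE comes from the slide; the other facts and the name load_rsf are made up for the example, and quoting or escaping rules of real RSF files are not handled.

```python
from collections import defaultdict

# First line is from the slide; the remaining facts are hypothetical.
RSF_TEXT = """\
inherit TRIANGLE SHAPE
inherit SQUARE SHAPE
call main draw
"""

def load_rsf(text):
    """Map each relation name to a set of (source, target) pairs."""
    relations = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        rel, src, dst = parts
        relations[rel].add((src, dst))
    return relations

relations = load_rsf(RSF_TEXT)
print(sorted(relations["inherit"]))
```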


Static Analysis

• Involves parsing the source code

• Usually creates an Abstract Syntax Tree

• Borrows heavily from compiler technology but stops before code generation

• Requires a grammar for the programming language

• Can be very difficult to get right


CppETS

• CppETS is a benchmark for C++ extractors

• It consists of a collection of C++ programs that pose various problems commonly found in parsing and reverse engineering

• Static analysis research tools typically get about 60% of the problems right


Example program

#include <iostream.h>

class Hello {
public:
    Hello();
    ~Hello();
};

Hello::Hello()
{ cout << "Hello, world.\n"; }

Hello::~Hello()
{ cout << "Goodbye, cruel world.\n"; }

main() {
    Hello h;
    return 0;
}


Example Q&A

• How many member methods are in the Hello class?

Answer: Two, the constructor (Hello::Hello()) and destructor (Hello::~Hello()).

• Where are these member methods used?

Answer: The constructor is called implicitly when an instance of the class is created. The destructor is called implicitly when the execution leaves the scope of the instance.


Static analysis in IDEs

• High-level languages lend themselves better to static analysis needs

– EiffelStudio automatically creates BON diagrams of the static structure of Eiffel systems

– Rational Rose does the same with UML and Java

• Unfortunately, most legacy systems are not written in either of these languages


Static analysis pipeline

[Figure: Source code → Parser → Abstract Syntax Tree → Fact extractor → Fact base → Clustering algorithm, Metrics tool, Visualizer]


Dynamic Analysis

• Provides information about the run-time behaviour of software systems, e.g.

– Component interactions

– Event traces

– Concurrent behaviour

– Code coverage

– Memory management

• Can be done with a profiler or a debugger


Instrumentation

• Augments the subject program with code that transmits events to a monitoring application, or writes relevant information to an output file

• A profiler can be used to examine the output file and extract relevant facts from it

• Instrumentation affects the execution speed and storage space requirements of the system
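One possible way to picture instrumentation is a wrapper that records an event on entry to and exit from each function. The sketch below is an assumption-laden toy: the EVENTS list stands in for the output file or monitoring application, and instrument, square and sum_of_squares are hypothetical names. The extra bookkeeping on every call is also where the speed and storage overhead mentioned above comes from.

```python
import functools
import time

EVENTS = []  # stand-in for the output file or monitoring application

def instrument(fn):
    """Wrap fn so that each call records an enter and an exit event."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        EVENTS.append(("enter", fn.__name__))
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # The exit event also carries the elapsed time in seconds.
            EVENTS.append(("exit", fn.__name__, time.perf_counter() - start))
    return wrapper

@instrument
def square(x):
    return x * x

@instrument
def sum_of_squares(n):
    return sum(square(i) for i in range(n))

result = sum_of_squares(3)
for event in EVENTS:
    print(event[0], event[1])
```

A profiler-like tool could then turn the recorded events into facts such as "sum_of_squares calls square".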


Instrumentation process

[Figure: Source code and Annotation script → Annotator → Annotated program → Compiler → Instrumented executable]


Dynamic analysis pipeline

[Figure: Instrumented executable → CPU → Dynamic analysis data → Profiler → Fact base → Clustering algorithm, Metrics tool, Visualizer]


Non-instrumented approach

• One can also use debugger log files to obtain dynamic information

• Disadvantage: Limited amount of information provided

• Advantage: Less intrusive approach, more accurate performance measurements


Dynamic analysis issues

• Ensuring good code coverage is a key concern

• A comprehensive test suite is required to ensure that all paths in the code will be exercised

• Results may not generalize to future executions


Static vs. Dynamic

Static analysis:

• Reasons over all possible behaviours (general results)

• Conservative and sound

• Challenge: Choose good abstractions

Dynamic analysis:

• Observes a small number of behaviours (specific results)

• Precise and fast

• Challenge: Select representative test cases


SWAGKit

• SWAGKit is used to generate software landscapes from source code

• Based on a pipeline architecture with three phases

– Extract (cppx, jfx)

– Manipulate (prep, linkplus, layoutplus)

– Present (lsedit)

• Currently usable for programs written in C/C++ and Java


The SWAGKit Pipeline

[Figure: Source code → cppx → prep → linkplus → layoutplus → lsedit → Landscape]


The SWAGKit Pipeline

Function    Filter      Input      Output
Extract     cppx        source     .ta
Manipulate  prep        .ta        .o.ta
            linkplus    *.o.ta     out.ln.ta
            layoutplus  out.ln.ta  out.ls.ta
Present     lsedit      out.ls.ta  picture


cppx & prep

• C/C++ Fact extractor based on gcc (http://swag.uwaterloo.ca/~cppx)

• Extracts facts from one source file at a time

• Facts represent program information as a series of triples

– $INSTANCE x integer == x is an integer

– inherit Student Person == Student inherits from Person

– call foo bar == foo calls bar

• Produces .c.ta files, one per source file

• Use –g option for gcc parameters


cppx & prep

• Prep is a series of scripts written in Grok

• Its function is to “clean up” the facts from cppx so that they are in a form usable by the rest of the pipeline.

• Produces one .o.ta for each .ta

• Can replace “manual” use of cppx & prep with gce

– Edit makefile, replace gcc with gce

– Type make


Grok

• A simple scripting language

• A relational algebraic calculator

– Powerful in manipulating binary relations

– Widely used in architecture transformation

• Online documentation: http://swag.uwaterloo.ca/~nsynytskyy/grokdoc/index.html


Grok Features

• Set operations

– Union (+), intersection (^), subtraction (-), cross-product (X)

• Binary relation operations

– Union (+), intersection (^), subtraction (-), composition (o, *), projection (.), domain (dom), range (rng), identity (id), inverse (inv), entity (ent), transitive closure (+), and reflexive transitive closure (*)


Grok Features Cont.

• Programming constructs

– if else

– for, while

• Arithmetic, comparison, logical operators

– +, -, *, /, %

– <, <=, ==, >=, >, !=

– !, &&, ||


Grok Scripts (1)

$ Grok
>> cat := {"Garfield", "Fluffy"}
>> mouse := {"Mickey", "Nancy"}
>> cheese := {"Roquefort", "Swiss"}
>> animals := cat + mouse
>> food := mouse + cheese
>> animalsWhichAreFood := animals ^ food
>> animalsWhichAreNotFood := animals - food
>> animalsWhichAreFood
Mickey
Nancy
>> animals - food
Garfield
Fluffy
>> #food
4
>> mouse <= food
True
>>
>> chase := cat X mouse
>> chase
Garfield Mickey
Garfield Nancy
Fluffy Mickey
Fluffy Nancy
>>
>> eat := chase + mouse X cheese
>> eat
Garfield Mickey
Garfield Nancy
Fluffy Mickey
Fluffy Nancy
Mickey Roquefort
Mickey Swiss
Nancy Roquefort
Nancy Swiss
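For readers without a Grok installation, the set part of this session can be approximated with Python's built-in sets. This is only an analogy, not Grok itself: Grok's + , ^ , - and X are mapped here to Python's | , & , - and itertools.product, and the snake_case variable names are adaptations of the ones in the session.

```python
from itertools import product

cat = {"Garfield", "Fluffy"}
mouse = {"Mickey", "Nancy"}
cheese = {"Roquefort", "Swiss"}

animals = cat | mouse                       # Grok: cat + mouse
food = mouse | cheese                       # Grok: mouse + cheese
animals_which_are_food = animals & food     # Grok: animals ^ food
animals_which_are_not_food = animals - food # Grok: animals - food

# A binary relation is modelled as a set of (source, target) pairs.
chase = set(product(cat, mouse))            # Grok: cat X mouse
eat = chase | set(product(mouse, cheese))   # Grok: chase + mouse X cheese

print(sorted(animals_which_are_food))
print(len(food), mouse <= food)             # Grok: #food and mouse <= food
print(sorted(eat))
```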


Grok Scripts (2)

>> {"Mickey"} . eat
Roquefort
Swiss
>> eat . {"Mickey"}
Garfield
Fluffy
>>
>> eater := dom eat
>> food := rng eat
>> chasedBy := inv chase
>> topOfFoodChain := dom eat - rng eat
>> bottomOfFoodChain := rng eat - dom eat
>> bothEatAndChase := eat ^ chase
>> eatButNotChase := eat - chase
>> chaseButNotEat := chase - eat
>> secondOrderEat := eat o eat
>> anyOrderEat := eat +

if expression then
    statements
else
    statements
end if

loop
    statements
    exit when condition
end loop

for variable in set
    statements
end for
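The relational operators used in this session can be sketched in Python over sets of pairs. The helper names (dom, rng, inv, compose, image, preimage, closure) are chosen to echo the Grok operators but are defined here for illustration only, and the closure is computed by naive iteration rather than by whatever algorithm Grok uses.

```python
from itertools import product

cat = {"Garfield", "Fluffy"}
mouse = {"Mickey", "Nancy"}
cheese = {"Roquefort", "Swiss"}
chase = set(product(cat, mouse))
eat = chase | set(product(mouse, cheese))

def dom(r):
    return {a for a, _ in r}

def rng(r):
    return {b for _, b in r}

def inv(r):
    return {(b, a) for a, b in r}

def compose(r, s):
    """Pairs (a, c) such that a r b and b s c for some b (Grok: r o s)."""
    return {(a, c) for a, b in r for b2, c in s if b == b2}

def image(s, r):
    """Targets reachable from the set s (Grok: s . r)."""
    return {b for a, b in r if a in s}

def preimage(r, s):
    """Sources that reach the set s (Grok: r . s)."""
    return {a for a, b in r if b in s}

def closure(r):
    """Transitive closure by repeated composition (Grok: r +)."""
    result = set(r)
    while True:
        bigger = result | compose(result, r)
        if bigger == result:
            return result
        result = bigger

top_of_food_chain = dom(eat) - rng(eat)
bottom_of_food_chain = rng(eat) - dom(eat)
second_order_eat = compose(eat, eat)
any_order_eat = closure(eat)
print(sorted(top_of_food_chain), sorted(bottom_of_food_chain))
```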


A real example

containFacts := $1
getdb containFacts
d := dom contain
r := rng contain
e := ent contain
root := d - r
leaves := r - d
rootChildren := root . contain
toKeep := leaves + rootChildren
toDelete := e - toKeep
cc := contain+
delset toDelete
delrel contain
contain := cc
relToFile contain $2

Input: A containment tree

Output: A flattened version of the containment tree


linkplus

• Function is to “link” all facts into one large graph

– Combine graphs from .o.ta files

– Resolve inter-compilation unit relationships

– Merge header files together

– Do some cleanup to shrink final graph

• Usage:

– linkplus list_of_files_to_link

• Produces out.ln.ta


layoutplus

• Adds

– Clustering of facts based on contain.rsf (created manually or from a clustering algorithm)

– Layout information so that graph can be displayed

– Schema information

• Usage

– layoutplus contain_file out.ln.ta

• Produces out.ls.ta


lsedit

• View software landscape produced by previous parts of the pipeline

• Can make changes to landscape and save them

• Usage

– lsedit out.ls.ta


Program Representation

• Fundamental issue in re-engineering

– Provides means to generate abstractions

– Provides input to a computational model for analyzing and reasoning about programs

– Provides means for translation and normalization of programs


Key questions

• What are the strengths and weaknesses of various representations of programs?

• What levels of abstraction are useful?


Abstract Syntax Trees

• A translation of the source text in terms of operands and operators

• Omits superficial details, such as comments, whitespace

• All necessary information to generate further abstractions is maintained


AST production

• Four necessary elements to produce an AST:

– Lexical analyzer (turn input strings into tokens)

– Grammar (turn tokens into a parse tree)

– Domain Model (defines the nodes and arcs allowable in the AST)

– Linker (annotates the AST with global information, e.g. data types, scoping etc.)


AST example

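• Input string: 1 + /* two */ 2

• Parse Tree: [Figure: a + node whose two children are the int tokens 1 and 2]

• AST (without global info): [Figure: an Add node with arcs arg1 and arg2 leading to 1 and 2]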
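The lexer and grammar steps for this input can be sketched in a few lines. The sketch assumes a tiny language of non-negative integers joined by +, with C-style comments; the function names tokenize and parse and the dictionary layout of the nodes are illustrative choices, with the node and arc names (Add, arg1, arg2, int) borrowed from the figure.

```python
import re

def tokenize(text):
    """Lexical analysis: drop comments, then return number and '+' tokens.
    Simplification: any other characters are silently ignored."""
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.S)
    return re.findall(r"\d+|\+", text)

def parse(tokens):
    """Grammar: expr := number ('+' number)*, built left-associatively."""
    pos = 0

    def number():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected a number")
        value = int(tokens[pos])
        pos += 1
        return {"node": "int", "value": value}

    tree = number()
    while pos < len(tokens):
        if tokens[pos] != "+":
            raise SyntaxError("expected '+'")
        pos += 1
        tree = {"node": "Add", "arg1": tree, "arg2": number()}
    return tree

tree = parse(tokenize("1 + /* two */ 2"))
print(tree)
```

The comment and whitespace never reach the tree, which matches the point that an AST omits superficial details.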


Program Transformation

• A program is a structured object with semantics

• Structure allows us to transform a program

• Semantics allow us to compare programs and decide on the validity of transformations


Program Transformation

• The act of changing one program into another (from a source language to a target language)

• Used in many areas of software engineering:– Compiler construction

– Software visualization

– Documentation generation

– Automatic software renovation


Application examples

• Converting to a new language dialect

• Migrating from a procedural language to an object-oriented one, e.g. C to C++

• Adding code comments

• Requirement upgrading, e.g. using 4 digits for years instead of 2 (Y2K)

• Structural improvements, e.g. changing GOTOs to control structures

• Pretty printing


Simple program transformation

• Modify all arithmetic expressions to reduce the number of parentheses using the formula: (a+b)*c = a*c + b*c

x := (2+5)*3

becomes

x := 2*3 + 5*3
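A small sketch of this transformation on an expression tree, assuming expressions are nested tuples of the form (operator, left, right) with integer leaves. The names distribute, show and evaluate are hypothetical, and only the pattern from the formula, a sum on the left of a product, is rewritten.

```python
def distribute(e):
    """Rewrite (a + b) * c into a*c + b*c, bottom-up."""
    if not isinstance(e, tuple):
        return e
    op, left, right = e
    left, right = distribute(left), distribute(right)
    if op == "*" and isinstance(left, tuple) and left[0] == "+":
        _, a, b = left
        return ("+", distribute(("*", a, right)), distribute(("*", b, right)))
    return (op, left, right)

def show(e):
    """Print an expression, adding parentheses only where a sum sits under '*'."""
    if not isinstance(e, tuple):
        return str(e)
    op, left, right = e
    if op == "+":
        return f"{show(left)} + {show(right)}"
    def wrap(x):
        return f"({show(x)})" if isinstance(x, tuple) and x[0] == "+" else show(x)
    return f"{wrap(left)}*{wrap(right)}"

def evaluate(e):
    """Used to check that the transformation preserves the value."""
    if not isinstance(e, tuple):
        return e
    op, left, right = e
    return evaluate(left) + evaluate(right) if op == "+" else evaluate(left) * evaluate(right)

before = ("*", ("+", 2, 5), 3)
after = distribute(before)
print(show(before), "becomes", show(after))
```

Comparing evaluate(before) with evaluate(after) is a cheap way to check that this particular rewrite kept the semantics.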


Two types of transformations

• Translation

– Source and target language are different

– Semantics remain the same

• Rephrasing

– Source and target language are the same

– Goal is to improve some aspect of the program such as its understandability or performance

– Semantics might change


Translation

• Program synthesis

– Lowers the level of abstraction, e.g. compilation

• Program migration

– Transform to a different language

• Reverse Engineering

– Raises the level of abstraction, e.g. create architectural descriptions from the source code

• Program Analysis

– Reduces the program to one aspect, e.g. control flow


Translation taxonomy


Rephrasing

• Program normalization

– Decreases syntactic complexity (desugaring), e.g. algebraic simplification of expressions

• Program optimization

– Improves performance, e.g. inlining, common-subexpression and dead code elimination


Rephrasing

• Program refactoring

– Improves the design by restructuring while preserving the functionality

• Program obfuscation

– Deliberately makes the program harder to understand

• Software renovation

– Fixes bugs such as Y2K


Transformation tools

• There are many transformation tools

• Program-Transformation.org lists 90 of them

• Most are based on term rewriting

• Other solutions use functional programming, lambda calculus, etc.


Term rewriting

• The process of simplifying symbolic expressions (terms) by means of a Rewrite System, i.e. a set of Rewrite Rules.

• A Rewrite Rule is of the form lhs → rhs, where lhs and rhs are term patterns


Example Rewrite System

0 + x → x

s(x) + y → s(x + y)

(x + y) + z → x + (y + z)

Under these rewrite rules, the term ((s(s(a)) + s(b)) + c) will be rewritten as s(s(a + s(b + c))). With one further rule, x + s(y) → s(x + y), it will be rewritten as s(s(s(a + (b + c))))
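The rewrite system can be tried out with a small innermost-first rewriter. This is a sketch under stated assumptions: terms are nested tuples ("+", l, r) and ("s", x) with string atoms, and step and normalize are hypothetical names. With only the three rules listed, normalization stops at s(s(a + s(b + c))); the form s(s(s(a + (b + c)))) is reached only when an extra rule x + s(y) → s(x + y) is enabled, which the sketch exposes as an optional flag.

```python
def step(t, extra):
    """Apply one rule at the root of t, or return None if none matches."""
    if not (isinstance(t, tuple) and t[0] == "+"):
        return None
    _, left, right = t
    if left == "0":                                    # 0 + x -> x
        return right
    if isinstance(left, tuple) and left[0] == "s":     # s(x) + y -> s(x + y)
        return ("s", ("+", left[1], right))
    if isinstance(left, tuple) and left[0] == "+":     # (x + y) + z -> x + (y + z)
        return ("+", left[1], ("+", left[2], right))
    # Assumed extra rule, not among the three listed: x + s(y) -> s(x + y)
    if extra and isinstance(right, tuple) and right[0] == "s":
        return ("s", ("+", left, right[1]))
    return None

def normalize(t, extra=False):
    """Rewrite subterms first, then the root, until no rule applies."""
    if isinstance(t, tuple):
        t = (t[0],) + tuple(normalize(c, extra) for c in t[1:])
    nxt = step(t, extra)
    return t if nxt is None else normalize(nxt, extra)

def show(t):
    if isinstance(t, str):
        return t
    if t[0] == "s":
        return f"s({show(t[1])})"
    return f"({show(t[1])} + {show(t[2])})"

term = ("+", ("+", ("s", ("s", "a")), ("s", "b")), "c")
three_rules = normalize(term)
with_extra = normalize(term, extra=True)
print(show(three_rules))
print(show(with_extra))
```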


TXL

• A generalized source-to-source translation system

• Uses a context-free grammar to describe the structures to be transformed

• Rule specification uses a by-example style

• Has been used to process billions of lines of code for Y2K purposes


TXL programs

• TXL programs consist of two parts:

– Grammar for the input language

– Transformation Rules

• Let’s look at some examples…


Calculator.Txl - Grammar

% Part I. Syntax specification

define program
    [expression]
end define

define expression
    [term]
  | [expression] [addop] [term]
end define

define term
    [primary]
  | [term] [mulop] [primary]
end define

define primary
    [number]
  | ( [expression] )
end define

define addop
    '+
  | '-
end define

define mulop
    '*
  | '/
end define


Calculator.Txl - Rules

% Part 2. Transformation rules

rule main
    replace [expression]
        E [expression]
    construct NewE [expression]
        E [resolveAddition] [resolveSubtraction] [resolveMultiplication]
          [resolveDivision] [resolveParentheses]
    where not
        NewE [= E]
    by
        NewE
end rule

rule resolveAddition
    replace [expression]
        N1 [number] + N2 [number]
    by
        N1 [+ N2]
end rule

rule resolveSubtraction …

rule resolveMultiplication …

rule resolveDivision …

rule resolveParentheses
    replace [primary]
        ( N [number] )
    by
        N
end rule


DotProduct.Txl

% Form the dot product of two vectors,
% e.g., (1 2 3).(3 2 1) => 10

define program
    ( [repeat number] ) . ( [repeat number] )
  | [number]
end define

rule main
    replace [program]
        ( V1 [repeat number] ) . ( V2 [repeat number] )
    construct Zero [number]
        0
    by
        Zero [addDotProduct V1 V2]
end rule

rule addDotProduct V1 [repeat number] V2 [repeat number]
    deconstruct V1
        First1 [number] Rest1 [repeat number]
    deconstruct V2
        First2 [number] Rest2 [repeat number]
    construct ProductOfFirsts [number]
        First1 [* First2]
    replace [number]
        N [number]
    by
        N [+ ProductOfFirsts] [addDotProduct Rest1 Rest2]
end rule
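The recursion in addDotProduct may be easier to follow in a conventional language. The Python sketch below mirrors its structure under the assumption that a failed deconstruct (an empty vector) simply means the rule does not apply and the accumulated number is left unchanged; the function name add_dot_product is an adaptation, not part of TXL.

```python
def add_dot_product(n, v1, v2):
    """Add the dot product of v1 and v2 to the running total n."""
    # Corresponds to the two deconstructs: stop when either vector is empty.
    if not v1 or not v2:
        return n
    first1, rest1 = v1[0], v1[1:]
    first2, rest2 = v2[0], v2[1:]
    product_of_firsts = first1 * first2
    # N [+ ProductOfFirsts] [addDotProduct Rest1 Rest2]
    return add_dot_product(n + product_of_firsts, rest1, rest2)

print(add_dot_product(0, [1, 2, 3], [3, 2, 1]))
```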


Sort.Txl

% Sort.Txl - simple numeric bubble sort

define program
    [repeat number]
end define

rule main
    replace [repeat number]
        N1 [number] N2 [number] Rest [repeat number]
    where
        N1 [> N2]
    by
        N2 N1 Rest
end rule
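The sort works because the single rule is reapplied until it no longer matches anywhere. A rough Python equivalent, assuming that the first out-of-order adjacent pair is swapped and the search then restarts (the exact search order TXL uses is not modelled), with sort_by_rewriting as a hypothetical name:

```python
def sort_by_rewriting(numbers):
    """Swap adjacent out-of-order pairs until no such pair remains."""
    items = list(numbers)
    changed = True
    while changed:
        changed = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:          # where N1 [> N2]
                items[i], items[i + 1] = items[i + 1], items[i]  # by N2 N1 Rest
                changed = True
                break                            # restart the search
    return items

print(sort_by_rewriting([3, 1, 2]))
```

The loop ends exactly when the sequence is sorted, since a sorted sequence is the only one in which the rule's condition matches nowhere.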


Other TXL constructs

compounds
    -> :=
end compounds

keys
    var procedure exists inout out
end keys

function isAnAssignmentTo X [id]
    match [statement]
        X := Y [expression]
end function


www.txl.ca

• Guided Tour

• Many examples

• Reference manual

• Download TXL for many platforms


Example uses

• HTML Pretty Printing of Source Code

• Language to Language Translation

• Design Recovery from Source

• Improvement of security problems

• Program instrumentation and measurement

• Logical formula simplification and interpretation.