Page 1: Dataflow Frequency Analysis based on Whole Program Paths Eduard Mehofer Institute for Software Science University of Vienna mehofer@par.univie.ac.at mehofer.

Dataflow Frequency Analysis based on Whole Program Paths

Eduard Mehofer

Institute for Software Science

University of Vienna

mehofer@par.univie.ac.at

www.par.univie.ac.at/~mehofer

Bernhard Scholz

Institute of Computer Languages

Vienna University of Technology

[email protected]

www.complang.tuwien.ac.at/scholz


Page 2

Dataflow Frequency Analysis

Goal:
– accurately compute the frequencies of data flow facts

Problem:
– high costs for computing accurate frequencies
• requires the whole program path
• efficient data structures and algorithm?

Approach:
– exploit the algebraic properties of bi-distributive DFA problems
– employ WPPs to capture control flow
– compute frequencies in a bottom-up style on the WPP graph


Page 3

Outline

Motivation

WPP profiling

Properties of bi-distributive DFAs

Algorithm

Experiments

Conclusion


Page 4

Classical Approach

Classical Program Optimization:
– data flow analysis
– transformation

[Figure: Program → Optimizer → Optimized program; the optimizer works on binary information]

Drawback:
[Figure: program regions labeled heavily, rarely, never]


Page 5

Profiling Approach

Probabilistic Program Optimization:
– dataflow freq. analysis
– transformation

Optimizer based on profiling

[Figure: Program and Profile → Optimizer → Optimized program; the optimizer works on frequency information]

Advantage:
[Figure: program regions labeled heavily, rarely, never]


Page 6

Running Example

Example
– simple code fragment
– 8 times left branch
– terminates via right branch

Reaching definitions problem
– two definitions: d1, d2
– d1 kills d2 and vice versa
– use of x at the end of loop

Questions
– How often does d1 hold at node 5?
– How often does d2 hold at node 5?

[Figure: CFG Example with nodes s, 1, 2, 3, 4, 5; labels d1: x:=..., d2: x:=..., and the use ...x...]


Page 7

WPP Profiling

Captures the whole program path
– Larus at PLDI’99

Path profiling techniques for acyclic paths
– minimal insertion of instrumentation code
– keeps executable fast

Sequitur for compression
– builds a grammar
– terminals are acyclic paths
– nonterminals have only one production
– graph representation of grammar
– grammar has only one sentence
– best case: logarithmic size reduction


Page 8

WPP Example

[Figure: CFG Example with nodes s, 1, 2, 3, 4, 5]

Program Run
- 8x left branch
- 1x right branch

WPP Graph & Grammar

[Figure: WPP graph with nonterminals S, A and terminals a, b, c]

S → a A A A b c
A → b b

Terminals:
a: [s,1,2,4]
b: [1,2,4]
c: [1,3,4,5]
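
The grammar above can be expanded back into the program run it compresses. The following is a minimal sketch, assuming the productions S → a A A A b c and A → b b and the three terminals listed on the slide; the dictionary layout and the names `GRAMMAR`, `TERMINALS`, `expand` and `node_trace` are illustrative and not taken from the original tool.

```python
# Illustrative encoding of the slide's WPP grammar (names are assumptions).
GRAMMAR = {
    "S": ["a", "A", "A", "A", "b", "c"],
    "A": ["b", "b"],
}
# Terminals are acyclic paths through the CFG.
TERMINALS = {
    "a": ["s", 1, 2, 4],
    "b": [1, 2, 4],
    "c": [1, 3, 4, 5],
}


def expand(symbol):
    """Return the list of terminals derived from a grammar symbol."""
    if symbol in TERMINALS:
        return [symbol]
    result = []
    for child in GRAMMAR[symbol]:
        result.extend(expand(child))
    return result


def node_trace(symbol):
    """Return the sequence of CFG nodes visited by the derived path."""
    trace = []
    for terminal in expand(symbol):
        trace.extend(TERMINALS[terminal])
    return trace


terminal_string = "".join(expand("S"))
trace = node_trace("S")
print(terminal_string)                  # abbbbbbbc
print(trace.count(2), trace.count(3))   # 8 1
```

The expansion contains node 2 eight times and node 3 once, which matches the program run of 8x left branch and 1x right branch.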


Page 9

Bi-Distributive Dataflow Problems

Properties
– finite lattice $2^D$ (power set of dataflow facts)
– transition functions are monotone
– transition functions distribute over union and intersection:
\[ f(X \cup Y) = f(X) \cup f(Y) \]
\[ f(X \cap Y) = f(X) \cap f(Y) \]
– representation relation
– covers bit-vector problems

Due to these properties
– transition functions are represented as 0/1-matrices
– states are represented as 0/1-vectors
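
As a small illustration of the bi-distributivity property, the sketch below exhaustively checks both distribution laws for a gen/kill transition function over the two reaching definitions d1 and d2. The helper names (`make_gen_kill`, `is_bi_distributive`) and the counterexample function are assumptions made for this example only.

```python
from itertools import combinations

D = frozenset({"d1", "d2"})


def powerset(facts):
    """All subsets of a finite set of data flow facts."""
    items = sorted(facts)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def make_gen_kill(gen, kill):
    """Bit-vector style transition function f(X) = (X - kill) | gen."""
    gen, kill = frozenset(gen), frozenset(kill)

    def f(x):
        return (frozenset(x) - kill) | gen

    return f


def is_bi_distributive(f, domain):
    """Check f(X u Y) = f(X) u f(Y) and f(X n Y) = f(X) n f(Y) for all X, Y."""
    subsets = powerset(domain)
    return all(
        f(x | y) == f(x) | f(y) and f(x & y) == f(x) & f(y)
        for x in subsets
        for y in subsets
    )


# Definition d1 generates d1 and kills d2.
f_d1 = make_gen_kill(gen={"d1"}, kill={"d2"})


def needs_both(x):
    """A function that is NOT distributive over union (counterexample)."""
    return D if D <= frozenset(x) else frozenset()


print(is_bi_distributive(f_d1, D))        # True
print(is_bi_distributive(needs_both, D))  # False
```

Gen/kill functions pass the check, which is why bit-vector problems such as reaching definitions are covered.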


Page 10

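Representation Relation

Transition function $f: 2^D \to 2^D$
– represented by $f_r: D \cup \{\Lambda\} \to 2^{D \cup \{\Lambda\}}$
– artificial data fact $\Lambda$
\[ f_r(\Lambda) = f(\emptyset) \cup \{\Lambda\} \]
\[ f_r(d) = f(\{d\}) - f(\emptyset) \]

Example: M(2→4)

[Figure: path 1 → 2 → 4 with d1: x:=... at node 2]

– $f_r(\Lambda) = \{d_1, \Lambda\}$
– $f_r(d_1) = \{\}$
– $f_r(d_2) = \{\}$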
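
The representation relation can be derived mechanically from a transition function. The sketch below assumes the definition f_r(Λ) = f(∅) ∪ {Λ} and f_r(d) = f({d}) − f(∅), and models node 2 as a gen/kill function that generates d1 and kills d2; the string "Λ" and the function names are stand-ins chosen for this example.

```python
LAMBDA = "Λ"  # stand-in name for the artificial data flow fact


def gen_kill(gen, kill):
    """Transition function f(X) = (X - kill) | gen."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda facts: (frozenset(facts) - kill) | gen


def representation_relation(f, domain):
    """Map every fact (and the artificial fact) to the facts it produces."""
    base = f(frozenset())                    # facts generated from nothing
    relation = {LAMBDA: base | {LAMBDA}}     # f_r(Λ) = f(∅) ∪ {Λ}
    for d in domain:
        relation[d] = f(frozenset({d})) - base   # f_r(d) = f({d}) − f(∅)
    return relation


# Node 2 holds "d1: x := ...", which generates d1 and kills d2.
f_2 = gen_kill(gen={"d1"}, kill={"d2"})
f_r = representation_relation(f_2, ["d1", "d2"])

for fact in ["d1", "d2", LAMBDA]:
    print(fact, "->", set(f_r[fact]))
```

Only the artificial fact has outgoing entries here, because d1 is generated unconditionally and d2 is killed.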


Page 11

Matrix Representation

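Matrix representation of function f
\[ a_{ij} = \begin{cases} 1, & \text{if } d_i \in f_r(d_j) \\ 0, & \text{otherwise} \end{cases} \]

Example
\[ M(2 \to 4)(\{d_1, d_2\}) = \{d_1\} \]
\[ M_A(2 \to 4) \cdot \{d_1, d_2, \Lambda\} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \]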
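
The step from representation relation to 0/1-matrix is a direct table lookup. The sketch below assumes the fact ordering (d1, d2, Λ) and the relation of edge 2→4 from the previous slide (f_r(Λ) = {d1, Λ}, f_r(d1) = f_r(d2) = {}); the names `to_matrix` and `apply` are illustrative.

```python
FACTS = ["d1", "d2", "Λ"]  # assumed ordering of rows and columns

# Representation relation of edge 2 -> 4 as given on the slide.
F_R = {"d1": set(), "d2": set(), "Λ": {"d1", "Λ"}}


def to_matrix(relation, facts):
    """a[i][j] = 1 if facts[i] is in relation[facts[j]], else 0."""
    return [[1 if di in relation[dj] else 0 for dj in facts] for di in facts]


def apply(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


M_24 = to_matrix(F_R, FACTS)
print(M_24)                    # [[0, 0, 1], [0, 0, 0], [0, 0, 1]]
print(apply(M_24, [1, 1, 1]))  # [1, 0, 1]
```

Applying the matrix to the vector for {d1, d2, Λ} yields the vector for {d1, Λ}: d2 is killed and d1 is generated.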


Page 12

Dataflow Frequencies

Definition of dataflow frequencies for node v
\[ y(v) = \sum_{\pi \in \mathrm{Prefix}(\pi_r, v)} \beta(\mathrm{state}(\pi)) \]
– $\pi_r$: whole program path
– $\mathrm{Prefix}(\pi_r, v)$: set of all sub-paths from the start node to node v
– $\beta$: converts data flow facts to a 0/1-vector
– $\mathrm{state}(\pi)$: data flow facts which hold along path $\pi$
– the sum counts the occurrences of data flow facts which hold in v

Approach for fast computation
– adapt the definition to the grammar symbols of SEQUITUR

[Figure: path from start node s to node v]
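
For intuition, the definition can be evaluated directly by walking the uncompressed program run of the running example. This is a brute-force sketch, not the grammar-based algorithm of the talk. It assumes that d1 sits at node 2 and d2 at node 3, that all other nodes leave the facts unchanged, and that no definition holds at the start node; all names are illustrative.

```python
from collections import Counter, defaultdict

# Uncompressed run of the running example: a, then 7 x b, then c.
TRACE = ["s", 1, 2, 4] + [1, 2, 4] * 7 + [1, 3, 4, 5]

# Assumed placement of the definitions: node -> (gen, kill).
GEN_KILL = {2: ({"d1"}, {"d2"}), 3: ({"d2"}, {"d1"})}


def frequencies(trace, gen_kill):
    """Count, per node, how often each fact holds on entry to that node."""
    y = defaultdict(Counter)
    visits = Counter()
    state = set()                       # no definition holds initially
    for node in trace:
        visits[node] += 1
        y[node].update(state)           # facts holding when the node is reached
        gen, kill = gen_kill.get(node, (set(), set()))
        state = (state - kill) | gen    # apply the node's transition function
    return y, visits


y, visits = frequencies(TRACE, GEN_KILL)
print(visits[4], y[4]["d1"], y[4]["d2"])  # 9 8 1
print(visits[5], y[5]["d1"], y[5]["d2"])  # 1 0 1
```

The cost of this walk grows with the length of the whole program path, which is the motivation for computing the same numbers on the compressed grammar instead.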


Page 13

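Frequency Matrix

Definition of frequency matrices
\[ F_{\pi_r}(v) = \sum_{[u_1, u_2, \ldots, u_{k-1}, v] \in \mathrm{Prefix}(\pi_r, v)} M_A([u_1, u_2, \ldots, u_{k-1}, v]) \]
– the sum is computed by matrix calculus

Frequency matrices for eliminating the sum
\[ y(v) = \sum_{\pi \in \mathrm{Prefix}(\pi_r, v)} \beta(\mathrm{state}(\pi)) = F_{\pi_r}(v) \cdot \beta(c) \]

Computation of frequency matrices for grammar symbols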


Page 14

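Terminals

Transition function
– compose function for acyclic path t: [u1, u2, ..., uk]
– represent transition function as matrix
\[ M_A(t) = M_A([u_1, u_2, \ldots, u_k]) = M_A(u_{k-1} \to u_k) \cdots M_A(u_1 \to u_2) \]

Frequency matrix
\[ F_t(v) = \begin{cases} M_A([u_1, u_2, \ldots, v]) & \text{if } v \in \{u_1, u_2, \ldots, u_k\} \\ 0 & \text{otherwise} \end{cases} \]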


Page 15

Nonterminals

Transition function
– compose transition function for nt → X1 X2 ... Xk
– represent transition function as matrix
\[ M_A(nt) = M_A(X_1 X_2 \cdots X_k) = M_A(X_k) \cdots M_A(X_2) \cdot M_A(X_1) \]

Frequency matrix
\[ F_{nt}(v) = F_{X_1}(v) + F_{X_2}(v) \cdot M_A(X_1) + \cdots + F_{X_k}(v) \cdot M_A(X_1 X_2 \cdots X_{k-1}) \]


Page 16

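Example

Terminal b: [1,2,4]
\[ F_b(4) = M_A([1,2,4]) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

[Figure: path 1 → 2 → 4 with d1: x:=... at node 2]

Nonterminal A → b b
\[ M_A(A) = M_A(b\,b) = M_A(b) \cdot M_A(b) = M_A(b) \]
\[ F_A(4) = F_b(4) + F_b(4) \cdot M_A(b) = \begin{pmatrix} 0 & 0 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix} \]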
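
The arithmetic of this example can be replayed with plain list-based matrices. The sketch below assumes the fact ordering (d1, d2, Λ) and takes the matrix of terminal b from the slide; `mat_mul` and `mat_add` are small helpers written for this example.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Terminal b = [1, 2, 4]; rows and columns are ordered (d1, d2, Λ).
M_b = [[0, 0, 1],
       [0, 0, 0],
       [0, 0, 1]]
F_b_4 = M_b                      # node 4 is the last node of path b

# Nonterminal A -> b b
M_A = mat_mul(M_b, M_b)                        # second b applied after first b
F_A_4 = mat_add(F_b_4, mat_mul(F_b_4, M_b))    # F_b(4) + F_b(4) * M(b)

print(M_A == M_b)   # True
print(F_A_4)        # [[0, 0, 2], [0, 0, 0], [0, 0, 2]]
```

The entry 2 in the d1 row reflects that node 4 is reached twice within A, each time with d1 holding.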


Page 17

Algorithm

Pseudo-Code

forall v ∈ N do
    forall t ∈ T do
        compute terminal t for node v
    endfor
    forall nt ∈ NT in reverse topological order do
        compute nonterminal nt for node v
    endfor
endfor

\[ y(v) = F_S(v) \cdot \beta(c) \]
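
The pseudo-code can be sketched end to end on the running example. Everything below is an illustration under stated assumptions rather than the authors' implementation: d1 is placed at node 2 and d2 at node 3, an edge u→v carries the transition function of node u, the nodes that end a terminal (4 and 5) have identity transitions, the initial fact set is empty so that the initial vector selects only Λ, and the order A before S is written out by hand.

```python
FACTS = ["d1", "d2", "Λ"]          # assumed row/column ordering
N = len(FACTS)
GRAMMAR = {"S": ["a", "A", "A", "A", "b", "c"], "A": ["b", "b"]}
ORDER = ["A", "S"]                 # reverse topological order, hand-written
TERMINALS = {"a": ["s", 1, 2, 4], "b": [1, 2, 4], "c": [1, 3, 4, 5]}


def identity():
    return [[int(i == j) for j in range(N)] for i in range(N)]


def zeros():
    return [[0] * N for _ in range(N)]


def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gen_kill_matrix(gen, kill):
    """0/1-matrix of the representation relation of f(X) = (X - kill) | gen."""
    m, lam = zeros(), FACTS.index("Λ")
    m[lam][lam] = 1
    for g in gen:
        m[FACTS.index(g)][lam] = 1          # generated facts stem from Λ
    for j, d in enumerate(FACTS):
        if d != "Λ" and d not in gen and d not in kill:
            m[j][j] = 1                     # untouched facts pass through
    return m


NODE_MATRIX = {2: gen_kill_matrix({"d1"}, {"d2"}),
               3: gen_kill_matrix({"d2"}, {"d1"})}


def terminal(path, v):
    """Return (M(t), F_t(v)) for an acyclic path t."""
    m = identity()
    f = identity() if path[0] == v else zeros()
    for u, w in zip(path, path[1:]):
        m = mul(NODE_MATRIX.get(u, identity()), m)   # edge u -> w
        if w == v:
            f = m
    return m, f


def frequencies(v, start="S"):
    """Dataflow frequencies y(v), computed bottom-up on the grammar."""
    M, F = {}, {}
    for t, path in TERMINALS.items():
        M[t], F[t] = terminal(path, v)
    for nt in ORDER:
        m, f = identity(), zeros()
        for x in GRAMMAR[nt]:
            f = add(f, mul(F[x], m))    # F_X(v) times matrix of the prefix
            m = mul(M[x], m)
        M[nt], F[nt] = m, f
    init = [0, 0, 1]                    # assumed initial vector: only Λ holds
    y = [sum(F[start][i][j] * init[j] for j in range(N)) for i in range(N)]
    return dict(zip(FACTS, y))


print(frequencies(4))  # {'d1': 8, 'd2': 1, 'Λ': 9}
print(frequencies(5))  # {'d1': 0, 'd2': 1, 'Λ': 1}
```

Under these assumptions the Λ entry equals the number of visits to the node, and the d1 and d2 entries count how often each definition reaches it. Each grammar symbol is processed once per node, so the work depends on the size of the grammar rather than on the length of the expanded path.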


Page 18

Example

Transition matrices and frequency matrices for terminals

[Figure: WPP graph with nonterminals S, A and terminals a, b, c, shown once for each of the terminals a, b and c]


Page 19

Example

Transition matrices and frequency matrices for nonterminals

[Figure: WPP graph with nonterminals S, A and terminals a, b, c, shown once for each of the nonterminals A and S]

The frequency matrix of start symbol S contains the dataflow frequency information!


Page 20

Experiments

gcc compiler 2.95.2
– data flow frequency analysis written in C++/C
– implementation of WPP (runtime & compile time)

Benchmark
– some programs of SPECint95
– reaching definitions problem

Environment
– Sun Ultra Enterprise 450 (4 x 296 MHz) with 2.5 GB


Page 21

Node Statistics

[Figure: bar chart of node counts (y-axis: Nodes, 0 to 12000); legend: not executed, executed w/o DFA, analyzed]

– about 40% of nodes are executed
– no computations are required for 60% of nodes


Page 22

WPP Size & Overhead

[Figure: two bar charts over the benchmarks 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 134.perl; first chart: WPP Size in Kbytes (axis 0 to 20000); second chart: Compile Overhead in % (axis 0 to 35)]

- Compile time overhead almost proportional to WPP size


Page 23

Conclusion

Novel dataflow frequency analysis
– designed for bi-distributive dataflow analysis problems
– matrix representation of transition functions
– employs SEQUITUR grammars

Accurate and efficient algorithm

Experiments
– platform: gcc for Ultra 450
– benchmark: reaching definitions problem for SPECint95
– overhead is proportional to the size of the WPP


Page 24

Stop!