Download - Iterative Data-flow Analysis C OMP 512 Rice University Houston, Texas Fall 2003 Copyright 2003, Keith D. Cooper Linda Torczon, all rights reserved. Students.

Iterative Data-flow Analysis

COMP 512Rice UniversityHouston, Texas

Fall 2003

Copyright 2003, Keith D. Cooper & Linda Torczon, all rights reserved.Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.

New lecture, 2003

COMP 512, Fall 2003 2

Review

Last class:• Looked at Global Common Subexpression Elimination (Cocke

70)• Defined the available expressions problem as the key to finding

opportunities and proving safety

AVAIL(n0) = ØAVAIL(b) = xpred(b) (DEEXPR(x) (AVAIL(x) EXPRKILL(x) ))

• Looked at an algorithm to solve these equations Compute initial information: DEEXPR & EXPRKILL Apply an iterative solver to find a fixed-point solution

Today• Why does the iterative solver work?

COMP 512, Fall 2003 3

Data-flow Analysis

DefinitionData-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values

• Almost always involves building a graph Problems are trivial on a basic block Global problems control-flow graph (or derivative) Whole program problems call graph (or derivative)

• Usually formulated as a set of simultaneous equations Sets attached to nodes and edges Semilattice to describe values We solved AVAIL with an iterative fixed-point algorithm

• Desired result is usually meet over all paths solution “What is true on every path from the entry?” “Can this happen on any path from the entry?” Related to the safety of optimization (how we use the

results)

COMP 512, Fall 2003 4

Data-flow Analysis

Limitations1. Precision – “up to symbolic execution”

Assume all paths are taken2. Solution – cannot afford to compute MOP solution

Large class of problems where MOP = MFP= LFP Not all problems of interest are in this class

3. Arrays – treated naively in classical analysis Represent whole array with a single fact

4. Pointers – difficult (and expensive) to analyze Imprecision rapidly adds up Need to ask the right questions

SummaryFor scalar values, we can quickly solve simple problems

Good news:Simple problems can carry us pretty far

*

COMP 512, Fall 2003 5

Data-flow Analysis

SemilatticeA semilattice is a set L and a meet operation such that, a, b, & c L :

1. a a = a2. a b = b a3. a (b c) = (a b) c

imposes an order on L, a, b, & c L :1. a ≥ b a b = b2. a > b a ≥ b and a ≠ b

A semilattice has a bottom element, denoted 1. a L, a = 2. a L, a ≥

COMP 512, Fall 2003 6

Data-flow Analysis

How does this relate to data-flow analysis?• Choose a semilattice to represent the facts• Attach a meaning to each a L

Each a L is a distinct set of known facts• With each node n, associate a function fn : L L

fn models behavior of code in block corresponding to n• Let F be the set of all functions that the code might generate

Example — AVAIL

• Semilattice is (2E,), where E is the set of all expressions & is Set are bigger than |variables|, is Ø

• For a node n, fn has the form fn(x) = an (x bn ) Where an is DEExpr(n) and bn is EXPRKILL(n)

COMP 512, Fall 2003 7

Concrete Example: Available Expressions

m a + bn a + b

A

p c + dr c + d

B

y a + bz c + d

G

q a + br c + d

C

e b + 18s a + bu e + f

D e a + 17t c + du e + f

E

v a + bw c + dx e + f

F

E = {a+b,c+d,e+f,a+17,b+18}2E is the set of all subsets of E

2E = [ {a+b,c+d,e+f,a+17,b+18}, {a+b,c+d,e+f,a+17}, {a+b,c+d,e+f,b+18}, {a+b,c+d,a+17,b+18}, {a+b,e+f,a+17,b+18}, {c+d,e+f,a+17,b+18}, {a+b,c+d,e+f}, {a+b,c+d,b+18}, {a+b,c+d,a+17}, {a+b,e+f,a+17}, {a+b,e+f,b+18},{a+b,a+17,b+18}, {c+d,e+f,a+17}, {c+d,e+f,b+18}, {c+d,a+17,b+18},{e+f,a+17,b+18}, {a+b,c+d},{a+b,e+f},{a+b,a+17}, {a+b,b+18},{c+d,e+f},{c+d,a+17}, {c+d,b+18},{e+f,a+17},{e+f,b+18}, {a+17,b+18},{a+b}, {c+d}, {e+f}, {a+17}, {b+18}, {} ]

COMP 512, Fall 2003 8


The Lattice

{ }

{a+b} {c+d} {e+f} {a+17} {b+18}

{a+b,c+d} {a+b,a+17} {c+d,e+f} {c+d,b+18} {e+f,b+18} {a+b,e+f} {a+b,b+18} {c+d,a+17} {e+f,a+17} {a+17,b+18}

{a+b,c+d,e+f} {a+b,c+d,b+18} {a+b,c+d,a+17} {a+b,e+f,a+17} {a+b,e+f,b+18} {a+b,a+17,b+18} {c+d,e+f,a+17} {c+d,e+f,b+18} {c+d,a+17,b+18}

{e+f,a+17,b+18},

{a+b,c+d,e+f,a+17} {a+b,c+d,e+f,b+18} {a+b,c+d,a+17,b+18} {a+b,e+f,a+17,b+18} {c+d,e+f,a+17,b+18}

{a+b,c+d,e+f,a+17,b+18},

*

Comparability(transitive)

COMP 512, Fall 2003 9


The Lattice

{ }

{a+b} {c+d} {e+f} {a+17} {b+18}

{a+b,c+d} {a+b,a+17} {c+d,e+f} {c+d,b+18} {e+f,b+18} {a+b,e+f} {a+b,b+18} {c+d,a+17} {e+f,a+17} {a+17,b+18}

{a+b,c+d,e+f} {a+b,c+d,b+18} {a+b,c+d,a+17} {a+b,e+f,a+17} {a+b,e+f,b+18} {a+b,a+17,b+18} {c+d,e+f,a+17} {c+d,e+f,b+18} {c+d,a+17,b+18}

{e+f,a+17,b+18},

{a+b,c+d,e+f,a+17} {a+b,c+d,e+f,b+18} {a+b,c+d,a+17,b+18} {a+b,e+f,a+17,b+18} {c+d,e+f,a+17,b+18}

{a+b,c+d,e+f,a+17,b+18},

*

meet

COMP 512, Fall 2003 10

Round-robin Iterative Algorithm

• Termination: does it halt?• Correctness: what answer does it produce?• Speed: how quickly does it find that answer?

AVAIL(b0 ) Øfor i 1 to N

AVAIL(bi ) { all expressions }change truewhile (change)

change false for i 0 to N

TEMP xpred (b) (DEF (x) (AVAIL(x) NKILL(x) ))if AVAIL(bi ) ≠ TEMP then

change true AVAIL(bi ) TEMP

The round-robin solver is easier to analyze than the worklist solver.

COMP 512, Fall 2003 11

Round-robin Iterative Algorithm

Termination• Makes sweeps over the nodes• Halts when some sweep produces no change

AVAIL(b0 ) Øfor i 1 to N

AVAIL(bi ) { all expressions }change truewhile (change)

change false for i 0 to N

TEMP xpred (b) (DEF (x) (AVAIL(x) NKILL(x) ))if AVAIL(bi ) ≠ TEMP then

change true AVAIL(bi ) TEMP

COMP 512, Fall 2003 12


• Any finite semilattice is bounded• Some infinite semilattices are bounded

-.002 … -.001 … 0 … .001 … .002 …

Real constants

Termination• If every fn F is monotone, i.e., x ≤ y f(x) ≤ f(y), and• If the lattice is bounded, i.e., every descending chain is finite

Chain is sequence x1, x2, …, xn where xi L, 1 ≤ i ≤ n xi > xi+1, 1 ≤ i < n chain is descending

Then• The set at each node can only change a finite number of times • The iterative algorithm must halt on an instance of the problem

COMP 512, Fall 2003 13


Correctness (What does it compute?)

• If every fn F is monotone, i.e., x ≤ y f(x) ≤ f(y), and• If the semilattice is bounded, i.e., every descending chain is

finite Chain is sequence x1, x2, …, xn where xi L, 1 ≤ i ≤ n xi > xi+1, 1 ≤ i < n chain is descending

Given a bounded semilattice S and a monotone function space F k such that f k() = f j() j > k• f k() is called the least fixed-point of f over S• If L has a T, then k such that f k(T) = f j(T) j > k and

f k(T) is called the maximal fixed-point of f over S

optimism

COMP 512, Fall 2003 14


Correctness• If every fn F is monotone, i.e., f(xy) ≤ f(x) f(y), and• If the lattice is bounded, i.e., every descending chain is finite

Chain is sequence x1, x2, …, xn where xi L, 1 ≤ i ≤ n xi > xi+1, 1 ≤ i < n chain is descending

Then• The round-robin algorithm computes a least fixed-point (LFP)• The uniqueness of the solution depends on other properties of F

• Unique solution it finds the one we want• Multiple solutions we need to know which one it finds

COMP 512, Fall 2003 15


Correctness• Does the iterative algorithm compute the desired answer?

Admissible Function Spaces1. f F, x,y L, f (x y) = f (x) f (y)2. fi F such that x L, fi(x) = x

3. f,g F h F such that h(x ) = f (g(x))4. x L, a finite subset H F such that x = f H f ()

If F meets these four conditions, then an instance of the problem will have a unique fixed point solution (instance graph + initial values) LFP = MFP = MOP order of evaluation does not matter

Not distributive fixed point solution may not be unique

*

COMP 512, Fall 2003 16


If a data-flow framework meets those admissibility conditionsthen it has a unique fixed-point solution• The iterative algorithm finds the (best) answer• The solution does not depend on order of computation• Algorithm can choose an order that converges quickly

Intuition• Choose an order so that changes propagate as far as possible

on each “sweep” Process a node’s predecessors before the node

• Cycles pose problems, of course Ignore back edges when computing the order?

*

COMP 512, Fall 2003 17

Ordering the Nodes to Maximize Propagation

2 3

4

1

Postorder

3 2

1

4

Reverse Postorder

• Reverse postorder visits predecessors before visiting a node• Use reverse preorder for backward problems

Reverse postorder on reverse CFG is reverse preorder

N+1 - postorder number

COMP 512, Fall 2003 18


Speed• For a problem with an admissible function space & a bounded

semilattice,• If the functions all meet the rapid condition, i.e.,

f,g F, x L, f (g()) ≥ g() f (x) xthen, a round-robin, reverse-postorder iterative algorithm will halt in d(G)+3 passes over a graph G

d(G) is the loop-connectedness of the graph w.r.t a DFST Maximal number of back edges in an acyclic path Several studies suggest that, in practice, d(G) is small

(<3) For most CFGs, d(G) is independent of the specific DFST

Sets stabilize in two passes around a

loop

Each pass does O(E ) meets & O(N ) other operations

*

COMP 512, Fall 2003 19

Iterative Data-flow analysis

What does this mean?• Reverse postorder

Easily computed order that increases propagation per pass• Round-robin iterative algorithm

Visit all the nodes in a consistent order (RPO) Do it again until the sets stop changing

• Rapid condition Most classic global data-flow problems meet this condition

These conditions are easily met Admissible framework, rapid function space Round-robin, reverse-postorder, iterative algorithm

The analysis runs in (effectively) linear time

COMP 512, Fall 2003 20

Some problems are not admissible

Global constant propagation

• First condition in admissibility f F, x,y L, f (x y) = f (x) f (y)

• Constant propagation is not admissible Kam & Ullman time bound does not hold There are tight time bounds, however, based on lattice height Require a variable-by-variable formulation …

a b + c

• Function “f” models block’s effects• f(S1) = {a=7,b=3,c=4}• f(S2) = {a=7,b=1,c=6}• f(S1S2) = Ø

S1: {b=3,c=4}

S2: {b=1,c=6}

COMP 512, Fall 2003 21

Some admissible problems are not rapid

Interprocedural May Modify sets

• Iterations proportional to number of parameters Not a function of the call graph Can make example arbitrarily bad

• Proportional to length of chain of bindings…

shift(a,b,c,d,e,f){ local t; … call shift(t,a,b,c,d,e); f = 1; …}

• Assume call-by-reference• Compute the set of variables (in shift) that can be modified by a call to shift• How long does it take?

shift

a b c d e f Nothing to do with d(G)

COMP 512, Fall 2003 22

Extra Slides Start Here

COMP 512, Fall 2003 23

Computing Available Expressions

AVAIL(b) = xpred(b) (DEEXPR(x) (AVAIL(x) EXPRKILL(x) ))

where

• EXPRKILL(b) is the set of expression killed in b, and • DEEXPR(b) is the set of expressions defined in b and not

subsequently killed in b

Initial conditionAVAIL(n0) = Ø, because nothing is computed before n0

The other node’s AVAIL sets will be computed over their preds.

N0 has no predecessor.

COMP 512, Fall 2003 24

Making Theory Concrete

Computing AVAIL for the exampleAVAIL(A) = ØAVAIL(B) = {a+b} (Ø all)

= {a+b}AVAIL(C) = {a+b}AVAIL(D) = {a+b,c+d} ({a+b} all)

= {a+b,c+d} AVAIL(E) = {a+b,c+d}AVAIL(F) = [{b+18,a+b,e+f}

({a+b,c+d} {all - e+f})]

[{a+17,c+d,e+f} ({a+b,c+d} {all -

e+f})]= {a+b,c+d,e+f}

AVAIL(G) = [ {c+d} ({a+b} all)] [{a+b,c+d,e+f} ({a+b,c+d,e+f} all)]= {a+b,c+d}

m a + bn a + b

A

p c + dr c + d

B

y a + bz c + d

G

q a + br c + d

C

e b + 18s a + bu e + f

D e a + 17t c + du e + f

E


F

A B C D E F GDEEXPR a+b c+d a+b,c+db+18,a+b,e+fa+17,c+d,e+fa+b,c+d,e+fa+b,c+dEXPRK ILL { } { } { } e+f e+f { } { }

*

COMP 512, Fall 2003 25

Redundancy Elimination Wrap-up

Algorithm Acronym CreditsLocal Value Numbering LVN Balke, 1967Superlocal Value Numbering SVN ManyDominator-based Value Num’g DVNT Simpson, 1996Global CSE (with AVAIL) GCSE Cocke, 1970SCC-based Value Numbering† SCCVN/VDCM Simpson, 1996Partitioning Algorithm† AWZ Alpern et al,

1988

… and there are many others …

†We have not seen these ones (yet).

Three general approaches•Hash-based, bottom-up

techniques•Data-flow techniques•PartitioningEach has strengths & weaknesses

COMP 512, Fall 2003 26

Making Theory Concrete

Comparing the techniques

m a + bn a + b

A

p c + dr c + d

B

y a + bz c + d

G

q a + br c + d

C

e b + 18s a + bu e + f

D e a + 17t c + du e + f

E


F

LVN

LVNSVN

SVNSVN

DVNDVNGRE

DVNGRE

The VN methods are ordered•LVN ≤ SVN ≤ DVN (≤ SCCVN)•GRE is different

o Based on names, not valueo Two phase algorithm

Analysis Replacement

COMP 512, Fall 2003 27


ComparisonsOn/Off Operates Basis of

Name Scope Line On IdentityLVN local online blocks valueSVN superlocal online EBBs value

DVNT regional online dom. Tree value GCSE global offline CFG lexical

Visits AlgebraicName per Node Commutes Identities Constants OptimisticLVN 1 yes yes yes n/aSVN 1 yes yes yes n/a

DVNT 1 yes yes yes n/aGCSE D(CFG) + 3 no no no no

Better results in loops

COMP 512, Fall 2003 28


Generalizations• Hash-based methods are fastest• AWZ (& SCCVN) find the most cases• Expect better results with larger scope

Experimental data• Ran LVN, SVN, DVNT, AWZ• Used global name space for DVNT

Requires offline replacement Exposes more opportunities

• Code was compiled with lots of optimization

How did they do? DVNT beat AWZ

Improvements grew with scope

DVNT vs. SCCVN was ± 1%

DVNT 6x faster than SCCVN SCCVN 2.5x faster than

AWZ

*

The partitioning method based on DFA minimization

COMP 512, Fall 2003 29


Conclusions• Redundancy elimination has some depth & subtlety• Variations on names, algorithms & analysis matter• Compile-time speed does not have to sacrifice code quality

DVNT is probably the method of choice• Results quite close to the global methods (± 1%)• Much lower costs than SCCVN or AWZ

COMP 512, Fall 2003 30

Lattice Theory

This stuff is somewhat dry

Everybody stand up and stretch