05 May 2006 1
Lazy Code Motion in an SSA World
A CS 526 Course Project
Patrick Meredith, Steven Lauterburg
05 May 2006
05 May 2006 2
Presentation Overview
Introduction
Motivation
Preliminaries
Implementing LCM
Results
Implementation Status
05 May 2006 3
Motivation
What problem are we trying to solve?
Lazy Code Motion (LCM) is a bit-vector-based iterative dataflow algorithm for partial redundancy elimination (PRE) that delivers safe, computationally optimal results.
SSAPRE is an approach to PRE that was specifically designed to work on SSA form; it also delivers a computationally optimal placement.
Unfortunately, the sparse SSAPRE algorithm does not always perform better than the older Lazy Code Motion dataflow algorithm.
05 May 2006 4
Solution?
Why is implementing Lazy Code Motion on an SSA-based internal representation (like LLVM’s) difficult?
LCM is based on the source-level syntax of a program… an expression like a+b is easy to identify in non-SSA form.
In SSA form, variables are renamed…
  Which variables are the same from a source-level perspective?
  Which expressions are equivalent to each other?
  How do we handle multiple instances of the same variable being live at the same time?
  Which instance of a variable do we use when we move a computation to a new location?
05 May 2006 5
Redundant and Partially Redundant Computations
Code motion is used to remove Redundant computations…
… and Partially Redundant computations.
[Figure: example flow graphs; statements shown: f := 7, y := e + f, s := b + c, t := b + c]
05 May 2006 6
Critical Edges
Problem: Code motion can be blocked by “Critical Edges”: edges leading from nodes with more than one successor to nodes with more than one predecessor.
Solution: An edge splitting transformation can be performed that inserts extra nodes.
[Figure: example flow graphs; statements shown: z := u + v, w := u + v, h := u + v, z := h]
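The edge splitting step can be sketched in a few lines of Python. This is only an illustration, not the project's LLVM pass: the CFG is assumed to be a dictionary mapping each node to its list of successors, and the `u_v` naming scheme for inserted nodes is made up for the example.

```python
def split_critical_edges(succ):
    """Return a new successor map in which every critical edge u -> v
    (u has several successors, v has several predecessors) is replaced
    by u -> new_node -> v."""
    # Count the predecessors of every node.
    pred_count = {n: 0 for n in succ}
    for u, targets in succ.items():
        for v in targets:
            pred_count[v] = pred_count.get(v, 0) + 1

    result = {n: list(targets) for n, targets in succ.items()}
    for u, targets in succ.items():
        for i, v in enumerate(targets):
            if len(targets) > 1 and pred_count[v] > 1:  # critical edge
                new_node = f"{u}_{v}"                    # hypothetical naming scheme
                result[u][i] = new_node
                result[new_node] = [v]
    return result


# A branches to B and C, and B falls through to C, so A -> C is critical.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
split = split_critical_edges(cfg)
```

In this small graph only the edge A -> C is split; A -> B and B -> C are left alone because B has a single predecessor and a single successor.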
05 May 2006 7
Variable Equivalence Classes (VECs)
What is a Variable? A VEC?
Variables that are operands of a phi-node, along with the phi-node itself, are placed into the same VEC.
Many variables may be tied together by multiple phi-nodes.
Independent variables and constants are placed in singleton VECs.
Function arguments can also be included in VECs.
[Figure: example flow graph; statements shown:]
a1 := c
x := a1 + b
a0 := d
y := b + a2
a3 := f
z := a3 + b
a2 = phi (a1, a0)
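The grouping rule above is a natural fit for a union-find structure. The following Python sketch is an assumption about how one could compute VECs, not the project's actual code: phi-nodes are given as a dictionary from the phi result to its operands, and every other variable stays in a singleton class.

```python
def build_vecs(variables, phis):
    """Group SSA names into VECs: a phi result and all of its operands
    end up in the same class; everything else stays a singleton."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for result, operands in phis.items():
        for op in operands:
            union(result, op)

    classes = {}
    for v in variables:
        classes.setdefault(find(v), set()).add(v)
    return sorted(sorted(members) for members in classes.values())


# a2 = phi (a1, a0) ties a0, a1 and a2 together; b is independent.
vecs = build_vecs(["a0", "a1", "a2", "b"], {"a2": ["a1", "a0"]})
```

Because union-find merges transitively, several phi-nodes that share operands automatically collapse into one VEC.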
05 May 2006 8
Expression Equivalence Classes (EECs)
When are two expressions equivalent?
Two expressions are considered equivalent for purposes of code motion if and only if…
1. they have the same operator and
2. the corresponding operands of the two expressions are in the same VEC.
Modulo commutativity, etc., of course.
[Figure: same example flow graph as on the VEC slide]
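One way to read the two conditions is as a lookup key: expressions with equal keys belong to the same EEC. The sketch below assumes a precomputed map `vec_of` from SSA names to VEC identifiers (the identifiers "A" and "B" are invented) and treats only `+` and `*` as commutative, which is a simplification.

```python
COMMUTATIVE = {"+", "*"}  # assumption: only these operators are reordered


def eec_key(op, lhs, rhs, vec_of):
    """Key that is equal for two binary expressions exactly when they have
    the same operator and corresponding operands in the same VEC."""
    operands = [vec_of[lhs], vec_of[rhs]]
    if op in COMMUTATIVE:
        operands.sort()  # a + b and b + a get the same key
    return (op, tuple(operands))


vec_of = {"a0": "A", "a1": "A", "a2": "A", "b": "B"}

# x := a1 + b and y := b + a2 fall into the same EEC.
same = eec_key("+", "a1", "b", vec_of) == eec_key("+", "b", "a2", vec_of)
# Subtraction is not commutative, so operand order matters.
different = eec_key("-", "a1", "b", vec_of) != eec_key("-", "b", "a2", vec_of)
```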
05 May 2006 9
“Stale” Uses and “Fresh” Values
What is a “stale” use? A “stale” use occurs when the live ranges of two different versions of the same source-level variable overlap.
An expression that uses a “stale” definition is not syntactically the same as those using “fresh” values.
What is a “fresh” value? Intuitively, it is the most recent definition of a variable. More formally, for a given VEC, the fresh value is the VEC member definition that immediately dominates a program point.
[Figure: example flow graph; statements shown: a1 := c, a0 := d, a2 = phi (a1, a0), y := b + a2, z := b + a0]
05 May 2006 10
Freshness Analysis
The Freshness Lattice: BOTTOM < SSA values < TOP
Local Freshness: For each instruction, make that SSA value the Fresh value for its VEC. Whatever is Fresh at the exit is X_FRESH.
Global Freshness: To compute the N_FRESH for a basic block we meet over the successors.
Removal of Stale Uses: After completion of the Freshness analysis we remove Stale uses by inserting copies.
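The local part of the analysis can be sketched as a single walk over a basic block. The Python below is an illustration under several assumptions: instructions are `(dest, opcode, operands)` tuples, `vec_of` maps SSA names to invented VEC identifiers, phi operands are not counted as uses, and the BOTTOM/TOP lattice elements are not modelled. It also assumes the block holds the phi followed by the two additions from the slide's example.

```python
def find_stale_uses(block, vec_of, fresh_at_entry):
    """Walk one basic block, tracking the fresh value of each VEC.
    Returns (stale_uses, fresh_at_exit); fresh_at_exit plays the role
    of X_FRESH for this block."""
    fresh = dict(fresh_at_entry)           # VEC id -> fresh SSA value
    stale = []
    for dest, opcode, operands in block:
        if opcode != "phi":                # phi operands are merged, not used
            for op in operands:
                vec = vec_of[op]
                if vec in fresh and fresh[vec] != op:
                    stale.append((dest, op))
        fresh[vec_of[dest]] = dest         # dest is now fresh for its VEC
    return stale, fresh


vec_of = {"a0": "A", "a1": "A", "a2": "A", "b": "B", "y": "Y", "z": "Z"}
block = [
    ("a2", "phi", ["a1", "a0"]),
    ("y", "+", ["b", "a2"]),
    ("z", "+", ["b", "a0"]),
]
stale, fresh_at_exit = find_stale_uses(block, vec_of, {"B": "b"})
```

Under these assumptions the use of a0 in z := b + a0 is reported as stale, because a2 became the fresh member of the same VEC at the phi; this is the kind of use that would be repaired by inserting a copy.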
05 May 2006 11
LCM Analyses
Which analyses do we perform?
We perform the upsafety, downsafety, earliestness, delayability, and latestness analyses.
Why do we not do the Isolation analysis?
The later mem2reg pass makes it unnecessary: the result is essentially like leaving the original computation in place.
The analyses are worklist based.
Basic blocks with no predecessors or no successors can cause problems.
05 May 2006 12
Moving Code… The Almost LCM Transformation
The Basic Block Local Transform
We do not require local CSE as a prerequisite.
We first insert new computations for everything marked as N_INSERT for this basic block.
As we step through the instructions in a given basic block we update the local fresh set, starting from the fresh set at the beginning of the basic block. We also keep a set of dead computations.
For each binary operator, if its computation is dead we insert a new computation with the proper Fresh operands. We store this computation to a memory location specific to each EEC.
At the point of each original computation we insert a load of the proper memory location, and replace all uses of that original computation with the load.
At the end of the basic block we insert computations and stores for all expressions that are X_INSERT and not X_REPLACE. These will be used in later basic blocks.
05 May 2006 13
Example
Before:
y0 := a0 + b0
y1 := a0 + b0
a1 := G0
y2 := a1 + b0
y3 := y2 + b0
After:
EEC0_comp_0 := a0 + b0
store EEC0_comp_0, ECC0
EEC0_load_0 := load ECC0
a1 := G0
EEC0_comp_1 := a1 + b0
store EEC0_comp_1, ECC0
EEC0_load_1 := load ECC0
EEC1_comp_0 := EEC0_load_1 + b0
05 May 2006 14
Results
Removed Stale Uses           3.0     2.0    16.0     2.0     2.0     7.0     5.0     4.0   111.0     0.0     0.0
Unpropagated Constants      17.0    30.0   123.0    49.0    95.0    16.0    15.0    40.0   227.0    14.0    41.0
VECs                       227.0  1002.0  1040.0  1103.0   713.0   341.0   601.0   764.0  6879.0   169.0   409.0
EECs                        48.0   250.0   291.0   163.0   251.0    87.0   206.0   238.0  3088.0    34.0   131.0
Non-singleton VECs          12.0    20.0    74.0    51.0    61.0    21.0    27.0    31.0   243.0    14.0    31.0
Insertions                  50.0   253.0   332.0   167.0   258.0    99.0   219.0   252.0  3196.0    34.0   134.0
Replacements                50.0   257.0   349.0   115.0   257.0    99.0   219.0   249.0  3194.0    34.0   143.0
Lines of code              537.0   448.0   703.0   634.0   785.0   432.0   579.0   913.0  4185.0   366.0   435.0
Base + LCM time              6.7     1.8     8.9    12.3    59.9      NA     0.8    65.5   104.2     6.4    21.9
Base time                    6.6     1.8     8.8    12.3    79.5      NA     0.8    65.4   102.5     6.4    21.8
Number of functions         13.0     9.0    17.0    21.0     5.0     9.0     9.0    12.0    76.0     3.0     5.0
05 May 2006 15
Bmps!
05 May 2006 16
Bmps!
05 May 2006 17
Limitations
Currently we can only dubiously handle programs which use unwind (maybe it will work, maybe not; if it does, it is probably by accident).
While we appear to handle programs that use unreachable correctly, we are not completely sure.
The algorithm is pretty slow due to all the bookkeeping we must do.
05 May 2006 18
Random Thoughts
I actually found a case where map is faster than hash_map!
Using handles to make Fresh updates not suck
The truth(?) of VECs!
05 May 2006 19
Results
05 May 2006 20
bmps
[Figure: running example flow graph; statements shown: z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b]
The predicate equations used by the algorithm make use of two local predicates defined below:
For every assignment node $n \equiv v := t'$ and every term $t \in T \setminus V$ (where $T$ is the set of all terms, and $V$ is the set of all variables):
\[ \mathit{Used}(n, t) = \bigl(t \in \mathit{SubTerms}(t')\bigr) \]
\[ \mathit{Transp}(n, t) = \bigl(v \notin \mathit{Var}(t)\bigr) \]
When t is understood, these predicates will be denoted Used(n) and Transp(n).
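The two local predicates translate almost directly into code. In this Python sketch the representation is an assumption made for illustration: a node is an `(lhs, rhs)` pair, a variable is a string, and a compound term is a tuple `(operator, left, right)`.

```python
def subterms(term):
    """Yield the term itself and, recursively, all of its operands."""
    yield term
    if isinstance(term, tuple):            # (operator, left, right)
        _, left, right = term
        yield from subterms(left)
        yield from subterms(right)


def variables(term):
    """Var(t): the set of variables occurring in a term."""
    return {t for t in subterms(term) if isinstance(t, str)}


def used(node, t):
    """Used(n, t): t is a subterm of the right-hand side of n."""
    _, rhs = node
    return t in subterms(rhs)


def transp(node, t):
    """Transp(n, t): n does not assign to any variable of t."""
    lhs, _ = node
    return lhs not in variables(t)


t = ("+", "a", "b")
x_node = ("x", t)        # x := a + b
a_node = ("a", "c")      # a := c
```

For the term a + b, the node x := a + b uses the term and is transparent for it, while a := c does not use it and kills it.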
05 May 2006 21
Down-Safe
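A node n is D-SAFE if a computation of a term t at n does not introduce a new value on a terminating path starting in n.
\[
\text{D-SAFE}(n) =
\begin{cases}
\mathit{false} & \text{if } n = e \\
\mathit{Used}(n) \lor \Bigl(\mathit{Transp}(n) \land \prod_{m \in \mathit{succ}(n)} \text{D-SAFE}(m)\Bigr) & \text{otherwise}
\end{cases}
\]
Here $\prod$ denotes conjunction over the successors of n.
[Figure: running example flow graph with the D-Safe nodes marked]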
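The D-SAFE equation can be solved by simple iteration. The Python sketch below starts from the optimistic assumption that every node except e is down-safe and lowers values until nothing changes, which assumes the greatest solution is the one wanted. The six-node graph is a made-up example, not the slide's figure: node 1 is a := c, nodes 2 and 4 compute a + b, and the remaining nodes are empty.

```python
def down_safe(succ, used, transp, end):
    """Solve the D-SAFE equation by round-robin iteration from an
    optimistic start (everything except the end node is True)."""
    dsafe = {n: n != end for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == end:
                continue                   # D-SAFE(e) is false by definition
            new = n in used or (n in transp and all(dsafe[m] for m in succ[n]))
            if new != dsafe[n]:
                dsafe[n] = new
                changed = True
    return {n for n, value in dsafe.items() if value}


# Hypothetical graph: s -> 1 -> {2, 3} -> 4 -> e
succ = {"s": ["1"], "1": ["2", "3"], "2": ["4"], "3": ["4"], "4": ["e"], "e": []}
used = {"2", "4"}                        # nodes computing a + b
transp = {"s", "2", "3", "4", "e"}       # node 1 (a := c) is not transparent
dsafe = down_safe(succ, used, transp, "e")
```

Nodes s and 1 are not down-safe here: computing a + b before the assignment a := c would produce a value that no later computation asks for.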
05 May 2006 22
Earliest
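A node n is EARLIEST if there is a path from s to n where no node on that path prior to n is D-Safe and delivers the same value for t as when computed at n.
\[
\text{EARLIEST}(n) =
\begin{cases}
\mathit{true} & \text{if } n = s \\
\sum_{m \in \mathit{pred}(n)} \Bigl(\lnot\mathit{Transp}(m) \lor \bigl(\lnot\text{D-SAFE}(m) \land \text{EARLIEST}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]
Here $\sum$ denotes disjunction over the predecessors of n.
[Figure: running example flow graph with the D-Safe and Earliest nodes marked]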
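EARLIEST is a forward, some-path property, so a sketch of its solver can start pessimistically and raise values until a fixpoint is reached (assuming the least solution is intended). The graph is a made-up example rather than the slide's figure: s -> 1 -> {2, 3} -> 4 -> e with node 1 being a := c; the `dsafe` set is taken as a given input for that graph.

```python
def earliest(succ, transp, dsafe, start):
    """Solve the EARLIEST equation by iteration from a pessimistic start
    (only the start node is True)."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for m in targets:
            pred[m].append(n)

    early = {n: n == start for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == start:
                continue                   # EARLIEST(s) is true by definition
            new = any(m not in transp or (m not in dsafe and early[m])
                      for m in pred[n])
            if new != early[n]:
                early[n] = new
                changed = True
    return {n for n, value in early.items() if value}


succ = {"s": ["1"], "1": ["2", "3"], "2": ["4"], "3": ["4"], "4": ["e"], "e": []}
transp = {"s", "2", "3", "4", "e"}       # node 1 (a := c) is not transparent
dsafe = {"2", "3", "4"}                  # assumed result of the D-SAFE analysis
early = earliest(succ, transp, dsafe, "s")
safe_earliest = early & dsafe
```

The intersection `safe_earliest` gives the insertion points of the Safe-Earliest Transformation for this graph, namely the two branch nodes directly after a := c.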
05 May 2006 23
Safe-Earliest Transformation
Introduce a new auxiliary variable h for the term t.
Insert at the entry of every node n that is both D-Safe and Earliest the assignment h := t.
Replace every original computation of t by h.
The nodes that are both D-Safe and Earliest are computationally optimal computation points.
[Figure: running example after the Safe-Earliest Transformation, with the D-Safe & Earliest nodes marked; statements shown: x := h, h := a + b, x := h, y := h, w := h, a := c, z := h, h := a + b]
05 May 2006 24
An Example…
[Figure: running example flow graph]
The Lazy Code Motion Approach:
Identifies the earliest optimal computation points based on predicate values determined through a series of forward and backward analyses.
Identifies computation points that allow variables to be initialized “as late as possible”.
Replaces original computations with auxiliary variables that are initialized at identified computation points… But only when there is computational gain.
05 May 2006 25
Delay
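A node n is DELAY if on every path from s to n there is a computation point of the Safe-Earliest Transformation such that all subsequent original computations on that path lie in n.
\[
\text{DELAY}(n) = \bigl(\text{D-SAFE}(n) \land \text{EARLIEST}(n)\bigr) \lor
\begin{cases}
\mathit{false} & \text{if } n = s \\
\prod_{m \in \mathit{pred}(n)} \bigl(\lnot\mathit{Used}(m) \land \text{DELAY}(m)\bigr) & \text{otherwise}
\end{cases}
\]
[Figure: running example flow graph with the Delay and D-Safe & Earliest nodes marked]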
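The DELAY equation can be iterated in the same style as D-SAFE, this time forwards. The sketch assumes the greatest solution is wanted and therefore starts with every node set to True. The graph is the same made-up example (s -> 1 -> {2, 3} -> 4 -> e, node 1 being a := c), and the `dsafe` and `early` sets are assumed inputs for that graph.

```python
def delay(succ, used, dsafe, early, start):
    """Solve the DELAY equation by iteration from an optimistic start."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for m in targets:
            pred[m].append(n)

    d = {n: True for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            # Second disjunct: false at s, otherwise every predecessor
            # must be delayable and must not use the term itself.
            from_preds = n != start and all(
                m not in used and d[m] for m in pred[n])
            new = (n in dsafe and n in early) or from_preds
            if new != d[n]:
                d[n] = new
                changed = True
    return {n for n, value in d.items() if value}


succ = {"s": ["1"], "1": ["2", "3"], "2": ["4"], "3": ["4"], "4": ["e"], "e": []}
used = {"2", "4"}
dsafe = {"2", "3", "4"}                  # assumed result of the D-SAFE analysis
early = {"s", "1", "2", "3"}             # assumed result of the EARLIEST analysis
delayed = delay(succ, used, dsafe, early, "s")
```

Note that `all(...)` over an empty predecessor list is vacuously true, so a node other than s without predecessors would come out as delayable; unreachable blocks need separate handling.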
05 May 2006 26
Latest
A node n is LATEST if…
n is a computation point of some computationally optimal placement.
On every terminating path starting in n, any subsequent optimal computation point follows an original computation.
\[
\text{LATEST}(n) =
\begin{cases}
\mathit{false} & \text{if } n = e \\
\text{DELAY}(n) \land \Bigl(\mathit{Used}(n) \lor \lnot\prod_{m \in \mathit{succ}(n)} \text{DELAY}(m)\Bigr) & \text{otherwise}
\end{cases}
\]
[Figure: running example flow graph with the Delay and Latest nodes marked]
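Unlike the previous predicates, LATEST is not recursive, so it can be evaluated in one pass once DELAY is known. The sketch uses the same made-up graph as a stand-in for the slide's figure (s -> 1 -> {2, 3} -> 4 -> e), with the `delayed` set assumed as input.

```python
def latest(succ, used, delayed, end):
    """Evaluate the LATEST equation node by node."""
    result = set()
    for n in succ:
        if n == end:
            continue                       # LATEST(e) is false by definition
        cannot_sink_further = not all(m in delayed for m in succ[n])
        if n in delayed and (n in used or cannot_sink_further):
            result.add(n)
    return result


succ = {"s": ["1"], "1": ["2", "3"], "2": ["4"], "3": ["4"], "4": ["e"], "e": []}
used = {"2", "4"}
delayed = {"2", "3"}                     # assumed result of the DELAY analysis
late = latest(succ, used, delayed, "e")
```

Node 2 is latest because it uses the term itself; node 3 is latest because its only successor, the merge node 4, is not delayable.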
05 May 2006 28
Isolated
A node n is ISOLATED if on every terminating path starting from a successor of n, any original computation of t is preceded by a new, latest computation.
\[
\text{ISOLATED}(n) =
\begin{cases}
\mathit{true} & \text{if } n = e \\
\prod_{m \in \mathit{succ}(n)} \Bigl(\text{LATEST}(m) \lor \bigl(\lnot\mathit{Used}(m) \land \text{ISOLATED}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]
[Figure: running example flow graph with the Isolated and Latest nodes marked]
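ISOLATED is a backward analysis again. The sketch below iterates from an optimistic start, assuming the greatest solution is intended, on the same made-up graph (s -> 1 -> {2, 3} -> 4 -> e); the `late` set is an assumed input for that graph.

```python
def isolated(succ, used, late, end):
    """Solve the ISOLATED equation by iteration from an optimistic start."""
    iso = {n: True for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == end:
                continue                   # ISOLATED(e) is true by definition
            new = all(m in late or (m not in used and iso[m])
                      for m in succ[n])
            if new != iso[n]:
                iso[n] = new
                changed = True
    return {n for n, value in iso.items() if value}


succ = {"s": ["1"], "1": ["2", "3"], "2": ["4"], "3": ["4"], "4": ["e"], "e": []}
used = {"2", "4"}
late = {"2", "3"}                        # assumed result of the LATEST analysis
iso = isolated(succ, used, late, "e")
```

Nodes 2 and 3 are not isolated because the value they compute is still needed by the original computation in node 4.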
05 May 2006 29
Lazy Code Motion Transformation
Set of Optimal Computation Points for t:
\[ \mathit{OCP} = \{\, n \mid \text{LATEST}(n) \land \lnot\text{ISOLATED}(n) \,\} \]
Set of Redundant Occurrences of t:
\[ \mathit{RO} = \{\, n \mid \mathit{Used}(n) \land \lnot\bigl(\text{LATEST}(n) \land \text{ISOLATED}(n)\bigr) \,\} \]
Introduce a new auxiliary variable h for the term t.
Insert at the entry of every node in OCP the assignment h := t.
Replace every original computation of t in nodes of RO by h.
[Figure: running example after the LCM Transformation, with the Latest and Latest & Isolated nodes marked; statements shown: z := h, x := a + b, h := a + b, y := h, w := h, a := c, x := a + b, h := a + b]
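Given the predicate sets, the transformation itself is mostly set arithmetic plus a rewrite. The sketch below works on a made-up six-node graph rather than the slide's figure (node 1 is a := c, nodes 2 and 4 compute a + b), keeps statements as plain strings, and takes the `late`, `iso` and `used` sets as assumed inputs; the textual replacement is a simplification that only works for this toy representation.

```python
late = {"2", "3"}                        # assumed LATEST nodes
iso = {"s", "1", "4", "e"}               # assumed ISOLATED nodes
used = {"2", "4"}                        # nodes computing a + b
stmts = {"s": "skip", "1": "a := c", "2": "x := a + b",
         "3": "skip", "4": "y := a + b", "e": "skip"}

# OCP: latest but not isolated.  RO: used, unless latest and isolated.
ocp = {n for n in late if n not in iso}
ro = {n for n in used if not (n in late and n in iso)}


def transform(stmts, ocp, ro, term, aux="h"):
    """Insert aux := term at the entry of every OCP node and replace the
    term by aux in every RO node."""
    new = {}
    for n, stmt in stmts.items():
        lines = []
        if n in ocp:
            lines.append(f"{aux} := {term}")
        lines.append(stmt.replace(term, aux) if n in ro else stmt)
        new[n] = lines
    return new


result = transform(stmts, ocp, ro, "a + b")
```

In this example the computation is inserted on both branches and the partially redundant computation in the merge node becomes a plain copy from h.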
05 May 2006 30
Considerations
Register pressure is not always reduced.
Some desirable code motion is not allowed.
Code size can be increased.
05 May 2006 31
Reducing Register Pressure?
How do the live ranges of a and b affect “lifetime optimality”?
[Figure: two transformed versions of the running example flow graph; statements shown: z := h, h := a + b, y := h, w := h, x := a + b, x := h]
05 May 2006 32
Code Motion & Down-Safety
Some desirable code motion is not D-Safe and therefore not allowed.
[Figure: example flow graphs; statements shown: w := a + b, w := h, h := a + b]
05 May 2006 33
Code Bloat
Late placement of computation points increases code size.
[Figure: example flow graphs with repeated h := a + b and y := h statements]
05 May 2006 35
Equations 1
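\[
\text{D-SAFE}(n) =
\begin{cases}
\mathit{false} & \text{if } n = e \\
\mathit{Used}(n) \lor \Bigl(\mathit{Transp}(n) \land \prod_{m \in \mathit{succ}(n)} \text{D-SAFE}(m)\Bigr) & \text{otherwise}
\end{cases}
\]
\[
\text{EARLIEST}(n) =
\begin{cases}
\mathit{true} & \text{if } n = s \\
\sum_{m \in \mathit{pred}(n)} \Bigl(\lnot\mathit{Transp}(m) \lor \bigl(\lnot\text{D-SAFE}(m) \land \text{EARLIEST}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]
For every node $n \equiv v := t'$ and every term $t \in T \setminus V$…
\[ \mathit{Used}(n, t) = \bigl(t \in \mathit{SubTerms}(t')\bigr) \]
\[ \mathit{Transp}(n, t) = \bigl(v \notin \mathit{Var}(t)\bigr) \]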
05 May 2006 36
Equations 2
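\[
\text{DELAY}(n) = \bigl(\text{D-SAFE}(n) \land \text{EARLIEST}(n)\bigr) \lor
\begin{cases}
\mathit{false} & \text{if } n = s \\
\prod_{m \in \mathit{pred}(n)} \bigl(\lnot\mathit{Used}(m) \land \text{DELAY}(m)\bigr) & \text{otherwise}
\end{cases}
\]
\[
\text{LATEST}(n) =
\begin{cases}
\mathit{false} & \text{if } n = e \\
\text{DELAY}(n) \land \Bigl(\mathit{Used}(n) \lor \lnot\prod_{m \in \mathit{succ}(n)} \text{DELAY}(m)\Bigr) & \text{otherwise}
\end{cases}
\]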
05 May 2006 37
Equations 3
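\[
\text{ISOLATED}(n) =
\begin{cases}
\mathit{true} & \text{if } n = e \\
\prod_{m \in \mathit{succ}(n)} \Bigl(\text{LATEST}(m) \lor \bigl(\lnot\mathit{Used}(m) \land \text{ISOLATED}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]
\[ \mathit{RO} = \{\, n \mid \mathit{Used}(n) \land \lnot\bigl(\text{LATEST}(n) \land \text{ISOLATED}(n)\bigr) \,\} \]
\[ \mathit{OCP} = \{\, n \mid \text{LATEST}(n) \land \lnot\text{ISOLATED}(n) \,\} \]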