05 May 2006 1 Lazy Code Motion in an SSA World A CS 526 Course Project Patrick Meredith Steven Lauterburg 05 May 2006


Lazy Code Motion in an SSA World

A CS 526 Course Project

Patrick Meredith, Steven Lauterburg

05 May 2006


Presentation Overview

Introduction
Motivation
Preliminaries
Implementing LCM
Results
Implementation Status


Motivation

What problem are we trying to solve?

Lazy Code Motion (LCM) is a bit-vector-based iterative dataflow algorithm for partial redundancy elimination (PRE) that delivers safe, computationally optimal results.

SSAPRE is an approach to PRE that was specifically designed to work on SSA form and that also delivers a computationally optimal placement.

Unfortunately, the sparse SSAPRE algorithm does not always perform better than the older Lazy Code Motion dataflow algorithm.


Solution?

Why is implementing Lazy Code Motion on an SSA-based internal representation (like LLVM’s) difficult?

LCM is based on the source-level syntax of a program: an expression like a+b is easy to identify in non-SSA form.

In SSA form, variables are renamed:

What variables are the same from a source-level perspective?

What expressions are equivalent to each other?

How do we handle multiple instances of the same variable being live at the same time?

Which instance of a variable do we use when we move a computation to a new location?


Redundant and Partially Redundant Computations

Code motion is used to remove redundant computations and partially redundant computations.

[Figure: two flow-graph examples, one using f := 7 and y := e + f, the other using s := b + c and t := b + c]


Critical Edges

Problem: Code motion can be blocked by “critical edges”: edges leading from nodes with more than one successor to nodes with more than one predecessor.

Solution: An edge-splitting transformation can be performed that inserts extra nodes.

[Figure: flow graph before and after edge splitting, with the computations z := u + v and w := u + v, the inserted h := u + v, and the rewritten z := h]
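The edge-splitting step can be sketched as follows. This is a minimal illustration in Python, not the project's LLVM code: the dictionary-based graph, the function name `split_critical_edges`, and the naming of the inserted nodes are all assumptions made for the example.

```python
from collections import defaultdict

def split_critical_edges(succ):
    """Return a copy of the graph with an empty node placed on every critical edge.

    An edge n -> m is treated as critical when n has more than one successor
    and m has more than one predecessor (the definition used on the slide).
    """
    preds = defaultdict(list)
    for n, targets in succ.items():
        for m in targets:
            preds[m].append(n)

    result = {n: list(targets) for n, targets in succ.items()}
    for n, targets in succ.items():
        for i, m in enumerate(targets):
            if len(targets) > 1 and len(preds[m]) > 1:
                synthetic = f"{n}_{m}"      # hypothetical naming scheme
                result[n][i] = synthetic    # redirect n -> m through the new node
                result[synthetic] = [m]
    return result

# Hypothetical graph: A branches to B and C, and B also flows into C.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
split = split_critical_edges(cfg)
```

Only the edge A -> C is split here, because B has a single successor. The new node gives code motion a place to insert a computation that runs on that edge alone.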


Variable Equivalence Classes (VECs)

What is a Variable? A VEC?

Variables that are operands of a phi-node, along with the phi-node itself, are placed into the same VEC.

Many variables may be tied together by multiple phi-nodes.

Independent variables and constants are placed in singleton VECs.

Function arguments can also be included in VECs.

[Figure: example with a1 := c, x := a1 + b, a0 := d, a2 = phi (a1, a0), y := b + a2, a3 := f, z := a3 + b]
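One way to picture the VEC construction is as a union-find structure in which every phi-node merges its result with its operands. The sketch below is illustrative only: the class `VecPartition` and the function `build_vecs` are hypothetical names, and the input is a plain list of phi-nodes rather than LLVM IR.

```python
class VecPartition:
    """Union-find over SSA value names; each set is one VEC."""

    def __init__(self):
        self.parent = {}

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def build_vecs(values, phis):
    vecs = VecPartition()
    for v in values:
        vecs.find(v)                 # every value starts in a singleton VEC
    for result, operands in phis:
        for op in operands:
            vecs.union(result, op)   # a phi ties its result to its operands
    return vecs

# Value names taken from the slide's example; the only phi is a2 = phi (a1, a0).
values = ["a0", "a1", "a2", "a3", "b", "c", "d", "f"]
phis = [("a2", ["a1", "a0"])]
vecs = build_vecs(values, phis)
```

Under the phi rule alone, a0, a1, and a2 end up in one class, while a3 and b stay in singleton classes.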


Expression Equivalence Classes (EECs)

When are two expressions equivalent?

Two expressions are considered equivalent for purposes of code motion if and only if:

1. they have the same operator, and

2. the corresponding operands of the two expressions are in the same VEC.

Modulo commutativity, etc., of course.

[Figure: example with a1 := c, x := a1 + b, a0 := d, a2 = phi (a1, a0), y := b + a2, a3 := f, z := a3 + b]
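The equivalence test above can be sketched as a canonical key: two expressions fall into the same EEC exactly when their keys are equal. The function `eec_key`, the `COMMUTATIVE` set, and the hand-written VEC labels below are assumptions for illustration, not the project's data structures.

```python
COMMUTATIVE = {"+", "*"}  # assumed set of commutative operators

def eec_key(op, operands, vec_of):
    """Build a key from the operator and the VECs of the operands."""
    reps = [vec_of[v] for v in operands]
    if op in COMMUTATIVE:
        reps.sort()  # operand order does not matter for commutative operators
    return (op, tuple(reps))

# Hypothetical VEC labels: a0, a1, a2 share class "A"; a3 and b are singletons.
vec_of = {"a0": "A", "a1": "A", "a2": "A", "a3": "a3", "b": "b"}

x_key = eec_key("+", ["a1", "b"], vec_of)   # x := a1 + b
y_key = eec_key("+", ["b", "a2"], vec_of)   # y := b + a2
z_key = eec_key("+", ["a3", "b"], vec_of)   # z := a3 + b
```

With these labels, x and y share an EEC despite the swapped operands, and z does not, because a3 sits in a different VEC.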


“Stale” Uses and “Fresh” Values

What is a “stale” use? A “stale” use occurs when the live ranges of two different versions of the same source-level variable overlap.

An expression that uses a “stale” definition is not the same syntactically as those using “fresh” values.

What is a “fresh” value? Intuitively, it is the most recent definition of a variable. More formally, for a given VEC, the fresh value is the VEC member definition that immediately dominates a program point.

[Figure: example with a1 := c, a0 := d, a2 = phi (a1, a0), y := b + a2, z := b + a0]


Freshness Analysis

The Freshness Lattice: BOTTOM < SSA values < TOP.

Local Freshness: For each instruction, make that SSA value the Fresh value for its VEC. Whatever is Fresh at the exit is X_FRESH.

Global Freshness: To compute the N_FRESH for a basic block, we meet over the successors.

Removal of Stale Uses: After completion of the Freshness analysis, we remove Stale uses by inserting copies.

[Figure: example with a1 := c, a0 := d, a2 = phi (a1, a0), y := b + a2, z := b + a0]
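The local step can be sketched for a single basic block as below. This is a simplified illustration: it covers only the per-instruction update and the detection of stale uses, it leaves out the BOTTOM/TOP lattice values, the global meet, and the copy insertion, and the names `find_stale_uses`, `fresh_in`, and the tuple encoding of instructions are hypothetical.

```python
def find_stale_uses(block, vec_of, fresh_in):
    """Walk one basic block, tracking the fresh SSA value of each VEC.

    Returns the stale uses found, as (user, stale value, fresh value) triples,
    and the fresh map at the block exit (X_FRESH in the slide's terms).
    """
    fresh = dict(fresh_in)  # VEC -> fresh SSA value at the current point
    stale = []
    for dest, op, operands in block:
        if op != "phi":  # phi operands flow in along edges, so they are not checked
            for v in operands:
                vec = vec_of[v]
                if fresh.get(vec, v) != v:
                    stale.append((dest, v, fresh[vec]))
        fresh[vec_of[dest]] = dest  # the new definition becomes fresh for its VEC
    return stale, fresh

# The join block from the slide's example, with hand-written VEC labels.
vec_of = {"a0": "A", "a1": "A", "a2": "A", "b": "b", "y": "y", "z": "z"}
block = [
    ("a2", "phi", ["a1", "a0"]),
    ("y", "+", ["b", "a2"]),
    ("z", "+", ["b", "a0"]),   # uses a0 although a2 is now fresh
]
stale, x_fresh = find_stale_uses(block, vec_of, {"b": "b"})
```

The only stale use reported is the operand a0 of z, which is the kind of use the slide proposes to remove by inserting a copy.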


LCM Analyses

Which analyses do we perform?

We perform up-safety, down-safety, earliestness, delayability, and latestness.

Why do we not do the Isolation analysis?

Mem2reg makes it unnecessary: the result is essentially like leaving the original computation in place.

The analyses are worklist based.

Blocks with no predecessors or no successors can cause problems.

[Figure: example with a1 := c, a0 := d, a2 = phi (a1, a0), y := b + a2, z := b + a0]


Moving Code… The Almost LCM Transformation

The Basic Block Local Transform

We do not require local CSE as a prerequisite.

We first insert new computations for everything marked as N_INSERT for this basic block.

As we step through the instructions in a given basic block, we update the local fresh set, starting from the fresh set at the beginning of the basic block. We also keep a set of dead computations.

For each binary operator, if its computation is dead, we insert a new computation with the proper Fresh operands. We store this computation to a memory location specific to each EEC.

At the point of each original computation, we insert a load of the proper memory location and replace all uses of that original computation with the load.

At the end of the basic block, we insert computations and stores for all expressions that are X_INSERT and not X_REPLACE. These will be used in later basic blocks.


Example

Before:

y0 := a0 + b0
y1 := a0 + b0
a1 := G0
y2 := a1 + b0
y3 := y2 + b0

After:

EEC0_comp_0 := a0 + b0
store EEC0_comp_0, ECC0
EEC0_load_0 := load ECC0
a1 := G0
EEC0_comp_1 := a1 + b0
store EEC0_comp_1, ECC0
EEC0_load_1 := load ECC0
EEC1_comp_0 := EEC0_load_1 + b0


Results

Removed Stale Uses        3.0     2.0    16.0     2.0     2.0     7.0     5.0     4.0   111.0     0.0     0.0
Unpropagated Constants   17.0    30.0   123.0    49.0    95.0    16.0    15.0    40.0   227.0    14.0    41.0
VECs                    227.0  1002.0  1040.0  1103.0   713.0   341.0   601.0   764.0  6879.0   169.0   409.0
EECs                     48.0   250.0   291.0   163.0   251.0    87.0   206.0   238.0  3088.0    34.0   131.0
Non-singleton VECs       12.0    20.0    74.0    51.0    61.0    21.0    27.0    31.0   243.0    14.0    31.0
Insertions               50.0   253.0   332.0   167.0   258.0    99.0   219.0   252.0  3196.0    34.0   134.0
Replacements             50.0   257.0   349.0   115.0   257.0    99.0   219.0   249.0  3194.0    34.0   143.0
Lines of code           537.0   448.0   703.0   634.0   785.0   432.0   579.0   913.0  4185.0   366.0   435.0
Base + LCM time           6.7     1.8     8.9    12.3    59.9      NA     0.8    65.5   104.2     6.4    21.9
Base time                 6.6     1.8     8.8    12.3    79.5      NA     0.8    65.4   102.5     6.4    21.8
Number of functions      13.0     9.0    17.0    21.0     5.0     9.0     9.0    12.0    76.0     3.0     5.0


Limitations

Currently we can only dubiously handle programs that use unwind (maybe it will work, maybe not; if it does, it is probably by accident).

While we appear to handle programs that use unreachable correctly, we are not completely sure.

The algorithm is pretty slow due to all the bookkeeping we must do.


Random Thoughts

I actually found a case where map is faster than hash_map!

Using handles to make Fresh updates not suck

The truth(?) of VECs!


[Figure: running example flow graph with the statements z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b]

The predicate equations used by the algorithm make use of two local predicates defined below.

For every assignment node $n \equiv v := t'$ and every term $t \in T \setminus V$ (where $T$ is the set of all terms and $V$ is the set of all variables):

\[ \mathrm{Used}(n, t) = t \in \mathrm{SubTerms}(t') \]

\[ \mathrm{Transp}(n, t) = v \notin \mathrm{Var}(t) \]

When t is understood, these predicates will be denoted Used(n) and Transp(n).
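The two local predicates can be sketched directly from their definitions. The encoding below is an assumption made for the example: a term is either a variable name (a string) or a tuple `(operator, arg, ...)`, and a statement is a pair `(lhs, rhs)`.

```python
def subterms(term):
    """Yield the term itself and every nested operator term (variables excluded)."""
    if isinstance(term, tuple):
        yield term
        for arg in term[1:]:
            yield from subterms(arg)

def variables(term):
    """Return the set of variable names occurring in a term."""
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(arg) for arg in term[1:]))

def used(stmt, t):
    """Used(n, t): t is a subterm of the right-hand side of n."""
    _, rhs = stmt
    return t in subterms(rhs)

def transp(stmt, t):
    """Transp(n, t): n does not assign to any operand of t."""
    lhs, _ = stmt
    return lhs not in variables(t)

t = ("+", "a", "b")
compute = ("x", ("+", "a", "b"))            # x := a + b
kill = ("a", "c")                           # a := c
nested = ("z", ("*", ("+", "a", "b"), "d")) # z := (a + b) * d
```

Here `compute` uses t and is transparent for it, while `kill` neither uses t nor is transparent, since it redefines the operand a.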


Down-Safe

[Figure: running example flow graph with the D-Safe nodes marked]

A node n is D-SAFE if a computation of a term t at n does not introduce a new value on a terminating path starting in n.

\[
\text{D-SAFE}(n) =
\begin{cases}
\text{false} & \text{if } n = e \\
\mathrm{Used}(n) \lor \Bigl(\mathrm{Transp}(n) \land \prod_{m \in succ(n)} \text{D-SAFE}(m)\Bigr) & \text{otherwise}
\end{cases}
\]
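The D-SAFE equation can be solved by iterating to a fixed point. The sketch below reads the product as a logical "and" over successors and computes the greatest solution by round-robin iteration instead of a worklist, to keep it short. The flow graph is a hypothetical one for the term a + b, not the graph from the slides; its empty nodes 5 and 6 sit on the edges into the join node 7.

```python
def down_safe(succ, used, transp, exit_node):
    """Greatest solution of the D-SAFE equation, by round-robin iteration."""
    dsafe = {n: n != exit_node for n in succ}  # optimistic start; the exit is fixed at False
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == exit_node:
                continue
            value = n in used or (n in transp and all(dsafe[m] for m in succ[n]))
            if value != dsafe[n]:
                dsafe[n] = value
                changed = True
    return {n for n in succ if dsafe[n]}

# Hypothetical graph: s -> 1, which branches to 2 (a := c) and 3 (x := a + b);
# 2 -> 4 (unrelated statement) -> 5 -> 7, and 3 -> 6 -> 7 (y := a + b) -> e.
succ = {"s": [1], 1: [2, 3], 2: [4], 3: [6], 4: [5], 5: [7], 6: [7], 7: ["e"], "e": []}
used = {3, 7}              # nodes that compute a + b
transp = set(succ) - {2}   # node 2 assigns to the operand a
dsafe = down_safe(succ, used, transp, "e")
```

Node 1 is not down-safe because the path through node 2 changes a before a + b is computed again, so hoisting a + b to node 1 would introduce a new value on that path.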


Earliest

[Figure: running example flow graph with the D-Safe and Earliest nodes marked]

A node n is EARLIEST if there is a path from s to n where no node on that path prior to n is D-Safe and delivers the same value for t as when computed at n.

\[
\mathrm{EARLIEST}(n) =
\begin{cases}
\text{true} & \text{if } n = s \\
\sum_{m \in pred(n)} \Bigl(\lnot\mathrm{Transp}(m) \lor \bigl(\lnot\text{D-SAFE}(m) \land \mathrm{EARLIEST}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]
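A sketch of the EARLIEST computation follows, reading the sum as a logical "or" over predecessors. It assumes the least solution is wanted, which matches the "there is a path" wording of the definition. The graph is a hypothetical one for the term a + b, and the D-Safe set is taken as given for that graph; neither comes from the slides.

```python
def earliest(succ, transp, dsafe, start):
    """Least solution of the EARLIEST equation, by round-robin iteration."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for m in targets:
            pred[m].append(n)

    early = {n: n == start for n in succ}  # pessimistic start; the entry is fixed at True
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == start or early[n]:
                continue
            if any(m not in transp or (m not in dsafe and early[m]) for m in pred[n]):
                early[n] = True
                changed = True
    return {n for n in succ if early[n]}

# Hypothetical graph: s -> 1, which branches to 2 (a := c) and 3 (x := a + b);
# 2 -> 4 (unrelated statement) -> 5 -> 7, and 3 -> 6 -> 7 (y := a + b) -> e.
succ = {"s": [1], 1: [2, 3], 2: [4], 3: [6], 4: [5], 5: [7], 6: [7], 7: ["e"], "e": []}
transp = set(succ) - {2}   # node 2 assigns to the operand a
dsafe = {3, 4, 5, 6, 7}    # assumed result of the down-safety analysis on this graph
early = earliest(succ, transp, dsafe, "s")
```

The nodes that are both D-Safe and Earliest are 3 and 4: node 3 is the first computation on its branch, and node 4 is the first point after a := c where a + b can be placed safely.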


Safe-Earliest Transformation

Introduce a new auxiliary variable h for the term t.

Insert at the entry of every node n that is both D-Safe and Earliest the assignment h := t.

Replace every original computation of t by h.

The nodes that are both D-Safe and Earliest are computationally optimal computation points.

[Figure: running example after the Safe-Earliest Transformation, with h := a + b inserted at the D-Safe & Earliest nodes and x, y, w, z assigned from h]


An Example…

[Figure: running example flow graph with the statements z := a + b, x := a + b, y := a + b, w := a + b, a := c, x := a + b]

The Lazy Code Motion Approach:

Identifies the earliest optimal computation points based on predicate values determined through a series of forward and backward analyses.

Identifies computation points that allow variables to be initialized “as late as possible”.

Replaces original computations with auxiliary variables that are initialized at identified computation points… But only when there is computational gain.


Delay

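[Figure: running example flow graph with the Delay nodes and the D-Safe & Earliest nodes marked]

A node n is DELAY if on every path from s to n there is a computation of the Safe-Earliest Transform such that all subsequent original computations lie in n.

\[
\mathrm{DELAY}(n) = \bigl(\text{D-SAFE}(n) \land \mathrm{EARLIEST}(n)\bigr) \lor
\begin{cases}
\text{false} & \text{if } n = s \\
\prod_{m \in pred(n)} \bigl(\lnot\mathrm{Used}(m) \land \mathrm{DELAY}(m)\bigr) & \text{otherwise}
\end{cases}
\]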
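The DELAY equation can be sketched in the same style, reading the product as a logical "and" over predecessors and computing the greatest solution, which fits the "on every path" wording. The graph is a hypothetical one for the term a + b, and the set of D-Safe and Earliest nodes is taken as given for that graph.

```python
def delay(succ, used, safe_earliest, start):
    """Greatest solution of the DELAY equation, by round-robin iteration."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for m in targets:
            pred[m].append(n)

    delayed = {n: True for n in succ}  # optimistic start for the greatest solution
    changed = True
    while changed:
        changed = False
        for n in succ:
            value = n in safe_earliest or (
                n != start and all(m not in used and delayed[m] for m in pred[n]))
            if value != delayed[n]:
                delayed[n] = value
                changed = True
    return {n for n in succ if delayed[n]}

# Hypothetical graph: s -> 1, which branches to 2 (a := c) and 3 (x := a + b);
# 2 -> 4 (unrelated statement) -> 5 -> 7, and 3 -> 6 -> 7 (y := a + b) -> e.
succ = {"s": [1], 1: [2, 3], 2: [4], 3: [6], 4: [5], 5: [7], 6: [7], 7: ["e"], "e": []}
used = {3, 7}              # nodes that compute a + b
safe_earliest = {3, 4}     # assumed D-Safe and Earliest nodes for this graph
delayed = delay(succ, used, safe_earliest, "s")
```

The insertion at node 4 can be delayed into node 5, but not into the join node 7, because the other path into 7 already passes the computation at node 3.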


Latest

[Figure: running example flow graph with the Delay and Latest nodes marked]

A node n is LATEST if:

n is a computation point of some computationally optimal placement, and

on every terminating path starting in n, any subsequent optimal computation point follows an original computation.

\[
\mathrm{LATEST}(n) =
\begin{cases}
\text{false} & \text{if } n = e \\
\mathrm{DELAY}(n) \land \Bigl(\mathrm{Used}(n) \lor \lnot\prod_{m \in succ(n)} \mathrm{DELAY}(m)\Bigr) & \text{otherwise}
\end{cases}
\]
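LATEST needs no iteration, since it depends only on DELAY and Used. A minimal sketch, using a hypothetical graph for the term a + b and a DELAY set taken as given for that graph:

```python
def latest(succ, used, delayed, exit_node):
    """Evaluate the LATEST equation once per node; no fixed point is needed."""
    return {
        n for n in succ
        if n != exit_node
        and n in delayed
        and (n in used or not all(m in delayed for m in succ[n]))
    }

# Hypothetical graph: s -> 1, which branches to 2 (a := c) and 3 (x := a + b);
# 2 -> 4 (unrelated statement) -> 5 -> 7, and 3 -> 6 -> 7 (y := a + b) -> e.
succ = {"s": [1], 1: [2, 3], 2: [4], 3: [6], 4: [5], 5: [7], 6: [7], 7: ["e"], "e": []}
used = {3, 7}          # nodes that compute a + b
delayed = {3, 4, 5}    # assumed result of the delay analysis on this graph
late = latest(succ, used, delayed, "e")
```

Node 4 is not latest, because the insertion can still move on to node 5. Node 5 is latest, because its successor, node 7, is not in the DELAY set.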


Isolated

[Figure: running example flow graph with the Isolated and Latest nodes marked]

A node n is ISOLATED if on every terminating path starting from a successor of n, any original computation of t is preceded by a new, latest computation.

\[
\mathrm{ISOLATED}(n) =
\begin{cases}
\text{true} & \text{if } n = e \\
\prod_{m \in succ(n)} \Bigl(\mathrm{LATEST}(m) \lor \bigl(\lnot\mathrm{Used}(m) \land \mathrm{ISOLATED}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]
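A sketch of the ISOLATED computation, again reading the product as a logical "and" over successors and assuming the greatest solution is wanted. The graph is a hypothetical one for the term a + b, and the LATEST set is taken as given for that graph.

```python
def isolated(succ, used, latest, exit_node):
    """Greatest solution of the ISOLATED equation, by round-robin iteration."""
    iso = {n: True for n in succ}  # optimistic start; the exit stays True
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == exit_node:
                continue
            value = all(m in latest or (m not in used and iso[m]) for m in succ[n])
            if value != iso[n]:
                iso[n] = value
                changed = True
    return {n for n in succ if iso[n]}

# Hypothetical graph: s -> 1, which branches to 2 (a := c) and 3 (x := a + b);
# 2 -> 4 (unrelated statement) -> 5 -> 7, and 3 -> 6 -> 7 (y := a + b) -> e.
succ = {"s": [1], 1: [2, 3], 2: [4], 3: [6], 4: [5], 5: [7], 6: [7], 7: ["e"], "e": []}
used = {3, 7}      # nodes that compute a + b
late = {3, 5}      # assumed result of the latest analysis on this graph
iso = isolated(succ, used, late, "e")
```

Nodes 3 and 5 are not isolated, because the value computed there is used again at node 7 without an intervening latest computation.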


Lazy Code Motion Transformation

Set of Optimal Computation Points for t:

\[ OCP = \{\, n \mid \mathrm{LATEST}(n) \land \lnot\mathrm{ISOLATED}(n) \,\} \]

Set of Redundant Occurrences of t:

\[ RO = \{\, n \mid \mathrm{Used}(n) \land \lnot\bigl(\mathrm{LATEST}(n) \land \mathrm{ISOLATED}(n)\bigr) \,\} \]

Introduce a new auxiliary variable h for the term t.

Insert at the entry of every node in OCP the assignment h := t.

Replace every original computation of t in nodes of RO by h.

[Figure: running example after the LCM Transformation, with the Latest nodes and the Latest & Isolated nodes marked]
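The final rewriting step can be sketched as below. The statements are plain strings on a hypothetical graph for the term a + b, the LATEST and ISOLATED sets are taken as given for that graph, and the replacement assumes each right-hand side in RO is exactly the term t. The function name `lcm_transform` is an invention for this example.

```python
def lcm_transform(stmts, term, used, latest, isolated):
    """Compute OCP and RO, then insert h := term and rewrite redundant occurrences."""
    ocp = {n for n in latest if n not in isolated}
    ro = {n for n in used if not (n in latest and n in isolated)}

    out = {}
    for n, stmt in stmts.items():
        lines = []
        if n in ocp:
            lines.append(f"h := {term}")      # insertion at the node entry
        if n in ro:
            lhs = stmt.split(" := ")[0]       # assumes the whole right-hand side is the term
            lines.append(f"{lhs} := h")
        elif stmt:
            lines.append(stmt)
        out[n] = lines
    return ocp, ro, out

# Hypothetical statements per node; empty strings are empty nodes.
stmts = {"s": "", 1: "", 2: "a := c", 3: "x := a + b", 4: "p := q",
         5: "", 6: "", 7: "y := a + b", "e": ""}
used = {3, 7}
late = {3, 5}                           # assumed result of the latest analysis
iso = {"s", 1, 2, 4, 7, "e"}            # assumed result of the isolation analysis
ocp, ro, out = lcm_transform(stmts, "a + b", used, late, iso)
```

In this example a + b is computed once on each path: at node 3 on one branch and at node 5 on the other, with node 7 reusing h.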


Considerations

Register pressure is not always reduced.

Some desirable code motion is not allowed.

Code size can be increased.


Reducing Register Pressure?

How do the live ranges of a and b affect “lifetime optimality”?

[Figure: two versions of the transformed running example, each with h := a + b insertions and x, y, w, z assigned from h]


Code Motion & Down-Safety

Some desirable code motion is not D-Safe and therefore not allowed.

[Figure: example with w := a + b and a candidate rewrite to h := a + b, w := h]


Code Bloat

Late placement of computation points increases code size.

[Figure: alternative placements of h := a + b relative to several y := h uses]


Equations 1

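For every node $n \equiv v := t'$ and every term $t \in T \setminus V$:

\[ \mathrm{Used}(n, t) = t \in \mathrm{SubTerms}(t') \]

\[ \mathrm{Transp}(n, t) = v \notin \mathrm{Var}(t) \]

\[
\text{D-SAFE}(n) =
\begin{cases}
\text{false} & \text{if } n = e \\
\mathrm{Used}(n) \lor \Bigl(\mathrm{Transp}(n) \land \prod_{m \in succ(n)} \text{D-SAFE}(m)\Bigr) & \text{otherwise}
\end{cases}
\]

\[
\mathrm{EARLIEST}(n) =
\begin{cases}
\text{true} & \text{if } n = s \\
\sum_{m \in pred(n)} \Bigl(\lnot\mathrm{Transp}(m) \lor \bigl(\lnot\text{D-SAFE}(m) \land \mathrm{EARLIEST}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]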


Equations 2

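\[
\mathrm{DELAY}(n) = \bigl(\text{D-SAFE}(n) \land \mathrm{EARLIEST}(n)\bigr) \lor
\begin{cases}
\text{false} & \text{if } n = s \\
\prod_{m \in pred(n)} \bigl(\lnot\mathrm{Used}(m) \land \mathrm{DELAY}(m)\bigr) & \text{otherwise}
\end{cases}
\]

\[
\mathrm{LATEST}(n) =
\begin{cases}
\text{false} & \text{if } n = e \\
\mathrm{DELAY}(n) \land \Bigl(\mathrm{Used}(n) \lor \lnot\prod_{m \in succ(n)} \mathrm{DELAY}(m)\Bigr) & \text{otherwise}
\end{cases}
\]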


Equations 3

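\[
\mathrm{ISOLATED}(n) =
\begin{cases}
\text{true} & \text{if } n = e \\
\prod_{m \in succ(n)} \Bigl(\mathrm{LATEST}(m) \lor \bigl(\lnot\mathrm{Used}(m) \land \mathrm{ISOLATED}(m)\bigr)\Bigr) & \text{otherwise}
\end{cases}
\]

\[ RO = \{\, n \mid \mathrm{Used}(n) \land \lnot\bigl(\mathrm{LATEST}(n) \land \mathrm{ISOLATED}(n)\bigr) \,\} \]

\[ OCP = \{\, n \mid \mathrm{LATEST}(n) \land \lnot\mathrm{ISOLATED}(n) \,\} \]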