Optimizing Memory Accesses for Spatial Computation Mihai Budiu, Seth Goldstein CGO 2003.

Page 1

Optimizing Memory Accesses for Spatial Computation

Mihai Budiu, Seth Goldstein

CGO 2003

Page 2

Optimizing Memory Accesses for Spatial Computation

[Figure: Program → Compiler]

Page 3

This work

[Figure: C → Predicated IR → Optimized IR]

Why at CGO?

Page 4

Optimizing Memory Accesses for Spatial Computation

[Figure: memory accesses =*q, *p=, =a[i] and two =*p loads plotted against Time]

This paper describes compiler representations and algorithms to:

• increase memory access parallelism

• remove redundant memory accesses

Page 5

Intermediate Representation

Traditionally: CFG, def-use, may-dep.

Our proposal:

• SSA + predication

• Uniform for scalars and memory

• Explicitly encode may-depend

• Summarize control-flow

• Executable

Page 6

Contributions

• Predicated SSA optimizations for memory
  – Boolean manipulation instead of CFG dependences
  – Powerful term-rewriting optimizations for memory
  – Simple to implement and reason about

• Expose memory parallelism in loops
  – New loop pipelining techniques
  – New parallelization method: loop decoupling

Page 7

Outline

• Introduction

• Program representation

• Redundant memory operation removal

• Pipelining memory accesses in loops

• Conclusions

Page 8

Executable SSA

if (x) y = x*2;
else y++;

[Figure: dataflow graph with operations *, + and !, inputs x, y, 2 and 1, and output y’]

• Program representation is a graph:
  – Nodes = operations, edges = values
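In executable SSA both arms of the `if` become dataflow operations, and the predicates `x` and `!x` select which result becomes y’. A minimal C sketch of that idea, assuming a hypothetical two-input multiplexor `mux2`; the slide's graph shows the *, + and ! nodes, and the function names here are illustrative, not Pegasus code:

```c
#include <assert.h>

/* Hypothetical multiplexor: forwards the value whose predicate is true.
   Exactly one of the two predicates must hold. */
static int mux2(int p0, int v0, int p1, int v1)
{
    assert((p0 != 0) != (p1 != 0));
    return p0 ? v0 : v1;
}

/* Branch-free form of:  if (x) y = x*2; else y++;
   Returns y', the value of y after the statement. */
int executable_ssa(int x, int y)
{
    int prod = x * 2;     /* the '*' node, evaluated unconditionally */
    int incr = y + 1;     /* the '+' node, evaluated unconditionally */
    int p_then = x != 0;  /* predicate of the then-branch */
    int p_else = !x;      /* the '!' node */
    return mux2(p_then, prod, p_else, incr);
}
```

Both arithmetic nodes run every time; only the selection depends on the predicate. For example, `executable_ssa(0, 10)` selects the incremented value 11.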

Page 9

Predication

…=*p;
if (x) …=*q;
else *r = …;

(1) …=*p;
(x) …=*q;
(!x) *r = …;

• Predicates encode control-flow

• Hyperblock ⇒ branch-free code

• Caveat: all optimizations operate at hyperblock scope
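The predicated form can be mimicked in C by giving each memory operation an explicit predicate argument that gates its access. This is a sketch under that assumption: `pred_load`, `pred_store` and `hyperblock` are hypothetical helpers, and the value stored into `*r` stands in for the slide's elided `…`.

```c
#include <assert.h>

/* Hypothetical predicated load: reads memory only when pred is true. */
static int pred_load(int pred, const int *addr, int fallback)
{
    return pred ? *addr : fallback;  /* no access when pred is false */
}

/* Hypothetical predicated store: writes memory only when pred is true.
   C needs an 'if' here; in hardware the predicate gates the access. */
static void pred_store(int pred, int *addr, int value)
{
    if (pred)
        *addr = value;
}

/* Hyperblock form of:  v1 = *p;  if (x) v2 = *q; else *r = v1;
   The stored value v1 is an illustrative stand-in for the slide's "…". */
void hyperblock(int x, const int *p, const int *q, int *r, int *v1, int *v2)
{
    assert(v1 != 0 && v2 != 0);
    *v1 = pred_load(1, p, 0);   /* (1)  …=*p  */
    *v2 = pred_load(x, q, 0);   /* (x)  …=*q  */
    pred_store(!x, r, *v1);     /* (!x) *r=…  */
}
```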

Page 10

Read-write Sets

*p=…;
if (x) …=*q;
else *r = …;

[Figure labels: Memory, Entry, Exit]

Page 11

Token Edges

*p=…;
if (x) …=*q;
else *r = …;

[Figure labels: Memory, Entry, Exit]

Page 12

Tokens ≈ SSA for Memory

*p=…;
if (x) …=*q;
else *r = …;

[Figure: two side-by-side graphs for this code, each starting at Entry]

Page 13

Meaning of Token Edges

• Token graph is maintained transitively reduced
  – Focus the optimizer
  – Linear space complexity in practice

[Figure, two cases. …=*q → *p=… connected by a token edge: maybe dependent, no intervening memory operation. …=*q and *p=… with no edge: independent.]
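The slides say the token graph is kept transitively reduced but do not show how. A minimal sketch, assuming the token graph of a hyperblock is acyclic and small enough for an adjacency matrix (`MAXN` and `transitive_reduce` are hypothetical names): an edge u→v is redundant when v is also reachable through another successor of u.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 8  /* hypothetical bound on the number of memory operations */

/* edge[u][v]: operation v must wait for u's token. The graph must be acyclic.
   Drops every edge u->v already implied by a path u -> w ->* v with w != v. */
void transitive_reduce(int n, bool edge[MAXN][MAXN])
{
    bool reach[MAXN][MAXN];
    bool keep[MAXN][MAXN];

    assert(n >= 0 && n <= MAXN);

    /* Transitive closure (Warshall's algorithm). */
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            reach[u][v] = edge[u][v];
    for (int k = 0; k < n; k++)
        for (int u = 0; u < n; u++)
            for (int v = 0; v < n; v++)
                if (reach[u][k] && reach[k][v])
                    reach[u][v] = true;

    /* Keep u->v only if no other successor w of u reaches v. */
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++) {
            keep[u][v] = edge[u][v];
            for (int w = 0; keep[u][v] && w < n; w++)
                if (w != v && edge[u][w] && reach[w][v])
                    keep[u][v] = false;
        }

    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            edge[u][v] = keep[u][v];
}
```

For a chain Entry → *p=… → …=*q → Exit with extra edges Entry → …=*q and Entry → Exit, only the three chain edges survive, and the ordering they imply is unchanged.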

Page 14

Outline

• Introduction

• Program Representation

• Redundant memory operation removal
  – Dead code elimination
  – Load || load
  – Store ⇒ load
  – Store ⇒ store
  – Useless token removal
  – ...

• Pipelining memory accesses in loops

• Evaluation

• Conclusions

Page 15

Dead Code Elimination

*p=…(false)
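A store whose predicate is constant false never executes, so it can be deleted. Deleting it from the token graph must not lose the ordering it transmitted; one natural way to do that, shown here as an assumption rather than the paper's code, is to connect each token predecessor of the dead operation directly to each of its token successors. All names are hypothetical, and the resulting graph may need transitive reduction again.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 8  /* hypothetical bound on the number of operations */

/* Remove operation k from an acyclic token graph (edge[u][v]: v waits for
   u's token), linking every predecessor of k to every successor of k so
   that the transitive closure of the remaining operations is preserved. */
void remove_token_node(int n, bool edge[MAXN][MAXN], int k)
{
    assert(n <= MAXN && k >= 0 && k < n);
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            if (edge[u][k] && edge[k][v])
                edge[u][v] = true;
    for (int i = 0; i < n; i++)
        edge[i][k] = edge[k][i] = false;
}

/* Dead code elimination for memory operations: every operation whose
   predicate is constant false is removed. Returns how many were removed. */
int eliminate_dead(int n, bool edge[MAXN][MAXN],
                   const bool pred_is_false[], bool removed[])
{
    int count = 0;
    for (int k = 0; k < n; k++) {
        removed[k] = pred_is_false[k];
        if (removed[k]) {
            remove_token_node(n, edge, k);
            count++;
        }
    }
    return count;
}
```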

Page 16

≈ PRE

...=*p(p1)
...=*p(p2)

are merged into

...=*p(p1 ∨ p2)

This corresponds in the CFG to lifting the load to a basic block dominating the original loads.
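Predicates turn the merge into a Boolean operation. The sketch below assumes a hypothetical encoding of each predicate as a truth table over up to five conditions packed in a `uint32_t`, so that ∨ is bitwise OR. It does not check the legality conditions (no intervening store to `*p`, no cycle created); it states them as preconditions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical predicate encoding: bit k holds the predicate's value under
   the k-th assignment of the (at most 5) conditions. */
typedef uint32_t pred_t;

typedef struct {
    const char *addr;  /* symbolic address, e.g. "p" */
    pred_t pred;       /* the load executes when pred is true */
} load_t;

/* Merge two loads from the same address into one load that executes whenever
   either original would have: predicate p1 OR p2.
   Preconditions (assumed checked elsewhere): no store that may write the
   address lies between the loads, and merging creates no cycle. */
load_t merge_loads(load_t a, load_t b)
{
    assert(strcmp(a.addr, b.addr) == 0);
    load_t m = { a.addr, a.pred | b.pred };
    return m;
}
```

With two conditions x and y (truth tables 0xA and 0xC), merging loads guarded by x ∧ y (0x8) and x ∧ ¬y (0x2) yields one load guarded by x (0xA).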

Page 17

Forwarding Data (St ⇒ Ld)

Before:
*p=…(p1)
…=*p(p2)

After:
*p=…(p1)
…=*p(p2 ∧ ¬p1)

Load is executed only if store is not

Page 18

Forwarding Data (2)

Before:
*p=…(p1)
…=*p(p2)

After:
*p=…(p1)
…=*p(false)

• When p2 ⇒ p1 the load becomes dead...

• ...i.e., when the store dominates the load in the CFG
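With the same kind of hypothetical truth-table encoding of predicates, forwarding needs only bitwise operations: the load's new predicate is p2 ∧ ¬p1, and the load is dead exactly when that predicate is false, i.e. when p2 ⇒ p1. A sketch of those tests; the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t pred_t;  /* truth-table predicate (assumed encoding) */

/* p implies q exactly when no assignment makes p true and q false. */
static bool implies(pred_t p, pred_t q) { return (p & ~q) == 0; }

/* Forwarding from  *p = v (p1)  to a later  … = *p (p2),  same address:
   the load only has to execute when the store did not. */
pred_t forwarded_load_pred(pred_t p1, pred_t p2) { return p2 & ~p1; }

/* Value delivered to the load's users in one concrete execution:
   a multiplexor chooses the stored data when the store executed. */
int forwarded_value(bool store_ran, int stored, int loaded)
{
    return store_ran ? stored : loaded;
}

/* The load is dead exactly when p2 implies p1. */
bool load_is_dead(pred_t p1, pred_t p2)
{
    bool dead = implies(p2, p1);
    assert(dead == (forwarded_load_pred(p1, p2) == 0));
    return dead;
}
```

Unused high bits of a truth table must stay zero, because `~p1` sets them.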

Page 19

Store-store (1)

Before:
*p=…(p1)
*p=...(p2)

After:
*p=…(p1 ∧ ¬p2)
*p=...(p2)

• When p1 ⇒ p2 the first store becomes dead...

• ...i.e., when the second store post-dominates the first in the CFG
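The store-store rule mirrors forwarding. A sketch, again assuming predicates are encoded as truth tables in a `uint32_t` and that no other access to `*p` lies between the two stores; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t pred_t;  /* truth-table predicate (assumed encoding) */

/* Two stores to the same address, *p = a (p1) followed by *p = b (p2).
   The first store is only observable when the second does not execute,
   so its predicate can be narrowed to p1 AND NOT p2. */
pred_t narrowed_first_store(pred_t p1, pred_t p2) { return p1 & ~p2; }

/* The first store is dead when p1 implies p2, i.e. when the narrowed
   predicate is identically false. */
bool first_store_dead(pred_t p1, pred_t p2)
{
    pred_t n = narrowed_first_store(p1, p2);
    assert((n & ~p1) == 0);  /* narrowing never adds executions */
    return n == 0;
}
```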

Page 20

Store-store (2)

Before:
*p=…(p1)
*p=...(p2)

After:
*p=…(p1 ∧ ¬p2)
*p=...(p2)

• Token edge eliminated, but...

• ...transitive closure of tokens preserved

Page 21

Key Observation

The control-dependence tests and transformations (i.e., dominance, post-dominance) are carried out by simple Boolean manipulations of predicates.
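The observation can be made concrete with path predicates. A minimal sketch, assuming an acyclic hyperblock whose blocks are numbered in topological order and predicates encoded as truth tables (all names hypothetical): a block's predicate is the OR, over its incoming edges, of the source block's predicate AND the edge condition. Asking whether block a always executes when block b does is then an implication test, the predicate counterpart of the dominance and post-dominance questions on the forwarding and store-store slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t pred_t;  /* truth table over up to 5 conditions (assumed) */

#define MAXB 8  /* hypothetical bound on blocks in a hyperblock */

typedef struct { int src, dst; pred_t cond; } cfg_edge_t;

/* Path predicates of an acyclic hyperblock; block 0 is the entry and blocks
   are numbered topologically. all_true is the truth table of "true". */
void block_predicates(int nblocks, const cfg_edge_t *e, int nedges,
                      pred_t all_true, pred_t pred[])
{
    assert(nblocks > 0 && nblocks <= MAXB);
    pred[0] = all_true;
    for (int b = 1; b < nblocks; b++) {
        pred[b] = 0;
        for (int i = 0; i < nedges; i++)
            if (e[i].dst == b) {
                assert(e[i].src < b);  /* topological numbering */
                pred[b] |= pred[e[i].src] & e[i].cond;
            }
    }
}

/* True when block a executes whenever block b does: pred[b] implies pred[a]. */
bool executes_whenever(const pred_t pred[], int a, int b)
{
    return (pred[b] & ~pred[a]) == 0;
}
```

For the diamond `if (x) A else B; C`, the join block C gets the predicate true. C always executes when A does (C post-dominates A), but A does not always execute when C does.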

Page 22

Implementation Is Clean

Optimization                                    LOC
Useless dependence removal                      160
Immutable loads                                  70
Dead-code elimination (incl. memory op)          66
Load-after-load and store-after-store removal   153
Redundant load and store removal                 94
Transitive reduction of token edges              61
Loop-invariant scalar & load discovery           74

Page 23

Operations Removed: static data

[Figure: bar chart of the percentage of memory reads and writes removed statically, per benchmark (y-axis: Percent, 0 to 30). Mediabench: adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_e, mpeg2_d, jpeg_e, jpeg_d, pegwit_e, pegwit_d, g721_e, g721_d, mesa. SpecInt95: go, m88ksim, compress, li, ijpeg, perl, vortex.]

Page 24

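Operations Removed: dynamic data

[Figure: bar chart of the percentage of dynamically executed memory reads and writes removed, per benchmark (y-axis: Percent, 0 to 25; two off-scale bars are annotated 57 and 43). Mediabench: adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_e, mpeg2_d, jpeg_e, jpeg_d, pegwit_e, pegwit_d, g721_e, g721_d, mesa. SpecInt95: go, m88ksim, compress, li, ijpeg, perl, vortex.]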

Page 25

Outline

• Introduction

• Program Representation

• Redundant memory operation removal

• Pipelining memory accesses in loops

• Conclusions

Page 26

Loop Pipelining

...=*in++;
*out++ =...

[Figure: these two accesses drawn twice, first sharing one token loop, then each in its own loop]

• 1 loop ⇒ 2 loops, which can slip with respect to each other

• ‘in’ slips ahead of ‘out’ ⇒ pipelining of the loop body
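A toy timing model shows why separate token loops let ‘in’ slip ahead of ‘out’. It assumes (this is not the paper's evaluation setup) that each memory operation starts when its token predecessor completes, that loads take `ld` cycles and stores `st` cycles, and that the function names are hypothetical.

```c
#include <assert.h>

/* Loop body modeled: v = *in++; *out++ = v;  for n iterations. */

/* One token chain for all memory operations: loads and stores alternate. */
int time_single_chain(int n, int ld, int st)
{
    assert(n >= 0 && ld > 0 && st > 0);
    int t = 0;
    for (int i = 0; i < n; i++) {
        t += ld;  /* v = *in++  */
        t += st;  /* *out++ = v */
    }
    return t;
}

/* One token chain per object: a load waits only for the previous load;
   a store waits for the previous store and for its own data. */
int time_split_chains(int n, int ld, int st)
{
    assert(n >= 0 && ld > 0 && st > 0);
    int ld_done = 0, st_done = 0;
    for (int i = 0; i < n; i++) {
        ld_done += ld;
        int start = st_done > ld_done ? st_done : ld_done;
        st_done = start + st;
    }
    return st_done > ld_done ? st_done : ld_done;
}
```

With 4 iterations, 2-cycle loads and 1-cycle stores, the single chain takes 12 cycles and the split chains 9, because each store overlaps the next iteration's load.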

Page 27

One Token Loop Per “Object”

extern int a[ ];

void g(int* p)
{
    int i;

    for (i=0; i < N; i++)
        a[i] += *p;
}

[Figure: token loops for a and for other, with operations =*a, *a= and =*p]

Page 28

Inter-iteration Dependences

[Figure: token loops a and other, with operations =*p, =*a and *a=; annotations “All accesses prior to current iteration” and “All accesses after current iteration”]

Page 29

Monotone Addresses

[Figure: two stores *a++= with a token generator and a token collector]

• a[1] must receive token from a[0]...

• ...but these are independent!
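For `*a++=` the address grows by the element size each iteration, so the stores to a[0] and a[1] never overlap. A minimal sketch of a test that could justify dropping the token edge between iterations, assuming a constant stride and access size known at compile time; the function name is hypothetical, and the slides do not say how monotone addresses are detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Iteration i touches bytes [base + i*stride, base + i*stride + size).
   Accesses from two different iterations can overlap only when the distance
   between their start addresses, |i - j| * |stride|, is below the size. */
bool iterations_independent(long stride, long size, long i, long j)
{
    assert(size > 0);
    if (i == j)
        return false;  /* same iteration, same bytes */
    return labs(i - j) * labs(stride) >= size;
}
```

A 4-byte `int` store through `*a++` has stride 4 and size 4, so every pair of iterations is independent.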

Page 30

Loop Decoupling: Motivation

for (i=0; i < N; i++) {
    a[i] = ....
    .... = a[i+3];
}

[Figure: two iterations of the token loop for a, each with a[i]= and =a[i+3]; label: independent]

Page 31

Loop Decoupling

for (i=0; i < N; i++) {
    a[i] = ....
    .... = a[i+3];
}

[Figure: token loops a0 (a[i]=) and a3 (=a[i+3]) with token generator tk(3) for slip control]

• Token generator emits 3 tokens “instantly”

• It allows a0 loop to slip at most 3 iterations ahead of a3
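Loop decoupling can be simulated with a counter standing in for the token generator. In this sketch the trip count `N`, distance `D` and all names are hypothetical: the writer loop a0 spends one token per iteration and the reader loop a3 returns one per iteration, so a0 never gets more than 3 iterations ahead, and every read still sees the value a[i+3] had before the write, as in sequential order.

```c
#include <assert.h>
#include <stdbool.h>

#define N 10  /* hypothetical trip count */
#define D 3   /* dependence distance; the token generator is tk(D) */

/* Simulates the decoupled loops of
       for (i = 0; i < N; i++) { a[i] = 100 + i; v[i] = a[i + D]; }
   prefer_writer chooses which loop runs when both may proceed.
   Returns the largest lead of the writer (a0) over the reader (a3). */
int run_decoupled(int a[N + D], int v[N], bool prefer_writer)
{
    int tokens = D, w = 0, r = 0, max_lead = 0;
    while (w < N || r < N) {
        bool can_w = w < N && tokens > 0;
        bool can_r = r < N;
        if (can_w && (prefer_writer || !can_r)) {
            tokens--;          /* a0 consumes a token */
            a[w] = 100 + w;    /* a[i] = ....         */
            w++;
        } else {
            assert(can_r);
            v[r] = a[r + D];   /* .... = a[i+3]       */
            r++;
            tokens++;          /* a3 returns a token  */
        }
        if (w - r > max_lead)
            max_lead = w - r;
    }
    return max_lead;
}
```

Scheduling the writer greedily reaches the maximum lead of exactly 3 iterations without ever clobbering a value the reader still needs.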

Page 32

Performance Impact of Memory Optimizations

[Figure: bar chart of speed-up vs. no memory optimizations, per benchmark (y-axis 1 to 1.7; two off-scale bars are annotated 2.1 and 2.0). Mediabench: adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_e, mpeg2_d, jpeg_e, jpeg_d, pegwit_e, pegwit_d, g721_e, g721_d, mesa. SpecInt95: m88ksim, compress, li, ijpeg, perl, vortex.]

Page 33

Conclusions

• Tokens = compact representation of memory dependences

• Explicit dependences enable easy & powerful optimizations

• Simple predicate manipulation replaces control-flow transforms

• Fine-grain dependence information enables loop pipelining

• Token generators + loop decoupling = dynamic slip control

Page 34

Backup Slides

• Compilation speed

• Compiler structure

• Tokens in hardware

• Cycle-free condition

• How performance is evaluated

• Sources of performance

• Aren’t these optimizations well known?

• Computing predicates

Page 35

Compilation Speed

• On average 3.5x slower than gcc -O3

• Max 10x slower

• We do intra-procedural pointer analysis, but no scheduling or register allocation

Page 36

Compiler Structure

• C/FORTRAN → Suif CC → high Suif IR

• high Suif IR: inlining, unrolling; call-graph

• low Suif IR: pointer analysis, live var. analysis, CFG construction, unreachable code, build hyperblocks, ctrl dominance, path predicates

• Pegasus (Predicated SSA); call-graph

• Pegasus optimizations: CSE, dead-code, PRE, induction variables, strength reduction, loop-invariant lift, reassociation, memory optimization, constant propagation, constant folding, unreachable code

• Back ends: C circuit simulation, Verilog

Page 37

Tokens in Hardware

[Figure: a Load node with inputs addr, pred and token and outputs data and token, connected to Memory through an LSQ]

• Tokens are actual operation inputs and outputs

• Operation waits for token to execute

• Output token released as soon as side-effect certain
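A hypothetical software model of the token discipline for one predicated load: the node fires only when its address, predicate and input token are all present. The slides do not say what a load with a false predicate does; this sketch assumes it skips the memory access and passes the token on. It also releases the output token together with the data, a simplification of "as soon as side-effect certain". All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* An input or output port that may or may not hold a value yet. */
typedef struct { bool valid; int value; } slot_t;

typedef struct {
    slot_t addr, pred, token_in;  /* inputs  */
    slot_t data, token_out;       /* outputs */
} load_node_t;

/* Tries to fire the node; returns true if it fired. Memory is an array. */
bool load_try_fire(load_node_t *n, const int *memory)
{
    assert(n != NULL && memory != NULL);
    if (!(n->addr.valid && n->pred.valid && n->token_in.valid))
        return false;  /* waits for all inputs, including the token */
    if (n->pred.value)
        n->data = (slot_t){ true, memory[n->addr.value] };
    else
        n->data = (slot_t){ true, 0 };  /* no access; value is unused */
    n->token_out = (slot_t){ true, 0 }; /* release the output token */
    n->addr.valid = n->pred.valid = n->token_in.valid = false;  /* consumed */
    return true;
}
```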

Page 38

Cycle-free Condition

...=*p(p1)
...=*p(p2)

are merged into

...=*p(p1 ∨ p2)

• Requires a reachability computation to test

• Using memoization, complexity is amortized constant
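The reachability test behind the cycle-free condition can be memoized per (source, destination) pair. A sketch, assuming a small acyclic graph stored as an adjacency matrix (all names hypothetical): merging two loads is refused when either reaches the other, because the merged node would then lie on a cycle. The memo table must be cleared whenever the graph is edited.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXN 16  /* hypothetical bound on graph size */

enum { UNKNOWN = 0, NO, YES };

/* succ[u][v]: an edge u->v (data, predicate or token).
   memo caches reachability answers across queries. */
typedef struct {
    int n;
    bool succ[MAXN][MAXN];
    unsigned char memo[MAXN][MAXN];
} graph_t;

/* Is v reachable from u by a non-empty path? Assumes an acyclic graph. */
static bool reaches(graph_t *g, int u, int v)
{
    if (g->memo[u][v] != UNKNOWN)
        return g->memo[u][v] == YES;
    bool found = false;
    for (int w = 0; w < g->n && !found; w++)
        if (g->succ[u][w])
            found = (w == v) || reaches(g, w, v);
    g->memo[u][v] = found ? YES : NO;
    return found;
}

/* Two loads may be merged only if neither reaches the other. */
bool can_merge(graph_t *g, int a, int b)
{
    assert(a != b && a < g->n && b < g->n);
    return !reaches(g, a, b) && !reaches(g, b, a);
}
```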

Page 39

How Performance Is Evaluated

[Figure: evaluation model. C (unlimited ILP) → LSQ → L1 8K → L2 1/4M → Mem, with limited bandwidth (2 words/cycle); latency labels 2, 8 and 72]

Page 40

Sources of Performance

• Removal of redundant operations

• More freedom in scheduling

• Pipelining loops


Page 41

Aren’t These Opts. Well Known?

• gcc -O3, Pentium

• Sun Workshop CC -xO5, Sparc

• DEC cc -O4, Alpha

• MIPSpro cc -O4, SGI

• SGI ORC -O4, Itanium

• IBM cc -O3, AIX

• Our compiler

void f(unsigned *p, unsigned a[], int i)
{
    if (p) a[i] += *p;
    else a[i] = 1;
    a[i] <<= a[i+1];
}

Only ones to remove the redundant accesses to a[i]
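A hand-written, source-level sketch of what removing the redundant accesses to a[i] amounts to (not output from any of the compilers listed): the value assigned to a[i] in the `if` is kept in a temporary and reused instead of being reloaded, which makes the first store dead. Reading `*p` before any store keeps the result identical to `f` even when p points into a[]. The name `f_opt` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The slide's function. */
void f(unsigned *p, unsigned a[], int i)
{
    if (p) a[i] += *p;
    else a[i] = 1;
    a[i] <<= a[i+1];
}

/* Equivalent with a[i] loaded at most once and stored exactly once:
   the intermediate value is forwarded in t instead of stored and reloaded. */
void f_opt(unsigned *p, unsigned a[], int i)
{
    assert(a != NULL);
    unsigned t = p ? a[i] + *p : 1u;
    a[i] = t << a[i+1];  /* a[i+1] is a different element, so t is current */
}
```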

Page 42

Computing Predicates

• Correct for irreducible graphs

• Correct even when speculatively computed

• Can be eagerly computed

[Figure labels: s, t, b]

Page 43

Spatial Computation