Spatial Computation Thesis committee: Seth Goldstein Peter Lee Todd Mowry Babak Falsafi Nevin Heintze Ph.D. Thesis defense, December 8, 2003 SCS Mihai Budiu CMU CS


Spatial Computation

Thesis committee: Seth Goldstein

Peter Lee

Todd Mowry

Babak Falsafi

Nevin Heintze

Ph.D. Thesis defense, December 8, 2003

SCS

Mihai Budiu

CMU CS

2

Spatial Computation

SCS

A model of general-purpose computation based on Application-Specific Hardware.

3

Thesis Statement

Application-Specific Hardware (ASH):

• can be synthesized by adapting software compilation for predicated architectures,

• provides high performance for programs with high ILP, with very low power consumption,

• is a more scalable and efficient computation substrate than monolithic processors.

4

Outline

• Introduction

• Compiling for ASH

• Media processing on ASH

• ASH vs. superscalar processors

• Conclusions

5

CPU Problems

• Complexity

• Power

• Global Signals

• Limited ILP

6

Design Complexity

from Michael Flynn’s FCRC 2003 talk

[Figure: log-scale plot, 1981-2009, of logic transistors per chip (chip size, K transistors), growing at 58%/year, against design productivity (transistors per staff-month), growing at 21%/year. The labels 2.5, .35, and .10 also appear on the plot. Source: S. Malik, orig. Sematech.]

Design Time: CAD productivity favors FPL

7

Communication vs. Computation

gate: 5ps

wire: 20ps

Power consumption on wires is also dominant

8

Our Approach: ASH

Application-Specific Hardware

9

Resource Binding Time

[Diagram: two columns, CPU and ASH, each showing Programs and numbered steps 1. and 2.]

10

Hardware Interface

CPU: software / ISA / hardware

ASH: software / virtual ISA / hardware (gates)

11

Application-Specific Hardware

C program → Compiler → Dataflow IR → Reconfigurable/custom hw

12

Contributions

[Diagram: research areas, arranged along an axis from theory to systems]

• Compilation

• Computer architecture

• Reconfigurable computing

• Embedded systems

• Asynchronous circuits

• High-level synthesis

• Dataflow machines

• Nanotechnology

13

Outline

• Introduction

• CASH: Compiling for ASH

• Media processing on ASH

• ASH vs. superscalar processors

• Conclusions

14

Computation = Dataflow

• Operations ⇒ functional units

• Variables ⇒ wires

• No interpretation

Programs:

x = a & 7;
...
y = x >> 2;

Circuits:

[Diagram: a and 7 feed an & unit; its output x and the constant 2 feed a >> unit.]
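To make "operations become functional units, variables become wires" concrete, here is a minimal C sketch that models the two-statement fragment as a small graph of nodes. The types and functions (`df_node`, `eval_graph`, `compute_y`) are hypothetical names chosen for illustration; they are not part of CASH or of the thesis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each node is one functional unit; its inputs are
   "wires" driven by earlier nodes. */
typedef enum { OP_INPUT, OP_CONST, OP_AND, OP_SHR } op_kind;

typedef struct {
    op_kind  kind;
    int      lhs;    /* index of the node driving the left input  */
    int      rhs;    /* index of the node driving the right input */
    unsigned value;  /* constant (OP_CONST) or latched output     */
} df_node;

/* Fire every node once in topological order. There is no instruction
   fetch or decode: each unit just consumes the values on its wires. */
static unsigned eval_graph(df_node *g, size_t n, unsigned a)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        switch (g[i].kind) {
        case OP_INPUT: g[i].value = a; break;
        case OP_CONST: break;
        case OP_AND:   g[i].value = g[g[i].lhs].value & g[g[i].rhs].value; break;
        case OP_SHR:   g[i].value = g[g[i].lhs].value >> g[g[i].rhs].value; break;
        }
    }
    return g[n - 1].value;
}

/* Graph for: x = a & 7; y = x >> 2; */
static unsigned compute_y(unsigned a)
{
    df_node g[] = {
        { OP_INPUT, -1, -1, 0 },  /* a          */
        { OP_CONST, -1, -1, 7 },  /* 7          */
        { OP_AND,    0,  1, 0 },  /* x = a & 7  */
        { OP_CONST, -1, -1, 2 },  /* 2          */
        { OP_SHR,    2,  3, 0 },  /* y = x >> 2 */
    };
    return eval_graph(g, sizeof g / sizeof g[0], a);
}
```

For example, `compute_y(13)` evaluates 13 & 7 = 5 and then 5 >> 2 = 1. The variable x never exists as storage; it is only the edge between the two units.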

15

Basic Operation

[Diagram: a + unit with data, valid, and ack signals and an output latch.]

16

Asynchronous Computation

[Animation in 8 numbered steps: + units with latches exchanging data, valid, and ack signals.]

17

Distributed Control Logic

[Diagram: + and - units connected by ack and rdy signals, with a control FSM.]

asynchronous control

short, local wires

18

Forward Branches

if (x > 0)
    y = -x;
else
    y = b*x;

[Diagram: x, b, and 0 feed the -, *, and > units, which produce y; the critical path is marked.]

Conditionals ⇒ Speculation
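The idea that a forward branch turns into speculation can be sketched in C as follows: both arms are computed unconditionally and a multiplexer picks the result once the predicate is known. This is only an illustration under that reading of the slide; `mux` and `branch_as_dataflow` are hypothetical names, and the sketch is safe only because neither arm has side effects.

```c
#include <assert.h>

/* Hypothetical 2-input multiplexer: selects one of two values that have
   already been computed. */
static int mux(int pred, int if_true, int if_false)
{
    assert(pred == 0 || pred == 1);
    return pred ? if_true : if_false;
}

/* if (x > 0) y = -x; else y = b*x;   with both arms evaluated eagerly */
static int branch_as_dataflow(int x, int b)
{
    int neg  = -x;     /* "-" unit, runs regardless of the predicate */
    int prod = b * x;  /* "*" unit, runs regardless of the predicate */
    int pred = x > 0;  /* ">" unit produces the predicate            */
    return mux(pred, neg, prod);
}
```

With this structure the result is ready as soon as the slowest of the three units finishes, which is why the multiplier can end up on the critical path even when its arm is not selected.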

19

Control Flow ⇒ Data Flow

[Diagram: the three constructs Merge (label), Gateway (data gated by a predicate), and Split (branch, on p and its negation).]

20

Loops

int sum=0, i;
for (i=0; i < 100; i++)
    sum += i*i;
return sum;

[Diagram: a loop computing i (initial 0, +1, < 100) and a loop accumulating sum (initial 0, *, +), with sum delivered to ret on exit.]

21

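Predication and Side-Effects

[Diagram: a Load unit with addr and pred inputs, a data output, an incoming and an outgoing token, and a connection to memory.]

Annotations: no speculation; sequencing of side-effects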

22

Thesis Statement

Application-Specific Hardware:

• can be synthesized by adapting software compilation for predicated architectures,

• provides high performance for programs with high ILP, with very low power consumption,

• is a more scalable and efficient computation substrate than monolithic processors.

23

Outline

• Introduction

• CASH: Compiling for ASH

– An optimization on the SIDE

• Media processing on ASH

• ASH vs. superscalar processors

• Conclusions

24

Availability Dataflow Analysis

y = a*b;
...
if (x) {
    ...
    ... = a*b;    /* can reuse y */
}

25

Dataflow Analysis Is Conservative

if (x) {
    ...
    y = a*b;
}
...
... = a*b;    /* y? */

26

Static Instantiation, Dynamic Evaluation

flag = false;
if (x) {
    ...
    y = a*b;
    flag = true;
}
...
... = flag ? y : a*b;
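The fragment above can be turned into a small runnable sketch that counts how often a*b is actually evaluated. The counter `mul_count` and the helpers `mul`, `side_example`, and `muls_per_call` are hypothetical scaffolding added for illustration; only the flag/select structure comes from the slide.

```c
#include <assert.h>
#include <stdbool.h>

static int mul_count;  /* how many times a*b was actually evaluated */

static int mul(int a, int b) { mul_count++; return a * b; }

/* Mirrors the slide: the reuse of y is instantiated statically, but
   whether y is valid is decided at run time by `flag`. */
static int side_example(bool x, int a, int b)
{
    bool flag = false;
    int y = 0;
    if (x) {
        y = mul(a, b);
        flag = true;
    }
    return flag ? y : mul(a, b);
}

/* Multiplications performed by one call, on either path. */
static int muls_per_call(bool x, int a, int b)
{
    mul_count = 0;
    (void)side_example(x, a, b);
    assert(mul_count == 1);  /* exactly one multiply, whichever path ran */
    return mul_count;
}
```

Without the flag, the path where x is true would multiply twice, and a purely static availability analysis could not remove the second multiply because a*b is not available on the path where x is false.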

27

SIDE Register Promotion Impact

[Two bar charts of % reduction per benchmark. Stores: series "%st promo" and "%st PRE", axis 0 to 30. Loads: series "% ld promo" and "% ld PRE", axis 0 to 45, with one off-scale label of 53. Benchmarks: adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_e, mpeg2_d, jpeg_e, jpeg_d, pegwit_e, pegwit_d, g721_e, g721_d, pgp_e, pgp_d, rasta, mesa, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex, 183.equake, 188.ammp, 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 254.gap, 300.twolf.]

28

Outline

• Introduction

• CASH: Compiling for ASH

• Media processing on ASH

• ASH vs. superscalar processors

• Conclusions

29

Performance Evaluation

[Diagram: simulated system with ASH, an LSQ (limited BW), L1 (8K), L2 (1/4M), and Mem.]

CPU: 4-way OOO

Assumption: all operations have the same latency.

30

Media Kernels, vs 4-way OOO

[Bar chart: times faster (axis 0 to 3) for adpcm_d, adpcm_e, epic_d, epic_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mesa, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e, rasta. Off-scale bar labels, as extracted: 125.85.8.]

31

Media Kernels, IPC

[Bar chart: Base IPC and ASH IPC (axis 0 to 25) for adpcm_d, adpcm_e, epic_d, epic_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mesa, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e, rasta, with a reference label at 4.]

32

Speed-up IPC Correlation

[Bar chart: times bigger (axis 0 to 10), series Speed-up and IPC Ratio, for adpcm_d, adpcm_e, epic_d, epic_e, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, jpeg_e, mesa, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e, rasta. One off-scale label: 12.]

33

Low-Level Evaluation

C → CASH core → Verilog back-end → Synopsys, Cadence P/R → ASIC

180nm std. cell library, 2V (~1999 technology)

Results shown so far. All results in thesis.

Results in the next two slides.

34

Area

[Bar chart: area in square mm (axis 0 to 12) for adpcm_d, g721_d, g721_e, gsm_d, gsm_e, jpeg_d, mpeg2_d, mpeg2_e, pegwit_d, pegwit_e.]

Reference: P4 in 180nm has 217 mm²

35

Power

vs 4-way OOO superscalar, 600 MHz, with clock gating (Wattch), ~6 W

Power ratio (times smaller than OOO; chart axis 0 to 350):

Benchmark   Power ratio
adpcm_d      70
g721_d       41
g721_e       41
gsm_d       129
gsm_e       147
jpeg_d       94
mpeg2_d     121
mpeg2_e     136
pegwit_d    303
pegwit_e    303
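As a back-of-the-envelope reading of these ratios, the following sketch divides the roughly 6 W baseline quoted on the slide by a ratio to get an approximate absolute figure. The function name `ash_power_mw` is hypothetical, and the resulting numbers are derived here for illustration; they are not measurements reported in the talk.

```c
#include <assert.h>

/* Estimated ASH power in milliwatts, given the baseline power in watts
   and the "times smaller than OOO" ratio. */
static double ash_power_mw(double baseline_watts, double ratio)
{
    assert(ratio > 0.0);
    return baseline_watts * 1000.0 / ratio;
}
```

Under the assumption that the baseline is exactly 6 W, the largest ratio (303) corresponds to about 20 mW and the smallest (41) to about 146 mW.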

36

Thesis Statement

Application-Specific Hardware:

• can be synthesized by adapting software compilation for predicated architectures,

• provides high performance for programs with high ILP, with very low power consumption,

• is a more scalable and efficient computation substrate than monolithic processors.

37

Outline

• Introduction

• CASH: Compiling for ASH

• Media processing on ASH

– dataflow pipelining

• ASH vs. superscalar processors

• Conclusions

38

Pipelining

int sum=0, i;
for (i=0; i < 100; i++)
    sum += i*i;
return sum;

[Diagram, cycle=1: loop circuit with i, +, <=, 100, and 1 feeding a pipelined multiplier (8 stages), followed by + and sum.]

39-42

Pipelining, cycle=2 to cycle=5

[Animation frames of the same loop circuit (i, +, <=, 100, 1, *, +, sum). In the cycle=5 frame the values i=0 and i=1 are shown in flight, and the frame is labeled "pipeline balancing".]
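For a rough sense of why a pipelined multiplier matters for this loop, the sketch below uses the standard textbook cycle counts for a fully pipelined unit versus one that handles a single operation at a time. This is an idealized model, assumed here for illustration: it ignores the handshake and ack edges that the talk's backup slides identify as a limit on throughput, and the function names are hypothetical.

```c
#include <assert.h>

/* Cycles for n independent operations when a new one can start only
   after the previous one has finished. */
static long cycles_unpipelined(long n, long latency)
{
    assert(n >= 0 && latency > 0);
    return n * latency;
}

/* Same work on a fully pipelined unit that accepts one operation per
   cycle: the first result appears after `stages` cycles, then one
   result per cycle. */
static long cycles_pipelined(long n, long stages)
{
    assert(n >= 0 && stages > 0);
    return n == 0 ? 0 : stages + n - 1;
}
```

For the 100 multiplications of the example loop and an 8-stage multiplier, the idealized counts are 800 cycles unpipelined against 107 cycles pipelined.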

43

Outline

• Introduction

• CASH: Compiling for ASH

• Media processing on ASH

• ASH vs. superscalar processors

• Conclusions

44

This Is Obvious!

ASH runs at full dataflow speed, so a CPU cannot do any better (if the compilers are equally good).

45

SpecInt95, ASH vs 4-way OOO

[Bar chart: percent slower / faster (axis -50 to 30) for 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex.]

46

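Branch Prediction

for (i=0; i < N; i++) {
    ...
    if (exception) break;
}

[Diagram: loop circuit with i, +, 1, <, &, !, and exception; the ASH critical path and the CPU critical path are marked.]

Loop branch: predicted taken.

if (exception) break: predicted not taken. Effectively a no-op for CPU!

result available before inputs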

47

SpecInt95, perfect prediction

[Bar chart: percent slower/faster (axis -60 to 60), series "baseline" and "prediction", for 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex; one entry is marked "no data".]

48

ASH Problems

• Both branch and join not free

• Static dataflow (no re-issue of same instr)

• Memory is “far”

• Fully static

– No branch prediction

– No dynamic unrolling

– No register renaming

• Calls/returns not lenient

• ...

49

Thesis Statement

Application-Specific Hardware:

• can be synthesized by adapting software compilation for predicated architectures,

• provides high performance for programs with high ILP, with very low power consumption,

• is a more scalable and efficient computation substrate than monolithic processors.

50

Outline

Introduction

+ CASH: Compiling for ASH

+ Media processing on ASH

+ ASH vs. superscalar processors

= Conclusions

51

Strengths

• low power

• simple verification?

• specialized to app.

• unlimited ILP

• simple hardware

• no fixed window

• economies of scale

• highly optimized

• branch prediction

• control speculation

• full-dataflow

• global signals/decision

52

Conclusions

• Compiling “around the ISA” is a fruitful research approach.

• Distributed computation structures require more synchronization overhead.

• Spatial Computation efficiently implements high-ILP computation with very low power.

53

Backup Slides

• Control logic

• Pipeline balancing

• Lenient execution

• Dynamic Critical Path

• Memory PRE

• Critical path analysis

• CPU + ASH

54

Control Logic

[Diagram: two blocks labeled C and a block labeled Reg, with signals rdyin, ackin, rdyout, ackout, datain, and dataout.]

55

Last-Arrival Events

[Diagram: a + unit with data, valid, and ack signals.]

• Event enabling the generation of a result

• May be an ack

• Critical path = collection of last-arrival edges

56

Dynamic Critical Path

1. Start from last node

2. Trace back along last-arrival edges

3. Some edges may repeat
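The trace-back procedure can be sketched in C over a recorded trace of events. Everything named here (`example_last_arrival`, `critical_path_step`, `critical_path_length`, `NO_PRED`) is a hypothetical illustration, not code from the thesis. The sketch assumes each dynamic event records which predecessor event arrived last, so the chain is acyclic even when the same static edge repeats.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PRED (-1)

/* Hypothetical trace of 5 events: example_last_arrival[v] names the
   event whose arrival was the last one v waited for (NO_PRED for a
   source). Event 3 waited for both 1 and 2, and 1 arrived last. */
static const int example_last_arrival[5] = { NO_PRED, 0, 0, 1, 3 };

/* Start from the last event and follow last-arrival edges backwards.
   Returns the event reached after k steps, or NO_PRED if the path is
   shorter than that. */
static int critical_path_step(const int *last_arrival, size_t n,
                              int last, size_t k)
{
    int v = last;
    while (v != NO_PRED && k > 0) {
        assert(v >= 0 && (size_t)v < n);
        v = last_arrival[v];
        k--;
    }
    return v;
}

/* Number of events on the critical path ending at `last`. */
static size_t critical_path_length(const int *last_arrival, size_t n,
                                   int last)
{
    size_t len = 0;
    for (int v = last; v != NO_PRED; v = last_arrival[v]) {
        assert(v >= 0 && (size_t)v < n);
        len++;
    }
    return len;
}
```

In the example trace the critical path ending at event 4 is 4, 3, 1, 0; event 2 is off the path because its result arrived at event 3 before event 1's did.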

57

Critical Paths

if (x > 0)
    y = -x;
else
    y = b*x;

[Diagram: x, b, and 0 feed the -, *, and > units, which produce y.]

58

Lenient Operations

if (x > 0)
    y = -x;
else
    y = b*x;

[Diagram: x, b, and 0 feed the -, *, and > units, which produce y.]

Solve the problem of unbalanced paths

59

Pipelining

[Diagram, cycle=6: loop circuit (i, +, <=, 100, 1, *, +, sum) with the values i=0 and i=1 in flight at the multiplier (*).]

60

Pipelining

[Diagram, cycle=7: loop circuit (i, +, <=, 100, 1, *, +, sum) with i’s loop, sum’s loop, the long-latency pipe, and the predicate labeled.]

61

Pipelining

[Diagram: loop circuit (i, +, <=, 100, 1, *, +, sum) with i’s loop, sum’s loop, and the critical path marked.]

Predicate ack edge is on the critical path.

62

Pipeline balancing

[Diagram, cycle=7: loop circuit (i, +, <=, 100, 1, *, +, sum) with a decoupling FIFO added; i’s loop and sum’s loop are labeled.]

63

Pipeline balancing

[Diagram: loop circuit (i, +, <=, 100, 1, *, +, sum) with the decoupling FIFO; i’s loop, sum’s loop, and the critical path are marked.]

64

Register Promotion

Before:

*p=… (p1)
…=*p (p2)

After:

*p=… (p1)
…=*p (p2 ∧ ¬p1)

Load is executed only if store is not

65

Register Promotion (2)

Before:

*p=… (p1)
…=*p (p2)

After:

*p=… (p1)
…=*p (false)

• When p2 ⇒ p1 the load becomes dead...

• ...i.e., when store dominates load in CFG
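The predicate rewrite behind register promotion can be illustrated with a small C model: the load consumes the stored value directly when the store executed, and touches memory only under p2 ∧ ¬p1. The struct `load_result` and the function `promoted_load` are hypothetical names for this sketch, which assumes the store and the load refer to the same address.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int  value;        /* value delivered to the consumer of the load */
    bool issued_load;  /* true if memory was actually read            */
} load_result;

/* p1: predicate of the store *p = stored
   p2: predicate of the original load ... = *p */
static load_result promoted_load(bool p1, bool p2, int stored, int in_memory)
{
    load_result r = { 0, false };
    if (p2) {
        if (p1) {
            r.value = stored;       /* forward the stored value */
        } else {
            r.value = in_memory;    /* store did not run: read memory */
            r.issued_load = true;
        }
    }
    /* The memory access fires exactly under p2 AND NOT p1. */
    assert(r.issued_load == (p2 && !p1));
    return r;
}
```

When p2 implies p1, the combination p2 true and p1 false never occurs, so `issued_load` is never set and the load can be removed altogether, which is the dead-load case on the slide.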

66

≈ PRE

...=*p (p1) and ...=*p (p2) become a single ...=*p (p1 ∨ p2)

This corresponds in the CFG to lifting the load to a basic block dominating the original loads

67

Store-store (1)

Before:

*p=… (p1)
*p=... (p2)

After:

*p=… (p1 ∧ ¬p2)
*p=... (p2)

• When p1 ⇒ p2 the first store becomes dead...

• ...i.e., when second store post-dominates first in CFG

68

Store-store (2)

Before:

*p=… (p1)
*p=... (p2)

After:

*p=… (p1 ∧ ¬p2)
*p=... (p2)

• Token edge eliminated, but...

• ...transitive closure of tokens preserved

69

A Code Fragment

for (i = 0; i < 64; i++) {
    for (j = 0; X[j].r != 0xF; j++)
        if (X[j].r == i)
            break;
    Y[i] = X[j].q;
}

SpecINT95: 124.m88ksim: init_processor, stylized
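For readers who want to run the fragment, here is a self-contained C version. The record layout `struct entry`, the function names `init_table` and `demo_lookup`, and the sample data are assumptions made for this sketch; the slide shows only the fields r and q and notes that the code is stylized.

```c
#include <assert.h>

/* Hypothetical record layout: only the fields r and q appear on the slide. */
struct entry { int r; int q; };

/* For each i in [0, 64), scan X for the first entry whose r equals i,
   stopping at the 0xF sentinel, and copy that entry's q into Y[i]. */
static void init_table(const struct entry *X, int *Y)
{
    for (int i = 0; i < 64; i++) {
        int j;
        for (j = 0; X[j].r != 0xF; j++)
            if (X[j].r == i)
                break;
        Y[i] = X[j].q;
    }
}

/* Small driver over sample data: returns Y[i] for a three-entry table
   whose last entry is the sentinel. */
static int demo_lookup(int i)
{
    static const struct entry X[] = { {3, 30}, {1, 10}, {0xF, -1} };
    int Y[64];
    assert(i >= 0 && i < 64);
    init_table(X, Y);
    return Y[i];
}
```

Indices that are not found fall through to the sentinel entry, so `demo_lookup(2)` returns the sentinel's q field. The inner loop is the part analyzed on the following slides: each iteration needs the loaded X[j].r before it can decide whether to continue.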

70

Dynamic Critical Path

for (j = 0; X[j].r != 0xF; j++)
    if (X[j].r == i)
        break;

[Diagram: circuit for the inner loop with the dynamic critical path marked; labels: load predicate, loop predicate, sizeof(X[j]), definition.]

71

MIPS gcc Code

LOOP:
L1: beq   $v0,$a1,EXIT  ; X[j].r == i
L2: addiu $v1,$v1,20    ; &X[j+1].r
L3: lw    $v0,0($v1)    ; X[j+1].r
L4: addiu $a0,$a0,1     ; j++
L5: bne   $v0,$a3,LOOP  ; X[j+1].r != 0xF
EXIT:

L1 → L2 → L3 → L5 → L1: 4-instruction loop-carried dependence

for (j = 0; X[j].r != 0xF; j++)
    if (X[j].r == i)
        break;

72

If Branch Prediction Correct

L1 → L2 → L3 → L5 → L1

Superscalar is issue-limited!

2 cycles/iteration sustained

for (j = 0; X[j].r != 0xF; j++)
    if (X[j].r == i)
        break;

LOOP:
L1: beq   $v0,$a1,EXIT  ; X[j].r == i
L2: addiu $v1,$v1,20    ; &X[j+1].r
L3: lw    $v0,0($v1)    ; X[j+1].r
L4: addiu $a0,$a0,1     ; j++
L5: bne   $v0,$a3,LOOP  ; X[j+1].r != 0xF
EXIT:

73

Critical Path with Prediction

Loads are not speculative

for (j = 0; X[j].r != 0xF; j++)
    if (X[j].r == i)
        break;

74

Prediction + Load Speculation

~4 cycles!

Load not pipelined (self-anti-dependence)

[Diagram label: ack edge]

for (j = 0; X[j].r != 0xF; j++)
    if (X[j].r == i)
        break;

75

OOO Pipe Snapshot

[Diagram: pipeline stages IF, DA, EX, WB, CT. Instruction groups as extracted: L5 L1 L2 / L1 L2 L3 L4 / L1 L3 / L5 L3 L2 / L1 L3 L3. Label: register renaming.]

LOOP:
L1: beq   $v0,$a1,EXIT  ; X[j].r == i
L2: addiu $v1,$v1,20    ; &X[j+1].r
L3: lw    $v0,0($v1)    ; X[j+1].r
L4: addiu $a0,$a0,1     ; j++
L5: bne   $v0,$a3,LOOP  ; X[j+1].r != 0xF
EXIT:

76

Unrolling?

for (i = 0; i < 64; i++) {
    for (j = 0; X[j].r != 0xF; j+=2) {
        if (X[j].r == i)
            break;
        if (X[j+1].r == 0xF)
            break;
        if (X[j+1].r == i)
            break;
    }
    Y[i] = X[j].q;
}

when 1 iteration

77

Ideal Architecture

CPU: low-ILP computation + OS + VM

ASH: high-ILP computation

Memory