
Page 1

Systematic Register Bypass Customization for Application-Specific Processors

Kevin Fan, Nathan Clark, Michael Chu, K. V. Manjunath, Rajiv Ravindran, Mikhail Smelyanskiy, Scott Mahlke

Advanced Computer Architecture Laboratory

University of Michigan, Electrical Engineering and Computer Science

Page 2

Introduction

• Bypass network allows for data forwarding to reduce pipeline stalls

• Full bypass: any FU can bypass from any other FU and from any pipeline stage

$$\#\,\text{paths} = (\text{issue width})^2 \times (\text{bypassable stages}) \times (\text{input ports per FU}) \times (\text{output ports per FU})$$
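To make the growth concrete, the path count can be evaluated for a couple of machines. This is a minimal sketch that reads the slide's formula as a product of its four factors; the function name and the machine parameters are hypothetical and chosen only for illustration.

```python
def full_bypass_paths(issue_width, bypassable_stages, in_ports_per_fu, out_ports_per_fu):
    """Count the paths in a full bypass network (product reading of the slide's formula)."""
    return (issue_width ** 2) * bypassable_stages * in_ports_per_fu * out_ports_per_fu


# Hypothetical machines: same pipeline and port counts, different issue widths.
narrow = full_bypass_paths(issue_width=2, bypassable_stages=2,
                           in_ports_per_fu=2, out_ports_per_fu=1)
wide = full_bypass_paths(issue_width=4, bypassable_stages=2,
                         in_ports_per_fu=2, out_ports_per_fu=1)

print(narrow, wide)  # 16 64
```

Doubling the issue width quadruples the number of paths, which is the quadratic term in the formula.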

Page 3

Bypass Path Utilization

• As processors get wider and deeper, cost of bypass network increases quadratically [Palacharla ’98]

• Only a few bypasses are heavily utilized

[Figure: normalized cumulative number of bypasses (y-axis, 0.65 to 1) versus percent utilization (x-axis, 0 to 16).]

Page 4

Designing a Partial Bypass Network

• Reduce hardware at the cost of runtime

• Design a sparse bypass network while minimizing performance impact

• Challenges:
– Reconcile different requirements for different program regions
– Interplay between different bypass paths
– Huge search space, exponential number of possible configurations

Page 5

Spacewalking Partial Bypass

• Profile-guided Pareto ascent
– Rank bypass paths by importance
– Remove least important path and evaluate performance impact
– Update rankings with new statistics
– Repeat until performance degrades too far

[Figure: spacewalking loop. Usage statistics from the program rank the bypasses by importance, from most useful to least useful. The least useful bypass is removed and the new machine is evaluated; the bypass is replaced if performance drops too much. The resulting Pareto machines are plotted as cost versus 1/performance (Cost/Performance).]
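The removal loop described on this slide can be sketched as follows. This is an illustrative reading, not the authors' implementation: the `rank` and `evaluate` callables, the `max_slowdown` threshold, and the choice to keep a bypass permanently once its removal costs too much are all assumptions, and the toy penalty model is made up.

```python
def spacewalk(bypasses, rank, evaluate, max_slowdown):
    """Greedily remove low-importance bypasses while performance stays acceptable.

    bypasses:     iterable of bypass-path identifiers (the full network)
    rank:         callable(set) -> list ordered from least to most important
    evaluate:     callable(set) -> cycle count of the machine with those bypasses
    max_slowdown: largest tolerated ratio of cycles to full-bypass cycles
    """
    current = set(bypasses)
    baseline = evaluate(current)
    kept = set()  # bypasses that were put back; they are not retried
    visited = [(frozenset(current), baseline)]
    while True:
        # Re-rank on every iteration so the ordering reflects the current machine.
        candidates = [b for b in rank(current) if b not in kept]
        if not candidates:
            break
        victim = candidates[0]
        trial = current - {victim}
        cycles = evaluate(trial)
        if cycles / baseline > max_slowdown:
            kept.add(victim)  # performance dropped too much: replace the bypass
        else:
            current = trial
            visited.append((frozenset(current), cycles))
    return current, visited


# Toy model (made up): removing a bypass adds a fixed number of cycles.
penalty = {"I1->I1": 30, "I2->I1": 2, "M3->I1": 10, "M3->I2": 1}

def evaluate(present):
    return 100 + sum(p for b, p in penalty.items() if b not in present)

def rank(present):
    return sorted(present, key=lambda b: penalty[b])

final, visited = spacewalk(penalty, rank, evaluate, max_slowdown=1.05)
print(sorted(final))  # ['I1->I1', 'M3->I1']
```

In a real flow, `evaluate` would presumably recompile and simulate the program on the candidate machine, and `rank` would be driven by profiled usage statistics rather than a fixed table.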

Page 6

Ranking Bypass Paths

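$$\text{Importance} = \frac{\%\ \text{utilization}}{\text{offload potential}}$$

$$\%\ \text{utilization} = \frac{\text{cycles bypass was used}}{\text{total cycles}}, \qquad \text{offload potential} = \frac{\text{redundant cycles}}{\text{cycles bypass was used}}$$

[Figure: a bypass path and its equivalent bypass paths, labeled +1 and +2.]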
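As a worked example of the ranking metric, the sketch below computes importance from three profile counters. It assumes the slide's layout denotes a ratio of % utilization to offload potential; the function name, the handling of zero counts, and the sample numbers are hypothetical.

```python
import math


def importance(cycles_used, redundant_cycles, total_cycles):
    """Importance of one bypass path, read as utilization / offload potential."""
    if cycles_used == 0:
        return 0.0  # assumption: a path that is never used has no importance
    utilization = cycles_used / total_cycles
    offload_potential = redundant_cycles / cycles_used
    if offload_potential == 0:
        return math.inf  # assumption: a path that can never be offloaded is essential
    return utilization / offload_potential


# Made-up counters: two paths with equal utilization but different redundancy.
mostly_unique = importance(cycles_used=200, redundant_cycles=50, total_cycles=1000)
fully_redundant = importance(cycles_used=200, redundant_cycles=200, total_cycles=1000)
print(mostly_unique > fully_redundant)  # True
```

Under this reading, two equally busy paths are ranked apart by how often an equivalent path could have carried the same value.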

Page 7

A Closer Look

• Uses more bypasses than necessary

• Not all edges require 1-stage bypass

[Figure: data-flow graph over operations Ma, Ib, Ic, Id, Ie, If with its critical edges marked, shown alongside four-cycle schedules (Time 0 to 3) on function units I1, I2, and M3. The schedules shown, as operations issued per cycle, are:
– a; b, c; d, e; f
– a; b, c; d; f, e
– a; b; d, c; f, e]

Page 8

Compiling for Partial Bypass

• Difficulties:
– Latencies between operations vary depending on resource assignments
– Current assignment will affect future decisions

• Naïve scheduler will arbitrarily place Op c

• Need to provide resource hints to the scheduler to break ties

[Figure: data-flow graph over operations Ma, Ib, Ic, Id, Ie, If in which every edge has possible latencies of 1 or 2, scheduled on function units I1, I2, and M3.
Optimal: time 0: a; time 1: b; time 2: d, c; time 3: f, e.
Scheduler: places c, d, and e in turn without knowing which unit to prefer, ending with time 0: a; time 1: b; time 2: c; time 3: d; time 4: e; time 5: f.]

Page 9

BUG Preference Algorithm

• Perform pre-scheduling pass over the DFG

• Bottom-Up Greedy algorithm based on [Ellis ’85]

• Traverse DFG, critical paths first

• Select bypass paths to achieve earliest completion time for each operation

• Take into account time to:
– Get inputs
– Execute
– Send outputs to consumers
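The idea of choosing, for each operation, the unit that gives the earliest completion time can be sketched as below. This is a simplified forward greedy pass, not Ellis's full bottom-up BUG recursion, and it only models the time to get inputs and to execute. The data-flow graph, the bypass set, and the 1-cycle (bypassed) versus 2-cycle (not bypassed) edge latencies are assumptions made for illustration.

```python
def assign_units(ops, preds, capable, bypasses):
    """Pick, for each operation, the unit on which it can issue earliest.

    ops:      operation names in priority order (must be a topological order,
              critical-path operations first)
    preds:    dict op -> list of producer ops
    capable:  dict op -> list of function units able to execute the op
    bypasses: set of (producer_fu, consumer_fu) pairs connected by a bypass;
              such an edge is assumed to cost 1 cycle, any other edge 2 cycles
    """
    unit_of, issue, busy = {}, {}, set()
    for op in ops:
        best = None
        for fu in capable[op]:
            # Earliest cycle at which every input can reach this unit.
            ready = 0
            for p in preds.get(op, []):
                latency = 1 if (unit_of[p], fu) in bypasses else 2
                ready = max(ready, issue[p] + latency)
            # Skip cycles in which the unit is already occupied.
            t = ready
            while (fu, t) in busy:
                t += 1
            if best is None or t < best[0]:
                best = (t, fu)
        issue[op], unit_of[op] = best
        busy.add((best[1], best[0]))
    return unit_of, issue


# Hypothetical graph and bypass network (not taken from the slides).
preds = {"b": ["a"], "c": ["a"], "d": ["b"], "e": ["c"], "f": ["d"]}
capable = {"a": ["M3"], "b": ["I1", "I2"], "c": ["I1", "I2"],
           "d": ["I1", "I2"], "e": ["I1", "I2"], "f": ["I1", "I2"]}
bypasses = {("M3", "I1"), ("I1", "I1"), ("I2", "I2")}

unit_of, issue = assign_units(["a", "b", "d", "f", "c", "e"], preds, capable, bypasses)
print(unit_of)
print(issue)
```

The resulting unit choices would then be handed to the scheduler as preferences, so that ties between otherwise equivalent units are broken in favor of the unit with the useful bypass.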

Page 10

BUG Example

• Place ops b, d, f on unit 1 since M bypasses to it

• Place ops c, e on unit 2 since resource is free

[Figure: data-flow graph over operations Ma, Ib, Ic, Id, Ie, If, annotated step by step with the candidate unit sets of each operation ({1,2}, {1}, {2}, {3}), on a machine with function units I1, I2, and M3.]

Resulting schedule:

Time   I1   I2   M3
0                a
1      b
2      d    c
3      f    e

Page 11

Bypass Cost Savings

[Figure: relative cost (y-axis, 0 to 1) for each benchmark (x-axis), with one series per relative performance level: 1, 0.95, 0.9, 0.8, 0.7.]

Page 12

Pareto-optimal Machines

[Figure: two panels, djpeg (5-wide) and g721dec (9-wide). Each plots relative dynamic cycle count (y-axis, 1 to 2) against cost in gates (x-axis, 0 to 40000 in the first panel and 0 to 50000 in the second), with one series for BUG Preferences and one for ILP Preferences.]
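For readers unfamiliar with the term, a machine is Pareto-optimal here when no other machine is at least as cheap and at least as fast, and strictly better in one of the two. The filter below is a minimal sketch of that definition; the machine names and the (cost, cycle count) pairs are made up and are not results from the slides.

```python
def pareto_front(machines):
    """Return the machines not dominated in (cost, cycles); lower is better for both."""
    front = []
    for name, cost, cycles in machines:
        dominated = any(
            (c2 <= cost and y2 <= cycles) and (c2 < cost or y2 < cycles)
            for _, c2, y2 in machines
        )
        if not dominated:
            front.append((name, cost, cycles))
    return sorted(front, key=lambda m: m[1])


# Hypothetical design points: (name, cost in gates, relative dynamic cycle count).
machines = [
    ("full", 40000, 1.00),
    ("A", 20000, 1.05),
    ("B", 25000, 1.10),  # dominated by A: more gates and more cycles
    ("C", 5000, 1.60),
]

front = pareto_front(machines)
print([name for name, _, _ in front])  # ['C', 'A', 'full']
```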

Page 13

Bypass Usage is Variable

[Figure: utilization of individual bypass paths (y-axis), shaded from less to more utilization, for each benchmark (x-axis): bfish, cjpeg, djpeg, epic, unepic, g721enc, g721dec, gsmenc, gsmdec, mesa, mpeg2enc, pegenc, pegdec, rasta, rawc, rawd.]

Page 14

Conclusion

• Significant bypass network cost can be saved without much performance loss

• Our approach:
– Intelligent bypass spacewalking
– Resource hints allow compiler to schedule code effectively
– 95% of original performance maintained when removing 60% of utilized bypasses

• http://cccp.eecs.umich.edu