University of Michigan, Electrical Engineering and Computer Science
Systematic Register Bypass Customization for Application-Specific Processors
Kevin Fan, Nathan Clark, Michael Chu, K. V. Manjunath, Rajiv Ravindran, Mikhail Smelyanskiy, Scott Mahlke
Advanced Computer Architecture Laboratory
University of Michigan
Introduction
• Bypass network allows for data forwarding to reduce pipeline stalls
• Full bypass: any FU can bypass from any other FU and from any pipeline stage
# paths = (issue width)² × (bypassable stages) × (input ports per FU) × (output ports per FU)
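As a rough illustration of the full-bypass path count (issue width squared, times bypassable stages, times input ports per FU, times output ports per FU), the sketch below evaluates it for a few issue widths. The function name and the machine parameters are hypothetical and chosen only to show the quadratic growth.

```python
def full_bypass_paths(issue_width, bypass_stages, in_ports_per_fu, out_ports_per_fu):
    """Number of paths in a full bypass network, following the slide's formula."""
    return issue_width ** 2 * bypass_stages * in_ports_per_fu * out_ports_per_fu


# Hypothetical machine parameters, for illustration only.
for width in (2, 4, 8):
    paths = full_bypass_paths(width, bypass_stages=2, in_ports_per_fu=2, out_ports_per_fu=1)
    print(f"issue width {width}: {paths} bypass paths")
```

Doubling the issue width quadruples the path count while the other factors stay fixed.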
Bypass Path Utilization
• As processors get wider and deeper, cost of bypass network increases quadratically [Palacharla ’98]
• Only a few bypass paths are heavily utilized
[Figure: normalized cumulative number of bypasses (y-axis, 0.65 to 1) versus percent utilization (x-axis, 0 to 16)]
Designing a Partial Bypass Network
• Reduce hardware at the cost of runtime
• Design a sparse bypass network while minimizing performance impact
• Challenges:
– Reconcile different requirements for different program regions
– Interplay between different bypass paths
– Huge search space, exponential number of possible configurations
Spacewalking Partial Bypass
• Profile-guided Pareto ascent
– Rank bypass paths by importance
– Remove least important path and evaluate performance impact
– Update rankings with new statistics
– Repeat until performance degrades too far
[Figure: flow diagram. Usage statistics from the program rank the bypasses from most useful to least useful; the least useful bypass is removed and the new machine is evaluated; the bypass is replaced if performance drops too much. The resulting Pareto machines are plotted on a Cost/Performance chart (axes: Cost, 1/Performance).]
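The removal loop on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `spacewalk`, `rank`, `evaluate`, `cost`, and `max_slowdown` are hypothetical names, the toy usage counts are made up, and putting a bypass back and then trying the remaining ones is only one possible reading of "replace bypass if performance drops too much".

```python
def spacewalk(bypasses, rank, evaluate, cost, max_slowdown=1.05):
    """Greedily remove the least important bypass, recording each accepted machine."""
    current = set(bypasses)
    baseline = evaluate(current)              # cycle count with every bypass present
    accepted = [(cost(current), baseline, frozenset(current))]
    pinned = set()                            # bypasses that were put back
    while current - pinned:
        # Re-rank against the current machine, then pick the least important path.
        victim = min(current - pinned, key=lambda b: rank(b, current))
        trial = current - {victim}
        cycles = evaluate(trial)
        if cycles > max_slowdown * baseline:
            pinned.add(victim)                # performance dropped too much: keep it
        else:
            current = trial
            accepted.append((cost(current), cycles, frozenset(current)))
    return accepted


# Toy stand-ins (assumptions): usage counts double as rank and as the stall penalty.
usage = {"I1->I1": 50, "I2->I1": 3, "M3->I1": 20, "I1->I2": 1}
machines = spacewalk(
    usage,
    rank=lambda b, machine: usage[b],
    evaluate=lambda m: 1000 + sum(u for b, u in usage.items() if b not in m),
    cost=len,
)
for machine_cost, cycles, kept in machines:
    print(machine_cost, cycles, sorted(kept))
```

In a real flow, `evaluate` would recompile and simulate the program on the candidate machine, which is where the updated usage statistics would come from.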
Ranking Bypass Paths
Importance = % utilization / offload potential
% utilization = cycles bypass was used / total cycles
offload potential = redundant cycles / cycles bypass was used
[Figure: a bypass path and its equivalent bypass paths (+1, +2)]
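The two ratios on this slide (cycles used over total cycles, and redundant cycles over cycles used) can be computed directly from profile counts. The sketch below assumes that importance is utilization divided by offload potential; the operator is not legible in the extracted slide, so that combination is an assumption, as are the function names and the sample counts.

```python
import math


def utilization(cycles_used, total_cycles):
    """Fraction of all cycles in which the bypass path was used."""
    return cycles_used / total_cycles


def offload_potential(redundant_cycles, cycles_used):
    """Fraction of the path's uses that an equivalent path could have served."""
    return redundant_cycles / cycles_used if cycles_used else 0.0


def importance(cycles_used, redundant_cycles, total_cycles):
    # ASSUMPTION: importance = utilization / offload potential.
    if cycles_used == 0:
        return 0.0                    # never used, so nothing is lost by removing it
    offload = offload_potential(redundant_cycles, cycles_used)
    if offload == 0:
        return math.inf               # used, and no equivalent path can take over
    return utilization(cycles_used, total_cycles) / offload


# Hypothetical profile counts for two paths over a 1000-cycle run.
hard_to_replace = importance(cycles_used=200, redundant_cycles=50, total_cycles=1000)
easy_to_replace = importance(cycles_used=200, redundant_cycles=200, total_cycles=1000)
print(hard_to_replace, easy_to_replace)
```

Under this reading, two equally utilized paths rank differently when one of them can offload more of its traffic to equivalent paths.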
A Closer Look
• Uses more bypasses than necessary
• Not all edges require 1-stage bypass
[Figure: data-flow graph of ops Ma, Ib, Ic, Id, Ie, If with critical edges marked, and schedules at times 0 to 3 on units I1, I2, M3 for several bypass configurations: (a | b c | d e | f), (a | b c | d | f e), (a | b | d c | f e)]
Compiling for Partial Bypass
• Difficulties:
– Latencies between operations vary depending on resource assignments
– Current assignment will affect future decisions
• Naïve scheduler will arbitrarily place Op c
• Need to provide resource hints to the scheduler to break ties
[Figure: data-flow graph of ops Ma, Ib, Ic, Id, Ie, If on units I1, I2, M3, with possible edge latencies of 1 or 2 on every edge. Optimal schedule, times 0 to 3: (a | b | d c | f e). Scheduler without hints, built up step by step with the unit for c, d and e undecided at each step, times 0 to 5: (a | b | c | d | e | f).]
BUG Preference Algorithm
• Perform pre-scheduling pass over the DFG
• Bottom-Up Greedy algorithm based on [Ellis ’85]
• Traverse DFG, critical paths first
• Select bypass paths to achieve earliest completion time for each operation
• Take into account time to:
– Get inputs
– Execute
– Send outputs to consumers
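A greedy unit-preference pass in the spirit of these bullets might look like the sketch below. It is a simplification under stated assumptions, not the BUG algorithm itself: it walks the DFG forward in critical-path order rather than recursing bottom-up, it ignores the time to send outputs to consumers, and the DFG, unit names, bypass set, and latencies are all hypothetical.

```python
# Hypothetical DFG: op -> (kind, predecessors). Kind "M" runs on M3, "I" on I1 or I2.
DFG = {
    "a": ("M", []),
    "b": ("I", ["a"]),
    "c": ("I", ["a"]),
    "d": ("I", ["b"]),
    "e": ("I", ["c"]),
    "f": ("I", ["d"]),
}
UNITS = {"M": ["M3"], "I": ["I1", "I2"]}
# Hypothetical partial bypass network: (producer, consumer) pairs with a 1-cycle path.
BYPASSES = {("M3", "I1"), ("I1", "I1"), ("I2", "I2")}


def edge_latency(src_unit, dst_unit):
    """1 cycle over a bypass, 2 cycles otherwise (assumed latencies)."""
    return 1 if (src_unit, dst_unit) in BYPASSES else 2


def height(op):
    """Length of the longest chain from op to the end of the DFG."""
    succs = [s for s, (_, preds) in DFG.items() if op in preds]
    return 1 + max((height(s) for s in succs), default=0)


def unit_preferences():
    placed, busy = {}, set()          # op -> (unit, start); occupied (unit, cycle) slots
    load = {u: 0 for units in UNITS.values() for u in units}
    remaining = set(DFG)
    while remaining:
        ready = [op for op in remaining if all(p in placed for p in DFG[op][1])]
        op = min(ready, key=lambda o: (-height(o), o))    # critical paths first
        kind, preds = DFG[op]
        best = None
        for unit in UNITS[kind]:
            # Earliest cycle at which every input can reach this unit.
            start = max((placed[p][1] + edge_latency(placed[p][0], unit) for p in preds),
                        default=0)
            while (unit, start) in busy:                  # one op per unit per cycle
                start += 1
            candidate = (start, load[unit], unit)         # ties go to the emptier unit
            if best is None or candidate < best:
                best = candidate
        start, _, unit = best
        placed[op] = (unit, start)
        busy.add((unit, start))
        load[unit] += 1
        remaining.remove(op)
    return placed


preferences = unit_preferences()
for op in sorted(preferences):
    print(op, *preferences[op])
```

The resulting unit choices would be handed to the scheduler as hints, so that ties between otherwise equal units are broken in favor of the placement with a usable bypass.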
BUG Example
• Place ops b, d, f on unit 1 since M bypasses to it
• Place ops c, e on unit 2 since resource is free
[Figure: data-flow graph of ops Ma, Ib, Ic, Id, Ie, If annotated step by step with candidate unit sets ({1,2}, {3}, {1}, {2}). Resulting schedule on units I1, I2, M3, times 0 to 3: (a | b | d c | f e).]
Bypass Cost Savings
[Figure: relative cost of the bypass network (y-axis, 0 to 1) for each benchmark, with one series per relative performance level: 1, 0.95, 0.9, 0.8, 0.7]
Pareto-optimal Machines
[Figure: two panels, djpeg (5-wide) and g721dec (9-wide), each plotting relative dynamic cycle count (y-axis, 1 to 2) against cost in gates (x-axis, ranges 0 to 40000 and 0 to 50000); series: BUG Preferences, ILP Preferences]
Bypass Usage is Variable
[Figure: utilization (from more to less) of individual bypass paths for each benchmark: bfish, cjpeg, djpeg, epic, unepic, g721enc, g721dec, gsmenc, gsmdec, mesa, mpeg2enc, pegenc, pegdec, rasta, rawc, rawd]
Conclusion
• Significant bypass network cost can be saved without much performance loss
• Our approach:
– Intelligent bypass spacewalking
– Resource hints allow compiler to schedule code effectively
– 95% of original performance maintained when removing 60% of utilized bypasses
• http://cccp.eecs.umich.edu