Optimal Polynomial-Time Interprocedural Register Allocation for High-Level Synthesis Using SSA Form
Philip Brisk
Ajay K. Verma
Paolo Ienne
Outline
Register Allocation Overview
Interprocedural Register Allocation
Related Work
SSA Form With Launch and Landing Pads
Optimal Solution
Experimental Results
Conclusion
Modeling Register Allocation
For Procedure Pi…
Build interference graph Gi = (Vi, Ei)
  Vi – One vertex for each variable
  Ei – Edge between each pair of interfering variables
  Two variables interfere if their lifetimes overlap
Compute the chromatic number χ(Gi)
  Color assignment = Register assignment
  NP-Complete in general
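A minimal sketch of the interference-graph construction, assuming each variable's lifetime is a single half-open interval [start, end). That holds for straight-line code but not for general control flow. The function name and the sample lifetimes are illustrative, not from the slides.

```python
from itertools import combinations

def build_interference_graph(lifetimes):
    """lifetimes: dict mapping variable name -> (start, end), half-open."""
    graph = {v: set() for v in lifetimes}
    for a, b in combinations(lifetimes, 2):
        (sa, ea), (sb, eb) = lifetimes[a], lifetimes[b]
        if sa < eb and sb < ea:  # the two lifetimes overlap
            graph[a].add(b)
            graph[b].add(a)
    return graph

# Hypothetical lifetimes: X overlaps Y, Y overlaps Z, X and Z are disjoint.
lifetimes = {"X": (0, 4), "Y": (2, 6), "Z": (5, 8)}
graph = build_interference_graph(lifetimes)
```

Here X and Z never overlap, so they could share a register.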
Local Interferences
Local Interferences – Single Procedure
  Overlapping lifetimes
Static Single Assignment (SSA) Form
  Interference graph is chordal
[Figure: interference among variables X, Y, Z]
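Why chordality matters can be sketched as follows: a chordal graph can be colored optimally by greedy coloring in the order produced by maximum cardinality search. This is a standard textbook technique, not necessarily the authors' implementation. The version below is quadratic for brevity rather than linear-time, and the sample graph is made up.

```python
def mcs_order(graph):
    """Maximum cardinality search: repeatedly visit the unvisited vertex
    with the most already-visited neighbours."""
    weight = {v: 0 for v in graph}
    order = []
    while weight:
        v = max(weight, key=weight.get)
        order.append(v)
        del weight[v]
        for u in graph[v]:
            if u in weight:
                weight[u] += 1
    return order

def greedy_color(graph, order):
    """Give each vertex the smallest color (starting at 1) unused by its
    already-colored neighbours."""
    color = {}
    for v in order:
        used = {color[u] for u in graph[v] if u in color}
        c = 1
        while c in used:
            c += 1
        color[v] = c
    return color

# Hypothetical chordal graph: triangle X-Y-Z plus W attached to Z.
graph = {"X": {"Y", "Z"}, "Y": {"X", "Z"}, "Z": {"X", "Y", "W"}, "W": {"Z"}}
color = greedy_color(graph, mcs_order(graph))
```

On a chordal graph the number of colors used equals the size of the largest clique, which is the chromatic number.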
Global Interferences
Variable V is live across a call to procedure P
  V interferes with EVERY local variable in P
  And all variables in all procedures reachable from P
Must consider all paths through the Call Graph
Example (call graph: Main → P → Q)
Main:
  V
  Call P
  V
P:
  …
  Call Q
  …
Q:
  …
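The Main/P/Q example can be sketched as a reachability query over the call graph. The local variable names a, b, c are hypothetical (the slide elides the bodies of P and Q), and the helper names are illustrative.

```python
def reachable(call_graph, start):
    """All procedures reachable from `start`, including `start` itself."""
    seen, stack = set(), [start]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(call_graph.get(p, ()))
    return seen

def global_interferences(call_graph, locals_of, callee):
    """Variables that interfere with any variable live across a call to `callee`."""
    return {w for p in reachable(call_graph, callee) for w in locals_of[p]}

call_graph = {"Main": ["P"], "P": ["Q"], "Q": []}
# Hypothetical local variables for each procedure.
locals_of = {"Main": {"V"}, "P": {"a", "b"}, "Q": {"c"}}
result = global_interferences(call_graph, locals_of, "P")
```

V is live across the call to P, so it interferes with a and b in P and with c in Q.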
Global Interferences and Recursion
Fact: No register can hold a local variable across a recursive function call
  Runtime stack is required
  Some exceptions (e.g. static local variables)
    Ignored here
Call Graph
  Compute strongly connected components (SCCs)
  Collapse each SCC into a single node
  Resulting “Augmented Component Graph” is acyclic
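A sketch of the SCC collapse, using Tarjan's algorithm as one possible choice; the slides do not say which SCC algorithm is used. The call graph below is a made-up example in which A and B are mutually recursive.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm (recursive); returns a list of frozensets."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

def collapse(graph):
    """Collapse each SCC into one node; the result is acyclic."""
    sccs = strongly_connected_components(graph)
    comp_of = {v: c for c in sccs for v in c}
    dag = {c: set() for c in sccs}
    for v, callees in graph.items():
        for w in callees:
            if comp_of[v] != comp_of[w]:
                dag[comp_of[v]].add(comp_of[w])
    return dag

# Hypothetical call graph: A and B call each other.
dag = collapse({"main": ["A"], "A": ["B"], "B": ["A", "C"], "C": []})
```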
Interprocedural Register Allocation
Interprocedural Interference Graph (IIG)
  Undirected graph G = (V, E)
  V – All variables in all procedures
  E – Local AND global interferences
  Compute chromatic number χ(G)
Related Work
Interprocedural Register Allocation in HLS
Color IIG with heuristic [Vemuri et al., TODAES ’02]
  IIG is large
  Polynomial heuristics are still slow
Scalable Approach [Beidas and Zhu, ASP-DAC ’05]
  Color each procedure individually
    Use any heuristic you want
    Use any intermediate representation you want
  Propagate global interferences at call points
  IIG is never built
Contribution
Interprocedural register allocation
  Optimal, polynomial-time algorithm
  Scalable
    IIG is never built
    If built, it would be chordal
Each procedure colored individually
  SSA Form – interference graph is chordal
Special case of [Beidas and Zhu, ASP-DAC ’05]
  Top-down color propagation
  Novel SSA-based intermediate representation
  Chordal color assignment (with offset)
Preallocation of Global Registers
Global registers hold variables that are live across procedure calls
How many do we need?
Notation
  P – Set of procedures in the application
  Pi – Procedure
  ck – Call point
  L(ck) – Set of variables live across ck
Procedure call: Pi → ck → Pj
  ck: Call Pj
Preallocation of Global Registers
Compute: δ – Number of variables live…
  At the entry of a procedure
  Across a call point
For a call point ck in procedure Pi (δi is known):
  δk = δi + |L(ck)|
For a procedure Pi called from call points c1, …, cm:
  δi = MAX{δk}, 1 ≤ k ≤ m (i.e. over all call points that call Pi)
Example
Call graph (caller → call point → callee)
  P1 → c7 → P2
  P1 → c8 → P2
  P1 → c9 → P3
  P1 → c10 → P4
  P1 → c11 → P6
  P2 → c12 → P5
  P3 → c13 → P5
  P4 → c14 → P6
Variables live across each call point
  ci    |L(ci)|
  c7    1
  c8    2
  c9    3
  c10   2
  c11   5
  c12   3
  c13   3
  c14   2
Step-by-step computation
  δ1 = 0
  δ7 = |L(c7)| + δ1 = 1 + 0 = 1
  δ8 = |L(c8)| + δ1 = 2 + 0 = 2
  δ9 = |L(c9)| + δ1 = 3 + 0 = 3
  δ10 = |L(c10)| + δ1 = 2 + 0 = 2
  δ11 = |L(c11)| + δ1 = 5 + 0 = 5
  δ2 = MAX{δ7, δ8} = MAX{1, 2} = 2
  δ3 = MAX{δ9} = MAX{3} = 3
  δ4 = MAX{δ10} = MAX{2} = 2
  δ12 = |L(c12)| + δ2 = 3 + 2 = 5
  δ13 = |L(c13)| + δ3 = 3 + 3 = 6
  δ14 = |L(c14)| + δ4 = 2 + 2 = 4
  δ5 = MAX{δ12, δ13} = MAX{5, 6} = 6
  δ6 = MAX{δ11, δ14} = MAX{5, 4} = 5
Final values
  i     δi
  P1    0
  P2    2
  P3    3
  P4    2
  P5    6
  P6    5
  c7    1
  c8    2
  c9    3
  c10   2
  c11   5
  c12   5
  c13   6
  c14   4
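The δ values in this example can be reproduced with a short top-down pass over the call graph. This is a sketch that assumes the call graph is acyclic (for instance after collapsing SCCs). The function name `preallocate` and the tuple encoding of call points are illustrative choices, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

def preallocate(calls, live_count):
    """calls: list of (caller, call_point, callee); live_count: call_point -> |L(ck)|.
    Returns (delta per procedure, delta per call point)."""
    procs = {p for caller, _, callee in calls for p in (caller, callee)}
    callers = {p: set() for p in procs}
    for caller, _, callee in calls:
        callers[callee].add(caller)
    delta_proc = {p: 0 for p in procs}  # a procedure nobody calls keeps delta = 0
    delta_call = {}
    # Callers are visited before callees, so delta_proc[p] is final when p is visited.
    for p in TopologicalSorter(callers).static_order():
        for caller, ck, callee in calls:
            if caller == p:
                delta_call[ck] = delta_proc[p] + live_count[ck]
                delta_proc[callee] = max(delta_proc[callee], delta_call[ck])
    return delta_proc, delta_call

calls = [("P1", "c7", "P2"), ("P1", "c8", "P2"), ("P1", "c9", "P3"),
         ("P1", "c10", "P4"), ("P1", "c11", "P6"), ("P2", "c12", "P5"),
         ("P3", "c13", "P5"), ("P4", "c14", "P6")]
live_count = {"c7": 1, "c8": 2, "c9": 3, "c10": 2,
              "c11": 5, "c12": 3, "c13": 3, "c14": 2}
delta_proc, delta_call = preallocate(calls, live_count)
n_global = max(delta_proc.values())  # number of global registers
```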
Preallocation of Global Registers
When Procedure Pi is called…
  At most δi variables are live across calls leading to Pi
  Holds for every path in the call graph
How to ensure that all variables live across calls leading to Pi are assigned to the right register?
N = MAX{δi}, Pi ∈ P – Number of global registers allocated
T = {T1, …, TN}
Launch and Landing Pads
Procedure Pi calls Pj (m = δi)
  Assign variables live across calls leading to Pi to T1…Tm
  Let ck be the call point; n = |L(ck)|
Launch Pad
  Parallel copy placed before the call
  (Tm+1…Tm+n) ← ψ(L(ck))
Landing Pad
  Copy the values back after the call
  L(ck) ← ψ((Tm+1…Tm+n))
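A small sketch of how the two pads could be generated for one call point. It models ψ only as a list of (destination, source) copies. Pairing variables with registers in sorted name order is an assumption made here for determinism; the slides do not specify the pairing.

```python
def launch_and_landing(delta_i, live_across):
    """Return (launch, landing) copy lists for one call point in procedure Pi.
    delta_i is m = δi; live_across is L(ck)."""
    live = sorted(live_across)  # assumed pairing order
    regs = [f"T{delta_i + j}" for j in range(1, len(live) + 1)]
    launch = list(zip(regs, live))   # (T_{m+1} .. T_{m+n}) <- L(ck), before the call
    landing = list(zip(live, regs))  # L(ck) <- (T_{m+1} .. T_{m+n}), after the call
    return launch, landing

# Hypothetical call point: m = 2 and three variables live across the call.
launch, landing = launch_and_landing(2, {"x", "y", "z"})
```

With m = 2, the callee's global registers T1 and T2 are already taken, so the three live variables move into T3, T4 and T5 for the duration of the call.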
Theoretical Consequences of Launch and Landing Pads
Theorem: All global interferences involve at least one global register
Corollary: Local variables in distinct procedures do not interfere
Corollary: No local variable in “main” has a global interference
Theorem: Every variable defined locally in Pi (m = δi)
  Interferes with global registers T1…Tm
  Does NOT interfere with global registers Tm+1, … TN
=> Can assign local vars in Pi to global registers Tm+1, … TN
Reducing the Chromatic Number
Procedure: A
  V ← …
  Call B
  W ← …
  … ← V
  X ← …
  … ← W
  Y ← …
  … ← X
  Call B
  … ← Y
Procedure: B
  Z ← …
  … ← Z
[Figure: interference graph over V, W, X, Y, Z]
Chromatic Number = 3
Reducing the Chromatic Number
Procedure: A
  V ← …
  T1 ← Ψ(V)
  Call B
  V ← Ψ⁻¹(T1)
  W ← …
  … ← V
  X ← …
  … ← W
  Y ← …
  … ← X
  T1 ← Ψ(Y)
  Call B
  Y ← Ψ⁻¹(T1)
  … ← Y
Procedure: B
  Z ← …
  … ← Z
[Figure: interference graph over V, W, X, Y, Z and T1]
Chromatic Number = 2
Characterizing the IIG
Theorem: T is a clique in the IIG
Theorem: IIG is chordal
Theorem: Chromatic Number of the IIG is: R = MAX{δi + χ(Gi)}, Pi ∈ P
Example
Global registers T1 T2 T3 T4 T5 T6 form a CLIQUE (N = 6)
Local interference graphs
  G1: δ1 = 0
  G2: δ2 = 2
  G3: δ3 = 3
  G4: δ4 = 2
  G5: δ5 = 6
  G6: δ6 = 5
Global interference
  Tj interferes with each local variable in Gi (1 ≤ j ≤ δi)
Coloring Algorithm
1. Use SSA+LLP Form, but DON’T build the IIG
2. For Pi, colors in the range 1..δi are unavailable
3. Color the local (chordal) interference graph Gi of Pi
   Complexity: O(Vi + Ei)
4. For each vertex in Pi, replace color c with c + δi
   Complexity: O(Vi)
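Steps 2 and 4 can be sketched as a simple offset applied to each procedure's local coloring. This assumes the local colorings (colors 1..χ(Gi)) have already been computed in step 3. The function names and the two-procedure input are hypothetical.

```python
def allocate(local_colorings, delta):
    """local_colorings: proc -> {var: color in 1..chi(Gi)}; delta: proc -> δi.
    Returns (proc, var) -> register number, shifted past the δi reserved colors."""
    assignment = {}
    for proc, coloring in local_colorings.items():
        for var, c in coloring.items():
            assignment[(proc, var)] = c + delta[proc]
    return assignment

def registers_needed(local_colorings, delta):
    """R = MAX{δi + χ(Gi)} over all procedures."""
    return max(delta[p] + max(col.values(), default=0)
               for p, col in local_colorings.items())

# Hypothetical input: P1 needs 2 local colors, P2 needs 1 and has δ2 = 2.
local_colorings = {"P1": {"a": 1, "b": 2}, "P2": {"c": 1}}
delta = {"P1": 0, "P2": 2}
assignment = allocate(local_colorings, delta)
total = registers_needed(local_colorings, delta)
```

Because each procedure is handled independently, the IIG never has to be materialized.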
Experiments
Applications taken from Mediabench and MiBench
  Written in C
  Compiled using Machine SUIF
Optimal color assignment
Compare to heuristics
  Color Palette Propagation: Top-Down, Bottom-Up [Beidas and Zhu, ASP-DAC ’05]
  Heuristic Color Assignment [Matula and Beck, JACM ’83]
Registers Allocated (Normalized to Optimal)
[Chart: registers allocated, normalized to optimal; series: Optimal, Top-Down, Bottom-Up]
Runtime (Normalized to Optimal)
[Chart: runtime, normalized to optimal; series: Optimal, Top-Down, Bottom-Up]
Runtime of Pegwit (Normalized to Optimal)
[Chart: runtime for pegwit, normalized to optimal; series: Optimal, Top-Down, Bottom-Up]
Limitations
Global Variables
  Interfere with all variables in the program
  Lifetime can still be analyzed
Static Local Variables
  Initialized on first access
  Hold their values across function calls
Function Pointers
  Resolution is NP-Complete
Conclusion
Interprocedural register allocation in HLS
  Optimal, polynomial-time algorithm
  Uses SSA Form + Launch/Landing Pads
  IIG is a chordal graph
  Scalable – no need to build IIG
  Significantly faster than sub-optimal heuristics
A few limitations
  Global variables, local static variables
  Function pointers
    Resolution is NP-Complete
Related Work
Register Allocation in HLS
  Clique Partitioning/Coloring Problem [Tseng and Siewiorek, ’86]
  Scheduled DFGs – Interval Graphs [Kurdahi and Parker, ’87]
  Scheduled Cyclic DFGs – Circular Arc Graphs (NP-Complete) [Stok, ’92]
  Restrictions on Variable Lifetimes – Chordal Graphs [Springer and Thomas, ’94]
  Static Single Assignment Form – Chordal Graphs [Brisk et al. 2005/6], [Hack and Goos, 2005/6], [Bouchez et al. 2005]