Post on 11-Feb-2017
Optimal Chain Rule Placement for Instruction Selection based on SSA Graphs
Stefan Schäfer, Bernhard Scholz
(stefans|scholz)@it.usyd.edu.au
School of IT, University of Sydney
Outline
● Related Work
● Motivation (Instruction Selection based on SSA Form)
● Chain Rule Placement
● Implementation
● Results
● Conclusion
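Instruction Selection based on SSA Graphs (1)
[Figure: compiler pipeline. Source Program, Compiler Front End, Intermediate Representation in SSA Form, Compiler Back End, Target Program. Machine-Independent Optimisations operate on the intermediate representation; the back end consists of Code Selection, Instruction Scheduling and Register Allocation.]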
Related Work (1)
● Tree Pattern Matching
C. Fraser, R. Henry, T. Proebsting
BURG – Fast Optimal Instruction Selection and Tree Parsing
ACM SIGPLAN Notices 27(4):68-76 (1992)
– Works fine with trees (expressions)
– Problem: control flow graphs are usually directed acyclic graphs
● Code Selection for DAGs
M. A. Ertl, Optimal Code Selection in DAGs, Proceedings of POPL 1999
– DAG matching is NP-complete.
Related Work (2)
● Code Selection based on SSA Graphs
E. Eckstein, O. König, B. Scholz
Code Instruction Selection based on SSA Graphs
SCOPES 2003, Volume 2826 of Lecture Notes in Computer Science
– Introduced a (heuristic) code selection technique for DAGs
– Cost-optimal derivation of a graph grammar for a given SSA graph
– Chain rules used for type conversion, but optimal placement unaddressed
– Optimal means: cost-minimal for a given cost metric
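Instruction Selection based on SSA Graphs (2)
[Figure: control flow graph with basic blocks b1 to b12 and b14. b1 contains the cast; b11, b12 and b14 each contain an add. Every block carries an annotation pair; the values in the extract are [3014,1] (twice, one of them at b1), [1712,1], [293,1], [292,1], [124,1], [94,1], [59.2,1], [34,1], [28,1], [26,1], [25,1] and [1,1].]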
Instruction Selection based on SSA Graphs (2)
[Figure: SSA graph in which the cast in b1 feeds the adds in b14, b11 and b12. The cast is covered by sreg ::= cast(sreg) and yields sreg; each add is covered by reg ::= add(reg,reg) and takes reg operands.]

reg ::= add(reg, reg) [10.0,1.0]
sreg ::= cast(sreg) [10.0,1.0]
reg ::= sreg [10.0,1.0]
sreg ::= reg [10.0,1.0]
Instruction Selection based on SSA Graphs (2)
[Figure: control flow graph with basic blocks b1 to b12 and b14; b1 (cast) is annotated [3014,1]; b11, b12 and b14 contain the adds.]

strategy              time         space  tradeoff 1:4
def (b1)              30140        1      6028.8
uses (b11, b12, b14)  19300        3      3862.4
def/uses              19300        1      3862.4
optimal               3510         1      704
  placed at           b5, b9, b10  b1     b5, b7

reg ::= add(reg, reg) [10.0,1.0]
sreg ::= cast(sreg) [10.0,1.0]
reg ::= sreg [10.0,1.0]
sreg ::= reg [10.0,1.0]
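The table entries can be checked with a small calculation. The sketch below is not taken from the slides. It assumes that the first number in each block annotation is an execution weight, that the three use blocks carry the weights 1712, 124 and 94 in some order, that b5 and b7 carry 292 and 59.2, and that "tradeoff 1:4" means the weighted mean (1·time + 4·space)/5. Under these assumptions it reproduces the tradeoff values 6028.8, 3862.4 and 704.

```python
def placement_cost(block_weights, rule_time=10.0, rule_space=1.0):
    """Time and space cost of putting one chain rule into each listed block.

    rule_time and rule_space come from the rule cost [10.0,1.0] of
    reg ::= sreg; treating block weights as multipliers is an assumption.
    """
    time = sum(w * rule_time for w in block_weights)
    space = len(block_weights) * rule_space
    return time, space


def tradeoff(time, space, w_time=1.0, w_space=4.0):
    """Weighted mean of time and space (assumed reading of 'tradeoff 1:4')."""
    return (w_time * time + w_space * space) / (w_time + w_space)


# Block weights are assumptions inferred from the table, not given as a mapping.
placements = {
    "def (b1)": [3014],
    "uses (b11, b12, b14)": [1712, 124, 94],
    "b5, b7": [292, 59.2],
}

for name, weights in placements.items():
    time, space = placement_cost(weights)
    print(f"{name:22} time={time:8.1f} space={space:.0f} "
          f"tradeoff={tradeoff(time, space):.1f}")
```

The def/uses row of the table is consistent with taking the better of the def and uses strategies separately for each column.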
SSA Form
● Static Single Assignment form
● There is at most one assignment to each variable.
● Each definition of a variable is distinct.
● Multiple definitions have to be resolved:
– if (e) b=32 else b=42; -> if (e) b1=32 else b2=42;
● Further uses induce φ-functions:
– a=b; -> a=φ(b1,b2);
● SSA graphs as intermediate data flow representation in SSA form
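The renaming example can be made concrete with a small simulation. This is an illustrative sketch only: the function name `run_ssa`, the edge labels "then" and "else", and the dictionary-based environment are assumptions made for the example. It shows that each SSA name is written once and that the φ-function selects the operand belonging to the control flow edge that was actually taken.

```python
def run_ssa(e):
    """Simulate  if (e) b1=32 else b2=42;  a=φ(b1,b2)  on an explicit environment."""
    env = {}

    def assign(name, value):
        # SSA property: every name has at most one assignment.
        assert name not in env, f"{name} assigned twice"
        env[name] = value

    if e:
        assign("b1", 32)
        taken_edge = "then"
    else:
        assign("b2", 42)
        taken_edge = "else"

    # φ(b1, b2): one operand per incoming edge of the join block.
    phi_operands = {"then": "b1", "else": "b2"}
    assign("a", env[phi_operands[taken_edge]])
    return env


print(run_ssa(True))
print(run_ssa(False))
```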
Chain Rule Placement
● Map the CFG to a network
● Reduce the network for each definition and nonterminal (a definition node dominates all of its users)
● Find a minimum cut for each reduced network
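Mapping to a Network
[Figure: a CFG fragment with a definition block d and two user blocks u and v, each with cost 10, is mapped to a network. Every block is split into two nodes (dn/dx, un/ux, vn/vx) joined by an edge of capacity 10. Four further edges have capacity ∞. The network has an additional node t.]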
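The mapping and the minimum cut can be sketched as follows. This is one reading of the figure, not the authors' implementation: it assumes that each block n becomes an edge (n, "in") -> (n, "out") whose capacity is the cost of placing the chain rule in n, that CFG edges and the edges from users to the sink t get infinite capacity, and that the source is the entry node of the definition block. The minimum cut is found with a plain Edmonds-Karp max-flow, chosen here for brevity; the reduction step is omitted and the function name is hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_cut_placement(cfg_edges, cost, definition, users):
    """Return (cut cost, blocks whose split edge is cut)."""
    cap = defaultdict(lambda: defaultdict(float))
    for n, c in cost.items():
        cap[(n, "in")][(n, "out")] = c        # cutting here = chain rule in n
    for a, b in cfg_edges:
        cap[(a, "out")][(b, "in")] = INF      # CFG edges cannot be cut
    sink = "t"
    for u in users:
        cap[(u, "out")][sink] = INF
    source = (definition, "in")

    def bfs():
        # Breadth-first search in the residual network.
        parent = {source: None}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    if y == sink:
                        return parent
                    queue.append(y)
        return parent

    total = 0.0
    while True:
        parent = bfs()
        if sink not in parent:
            break
        # Bottleneck capacity along the augmenting path.
        flow, y = INF, sink
        while parent[y] is not None:
            flow = min(flow, cap[parent[y]][y])
            y = parent[y]
        # Push the flow and update residual capacities.
        y = sink
        while parent[y] is not None:
            x = parent[y]
            cap[x][y] -= flow
            cap[y][x] += flow
            y = x
        total += flow

    reachable = set(parent)  # source side of the cut after the last search
    placed = sorted(n for n in cost
                    if (n, "in") in reachable and (n, "out") not in reachable)
    return total, placed


# The fragment from the figure: d defines, u and v use, every cost is 10.
edges = [("d", "u"), ("d", "v")]
print(min_cut_placement(edges, {"d": 10, "u": 10, "v": 10}, "d", ["u", "v"]))
# With a more expensive definition block the cut moves to the users.
print(min_cut_placement(edges, {"d": 30, "u": 10, "v": 10}, "d", ["u", "v"]))
```

With equal costs the cut isolates d, so one chain rule at the definition suffices; when d is more expensive than both users together, the cut moves to u and v.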
Reducing each Network
● Done for each definition d and nonterminal
● Starts in each user u:
● Case 1: u is not a φ-node
– All nodes on all acyclic paths from d to u are dominated by d
– All those nodes are added to the reduced network
Reducing each Network
● Done for each definition d and nonterminal
● Starts in each user u:
● Case 2: u is a φ-node, all v ∈ preds(u) are dominated by d
– All nodes on all acyclic paths from d to v are dominated by d
– All those nodes and u are added to the reduced network
[Figure: CFG with root r, predecessor blocks v1 and v2, definitions w1 = op(...) and w2 = op'(...), and the φ-node u = φ(..., w1, ..., w2, ...).]
Reducing each Network
● Done for each definition d and nonterminal
● Starts in each user u:
● Case 3: u is a φ-node, any v ∈ preds(u) is not dominated by d
– Stop traversal for all users of d and add only d to the reduced network
– Not cost-optimal, but does not occur very often: 2264628 nodes, 94183 φ-uses, case 3 occurs 1076 times
[Figure: CFG with blocks r, x1, x2 and y, definitions d1 = op(...) and d2 = op'(...), and the φ-node u = φ(..., d1, ..., d2, ...).]
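The case distinction depends only on dominance, so it can be sketched with a standard iterative dominator computation. The code below is an illustration under assumptions: `d` and `u` stand for the basic blocks that contain the definition and the user, dominance is taken to be reflexive, and the function names are hypothetical. It only classifies a user; it does not build the reduced network.

```python
def dominators(succ, entry):
    """Iterative dominator sets: dom[n] is the set of blocks dominating n."""
    nodes = set(succ) | {m for targets in succ.values() for m in targets}
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for m in targets:
            preds[m].add(n)

    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            else:
                new = {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def classify_user(u, d, is_phi, dom, preds):
    """Return the case number (1, 2 or 3) for user block u of definition block d."""
    if not is_phi:
        return 1
    if all(d in dom[p] for p in preds[u]):
        return 2
    return 3


# Diamond-shaped CFG: r branches to x1 and x2, which join in y.
cfg = {"r": ["x1", "x2"], "x1": ["y"], "x2": ["y"], "y": []}
dom, preds = dominators(cfg, "r")
print(classify_user("y", "x1", True, dom, preds))   # φ-user in y, definition in x1
print(classify_user("y", "r", True, dom, preds))    # φ-user in y, definition in r
```

In the diamond, a definition in x1 does not dominate the predecessor x2 of y, which gives case 3; a definition in r dominates both predecessors, which gives case 2.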
Implementation
[Figure: tool flow. A Graph Grammar is the input of the Code Generator Generator, which emits the Source for the Code Generator in L. A Compiler for L builds the Code Generator in L from this source, the Code Base in L and the PBQP Library for L. At run time the code generator takes an Input Program in SSA Form; the stages shown are Base Rule Matching, Chain Rule Placement and Complete Matching.]
Costs (Spec2000, Time:Space 1:4)
[Bar chart: cost in % (0 to 100) per benchmark for the strategies Use, Def, Def-Use and Min-Cut. Benchmarks: 168.wupwise, 171.swim, 172.mgrid, 173.applu, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 186.crafty, 188.ammp, 197.parser, 200.sixtrack, 252.eon, 254.gap, 255.vortex, 256.bzip2, 300.twolf, 301.apsi.]
Costs (MiBench, Time:Space 1:4)
[Bar chart: cost in % (0% to 100%) per benchmark for the strategies Use, Def, Def-Use and Min-Cut. Benchmarks: bitcnts, cjpeg, crc, dijkstra, djpeg, fft, gs, ispell, lout, patricia, pgp, qsort, rawcaudio, rawdaudio, rijndael, search, sha, susan, tiff2bw, tiff2rgba, tiffdither, tiffmedian, toast, untoast.]
Execution Times (Spec2000)
[Stacked bar chart: share of execution time in % (0% to 100%) per benchmark, split into Misc, Min Cut, Network, PBQP and Program. Benchmarks: 168.wupwise, 171.swim, 172.mgrid, 173.applu, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 186.crafty, 188.ammp, 197.parser, 200.sixtrack, 252.eon, 254.gap, 255.vortex, 256.bzip2, 300.twolf, 301.apsi.]
Execution Times (MiBench)
[Stacked bar chart: share of execution time in % (0% to 100%) per benchmark, split into Misc, Min Cut, Network, PBQP and Program. Benchmarks: bitcnts, cjpeg, crc, dijkstra, djpeg, fft, gs, ispell, lout, patricia, pgp, qsort, rawcaudio, rawdaudio, rijndael, search, sha, susan, tiff2bw, tiff2rgba, tiffdither, tiffmedian, toast, untoast.]
Contributions
● Contributed to code selection based on SSA graphs
● Main contributions:
– Formally addressed the unsolved problem of placing chain rules optimally
– Introduced an efficient and effective algorithm to place chain rules optimally with respect to an arbitrary cost metric
– Implemented a free, open-source code generator generator, enhancing rule matching with chain rule placement
– Proved the correctness of our algorithm
– Conducted experiments with the Spec2000 and MiBench suites
Thank you for your attention!
Any questions or comments?