Optimizing Compilers CISC 673 Spring 2009 Instruction Scheduling
Optimizing Compilers CISC 673 Spring 2011 Static Single Assignment II
description
Transcript of Optimizing Compilers CISC 673 Spring 2011 Static Single Assignment II
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
Optimizing CompilersCISC 673
Spring 2011Static Single Assignment II
John Cavazos
University of Delaware
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
2
What is SSA?
• Many data-flow problems have been formulated
• To limit number of analyses, use single analysis to perform multiple transformations SSA
• Compiler optimization algorithms enhanced or enabled by SSA: Constant propagation Dead code elimination Global value numbering Partial redundancy elimination Register allocation
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
3
SSA Construction Algorithm (High-level sketch)
1. Insert -functions
2. Rename values
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
4
SSA Construction Algorithm (Less high-level sketch)
• Insert -functions at every join for every name
• Solve reaching definitions
• Rename each use to def that reaches it (will be unique)
What’s wrong with this approach
• Too many -functions!
To do better, we need a more complex approach
Builds maximal SSA
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
5
1. Insert -functions
a.) calculate dominance frontiers
b.) find global namesfor each name, build list of blocks that
define it
c.) insert -functions
SSA Construction Algorithm (Less high-level sketch)
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
6
Insert -functions
global name n
worklist ← Block(n) // blocks in which n is assigned block b ∈ worklist
block d in b’s dominance frontierinsert a -function for n in dadd d to worklist
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
7
Computing Dominance Frontiers
•Only join points are in DF(n) for some n
•Simple algorithm for computing dominance frontiers
For each join point x (i.e., |preds(x)| > 1) For each CFG predecessor of xRun up to IDOM(x ) in dominator tree,
add x to DF(n) for each n between x and IDOM(x )
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
8
Dominance Frontiers
n B0 B1 B2 B3 B4 B5 B6 B7
DOM(n) 0 0,1 0,1,2 0,1,3 0,1,3,4
0,1,3,5
0,1,3,6
0,1,7
IDOM(n) -- 0 1 1 3 3 3 1
DF(n) -- -- 7 7 6 6 7 1
For each join point x
For each CFG pred of x
Run to IDOM(x ) in dom tree,
add x to DF(n) for each n
between x and IDOM(x )
B1
B2 B3
B4 B5
B6
B7
B0
B1
B2 B3
B4 B5
B6
B7
B0
Flow Graph Dominance Tree
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
Dominance Frontiers & -Function Insertion
Dominance Frontiers
• A definition at n forces a -function at m iff n DOM(m) but n DOM(p) for some p preds(m)
• DF(n ) is fringe just beyond region n dominates
B1
B2 B3
B4 B5
B6
B7
B0
0 1 2 3 4 5 6 7DOM 0 0,1 0,1,2 0,1,3 0,1,3,4 0,1,3,5 0,1,3,6 0,1,7DF – – 7 7 6 6 7 1
in 1 forces -function in DF(1) = Ø (halt )
x ...
x (...)• DF(4) is {6}, so in 4 forces -function
in 6x (...) in 6 forces -function in DF(6) = {7}
x (...)
in 7 forces -function in DF(7) = {1}
For each assignment, we insert the -functions
a (a,a)b (b,b)c (c,c)d (d,d)y a+bz c+di i+1
B7
i > 100i ...B0
b ...c ...d ...
B2
a (a,a)b (b,b)c (c,c)d (d,d)i (i,i) a ...c ...
B1
a ...d ...
B3
d ...B4 c ...
B5
d (d,d)c (c,c)b ...B6
i > 100
With all the -functions
• Lots of new ops
• Renaming is next
Assume a, b, c, & d defined before B0
Excluding local names avoids ’s for y & z
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
11
2. Rename variables in a pre-order walk over dominator tree(uses counter and a stack per global name)
Staring with the root block, b
a.) generate unique names for each -functionand push them on the appropriate stacks
SSA Construction Algorithm (Less high-level sketch)
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
12
2. Rename variables (cont’d)
b.) rewrite each operation in the blocki. Rewrite uses of global names with the current version (from the stack)ii. Rewrite definition by creating & pushing new name
c.) fill in -function parameters of successor blocks
d.) recurse on b’s children in the dominator tree
e.) <on exit from block b > pop names generated in b from stacks
SSA Construction Algorithm (Less high-level sketch)
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
13
SSA Construction Algorithm
for each global name icounter[i] 0stack[i]
call Rename(n0)
NewName(n)i counter[n]counter[n] counter[n] + 1push ni onto stack[n]
return ni
Rename(b)for each -function in b, x (…)
rename x as NewName(x)
for each operation “x y op z” in brewrite y as top(stack[y])rewrite z as top(stack[z])rewrite x as NewName(x)
for each successor of b in the CFGrewrite appropriate parameters
for each successor s of b in dom. treeRename(s)
for each operation “x y op z” in bpop(stack[x])
Adding all the details ...
a (a,a)b (b,b)c (c,c)d (d,d)y a+bz c+di i+1
B7
i > 100i ...B0
b ...c ...d ...
B2
a (a,a)b (b,b)c (c,c)d (d,d)i (i,i) a ...c ...
B1
a ...d ...
B3
d ...B4 c ...
B5
d (d,d)c (c,c)b ...B6
i > 100
CountersStacks
1 1 1 1 0
a
a0 b0 c0 d0
Before processing B0
b c d i
Assume a, b, c, & d defined before B0
i has not been defined
20
a (a,a)b (b,b)c (c,c)d (d,d)y a+bz c+di i+1
B7
i > 100i0 ...B0
b ...c ...d ...
B2
a (a0,a)b (b0,b)c (c0,c)d (d0,d)i (i0,i)
a ...c ...
B1
a ...d ...
B3
d ...B4 c ...
B5
d (d,d)c (c,c)b ...B6
i > 100
CountersStacks
1 1 1 1 1
a b c d i
a0 b0 c0 d0
End of B0
i0
21
a (a,a)b (b,b)c (c,c)d (d,d)y a+bz c+di i+1
B7
i > 100i0 ...B0
b ...c ...d ...
B2
a1 (a0,a)b1 (b0,b)c1 (c0,c)d1 (d0,d)i1 (i0,i)
a2 ...c2 ...
B1
a ...d ...
B3
d ...B4 c ...
B5
d (d,d)c (c,c)b ...B6
i > 100
CountersStacks
3 2 3 2 2
a b c d i
a0 b0 c0 d0
End of B1
i0
a1 b1 c1 d1 i1
a2 c2
22
a (a2,a)
b (b2,b)c (c3,c)
d (d2,d)y a+bz c+di i+1
B7
i > 100i0 ...B0
b2 ...c3 ...d2 ...
B2
a1 (a0,a)
b1 (b0,b)c1 (c0,c)
d1 (d0,d)
i1 (i0,i) a2 ...c2 ...
B1
a ...d ...
B3
d ...B4 c ...
B5
d (d,d)c (c,c)b ...B6
i > 100
CountersStacks
3 3 4 3 2
a b c d i
a0 b0 c0 d0
End of B2
i0
a1 b1 c1 d1 i1
a2 c2b2 d2
c3
23
a (a2,a)
b (b2,b)c (c3,c)
d (d2,d)y a+bz c+di i+1
B7
i > 100i0 ...B0
b2 ...c3 ...d2 ...
B2
a1 (a0,a)
b1 (b0,b)c1 (c0,c)
d1 (d0,d)
i1 (i0,i) a2 ...c2 ...
B1
a ...d ...
B3
d ...B4 c ...
B5
d (d,d)c (c,c)b ...B6
i > 100
i ≤ 100
CountersStacks
3 3 4 3 2
a b c d i
a0 b0 c0 d0
Before starting B3
i0
a1 b1 c1 d1 i1
a2 c2
24
a (a2,a)
b (b2,b)c (c3,c)
d (d2,d)y a+bz c+di i+1
B7
i > 100i0 ...B0
b2 ...c3 ...d2 ...
B2
a1 (a0,a)
b1 (b0,b)c1 (c0,c)
d1 (d0,d)
i1 (i0,i) a2 ...c2 ...
B1
a3 ...d3 ...
B3
d ...B4 c ...
B5
d (d,d)c (c,c)b ...B6
i > 100
CountersStacks
4 3 4 4 2
a b c d i
a0 b0 c0 d0
End of B3
i0
a1 b1 c1 d1 i1
a2 c2
a3
d3
25
a (a2,a)
b (b2,b)c (c3,c)
d (d2,d)y a+bz c+di i+1
B7
i > 100i0 ...B0
b2 ...c3 ...d2 ...
B2
a1 (a0,a)
b1 (b0,b)c1 (c0,c)
d1 (d0,d)
i1 (i0,i) a2 ...c2 ...
B1
a3 ...d3 ...
B3
d4 ...B4 c ...
B5
d (d4,d)c (c2,c)b ...B6
i > 100
CountersStacks
4 3 4 5 2
a b c d i
a0 b0 c0 d0
End of B4
i0
a1 b1 c1 d1 i1
a2 c2
a3
d3
d4
26
a (a2,a)
b (b2,b)c (c3,c)
d (d2,d)y a+bz c+di i+1
B7
i > 100i0 ...B0
b2 ...c3 ...d2 ...
B2
a1 (a0,a)
b1 (b0,b)c1 (c0,c)
d1 (d0,d)
i1 (i0,i) a2 ...c2 ...
B1
a3 ...d3 ...
B3
d4 ...B4 c4 ...
B5
d (d4,d3)c (c2,c4)
b ...B6
i > 100
CountersStacks
4 3 5 5 2
a b c d i
a0 b0 c0 d0
End of B5
i0
a1 b1 c1 d1 i1
a2 c2
a3
d3
c4
27
a (a2,a3)
b (b2,b3)c (c3,c5)
d (d2,d5)y a+bz c+di i+1
B7
i > 100i0 ...B0
b2 ...c3 ...d2 ...
B2
a1 (a0,a)
b1 (b0,b)c1 (c0,c)
d1 (d0,d)
i1 (i0,i) a2 ...c2 ...
B1
a3 ...d3 ...
B3
d4 ...B4 c4 ...
B5
d5 (d4,d3)c5 (c2,c4)
b3 ...B6
i > 100
CountersStacks
4 4 6 6 2
a b c d i
a0 b0 c0 d0
End of B6
i0
a1 b1 c1 d1 i1
a2 c2
a3
d3
c5 d5
b3
28
a (a2,a3)
b (b2,b3)c (c3,c5)
d (d2,d5)y a+bz c+di i+1
B7
i > 100i0 ...B0
b2 ...c3 ...d2 ...
B2
a1 (a0,a)
b1 (b0,b)c1 (c0,c)
d1 (d0,d)
i1 (i0,i) a2 ...c2 ...
B1
a3 ...d3 ...
B3
d4 ...B4 c4 ...
B5
d5 (d4,d3)c5 (c2,c4)
b3 ...B6
i > 100
CountersStacks
4 4 6 6 2
a b c d i
a0 b0 c0 d0
Before B7
i0
a1 b1 c1 d1 i1
a2 c2
29
a4 (a2,a3)
b4 (b2,b3)c6 (c3,c5)
d6 (d2,d5)y a4+b4
z c6+d6
i2 i1+1
B7
i > 100i0 ...B0
b2 ...c3 ...d2 ...
B2
a1 (a0,a4)
b1 (b0,b4)c1 (c0,c6)
d1 (d0,d6)
i1 (i0,i2) a2 ...c2 ...
B1
a3 ...d3 ...
B3
d4 ...B4 c4 ...
B5
d5 (d4,d3)c5 (c2,c4)
b3 ...B6
i > 100
CountersStacks
5 5 7 7 3
a b c d i
a0 b0 c0 d0
End of B7
i0
a1 b1 c1 d1 i1
a2 c2
a4
b4
c6
d6 i2
30
a4 (a2,a3)
b4 (b2,b3)c6 (c3,c5)
d6 (d2,d5)y a4+b4
z c6+d6
i2 i1+1
B7
i > 100i0 ...B0
b2 ...c3 ...d2 ...
B2
a1 (a0,a4)
b1 (b0,b4)c1 (c0,c6)
d1 (d0,d6)
i1 (i0,i2) a2 ...c2 ...
B1
a3 ...d3 ...
B3
d4 ...B4 c4 ...
B5
d5 (d4,d3)c5 (c2,c4)
b3 ...B6
i > 100
After renaming
• Semi-pruned SSA form
• We’re done …
Semi-pruned only names live in 2 or more blocks are “global names”.
31
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
26
SSA Construction Algorithm (Pruned SSA)
What’s this “pruned SSA” stuff?
• Minimal SSA still contains extraneous -functions
• Inserts some -functions where they are dead
• Would like to avoid inserting them
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
27
SSA Construction Algorithm (Two Ideas)
• Semi-pruned SSA: discard names used in only one block Significant reduction in total number of -functions Needs only local Live information (cheap to
compute)
• Pruned SSA: only insert -functions where their value is live Inserts even fewer -functions, but costs more to do Requires global Live variable analysis (more expensive)
In practice, both are simple modifications to step 1.
UUNIVERSITYNIVERSITY OFOF D DELAWARE ELAWARE • • C COMPUTER & OMPUTER & IINFORMATION NFORMATION SSCIENCES CIENCES DDEPARTMENTEPARTMENT
28
SSA Deconstruction
At some point, we need executable code
• Few machines implement operations
• Need to fix up the flow of values
Basic idea
• Insert copies -function pred’s
• Simple algorithm Works in most cases
• Adds lots of copies Many of them coalesce away
X17 (x10,x11)
... x17
... ...
... x17
X17 x10 X17 x11