Compiler Design
University of Amsterdam
CSA: Computer Systems Architecture
Introduction to Compiler Design: optimization and backend issues
Andy Pimentel
Computer Systems Architecture
Introduction to Compiler Design – A. Pimentel – p. 1/98
Compilers: Organization Revisited
[Figure: Source -> Frontend -> IR -> Optimizer -> IR -> Backend -> Machine code]
Optimizer
  Independent part of the compiler
  Different optimizations possible
  IR to IR translation
Intermediate Representation (IR)
Flow graph
  Nodes are basic blocks
    Basic blocks are single entry and single exit
  Edges represent control flow
Abstract machine code
  Including the notion of functions and procedures
Symbol table(s) keep track of scope and binding information about names
Partitioning into basic blocks
1. Determine the leaders, which are:
   The first statement
   Any statement that is the target of a jump
   Any statement that immediately follows a jump
2. For each leader, its basic block consists of the leader and all statements up to but not including the next leader
Partitioning into basic blocks (cont’d)
BB1:
   1  prod=0
   2  i=1
BB2:
   3  t1=4*i
   4  t2=a[t1]
   5  t3=4*i
   6  t4=b[t3]
   7  t5=t2*t4
   8  t6=prod+t5
   9  prod=t6
  10  t7=i+1
  11  i=t7
  12  if i < 21 goto 3
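The two partitioning rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `partition` and the encoding of each statement as a `(text, jump_target)` pair are made up for the example, and statement 10 is taken to be `t7=i+1`.

```python
def partition(stmts):
    """Split 1-based numbered statements into basic blocks.

    stmts is a list of (text, target) pairs; target is the number of the
    statement a jump goes to, or None for non-jumps (hypothetical encoding).
    """
    leaders = {1}                              # the first statement
    for number, (_, target) in enumerate(stmts, start=1):
        if target is not None:
            leaders.add(target)                # the target of a jump
            if number < len(stmts):
                leaders.add(number + 1)        # the statement after a jump
    starts = sorted(leaders)
    ends = starts[1:] + [len(stmts) + 1]
    # Each block runs from its leader up to, but not including, the next leader.
    return [list(range(s, e)) for s, e in zip(starts, ends)]


program = [
    ("prod=0", None), ("i=1", None), ("t1=4*i", None), ("t2=a[t1]", None),
    ("t3=4*i", None), ("t4=b[t3]", None), ("t5=t2*t4", None),
    ("t6=prod+t5", None), ("prod=t6", None), ("t7=i+1", None),
    ("i=t7", None), ("if i < 21 goto 3", 3),
]
blocks = partition(program)
```

Here `blocks` holds the statement numbers of BB1 (statements 1 and 2) and BB2 (statements 3 to 12).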
Intermediate Representation (cont’d)
Structure within a basic block:
  Abstract Syntax Tree (AST)
    Leaves are labeled by variable names or constants
    Interior nodes are labeled by an operator
  Directed Acyclic Graph (DAG)
  C-like
  3-address statements (like we have already seen)
Directed Acyclic Graph
Like ASTs:
  Leaves are labeled by variable names or constants
  Interior nodes are labeled by an operator
Nodes can have variable names attached that contain the value of that expression
Common subexpressions are represented by multiple edges to the same expression
DAG creation
Suppose the following three address statements:
1. x = y op z
2. x = op y
3. x = y
if (i <= 20) ... will be treated like case 1 with x undefined
DAG creation (cont’d)
If node(y) is undefined, create a leaf labeled y, same for z if applicable
Find node n labeled op with children node(y) and node(z) if applicable. When not found, create node n. In case 3 let n be node(y)
Make node(x) point to n and update the attached identifiers for x
DAG example
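   1  t1 = 4 * i
   2  t2 = a[t1]
   3  t3 = 4 * i
   4  t4 = b[t3]
   5  t5 = t2 * t4
   6  t6 = prod + t5
   7  prod = t6
   8  t7 = i + 1
   9  i = t7
  10  if (i <= 20) goto 1
[Figure: the DAG for this block]
  Leaves: a, b, 4, i, 1, 20 and the initial value of prod
  * (t1, t3) with children 4 and i
  [ ] (t2) with children a and the node of t1
  [ ] (t4) with children b and the node of t1, t3
  * (t5) with children t2 and t4
  + (t6, prod) with children prod and t5
  + (t7, i) with children i and 1
  <= with children the node of t7, i and 20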
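The DAG creation steps can be tried out on this block with a small Python sketch. The statement encoding `(x, op, operands)`, with `op=None` for a copy and `x=None` for the conditional, and the name `build_dag` are assumptions made for illustration only.

```python
def build_dag(stmts):
    """Build a DAG for one basic block of three-address statements."""
    nodes = []       # node id -> (label, tuple of child ids)
    node_of = {}     # identifier -> id of the node holding its current value
    interior = {}    # (op, child ids) -> node id, used to find existing nodes
    attached = {}    # node id -> identifiers attached to that node

    def node(name):
        # If node(name) is undefined, create a leaf labeled name.
        if name not in node_of:
            nodes.append((name, ()))
            node_of[name] = len(nodes) - 1
        return node_of[name]

    for x, op, operands in stmts:
        children = tuple(node(y) for y in operands)
        if op is None:                       # case 3: x = y
            n = children[0]
        else:                                # cases 1 and 2: find or create n
            key = (op, children)
            if key not in interior:
                nodes.append(key)
                interior[key] = len(nodes) - 1
            n = interior[key]
        if x is not None:                    # make node(x) point to n
            old = node_of.get(x)
            if old is not None and x in attached.get(old, []):
                attached[old].remove(x)
            attached.setdefault(n, []).append(x)
            node_of[x] = n
    return nodes, attached


block = [
    ("t1", "*", ("4", "i")), ("t2", "[]", ("a", "t1")),
    ("t3", "*", ("4", "i")), ("t4", "[]", ("b", "t3")),
    ("t5", "*", ("t2", "t4")), ("t6", "+", ("prod", "t5")),
    ("prod", None, ("t6",)), ("t7", "+", ("i", "1")),
    ("i", None, ("t7",)), (None, "<=", ("i", "20")),
]
nodes, attached = build_dag(block)
shared = sorted(names for names in attached.values() if len(names) > 1)
```

Nodes carrying more than one identifier (collected in `shared`) are exactly the places where a value is computed once and reused.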
Local optimizations
On basic blocks in the intermediate representation
  Machine-independent optimizations
As a post code-generation step (often called peephole optimization)
  On a small "instruction window" (often a basic block)
  Includes machine-specific optimizations
Transformations on basic blocks
Examples
Function-preserving transformations
  Common subexpression elimination
  Constant folding
  Copy propagation
  Dead-code elimination
  Temporary variable renaming
  Interchange of independent statements
Transformations on basic blocks (cont’d)
Algebraic transformations
Machine-dependent eliminations/transformations
  Removal of redundant loads/stores
  Use of machine idioms
Common subexpression elimination
If the same expression is computed more than once it is called a common subexpression
If the result of the expression is stored, we don't have to recompute it
Moving to a DAG as IR, common subexpressions are automatically detected!
  x = a + b          x = a + b
  ...          =>    ...
  y = a + b          y = x
Constant folding
Compute constant expression at compile time
May require some emulation support
  x = 3 + 5          x = 8
  ...          =>    ...
  y = x * 2          y = 16
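A minimal Python sketch of constant folding within one basic block follows. It also propagates the folded constants to later statements, which is what turns `y = x * 2` into `y = 16`. The tuple encoding `(dest, left, op, right)` and the name `fold_constants` are assumptions for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(block):
    """Fold constant expressions in a list of (dest, left, op, right) tuples."""
    known = {}                                  # variable -> known constant value
    result = []
    for dest, left, op, right in block:
        left = known.get(left, left)            # substitute known constants
        right = known.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            value = OPS[op](left, right)        # computed at "compile time"
            known[dest] = value
            result.append((dest, value))
        else:
            known.pop(dest, None)               # dest no longer holds a constant
            result.append((dest, left, op, right))
    return result


folded = fold_constants([("x", 3, "+", 5), ("y", "x", "*", 2)])
```

The "emulation support" mentioned above corresponds to `OPS`: the compiler must evaluate each operator with the semantics of the target machine, which a cross-compiler cannot always take from the host.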
Copy propagation
Propagate original values when copied
Target for dead-code elimination
  x = y              x = y
  ...          =>    ...
  z = x + 2          z = y + 2
Dead-code elimination
A variable x is dead at a statement if it is not used after that statement
An assignment x = y + z where x is dead can be safely eliminated
Requires live-variable analysis (discussed later on)
Temporary variable renaming
  t1 = a + b         t1 = a + b
  t2 = t1 * 2        t2 = t1 * 2
  ...          =>    ...
  t1 = d - e         t3 = d - e
  c = t1 + 1         c = t3 + 1
If each statement that defines a temporary defines a new temporary, then the basic block is in normal form
  Makes some optimizations at BB level a lot simpler (e.g. common subexpression elimination, copy propagation, etc.)
Algebraic transformations
There are many possible algebraic transformations
Usually only the common ones are implemented
  x = x + 0
  x = x * 1
  x = x * 2    =>    x = x << 1
  x = x^2      =>    x = x * x
Machine dependent eliminations/transformations
Removal of redundant loads/stores
  1  mov R0, a
  2  mov a, R0    // can be removed
Removal of redundant jumps, for example
  1  beq ...,$Lx             bne ...,$Ly
  2  j $Ly             =>    $Lx: ...
  3  $Lx: ...
Use of machine idioms, e.g.,
  Auto increment/decrement addressing modes
  SIMD instructions
Etc., etc. (see practical assignment)
Other sources of optimizations
Global optimizations
  Global common subexpression elimination
  Global constant folding
  Global copy propagation, etc.
Loop optimizations
They all need some dataflow analysis on the flow graph
Loop optimizations
Code motion
  Decrease the amount of code inside the loop
  Take a loop-invariant expression and place it before the loop
  while (i <= limit - 2)    =>    t = limit - 2
                                  while (i <= t)
Loop optimizations (cont’d)
Induction variable elimination
  Variables that are locked to the iteration of the loop are called induction variables
  Example: in for (i = 0; i < 10; i++), i is an induction variable
  Loops can contain more than one induction variable, for example, hidden in an array lookup computation
  Often, we can eliminate these extra induction variables
Loop optimizations (cont’d)
Strength reduction
  Strength reduction is the replacement of expensive operations by cheaper ones (algebraic transformation)
  Its use is not limited to loops, but it can be helpful for induction variable elimination
  i = i + 1                      i = i + 1
  t1 = i * 4               =>    t1 = t1 + 4
  t2 = a[t1]                     t2 = a[t1]
  if (i < 10) goto top           if (i < 10) goto top
Loop optimizations (cont’d)
Induction variable elimination (2)
  Note that in the previous strength reduction we have to initialize t1 before the loop
  After such strength reductions we can eliminate an induction variable
  i = i + 1                      t1 = t1 + 4
  t1 = t1 + 4              =>    t2 = a[t1]
  t2 = a[t1]                     if (t1 < 40) goto top
  if (i < 10) goto top
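To see that the three loop versions visit the same array offsets, they can be written out as plain Python functions. This is a sketch under assumptions: `i` is taken to be 0 on loop entry, the array access `t2 = a[t1]` is replaced by recording `t1`, and the function names are invented for the example.

```python
def original(n=10):
    i, offsets = 0, []
    while True:
        i = i + 1
        t1 = i * 4                  # multiplication on every iteration
        offsets.append(t1)          # stands in for t2 = a[t1]
        if not i < n:
            break
    return offsets


def strength_reduced(n=10):
    i, offsets = 0, []
    t1 = i * 4                      # t1 has to be initialized before the loop
    while True:
        i = i + 1
        t1 = t1 + 4                 # addition replaces the multiplication
        offsets.append(t1)
        if not i < n:
            break
    return offsets


def induction_variable_eliminated(n=10):
    offsets = []
    t1 = 0
    limit = 4 * n                   # the test i < 10 becomes t1 < 40
    while True:
        t1 = t1 + 4
        offsets.append(t1)
        if not t1 < limit:
            break
    return offsets
```

All three functions produce the offsets 4, 8, ..., 40; only the last one no longer needs `i` at all.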
Finding loops in flow graphs
Dominator relation
  Node A dominates node B if all paths from the initial node to node B go through node A
A node always dominates itself
We can construct a tree using this relation: the Dominator tree
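Dominator sets can be computed with a simple fixed-point iteration, sketched below in Python. The edge list is an assumption: it is the standard ten-node textbook flow graph, chosen because it is consistent with the backedges and natural loops listed for the example in these slides. The sketch also assumes every node is reachable from the entry node.

```python
def dominators(succ, entry):
    """Return, for every node, the set of nodes that dominate it."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    dom = {n: set(nodes) for n in nodes}       # start from "everything"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes - {entry}):
            # n is dominated by itself and by whatever dominates all predecessors.
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Assumed edges of the example flow graph (nodes 1..10).
SUCC = {1: [2, 3], 2: [3], 3: [4], 4: [3, 5, 6], 5: [7], 6: [7],
        7: [4, 8], 8: [3, 9, 10], 9: [1], 10: [7]}
dom = dominators(SUCC, entry=1)
# An edge n -> d is a backedge when its head d dominates its tail n.
backedges = sorted((n, d) for n in SUCC for d in SUCC[n] if d in dom[n])
```

The dominator tree follows from these sets: the parent of a node is its closest strict dominator.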
Dominator tree example
[Figure: a flow graph with nodes 1 to 10 (left) and its dominator tree (right)]
Natural loops
A loop has a single entry point, the header, which dominates the loop
There must be a path back to the header
Loops can be found by searching for edges whose heads dominate their tails, called the backedges
Given a backedge n -> d, the natural loop is d plus the nodes that can reach n without going through d
Finding the natural loop of n -> d
procedure insert(m) {
  if (not m ∈ loop) {
    loop = loop ∪ {m}
    push(m)
  }
}

stack = ∅
loop = {d}
insert(n)
while (stack ≠ ∅) {
  m = pop()
  for (p ∈ pred(m)) insert(p)
}
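The procedure translates almost line by line into Python. The edge list below is an assumption (the standard ten-node textbook flow graph, which agrees with the natural loops listed for the running example); `natural_loop` and `PRED` are names invented for this sketch.

```python
def natural_loop(pred, n, d):
    """Return the natural loop of the backedge n -> d."""
    loop = {d}
    stack = []

    def insert(m):
        if m not in loop:
            loop.add(m)
            stack.append(m)

    insert(n)
    while stack:
        m = stack.pop()
        for p in pred[m]:          # everything that reaches m also reaches n
            insert(p)
    return loop


# Assumed edges of the example flow graph (nodes 1..10).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 3), (4, 5), (4, 6), (5, 7),
         (6, 7), (7, 4), (7, 8), (8, 3), (8, 9), (8, 10), (9, 1), (10, 7)]
PRED = {node: [] for node in range(1, 11)}
for src, dst in EDGES:
    PRED[dst].append(src)

inner = natural_loop(PRED, 10, 7)
```

Because d is placed in the loop before anything is pushed, the search never walks past the header, which is what "without going through d" requires.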
Natural loops (cont’d)
When two backedges go to the same header node, we may join the resulting loops
When we consider two natural loops, they are either completely disjoint or one is nested inside the other
The nested loop is called an inner loop
A program spends most of its time inside loops, so loops are a target for optimizations. This especially holds for inner loops!
Our example revisited
[Figure: the flow graph with nodes 1 to 10 from the dominator tree example]
Natural loops:
1. backedge 10 -> 7: {7,8,10} (the inner loop)
2. backedge 7 -> 4: {4,5,6,7,8,10}
3. backedges 4 -> 3 and 8 -> 3: {3,4,5,6,7,8,10}
4. backedge 9 -> 1: the entire flow graph
Reducible flow graphs
A flow graph is reducible when the edges can be partitioned into forward edges and backedges
The forward edges must form an acyclic graph in which every node can be reached from the initial node
Exclusive use of structured control-flow statements such as if-then-else, while and break produces reducible control flow
Irreducible control flow can create loops that cannot be optimized
Reducible flow graphs (cont’d)
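Irreducible control-flow graphs can always be made reducible
This usually involves some duplication of code
[Figure: an irreducible flow graph with nodes a, b and c (left), made reducible by duplicating c as c' (right)]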
Dataflow analysis
Dataflow analysis is needed for global code optimization, e.g.:
  Is a variable live on exit from a block?
  Does a definition reach a certain point in the code?
Dataflow equations are used to collect dataflow information
  A typical dataflow equation has the form
    out[S] = gen[S] ∪ (in[S] − kill[S])
The notion of generation and killing depends on the dataflow analysis problem to be solved
Let's first consider reaching definitions analysis for structured programs
Reaching definitions
A definition of a variable x is a statement that assigns or may assign a value to x
An assignment to x is an unambiguous definition of x
An ambiguous definition of x can be an assignment through a pointer or a function call where x is passed by reference
Reaching definitions (cont’d)
When x is defined, we say the definition is generated
An unambiguous definition of x kills all other definitions of x
When all definitions of x are the same at a certain point, we can use this information to do some optimizations
Example: all definitions of x define x to be 1. Now, by performing constant folding, we can do strength reduction if x is used in z = y * x
Dataflow analysis for reaching definitions
During dataflow analysis we have to examine every path that can be taken to see which definitions reach a point in the code
Sometimes a certain path will never be taken, even if it is part of the flow graph
Since it is undecidable whether a path can be taken, we simply examine all paths
This won't cause false assumptions to be made for the code: it is a conservative simplification
It merely causes optimizations not to be performed
The building blocks
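[Figure: the four statement forms S, each with its equations. Da denotes the set of all definitions of a]
S is a single assignment d: a=b+c
  gen[S] = {d}
  kill[S] = Da − {d}
  out[S] = gen[S] ∪ (in[S] − kill[S])
S is a sequence S1; S2
  gen[S] = gen[S2] ∪ (gen[S1] − kill[S2])
  kill[S] = kill[S2] ∪ (kill[S1] − gen[S2])
  in[S1] = in[S]
  in[S2] = out[S1]
  out[S] = out[S2]
S is a conditional with branches S1 and S2
  gen[S] = gen[S1] ∪ gen[S2]
  kill[S] = kill[S1] ∩ kill[S2]
  in[S1] = in[S2] = in[S]
  out[S] = out[S1] ∪ out[S2]
S is a loop with body S1
  gen[S] = gen[S1]
  kill[S] = kill[S1]
  in[S1] = in[S] ∪ gen[S1]
  out[S] = out[S1]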
Dealing with loops
The in-set to the code inside the loop is the in-set of the loop plus the out-set of the loop: in[S1] = in[S] ∪ out[S1]
The out-set of the loop is the out-set of the code inside: out[S] = out[S1]
Fortunately, we can also compute out[S1] in terms of in[S1]: out[S1] = gen[S1] ∪ (in[S1] − kill[S1])
Dealing with loops (cont’d)
Let I = in[S1], O = out[S1], J = in[S], G = gen[S1] and K = kill[S1]
  I = J ∪ O
  O = G ∪ (I − K)
Assume O = ∅ initially, then
  I1 = J
  O1 = G ∪ (I1 − K) = G ∪ (J − K)
  I2 = J ∪ O1 = J ∪ G ∪ (J − K) = J ∪ G
  O2 = G ∪ (I2 − K) = G ∪ ((J ∪ G) − K) = G ∪ (J − K)
O1 = O2, so in[S1] = in[S] ∪ gen[S1] and out[S] = out[S1]
Reaching definitions example
  d1  i = m - 1
  d2  j = n
  d3  a = u1
      do
  d4    i = i + 1
  d5    j = j - 1
        if (e1)
  d6      a = u2
        else
  d7      i = u3
      while (e2)
[Figure: the syntax tree of this program, with each node annotated by its gen and kill sets as bit vectors (one bit per definition, ordered d1 to d7)]
In reality, dataflow analysis is often performed at the granularity of basic blocks rather than statements
Iterative solutions
Programs in general need not be made up out of structured control-flow statements
We can do dataflow analysis on these programs using an iterative algorithm
The equations (at basic block level) for reaching definitions are:
  in[B] = ∪ { out[P] | P ∈ pred(B) }
  out[B] = gen[B] ∪ (in[B] − kill[B])
Iterative algorithm for reaching definitions
for (each block B) out[B] = gen[B]
do {
  change = false
  for (each block B) {
    in[B] = ∪ { out[P] | P ∈ pred(B) }
    oldout = out[B]
    out[B] = gen[B] ∪ (in[B] − kill[B])
    if (out[B] ≠ oldout) change = true
  }
} while (change)
Reaching definitions: an example
[Figure: flow graph with four basic blocks]
  B1:  d1: i = m - 1
       d2: j = n
       d3: a = u1
  B2:  d4: i = i + 1
       d5: j = j - 1
  B3:  d6: a = u2
  B4:  d7: i = u3

  gen[B1] = {d1,d2,d3}    kill[B1] = {d4,d5,d6,d7}
  gen[B2] = {d4,d5}       kill[B2] = {d1,d2,d7}
  gen[B3] = {d6}          kill[B3] = {d3}
  gen[B4] = {d7}          kill[B4] = {d1,d4}

  Block   Initial               Pass 1                Pass 2
          in[B]     out[B]      in[B]     out[B]      in[B]     out[B]
  B1      000 0000  111 0000    000 0000  111 0000    000 0000  111 0000
  B2      000 0000  000 1100    111 0011  001 1110    111 1111  001 1110
  B3      000 0000  000 0010    001 1110  000 1110    001 1110  000 1110
  B4      000 0000  000 0001    001 1110  001 0111    001 1110  001 0111
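The iterative algorithm can be run on this example with a short Python sketch. The predecessor lists are an assumption: they are read off the do-while/if-else structure of the program (B1 -> B2, B2 -> B3, B2 -> B4, B3 -> B2, B4 -> B2) and reproduce the in and out sets of the table. The names `reaching_definitions` and `bits` are invented for the example.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Iterate in/out sets for reaching definitions until nothing changes."""
    out = {b: set(gen[b]) for b in blocks}          # out[B] = gen[B]
    in_ = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in pred[b]))
            old_out = out[b]
            out[b] = gen[b] | (in_[b] - kill[b])
            if out[b] != old_out:
                change = True
    return in_, out


def bits(defs):
    """Render a set of definitions as a bit string ordered d1..d7."""
    return "".join("1" if f"d{k}" in defs else "0" for k in range(1, 8))


BLOCKS = ["B1", "B2", "B3", "B4"]
PRED = {"B1": [], "B2": ["B1", "B3", "B4"], "B3": ["B2"], "B4": ["B2"]}  # assumed edges
GEN = {"B1": {"d1", "d2", "d3"}, "B2": {"d4", "d5"}, "B3": {"d6"}, "B4": {"d7"}}
KILL = {"B1": {"d4", "d5", "d6", "d7"}, "B2": {"d1", "d2", "d7"},
        "B3": {"d3"}, "B4": {"d1", "d4"}}

in_sets, out_sets = reaching_definitions(BLOCKS, PRED, GEN, KILL)
```

The loop stops after the second pass, because no out set changes in that pass.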
Available expressions
An expression e is available at a point p if every path from the initial node to p evaluates e, and the variables used by e are not changed after the last evaluation
An available expression e is killed if one of the variables used by e is assigned to
An available expression e is generated if it is evaluated
Note that if an expression e is assigned to a variable used by e, this expression will not be generated (e.g., i = i + 1 does not generate i + 1)
Available expressions (cont’d)
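Available expressions are mainly used to find common subexpressions
[Figure: two flow graphs with blocks B1, B2 and B3; in both, B1 contains t1 = 4 * i and B3 contains t2 = 4 * i. Left: the contents of B2 are unknown (?). Right: B2 contains the statements i = ... and t0 = 4 * i]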
Available expressions (cont’d)
Dataflow equations:
  out[B] = e_gen[B] ∪ (in[B] − e_kill[B])
  in[B] = ∩ { out[P] | P ∈ pred(B) }   for B not initial
  in[B1] = ∅   where B1 is the initial block
The confluence operator is intersection instead of union!
Liveness analysis
A variable is live at a certain point in the code if it holds a value that may be needed in the future
Solve backwards:
  Find a use of a variable
  This variable is live between statements that have the found use as next statement
  Recurse until you find a definition of the variable
Dataflow for liveness
Using the sets use[B] and def[B]
  def[B] is the set of variables assigned values in B prior to any use of that variable in B
  use[B] is the set of variables whose values may be used in B prior to any definition of the variable
A variable comes live into a block (in in[B]) if it is either used before redefinition, or it is live coming out of the block and is not redefined in the block
A variable comes live out of a block (in out[B]) if and only if it is live coming into one of its successors
Dataflow equations for liveness
  in[B] = use[B] ∪ (out[B] − def[B])
  out[B] = ∪ { in[S] | S ∈ succ(B) }
Note the relation with the reaching-definitions equations: the roles of in and out are interchanged
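The liveness equations can be solved with the same kind of iteration, now running backwards over the blocks. The Python sketch below uses a small made-up program (not taken from the slides): B1 is `a = 0`, B2 is `b = a + 1; c = c + b; a = b * 2; if a < N goto B2`, and B3 is `return c`. The function and variable names are assumptions for illustration.

```python
def liveness(blocks, succ, use, defs):
    """Iterate the liveness equations until a fixed point is reached."""
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):              # backward problem
            out_b = set().union(*(live_in[s] for s in succ[b]))
            in_b = use[b] | (out_b - defs[b])
            if in_b != live_in[b] or out_b != live_out[b]:
                live_in[b], live_out[b] = in_b, out_b
                changed = True
    return live_in, live_out


BLOCKS = ["B1", "B2", "B3"]
SUCC = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
# use[B]: used before any definition in B; def[B]: defined before any use in B.
USE = {"B1": set(), "B2": {"a", "c", "N"}, "B3": {"c"}}
DEFS = {"B1": {"a"}, "B2": {"b"}, "B3": set()}

live_in, live_out = liveness(BLOCKS, SUCC, USE, DEFS)
```

In this example `b` is never live across a block boundary, while `c` and `N` are live on entry to the program, which means they are read before being assigned.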
Algorithms for global optimizations
Global common subexpression elimination
  First calculate the sets of available expressions
  For every statement s of the form x = y + z where y + z is available, do the following:
    Search backwards in the graph for the evaluations of y + z
    Create a new variable u
    Replace statements w = y + z by u = y + z; w = u
    Replace statement s by x = u
Copy propagation
Suppose a copy statement s of the form x = y is encountered. We may now substitute a use of x by a use of y if
  Statement s is the only definition of x reaching the use
  On every path from statement s to the use, there are no assignments to y
Copy propagation (cont’d)
To find the set of copy statements we can use, we define a new dataflow problem
An occurrence of a copy statement generates this statement
An assignment to x or y kills the copy statement x = y
Dataflow equations:
  out[B] = c_gen[B] ∪ (in[B] − c_kill[B])
  in[B] = ∩ { out[P] | P ∈ pred(B) }   for B not initial
  in[B1] = ∅   where B1 is the initial block
Copy propagation (cont’d)
For each copy statement s: x = y do
  Determine the uses of x reached by this definition of x
  Determine if for each of those uses this is the only definition reaching it (i.e., s ∈ in[B_use], where B_use is the block containing the use)
  If so, remove s and replace the uses of x by uses of y
Detection of loop-invariant computations
1. Mark invariant those statements whose operands are constant or have reaching definitions outside the loop
2. Repeat step 3 until no new statements are marked invariant
3. Mark invariant those statements whose operands either are constant, have reaching definitions outside the loop, or have one reaching definition that is marked invariant
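The marking procedure is a fixed-point computation, sketched below in Python under simplifying assumptions that are not part of the slides: the loop body is a straight list of `(dest, left, op, right)` statements, an operand with no definition inside the loop is treated as having all its reaching definitions outside the loop, and an operand with exactly one definition inside the loop is assumed to be reached only by that definition. A real implementation would use the reaching-definitions sets instead.

```python
def loop_invariants(stmts):
    """Return the indices of the loop statements marked invariant."""
    defs = {}                                   # variable -> indices defining it
    for idx, (dest, *_rest) in enumerate(stmts):
        defs.setdefault(dest, []).append(idx)
    invariant = set()

    def operand_ok(v):
        if isinstance(v, int):                  # constant operand
            return True
        inside = defs.get(v, [])
        if not inside:                          # only defined outside the loop
            return True
        # exactly one definition in the loop, already marked invariant
        return len(inside) == 1 and inside[0] in invariant

    changed = True
    while changed:                              # repeat until nothing new is marked
        changed = False
        for idx, (_dest, left, _op, right) in enumerate(stmts):
            if idx not in invariant and operand_ok(left) and operand_ok(right):
                invariant.add(idx)
                changed = True
    return invariant


body = [
    ("t", "limit", "-", 2),     # invariant: limit is defined outside the loop
    ("u", "t", "*", 4),         # invariant: depends only on the invariant t
    ("i", "i", "+", 1),         # not invariant: i changes every iteration
    ("v", "i", "+", "u"),       # not invariant: uses i
]
marked = loop_invariants(body)
```

Being marked invariant is only the first requirement; whether a statement may actually be moved is decided by the code motion conditions.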
Code motion
1. Create a pre-header for the loop
2. Find loop-invariant statements
3. For each statement s defining x found in step 2, check that
   (a) it is in a block that dominates all exits of the loop
   (b) x is not defined elsewhere in the loop
   (c) all uses of x in the loop can only be reached from this statement s
4. Move the statements that conform to the pre-header
Code motion (cont’d)
[Figure: three variants of a flow graph with blocks B1 to B5, one for each of the conditions (a), (b) and (c)]
Common to all three variants:
  B1:  i = 1
  B2:  if u < v goto B3
  B3:  i = 2
       u = u + 1
  B4:  v = v - 1
       if v <= 20 goto B5
  B5:  j = i
Condition (a): the graph as above
Condition (b): B2 additionally contains i = 3
Condition (c): B4 additionally contains k = i
Detection of induction variables
A basic induction variable i is a variable that only has assignments of the form i = i ± c
Associated with each induction variable j is a triple (i, c, d), where i is a basic induction variable and c and d are constants such that j = c * i + d
In this case j belongs to the family of i
The basic induction variable i belongs to its own family, with the associated triple (i, 1, 0)
Detection of induction variables (cont’d)
Find all basic induction variables in the loop
Find variables k with a single assignment in the loop with one of the following forms:
  k = j * b, k = b * j, k = j / b, k = j + b, k = b + j, where b is a constant and j is an induction variable
If j is not basic and in the family of i, then there must be
  No assignment of i between the assignment of j and k
  No definition of j outside the loop that reaches k
Strength reduction for induction variables
Consider each basic induction variable i in turn. For each variable j in the family of i with triple (i, c, d):
  Create a new variable s
  Replace the assignment to j by j = s
  Immediately after each assignment i = i ± n append s = s ± c * n
  Place s in the family of i with triple (i, c, d)
  Initialize s in the preheader: s = c * i + d
Strength reduction for induction variables (cont’d)
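[Figure: a flow graph with blocks B1 to B5 before and after strength reduction; only the blocks shown below are legible, together with the test if i < n goto B5 in a later block]
Before strength reduction:
  B1:  i = m - 1
       t1 = 4 * n
       v = a[t1]
  B2:  i = i + 1
       t2 = 4 * i
       t3 = a[t2]
       if t3 < v goto B2
After strength reduction:
  B1:  i = m - 1
       t1 = 4 * n
       v = a[t1]
       s2 = 4 * i
  B2:  i = i + 1
       s2 = s2 + 4
       t2 = s2
       t3 = a[t2]
       if t3 < v goto B2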
Elimination of induction variables
Consider each basic induction variable i only used to compute other induction variables and tests
Take some j in i's family such that c and d from the triple (i, c, d) are simple
Rewrite tests if (i relop x) to r = c * x + d; if (j relop r)
Delete assignments to i from the loop
Do some copy propagation to eliminate j = s assignments formed during strength reduction
Alias Analysis
Aliases, e.g. caused by pointers, make dataflow analysis more complex (uncertainty regarding what is defined and used: x = *p might use any variable)
Use dataflow analysis to determine what a pointer might point to
  in[B] contains for each pointer p the set of variables to which p could point at the beginning of block B
    Elements of in[B] are pairs (p, a) where p is a pointer and a a variable, meaning that p might point to a
  out[B] is defined similarly for the end of B
Alias Analysis (cont’d)
Define a function trans_B such that trans_B(in[B]) = out[B]
trans_B is composed of trans_s, for each statement s of block B
  If s is p = &a, or p = &a ± c in case a is an array, then
    trans_s(S) = (S − {(p, b) | any variable b}) ∪ {(p, a)}
  If s is p = q ± c for pointer q and nonzero integer c, then
    trans_s(S) = (S − {(p, b) | any variable b}) ∪ {(p, b) | (q, b) ∈ S and b is an array variable}
  If s is p = q, then
    trans_s(S) = (S − {(p, b) | any variable b}) ∪ {(p, b) | (q, b) ∈ S}
Alias Analysis (cont’d)
  If s assigns to pointer p any other expression, then
    trans_s(S) = S − {(p, b) | any variable b}
  If s is not an assignment to a pointer, then trans_s(S) = S
Dataflow equations for alias analysis:
  out[B] = trans_B(in[B])
  in[B] = ∪ { out[P] | P ∈ pred(B) }
  where trans_B(S) = trans_sk(trans_sk-1(... (trans_s1(S)))) for the statements s1, ..., sk of B
Alias Analysis (cont’d)
How to use the alias dataflow information? Examples:
  In reaching definitions analysis (to determine gen and kill)
    Statement *p = a generates a definition of every variable b such that p could point to b
    *p = a kills a definition of b only if b is not an array and is the only variable p could possibly point to (to be conservative)
  In liveness analysis (to determine def and use)
    *p = a uses p and a. It defines b only if b is the unique variable that p might point to (to be conservative)
    a = *p defines a, and represents the use of p and a use of any variable that p could point to
Code generation
Instruction selection
  Was a problem in the CISC era (e.g., lots of addressing modes)
  RISC instructions mean simpler instruction selection
  However, new instruction sets introduce new, complicated instructions (e.g., multimedia instruction sets)
Instruction selection methods
Tree-based methods (IR is a tree)
  Maximal Munch
  Dynamic programming
  Tree grammars
    Input tree treated as a string using prefix notation
    Rewrite the string using an LR parser and generate instructions as a side effect of the rewriting rules
If the DAG is not a tree, then it can be partitioned into multiple trees
Tree pattern based selection
Every target instruction is represented by a tree pattern
Such a tree pattern often has an associated cost
Instruction selection is done by tiling the IR tree with the instruction tree patterns
There may be many different ways an IR tree can be tiled, depending on the instruction set
Tree pattern based selection (cont’d)
[Figure: example IR tree, a move between mem nodes built from + and * nodes over temp 1, temp 2 and const a, b, c, d]
Instruction tree patterns ("_" stands for any subtree):
  Name    Effect              Cycles  Trees
  (none)  ri                  0       temp
  ADD     ri <- rj + rk       1       +(_, _)
  MUL     ri <- rj * rk       1       *(_, _)
  ADDI    ri <- rj + c        1       +(_, const)   +(const, _)   const
  LOAD    ri <- M[rj + c]     3       mem(+(_, const))   mem(+(const, _))   mem(const)   mem(_)
  STORE   M[rj + c] <- ri     3       move(mem(+(_, const)), _)   move(mem(+(const, _)), _)   move(mem(const), _)   move(mem(_), _)
  MOVEM   M[rj] <- M[ri]      6       move(mem(_), mem(_))
Optimal and optimum tilings
The cost of a tiling is the sum of the costs of the tree patterns
An optimal tiling is one where no two adjacent tiles can be combined into a single tile of lower cost
An optimum tiling is a tiling with lowest possible cost
An optimum tiling is also optimal, but not vice-versa
Maximal Munch
Maximal Munch is an algorithm for optimal tiling
  Start at the root of the tree
  Find the largest pattern that fits
  Cover the root node plus the other nodes in the pattern; the instruction corresponding to the tile is generated
  Do the same for the resulting subtrees
Maximal Munch generates the instructions in reverse order!
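A compact Python sketch of Maximal Munch for expression trees is given below. Several things are assumptions made for illustration: trees are nested tuples such as `("mem", ("+", ("temp", "fp"), ("const", 8)))`, register `r0` is assumed to always hold zero, store (`move`) tiles are left out, and the instruction syntax is invented. The sketch recurses into the subtrees before emitting the instruction of the current tile, which yields the instructions directly in execution order instead of reversing them afterwards.

```python
import itertools


def maximal_munch(tree):
    """Tile an expression tree top-down, always taking the largest tile."""
    code = []
    fresh = (f"r{n}" for n in itertools.count(1))    # symbolic registers

    def is_const(t):
        return t[0] == "const"

    def munch(t):
        kind = t[0]
        if kind == "temp":                           # zero-cost tile
            return t[1]
        if kind == "mem":                            # LOAD tiles, largest first
            addr = t[1]
            if addr[0] == "+" and is_const(addr[2]):
                base, offset = munch(addr[1]), addr[2][1]
            elif addr[0] == "+" and is_const(addr[1]):
                base, offset = munch(addr[2]), addr[1][1]
            elif is_const(addr):
                base, offset = "r0", addr[1]
            else:
                base, offset = munch(addr), 0
            reg = next(fresh)
            code.append(f"LOAD {reg} <- M[{base} + {offset}]")
            return reg
        if kind == "const":                          # ADDI tile for a lone const
            reg = next(fresh)
            code.append(f"ADDI {reg} <- r0 + {t[1]}")
            return reg
        if kind == "+" and (is_const(t[1]) or is_const(t[2])):
            const, other = (t[1], t[2]) if is_const(t[1]) else (t[2], t[1])
            src = munch(other)
            reg = next(fresh)
            code.append(f"ADDI {reg} <- {src} + {const[1]}")
            return reg
        left, right = munch(t[1]), munch(t[2])       # ADD or MUL tile
        reg = next(fresh)
        name = {"+": "ADD", "*": "MUL"}[kind]
        code.append(f"{name} {reg} <- {left} {kind} {right}")
        return reg

    munch(tree)
    return code


load_x = ("mem", ("+", ("temp", "fp"), ("const", 8)))
expr = ("+", load_x, ("*", ("temp", "i"), ("const", 4)))
instructions = maximal_munch(expr)
```

For `load_x` alone, a single LOAD tile covers the mem, + and const nodes, where a smaller-tiles-first strategy would need an ADDI followed by a LOAD.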
Dynamic programming
Dynamic programming is a technique for finding optimum solutions
  Bottom-up approach
  For each node n the costs of all children are found recursively
  Then the minimum cost for node n is determined
After cost assignment of the entire tree, instruction emission follows:
  Emission(node n): for each leaf li of the tile selected at node n, perform Emission(li). Then emit the instruction matched at node n
Register allocation...a graph coloring problem
First do instruction selection assuming an infinite number of symbolic registers
Build an interference graph
  Each node is a symbolic register
  Two nodes are connected when they are live at the same time
Color the interference graph
  Connected nodes cannot have the same color
  Minimize the number of colors (maximum is the number of actual registers)
Coloring by simplification
Simplify interference graph G using heuristic method (K-coloring a graph is NP-complete)
  Find a node m with fewer than K neighbors
  Remove node m and its edges from G, resulting in G'. Store m on a stack
  Color the graph G'
  Graph G can then be colored as well, since m has fewer than K neighbors
Coloring by simplification (cont’d)
Spill
  If a node with fewer than K neighbors cannot be found in G
    Mark a node n to be spilled, remove n and its edges from G (and stack n) and continue simplification
Select
  Assign colors by popping the stack
  Arriving at a spill node, check whether it can be colored. If not:
    The variable represented by this node will reside in memory (i.e. is spilled to memory)
    Actual spill code is inserted in the program
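The simplify, spill and select phases can be sketched in C as below. This is an illustrative sketch under several assumptions: the graph is an adjacency matrix of at most MAX_NODES nodes with no self-edges, K is at most MAX_NODES, the spill candidate is simply the first remaining node, and no spill code is generated (spilled nodes just get color -1). The function name color_graph is hypothetical.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Colors an interference graph with k colors by simplification.
 * color[v] becomes 0..k-1, or -1 when v has to be spilled.
 * Returns the number of spilled nodes. */
int color_graph(int n, bool adj[MAX_NODES][MAX_NODES], int k,
                int color[MAX_NODES])
{
    bool removed[MAX_NODES] = { false };
    int stack[MAX_NODES];
    int top = 0;

    /* Simplify: repeatedly remove and stack a node with fewer than k neighbors. */
    while (top < n) {
        int pick = -1;
        for (int v = 0; v < n && pick < 0; v++) {
            if (removed[v]) continue;
            int degree = 0;
            for (int w = 0; w < n; w++)
                if (!removed[w] && adj[v][w]) degree++;
            if (degree < k) pick = v;
        }
        /* Spill: no such node, so stack the first remaining node as a candidate. */
        for (int v = 0; v < n && pick < 0; v++)
            if (!removed[v]) pick = v;
        removed[pick] = true;
        stack[top++] = pick;
    }

    /* Select: pop the stack and give each node a color its neighbors do not use. */
    int spills = 0;
    for (int v = 0; v < n; v++) color[v] = -1;
    while (top > 0) {
        int v = stack[--top];
        bool used[MAX_NODES] = { false };
        for (int w = 0; w < n; w++)
            if (adj[v][w] && color[w] >= 0) used[color[w]] = true;
        for (int c = 0; c < k && color[v] < 0; c++)
            if (!used[c]) color[v] = c;
        if (color[v] < 0) spills++;          /* no color left: actual spill */
    }
    return spills;
}
```

A spill candidate is only counted as spilled when no color is left for it at select time, which matches the check described for spill nodes above.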
Coalescing
If there is no interference edge between the source and destination of a move, the move is redundant
Removing the move and joining the nodes is called coalescing
Coalescing increases the degree of a node
A graph that was K colorable before coalescing might not be afterwards
Sketch of the algorithm with coalescing
Label move-related nodes in interference graph
While interference graph is nonempty
  Simplify, using non-move-related nodes
  Coalesce move-related nodes using conservative coalescing
    Coalesce only when the resulting node has fewer than K neighbors with a significant degree (i.e. degree >= K)
  No simplifications/coalescings: "freeze" a move-related node of a low degree -> do not consider its moves for coalescing anymore
  Spill
Select
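The conservative coalescing test used in the sketch above can be written as a small C function. This is a hedged illustration: the adjacency-matrix representation, MAX_NODES and the name can_coalesce are assumptions, and "significant degree" is taken to mean a degree of at least K.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Conservative test: nodes a and b (source and destination of a move) may be
 * coalesced when the merged node would have fewer than k neighbors of
 * significant degree (degree >= k). */
bool can_coalesce(int n, bool adj[MAX_NODES][MAX_NODES], int a, int b, int k)
{
    if (adj[a][b])
        return false;                        /* interfering nodes cannot be merged */

    int significant = 0;
    for (int t = 0; t < n; t++) {
        if (t == a || t == b) continue;
        if (!adj[a][t] && !adj[b][t]) continue;   /* not a neighbor of the merged node */

        /* Degree of t after merging: a and b together count as one neighbor. */
        int degree = 1;
        for (int w = 0; w < n; w++)
            if (adj[t][w] && w != a && w != b) degree++;

        if (degree >= k) significant++;
    }
    return significant < k;
}
```

When the test succeeds, the merged node can always be removed during simplification once its low-degree neighbors are gone, so a K-colorable graph stays K-colorable.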
Register allocation: an example
Live in: k,j
g = mem[j+12]
h = k - 1
f = g * h
e = mem[j+8]
m = mem[j+16]
b = mem[f]
c = e + 8
d = c
k = m + 4
j = b
goto d
Live out: d,k,j
[Figure: interference graph over the nodes b, c, d, e, f, g, h, j, k and m]
Assume a 4-coloring (K = 4)
Simplify by removing and stacking nodes with < 4 neighbors (g,h,k,f,e,m)
Register allocation: an example (cont’d)
After removing and stacking the nodes g,h,k,f,e,m:
[Figure: after simplification the graph contains only the nodes j, b, d and c; after coalescing it contains the merged nodes j&b and d&c]
Coalesce now and simplify again
Register allocation: an example (cont’d)
4 registers available: R0 R1 R2 R3
Stacked elements: d&c j&b m e f k g h
[Figure: two copies of the interference graph (nodes b, c, d, e, f, g, h, j, k, m) illustrating the color assignment]
Register allocation: an example (cont’d)
4 registers available: R0 R1 R2 R3
Stacked elements: m e f k g h
[Figure: two copies of the interference graph (nodes b, c, d, e, f, g, h, j, k, m) illustrating the color assignment]
ETC., ETC.
No spills are required and both moves were optimized away
Instruction scheduling
Increase ILP (e.g., by avoiding pipeline hazards)
  Essential for VLIW processors
Scheduling at basic block level: list scheduling
  System resources represented by matrix Resources × Time
  Position in matrix is true or false, indicating whether the resource is in use at that time
  Instructions represented by matrices Resources × Instruction duration
  Using dependency analysis, the schedule is made by fitting instructions as tightly as possible
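The matrix fitting step can be sketched in C as follows. This covers only the placement of one instruction in the Resources × Time matrix, not a complete list scheduler; the sizes N_RES, N_CYCLES and MAX_DUR, the struct Instr layout and the function names are assumptions for illustration. The earliest cycle is assumed to come from dependency analysis.

```c
#include <stdbool.h>

#define N_RES    2    /* assumed: resource 0 = ALU, resource 1 = memory port */
#define N_CYCLES 16   /* length of the schedule being built */
#define MAX_DUR  4    /* longest instruction duration */

/* An instruction as a Resources x Instruction duration matrix. */
struct Instr {
    int duration;
    bool uses[N_RES][MAX_DUR];
};

/* Does the instruction fit in the Resources x Time matrix at cycle start? */
static bool fits(bool busy[N_RES][N_CYCLES], const struct Instr *in, int start)
{
    if (start + in->duration > N_CYCLES) return false;
    for (int r = 0; r < N_RES; r++)
        for (int t = 0; t < in->duration; t++)
            if (in->uses[r][t] && busy[r][start + t]) return false;
    return true;
}

/* Place the instruction at the first conflict-free cycle >= earliest.
 * Marks the resources as busy and returns the start cycle, or -1. */
int place(bool busy[N_RES][N_CYCLES], const struct Instr *in, int earliest)
{
    for (int start = earliest; start < N_CYCLES; start++) {
        if (!fits(busy, in, start)) continue;
        for (int r = 0; r < N_RES; r++)
            for (int t = 0; t < in->duration; t++)
                if (in->uses[r][t]) busy[r][start + t] = true;
        return start;
    }
    return -1;
}
```

For example, two loads that each occupy the memory port for two cycles end up at cycles 0 and 2, while an add that only needs the ALU can still be placed next to them.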
List scheduling (cont’d)
Finding an optimal schedule is an NP-complete problem -> use heuristics, e.g. at an operation conflict schedule the most time-critical first
For a VLIW processor, the maximum instruction duration is used for scheduling -> painful for memory loads!
Basic blocks usually are small (5 operations on the average) -> benefit of scheduling limited -> Trace Scheduling
Trace scheduling
Schedule instructions over code sections larger than basic blocks, so-called traces
A trace is a series of basic blocks that does not extend beyond loop boundaries
Apply list scheduling to whole trace
Scheduling code inside a trace can move code beyond basic block boundaries -> compensate this by adding code to the off-trace edges
Trace scheduling (cont’d)
[Figure: control-flow graph with basic blocks BB1, BB2, BB3 and BB4]
Trace scheduling (cont’d)
[Figure: three cases of code motion within a trace, panels (a), (b) and (c), each showing Op A, Op B, Op C and a Branch in the in-trace and off-trace basic blocks:
  Operation to be moved below Branch in trace: copied code in the off-trace basic block
  Operation to be moved above Branch: moved code only allowed if no side-effects in off-trace code
  Operation to be moved before Op A: copied code in the off-trace basic block]
Trace scheduling (cont’d)
Trace selection
Because of the code copies, the trace that is most often executed has to be scheduled first
A longer trace brings more opportunities for ILP (loop unrolling!)
Use heuristics about how often a basic block is executed and which paths to and from a block have the most chance of being taken (e.g. inner-loops) or use profiling (input dependent)
Other methods to increase ILP
Loop unrolling
  Technique for increasing the amount of code available inside a loop: make several copies of the loop body
  Reduces loop control overhead and increases ILP (more instructions to schedule)
  When using trace scheduling this results in longer traces and thus more opportunities for better schedules
  In general, the more copies, the better the job the scheduler can do, but the gain becomes minimal
Loop unrolling (cont’d)
Example
for (i = 0; i < 100; i++)
  a[i] = a[i] + b[i];
becomes
for (i = 0; i < 100; i += 4) {
  a[i] = a[i] + b[i];
  a[i+1] = a[i+1] + b[i+1];
  a[i+2] = a[i+2] + b[i+2];
  a[i+3] = a[i+3] + b[i+3];
}
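The example works because 100 is a multiple of 4. When the trip count is not known to be a multiple of the unroll factor, a cleanup loop is needed for the remaining iterations. A minimal sketch, with the hypothetical function name add_unrolled and a run-time trip count n:

```c
/* Unrolled by 4, with a cleanup loop for the n % 4 remaining iterations. */
void add_unrolled(int *a, const int *b, int n)
{
    int i = 0;

    /* Main loop: four copies of the body per iteration. */
    for (; i + 3 < n; i += 4) {
        a[i]     = a[i]     + b[i];
        a[i + 1] = a[i + 1] + b[i + 1];
        a[i + 2] = a[i + 2] + b[i + 2];
        a[i + 3] = a[i + 3] + b[i + 3];
    }

    /* Cleanup loop: at most three iterations are left. */
    for (; i < n; i++)
        a[i] = a[i] + b[i];
}
```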
Software pipelining
Also a technique for using the parallelism available in several loop iterations
Software pipelining simulates a hardware pipeline, hence its name
[Figure: iterations 0 to 4 overlapped in time; one software pipelined iteration combines parts of several original iterations]
There are three phases: Prologue, Steady state and Epilogue
Software pipelining (cont’d)
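Loop: LD    F0,0(R1)       Body
      ADDD  F4,F0,F2
      SD    0(R1),F4
      SBGEZ R1, Loop       Loop control
[Figure: software pipelined schedule of this loop over cycles T0, T1, T2, ..., Tn, Tn+1, Tn+2, divided into Prologue, Steady state (the new loop body, combining SD, ADDD and LD of different iterations with SBGEZ Loop) and Epilogue]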
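The same three phases can be shown at source level. The sketch below is a hand-pipelined C version of a loop that adds a scalar to every array element (load, add, store), under the assumption that each steady-state iteration stores the result of iteration i, adds for iteration i+1 and loads for iteration i+2. The function name and the variables loaded and summed are hypothetical.

```c
/* Hand-pipelined version of: for (i = 0; i < n; i++) a[i] = a[i] + s; */
void add_scalar_pipelined(double *a, int n, double s)
{
    if (n < 2) {                      /* too short to pipeline */
        for (int i = 0; i < n; i++) a[i] = a[i] + s;
        return;
    }

    double loaded, summed;

    /* Prologue: start iterations 0 and 1. */
    loaded = a[0];                    /* LD   iteration 0 */
    summed = loaded + s;              /* ADDD iteration 0 */
    loaded = a[1];                    /* LD   iteration 1 */

    /* Steady state: one stage of three different iterations per pass. */
    for (int i = 0; i + 2 < n; i++) {
        a[i] = summed;                /* SD   iteration i     */
        summed = loaded + s;          /* ADDD iteration i + 1 */
        loaded = a[i + 2];            /* LD   iteration i + 2 */
    }

    /* Epilogue: finish the last two iterations. */
    a[n - 2] = summed;                /* SD   iteration n - 2 */
    summed = loaded + s;              /* ADDD iteration n - 1 */
    a[n - 1] = summed;                /* SD   iteration n - 1 */
}
```

In the steady state the store, the add and the load belong to different iterations and do not depend on each other within one pass, which is what lets a scheduler overlap them.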
Modulo scheduling
Scheduling multiple loop iterations using software pipelining can create false dependencies between variables used in different iterations
Renaming the variables used in different iterations is called modulo scheduling
When using n variables for representing the same variable, the steady state of the loop has to be unrolled n times
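A small source-level sketch of the renaming idea, for n = 2: the temporary of the loop body gets two copies, t0 and t1, and the loop is unrolled twice so that each copy belongs to its own iteration. The function name and variable names are hypothetical, and a real compiler would apply this to registers in the pipelined steady state rather than to C variables.

```c
/* for (i = 0; i < n; i++) { t = a[i]; a[i] = t + s; }
 * with the temporary t renamed to t0 and t1 and the loop unrolled twice. */
void add_scalar_renamed(double *a, int n, double s)
{
    int i = 0;

    for (; i + 1 < n; i += 2) {
        double t0 = a[i];             /* copy 0: iteration i     */
        double t1 = a[i + 1];         /* copy 1: iteration i + 1 */
        a[i]     = t0 + s;            /* no false dependency between t0 and t1 */
        a[i + 1] = t1 + s;
    }

    /* Remaining iteration when n is odd. */
    for (; i < n; i++)
        a[i] = a[i] + s;
}
```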
Compiler optimizations for cache performance
Merging arrays (better spatial locality)
  int val[SIZE];
  int key[SIZE];
  becomes
  struct merge { int val, key; };
  struct merge m_array[SIZE];
Loop interchange
Loop fusion and fission
Blocking (better temporal locality)
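To illustrate why merging arrays helps spatial locality, the sketch below searches a merged array: the key that is compared and the val that is returned sit next to each other in memory, so they are likely to be in the same cache line. The function name lookup and the -1 result for a missing key are assumptions made for this example.

```c
struct merge {
    int val, key;                     /* val and key of one element are adjacent */
};

/* Returns the val stored with the given key, or -1 when the key is absent. */
int lookup(const struct merge *m_array, int size, int key)
{
    for (int i = 0; i < size; i++)
        if (m_array[i].key == key)
            return m_array[i].val;    /* same element, likely same cache line */
    return -1;
}
```

With two separate arrays, the access to val[i] after a hit on key[i] touches a different region of memory and may cause an extra cache miss.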
Loop interchange
Exchanging of nested loops to change the memory footprint
  Better spatial locality
for (i = 0; i < 50; i++)
  for (j = 0; j < 100; j++)
    a[j][i] = b[j][i] * c[j][i];
becomes
for (j = 0; j < 100; j++)
  for (i = 0; i < 50; i++)
    a[j][i] = b[j][i] * c[j][i];
Loop fusion
Fuse multiple loops together
  Less loop control
  Bigger basic blocks (scheduling)
  Possibly better temporal locality
for (i = 0; i < n; i++)
  c[i] = a[i] + b[i];
for (j = 0; j < n; j++)
  d[j] = a[j] * e[j];
becomes
for (i = 0; i < n; i++) {
  c[i] = a[i] + b[i];
  d[i] = a[i] * e[i];
}
Loop fission
Split a loop with independent statements into multiple loops
  Enables other transformations (e.g. vectorization)
  Results in smaller cache footprint (better temporal locality)
for (i = 0; i < n; i++) {
  a[i] = b[i] + c[i];
  d[i] = e[i] * f[i];
}
becomes
for (i = 0; i < n; i++) {
  a[i] = b[i] + c[i];
}
for (i = 0; i < n; i++) {
  d[i] = e[i] * f[i];
}
Blocking
Perform computations on sub-matrices (blocks), e.g. when multiple matrices are accessed both row by row and column by column
Matrix multiplication x = y*z
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++) {
    r = 0;
    for (k = 0; k < N; k++) {
      r = r + y[i][k]*z[k][j];
    }
    x[i][j] = r;
  }
[Figure: access patterns of X, Y and Z (indices i, j, k) for the loop above and for the blocked version; legend: not touched, older access, recent access]
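A blocked version of the matrix multiplication can be sketched as follows. The matrix size DIM and the blocking factor BLK are assumptions for this example (BLK is assumed to divide DIM), and the function name matmul_blocked is hypothetical. In practice the blocking factor is chosen so that the sub-matrices in use fit in the cache.

```c
#define DIM 8   /* matrix size, assumed for this sketch */
#define BLK 4   /* blocking factor, assumed to divide DIM */

/* x = y * z, computed block by block. */
void matmul_blocked(double x[DIM][DIM], double y[DIM][DIM], double z[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            x[i][j] = 0.0;

    /* jj and kk select a BLK x BLK block of z that is reused for all rows i. */
    for (int jj = 0; jj < DIM; jj += BLK)
        for (int kk = 0; kk < DIM; kk += BLK)
            for (int i = 0; i < DIM; i++)
                for (int j = jj; j < jj + BLK; j++) {
                    double r = 0.0;
                    for (int k = kk; k < kk + BLK; k++)
                        r = r + y[i][k] * z[k][j];
                    x[i][j] = x[i][j] + r;   /* accumulate the partial sum */
                }
}
```

Each block of z is used for every row i before the next block is loaded, which is where the better temporal locality comes from.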