1
Compiler Construction SMD163
Lecture 13: Instruction selection
Viktor Leijon & Peter Jonsson with slides by Johan Nordlander
Contains material generously provided by Mark P. Jones
2
The phases of a simple compiler:
Without an intermediate representation, we use a form of "syntax-directed code generation" to produce IA32 code directly from the validated abstract syntax.
Reasonable output, reasonably low cost.
Lexer → Parser → Static Analysis → IA32 Code Generator
3
An Optimizing Compiler:
For better output (but higher cost) we might add an optimizer.

Lexer → Parser → Static Analysis → Intermediate Code Generator → Optimizer → Target Code Generator

We can modify our IA32 code generator to produce intermediate code.
4
From Intermediate To Target:
At some level, generating target code from intermediate code is usually very easy.
We can usually find a simple mapping for each intermediate code instruction onto a corresponding target language instruction (or instruction sequence):

x := y + z   →   movl y, %eax
                 addl z, %eax
                 movl %eax, x
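This one-instruction-to-many mapping can be sketched as a small table-driven translator. The Python representation below (statement tuples, the OPCODE table, the emit_binop helper) is illustrative only, not part of the lecture's compiler:

```python
# A minimal sketch of the simple mapping above: each three-address
# binary operation becomes a load / operate / store sequence via %eax.
# The opcode table and helper name are illustrative, not from the lecture.

OPCODE = {"+": "addl", "-": "subl", "*": "imull"}

def emit_binop(dest, left, op, right):
    """Translate 'dest := left op right' into IA32 instructions."""
    return [
        f"movl {left}, %eax",            # load the first operand
        f"{OPCODE[op]} {right}, %eax",   # combine with the second operand
        f"movl %eax, {dest}",            # store the result
    ]

print(emit_binop("x", "y", "+", "z"))
```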
5
Generating Good Target Code:
But will this do a good job?
A poor mapping from intermediate code to target code could undermine improvements that were made during optimization.
The potential for mismatch is high:
! Intermediate codes are usually designed for analysis and optimization, and to increase portability.
! A target may have special instructions or addressing modes that we need to exploit to get good output.
6
Instruction Selection:
The process of mapping from one instruction set to another is known as instruction selection.
We will see that it is possible to do a good job of instruction selection, despite the problems mentioned.
In fact, there are even tools that can be used to automate the construction of a code generator from the specification of a target machine.
Register allocation/spilling are still required.
7
Low-Level Intermediate Code…
A simple intermediate code may include only one instruction for loading from memory, and one instruction for saving to memory:

u := [v]        [v] := u

By using this intermediate code, we expose all the details of address calculation. Code to access the ith element of an array a:

u := 4*i
v := a + u
w := [v]

This gives more opportunities for optimization.
And it is a good match for some RISC machines.
8
… is a Poor Match for CISC:
A naïve translation of the same intermediate code into IA32 machine language would give:

imull $4, %eax
addl $a, %eax
movl (%eax), %eax

But we can do much better!

movl a(,%eax,4), %eax

Was optimization worthwhile? A non-optimizing compiler could have generated the more efficient code directly from the source code!
9
Pattern Matching:
To do a better job:
! Identify patterns of intermediate code instruction sequences for which a better target instruction can be used.
! Search for (and apply) those patterns in the optimizer's output.
Like peephole optimization, but the patterns we will be looking for are likely to be more complex.
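The peephole-style search can be sketched as follows; the tuple representation of three-address statements and the "load4" result tag are illustrative names, not from the lecture:

```python
# A sketch of searching for the three-statement pattern described above
# in a straight-line block of three-address code. Statements are
# (dest, op, arg1, arg2) tuples; this encoding is illustrative only.

def match_indexed_load(block, j):
    """Match  u := 4*i ; v := a+u ; w := [v]  starting at index j."""
    if j + 2 >= len(block):
        return None
    s1, s2, s3 = block[j], block[j + 1], block[j + 2]
    if (s1[1] == "*" and s1[2] == "4" and      # u := 4 * i
        s2[1] == "+" and s2[3] == s1[0] and    # v := a + u
        s3[1] == "load" and s3[2] == s2[0]):   # w := [v]
        base, index, dest = s2[2], s1[3], s3[0]
        return (dest, "load4", base, index)    # w := base[index*4], one movl
    return None

block = [("u", "*", "4", "i"),
         ("v", "+", "a", "u"),
         ("w", "load", "v", None)]
print(match_indexed_load(block, 0))
```

Note how fragile this is: an unrelated statement between the three, or another use of u, defeats the consecutive-statement match — exactly the complications shown on the next slide.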
10
Continued…
For example, in each of the following code sequences, we have an opportunity to use an instruction like:

movl a(,%eax,4), %eax

u := 4 * i
v := a + u
w := [v]

It's easy to search for code like this in the optimizer's output.

u := 4 * i
t := s * 7
v := u + a
w := [v]

But an assignment to t here confuses the pattern.

u := 4 * i
t := u * 7
v := u + a
w := [v]

And the use of the intermediate value u here in a separate computation adds further complications.
11
Pattern Matching on Trees:
We can make the task a little bit easier by thinking of instruction sequences as trees:

u := 4 * i
v := a + u
w := [v]

        :=
       /  \
      w   [ ]
           |
          v:+
         /   \
        a    u:*
            /   \
           4     i
12
Pattern Matching on Trees:
We can make the task a little bit easier by thinking of instruction sequences as forests (a forest is a set of trees):

u := 4 * i
t := s * 7
v := u + a
w := [v]

        :=                 :=
       /  \               /  \
      w   [ ]            t    *
           |                 / \
          v:+               s   7
         /   \
        a    u:*
            /   \
           4     i

The pattern is much easier to spot now!
13
Pattern Matching on Trees:
We can make the task a little bit easier by thinking of instruction sequences as dags (a dag is a "Directed Acyclic Graph"):

u := 4 * i
t := u * 7
v := u + a
w := [v]

        :=                 :=
       /  \               /  \
      w   [ ]            t    *
           |                 / \
          v:+             (u)   7
         /   \
        a    u:*   ← the node u:* is shared by both trees
            /   \
           4     i

Again, the pattern is easier to spot!
14
From Forests To Trees:
It is easy to deal with forests: we just treat each tree as an independent entity.
The code that we will generate for two trees is just the code for the first followed by the code for the second.
15
From Dags to Forests:
We can convert a dag to a forest by replicating (the shared subtree for u is duplicated):

w := [a + (4 * i)]        t := (4 * i) * 7

Or by introducing assignments:

u := 4 * i        w := [a + u]        t := u * 7

We can use heuristics, or generate code for both, and see which comes out best.
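The "introduce assignments" conversion needs to know which dag nodes are shared. A rough sketch, using the same illustrative tuple encoding as before (not the lecture's data structures): temporaries used more than once keep an explicit assignment and become roots of their own trees; single-use temporaries can be inlined into their unique consumer.

```python
# A sketch of finding the shared nodes of a dag expressed as a block of
# three-address statements: a temporary used more than once corresponds
# to a dag node with several parents, so it needs its own tree.

from collections import Counter

def shared_temps(block):
    """Return the set of temporaries whose value is used more than once."""
    uses = Counter()
    for dest, op, a1, a2 in block:
        for arg in (a1, a2):
            uses[arg] += 1
    return {dest for dest, _, _, _ in block if uses[dest] > 1}

block = [("u", "*", "4", "i"),     # u is used by both t and v: shared
         ("t", "*", "u", "7"),
         ("v", "+", "u", "a"),
         ("w", "load", "v", None)]
print(shared_temps(block))
```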
16
Focus on Trees:
From now on, we will concentrate on trees.
But note that we use trees primarily as a tool for presenting techniques for instruction selection.
We might also choose to build and manipulate explicit tree data structures in our implementation … but we don't have to.

u := 4 * i
…
v := a + u
…
w := [v]

For example, we can look for children of a tree node by searching back through earlier statements in a basic block.
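The back-search can be sketched directly; the statement tuples and helper name below are illustrative:

```python
# A sketch of recovering tree structure without building a tree: the
# child of a node, for an operand that names a temporary, is found by
# searching back through earlier statements of the basic block for the
# statement that defines it.

def find_def(block, j, name):
    """Index of the statement defining `name` before position j, or None."""
    for k in range(j - 1, -1, -1):
        if block[k][0] == name:
            return k
    return None

block = [("u", "*", "4", "i"),
         ("v", "+", "a", "u"),
         ("w", "load", "v", None)]

# The child of 'w := [v]' (index 2) is the definition of v at index 1,
# whose own child u is defined at index 0.
print(find_def(block, 2, "v"), find_def(block, 1, "u"))
```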
17
An Example:
Suppose that we have the following sequence of intermediate code instructions:

t := a * b
s := t + c
r := d * e
q := r + s

Naïve translation:

movl a, %eax
imull b, %eax
movl %eax, t
movl t, %eax
addl c, %eax
movl %eax, s
movl d, %eax
imull e, %eax
movl %eax, r
movl r, %eax
addl s, %eax
movl %eax, q

Suppose also that none of t, s, or r are live after these instructions.
18
The Tree Perspective:
Here is what the corresponding tree looks like.
If possible, we'd like to evaluate the whole thing using just registers…

        :=
       /  \
      q    +
          / \
      r:*    s:+
      / \    /  \
     d   e t:*   c
           / \
          a   b
19
Trees and Register Spilling:
If the whole tree is too complex to evaluate in registers, then we can break it into simpler pieces by spilling intermediate values to memory. Breaking the tree at s gives:

        :=                    :=              :=
       /  \                  /  \            /  \
      q    +                s    +          q    +
          / \                   / \             / \
      r:*    s:+             t:*   c         r:*   s
      / \    /  \            / \             / \
     d   e t:*   c          a   b           d   e
           / \
          a   b

How can we identify the places where this is necessary?
20
Simple Patterns for Registers:
Here are some patterns that we can use to evaluate parts of a tree, leaving the result in a register:

Pattern                    Code
reg1 := reg1 + reg2        addl reg2, reg1
reg2 := reg1 + reg2        addl reg1, reg2
reg1 := var                movl var, reg1
reg1 := reg1 + var         addl var, reg1
21
Tiling a Tree:

     +
    / \
   *   c
  / \
 a   b

Generated Target Code:

movl a, %eax        (load a into %eax)
movl b, %ebx        (load b into %ebx)
imull %ebx, %eax    (tile for *, result in %eax)
movl c, %ebx        (load c into %ebx)
addl %ebx, %eax     (tile for +, result in %eax)
22
Tiling a Tree (Differently):

     +
    / \
   *   c
  / \
 a   b

Generated Target Code:

movl a, %eax        (load a into %eax)
imull b, %eax       (tile for *, result in %eax)
addl c, %eax        (tile for +, result in %eax)
23
Choosing Between Tilings:
Even simple trees can have multiple tilings, each corresponding to a different sequence of instructions.
For the best code, we want to use as few tiles as possible.
Generating all possible tilings, and then picking the best, is likely to be very expensive.
24
A More Efficient Approach:
To calculate an optimal tiling for a given tree:

     +
    / \
   *   c
  / \
 a   b

Calculate an optimal tiling for all of its subtrees.
Try all patterns that match the root.
In each case, we can calculate the cost of that particular match from the cost of the subtrees that are used.
Pick the pattern with the lowest cost, which must therefore be optimal!
25
Dynamic Programming:
Dynamic programming is a standard technique that is used to solve a wide range of "optimization problems".
These are algorithms in which the goal is to find, amongst all values in some set, one that is biggest, smallest, longest, shortest, cheapest, … etc.
Dynamic programming is used when optimal solutions to a problem can be calculated from optimal solutions to its subproblems.
A key feature of dynamic programming is the use of tables to record previously computed information (and so avoid unnecessary recomputation of those results).
26
DP in Instruction Selection:
Suppose that we have just two registers, %eax and %ebx.
For each node, we will store a triple m, r1, r2:
! m is the length of the optimal instruction sequence to evaluate this tree node and store the result in memory.
! r1 is the length of an optimal instruction sequence to evaluate this tree node when only one register is free.
! r2 is the length of an optimal instruction sequence when both registers are free.
27
Memory References:
Memory references are leaf nodes in the tree.
It takes 0 instructions to evaluate the value in a memory location and save it to memory.
It takes 1 instruction to load a memory value into a register (it doesn't matter how many registers are available).
So we annotate each memory reference:

a: 0,1,1
28
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: ?,?,?   +: ?,?,?
29
Operations:
Suppose we have a tree whose root looks like this:

       + (?,?,?)
      / \
     l   r
(m,n,o)  (s,t,u)

Either one of these patterns could apply:

reg1 := reg1 + reg2    addl reg2, reg1
reg1 := reg1 + var     addl var, reg1
30
With Two Registers:
Using the addl reg,reg pattern:
! Evaluate l with two registers, r with one, and then do the addition (length: o + t + 1); or
! Evaluate r with two registers, l with one, and then do the addition (length: u + n + 1).
Using the addl var,reg pattern:
! Evaluate l with two registers, r from memory, and then do the addition (length: o + s + 1).
The minimum of these lengths will be the cost of an optimal solution.
31
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: ?,?,2   +: ?,?,?
32
With One Register:
Using the addl reg,reg pattern:
! Does not apply; we need at least two registers for this!
Using the addl var,reg pattern:
! Evaluate l with one register, r from memory, and then do the addition (length: n + s + 1).
33
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: ?,2,2   +: ?,?,?
34
Storing the Results in Memory:
The optimal way to calculate the value and store it in memory is to use:
! The optimal way to calculate the value with both registers free;
! A store instruction to save the result in memory.
Note: If we end up using this rule, then we'll need to chop off this tree for evaluation into memory beforehand. This ensures that two registers are actually available, and is also what forces spilling…
35
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: 3,2,2   +: ?,?,?
36
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: 3,2,2   +: 4,3,3
37
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: 3,2,2   +: 4,3,3

If we have two free registers, then we obtain the shortest code sequence like this:
We join together the instruction sequences at each of these points to get the optimal code.
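The bottom-up cost calculation can be sketched compactly; the tuple encoding of trees is illustrative, but the cost rules are exactly the ones derived on the previous slides (and the sketch reproduces the annotations 3,2,2 and 4,3,3 above):

```python
# A sketch of the (m, r1, r2) dynamic-programming cost calculation for
# two registers. Tree nodes are ("var", name) leaves or (op, left, right)
# interior nodes; the encoding is illustrative, the rules follow the slides.

def costs(node):
    """Return the triple (m, r1, r2) for a tree node."""
    if node[0] == "var":
        return (0, 1, 1)            # already in memory; one movl to load
    _, l, r = node
    lm, l1, l2 = costs(l)
    rm, r1, r2 = costs(r)
    two = min(l2 + r1 + 1,          # l with 2 regs, r with 1, op reg,reg
              r2 + l1 + 1,          # r with 2 regs, l with 1, op reg,reg
              l2 + rm + 1)          # l with 2 regs, r from memory, op var,reg
    one = l1 + rm + 1               # only the op var,reg pattern applies
    mem = two + 1                   # evaluate with both regs, then store
    return (mem, one, two)

tree = ("+", ("*", ("var", "a"), ("var", "b")), ("var", "c"))
print(costs(tree))
```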
38
What Costs are Useful?
So far, we’ve used the length of an instruction sequenceas an indication of its cost.
Other measures might be useful in some circumstances:
! The number of bytes in the generated instruction;
! The predicted execution times for the generated instructions.
Getting accurate timing/cost details for modernmachines is almost impossible, because there are toomany variables, but good guesses can still be useful.
Each pattern can be annotated with a different cost.
39
Implementation Details:
We store an array of costs at each tree node (or with each three-address code instruction).
We traverse the tree from the bottom up (or the block from beginning to end), calculating costs as we go.
We traverse the tree from the top down (or the block in reverse order) to build up the optimal code sequence.
It is useful to store, at each node, the instructions corresponding to each of the optimal costs that are computed during the first pass. (This makes it easier to implement the second pass!)
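The second pass can be sketched as follows. For brevity this version returns the chosen pattern alongside each cost instead of storing it in a per-node table, and it handles only trees whose right operands are variables (so the op var,reg pattern always applies); it is an illustrative reconstruction, not the lecture's implementation:

```python
# A sketch of the two passes: a bottom-up pass that picks a pattern for
# each node, and a top-down pass that replays the choices to emit code.
# Only the op var,reg tiles are handled, which is enough for trees like
# a*b + c whose right operands are all variables.

OP = {"+": "addl", "*": "imull"}

def best(node):
    """Bottom-up: cost of the best evaluation, plus the chosen pattern."""
    if node[0] == "var":
        return 1, "load"                 # movl var, reg
    op, l, r = node
    cl, _ = best(l)
    assert r[0] == "var", "sketch handles op var,reg tiles only"
    return cl + 1, "opvar"               # op var, reg

def emit(node, reg="%eax"):
    """Top-down: replay the recorded choices, emitting IA32 code."""
    _, choice = best(node)
    if choice == "load":
        return [f"movl {node[1]}, {reg}"]
    op, l, r = node
    return emit(l, reg) + [f"{OP[op]} {r[1]}, {reg}"]

tree = ("+", ("*", ("var", "a"), ("var", "b")), ("var", "c"))
print(emit(tree))
```

Note that this reproduces the second tiling shown earlier: movl a, %eax; imull b, %eax; addl c, %eax.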
40
Register Allocation:
We saw previously that register allocation and spilling are an important part of any attempt to generate assembly language output.
The techniques that we have seen in this lecture combine register allocation and instruction selection.
In some variations of these ideas, register allocation is done before instruction selection, while in others, it is done after.
41
Portable Code Generators:
In the algorithm that we've seen here:
! The set of patterns that we use will depend on the target machine;
! The cost that we associate with each pattern will depend on the target machine.
But the dynamic programming algorithm, and the intermediate code, are independent of the choice of target.
Is it possible to build a portable code generator: a single program that takes a set of patterns and associated costs as a parameter, rather than something that is hard-wired into the code?
42
Code Generator Generators:
In fact we can go a step further, and build a code generator generator:

Machine Description (Tree grammar) → Code Generator Generator → Code Generator (Instruction Selection)

For authors of code generators, this is the equivalent of a tool like lex or yacc.
The generated code generator is just a version of a portable code generator that has been specialized to work on a particular input.
43
Tree Grammars:
Production              Cost   Action
reg → reg + var         1      addl var,reg1
reg → $n                1      movl $n,reg1
reg → var               1      movl var,reg1
reg → reg + reg         1      addl reg2,reg1
reg → [reg]             1      movl (reg2),reg1
44
Continued…
Production              Cost   Action
reg → [a + 4*reg]       1      movl a(,reg1,4), reg1
reg → reg + $1          1      incl reg1
45
Continued:
Production              Cost   Action
mem → (var := reg)      1      movl reg1,var
mem → (var := $n)       1      movl $n,var
mem → ([reg] := reg)    1      movl reg1,(reg2)
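Such a tree grammar is just data, which is what makes a code generator generator possible. A rough sketch of a rule table and a root matcher, using the same illustrative tree encoding as earlier (the rule set and pattern notation are a reconstruction, not a real burg-style input file):

```python
# A sketch of a tree grammar as data: each rule is a tuple
# (nonterminal, pattern, cost, action). A tiny matcher checks whether
# a rule's pattern covers the root of a tree; a full generator would
# combine this with the dynamic-programming cost calculation.

RULES = [
    ("reg", ("var",),            1, "movl var,reg1"),
    ("reg", ("+", "reg", "var"), 1, "addl var,reg1"),
    ("reg", ("+", "reg", "reg"), 1, "addl reg2,reg1"),
    ("reg", ("load", "reg"),     1, "movl (reg2),reg1"),
]

def matches(pattern, tree):
    """Does this rule's pattern cover the root of the tree?"""
    if pattern == ("var",):
        return tree[0] == "var"
    if pattern[0] != tree[0]:
        return False
    # A "reg" operand matches any subtree (its value ends up in a
    # register); a "var" operand must be a variable leaf.
    for p, t in zip(pattern[1:], tree[1:]):
        if p == "var" and t[0] != "var":
            return False
    return True

tree = ("+", ("var", "x"), ("var", "y"))
applicable = [rule for rule in RULES if matches(rule[1], tree)]
print([rule[3] for rule in applicable])
```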
46
The Tree Perspective:
        +
       /  \
    r:*    s:+
    / \    /  \
   d   e t:*   c
         / \
        a   b

a, b, c, d, e: 0,1,1
t:*: 3,2,2    s:+: 4,3,3    r:*: 3,2,2    +: 7,7,6

Final Translation:

movl a, %ebx
imull b, %ebx
addl c, %ebx
movl d, %eax
imull e, %eax
addl %ebx, %eax
47
Summary:
Good instruction selection is essential to preserve the benefits gained by optimization.
Instruction selection can be implemented by viewing the intermediate code as trees and then searching for patterns corresponding to particular target machine instructions.
Ideas like these can be used to build code generator generators, which make it much easier to build and maintain compilers that target multiple processors.