1
Compiler Construction SMD163
Lecture 13: Instruction selection
Viktor Leijon & Peter Jonsson with slides by Johan Nordlander
Contains material generously provided by Mark P. Jones
2
The phases of a simple compiler:
Without an intermediate representation, we use a form of "syntax-directed code generation" to produce IA32 code directly from the validated abstract syntax.
Reasonable output, reasonably low cost.
Lexer → Parser → Static Analysis → IA32 Code Generator
3
An Optimizing Compiler:
For better output (but higher cost) we might add an optimizer.

Lexer → Parser → Static Analysis → Intermediate Code Generator → Optimizer → Target Code Generator

We can modify our IA32 code generator to produce intermediate code.
4
From Intermediate To Target:
At some level, generating target code from intermediate code is usually very easy.
We can usually find a simple mapping for each intermediate code instruction onto a corresponding target language instruction (or instruction sequence):

x := y + z   →   movl y, %eax
                 addl z, %eax
                 movl %eax, x
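This one-instruction-to-many mapping can be sketched as a small table-driven translator. The Python representation below (statement tuples, the OPCODE table, the emit_binop helper) is illustrative only, not part of the lecture's compiler:

```python
# A minimal sketch of the simple mapping above: each three-address
# binary operation becomes a load / operate / store sequence via %eax.
# The opcode table and helper name are illustrative, not from the lecture.

OPCODE = {"+": "addl", "-": "subl", "*": "imull"}

def emit_binop(dest, left, op, right):
    """Translate 'dest := left op right' into IA32 instructions."""
    return [
        f"movl {left}, %eax",            # load the first operand
        f"{OPCODE[op]} {right}, %eax",   # combine with the second operand
        f"movl %eax, {dest}",            # store the result
    ]

print(emit_binop("x", "y", "+", "z"))
```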
5
Generating Good Target Code:
But will this do a good job?
A poor mapping from intermediate code to target code could undermine improvements that were made during optimization.
The potential for mismatch is high:
! Intermediate codes are usually designed for analysis and optimization, and to increase portability.
! A target may have special instructions or addressing modes that we need to exploit to get good output.
6
Instruction Selection:
The process of mapping from one instruction set to another is known as instruction selection.
We will see that it is possible to do a good job of instruction selection, despite the problems mentioned.
In fact, there are even tools that can be used to automate the construction of a code generator from the specification of a target machine.
Register allocation/spilling are still required.
7
Low-Level Intermediate Code…
A simple intermediate code may include only one instruction for loading from memory, and one instruction for saving to memory:

u := [v]        [v] := u

By using this intermediate code, we expose all the details of address calculation. Code to access the ith element of an array a:

u := 4*i
v := a + u
w := [v]

This gives more opportunities for optimization.
And it is a good match for some RISC machines.
8
… is a Poor Match for CISC:
A naïve translation of the same intermediate code into IA32 machine language would give:

imull $4, %eax
addl $a, %eax
movl (%eax), %eax

But we can do much better!

movl a(,%eax,4), %eax

Was optimization worthwhile? A non-optimizing compiler could have generated the more efficient code directly from the source code!
9
Pattern Matching:
To do a better job:
! Identify patterns of intermediate code instruction sequences for which a better target instruction can be used.
! Search for (and apply) those patterns in the optimizer's output.
Like peephole optimization, but the patterns we will be looking for are likely to be more complex.
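The peephole-style search can be sketched as follows; the tuple representation of three-address statements and the "load4" result tag are illustrative names, not from the lecture:

```python
# A sketch of searching for the three-statement pattern described above
# in a straight-line block of three-address code. Statements are
# (dest, op, arg1, arg2) tuples; this encoding is illustrative only.

def match_indexed_load(block, j):
    """Match  u := 4*i ; v := a+u ; w := [v]  starting at index j."""
    if j + 2 >= len(block):
        return None
    s1, s2, s3 = block[j], block[j + 1], block[j + 2]
    if (s1[1] == "*" and s1[2] == "4" and      # u := 4 * i
        s2[1] == "+" and s2[3] == s1[0] and    # v := a + u
        s3[1] == "load" and s3[2] == s2[0]):   # w := [v]
        base, index, dest = s2[2], s1[3], s3[0]
        return (dest, "load4", base, index)    # w := base[index*4], one movl
    return None

block = [("u", "*", "4", "i"),
         ("v", "+", "a", "u"),
         ("w", "load", "v", None)]
print(match_indexed_load(block, 0))
```

Note how fragile this is: an unrelated statement between the three, or another use of u, defeats the consecutive-statement match — exactly the complications shown on the next slide.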
10
Continued…
For example, in each of the following code sequences, we have an opportunity to use an instruction like:

movl a(,%eax,4), %eax

u := 4 * i
v := a + u
w := [v]

It's easy to search for code like this in the optimizer's output.

u := 4 * i
t := s * 7
v := u + a
w := [v]

But an assignment to t here confuses the pattern.

u := 4 * i
t := u * 7
v := u + a
w := [v]

And the use of the intermediate value u here in a separate computation adds further complications.
11
Pattern Matching on Trees:
We can make the task a little bit easier by thinking of instruction sequences as trees:

u := 4 * i
v := a + u
w := [v]

        :=
       /  \
      w   [ ]
           |
          v:+
         /   \
        a    u:*
            /   \
           4     i
12
Pattern Matching on Trees:
We can make the task a little bit easier by thinking of instruction sequences as forests (a forest is a set of trees):

u := 4 * i
t := s * 7
v := u + a
w := [v]

        :=                 :=
       /  \               /  \
      w   [ ]            t    *
           |                 / \
          v:+               s   7
         /   \
        a    u:*
            /   \
           4     i

The pattern is much easier to spot now!
13
Pattern Matching on Trees:
We can make the task a little bit easier by thinking of instruction sequences as dags (a dag is a "Directed Acyclic Graph"):

u := 4 * i
t := u * 7
v := u + a
w := [v]

        :=                 :=
       /  \               /  \
      w   [ ]            t    *
           |                 / \
          v:+             (u)   7
         /   \
        a    u:*   ← the node u:* is shared by both trees
            /   \
           4     i

Again, the pattern is easier to spot!
14
From Forests To Trees:
It is easy to deal with forests: we just treat each tree as an independent entity.
The code that we will generate for two trees is just the code for the first followed by the code for the second.
15
From Dags to Forests:
We can convert a dag to a forest by replicating (the shared subtree for u is duplicated):

w := [a + (4 * i)]        t := (4 * i) * 7

Or by introducing assignments:

u := 4 * i        w := [a + u]        t := u * 7

We can use heuristics, or generate code for both, and see which comes out best.
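The "introduce assignments" conversion needs to know which dag nodes are shared. A rough sketch, using the same illustrative tuple encoding as before (not the lecture's data structures): temporaries used more than once keep an explicit assignment and become roots of their own trees; single-use temporaries can be inlined into their unique consumer.

```python
# A sketch of finding the shared nodes of a dag expressed as a block of
# three-address statements: a temporary used more than once corresponds
# to a dag node with several parents, so it needs its own tree.

from collections import Counter

def shared_temps(block):
    """Return the set of temporaries whose value is used more than once."""
    uses = Counter()
    for dest, op, a1, a2 in block:
        for arg in (a1, a2):
            uses[arg] += 1
    return {dest for dest, _, _, _ in block if uses[dest] > 1}

block = [("u", "*", "4", "i"),     # u is used by both t and v: shared
         ("t", "*", "u", "7"),
         ("v", "+", "u", "a"),
         ("w", "load", "v", None)]
print(shared_temps(block))
```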
16
Focus on Trees:
From now on, we will concentrate on trees.
But note that we use trees primarily as a tool for presenting techniques for instruction selection.
We might also choose to build and manipulate explicit tree data structures in our implementation … but we don't have to.

u := 4 * i
…
v := a + u
…
w := [v]

For example, we can look for children of a tree node by searching back through earlier statements in a basic block.
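The back-search can be sketched directly; the statement tuples and helper name below are illustrative:

```python
# A sketch of recovering tree structure without building a tree: the
# child of a node, for an operand that names a temporary, is found by
# searching back through earlier statements of the basic block for the
# statement that defines it.

def find_def(block, j, name):
    """Index of the statement defining `name` before position j, or None."""
    for k in range(j - 1, -1, -1):
        if block[k][0] == name:
            return k
    return None

block = [("u", "*", "4", "i"),
         ("v", "+", "a", "u"),
         ("w", "load", "v", None)]

# The child of 'w := [v]' (index 2) is the definition of v at index 1,
# whose own child u is defined at index 0.
print(find_def(block, 2, "v"), find_def(block, 1, "u"))
```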
17
An Example:
Suppose that we have the following sequence of intermediate code instructions:

t := a * b
s := t + c
r := d * e
q := r + s

Naïve translation:

movl a, %eax
imull b, %eax
movl %eax, t
movl t, %eax
addl c, %eax
movl %eax, s
movl d, %eax
imull e, %eax
movl %eax, r
movl r, %eax
addl s, %eax
movl %eax, q

Suppose also that none of t, s, or r are live after these instructions.
18
The Tree Perspective:
Here is what the corresponding tree looks like.
If possible, we'd like to evaluate the whole thing using just registers…

        :=
       /  \
      q    +
          / \
      r:*    s:+
      / \    /  \
     d   e t:*   c
           / \
          a   b
19
Trees and Register Spilling:
If the whole tree is too complex to evaluate in registers, then we can break it into simpler pieces by spilling intermediate values to memory. Breaking the tree at s gives:

        :=                    :=              :=
       /  \                  /  \            /  \
      q    +                s    +          q    +
          / \                   / \             / \
      r:*    s:+             t:*   c         r:*   s
      / \    /  \            / \             / \
     d   e t:*   c          a   b           d   e
           / \
          a   b

How can we identify the places where this is necessary?
20
Simple Patterns for Registers:
Here are some patterns that we can use to evaluate parts of a tree, leaving the result in a register:

Pattern                    Code
reg1 := reg1 + reg2        addl reg2, reg1
reg2 := reg1 + reg2        addl reg1, reg2
reg1 := var                movl var, reg1
reg1 := reg1 + var         addl var, reg1
21
Tiling a Tree:

     +
    / \
   *   c
  / \
 a   b

Generated Target Code:

movl a, %eax        (load a into %eax)
movl b, %ebx        (load b into %ebx)
imull %ebx, %eax    (tile for *, result in %eax)
movl c, %ebx        (load c into %ebx)
addl %ebx, %eax     (tile for +, result in %eax)
22
Tiling a Tree (Differently):

     +
    / \
   *   c
  / \
 a   b

Generated Target Code:

movl a, %eax        (load a into %eax)
imull b, %eax       (tile for *, result in %eax)
addl c, %eax        (tile for +, result in %eax)
23
Choosing Between Tilings:
Even simple trees can have multiple tilings, each corresponding to a different sequence of instructions.
For the best code, we want to use as few tiles as possible.
Generating all possible tilings, and then picking the best, is likely to be very expensive.
24
A More Efficient Approach:
To calculate an optimal tiling for a given tree:

     +
    / \
   *   c
  / \
 a   b

Calculate an optimal tiling for all of its subtrees.
Try all patterns that match the root.
In each case, we can calculate the cost of that particular match from the cost of the subtrees that are used.
Pick the pattern with the lowest cost, which must therefore be optimal!
25
Dynamic Programming:
Dynamic programming is a standard technique that is used to solve a wide range of "optimization problems".
These are algorithms in which the goal is to find, amongst all values in some set, one that is biggest, smallest, longest, shortest, cheapest, … etc.
Dynamic programming is used when optimal solutions to a problem can be calculated from optimal solutions to its subproblems.
A key feature of dynamic programming is the use of tables to record previously computed information (and so avoid unnecessary recomputation of those results).
26
DP in Instruction Selection:
Suppose that we have just two registers, %eax and %ebx.
For each node, we will store a triple m, r1, r2:
! m is the length of the optimal instruction sequence to evaluate this tree node and store the result in memory.
! r1 is the length of an optimal instruction sequence to evaluate this tree node when only one register is free.
! r2 is the length of an optimal instruction sequence when both registers are free.
27
Memory References:
Memory references are leaf nodes in the tree.
It takes 0 instructions to evaluate the value in a memory location and save it to memory.
It takes 1 instruction to load a memory value into a register (it doesn't matter how many registers are available).
So we annotate each memory reference:

a: 0,1,1
28
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: ?,?,?   +: ?,?,?
29
Operations:
Suppose we have a tree whose root looks like this:

       + (?,?,?)
      / \
     l   r
(m,n,o)  (s,t,u)

Either one of these patterns could apply:

reg1 := reg1 + reg2    addl reg2, reg1
reg1 := reg1 + var     addl var, reg1
30
With Two Registers:
Using the addl reg,reg pattern:
! Evaluate l with two registers, r with one, and then do the addition (length: o + t + 1); or
! Evaluate r with two registers, l with one, and then do the addition (length: u + n + 1).
Using the addl var,reg pattern:
! Evaluate l with two registers, r from memory, and then do the addition (length: o + s + 1).
The minimum of these lengths will be the cost of an optimal solution.
31
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: ?,?,2   +: ?,?,?
32
With One Register:
Using the addl reg,reg pattern:
! Does not apply; we need at least two registers for this!
Using the addl var,reg pattern:
! Evaluate l with one register, r from memory, and then do the addition (length: n + s + 1).
33
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: ?,2,2   +: ?,?,?
34
Storing the Results in Memory:
The optimal way to calculate the value and store it in memory is to use:
! The optimal way to calculate the value with both registers free;
! A store instruction to save the result in memory.
Note: If we end up using this rule, then we'll need to chop off this tree for evaluation into memory beforehand. This ensures that two registers are actually available, and is also what forces spilling…
35
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: 3,2,2   +: ?,?,?
36
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: 3,2,2   +: 4,3,3
37
Calculating the Costs:
     +
    / \
   *   c
  / \
 a   b

a: 0,1,1   b: 0,1,1   c: 0,1,1
*: 3,2,2   +: 4,3,3

If we have two free registers, then we obtain the shortest code sequence like this:
We join together the instruction sequences at each of these points to get the optimal code.
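The bottom-up cost calculation can be sketched compactly; the tuple encoding of trees is illustrative, but the cost rules are exactly the ones derived on the previous slides (and the sketch reproduces the annotations 3,2,2 and 4,3,3 above):

```python
# A sketch of the (m, r1, r2) dynamic-programming cost calculation for
# two registers. Tree nodes are ("var", name) leaves or (op, left, right)
# interior nodes; the encoding is illustrative, the rules follow the slides.

def costs(node):
    """Return the triple (m, r1, r2) for a tree node."""
    if node[0] == "var":
        return (0, 1, 1)            # already in memory; one movl to load
    _, l, r = node
    lm, l1, l2 = costs(l)
    rm, r1, r2 = costs(r)
    two = min(l2 + r1 + 1,          # l with 2 regs, r with 1, op reg,reg
              r2 + l1 + 1,          # r with 2 regs, l with 1, op reg,reg
              l2 + rm + 1)          # l with 2 regs, r from memory, op var,reg
    one = l1 + rm + 1               # only the op var,reg pattern applies
    mem = two + 1                   # evaluate with both regs, then store
    return (mem, one, two)

tree = ("+", ("*", ("var", "a"), ("var", "b")), ("var", "c"))
print(costs(tree))
```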
38
What Costs are Useful?
So far, we’ve used the length of an instruction sequenceas an indication of its cost.
Other measures might be useful in some circumstances:
! The number of bytes in the generated instruction;
! The predicted execution times for the generated instructions.
Getting accurate timing/cost details for modernmachines is almost impossible, because there are toomany variables, but good guesses can still be useful.
Each pattern can be annotated with a different cost.
39
Implementation Details:
We store an array of costs at each tree node (or with each three-address code instruction).
We traverse the tree from the bottom up (or the block from beginning to end), calculating costs as we go.
We traverse the tree from the top down (or the block in reverse order) to build up the optimal code sequence.
It is useful to store, at each node, the instructions corresponding to each of the optimal costs that are computed during the first pass. (This makes it easier to implement the second pass!)
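The second pass can be sketched as follows. For brevity this version returns the chosen pattern alongside each cost instead of storing it in a per-node table, and it handles only trees whose right operands are variables (so the op var,reg pattern always applies); it is an illustrative reconstruction, not the lecture's implementation:

```python
# A sketch of the two passes: a bottom-up pass that picks a pattern for
# each node, and a top-down pass that replays the choices to emit code.
# Only the op var,reg tiles are handled, which is enough for trees like
# a*b + c whose right operands are all variables.

OP = {"+": "addl", "*": "imull"}

def best(node):
    """Bottom-up: cost of the best evaluation, plus the chosen pattern."""
    if node[0] == "var":
        return 1, "load"                 # movl var, reg
    op, l, r = node
    cl, _ = best(l)
    assert r[0] == "var", "sketch handles op var,reg tiles only"
    return cl + 1, "opvar"               # op var, reg

def emit(node, reg="%eax"):
    """Top-down: replay the recorded choices, emitting IA32 code."""
    _, choice = best(node)
    if choice == "load":
        return [f"movl {node[1]}, {reg}"]
    op, l, r = node
    return emit(l, reg) + [f"{OP[op]} {r[1]}, {reg}"]

tree = ("+", ("*", ("var", "a"), ("var", "b")), ("var", "c"))
print(emit(tree))
```

Note that this reproduces the second tiling shown earlier: movl a, %eax; imull b, %eax; addl c, %eax.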
40
Register Allocation:
We saw previously that register allocation and spilling are an important part of any attempt to generate assembly language output.
The techniques that we have seen in this lecture combine register allocation and instruction selection.
In some variations of these ideas, register allocation is done before instruction selection, while in others, it is done after.
41
Portable Code Generators:
In the algorithm that we've seen here:
! The set of patterns that we use will depend on the target machine;
! The cost that we associate with each pattern will depend on the target machine.
But the dynamic programming algorithm, and the intermediate code, are independent of the choice of target.
Is it possible to build a portable code generator: a single program that takes a set of patterns and associated costs as a parameter, rather than something that is hard-wired into the code?
42
Code Generator Generators:
In fact we can go a step further, and build a code generator generator:

Machine Description (Tree grammar) → Code Generator Generator → Code Generator (Instruction Selection)

For authors of code generators, this is the equivalent of a tool like lex or yacc.
The generated code generator is just a version of a portable code generator that has been specialized to work on a particular input.
43
Tree Grammars:
Production              Cost   Action
reg → reg + var         1      addl var,reg1
reg → $n                1      movl $n,reg1
reg → var               1      movl var,reg1
reg → reg + reg         1      addl reg2,reg1
reg → [reg]             1      movl (reg2),reg1
44
Continued…
Production              Cost   Action
reg → [a + 4*reg]       1      movl a(,reg1,4), reg1
reg → reg + $1          1      incl reg1
45
Continued:
Production              Cost   Action
mem → (var := reg)      1      movl reg1,var
mem → (var := $n)       1      movl $n,var
mem → ([reg] := reg)    1      movl reg1,(reg2)
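Such a tree grammar is just data, which is what makes a code generator generator possible. A rough sketch of a rule table and a root matcher, using the same illustrative tree encoding as earlier (the rule set and pattern notation are a reconstruction, not a real burg-style input file):

```python
# A sketch of a tree grammar as data: each rule is a tuple
# (nonterminal, pattern, cost, action). A tiny matcher checks whether
# a rule's pattern covers the root of a tree; a full generator would
# combine this with the dynamic-programming cost calculation.

RULES = [
    ("reg", ("var",),            1, "movl var,reg1"),
    ("reg", ("+", "reg", "var"), 1, "addl var,reg1"),
    ("reg", ("+", "reg", "reg"), 1, "addl reg2,reg1"),
    ("reg", ("load", "reg"),     1, "movl (reg2),reg1"),
]

def matches(pattern, tree):
    """Does this rule's pattern cover the root of the tree?"""
    if pattern == ("var",):
        return tree[0] == "var"
    if pattern[0] != tree[0]:
        return False
    # A "reg" operand matches any subtree (its value ends up in a
    # register); a "var" operand must be a variable leaf.
    for p, t in zip(pattern[1:], tree[1:]):
        if p == "var" and t[0] != "var":
            return False
    return True

tree = ("+", ("var", "x"), ("var", "y"))
applicable = [rule for rule in RULES if matches(rule[1], tree)]
print([rule[3] for rule in applicable])
```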
46
The Tree Perspective:
        +
       /  \
    r:*    s:+
    / \    /  \
   d   e t:*   c
         / \
        a   b

a, b, c, d, e: 0,1,1
t:*: 3,2,2    s:+: 4,3,3    r:*: 3,2,2    +: 7,7,6

Final Translation:

movl a, %ebx
imull b, %ebx
addl c, %ebx
movl d, %eax
imull e, %eax
addl %ebx, %eax
47
Summary:
Good instruction selection is essential to preserve the benefits gained by optimization.
Instruction selection can be implemented by viewing the intermediate code as trees and then searching for patterns corresponding to particular target machine instructions.
Ideas like these can be used to build code generator generators, which make it much easier to build and maintain compilers that target multiple processors.