-
7/31/2019 Code Generation 1
1/45
Code Generation
Steve Johnson
-
May 23, 2005 Copyright (c) Stephen C. Johnson 2005
The Problem
Given an expression tree and a machine architecture, generate a set of instructions that evaluates the tree
Initially, consider only trees (no common subexpressions)
Interested in the quality of the generated program
Interested in the running time of the algorithm
-
The Solution
Over a large class of machine architectures, we can generate optimal programs in linear time
A very practical algorithm
But different from the way most compilers work today
And the technique, dynamic programming, is powerful and interesting
Work done with Al Aho, published in JACM
-
What is an Expression Tree?
Nodes represent
Operators (including assignment)
Operands (memory, registers, constants)
No flow of control operations
[Diagram: the tree for A = B + C, with = at the root, A as its left child, and + (with children B and C) as its right child]
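The A = B + C tree above can be encoded directly as a small data structure. This is a minimal, hypothetical Python sketch (the class and names are illustrative, not from the slides):

```python
# A minimal sketch of an expression-tree node for A = B + C.
class Node:
    def __init__(self, value, children=()):
        self.value = value            # operator symbol or operand name
        self.children = list(children)

    def __repr__(self):
        if not self.children:
            return self.value
        return f"({self.value} {' '.join(map(repr, self.children))})"

# The tree from the slide: = at the root, A on the left,
# + (with children B and C) on the right.
tree = Node("=", [
    Node("A"),
    Node("+", [Node("B"), Node("C")]),
])

print(tree)   # (= A (+ B C))
```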
-
Representing Operands
In fact, we want the tree to represent where the operands are found
[Diagram: the same tree with explicit locations: = at the root, MEM(A) on the left, and + on the right with children MEM(B) and MEM(C)]
-
Possible Programs
load B,r1
load C,r2
add r1,r2,r1
store r1,A

or

load B,r1
add C,r1
store r1,A

or

add B,C,A
-
(Assembler Notation)
Data always moves left to right
load B,r1      r1 = MEM(B)
add r1,r2,r3   r3 = r1 + r2
store r1,A     MEM(A) = r1
-
Which is Better?
Not all sequences are legal on all machines
Longer sequences may be faster
Situation gets more complex when:
Complicated expressions run out of registers
Some operations (e.g., call) take a lot of registers
Instructions have complicated addressing modes
-
Example Code
A = 5*B + asin(C/2 + sin(D))
might generate (machine with 2 registers)
load B,r1
mul r1,#5,r1
store r1,T1
load C,r1
div r1,#2,r1
store r1,T2
load D,r1
call sin
load T2,r2
add r2,r1,r1
call asin
load T1,r2
add r2,r1,r1
store r1,A

OR

load D,r1
call sin
load C,r2
div r2,#2,r2
add r2,r1,r1
call asin
load B,r2
mul r2,#5,r2
add r1,r2,r1
store r1,A
-
What is an Instruction
An instruction is a tree transformation
load A,r1      rewrites MEM(A) into REG(r1)
store r1,A     rewrites REG(r1) into MEM(A)
load (r1),r2   rewrites *(REG(r1)) into REG(r2)
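Viewing an instruction as a tree rewrite can be sketched in code. The functions below mirror two of the rewrites on this slide, with trees as nested tuples; the encoding and function names are illustrative assumptions, not from the talk:

```python
# Sketch: an instruction as a tree-rewrite rule (pattern -> new tree + text).
# ("MEM", "A") is a memory operand, ("REG", "r1") a register,
# ("*", t) a dereference of subtree t. Names are illustrative.

def match_load(tree):
    """load A,r1: rewrites MEM(A) into REG(r1)."""
    if tree[0] == "MEM":
        return ("REG", "r1"), f"load {tree[1]},r1"
    return None

def match_indirect_load(tree):
    """load (r1),r2: rewrites *(REG(r1)) into REG(r2)."""
    if tree[0] == "*" and tree[1][0] == "REG":
        return ("REG", "r2"), f"load ({tree[1][1]}),r2"
    return None

new_tree, insn = match_load(("MEM", "A"))
print(insn)       # load A,r1
print(new_tree)   # ('REG', 'r1')
```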
-
These can be Quite Complicated
[Diagram: an instruction pattern over the tree * ( + ( REG(r1), REG(r2), INT(2) ) ) — a dereference of a sum involving two registers and the constant 2]
-
Types and Resources
Expression trees (and instructions) typically have types associated with them
We'll ignore this
Doesn't introduce any real problems
Instructions often need resources to work
For example, a temporary register or a temporary storage location
Will be discussed later
-
Programs
A program is a sequence of instructions
A program computes an expression tree if it transforms the tree according to the desired goal:
Compute the tree into a register
Compute the tree into memory
Compute the tree for its side effects:
Condition codes
Assignments
-
Example
Goal: compute for side effects
[Diagram: the tree = ( MEM(A), + ( MEM(B), MEM(C) ) )]
load B,r1
load C,r2
add r1,r2,r1
store r1,A
-
Example (cont.)
[Diagram: after load B,r1, the tree is = ( MEM(A), + ( REG(r1), MEM(C) ) )]
load C,r2
[Diagram: the tree becomes = ( MEM(A), + ( REG(r1), REG(r2) ) )]
-
Example (cont.)
[Diagram: the tree = ( MEM(A), + ( REG(r1), REG(r2) ) )]
add r1,r2,r1
[Diagram: the tree becomes = ( MEM(A), REG(r1) )]
-
Example (concl.)
store r1,A
[Diagram: the tree = ( MEM(A), REG(r1) ) becomes REG(r1)]
(Side effect done)
-
Typical Code Generation
Some variables are assigned to registers, leaving a certain number of "scratch" registers
An expression tree is walked, producing instructions (greedy algorithm...). An infinite number of temporary registers is assumed
-
Typical Code Generation (cont.)
A register allocation phase is run:
Assign temporary registers to scratch registers
Often by doing graph coloring...
If you run out of scratch registers, spill:
Select a register
Store it into a temporary
When it is needed again, reload it
-
Practical Observation
Many (most?) code generation bugs happen in this spill code
Choose a register that is really needed
Very hard to test...
Create test cases that just barely fit or just barely don't fit to test edge cases...
Can be quite inefficient
Thrashing of scratch registers
Code may not be optimal
-
Complexity Results
Simple machine with 2-address instructions:
r1 op r2 => r1
Cost = number of instructions
Allow common subexpressions only of the form A op B, where A and B are leaf nodes
Generating optimal code is NP-complete
Even if there are an infinite number of registers!
Implies exponential time for a tree with n nodes
-
Complexity Results (cont.)
Simple 3-address machine
r1 op r2 => r3
Cost = number of instructions
Allow arbitrary common subexpressions
Infinite number of registers
Can get optimal code in linear time:
Topological sort
Each node in a different register
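The linear-time case above can be illustrated: with unlimited registers, a children-first (topological) walk over the DAG that gives each node its own register emits each node exactly once, including shared subexpressions. A hypothetical Python sketch (the DAG encoding and instruction spellings are assumptions):

```python
# Sketch: optimal code for the 3-address model with unlimited registers.
# dag maps a node name to ("leaf", operand) or (op, child1, child2).
# A children-first walk emits each node once, each into a fresh register.

def emit(dag):
    code, reg, next_reg = [], {}, [0]

    def visit(n):
        if n in reg:                          # shared node: already computed
            return reg[n]
        r = f"r{next_reg[0]}"; next_reg[0] += 1
        kind = dag[n]
        if kind[0] == "leaf":
            code.append(f"load {kind[1]},{r}")
        else:
            op, a, b = kind
            code.append(f"{op} {visit(a)},{visit(b)},{r}")
        reg[n] = r
        return r

    visit("root")
    return code

# (B + C) used twice: computed once, its register simply reused.
dag = {"root": ("add", "s", "s"), "s": ("add", "b", "c"),
       "b": ("leaf", "B"), "c": ("leaf", "C")}
print(emit(dag))
# ['load B,r2', 'load C,r3', 'add r2,r3,r1', 'add r1,r1,r0']
```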
-
Complexity Results (cont.)
In the 3-address model, finding optimal code that uses the minimal number of registers is NP-complete
But that's not what we are faced with in practice:
We have a certain number of registers
We need to use them intelligently
-
Complexity Results (concl.)
For many practical machine architectures (including 2-address machines), we can generate optimal code in linear time when there are no common subexpressions (tree)
Can be extended to an algorithm exponential in the amount of sharing
The optimal instruction sequence is not generated by a simple tree walk
-
Machine Model Restrictions
Resources (temporary registers) must be interchangeable. We will assume that we have N of them
Every instruction has a (positive) cost
The cost of a program is the sum of the costs of the instructions
No other constraints on the instruction shape or format (!)
-
Study Optimal Programs
Suppose we have an expression tree T that we wish to compute into a register
For the moment, we assume T can be computed with no stores
We assume we have N "scratch" registers
Suppose the root node of T is +
Then, in an optimal program, the last instruction must have a + at the root of the tree that it transforms
We make a list of these instructions
Each has some preconditions for it to be legal
-
Preconditions: Example
Suppose the last instruction was add r1,r2,r1
Suppose the tree T looks like + ( T1, T2 )
Then our optimal program must compute T1 into r1 and T2 into r2
-
Precondition Resources
If our optimal program ends in this add instruction, then we can assume that it contains two subprograms that compute T1 and T2 into r1 and r2, respectively
-
Precondition Resources (cont.)
Look at the first instruction
If it computes part of T1, then (since no stores) at least one register is always in use computing T1. So T2 must be computed using at most N-1 registers
Alternatively, if the first instruction computes part of T2, T1 must be computed using at most N-1 registers.
-
Reordering Lemma
Let P be an optimal program without stores that computes T. Suppose it ends in an instruction X that has k preconditions. Then we can reorder the instructions in P so it looks like
P1 P2 P3 ... Pk X
where the Pi compute the preconditions of X in some order. Moreover, P2 uses at most N-1 registers, P3 uses at most N-2 registers, etc., and each Pi computes its precondition optimally using that number of registers
-
Cost Computation
Define C(T,n) to be the cost of the optimal program computing T using at most n registers. Suppose X is an instruction matching the root of T with k preconditions, corresponding to subtrees T1 through Tk. Then
C(T,n) = min over all such X and all permutations p of 1..k of [ cost(X) + C(Tp(1), n) + C(Tp(2), n-1) + ... + C(Tp(k), n-k+1) ]
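The cost recurrence can be sketched concretely for a toy machine. Everything below is an illustrative assumption, not the machine model from the paper: each operand starts in memory, loads and register-register ops each cost 1, and every operator is binary, so its two preconditions match the add example. Whichever subtree is evaluated second has one fewer register free, exactly as in the reordering lemma.

```python
INF = float("inf")

# A tree is "MEM" (an operand in memory) or ("op", left, right).

def C(tree, n):
    """C(T,n): cost of the cheapest store-free program computing T into
    a register using at most n scratch registers (toy machine where
    every instruction costs 1)."""
    if n <= 0:
        return INF
    if tree == "MEM":
        return 1                          # one load
    _, left, right = tree
    # The two permutations of the preconditions: the subtree computed
    # second has one fewer free register (the first result stays live).
    return 1 + min(C(left, n) + C(right, n - 1),
                   C(right, n) + C(left, n - 1))

t = ("op", ("op", "MEM", "MEM"), "MEM")
print(C(t, 2))   # 5: two loads, an op, one more load, the final op
```

With only one register the same tree is impossible without stores, and C returns infinity; the spill extension on a later slide handles that case.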
-
Sketch of Proof
By the reordering lemma, we can write any optimal program as a sequence of subprograms computing the preconditions in order, with decreasing numbers of scratch registers, followed by some instruction X. If any subprogram is not optimal, we can make the program cheaper, contradicting optimality of the original program. Thus the optimal cost equals one of the sums (for some X and permutation)
-
How About Stores (spills)?
We will now let C(T,n) represent the cost of computing T with n registers if stores (spills) are allowed.
More notation: if T is a tree and S a subtree, T/S will represent T with S removed and replaced by a MEM node.
-
Another Rearrangement Lemma
Suppose P is an optimal program computing a tree T, and suppose a subtree S is stored into a temporary location in this optimal program. Then P can be rewritten in the form
P1 P2
where P1 computes S into memory and P2 computes T/S.
-
Consequences
P1 can use all N registers. After P1 runs, all registers are free again.
Let C(S,0) be the cost of computing S into a temporary (MEM) location, so C(S,0) = C(S,N) + the cost of the store. Then
C(T,n) = min( the store-free value from the previous recurrence, min over subtrees S of [ C(S,0) + C(T/S, n) ] )
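The full recurrence with spills can be sketched for the same kind of toy machine as before: binary ops, unit-cost loads and ops, an assumed store cost, and N total scratch registers. All names, costs, and the tuple encoding are illustrative assumptions, not the paper's model. A spilled subtree is computed with all N registers free, stored, and becomes a MEM leaf in the rest of the computation:

```python
INF = float("inf")
N = 2          # total scratch registers (assumption)
STORE = 1      # assumed cost of a store instruction

def interior(t):
    """Proper interior subtrees of t (spilling a bare leaf gains nothing)."""
    if t == "MEM":
        return []
    _, l, r = t
    return [s for s in (l, r) if s != "MEM"] + interior(l) + interior(r)

def cut(t, s):
    """T/S: t with one occurrence of subtree s replaced by a MEM leaf."""
    if t == s:
        return "MEM"
    if t == "MEM":
        return t
    op, l, r = t
    l2 = cut(l, s)
    return (op, l2, r) if l2 != l else (op, l, cut(r, s))

def C(t, n):
    """C(T,n) with spills: either run store-free, or compute some subtree
    S into memory first (all N registers free) and then compute T/S."""
    if t == "MEM":
        return 1 if n >= 1 else INF
    best = INF
    if n >= 1:
        _, l, r = t
        best = 1 + min(C(l, n) + C(r, n - 1), C(r, n) + C(l, n - 1))
    for s in interior(t):
        best = min(best, C(s, N) + STORE + C(cut(t, s), n))  # C(S,0) + C(T/S,n)
    return best

# Both operands of the root themselves need two registers, so with N = 2
# the store-free recurrence fails; spilling one of them rescues it.
t = ("op", ("op", "MEM", "MEM"), ("op", "MEM", "MEM"))
print(C(t, 2))   # 9 = one subtree (3) + store (1) + the rest (5)
```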
-
Optimal Algorithm
1. Recursively compute C(S,n) and C(S,0) for all subtrees of T, starting bottom up, and all n
-
Dynamic Programming
This bottom-up technique is called dynamic programming
It has a fixed cost per tree node because:
There are a finite (usually small) number of instructions that match the root of each tree
The number of permutations for each instruction is fixed (and typically small)
The number of scratch registers N is fixed
So the optimal cost can be determined in time linear in the size of the tree
-
Unravelling
Going from the minimal cost back to the instructions can be done several ways:
Can remember the instruction and permutation that gives the minimal value for each node
At each node, recompute the desired minimal value until you find an instruction and permutation that attain it
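The first strategy (remember the winning choice at each node) can be sketched for the same toy store-free machine used earlier; the memoization, names, and machine model are all illustrative assumptions. `best` records which evaluation order achieved the minimum, and `unravel` replays those choices to recover a flat instruction skeleton:

```python
from functools import lru_cache

INF = float("inf")

# A tree is "MEM" or ("op", left, right); every instruction costs 1.

@lru_cache(maxsize=None)
def best(tree, n):
    """Return (cost, order): cheapest store-free evaluation with n
    registers; order is "LR"/"RL" at interior nodes, None at leaves."""
    if n <= 0:
        return (INF, None)
    if tree == "MEM":
        return (1, None)                       # a single load
    _, l, r = tree
    lr = best(l, n)[0] + best(r, n - 1)[0] + 1   # left subtree first
    rl = best(r, n)[0] + best(l, n - 1)[0] + 1   # right subtree first
    return (lr, "LR") if lr <= rl else (rl, "RL")

def unravel(tree, n):
    """Replay the remembered choices, yielding the instruction skeleton."""
    if tree == "MEM":
        return ["load"]
    _, l, r = tree
    order = best(tree, n)[1]
    a, b = (l, r) if order == "LR" else (r, l)
    return unravel(a, n) + unravel(b, n - 1) + ["op"]

t = ("op", ("op", "MEM", "MEM"), "MEM")
print(unravel(t, 2))   # ['load', 'load', 'op', 'load', 'op']
```

The memo table also serves the top-down "compute lazily and remember" variant described on the next slide.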
-
Top-Down Memo Algorithm
Instead of computing bottom up, you can
compute top down (in a lazy manner) and
remember the results. This might be
considerably faster for some architectures
-
No Spills!
Note that we do not have to have spill code in this algorithm. The subtrees that are computed and stored fall out of the algorithm.
They are computed ahead of the main computation, when all registers are available.
The resulting instruction stream is not typically a tree walk of the input.
-
Reality Check
Major assumptions
Cost is the sum of costs of instructions
Assumes single ALU, no overlapping
Many machines now have multiple ALUs, overlapping operations
All registers identical
True of most RISC machines
Not true of X86 architectures
But memory operations getting more expensive
Optimality for spills is important
-
Other Issues
Register allocation across multiple statements, flow control, etc.
Can make a big difference in performance
Can use this algorithm to evaluate possible allocations
Cost of losing a scratch register to hold a variable
-
Common Subexpressions
A subtree S of T is used more than once (T is now not a tree, but a DAG)
Say there are 2 uses of S. Then there are 4 strategies:
Compute S and store it
Compute one use and save the result until the second use (2 ways, depending on which use is first)
Ignore the sharing, and recompute S
-
Cost Computations
Ignoring the sharing is easy
Computing and storing is easy
Ordering the two uses implies an ordering of preconditions in some higher-level instruction selection
And the number of free registers is affected, too
Do the problem twice, once for each order
-
Summary
Register spills are evil
Complicated, error-prone, hard to test
If something is to be spilled, compute it ahead of time with all registers free
The optimal spill points fall out of the dynamic programming algorithm