-
7/31/2019 Code Generation 1
1/45
Code Generation
Steve Johnson
-
May 23, 2005 Copyright (c) Stephen C. Johnson 2005
The Problem
Given an expression tree and a machine architecture, generate a set of instructions that evaluates the tree
Initially, consider only trees (no common subexpressions)
Interested in the quality of the generated program
Interested in the running time of the algorithm
-
The Solution
Over a large class of machine architectures, we can generate optimal programs in linear time
A very practical algorithm
But different from the way most compilers work today
And the technique, dynamic programming, is powerful and interesting
Work done with Al Aho, published in JACM
-
What is an Expression Tree?
Nodes represent
Operators (including assignment)
Operands (memory, registers, constants)
No flow of control operations
[Diagram: the tree for A = B + C, with = at the root, A as its left child, and + (with children B and C) as its right child]
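The A = B + C tree above can be encoded directly as a small data structure. This is a minimal, hypothetical Python sketch (the class and names are illustrative, not from the slides):

```python
# A minimal sketch of an expression-tree node for A = B + C.
class Node:
    def __init__(self, value, children=()):
        self.value = value            # operator symbol or operand name
        self.children = list(children)

    def __repr__(self):
        if not self.children:
            return self.value
        return f"({self.value} {' '.join(map(repr, self.children))})"

# The tree from the slide: = at the root, A on the left,
# + (with children B and C) on the right.
tree = Node("=", [
    Node("A"),
    Node("+", [Node("B"), Node("C")]),
])

print(tree)   # (= A (+ B C))
```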
-
Representing Operands
In fact, we want the tree to represent where the operands are found
[Diagram: the same tree with explicit locations: = at the root, MEM(A) on the left, and + on the right with children MEM(B) and MEM(C)]
-
Possible Programs
load B,r1
load C,r2
add r1,r2,r1
store r1,A

or

load B,r1
add C,r1
store r1,A

or

add B,C,A
-
(Assembler Notation)
Data always moves left to right
load B,r1      r1 = MEM(B)
add r1,r2,r3   r3 = r1 + r2
store r1,A     MEM(A) = r1
-
Which is Better?
Not all sequences are legal on all machines
Longer sequences may be faster
Situation gets more complex when:
Complicated expressions run out of registers
Some operations (e.g., call) take a lot of registers
Instructions have complicated addressing modes
-
Example Code
A = 5*B + asin(C/2 + sin(D))
might generate (machine with 2 registers)
load B,r1
mul r1,#5,r1
store r1,T1
load C,r1
div r1,#2,r1
store r1,T2
load D,r1
call sin
load T2,r2
add r2,r1,r1
call asin
load T1,r2
add r2,r1,r1
store r1,A

OR

load D,r1
call sin
load C,r2
div r2,#2,r2
add r2,r1,r1
call asin
load B,r2
mul r2,#5,r2
add r1,r2,r1
store r1,A
-
What is an Instruction
An instruction is a tree transformation
load A,r1      rewrites MEM(A) into REG(r1)
store r1,A     rewrites REG(r1) into MEM(A)
load (r1),r2   rewrites *(REG(r1)) into REG(r2)
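Viewing an instruction as a tree rewrite can be sketched in code. The functions below mirror two of the rewrites on this slide, with trees as nested tuples; the encoding and function names are illustrative assumptions, not from the talk:

```python
# Sketch: an instruction as a tree-rewrite rule (pattern -> new tree + text).
# ("MEM", "A") is a memory operand, ("REG", "r1") a register,
# ("*", t) a dereference of subtree t. Names are illustrative.

def match_load(tree):
    """load A,r1: rewrites MEM(A) into REG(r1)."""
    if tree[0] == "MEM":
        return ("REG", "r1"), f"load {tree[1]},r1"
    return None

def match_indirect_load(tree):
    """load (r1),r2: rewrites *(REG(r1)) into REG(r2)."""
    if tree[0] == "*" and tree[1][0] == "REG":
        return ("REG", "r2"), f"load ({tree[1][1]}),r2"
    return None

new_tree, insn = match_load(("MEM", "A"))
print(insn)       # load A,r1
print(new_tree)   # ('REG', 'r1')
```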
-
These can be Quite Complicated
[Diagram: an instruction pattern over the tree * ( + ( REG(r1), REG(r2), INT(2) ) ) — a dereference of a sum involving two registers and the constant 2]
-
Types and Resources
Expression trees (and instructions) typically have types associated with them
We'll ignore this
Doesn't introduce any real problems
Instructions often need resources to work
For example, a temporary register or a temporary storage location
Will be discussed later
-
Programs
A program is a sequence of instructions
A program computes an expression tree if it transforms the tree according to the desired goal:
Compute the tree into a register
Compute the tree into memory
Compute the tree for its side effects:
Condition codes
Assignments
-
Example
Goal: compute for side effects
[Diagram: the tree = ( MEM(A), + ( MEM(B), MEM(C) ) )]
load B,r1
load C,r2
add r1,r2,r1
store r1,A
-
Example (cont.)
[Diagram: after load B,r1, the tree is = ( MEM(A), + ( REG(r1), MEM(C) ) )]
load C,r2
[Diagram: the tree becomes = ( MEM(A), + ( REG(r1), REG(r2) ) )]
-
Example (cont.)
[Diagram: the tree = ( MEM(A), + ( REG(r1), REG(r2) ) )]
add r1,r2,r1
[Diagram: the tree becomes = ( MEM(A), REG(r1) )]
-
Example (concl.)
store r1,A
[Diagram: the tree = ( MEM(A), REG(r1) ) becomes REG(r1)]
(Side effect done)
-
Typical Code Generation
Some variables are assigned to registers, leaving a certain number of "scratch" registers
An expression tree is walked, producing instructions (greedy algorithm...). An infinite number of temporary registers is assumed
-
Typical Code Generation (cont.)
A register allocation phase is run:
Assign temporary registers to scratch registers
Often by doing graph coloring...
If you run out of scratch registers, spill:
Select a register
Store it into a temporary
When it is needed again, reload it
-
Practical Observation
Many (most?) code generation bugs happen in this spill code
Choose a register that is really needed
Very hard to test...
Create test cases that just barely fit or just barely don't fit to test edge cases...
Can be quite inefficient
Thrashing of scratch registers
Code may not be optimal
-
Complexity Results
Simple machine with 2-address instructions:
r1 op r2 => r1
Cost = number of instructions
Allow common subexpressions only of the form A op B, where A and B are leaf nodes
Generating optimal code is NP-complete
Even if there are an infinite number of registers!
Implies exponential time for a tree with n nodes
-
Complexity Results (cont.)
Simple 3-address machine
r1 op r2 => r3
Cost = number of instructions
Allow arbitrary common subexpressions
Infinite number of registers
Can get optimal code in linear time:
Topological sort
Each node in a different register
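The linear-time case above can be illustrated: with unlimited registers, a children-first (topological) walk over the DAG that gives each node its own register emits each node exactly once, including shared subexpressions. A hypothetical Python sketch (the DAG encoding and instruction spellings are assumptions):

```python
# Sketch: optimal code for the 3-address model with unlimited registers.
# dag maps a node name to ("leaf", operand) or (op, child1, child2).
# A children-first walk emits each node once, each into a fresh register.

def emit(dag):
    code, reg, next_reg = [], {}, [0]

    def visit(n):
        if n in reg:                          # shared node: already computed
            return reg[n]
        r = f"r{next_reg[0]}"; next_reg[0] += 1
        kind = dag[n]
        if kind[0] == "leaf":
            code.append(f"load {kind[1]},{r}")
        else:
            op, a, b = kind
            code.append(f"{op} {visit(a)},{visit(b)},{r}")
        reg[n] = r
        return r

    visit("root")
    return code

# (B + C) used twice: computed once, its register simply reused.
dag = {"root": ("add", "s", "s"), "s": ("add", "b", "c"),
       "b": ("leaf", "B"), "c": ("leaf", "C")}
print(emit(dag))
# ['load B,r2', 'load C,r3', 'add r2,r3,r1', 'add r1,r1,r0']
```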
-
Complexity Results (cont.)
In the 3-address model, finding optimal code that uses the minimal number of registers is NP-complete
But that's not what we are faced with in practice:
We have a certain number of registers
We need to use them intelligently
-
Complexity Results (concl.)
For many practical machine architectures (including 2-address machines), we can generate optimal code in linear time when there are no common subexpressions (tree)
Can be extended to an algorithm exponential in the amount of sharing
The optimal instruction sequence is not generated by a simple tree walk
-
Machine Model Restrictions
Resources (temporary registers) must be interchangeable. We will assume that we have N of them
Every instruction has a (positive) cost
The cost of a program is the sum of the costs of the instructions
No other constraints on the instruction shape or format (!)
-
Study Optimal Programs
Suppose we have an expression tree T that we wish to compute into a register
For the moment, we assume T can be computed with no stores
We assume we have N "scratch" registers
Suppose the root node of T is +
Then, in an optimal program, the last instruction must have a + at the root of the tree that it transforms
We make a list of these instructions
Each has some preconditions for it to be legal
-
Preconditions: Example
Suppose the last instruction was add r1,r2,r1
Suppose the tree T looks like + ( T1, T2 )
Then our optimal program must compute T1 into r1 and T2 into r2
-
Precondition Resources
If our optimal program ends in this add instruction, then we can assume that it contains two subprograms that compute T1 and T2 into r1 and r2, respectively
-
Precondition Resources (cont.)
Look at the first instruction
If it computes part of T1, then (since no stores) at least one register is always in use computing T1. So T2 must be computed using at most N-1 registers
Alternatively, if the first instruction computes part of T2, T1 must be computed using at most N-1 registers.
-
Reordering Lemma
Let P be an optimal program without stores that computes T. Suppose it ends in an instruction X that has k preconditions. Then we can reorder the instructions in P so it looks like
P1 P2 P3 ... Pk X
where the Pi compute the preconditions of X in some order. Moreover, P2 uses at most N-1 registers, P3 uses at most N-2 registers, etc., and each Pi computes its precondition optimally using that number of registers
-
Cost Computation
Define C(T,n) to be the cost of the optimal program computing T using at most n registers. Suppose X is an instruction matching the root of T with k preconditions, corresponding to subtrees T1 through Tk. Then
C(T,n) = min over all such X and all permutations p of 1..k of [ cost(X) + C(Tp(1), n) + C(Tp(2), n-1) + ... + C(Tp(k), n-k+1) ]
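The cost recurrence can be sketched concretely for a toy machine. Everything below is an illustrative assumption, not the machine model from the paper: each operand starts in memory, loads and register-register ops each cost 1, and every operator is binary, so its two preconditions match the add example. Whichever subtree is evaluated second has one fewer register free, exactly as in the reordering lemma.

```python
INF = float("inf")

# A tree is "MEM" (an operand in memory) or ("op", left, right).

def C(tree, n):
    """C(T,n): cost of the cheapest store-free program computing T into
    a register using at most n scratch registers (toy machine where
    every instruction costs 1)."""
    if n <= 0:
        return INF
    if tree == "MEM":
        return 1                          # one load
    _, left, right = tree
    # The two permutations of the preconditions: the subtree computed
    # second has one fewer free register (the first result stays live).
    return 1 + min(C(left, n) + C(right, n - 1),
                   C(right, n) + C(left, n - 1))

t = ("op", ("op", "MEM", "MEM"), "MEM")
print(C(t, 2))   # 5: two loads, an op, one more load, the final op
```

With only one register the same tree is impossible without stores, and C returns infinity; the spill extension on a later slide handles that case.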
-
Sketch of Proof
By the reordering lemma, we can write any optimal program as a sequence of subprograms computing the preconditions in order, with decreasing numbers of scratch registers, followed by some instruction X. If any subprogram is not optimal, we can make the program cheaper, contradicting optimality of the original program. Thus the optimal cost equals one of the sums (for some X and permutation)
-
How About Stores (spills)?
We will now let C(T,n) represent the cost of computing T with n registers if stores (spills) are allowed.
More notation: if T is a tree and S a subtree, T/S will represent T with S removed and replaced by a MEM node.
-
Another Rearrangement Lemma
Suppose P is an optimal program computing a tree T, and suppose a subtree S is stored into a temporary location in this optimal program. Then P can be rewritten in the form
P1 P2
where P1 computes S into memory and P2 computes T/S.
-
Consequences
P1 can use all N registers. After P1 runs, all registers are free again.
Let C(S,0) be the cost of computing S into a temporary (MEM) location, so C(S,0) = C(S,N) + the cost of the store. Then
C(T,n) = min( the store-free value from the previous recurrence, min over subtrees S of [ C(S,0) + C(T/S, n) ] )
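The full recurrence with spills can be sketched for the same kind of toy machine as before: binary ops, unit-cost loads and ops, an assumed store cost, and N total scratch registers. All names, costs, and the tuple encoding are illustrative assumptions, not the paper's model. A spilled subtree is computed with all N registers free, stored, and becomes a MEM leaf in the rest of the computation:

```python
INF = float("inf")
N = 2          # total scratch registers (assumption)
STORE = 1      # assumed cost of a store instruction

def interior(t):
    """Proper interior subtrees of t (spilling a bare leaf gains nothing)."""
    if t == "MEM":
        return []
    _, l, r = t
    return [s for s in (l, r) if s != "MEM"] + interior(l) + interior(r)

def cut(t, s):
    """T/S: t with one occurrence of subtree s replaced by a MEM leaf."""
    if t == s:
        return "MEM"
    if t == "MEM":
        return t
    op, l, r = t
    l2 = cut(l, s)
    return (op, l2, r) if l2 != l else (op, l, cut(r, s))

def C(t, n):
    """C(T,n) with spills: either run store-free, or compute some subtree
    S into memory first (all N registers free) and then compute T/S."""
    if t == "MEM":
        return 1 if n >= 1 else INF
    best = INF
    if n >= 1:
        _, l, r = t
        best = 1 + min(C(l, n) + C(r, n - 1), C(r, n) + C(l, n - 1))
    for s in interior(t):
        best = min(best, C(s, N) + STORE + C(cut(t, s), n))  # C(S,0) + C(T/S,n)
    return best

# Both operands of the root themselves need two registers, so with N = 2
# the store-free recurrence fails; spilling one of them rescues it.
t = ("op", ("op", "MEM", "MEM"), ("op", "MEM", "MEM"))
print(C(t, 2))   # 9 = one subtree (3) + store (1) + the rest (5)
```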
-
Optimal Algorithm
1. Recursively compute C(S,n) and C(S,0) for all subtrees of T, starting bottom up, and all n
-
Dynamic Programming
This bottom-up technique is called dynamic programming
It has a fixed cost per tree node because:
There are a finite (usually small) number of instructions that match the root of each tree
The number of permutations for each instruction is fixed (and typically small)
The number of scratch registers N is fixed
So the optimal cost can be determined in time linear in the size of the tree
-
Unravelling
Going from the minimal cost back to the instructions can be done several ways:
Can remember the instruction and permutation that gives the minimal value for each node
At each node, recompute the desired minimal value until you find an instruction and permutation that attain it
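The first strategy (remember the winning choice at each node) can be sketched for the same toy store-free machine used earlier; the memoization, names, and machine model are all illustrative assumptions. `best` records which evaluation order achieved the minimum, and `unravel` replays those choices to recover a flat instruction skeleton:

```python
from functools import lru_cache

INF = float("inf")

# A tree is "MEM" or ("op", left, right); every instruction costs 1.

@lru_cache(maxsize=None)
def best(tree, n):
    """Return (cost, order): cheapest store-free evaluation with n
    registers; order is "LR"/"RL" at interior nodes, None at leaves."""
    if n <= 0:
        return (INF, None)
    if tree == "MEM":
        return (1, None)                       # a single load
    _, l, r = tree
    lr = best(l, n)[0] + best(r, n - 1)[0] + 1   # left subtree first
    rl = best(r, n)[0] + best(l, n - 1)[0] + 1   # right subtree first
    return (lr, "LR") if lr <= rl else (rl, "RL")

def unravel(tree, n):
    """Replay the remembered choices, yielding the instruction skeleton."""
    if tree == "MEM":
        return ["load"]
    _, l, r = tree
    order = best(tree, n)[1]
    a, b = (l, r) if order == "LR" else (r, l)
    return unravel(a, n) + unravel(b, n - 1) + ["op"]

t = ("op", ("op", "MEM", "MEM"), "MEM")
print(unravel(t, 2))   # ['load', 'load', 'op', 'load', 'op']
```

The memo table also serves the top-down "compute lazily and remember" variant described on the next slide.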
-
Top-Down Memo Algorithm
Instead of computing bottom up, you can
compute top down (in a lazy manner) and
remember the results. This might be
considerably faster for some architectures
-
No Spills!
Note that we do not have to have spill code in this algorithm. The subtrees that are computed and stored fall out of the algorithm.
They are computed ahead of the main computation, when all registers are available.
The resulting instruction stream is not typically a tree walk of the input.
-
Reality Check
Major assumptions
Cost is the sum of costs of instructions
Assumes single ALU, no overlapping
Many machines now have multiple ALUs, overlapping operations
All registers identical
True of most RISC machines
Not true of X86 architectures
But memory operations getting more expensive
Optimality for spills is important
-
Other Issues
Register allocation across multiple statements, flow control, etc.
Can make a big difference in performance
Can use this algorithm to evaluate possible allocations
Cost of losing a scratch register to hold a variable
-
Common Subexpressions
A subtree S of T is used more than once (T is now not a tree, but a DAG)
Say there are 2 uses of S. Then there are 4 strategies:
Compute S and store it
Compute one use and save the result until the second use (2 ways, depending on which use is first)
Ignore the sharing, and recompute S
-
Cost Computations
Ignoring the sharing is easy
Computing and storing is easy
Ordering the two uses implies an ordering of preconditions in some higher-level instruction selection
And the number of free registers is affected, too
Do the problem twice, once for each order
-
Summary
Register spills are evil
Complicated, error-prone, hard to test
If something is to be spilled, compute it ahead of time with all registers free
The optimal spill points fall out of the dynamic programming algorithm