Basic Block, Trace and Instruction Selection
![Page 1: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/1.jpg)
1
Basic Block, Trace and Instruction Selection
Chapter 8, 9
![Page 2: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/2.jpg)
2
Tree IR => Machine Languages
(1) Semantic gap
(2) IR is not proper for optimization analysis
• Tree representation => no execution order is assumed.
Tree model vs. flat list of instructions. E.g.:
- Some expressions have side effects:
  ESEQ: ESEQ(rj <- M(rj+C), rk)
  CALL: call(“f1”, ESEQ(rj <- M(rj+k), rk), call(…))
- Semantic gap: CJUMP vs. jump on condition
  CJUMP has 2 targets; a machine jump on condition has 1 target + “fall through”.
  (x>y) ? goto Ltrue : goto Lfalse   vs.   if (x>y) goto Ltrue
![Page 3: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/3.jpg)
3
Semantic Gap Continued
- An ESEQ within an expression is inconvenient: evaluation order matters.
- A CALL node within an expression causes side effects!
- A CALL node within the argument expression of another CALL node causes a problem if the args or results are passed in the same (one) register.
- Rewrite the tree into an equivalent one (canonical form).
[Figure: a tree of nested SEQ nodes over the statements S1 … S5]
=> S1; S2; S3; S4; S5
![Page 4: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/4.jpg)
4
Transformation
Step 1: A tree is rewritten into a list of “canonical trees” without SEQ or ESEQ nodes.
-> tree.StmList linearize(tree.Stm S);
Step 2: The list is grouped into a set of “basic blocks”, each of which contains no internal jumps or labels.
-> BasicBlocks
Step 3: The basic blocks are ordered into a set of “traces” in which every CJUMP is immediately followed by its false label.
-> TraceSchedule(BasicBlocks b)
![Page 5: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/5.jpg)
5
8.1 Canonical Trees
Def: canonical trees are those with the following properties:
1. No SEQ or ESEQ (remove ESEQ first and then SEQ).
2. The parent of each CALL is either EXP(…) or MOVE(TEMP t, …),
i.e., a CALL(…) statement or t <- CALL(…).
• How to remove ESEQ?
• Lift the ESEQ higher and higher until it becomes a SEQ.
![Page 6: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/6.jpg)
6
Transformations on ESEQ: move the ESEQ to a higher level.
Case 1:
ESEQ(S1, ESEQ(S2, e)) => ESEQ(SEQ(S1, S2), e)
[S1 ; [S2 ; e]] => [[S1, S2] ; e]
![Page 7: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/7.jpg)
7
Case 2:
BINOP(op, ESEQ(S, e1), e2) => ESEQ(S, BINOP(op, e1, e2))
MEM(ESEQ(S, e1)) => ESEQ(S, MEM(e1))
JUMP(ESEQ(S, e1)) => SEQ(S, JUMP(e1))
CJUMP(op, ESEQ(S, e1), e2, l1, l2) => SEQ(S, CJUMP(op, e1, e2, l1, l2))
![Page 8: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/8.jpg)
8
Case 3:
BINOP(op, e1, ESEQ(S, e2)) => ESEQ(MOVE(TEMP t, e1), ESEQ(S, BINOP(op, TEMP t, e2)))
CJUMP(op, e1, ESEQ(S, e2), l1, l2) => SEQ(MOVE(TEMP t, e1), SEQ(S, CJUMP(op, TEMP t, e2, l1, l2)))
![Page 9: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/9.jpg)
9
Case 4: when S does not affect e1 in case 3, and S and e1 have no side effects (e.g., I/O), the temporary is not needed.
BINOP(op, e1, ESEQ(S, e2)) => ESEQ(S, BINOP(op, e1, e2))   if S, e1 commute
CJUMP(op, e1, ESEQ(S, e2), l1, l2) => SEQ(S, CJUMP(op, e1, e2, l1, l2))   if S, e1 commute
![Page 10: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/10.jpg)
10
Some conditions under which two Exp/Stm commute:
1. CONST(n) can commute with any Statement!!
2. NOP (= ExpStm(CONST(0))) can commute with any Exp!!
• Be conservative if we cannot determine whether two Exp/Stm commute!!
Notation: [s1,…,sn : e1,…,em] (n ≥ 0, m ≥ 0)
1. It is a list of stms s1,…,sn followed by a list of Exps e1,…,em.
2. Semantically, it means we compute the list in order and return, as a vector, the results of the last m expressions.
![Page 11: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/11.jpg)
11
General Rewriting Rules
1. Identify the subexpressions [e1,…,en] of each Exp e or Stm s.
• ExpList kids() is defined for each Exp and Stm.
  e.g.: Plus([s1 : e1], [s2 : e2]) --- e.kids() --> [: [s1 : e1], [s2 : e2]]
2. Pull the ESEQs out of the Stm or Exp and rebuild.
  e.g.: [: [s1 : e1], [s2 : e2]]
    --- reorder --> [s1, s2 : e1, e2]
    --- build([: e1, e2]) --> [s1, s2 : PLUS(e1, e2)]
    --- new ESEQ(_, _) --> ESEQ(SEQ(s1, s2), PLUS(e1, e2))
• (Stm, ExpList) reorder(ExpList);
  reorder([: e1, e2, …, em]) returns [s1, s2, …, sn : e1, e2, …, em]
• Stm | Exp build(ExpList kids)
![Page 12: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/12.jpg)
12
Additional example
e = CALL(e1, e2, ESEQ(s1, e3))
--- e.kids() --> [: e1, e2, ESEQ(s1, e3)]
--- reorder(.) --> [s1 : e1, e2, e3] if s1, e1, e2 commute
  or [MOVE(t1, e1), s1 : TEMP(t1), e2, e3] else if s1, e2 commute
  or [MOVE(t1, e1), MOVE(t2, e2), s1 : TEMP(t1), TEMP(t2), e3] otherwise
--- build() --> [s1 : CALL(e1, e2, e3)] or …
--- new ESEQ() --> ESEQ(s1, CALL(e1, e2, e3)).
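The reorder step can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the mini-IR classes (`Const`, `Temp`, `Eseq`), the text representation of statements, and the very conservative `commute` test are all hypothetical stand-ins, not the `tree.*` or `canon.*` classes. It also assumes the expression inside each ESEQ is already canonical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-IR for illustration; not the tree.* classes.
final class ReorderSketch {
    static abstract class Exp {}
    static final class Const extends Exp {
        final int value;
        Const(int value) { this.value = value; }
        @Override public String toString() { return "CONST(" + value + ")"; }
    }
    static final class Temp extends Exp {
        final String name;
        Temp(String name) { this.name = name; }
        @Override public String toString() { return "TEMP(" + name + ")"; }
    }
    // ESEQ(stm, exp); the statement is kept as plain text to stay short.
    static final class Eseq extends Exp {
        final String stm;
        final Exp exp;
        Eseq(String stm, Exp exp) { this.stm = stm; this.exp = exp; }
    }
    // The pair [s1,...,sn : e1,...,em].
    static final class StmExpList {
        final List<String> stms = new ArrayList<>();
        final List<Exp> exps = new ArrayList<>();
    }

    private int nextTemp = 0;

    // Conservative test: only "no statements" or a constant is known to commute.
    static boolean commute(List<String> stms, Exp e) {
        return stms.isEmpty() || e instanceof Const;
    }

    StmExpList reorder(List<Exp> kids) {
        StmExpList out = new StmExpList();
        if (kids.isEmpty()) return out;
        Exp head = kids.get(0);
        StmExpList rest = reorder(kids.subList(1, kids.size()));
        if (head instanceof Eseq) {            // the head's own statement runs first
            out.stms.add(((Eseq) head).stm);
            head = ((Eseq) head).exp;
        }
        if (commute(rest.stms, head)) {
            out.exps.add(head);                // safe to evaluate after the later stms
        } else {                               // save head before the later stms run
            Temp t = new Temp("t" + (++nextTemp));
            out.stms.add("MOVE(" + t + ", " + head + ")");
            out.exps.add(t);
        }
        out.stms.addAll(rest.stms);
        out.exps.addAll(rest.exps);
        return out;
    }
}
```

Calling `reorder` on the kids `[TEMP(e1), CONST(2), ESEQ(s1, TEMP(e3))]` moves `e1` into a fresh temporary because it is not known to commute with `s1`, while the constant is left in place.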
![Page 13: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/13.jpg)
13
package canon;
public class Canon {
  …
  static tree.Stm reorder_stm(tree.Stm s) {
    // StmExpList is a pair of a Stm and an ExpList.
    StmExpList x = reorder(s.kids());
    // seq(a, b) returns new SEQ(a, b).
    return seq(x.stm, s.build(x.exps));
  }
  static tree.ESEQ reorder_exp(tree.Exp e) {
    StmExpList x = reorder(e.kids());
    return new tree.ESEQ(x.stm, e.build(x.exps));
  }
![Page 14: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/14.jpg)
14
Moving CALLS to Toplevel
All CALLs return their results in the same register (e.g., TEMP(RV) on MIPS).
CALL(obj, CALL(…), CALL(…)) results in a conflict: the later call overwrites TEMP(RV).
Solution:
1. Save every CALL result in a new temporary (this needs extra TEMP(t) registers).
   CALL(fun, args) -> ESEQ(MOVE(TEMP t, CALL(fun, args)), TEMP t)   // t is a new temporary
   i.e., [: [MOVE(TEMP t, CALL(fun, args)) : TEMP t]]
2. Then eliminate the ESEQ by lifting it.
   => [MOVE(TEMP t, CALL(fun, args)) : TEMP t]
![Page 15: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/15.jpg)
15
• do_stm(MOVE(TEMP tnew, CALL(f, args)))
• do_stm(EXP(CALL(f, args)))
  - will not reorder on the CALL node (so infinite recursion is avoided)
  - will reorder on f and args as the children of MOVE
![Page 16: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/16.jpg)
16
A LINEAR LIST OF STATEMENTS
S0 => S0’ (right linear)
SEQ(SEQ(a, b), c) => SEQ(a, SEQ(b, c))
=> [a, b, c]
linearize(Stm s0) : StmList
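A minimal sketch of the flattening step, assuming a hypothetical two-class statement IR (`Seq` and `Leaf`) rather than the real `tree.Stm` hierarchy. It walks nested SEQ nodes left to right and collects the leaf statements in execution order, so the shape of the SEQ tree (left- or right-nested) does not matter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-IR: a statement is either a SEQ of two statements or a leaf.
final class LinearizeSketch {
    static abstract class Stm {}
    static final class Seq extends Stm {
        final Stm left, right;
        Seq(Stm left, Stm right) { this.left = left; this.right = right; }
    }
    static final class Leaf extends Stm {
        final String name;
        Leaf(String name) { this.name = name; }
    }

    // Flatten nested SEQ nodes into a left-to-right list of leaf statements.
    static List<String> linearize(Stm s) {
        List<String> out = new ArrayList<>();
        collect(s, out);
        return out;
    }

    private static void collect(Stm s, List<String> out) {
        if (s instanceof Seq) {
            collect(((Seq) s).left, out);   // left subtree executes first
            collect(((Seq) s).right, out);
        } else {
            out.add(((Leaf) s).name);
        }
    }
}
```

Both `SEQ(SEQ(a, b), c)` and `SEQ(a, SEQ(b, c))` linearize to the list `[a, b, c]`.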
![Page 17: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/17.jpg)
17
8.2 TAMING CONDITIONAL BRANCHES
Definition: BASIC BLOCK: a sequence of statements entered at the beginning and exited at the end.
- The first stmt is a LABEL.
- The last stmt is a JUMP or a CJUMP.
- There are no other LABELs, JUMPs, or CJUMPs.
[Figure: a basic block drawn as LABEL … (C)JUMP, and a CJUMP on Cond branching to the labels T: and F:]
![Page 18: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/18.jpg)
18
Partition a list of statements into basic blocks
Algorithm: scan from beginning to end.
- When a LABEL is found, begin a new block (and end the previous block).
- When a JUMP or CJUMP is found, end the block (and begin the next block).
- If a block begins without a label, add a label.
- If a block ends without a JUMP or CJUMP, insert a JUMP to the LABEL of the next block.
- Epilogue block of the function: insert two stms at the end: JUMP DONE; DONE:
• Note: the class canon.BasicBlocks implements this algorithm.
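The scan can be sketched as follows. This is not the `canon.BasicBlocks` code; it is a simplified illustration in which statements are plain strings (`LABEL x`, `JUMP x`, `CJUMP …`, or anything else), and the invented label names `Lb1`, `Lb2`, … are assumed not to clash with existing labels.

```java
import java.util.ArrayList;
import java.util.List;

final class BasicBlockSketch {
    static boolean isLabel(String s) { return s.startsWith("LABEL "); }
    static boolean isJump(String s) { return s.startsWith("JUMP ") || s.startsWith("CJUMP "); }

    // Partition a statement list into basic blocks; doneLabel names the epilogue.
    static List<List<String>> partition(List<String> stms, String doneLabel) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> cur = null;   // block under construction, or null between blocks
        int fresh = 0;
        for (String s : stms) {
            if (isLabel(s)) {
                if (cur != null) {
                    // Previous block fell through: end it with an explicit JUMP.
                    cur.add("JUMP " + s.substring("LABEL ".length()));
                    blocks.add(cur);
                }
                cur = new ArrayList<>();
                cur.add(s);
            } else {
                if (cur == null) {
                    // Block begins without a label: invent one.
                    cur = new ArrayList<>();
                    cur.add("LABEL Lb" + (++fresh));
                }
                cur.add(s);
                if (isJump(s)) {   // a JUMP or CJUMP ends the block
                    blocks.add(cur);
                    cur = null;
                }
            }
        }
        if (cur != null) {         // last block falls into the epilogue
            cur.add("JUMP " + doneLabel);
            blocks.add(cur);
        }
        return blocks;
    }
}
```

Every block returned this way starts with a LABEL and ends with a JUMP or CJUMP, which is the invariant the trace scheduler relies on.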
![Page 19: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/19.jpg)
19
Example:
m <- 0
v <- 0
L3: if v >= n goto L15
r <- v
s <- 0
L6: if r < n goto L9
v <- v + 1
goto L3
L9: x <- M[r]
s <- s + x
if s <= m goto L13
m <- s
L13: r <- r + 1
goto L6
L15: rv <- m
![Page 20: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/20.jpg)
20
Example: (add epilogue and find end of blocks)
m <- 0
v <- 0
L3: if v >= n goto L15
r <- v
s <- 0
L6: if r < n goto L9
v <- v + 1
goto L3
L9: x <- M[r]
s <- s + x
if s <= m goto L13
m <- s
L13: r <- r + 1
goto L6
L15: rv <- m
JUMP Done
Done: (function epilogue)
![Page 21: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/21.jpg)
21
Example: (start of blocks)
m <- 0
v <- 0
L3: if v >= n goto L15
r <- v
s <- 0
L6: if r < n goto L9
v <- v + 1
goto L3
L9: x <- M[r]
s <- s + x
if s <= m goto L13
m <- s
L13: r <- r + 1
goto L6
L15: rv <- m
JUMP Done
Done: (function epilogue)
![Page 22: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/22.jpg)
22
Example: (insert start label)
Lb1: m <- 0
v <- 0
L3: if v >= n goto L15
Lb2: r <- v
s <- 0
L6: if r < n goto L9
Lb3: v <- v + 1
goto L3
L9: x <- M[r]
s <- s + x
if s <= m goto L13
Lb4: m <- s
L13: r <- r + 1
goto L6
L15: rv <- m
JUMP Done
Done: (function epilogue)
![Page 23: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/23.jpg)
23
Example: (insert ending JUMP)
Lb1: m <- 0
v <- 0 ; JUMP L3
L3: if v >= n goto L15
Lb2: r <- v
s <- 0 ; JUMP L6
L6: if r < n goto L9
Lb3: v <- v + 1
goto L3
L9: x <- M[r]
s <- s + x
if s <= m goto L13
Lb4: m <- s ; JUMP L13
L13: r <- r + 1
goto L6
L15: rv <- m
JUMP Done
Done: (function epilogue)
![Page 24: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/24.jpg)
24
Traces
Definition: Trace: a sequence of statements (or blocks) that could be consecutively executed during the execution of the program.
We want a set of traces that exactly covers the program: every block belongs to exactly one trace.
To reduce JUMPs, fewer traces are preferred!!
![Page 25: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/25.jpg)
25
[Figure: flow graph of basic blocks 1-7 with T/F branch edges and a JUMP, and the same blocks regrouped into traces]
Idea: the greedy method
![Page 26: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/26.jpg)
26
Algorithm 8.2: canon.TraceSchedule.traceSchedule(BasicBlocks)
Put all the blocks of the program into a list Q.
while Q is not empty
  Start a new (empty) trace, call it T.
  b = Q.pop().
  while b is not marked   // marked = b ∈ some trace
    Mark b; T.add(b);
    // Examine the successors of b.
    if there is an unmarked successor c of b
      let b = c.
    // else b is marked, and the current trace T ends.
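The greedy loop can be sketched in Java as below. The representation is an assumption made for brevity: blocks are identified by label strings, `order` lists them in program order, and `succ` maps each block to its successor labels. Unlike the pseudocode, this sketch skips a popped block that is already marked instead of starting an empty trace for it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TraceSketch {
    // Assumes every successor label also appears in `order`.
    static List<List<String>> traces(List<String> order, Map<String, List<String>> succ) {
        Deque<String> q = new ArrayDeque<>(order);
        Set<String> marked = new HashSet<>();
        List<List<String>> result = new ArrayList<>();
        while (!q.isEmpty()) {
            String b = q.removeFirst();
            if (marked.contains(b)) continue;       // already placed in some trace
            List<String> trace = new ArrayList<>();
            while (b != null) {
                marked.add(b);
                trace.add(b);
                String next = null;
                List<String> candidates = succ.get(b);
                if (candidates != null) {
                    for (String c : candidates) {   // first unmarked successor wins
                        if (!marked.contains(c)) { next = c; break; }
                    }
                }
                b = next;                           // null ends the current trace
            }
            result.add(trace);
        }
        return result;
    }
}
```

For blocks A, B, C, D with A -> {C, B}, B -> {D}, C -> {D}, this yields the traces [A, C, D] and [B]: the first trace follows unmarked successors greedily, and B is left to start its own trace.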
![Page 27: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/27.jpg)
27
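[Figure: the flow graph of blocks 1-7 after trace selection, with its T/F edges and a JUMP]
• Remove a JUMP to the next block (JumpToNext).
• Where the true label falls through, reverse the condition (T->F, F->T) so that the CJUMP jumps on false.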
![Page 28: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/28.jpg)
28
Required local arrangements
1. CJUMP followed by its false label => OK
   CJUMP(op, e1, e2, Lt, Lf)   // op e1, e2; Jtrue Lt
   Lf: …
2. CJUMP followed by its true label => reverse the condition
   CJUMP(>=, e1, e2, Lt, Lf)   =>   CJUMP(<, e1, e2, Lf, Lt)
   Lt: …                            Lt: …
3. CJUMP followed by neither its true nor its false label:
   CJUMP(op, e1, e2, Lt, Lf)   // cannot be implemented directly!
   L1: …
   Then add a new label and a JUMP:
   CJUMP(op, e1, e2, Lt, Lf’)  // op e1, e2; Jtrue Lt
   Lf’: JUMP Lf                // Jump Lf
4. … JUMP L ; L: => remove JUMP L (but not L:).
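The four local arrangements can be sketched as a single pass over a flattened statement list. The string encoding (`CJUMP op e1 e2 Lt Lf`, `JUMP L`, `LABEL L`), the operator names, and the fresh label names `Lf1`, `Lf2`, … are all assumptions made for this illustration; the sketch also assumes the input is well formed.

```java
import java.util.ArrayList;
import java.util.List;

final class CjumpFixSketch {
    // Negate a relational operator so the two targets can be swapped.
    static String negate(String op) {
        switch (op) {
            case "EQ": return "NE";
            case "NE": return "EQ";
            case "LT": return "GE";
            case "GE": return "LT";
            case "GT": return "LE";
            case "LE": return "GT";
            default: throw new IllegalArgumentException(op);
        }
    }

    static List<String> fix(List<String> in) {
        List<String> out = new ArrayList<>();
        int fresh = 0;
        for (int i = 0; i < in.size(); i++) {
            String s = in.get(i);
            String next = i + 1 < in.size() ? in.get(i + 1) : "";
            String[] p = s.split(" ");
            if (p[0].equals("CJUMP")) {              // p = CJUMP op e1 e2 Lt Lf
                if (next.equals("LABEL " + p[5])) {
                    out.add(s);                      // rule 1: false label follows
                } else if (next.equals("LABEL " + p[4])) {
                    // rule 2: true label follows, so reverse the condition
                    out.add("CJUMP " + negate(p[1]) + " " + p[2] + " " + p[3]
                            + " " + p[5] + " " + p[4]);
                } else {
                    // rule 3: add a new false label that jumps to the old one
                    String lf = "Lf" + (++fresh);
                    out.add("CJUMP " + p[1] + " " + p[2] + " " + p[3] + " " + p[4] + " " + lf);
                    out.add("LABEL " + lf);
                    out.add("JUMP " + p[5]);
                }
            } else if (p[0].equals("JUMP") && next.equals("LABEL " + p[1])) {
                // rule 4: drop a JUMP to the immediately following label
            } else {
                out.add(s);
            }
        }
        return out;
    }
}
```

After this pass every CJUMP is immediately followed by its false label, so it maps onto a machine branch with one target plus fall-through.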
![Page 29: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/29.jpg)
29
Finishing Up
- An efficient compiler should group statements into basic blocks, since analysis and optimization algorithms run faster on basic blocks than on individual stmts.
- MiniJava flattens the list of traces back into one long list of Stms for simplicity of later implementation.
- Algorithm 8.2 is a simple greedy algorithm rather than an optimal algorithm. (Finding optimal traces is not computationally easy!!)
![Page 30: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/30.jpg)
30
Instruction Selection
Chapter 9
![Page 31: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/31.jpg)
31
What we are going to do.
Tree IR => machine instructions
(Jouette architecture or SPARC or MIPS or Pentium or T)
MEM(BINOP(+, fp, CONST c)) => LOAD r1 <- M[fp + c]
![Page 32: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/32.jpg)
32
Jouette Architecture
Name   Effect              Trees (e = any subtree)
-      ri                  TEMP
ADD    ri <- rj + rk       +(e, e)
MUL    ri <- rj * rk       *(e, e)
SUB    ri <- rj - rk       -(e, e)
DIV    ri <- rj / rk       /(e, e)
ADDI   ri <- rj + c        +(e, CONST), +(CONST, e), CONST
SUBI   ri <- rj - c        -(e, CONST)
LOAD   ri <- M[rj + c]     MEM(+(e, CONST)), MEM(+(CONST, e)), MEM(CONST), MEM(e)
![Page 33: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/33.jpg)
33
Jouette Architecture
Name    Effect              Trees (e = any subtree)
STORE   M[rj + c] <- ri     MOVE(MEM(+(e, CONST)), e), MOVE(MEM(+(CONST, e)), e), MOVE(MEM(CONST), e), MOVE(MEM(e), e)
MOVEM   M[rj] <- M[ri]      MOVE(MEM(e), MEM(e))
• Register r0 always contains zero.
• Instructions that produce a result in a register => Exp
• Instructions that produce side effects on memory => Stm
![Page 34: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/34.jpg)
34
Tiling the IR tree   ex: a[i] := x   (i: register; a, x: frame vars)
Tree: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))
[Figure: the tree covered by tiles numbered 1-9]
2 LOAD r1 <- M[fp+a]
4 ADDI r2 <- r0 + 4
5 MUL r2 <- ri*r2
6 ADD r1 <- r1+r2
8 LOAD r2 <- M[fp+x]
9 STORE M[r1+0] <- r2
![Page 35: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/35.jpg)
35
Another Solution   ex: a[i] := x   (i: register; a, x: frame vars)
Tree: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))
[Figure: the tree covered by tiles numbered 1-9]
2 LOAD r1 <- M[fp+a]
4 ADDI r2 <- r0 + 4
5 MUL r2 <- ri*r2
6 ADD r1 <- r1+r2
8 ADDI r2 <- fp+x
9 MOVEM M[r1] <- M[r2]
![Page 36: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/36.jpg)
36
Or another tiling, with a different set of tile patterns
Tree: MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))
[Figure: the tree covered by tiles numbered 1-10]
1 ADDI r1 <- r0 + a
2 ADD r1 <- fp + r1
3 LOAD r1 <- M[r1+0]
4 ADDI r2 <- r0 + 4
5 MUL r2 <- ri*r2
6 ADD r1 <- r1+r2
7 ADDI r2 <- r0 + x
8 ADD r2 <- fp + r2
9 LOAD r2 <- M[r2+0]
10 STORE M[r1+0] <- r2
![Page 37: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/37.jpg)
37
OPTIMAL and OPTIMUM TILINGS
Optimum tiling: one whose tile costs sum to the lowest possible value.
  cost of a tile: instruction execution time, # of bytes, ...
Optimal tiling: one where no two adjacent tiles can be combined into a single tile of lower cost.
then why we keep ?
are enough.30 25
![Page 38: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/38.jpg)
38
Algorithms for Instruction Selection
1. Optimal vs. optimum
   Optimal: simple.   Optimum: may be hard.
2. CISC (Complex Instruction Set Computer) vs. RISC
                      CISC                               RISC
   tile size          large                              small
   tiling             optimal >= optimum                 optimal ~= optimum
   instruction cost   varies with the addressing mode    almost the same!
![Page 39: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/39.jpg)
39
Maximal Munch: an optimal tiling algorithm
1. Starting at the root, find the largest tile that fits.
2. Repeat step 1 for each of the subtrees that remain!!
3. Generate instructions for each tile (the tiles are found in reverse order)
=> traverse the tree of tiles in post-order.
When several tiles can be matched, select the largest tile (the one that covers the most nodes).
If tiles of the same size match, choose an arbitrary one.
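A compact sketch of maximal munch for expressions, emitting Jouette-style instructions. The mini-IR (`Const`, `Temp`, `Plus`, `Mem`), the temporary names `t1`, `t2`, …, and the output text format are assumptions for illustration only; a real selector would emit `assem.Instr` objects. Each case tries the largest tile first, and children are munched before the instruction for the tile is emitted, which gives the post-order emission described above.

```java
import java.util.ArrayList;
import java.util.List;

final class MunchSketch {
    static abstract class Exp {}
    static final class Const extends Exp { final int value; Const(int v) { value = v; } }
    static final class Temp extends Exp { final String name; Temp(String n) { name = n; } }
    static final class Plus extends Exp {
        final Exp left, right;
        Plus(Exp l, Exp r) { left = l; right = r; }
    }
    static final class Mem extends Exp { final Exp addr; Mem(Exp a) { addr = a; } }

    final List<String> code = new ArrayList<>();
    private int next = 0;
    private String fresh() { return "t" + (++next); }

    // Returns the name of the register/temp holding the value of e.
    String munchExp(Exp e) {
        if (e instanceof Mem) {
            Exp a = ((Mem) e).addr;
            if (a instanceof Plus && ((Plus) a).right instanceof Const) {
                // Largest tile: MEM(+(e1, CONST c)) => LOAD d <- M[e1 + c]
                String base = munchExp(((Plus) a).left);
                String d = fresh();
                code.add("LOAD " + d + " <- M[" + base + "+" + ((Const) ((Plus) a).right).value + "]");
                return d;
            }
            String base = munchExp(a);               // MEM(e1) => LOAD d <- M[e1 + 0]
            String d = fresh();
            code.add("LOAD " + d + " <- M[" + base + "+0]");
            return d;
        }
        if (e instanceof Plus) {
            Plus p = (Plus) e;
            if (p.right instanceof Const) {          // +(e1, CONST c) => ADDI
                String l = munchExp(p.left);
                String d = fresh();
                code.add("ADDI " + d + " <- " + l + " + " + ((Const) p.right).value);
                return d;
            }
            String l = munchExp(p.left);             // +(e1, e2) => ADD
            String r = munchExp(p.right);
            String d = fresh();
            code.add("ADD " + d + " <- " + l + " + " + r);
            return d;
        }
        if (e instanceof Const) {                    // CONST c => ADDI d <- r0 + c
            String d = fresh();
            code.add("ADDI " + d + " <- r0 + " + ((Const) e).value);
            return d;
        }
        return ((Temp) e).name;                      // TEMP: already in a register
    }
}
```

Munching `MEM(+(TEMP fp, CONST 8))` selects the single LOAD tile and emits one instruction, rather than an ADDI followed by a LOAD.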
![Page 40: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/40.jpg)
40
Implementation
See Program 9.3 for an example (p. 181).
Use case statements for each root type!!
There is at least one tile for each type of root node!!
![Page 41: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/41.jpg)
41
MunchStatement
void munchStm(Stm s) {
  if (s instanceof MOVE)
    munchMove(((MOVE)s).dst, ((MOVE)s).src);
  ⋮ // CALL, JUMP, CJUMP unimplemented here
}
void munchMove(Exp dst, Exp src) {
  // MOVE(d, e)
  if (dst instanceof MEM) munchMove((MEM)dst, src);
  else if (dst instanceof TEMP) munchMove((TEMP)dst, src);
}
void munchMove(TEMP dst, Exp src) {
  // MOVE(TEMP(t1), e)
  munchExp(src); emit("ADD");
}
![Page 42: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/42.jpg)
42
PROGRAM 9.3: Maximal Munch in Java.
void munchMove(MEM dst, Exp src) {
  // MOVE(MEM(BINOP(PLUS, e1, CONST(i))), e2)
  if (dst.exp instanceof BINOP && ((BINOP)dst.exp).oper == BINOP.PLUS
      && ((BINOP)dst.exp).right instanceof CONST) {
    munchExp(((BINOP)dst.exp).left); munchExp(src); emit("STORE");
  }
  // MOVE(MEM(BINOP(PLUS, CONST(i), e1)), e2)
  else if (dst.exp instanceof BINOP && ((BINOP)dst.exp).oper == BINOP.PLUS
      && ((BINOP)dst.exp).left instanceof CONST) {
    munchExp(((BINOP)dst.exp).right); munchExp(src); emit("STORE");
  }
  // MOVE(MEM(e1), MEM(e2))
  else if (src instanceof MEM) {
    munchExp(dst.exp); munchExp(((MEM)src).exp); emit("MOVEM");
  }
  // MOVE(MEM(e1), e2)
  else {
    munchExp(dst.exp); munchExp(src); emit("STORE");
  }
}
![Page 43: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/43.jpg)
43
Dynamic Programming: finding an optimum tiling
Find optimum solutions based on the optimum solutions of each subproblem!!
1. Assign a cost to every node in the tree.
2. Find several matches.
3. Compute the cost for each match.
4. Choose the best one.
5. Let the cost be the value of the node.
[Figure: a MEM node over subtrees A and B with node costs 10, 20, 30, 40 and tile costs 2, 4, 5]
10+20+40+4 = ?
30+2+40+5 = ?
![Page 44: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/44.jpg)
44
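Example: MEM(+(CONST 1, CONST 2))
+ node:
  Tile           Instruction   Tile cost   Leaves cost   Total
  +(e, e)        ADD           1           1+1           3
  +(e, CONST)    ADDI          1           1             2
  +(CONST, e)    ADDI          1           1             2
MEM node:
  Tile                Instruction           Tile cost   Leaves cost   Total
  MEM(e)              LOAD ri <- M[rj]      1           2             1+2
  MEM(+(e, CONST))    LOAD ri <- M[rj+c]    1           1             1+1
  MEM(+(CONST, e))    LOAD ri <- M[rj+c]    1           1             1+1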
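The bottom-up cost computation can be sketched as a memoized recursion. The mini-IR and the unit tile costs are assumptions chosen to mirror the Jouette tiles (every instruction costs 1, a TEMP costs 0); the sketch computes only the minimum cost, not the instructions themselves.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class TilingCostSketch {
    static abstract class Exp {}
    static final class Const extends Exp { final int value; Const(int v) { value = v; } }
    static final class Temp extends Exp { final String name; Temp(String n) { name = n; } }
    static final class Plus extends Exp {
        final Exp left, right;
        Plus(Exp l, Exp r) { left = l; right = r; }
    }
    static final class Mem extends Exp { final Exp addr; Mem(Exp a) { addr = a; } }

    // Cost stored per node, so each subtree is solved once.
    private final Map<Exp, Integer> memo = new IdentityHashMap<>();

    int cost(Exp e) {
        Integer known = memo.get(e);
        if (known != null) return known;
        int best;
        if (e instanceof Temp) {
            best = 0;                                   // already in a register
        } else if (e instanceof Const) {
            best = 1;                                   // ADDI ri <- r0 + c
        } else if (e instanceof Plus) {
            Plus p = (Plus) e;
            best = 1 + cost(p.left) + cost(p.right);    // ADD
            if (p.right instanceof Const) best = Math.min(best, 1 + cost(p.left));  // ADDI
            if (p.left instanceof Const) best = Math.min(best, 1 + cost(p.right));  // ADDI
        } else {
            Mem m = (Mem) e;
            best = 1 + cost(m.addr);                    // LOAD ri <- M[rj]
            if (m.addr instanceof Plus) {
                Plus p = (Plus) m.addr;                 // LOAD ri <- M[rj+c]
                if (p.right instanceof Const) best = Math.min(best, 1 + cost(p.left));
                if (p.left instanceof Const) best = Math.min(best, 1 + cost(p.right));
            }
        }
        memo.put(e, best);
        return best;
    }
}
```

For `MEM(+(CONST 1, CONST 2))` the + node costs 2 (an ADDI over one constant leaf) and the MEM node also costs 2 (a LOAD with a constant offset over the other constant), matching the totals in the example.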
![Page 45: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/45.jpg)
45
Tree Grammars
A generalization of DP for machines with complex instruction sets and several classes of registers and addressing modes.
Example: Schizo-Jouette machine
ai: address register
dj: data register
ADD     di <- dj + dk     d -> +(d, d)
MUL     di <- dj * dk     d -> *(d, d)
SUB     di <- dj - dk     d -> -(d, d)
DIV     di <- dj / dk     d -> /(d, d)
ADDI    di <- dj + C      d -> +(d, CONST), d -> +(CONST, d), d -> CONST
SUBI    di <- dj - C      d -> -(d, CONST)
MOVEA   dj <- ai          d -> a
MOVED   aj <- di          a -> d
![Page 46: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/46.jpg)
46
LOAD    di <- M[aj+C]       d -> MEM(+(a, CONST)), d -> MEM(+(CONST, a)), d -> MEM(CONST), d -> MEM(a)
STORE   M[aj+C] <- di       s -> MOVE(MEM(+(a, CONST)), d), s -> MOVE(MEM(+(CONST, a)), d), s -> MOVE(MEM(CONST), d), s -> MOVE(MEM(a), d)
MOVEM   M[aj] <- M[ai]      s -> MOVE(MEM(a), MEM(a))
![Page 47: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/47.jpg)
47
Use a context-free grammar to describe the tiles.
ex: nonterminals   s: statement, d: data, a: address
LOAD:    d -> MEM(+(a, CONST))
         d -> MEM(+(CONST, a))
         d -> MEM(CONST)
         d -> MEM(a)
MOVEA:   d -> a
MOVED:   a -> d
STORE:   s -> MOVE(MEM(+(a, CONST)), d)
MOVEM:   s -> MOVE(MEM(a), MEM(a))
=> ambiguous grammar!!
-> parse based on the minimum cost!!
![Page 48: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/48.jpg)
48
Efficiency of Tiling Algorithms
Order of execution cost for “Maximal Munch & Dynamic Programming”
T: # of different tiles.
K: # of non-leaf nodes per tile (on average).
K’: largest # of nodes that need to be examined to choose the right tile ~= the size of the largest tile.
T’: average # of tile patterns that match at each tree node.
Ex: for a RISC machine, T = 50, K = 2, K’ = 4, T’ = 5.
![Page 49: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/49.jpg)
49
N: # of input nodes in a tree.
Complexity of maximal munch = N/K * (K’ + T’)
  N/K: # of nodes (# of patterns) to be examined
  K’: to find the matched pattern
  T’: to find the minimum cost
Complexity of dynamic programming = N * (K’ + T’)
Both are “linear in N”.
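As a worked instance of the two formulas, the RISC figures given on the previous slide (K = 2, K’ = 4, T’ = 5) can be plugged in. This is only arithmetic on the slide's own numbers:

```latex
\begin{align*}
\text{maximal munch:} \quad & \frac{N}{K}\,(K' + T') = \frac{N}{2}\,(4 + 5) = 4.5\,N \\
\text{dynamic programming:} \quad & N\,(K' + T') = N\,(4 + 5) = 9\,N
\end{align*}
```

With these numbers dynamic programming does about twice the work of maximal munch per tree, and both stay linear in N.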
![Page 50: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/50.jpg)
50
9.2 RISC vs CISC
RISC
1. 32 registers.
2. Only one class of integer/pointer registers.
3. Arithmetic operations only between registers.
4. “Three-address” instruction form r1 <- r2 & r3.
5. Load and store instructions with only the M[reg+const] addressing mode.
6. Every instruction exactly 32 bits long.
7. One result or effect per instruction.
![Page 51: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/51.jpg)
51
CISC (Complex Instruction Set Computers)
Complex Addressing Mode
1. Few registers (16 or 8 or 6).
2. Registers divided into different classes.
3. Arithmetic operations can access registers or memory through “addressing modes”.
4. “Two-address” instructions of the form r1 <- r1 & r2.
5. Several different addressing modes.
6. Variable-length instruction format.
7. Instructions with side effects, e.g., auto-increment/decrement.
![Page 52: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/52.jpg)
52
Solutions for CISC
1. Few registers.
   - Handle it in the register allocation phase.
2. Classes of registers.
   - Specify the operands and result explicitly.
   - ex: the left operand of an arithmetic op (e.g., mul) must be eax
     t1 <- t2 x t3 ==>
       mov eax, t2     eax <- t2
       mul t3          eax <- eax x t3; edx <- garbage
       mov t1, eax     t1 <- eax
3. Two-address instructions.
   - Add extra move instructions -> register allocation
     t1 <- t2 + t3 ==>
       mov t1, t2      t1 <- t2
       add t1, t3      t1 <- t1 + t3
![Page 53: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/53.jpg)
53
4. Arithmetic operations can address memory.
   - Actually handled by the “register spill” phase.
   - Load the memory operand into a register and store it back into memory -> may trash registers!!
   - ex: add [ebp - 8], ecx is equivalent to
       mov eax, [ebp - 8]
       add eax, ecx
       mov [ebp - 8], eax
![Page 54: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/54.jpg)
54
5. Several addressing modes.
   - They take time to execute (no faster than the multi-instruction sequence).
   - They “trash” fewer registers.
   - They give a shorter instruction sequence.
   - Select appropriate patterns for the addressing modes.
6. Variable-length instructions.
   - Let the assembler generate the binary code.
7. Instructions with side effects.
   eg: r2 <- M[r1]; r1 <- r1 + 4;
   - difficult to model!!
   (a) ignore the auto-increment -> forget it!
   (b) try to match special idioms
   (c) try to invent new algorithms.
![Page 55: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/55.jpg)
55
Abstract Assembly Language Instructions
• Assembly language instructions without register assignment.
package assem;
public abstract class Instr {
  public String assem;                      // instruction template
  public abstract temp.TempList use();      // return the src list
  public abstract temp.TempList def();      // return the dst list
  public abstract Targets jumps();          // return the jump targets
  public String format(temp.TempMap m);     // text of the assembly instruction
}
public Targets(temp.LabelList labels);
![Page 56: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/56.jpg)
56
// dst, src and jump can be null.
public OPER(String assem, TempList dst, TempList src, temp.LabelList jump);
public OPER(String assem, TempList dst, TempList src);
public MOVE(String assem, Temp dst, Temp src);
public LABEL(String assem, temp.Label label);
![Page 57: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/57.jpg)
57
Example
• assem.Instr is independent of the target machine assembly.
ex: MEM(+(fp, CONST(8))) ==>
new OPER(“LOAD ‘d0 <- M[‘s0 + 8]”,
new TempList(new Temp(), null),
new TempList(frame.FP(), null));
Calling format(…) on the above Instr, we get
LOAD r1 <- M[r27+8]
assuming the register allocator assigns r1 to the new Temp and r27 is the frame pointer register.
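How a template turns into final text can be sketched as a simple substitution. This is not the `assem.Instr.format` code: the method below is a hypothetical stand-in that takes the already-allocated register names directly, assumes placeholders are written with a backquote followed by `d` or `s` and a single digit, and leaves everything else in the template untouched.

```java
import java.util.List;

final class FormatSketch {
    // Replace each placeholder (backquote + 'd' or 's' + one digit) with the
    // register name at that index of the dst or src list.
    static String format(String assem, List<String> dst, List<String> src) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < assem.length()) {
            char c = assem.charAt(i);
            if (c == '`' && i + 2 < assem.length()
                    && (assem.charAt(i + 1) == 'd' || assem.charAt(i + 1) == 's')
                    && Character.isDigit(assem.charAt(i + 2))) {
                int n = assem.charAt(i + 2) - '0';
                sb.append(assem.charAt(i + 1) == 'd' ? dst.get(n) : src.get(n));
                i += 3;                      // skip the whole placeholder
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }
}
```

With dst = [r1] and src = [r27], the LOAD template with offset 8 formats to `LOAD r1 <- M[r27+8]`, the same text as in the example above.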
![Page 58: Basic Block, Trace and Instruction Selection](https://reader036.fdocuments.us/reader036/viewer/2022062518/56813fea550346895daae27c/html5/thumbnails/58.jpg)
58
Another Example
• *(+(TEMP(t87), CONST(3)), MEM(TEMP(t92)))
  assem                     dst     src
  ADDI ‘d0 <- ‘s0 + 3       t908    t87
  LOAD ‘d0 <- M[‘s0+0]      t909    t92
  MUL ‘d0 <- ‘s0*‘s1        t910    t908, t909
• After register allocation, the instrs look like:
  – ADDI r1 <- r12 + 3      t908/r1   t87/r12
  – LOAD r2 <- M[r13+0]     t909/r2   t92/r13
  – MUL r1 <- r1*r2         t910/r1
• Two-address instructions
  – t1 <- t1 + t2 ==>
    assem            dst    src
    add ‘d0 ‘s1      t1     t1, t2
• PROGRAM 9.5: Assem-instructions for munchStm.