Register Allocation
description
Transcript of Register Allocation
![Page 1: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/1.jpg)
Register Allocation
Mooly Sagiv
Schrierber 31703-640-7606
Wed 10:00-12:00
html://www.math.tau.ac.il/~msagiv/courses/wcc.html
![Page 2: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/2.jpg)
Already StudiedSource program (string)
lexical analysis
syntax analysis
semantic analysis
Translate
Tokens
Abstract syntax tree
Tree IR
Abstract syntax tree
Cannon
Assem (with many reg)
Instruction Selection
Cannonical Tree IR
![Page 3: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/3.jpg)
Register Allocation• Input:
– Sequence of machine code instructions(assembly)
• Unbounded number of temporary registers
• Output– Sequence of machine code instructions
(assembly)– Machine registers – Some MOVE instructions removed– Missing prologue and epilogue
![Page 4: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/4.jpg)
Register Allocation ProcessRepeat
Construct a interference graph
Nodes = Machine registers and temporaries
Interference-Edges = Temporaries with
overlapping lifetimes
Move-edges = Source and Target of MOVE
Color graph nodes with machine registers
Adjacent nodes are not colored by the same
register
Spill a temporary into activation record
Until no more spill
![Page 5: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/5.jpg)
Running Example
/* k, j */ g := mem[j+12]
h := k - 1
f := g * h
e := mem[j+8]
m := mem[j+16]
b := mem[f]
c := e + 8
d := c
k := m + 4
j := b /* d, k, j */
![Page 6: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/6.jpg)
Challenges
• The Coloring problem is computationally hard
• The number of machine registers may be small
• Avoid too many MOVEs
• Handle “pre-colored” nodes
![Page 7: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/7.jpg)
Coloring by SimplificationKempe 1879
• Let K be the number of machine
• Let G(V, E) be the interference graph
• Consider a node v V which has K-1 or less neighbors– Color G – v in K colors– Color v in a color different that its (colored)
neighbors
![Page 8: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/8.jpg)
Graph Coloring by Simplification
Build: Construct the interference graph
Simplify: Recursively remove nodes with less than K neighbors ; Push removed nodes into stack
Potential-Spill: Spill some nodes and remove nodes
Push removed nodes into stack
Select: Assign actual registers (from simplify/spill stack)
Actual-Spill: Spill some potential spills and repeat the process
![Page 9: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/9.jpg)
Coalescing
• MOVs can be removed if the source and the target share the same register
• The source and the target of the move can be merged into a single node (unifying the sets of neighbors)
• May require more registers• Conservative Coalescing
– Merge nodes only if the resulting node has fewer than K neighbors with degree K (in the resulting graph)
![Page 10: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/10.jpg)
Constrained Moves
• A instruction T := S is constrained– if S and T interfere
• May happen after coalescing• Assume X and Z interfere and
– X := Y– Y := Z
• After coalescing X and Y we get– XY := Zwith interference between XY and Z
• Constrained MOVs are not coalesced
![Page 11: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/11.jpg)
Graph Coloring with Coalescing
Build: Construct the interference graph
Simplify: Recursively remove non MOVE nodes
with less than K neighbors; Push removed nodes into stack
Potential-Spill: Spill some nodes and remove nodes
Push removed nodes into stack
Select: Assign actual registers (from simplify/spill stack)
Actual-Spill: Spill some potential spills and repeat the process
Coalesce: Conservatively merge unconstrained MOV related nodes with fewer that K “heavy” neighbors
Freeze: Give-Up Coalescing on some low-degree MOV related nodes
![Page 12: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/12.jpg)
Spilling
• Many heuristics exist– Maximal degree– Live-ranges– Number of usages in loops
• The whole process need to be repeated after an actual spill
![Page 13: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/13.jpg)
Pre-Colored Nodes
• Some registers are in the intermediate language are pre-colored– correspond to real registers
(stack-pointer, frame-pointer, …)
• Cannot be Simplified, Coalesced, or Spilled (infinite degree)
• Interfered with each other• But normal temporaries can be coalesced into pre-
colored registers• Register allocation is completed when all the nodes
are pre-colored
![Page 14: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/14.jpg)
Caller-Save and Callee-Save Registers
• callee-save-registers (MIPS 16-23)– Saved by the callee
• caller-save-registers– Saved by the caller
• Interprocedural analysis may improve results
int f(int a) {
int b=a+1; g(); h(b); return(b+2); }
void h (int y) {
int x=y+1; f(y); f(2); }
![Page 15: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/15.jpg)
Saving Callee-Save Registers
enter: def(r7)
…
exit: use(r7)
enter: def(r7)
t231 := r7
…
r7 := t231
exit: use(r7)
![Page 16: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/16.jpg)
A Realistic Exampleenter:
c := r3
a := r1
b := r2
d := 0
e := a
loop:
d := d+b
e := e-1
if e>0 goto loop
r1 := d
r3 := c
return /* r1,r3 */
r1, r2 caller save
r3 callee-save
enter: /* r2, r1, r3 */
c := r3 /* c, r2, r1 */
a := r1 /* a, c, r2 */
b := r2 /* a, c, b */
d := 0 /* a, c, b, d */
e := a / * e, c, b, d */
loop:
d := d+b /* e, c, b, d */
e := e-1 /* e, c, b, d */
if e>0 goto loop /* c, d */
r1 := d /* r1, c */
r3 := c /* r1, r3 */
return /* r1,r3 */
![Page 17: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/17.jpg)
A Realistic Example(cont)
enter: /* r2, r1, r3 */
c := r3 /* c, r2, r1 */
a := r1 /* a, c, r2 */
b := r2 /* a, c, b */
d := 0 /* a, c, b, d */
e := a / * e, c, b, d */
loop:
d := d+b /* e, c, b, d */
e := e-1 /* e, c, b, d */
if e>0 goto loop /* c, d */
r1 := d /* r1, c */
r3 := c /* r1, r3 */
return /* r1,r3 */
![Page 18: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/18.jpg)
Graph Coloring with Coalescing
Build: Construct the interference graph
Simplify: Recursively remove non MOVE nodes
with less than K neighbors; Push removed nodes into stack
Potential-Spill: Spill some nodes and remove nodes
Push removed nodes into stack
Select: Assign actual registers (from simplify/spill stack)
Actual-Spill: Spill some potential spills and repeat the process
Coalesce: Conservatively merge unconstrained MOV related nodes with fewer that K “heavy” neighbors
Freeze: Give-Up Coalescing on some low-degree MOV related nodes
![Page 19: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/19.jpg)
enter: /* r2, r1, r3 */
c := r3 /* c, r2, r1 */
a := r1 /* a, c, r2 */
b := r2 /* a, c, b */
d := 0 /* a, c, b, d */
e := a / * e, c, b, d */
loop:
d := d+b /* e, c, b, d */
e := e-1 /* e, c, b, d */
if e>0 goto loop /* c, d */
r1 := d /* r1, c */
r3 := c /* r1, r3 */
return /* r1,r3 */
use+def
outside
loop
use+def
within
loop
deg spill
priority
a 2 0 4 0.5
b 1 1 1 2.75
c 2 0 6 0.33
d 2 2 4 5.5
e 1 3 3 10.3
spill = (uo + 10 ui)/deg
![Page 20: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/20.jpg)
Spilling c
c
stackstack
![Page 21: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/21.jpg)
Coalescing a+e
c
stack
c
stack
![Page 22: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/22.jpg)
Coalescing b+r2
c
stack
c
stack
![Page 23: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/23.jpg)
Coalescing ae+r1
c
stack
c
stack
r1ae and d are constrained
![Page 24: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/24.jpg)
Simplifying d
d
c
stack
c
stack
![Page 25: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/25.jpg)
Pop d
d
c
stack
c
stack
d is assigned to r3
![Page 26: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/26.jpg)
Pop c
c
stackstack
actual spill!
c
![Page 27: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/27.jpg)
enter: /* r2, r1, r3 */
c1 := r3 /* c1, r2, r1 */
M[c_loc] := c1 /* r2 */
a := r1 /* a, r2 */
b := r2 /* a, b */
d := 0 /* a, b, d */
e := a / * e, b, d */
loop:
d := d+b /* e, b, d */
e := e-1 /* e, b, d */
if e>0 goto loop /* d */
r1 := d /* r1 */
c2 := M[c_loc] /* r1, c2 */
r3 := c2 /* r1, r3 */
return /* r1,r3 */
enter: /* r2, r1, r3 */
c := r3 /* c, r2, r1 */
a := r1 /* a, c, r2 */
b := r2 /* a, c, b */
d := 0 /* a, c, b, d */
e := a / * e, c, b, d */
loop:
d := d+b /* e, c, b, d */
e := e-1 /* e, c, b, d */
if e>0 goto loop /* c, d */
r1 := d /* r1, c */
r3 := c /* r1, r3 */
return /* r1,r3 */
![Page 28: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/28.jpg)
enter: /* r2, r1, r3 */
c1 := r3 /* c1, r2, r1 */
M[c_loc] := c1 /* r2 */
a := r1 /* a, r2 */
b := r2 /* a, b */
d := 0 /* a, b, d */
e := a / * e, b, d */
loop:
d := d+b /* e, b, d */
e := e-1 /* e, b, d */
if e>0 goto loop /* d */
r1 := d /* r1 */
c2 := M[c_loc] /* r1, c2 */
r3 := c2 /* r1, r3 */
return /* r1,r3 */
![Page 29: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/29.jpg)
Coalescing c1+r3; c2+c1r3
stackstack
![Page 30: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/30.jpg)
Coalescing a+e; b+r2
stackstack
![Page 31: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/31.jpg)
Coalescing ae+r1; Simplify d
d
stackstack
![Page 32: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/32.jpg)
Pop d
stack
d
stack
d
a r1
b r2
c1 r3
c2 r3
d r3
e r1
![Page 33: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/33.jpg)
enter:
c1 := r3
M[c_loc] := c1
a := r1
b := r2
d := 0
e := a
loop:
d := d+b
e := e-1
if e>0 goto loop
r1 := d
c2 := M[c_loc]
r3 := c2
return /* r1,r3 */
a r1
b r2
c1 r3
c2 r3
d r3
e r1
enter:
r3 := r3
M[c_loc] := r3
r1 := r1
r2 := r2
r3 := 0
r1 := r1
loop:
r3 := r3+r2
r1 := r1-1
if r1>0 goto loop
r1 := r3
r3 := M[c_loc]
r3 := r3
return /* r1,r3 */
![Page 34: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/34.jpg)
enter:
r3 := r3
M[c_loc] := r3
r1 := r1
r2 := r2
r3 := 0
r1 := r1
loop:
r3 := r3+r2
r1 := r1-1
if r1>0 goto loop
r1 := r3
r3 := M[c_loc]
r3 := r3
return /* r1,r3 */
enter:
M[c_loc] := r3
r3 := 0
loop:
r3 := r3+r2
r1 := r1-1
if r1>0 goto loop
r1 := r3
r3 := M[c_loc]
return /* r1,r3 */
![Page 35: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/35.jpg)
Graph Coloring Implementation• Two kinds of queries on interference graph
– Get all the nodes adjacent to node X– Is X and Y adjacent?
• Significant save is obtained by not explicitly representing adjacency lists for machine registers– Need to be traversed when nodes are coalesced
• Heuristic conditions for coalescing temporary X with machine register R:– For every T that is a neighbor of X any of the following
is true(guarantee that the number of neighbors of $T$ are not increased from <K to =K):
– T already interferes with R– T is a machine register– Degree(T) < K
![Page 36: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/36.jpg)
Worklist Management• Use work-list to avoid expensive iterative
queries
• Maintain various invariants as data structures are updated
• Implement work-lists as doubly linked lists to allow element-deletion
![Page 37: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/37.jpg)
Graph Coloring Expression Trees• Drastically simpler
• No need for data flow analysis
• Optimal solution can be computed (using dynamic programming)
![Page 38: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/38.jpg)
Simple Register Allocation
int n := 0
function SimpleAlloc(t) {
for each non trivial tile u that is a child of t
SimpleAlloc(u)
for each non trivial tile u that is a child of t
n := n - 1
n := n + 1
assign rn to hold the value of t
}
![Page 39: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/39.jpg)
Bad Example
PLUS
BINOP
MEM BINOP
NAME a TIMES MEM MEM
NAME b NAME c
![Page 40: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/40.jpg)
Two Phase SolutionDynamic Programming
Sethi & Ullman
• Bottom-up (labeling)– Compute for every subtree
• The minimal number of registers needed
• Top-Down– Generate the code using labeling by preferring
“heavier” subtrees (larger labeling)
![Page 41: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/41.jpg)
The Labeling Algorithmvoid label(t) {
for each tile u that is a child of t do label(u)
if t is trivial then need[t] := 0
else if t has two children, left and right then
if need[left] = need[right]
then need[t] := need[left] + 1
else need[t] := max(1, need[left], need[right])
else if t has one child, c then
need[t] := max(1, need[c])
else if t has no children then need[t] := 1
}
![Page 42: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/42.jpg)
int n := 0
void StehiUllman(t) {
if t has two children,left and right then
if need[left]>K and need[right]>K then
StehiUllman (right) ;
n := n – 1 ;
spill: emit instruction to store reg[right]
StehiUllman (left)
unspill: emit instruction to fetch reg[right]
else if need[left]> need[right] then
StehiUllman (left) ; StehiUllman (right) ; n := n - 1
else if need[right]> need[left] then
StehiUllman (right) ; StehiUllman (left) ; n := n – 1
reg[t] := rn ; emit(OPER, instruction[t], reg[t], reg[left], reg[right])
else if t has one child c then
StehiUllman(c) ; reg[t] : = rn; emit(OPER, instruction[t], reg[t], reg[c])
else if t is non trivial but has no children
n := n + 1 ; reg[t] := rn ; emit (OPER, instruction[t], reg[t])
else if t a trivial node TEMP(ri) then reg[t] := ri
![Page 43: Register Allocation](https://reader036.fdocuments.us/reader036/viewer/2022062423/568149a2550346895db6e367/html5/thumbnails/43.jpg)
Summary
• Two Register Allocation Methods– Local of every IR tree
• Simultaneous instruction selection and register allocation
• Optimality (under certain conditions)
– Global of every function• Applied after instruction selection
• Performs well for machines with many registers
• Can handle instruction level parallelism
• Missing– Interprocedural allocation