CMPE 511 Computer Architecture
A Faster Optimal Register Allocator
Betül Demiröz
8 December 2005
Outline
Motivation of the Study
Register Allocation Problem
Classical Methods (Chaitin & Briggs)
Optimal Register Allocator
Experimental Study
Motivation of the Study
Challenges of compilers for embedded systems:
Power consumption and memory-space limitations
A small set of applications
The compiler can therefore afford long compilation times to generate high-quality code in its various phases:
instruction selection
instruction scheduling
register allocation
Motivation of the Study (2)
Instruction selection: selecting target machine instructions to implement primitive IR (intermediate representation) instructions; affects the quality of the generated code
Instruction scheduling: ordering the operations in the compiled code; reduces the running time of the compiled code
Register Allocation
Problem: assigning program variables to the available registers; this shapes the runtime performance of the compiled code
Failure to provide an efficient register allocation leads to:
an increase in the number of memory accesses
an increase in code size (affecting the memory capacity and overall form factor of the device)
an increase in power consumption (frequent memory accesses due to poor register allocation)
Register Allocation (2)
NP-complete (Garey & Johnson, 1976)
Approaches:
Graph coloring: Chaitin (1981)
Integer programming: Goodwin and Wilken (1996)
Graph Coloring
The traditional solution to the register allocation problem.
An interference graph represents the symbolic registers: each node represents a symbolic register (live range), and an edge between two nodes shows that both are live at the same point in the program.
Adjacent nodes must be colored with different colors, where each color corresponds to a real register.
Graph Coloring (2)
Spilling: when there are not enough registers, a variable is stored in memory for some or all of its lifetime
Spill cost: the runtime cost of loading a variable from memory and storing it back, which depends on:
address computation, memory operations, and execution frequency
Live Ranges
A variable $V_i$ is live at a point $p$ in the program if it is defined above $p$ and has not yet been used for the last time.
Live range ($LR_i$): begins with the definition of $V_i$ and ends with the last use of $V_i$
If $LR_i$ and $LR_j$ are simultaneously live at some point $p$, then $LR_i$ interferes with $LR_j$, and the two cannot be stored in the same register.
Interference graph $G_I = G(V, E)$:
$V$ = set of individual live ranges
$E$ = set of edges that represent interferences
Source code:
int main() {
    int a;
    int b;
    int i;
    a = 10;
    b = 1;
    i = 0;
    while (i <= a) {
        b += b * i;
        i++;
        if (b >= 100) break;
    }
    return 0;
}
GAS (GNU Assembler) output:
main:
    pushl %ebp
    movl %esp, %ebp
    subl $24, %esp
    andl $-16, %esp
    movl $0, %eax
    subl %eax, %esp
    movl $10, -4(%ebp)
    movl $1, -8(%ebp)
    movl $0, -12(%ebp)
.L2:
    movl -12(%ebp), %eax
    cmpl -4(%ebp), %eax
    jle .L4
    jmp .L3
    .....
Extended representation:
main:
    subl $4, t1 (t2)
    movl t3, t2
    movl t2, t3 (t4)
    subl $24, t2 (t5)
    andl $-16, t5 (t6)
    movl $0, t7
    subl t7, t6 (t8)
    movl $10, t4
    movl $1, t4
    movl $0, t4
.L2:
    movl t4, t7 (t9)
    cmpl t4, t9
    .....
Interference graph (figure): nodes t1 through t15
Classical Methods for Register Allocation
Register allocators based on graph coloring:
Chaitin’s heuristic (limitations on diamond graphs)
Optimistic coloring heuristic (Briggs)
Stack-Based Methods
Chaitin’s Heuristic
Here k is the number of available registers (colors) and deg(v) is the current degree of v in G_I.
Initialize stack S to empty.
while (G_I is not empty) do
  while (there is a vertex v in G_I such that deg(v) < k) do
    Pick any vertex v such that deg(v) < k.
    Remove v and its edges from G_I and push v on S.
  if (G_I is not empty) then
    Pick a vertex v based on the given spill metric.
    Spill the live range associated with v.
    Remove v and its edges from G_I.
while (S is not empty) do
  v = pop(S)
  Color v with the lowest color not used by any neighbor of v.
Chaitin-Briggs Heuristic (Optimistic Coloring Heuristic, OCH)
Initialize stack S to empty.
while (G_I is not empty) do
  while (there is a vertex v in G_I such that deg(v) < k) do
    Pick any vertex v such that deg(v) < k.
    Remove v and its edges from G_I and push v on S.
  if (G_I is not empty) then
    Pick a vertex v based on the given spill metric.
    Push v on S (optimistically, instead of spilling it).
    Remove v and its edges from G_I.
while (S is not empty) do
  v = pop(S)
  Color v with the lowest color not used by any neighbor of v.
  If v cannot be colored, then pick an uncolored node to spill, spill it, and restart from the beginning.
Comparison of Chaitin’s Heuristic and OCH
Try to find a 2-coloring (k = 2) of the diamond-shaped interference graph with nodes A, B, C, D (figure)
Chaitin: A spilled, B -> r1, C -> r2, D -> r1
OCH: A -> r1, B -> r2, C -> r1, D -> r2
Integer Programming (IP)
Compared with graph coloring, IP:
increases program performance
reduces code size
The time to solve a register allocation problem can be significant, so the IP formulation should be as simple as possible.
Optimal Register Allocator (ORA)
ORA uses IP to solve the register allocation problem
Proposed by Goodwin and Wilken (1996)
The IP model is very complex because it contains many redundancies
As a result, solving the problem is slow
A Faster Optimal Register Allocator
“A Faster Optimal Register Allocator” also uses IP to solve the register allocation problem
Fu, Wilken and Goodwin (2005)
The proposed approach uses global and local analysis techniques to identify locations where spill and deallocation decisions are unnecessary
It uses a simplified IP formulation, which is faster to solve
Basic ORA Model
Control Flow Graph and ORA Graphs
Basic ORA Model
Models register allocation as a set of network graphs:
symbolic-register graphs
memory graphs
An optimal allocation is obtained by selecting the set of graph edges with the minimal total cost
Cost = allocation overhead of a decision
IP Formulation
Redundancy
Global Reduction
Eliminates unnecessary load, store and deallocation decisions placed at the diverge and merge edges in the live range graphs
Decisions at these edges make up 80% of the total decisions generated by the ORA model
Decision Placement
Diamond Region Reductions
There are four reduction techniques that can eliminate unnecessary load, store and deallocation decisions:
Void region coupling (void region, coupled decision, paired decision)
Symmetric decision selection
Jump-edge nullification
Asymmetric decision elimination
Local Reduction
Examines symbolic registers used in adjacent instructions to identify unnecessary load and deallocation decisions
Constraint Reduction
Deallocation constraints
Must-allocate constraints
Single-symbolic constraints
Liveness constraints
Deallocation Constraints
Used to allow a real register to be deallocated from a symbolic register at the deallocation decision location $p$:
$X^{r}_{s,p-1} \ge X^{r}_{s,p}$
$X^{r}_{s,p-1}$ (a 0-1 variable) represents the allocation state of real register $r$ to symbolic register $s$ before the deallocation decision location $p$, and $X^{r}_{s,p}$ represents the allocation state after $p$.
Must-allocate Constraint
Used to ensure that a symbolic register is allocated to a real register at each definition and each use:
$\sum_{r} X^{r}_{s,p} \ge 1$
For optimal allocation, if no deallocation exists between two must-allocate constraints for a symbolic register, then the second must-allocate constraint is redundant.
Single-symbolic Constraint
Used to ensure that a real register is allocated to at most one symbolic register:
$\sum_{s} X^{r}_{s,p} \le 1$
For optimal allocation, if no deallocation exists between two adjacent single-symbolic constraints for a real register, then the first single-symbolic constraint is redundant.
Liveness Constraint
Used to ensure the liveness of a symbolic register, i.e., that it resides in a real register or in memory:
$\sum_{r} X^{r}_{s,p} + X^{mem}_{s,p} \ge 1$
$X^{mem}_{s,p}$ represents the allocation state of symbolic register $s$ to memory at the liveness constraint location $p$.
Experimental Study
Compares graph coloring, ORA and Faster ORA
For ORA and Faster ORA, the SPEC CPU2000 and SPEC CPU92 integer benchmark suites are used on a RISC processor
SPEC CPU92 Benchmark Functions
Number of decision variables and constraints produced by basic ORA and Faster ORA
Dynamic spill-code saved using Faster ORA
Dynamic spill code components for SPEC CPU2000
Conclusion
Two different solutions to the register allocation problem:
integer programming
graph coloring
The formulations and uses of these solutions were presented
Faster ORA reduces the number of register allocation IP decision variables compared to the basic IP formulation
IP gives better results than graph coloring
References
G. Chaitin, M. Auslender, et al., “Register allocation via coloring,” Computer Languages, 1981.
D. Goodwin and K. Wilken, “Optimal and near-optimal global register allocation using 0-1 integer programming,” Software: Practice and Experience, 1996.
C. Fu, K. Wilken and D. Goodwin, “A Faster Optimal Register Allocator,” Journal of Instruction-Level Parallelism 7, 2005.
Thank You
Any questions?