Date posted: 21-Dec-2015

Register Tracking

• Register tracking improves on a simple code generator

• Uses a simple local register allocation scheme in which the contents of allocatable registers are tracked

• Allows allocation of frequently accessed variables and temporaries to registers

• It is not optimal

• We generate code from tuples

• n ≥ 2 registers must be available for allocation

Machine Instructions and Cost

• Load Storage, Reg -- Cost = 2

• Store Storage, Reg -- Cost = 2

• OP Storage, Reg -- Reg = Reg OP Storage; Cost = 2

• OP Reg1, Reg2 -- Reg2 = Reg2 OP Reg1; Cost = 1
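The cost model above can be turned into a small calculator. The Python sketch below is only an illustration: the function names `is_register`, `instr_cost` and `sequence_cost`, and the convention that register names look like `r1`, `r2`, ..., are assumptions and not part of the slides.

```python
def is_register(operand):
    """Assumed convention: registers are r1, r2, ...; anything else is storage."""
    return operand.startswith("r") and operand[1:].isdigit()


def instr_cost(instr):
    """Cost of one instruction under the model above.

    Load and Store always touch storage (cost 2). Any other opcode costs 1
    when its first operand is a register and 2 when it is a storage location.
    """
    opcode, operands = instr.split(None, 1)
    src, _dst = [part.strip() for part in operands.split(",")]
    if opcode in ("Load", "Store"):
        return 2
    return 1 if is_register(src) else 2


def sequence_cost(code):
    """Total cost of a list of instructions."""
    return sum(instr_cost(instr) for instr in code)


# Storage-operand form versus register-register form of the same product.
print(sequence_cost(["Load b, r1", "Mul c, r1"]))                 # 4
print(sequence_cost(["Load b, r1", "Load c, r2", "Mul r2, r1"]))  # 5
```

The second sequence costs more in isolation; it only pays off if the extra register copy of `c` is reused later in the block.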

Status Flags

• L (live) or D (dead)

– A variable or temp that is live will be referenced again in the basic block

• S (to be saved) or NS (not to be saved)

– A variable should always be saved at the end of a basic block if it has not already been stored in memory

– Temps are not saved after they become dead
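As a rough illustration of the L/D flag, the sketch below scans forward through a block of quadruples to decide whether a name is read again before it is overwritten. The helper names `uses_and_def` and `is_live_after`, the tuple layout, and the four-tuple sample block are assumptions made for this example.

```python
def uses_and_def(quad):
    """Split a quadruple into (names it reads, name it writes)."""
    if quad[0] == "=":                # (=, source, target)
        _, src, dst = quad
        return [src], dst
    _, left, right, result = quad     # (op, left, right, result)
    return [left, right], result


def is_live_after(quads, index, name):
    """True if `name` is read after quads[index] before being overwritten."""
    for quad in quads[index + 1:]:
        uses, defined = uses_and_def(quad)
        if name in uses:
            return True               # read again: live (L)
        if name == defined:
            return False              # overwritten first: dead (D)
    return False                      # never referenced again in the block


# Hypothetical block, numbered from 0.
quads = [
    ("*", "b", "c", "t1"),
    ("=", "t1", "a"),
    ("+", "a", "c", "t2"),
    ("=", "t2", "a"),
]

print(is_live_after(quads, 1, "a"))   # True: a is read by the third tuple
print(is_live_after(quads, 2, "a"))   # False: the next reference assigns to a
```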

Cost of Var/Temp

• 0 if status is (D,NS), as when the next reference to it is an assignment of a new value

• 0 if status is (D,S): the variable won't be used again, but it hasn't been saved, so there is no extra cost in freeing the register and doing the save immediately

• 2 if status is (L,NS): a load is needed to restore it to a register

• 4 if status is (L,S): a store is needed to save the value, and a load is needed to restore the value to a register
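The four cases can be written as a lookup table. In the sketch below, the names `FREE_COST`, `free_cost` and `cheapest_to_free`, and the layout of the association list as a dict of `register -> (name, liveness, save)`, are assumptions for illustration.

```python
# Cost of giving up a register, keyed by (liveness, save) status.
FREE_COST = {
    ("D", "NS"): 0,   # dead and nothing to save
    ("D", "S"): 0,    # dead; the store is due anyway
    ("L", "NS"): 2,   # a later load is needed
    ("L", "S"): 4,    # a store now and a load later
}


def free_cost(liveness, save):
    return FREE_COST[(liveness, save)]


def cheapest_to_free(assoc):
    """Pick the register whose contents are cheapest to give up.

    `assoc` maps register -> (name, liveness, save). Ties go to the
    register that appears first in the dict.
    """
    return min(assoc, key=lambda reg: free_cost(*assoc[reg][1:]))


assoc = {
    "r1": ("t1", "L", "S"),
    "r2": ("c", "L", "NS"),
    "r3": ("t2", "D", "NS"),
}
print(cheapest_to_free(assoc))   # r3
```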

Cost Algorithm

cost = (U is in a register ? 0
          : get_reg_cost() + 2)        /* cost to get R1 and load U into it */
     + cost(R1)                        /* cost of losing U's value in R1 */
     + (V is in a register || U == V
          ? 1 : 2)                     /* cost of register-to-register */
                                       /* vs storage-to-register */
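A minimal Python sketch of this cost formula for a tuple (OP, U, V, W) follows. It reads the middle term as the cost of giving up U's value in its register, which is the reading that reproduces the costs worked out in the example slides. The function name `tuple_cost` and its parameters are assumptions, and `get_reg_cost` is passed in as a plain number instead of being computed.

```python
FREE_COST = {("D", "NS"): 0, ("D", "S"): 0, ("L", "NS"): 2, ("L", "S"): 4}


def tuple_cost(u, v, in_register, u_status, get_reg_cost=0):
    """Estimated cost of generating (OP, u, v, w) with the result replacing u.

    in_register  -- set of names currently held in registers
    u_status     -- (liveness, save) status of u once this tuple is done
    get_reg_cost -- assumed cost of obtaining a register (0 if one is free)
    """
    load_u = 0 if u in in_register else get_reg_cost + 2
    lose_u = FREE_COST[u_status]            # the result overwrites u's register
    operate = 1 if (v in in_register or u == v) else 2
    return load_u + lose_u + operate


# (*, b, c, t1) with empty registers: 2 + 2 + 2
print(tuple_cost("b", "c", set(), ("L", "NS")))              # 6
# (+, c, t4, t5) versus (+, t4, c, t5) with a, c, t4, e in registers
held = {"a", "c", "t4", "e"}
print(tuple_cost("c", "t4", held, ("L", "NS")))              # 3
print(tuple_cost("t4", "c", held, ("D", "NS")))              # 1
```

For a commutative operator, the generator would evaluate both operand orders and keep the cheaper one, as the second pair of calls shows.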

Example Basic Block

a = b * c + d * e;

d = c + (d - b);

f = e + a + c;

a = d + e;

Quadruples

1. (*, b, c, t1)
2. (*, d, e, t2)
3. (+, t1, t2, t3)
4. (=, t3, a)
5. (-, d, b, t4)
6. (+, c, t4, t5)
7. (=, t5, d)
8. (+, e, a, t6)
9. (+, t6, c, t7)
10. (=, t7, f)
11. (+, d, e, t8)
12. (=, t8, a)

Unoptimized Code

1. Load b, r1
2. Mul c, r1
3. Load d, r2
4. Mul e, r2
5. Add r2, r1
6. Store a, r1
7. Load d, r1
8. Sub b, r1
9. Load c, r2
10. Add r1, r2
11. Store d, r2
12. Load e, r1
13. Add a, r1
14. Add c, r1
15. Store f, r1
16. Load d, r1
17. Add e, r1
18. Store a, r1

Register Tracking - 1

Tuple / Code Generated        r1         r2         r3         r4

(*,b,c,t1)
  cost(*,b,c,t1) = 2+2+2
  cost(*,c,b,t1) = 2+2+2
  Load b,r1                   b(L,NS)    -          -          -
  Load c,r2                   b(L,NS)    c(L,NS)    -          -
  Mul r2,r1                   t1(L,S)    c(L,NS)    -          -

(*,d,e,t2)
  cost(*,d,e,t2) = 2+2+2
  cost(*,e,d,t2) = 2+2+2
  Load d,r3                   t1(L,S)    c(L,NS)    d(L,NS)    -
  Load e,r4                   t1(L,S)    c(L,NS)    d(L,NS)    e(L,NS)
  Mul r4,r3                   t1(L,S)    c(L,NS)    t2(L,S)    e(L,NS)

Register Tracking - 2

Tuple / Code Generated        r1         r2         r3         r4

(+,t1,t2,t3)
  cost(+,t1,t2,t3) = 0+0+1
  cost(+,t2,t1,t3) = 0+0+1
  Add r3,r1                   t3(L,S)    c(L,NS)    t2(D,NS)   e(L,NS)
  -- (D,NS) can be immediately removed

(=,t3,a)
  -- The store is deferred    a(L,S)     c(L,NS)    -          e(L,NS)

(-,d,b,t4)
  Load d,r3                   a(L,S)     c(L,NS)    d(D,NS)    e(L,NS)
  Sub b,r3                    a(L,S)     c(L,NS)    t4(L,S)    e(L,NS)
  -- b is dead after this tuple

Register Tracking - 3

Tuple / Code Generated        r1         r2         r3         r4

(+,c,t4,t5)
  cost(+,c,t4,t5) = 0+2+1
  cost(+,t4,c,t5) = 0+0+1
  Add r2,r3                   a(L,S)     c(L,NS)    t5(L,S)    e(L,NS)

(=,t5,d)
  -- The store is deferred    a(L,S)     c(L,NS)    d(L,S)     e(L,NS)

(+,e,a,t6)
  cost(+,e,a,t6) = 0+2+1
  cost(+,a,e,t6) = 0+0+1
  -- a is dead after this
  Add r4,r1                   t6(L,S)    c(L,NS)    d(L,S)     e(L,NS)

Register Tracking - 4

Tuple / Code Generated        r1         r2         r3         r4

(+,t6,c,t7)
  cost(+,t6,c,t7) = 0+0+1
  cost(+,c,t6,t7) = 0+0+1
  Add r2,r1                   t7(L,S)    c(D,NS)    d(L,S)     e(L,NS)

(=,t7,f)
  Store f,r1                  t7(D,NS)   -          d(L,S)     e(L,NS)
  -- Store since f is not live in block

Register Tracking - 5

Tuple / Code Generated        r1         r2         r3         r4

(+,d,e,t8)
  cost(+,d,e,t8) = 0+0+1
  cost(+,e,d,t8) = 0+0+1
  Store d,r3                  -          -          d(L,NS)    e(L,NS)
  -- Store is unavoidable
  Add r4,r3                   -          -          t8(L,S)    e(L,NS)

(=,t8,a)
  Store a,r3
  -- Store is unavoidable

Optimized Code

1. Load b,r1
2. Load c,r2
3. Mul r2,r1
4. Load d,r3
5. Load e,r4
6. Mul r4,r3
7. Add r3,r1
8. Load d,r3
9. Sub b,r3
10. Add r2,r3
11. Add r4,r1
12. Add r2,r1
13. Store f,r1
14. Store d,r3
15. Add r4,r3
16. Store a,r3

Effects of Aliasing

• Let N be a name that can alias data objects.

• N can be a:

– reference parameter

– pointer

– indexed variable

• For N we compute a set O of data objects that it may alias, i.e. the set of all:

– variables of a given type

– heap objects of a given type

– array elements

Reference to N

When N is referenced:

• Examine register association list

• If any data object o ∈ O appears with status S, the store must be completed before N is referenced

Assignment to N

When assignment to N is made:

• If any data object o ∈ O appears in the register association list, it must be removed.

• Removal reflects that the assignment to N may have changed the value of o, invalidating the value currently held in the register associated with o.
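The two aliasing rules can be sketched as operations on the register association list. In the Python below, the dict layout `register -> (name, liveness, save)` and the function names `reference_alias` and `assign_alias` are assumptions for illustration; the set `may_alias` plays the role of O.

```python
def reference_alias(assoc, may_alias):
    """Before N is read: complete pending stores for every o in O.

    Returns the Store instructions to emit and marks those objects NS.
    """
    stores = []
    for reg, (name, live, save) in list(assoc.items()):
        if name in may_alias and save == "S":
            stores.append(f"Store {name}, {reg}")
            assoc[reg] = (name, live, "NS")
    return stores


def assign_alias(assoc, may_alias):
    """After N is assigned: drop every o in O, its register copy may be stale."""
    for reg in [r for r, (name, _, _) in assoc.items() if name in may_alias]:
        del assoc[reg]


assoc = {
    "r1": ("x", "L", "S"),
    "r2": ("y", "L", "NS"),
    "r3": ("z", "L", "S"),
}
O = {"x", "y"}                      # objects that N may alias

print(reference_alias(assoc, O))    # ['Store x, r1']
assign_alias(assoc, O)
print(assoc)                        # {'r3': ('z', 'L', 'S')}
```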

Subprogram Calls

Allocatable registers are normally saved and restored across subprogram calls.

Caller Saves and Restores Registers

• Save (L,S)

• Don’t save (L,NS)

• Then, free all registers

• On return, only those variables that are needed are reloaded into registers

Callee Saves and Restores Registers

• Callee knows how many registers callee will use

• Possibly, some number of caller's registers will not be used and therefore can remain untouched across the subprogram call

• Divide callee registers to be used into two groups:

– Def: set of variables that may be defined (updated)

– Use: set of variables that may be used (referenced only)

• Before the call, save all data objects o ∈ Use that appear in the register association list with status S

• Similarly, remove all data objects o ∈ Def from the register association list

• That is, save values that may be referenced during the call, and remove associations that will be invalidated by assignments during the call
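The Use/Def treatment before a call might look like the following Python sketch. The function name `prepare_call` and the dict layout `register -> (name, liveness, save)` are assumptions; how the Use and Def sets are obtained is outside the sketch.

```python
def prepare_call(assoc, use, define):
    """Update the association list before a call and return the stores to emit.

    use    -- names the callee may read: pending stores must be completed
    define -- names the callee may write: register copies become invalid
    """
    stores = []
    for reg, (name, live, save) in list(assoc.items()):
        if name in use and save == "S":
            stores.append(f"Store {name}, {reg}")
            assoc[reg] = (name, live, "NS")
        if name in define:
            del assoc[reg]
    return stores


assoc = {
    "r1": ("a", "L", "S"),
    "r2": ("c", "L", "NS"),
    "r3": ("d", "L", "S"),
}
print(prepare_call(assoc, use={"a", "c"}, define={"c"}))   # ['Store a, r1']
print(assoc)   # {'r1': ('a', 'L', 'NS'), 'r3': ('d', 'L', 'S')}
```

Names that are in neither set, such as `d` here, keep their registers and their pending stores across the call.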

Register Tracking Variations

• Spill register whose next reference is most distant

• Consider next two references

• Coloring Algorithm (Chaitin 82)

– Register allocation becomes a problem of coloring the graph

– Each color is a register

• Use more precise costing based on instruction size or timing

• Use additional address modes (immediate, indirect, indexed+base, etc)

• Consider different register classes and pairs of adjacent registers
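The first variation, spilling the register whose next reference is most distant, can be sketched as below. The function name `pick_spill`, the `register -> name` dict, and the flat list of upcoming references are assumptions for illustration.

```python
def pick_spill(assoc, upcoming):
    """Choose the register whose contents are referenced furthest in the future.

    assoc    -- register -> name currently held
    upcoming -- names referenced by the remaining tuples, in order
    A name that is never referenced again counts as infinitely distant.
    """
    def distance(reg):
        name = assoc[reg]
        return upcoming.index(name) if name in upcoming else float("inf")

    return max(assoc, key=distance)


assoc = {"r1": "a", "r2": "c", "r3": "d"}
print(pick_spill(assoc, ["c", "a", "c", "d", "a"]))   # r3
print(pick_spill(assoc, ["c", "d", "c"]))             # r1 (a is never used again)
```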

Cutting Cost

Trade:

load c, r1
sub a, r1
load c, r2
add b, r2
add r2, r1

cost = 9

For:

load c, r2
sub a, r1
add b, r2
add r2, r1

cost = 7

Global Register Tracking

• Special operands such as loop indices or procedure arguments can be allocated to fixed registers that can span multiple blocks

• We can carry register status information forward to basic blocks where predecessors are unique, i.e., if and case statements

– It is not always a good idea to delay saves across block boundaries (we might do saves in several successor blocks instead of once in a single predecessor block)

Global Difficulties

• We must agree in all blocks to keep a given data object in the same register or do extra moves

• Must decide which of the possibly thousands of variables and temps to keep in a small set of available registers

• Need to do flow-of-control analysis and use some mechanism to estimate frequency of reference to variables and temps

• Free an (L,NS) variable v1 at a cost of only 1 (instead of 2) if the next time it is referenced we use OP v1, r2 instead of Load v1, r1 followed by OP r1, r2

• Similarly, free an (L,S) variable v2 at a cost of only 3 (instead of 4) if we do a save and later reference the value as we did for v1 above

• This is a peephole optimization technique

Peephole Optimization