Garbage collection in shared-environment closure reducers: Space-efficient depth first copying using...

Information Processing Letters 56 (1995) 1-7

Garbage collection in shared-environment closure reducers: Space-efficient depth first copying using a tailored approach

Stephen Thomas ¹

Department of Computer Science, University of Nottingham, University Park, Nottingham NG7 2RD, United Kingdom

Received 30 May 1995; revised 14 July 1995 Communicated by R.S. Bird

Abstract

Implementations of abstract machines such as the OP-TIM and the PG-TIM need to use a tailored garbage collector which seems to require an auxiliary stack, with a potential maximum size that is directly proportional to the amount of live data in the heap. However, it turns out that it is possible to build a recursive copying collector that does not require additional space, by reusing already-scavenged space. This paper describes that technique.

Keywords: TIM; Garbage collection; Closure reduction; Abstract architectures

1. Introduction

Memory organisation in the TIM [2], PG-TIM [4] and OP-TIM [5] abstract architectures is very simple. The basic unit of information is the pair (I, E), usually called a closure. I is a reference to the closure’s information table, which in turn contains executable code C, and E is the closure’s environment, which contains the values of all free variables referenced in C. This environment, also called a frame, is itself an array of closures. This situation is shown pictorially in Fig. 1.

The problem is that different closures may share the same physical environment, but the executable code sequences of these closures may each use different selections of free variables. This in turn means that there may be some closures in any given environment which are not used at all, and should not be preserved in a garbage collection.

¹ Email: [email protected].

The way to solve this problem is to associate with each sequence of executable code a preserver, which identifies for the garbage collector exactly which closures need to be followed up in any frame using this code. Rules for synthesising preservers at the same time that abstract code is generated are given in [5].
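As a minimal illustration of why shared frames need preservers, the sketch below models a preserver as a set of slot indices. The names `frame`, `preservers` and `live_slots` are hypothetical and chosen only for this example; they do not come from the TIM implementations.

```python
# Hypothetical model: a frame is a list of slots, and each code sequence
# carries a preserver, the set of slot indices its code actually reads.
frame = ["x", "y", "z", "w"]          # one physical environment, four slots

preservers = {
    "code_a": {0, 2},                 # code_a reads slots 0 and 2 only
    "code_b": {2, 3},                 # code_b reads slots 2 and 3 only
}

def live_slots(code_names):
    """Union of the slots required by every live closure sharing the frame."""
    needed = set()
    for name in code_names:
        needed |= preservers[name]
    return sorted(needed)

# If only a closure running code_a is reachable, slots 1 and 3 are garbage.
print(live_slots(["code_a"]))             # [0, 2]
print(live_slots(["code_a", "code_b"]))   # [0, 2, 3]
```

Slot 1 is never preserved in either case, even though the frame itself is live.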

This need for such selective slot preservation has an important consequence. Because the garbage collector cannot assume that every slot in a frame is visible merely because the frame itself is visible, it is very difficult to see how a Cheney-style stackless copying collector [1] could be implemented for this kind of heap organisation. This technique maintains a queue of scavenged-but-not-yet-followed-up heap objects, so that once an object is identified as live, it is added to the end of the queue. However, objects to be followed up are taken from the beginning of the queue. Hence, there is a temporal separation between marking a live object, and following up any pointers within a live object, between which any number of other objects

0020-0190/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved

SSDI 0020-0190(95)00131-X


Fig. 1. A closure and its environment frame.

may be marked and/or followed up. This is perfectly safe, provided that it is known that everything referenced from a live object is truly needed. If this information is uncertain, then problems are encountered. It is entirely possible to encounter a closure that refers to a frame that has been scavenged and followed up, but requires some closures beyond those already scavenged. The only option available to a Cheney-style collector is to rewind the queue to the offending frame, which in turn means that all the frames which have been properly followed up since the offending frame was originally processed have to be taken into account. Any time-efficient method to avoid rescanning these would involve maintaining a stack of some kind.

So, assuming that some kind of stack is always required, it makes sense to do away with the complexity that modifying a Cheney-style collector would entail, and use a straightforward depth-first scan of the heap, using the stack to control the traversal in a standard fashion. Now there need be no separation between marking a frame as live, and following through the required closures.

The purpose of this paper is to describe, within the context of a copying garbage collection strategy, a way of implementing a closure’s preserver as directly executable code which can be called from a similar preserver. Although a depth-first heap scanning process is used, the auxiliary stack that is implied by this is stored in the garbage collector’s fromspace, in space no longer required. This ensures that any space overheads imposed by the collector are eliminated.

2. Executable preservers

In its most abstract form, the preserver information for a code sequence is a set P which denotes a collection of slots {k1, k2, ..., kn} that are required in that code sequence. Conceptually, a closure becomes a pair (I, E), where I ≡ (P, C).

However, an important point to note is that for any particular sequence C in a program, its corresponding preserver P is completely known at compile time, which means that the compiler also knows exactly what actions would be undertaken by the garbage collector if it was required to preserve the contents of a frame associated with C. This in turn leads to the idea that the garbage collector could be specialised for a particular preserver, and so represent preservers as directly executable code.

In this fashion, the top level action of the garbage collector would be to call the preservers for each closure in the root set, and each preserver would recursively call the preserver of each required slot in a frame.

A complication to consider is the sharing of frames, where different closures viewing the same frame may require slightly different sets of required slots - but there may be a high degree of commonality, too. This means that there are two issues to cope with:

• The first thing a preserver must do is ensure that the frame it is using has been marked as live. However, if it already has been so marked, then it cannot assume that there is nothing further to do and return to its caller - the frame may have been marked by a preserver needing a different slot set. So, irrespective of whether a frame is already live or not, a preserver must always attempt to follow up its required closures.

• A consequence of this is that as well as marking frames as live, it is necessary to have some way of marking individual closures as visited, so that attempts to follow up closures visited by another preserver can be avoided.

The remainder of this paper describes how executable recursive preservers can be implemented for a copying garbage collector, without the implicit need for a resumption stack to cope with the recursion incurring any additional space penalties, and yet still handling the above constraints efficiently. This is done by presenting the instruction set for GCM, a copying


garbage collection abstract machine.
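Before turning to GCM itself, the two constraints above can be sketched at a high level. The following is a minimal model, assuming hypothetical `Frame` and `Closure` classes and an explicit `visited` flag; it relies on Python's own call stack for the recursion, which is exactly the auxiliary stack that the technique in this paper avoids.

```python
class Frame:
    def __init__(self, n_slots):
        self.slots = [None] * n_slots   # each slot holds a Closure or None
        self.forward = None             # tospace copy, once allocated

class Closure:
    def __init__(self, preserver, frame):
        self.preserver = preserver      # slot indices this code requires
        self.frame = frame
        self.visited = False

def preserve(old):
    """Return a tospace copy of `old`, following only its required slots."""
    frame = old.frame
    if frame.forward is None:                    # first visitor copies the frame
        frame.forward = Frame(len(frame.slots))
    new = Closure(old.preserver, frame.forward)
    old.visited = True                           # mark before recursing (cycles)
    for k in old.preserver:                      # followed even if frame was live
        child = frame.slots[k]
        if not child.visited:                    # skip work done by another preserver
            frame.forward.slots[k] = preserve(child)
    return new

# Two closures share frame `shared` but need different slots.
empty = Frame(0)
shared = Frame(3)
shared.slots = [Closure((), empty), Closure((), empty), Closure((), empty)]
a = Closure((0,), shared)
b = Closure((0, 2), shared)

new_a = preserve(a)
first_copy = shared.forward.slots[0]
new_b = preserve(b)
```

After both calls the copied frame holds slots 0 and 2 only: slot 0 was copied once, by the preserver of `a`, and slot 1 was never required by either closure.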

3. An abstract machine for garbage collection

The following discussion of GCM is in three parts. Firstly, there is a description of GCM’s state registers and the way it expects the heap to be organised. Secondly, there is a description of each of the GCM instructions. Finally, the marking mechanism is described in greater detail, causing the GCM instruction set description to be revised.

3.1. GCM registers and memory organisation

GCM has a number of registers for holding state information:

oldc, newc: On entry to a preserver, these are pointers to the fromspace closure which caused that preserver to be invoked, and to its corresponding tospace closure respectively.

frm, newf: These are pointers to the fromspace frame currently being scavenged, and its corresponding tospace frame.

rlk: This is the return link pointer.

rsp: This is the resumption stack pointer, and refers to a pair of words. The first word contains the previous value of rlk, before it gained its current value, and the second word contains the previous value of rsp. Hence, rsp forms a stack containing return link pointers implemented as a linked list of word pairs, threaded through the second word of each pair.

hpp: This is the tospace heap pointer, indicating where the next copied frame will be placed.

It is assumed that the pointers refer to words, and that a closure is a pair of words. The notation [p] indicates the value of the word at pointer p. If p is a frame pointer, then [p] is a forward reference, a pointer to the corresponding tospace frame (or NULL if no such frame has been allocated yet), and [p + 1] is the size of the frame, in words, including the header information. Consequently, for a frame p, the closures are at locations p + 2, p + 4, and so on. In general, a pointer to the closure in slot i of the frame pointed to by f is given by the expression f + ((i × 2) + 2).
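The frame layout just described can be made concrete with a small sketch that treats the heap as a Python list of words. The helper names `slot_address` and `write_frame`, and the use of 0 for NULL, are assumptions made for illustration only.

```python
NULL = 0
HEADER_WORDS = 2      # word 0: forward reference, word 1: frame size in words

def slot_address(f, i):
    """Word address of the closure in slot i of the frame at address f."""
    return f + ((i * 2) + HEADER_WORDS)

def write_frame(heap, f, closures):
    """Lay out a frame at address f from a list of (info, env) word pairs."""
    heap[f] = NULL                                   # no tospace copy yet
    heap[f + 1] = HEADER_WORDS + 2 * len(closures)   # size includes the header
    for i, (info, env) in enumerate(closures):
        p = slot_address(f, i)
        heap[p], heap[p + 1] = info, env

heap = [0] * 16
write_frame(heap, 4, [(111, 40), (222, 50)])
print(heap[4:10])   # [0, 6, 111, 40, 222, 50]
```

A two-slot frame therefore occupies six words: two header words followed by two closure pairs.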

The first word of a closure pair points at an information table. In a complete implementation this table would contain a number of things. For the purposes of our discussion, however, it contains only two pieces of information.

The first word is a jump instruction to the sequence of GCM instructions implementing the preserver. ² Directly following the jump is another sequence of instructions, representing the actual closure code itself. During normal evaluation, the preserver jump instruction is always skipped when a closure’s code is entered.

3.2. GCM instruction set

Because GCM is designed to do a highly specific task, it only has five instructions, and these may not be arbitrarily composed. In outline, for a preserver P containing slots {k1, k2, ..., kn}, the code generated resembles:

            GCCopy
            GCMark
            GCCall    k1, L1
    L1      GCRecover k1
            GCCall    k2, L2
    L2      ...
    Ln-1    GCRecover kn-1
            GCCall    kn, Ln
    Ln      GCReturn

The way these instructions work is described below.
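As a rough sketch of how a compiler might emit this pattern from a slot list, consider the generator below. The function name `preserver_code`, the textual instruction format, and the treatment of an empty slot set (just GCCopy, GCMark, GCReturn) are assumptions, not something the paper specifies.

```python
def preserver_code(slots):
    """Return (label, instruction) pairs for a preserver over `slots`."""
    code = [(None, "GCCopy"), (None, "GCMark")]
    for n, k in enumerate(slots, start=1):
        if n > 1:
            # Recover frm/newf using the slot visited by the previous call.
            code.append((f"L{n - 1}", f"GCRecover {slots[n - 2]}"))
        code.append((None, f"GCCall {k}, L{n}"))
    # The final label carries GCReturn; with no slots there is no label.
    code.append((f"L{len(slots)}" if slots else None, "GCReturn"))
    return code

for label, instr in preserver_code([0, 2]):
    print(f"{label or '':4}{instr}")
```

For the slot set {0, 2} this yields GCCopy, GCMark, GCCall 0, L1, then L1 GCRecover 0, GCCall 2, L2, and finally L2 GCReturn.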

3.2.1. The GCCopy instruction

Any preserver expects three registers to be defined on entry. Firstly, it expects that rlk contains the return address to jump to once this preserver call is complete, and also that oldc points to the fromspace closure from which the currently executing preserver was invoked. Register newc refers to the corresponding closure in

² In less accommodating machine architectures, where an arbitrary jump instruction cannot be encoded into a single word, this would be a word pointing at the appropriate jump instruction, or at the preserver code itself.


Fig. 2. Allocating a tospace frame (Before and After views of fromspace and tospace, showing oldc, newc and newf).

tospace which will contain the preserved version of that closure.

The first instruction in a preserver is GCCopy, which ensures that a portion of tospace is available to hold the preserved version of oldc’s frame, as shown in Fig. 2. The pseudo-code to do this is shown below:

    frm := [oldc + 1]
    newf := [frm]
    if newf = NULL then
        newf := hpp
        hpp := hpp + [frm + 1]
        [frm] := newf
        [newf] := NULL
        [newf + 1] := [frm + 1]
    endif
    [newc] := [oldc]
    [newc + 1] := newf

The effect of this code is first to set frm to point at the fromspace frame. Then it checks if that frame has a forward reference. If not, it allocates some space in the tospace, pointed to by newf, and sets frm’s forward reference to point at this newly allocated frame. If there is already a forward reference, then newf is set from this. After newf is properly set up, the closure at newc is initialised so that its information table points to oldc’s, and its environment refers to newf.
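The GCCopy steps translate almost directly into executable form. The sketch below assumes the heap is a Python list of words, the registers are held in a dict, and NULL is 0; the function name `gc_copy` and the addresses used are illustrative only.

```python
NULL = 0

def gc_copy(heap, regs):
    """Apply the GCCopy steps to `heap` (a list of words) and `regs` (a dict)."""
    frm = heap[regs["oldc"] + 1]          # environment word of the old closure
    newf = heap[frm]                      # forward reference, if any
    if newf == NULL:                      # frame has no tospace copy yet
        newf = regs["hpp"]
        regs["hpp"] += heap[frm + 1]      # reserve the frame's size in words
        heap[frm] = newf                  # install the forward reference
        heap[newf] = NULL
        heap[newf + 1] = heap[frm + 1]    # copy the size field
    heap[regs["newc"]] = heap[regs["oldc"]]   # same information table
    heap[regs["newc"] + 1] = newf             # environment is the new frame
    regs["frm"], regs["newf"] = frm, newf

heap = [0] * 48
heap[2:4] = [77, 4]          # closure at 2: info word 77, frame at 4
heap[4:8] = [NULL, 4, 55, 0] # one-slot frame: header plus one closure pair
heap[10:12] = [88, 4]        # a second closure sharing the same frame
regs = {"oldc": 2, "newc": 20, "hpp": 22, "frm": 0, "newf": 0}

gc_copy(heap, regs)          # allocates the tospace frame at 22
regs["oldc"], regs["newc"] = 10, 30
gc_copy(heap, regs)          # finds the forward reference, allocates nothing
```

The second call reuses the forward reference installed by the first, so both tospace closures end up pointing at the same copied frame.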

3.2.2. The GCMark instruction

The next task of a preserver, before visiting its first closure, is to ensure that the closure at oldc is marked as visited, so that another preserver will not make an unnecessary attempt to redo the work of this preserver. This has to be done before any slots are visited in case the frame contains cyclic references.

Also, since closures are about to be recursively visited, any registers which need to be protected across recursive calls should be saved. Clearly rlk must be one of these registers. Apart from that, however, only the contents of oldc need to be saved, for reasons that will become clear as the discussion unfolds.

In fact, only the contents of the register oldc itself need to be saved - the contents of the closure referenced from oldc are never needed again. This insight allows us to re-use the closure’s space, as shown in Fig. 3, using the code below.

    [oldc] := rlk
    [oldc + 1] := rsp
    rsp := oldc

Superficially, this seems odd, because the value of oldc does not seem to be saved. In fact, it is regenerated just before the preserver returns to its caller, using the GCReturn instruction described below. The fact that closure pairs are re-used in this way explains the somewhat exotic stack representation.


Fig. 3. Marking a closure (Before and After views, showing rlk and oldc).

Finally, since the contents of the closure at oldc have been overwritten, this effectively marks that closure. The effect of the GCMark instruction is shown in Fig. 3. How this prevents subsequent revisits to oldc’s closure is described later.

3.2.3. The GCCall instruction

Having copied oldc’s frame, saved any return information and marked the closure at oldc as visited, the preserver should now visit any slots that are required. GCCall k, l visits the closure in slot k of frm, arranging that the GCM code at label l be jumped to upon return. This involves setting up a return address in rlk, setting oldc and newc to point at corresponding closures in frm and newf, and then jumping to the preserver referenced from the new closure at oldc, if it has not already been marked as visited. If it has been visited already, then GCCall should simply jump to rlk directly. This process is shown in the code below.

    rlk := l
    oldc := frm + ((k × 2) + 2)
    newc := newf + ((k × 2) + 2)
    if ismarked(oldc) then
        jump rlk
    else
        jump [oldc]
    endif

Note that the code assumes that the first slot in a frame is slot 0.

3.2.4. The GCRecover instruction

A GCRecover k instruction should be the first instruction executed upon return from a recursive preserver call, if the calling preserver wishes to visit another closure in its frame. Here, k should be the slot number of the closure just visited in this preserver, which will be pointed to by oldc. GCRecover uses this to determine where the fromspace frame containing that closure starts, and hence where the corresponding tospace frame starts, in preparation for the next call to GCCall:

    frm := oldc - ((k × 2) + 2)
    newf := [frm]
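GCRecover is simply the inverse of the slot-address calculation, as the following small sketch shows. The dict-based register file and the addresses are assumptions chosen for the example.

```python
def gc_recover(heap, regs, k):
    """GCRecover k: rebuild frm and newf from oldc, which points at slot k."""
    regs["frm"] = regs["oldc"] - ((k * 2) + 2)   # undo the slot offset
    regs["newf"] = heap[regs["frm"]]             # forward reference in word 0

heap = [0] * 32
heap[10] = 20                                    # frame at 10 was copied to 20
regs = {"oldc": 10 + ((3 * 2) + 2), "frm": 0, "newf": 0}   # just visited slot 3
gc_recover(heap, regs, 3)
print(regs["frm"], regs["newf"])                 # 10 20
```

Because frm can be recomputed this way, it does not need to be saved across a recursive call.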

3.2.5. The GCReturn instruction

The final instruction is used upon return from a recursive visit, when there are no more slots to visit. At this point the preserver should return to its caller, ensuring that oldc points to the originating closure, in preparation for any subsequent GCRecover in the caller. This involves setting oldc to the current value of rsp, restoring rlk and rsp from the values stored at rsp, and finally jumping to rlk.

    oldc := rsp
    rlk := [rsp]
    rsp := [rsp + 1]
    jump rlk

The effect of GCReturn is to partially undo the effect of the GCMark instruction, so that the pointers oldc and rsp are restored to the Before state in Fig. 3.
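The pairing of GCMark and GCReturn can be checked with a small sketch in which the resumption stack is threaded through overwritten closure pairs. The list-of-words heap, the dict of registers and the numeric return addresses are assumptions made for illustration.

```python
def gc_mark(heap, regs):
    """GCMark: push (rlk, rsp) into the closure pair at oldc."""
    heap[regs["oldc"]] = regs["rlk"]
    heap[regs["oldc"] + 1] = regs["rsp"]
    regs["rsp"] = regs["oldc"]

def gc_return(heap, regs):
    """GCReturn: pop the pair at rsp and return the address to jump to."""
    regs["oldc"] = regs["rsp"]
    regs["rlk"] = heap[regs["rsp"]]
    regs["rsp"] = heap[regs["rsp"] + 1]
    return regs["rlk"]

heap = [0] * 16
regs = {"oldc": 4, "rlk": 100, "rsp": 0}
gc_mark(heap, regs)                       # outer preserver marks closure at 4
regs["oldc"], regs["rlk"] = 8, 200
gc_mark(heap, regs)                       # nested preserver marks closure at 8
regs["oldc"], regs["rlk"] = 12, 999       # registers clobbered by deeper work

inner = gc_return(heap, regs)             # restores oldc = 8, rlk = 200
after_inner = dict(regs)
outer = gc_return(heap, regs)             # restores oldc = 4, rlk = 100
```

No word outside the two marked closure pairs is written, which is the sense in which the stack costs no additional space.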


Note that it is impossible to have any kind of tail-call optimisation in this situation, whereby oldc is restored before making the last recursive visit in the preserver. This is because GCCalling the last closure would immediately corrupt the restored oldc.

3.3. Marking closures

The mechanism by which simply overwriting a closure using the GCMark instruction causes a closure to be protected against further visits is left somewhat vague in the discussion above. The GCCall instruction description implies that there is a run-time check made each time a closure is about to be visited, to see if it is necessary or not.

It turns out that this inefficient run-time check can be eliminated quite simply. The important observation which enables this is that when GCMark overwrites the closure at oldc, the I portion of that closure is overwritten by rlk, which will always be a pointer to a GCM sequence generated by a GCCall instruction. All that is necessary is to make any GCM sequence that rlk could point to “resemble” a normal information table. The first word of this sequence should be a jump instruction, and the normal GCM code should just follow this instruction as normal.

To achieve this a new GCIgnore instruction is introduced. This new instruction is placed just before any GCRecover or GCReturn instruction:

            GCCopy
            GCMark
            GCCall    k1, L1
    L1      GCIgnore
            GCRecover k1
            GCCall    k2, L2
    L2      GCIgnore
            ...
    Ln-1    GCIgnore
            GCRecover kn-1
            GCCall    kn, Ln
    Ln      GCIgnore
            GCReturn

This modification has an effect on the GCCall and GCReturn instructions. The GCCall instruction loses its conditional statement:

    rlk := l
    oldc := frm + ((k × 2) + 2)
    newc := newf + ((k × 2) + 2)
    jump [oldc]

The GCReturn instruction is modified so that it skips over the GCIgnore instruction at the start of the destination sequence:

    oldc := rsp
    rlk := [rsp]
    rsp := [rsp + 1]
    jump rlk + 1

Now that GCCall uses an unconditional jump, the code executed as a result will either be a normal preserver, or it will be a GCIgnore instruction executed by attempting to revisit a marked closure. In the latter situation, GCIgnore should simply cause an immediate return to rlk:

    jump rlk + 1

Note that both GCReturn and GCIgnore expect that GCIgnore can be expressed in only one word.
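To see the revised instruction set working end to end, here is a small simulator sketch. It makes several simplifying assumptions that are not in the paper: code lives in a separate Python list, a closure's I word holds the address of its preserver directly (collapsing the jump word of the information table), NULL is 0, a hypothetical `Halt` instruction follows a top-level GCIgnore, and an empty slot set compiles to GCCopy, GCMark, GCReturn.

```python
NULL = 0

def assemble(code, slots):
    """Append a preserver for `slots` to `code`; return its entry address."""
    entry = len(code)
    code += [("GCCopy",), ("GCMark",)]
    for n, k in enumerate(slots):
        if n > 0:
            code += [("GCIgnore",), ("GCRecover", slots[n - 1])]
        code.append(("GCCall", k, len(code) + 1))   # label = next instruction
    if slots:
        code.append(("GCIgnore",))
    code.append(("GCReturn",))
    return entry

def run(heap, code, root_old, root_new, hpp, halt):
    """Preserve one root closure; return the final tospace heap pointer."""
    r = dict(oldc=root_old, newc=root_new, frm=NULL, newf=NULL,
             rlk=halt, rsp=NULL, hpp=hpp)
    pc = heap[root_old]                     # enter the root's preserver
    while True:
        op = code[pc]
        if op[0] == "GCCopy":
            r["frm"] = heap[r["oldc"] + 1]
            r["newf"] = heap[r["frm"]]
            if r["newf"] == NULL:           # frame not copied yet
                r["newf"] = r["hpp"]
                r["hpp"] += heap[r["frm"] + 1]
                heap[r["frm"]] = r["newf"]
                heap[r["newf"]] = NULL
                heap[r["newf"] + 1] = heap[r["frm"] + 1]
            heap[r["newc"]] = heap[r["oldc"]]
            heap[r["newc"] + 1] = r["newf"]
            pc += 1
        elif op[0] == "GCMark":             # stack lives in the old closure
            heap[r["oldc"]] = r["rlk"]
            heap[r["oldc"] + 1] = r["rsp"]
            r["rsp"] = r["oldc"]
            pc += 1
        elif op[0] == "GCCall":
            _, k, label = op
            r["rlk"] = label
            r["oldc"] = r["frm"] + ((k * 2) + 2)
            r["newc"] = r["newf"] + ((k * 2) + 2)
            pc = heap[r["oldc"]]            # preserver, or a GCIgnore if marked
        elif op[0] == "GCIgnore":
            pc = r["rlk"] + 1
        elif op[0] == "GCRecover":
            r["frm"] = r["oldc"] - ((op[1] * 2) + 2)
            r["newf"] = heap[r["frm"]]
            pc += 1
        elif op[0] == "GCReturn":
            r["oldc"] = r["rsp"]
            r["rlk"] = heap[r["rsp"]]
            r["rsp"] = heap[r["rsp"] + 1]
            pc = r["rlk"] + 1
        else:                               # "Halt"
            return r["hpp"]

code = [("GCIgnore",), ("Halt",)]           # hypothetical top-level return point
leaf = assemble(code, [])
pa = assemble(code, [0])
pb = assemble(code, [0, 2])

heap = [0] * 64                             # fromspace 1..31, tospace 32..63
heap[4:12] = [NULL, 8, leaf, 12, leaf, 12, pa, 4]   # shared frame, three slots
heap[12:14] = [NULL, 2]                     # empty frame used by the leaves
heap[20:22] = [pb, 4]                       # root closure, shares the frame
hpp = run(heap, code, root_old=20, root_new=32, hpp=34, halt=0)
```

In this run the closure in slot 2 shares the root's frame and also requires slot 0; its GCCall lands on the GCIgnore left behind by the earlier visit and returns immediately. Slot 1, which no preserver requires, is never copied.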

4. Conclusions

This technique has been used in two separate implementations built by the author. The earliest was in the PG-TIM implementation described in [4], and based around the ARM architecture [7]. This implementation also had GCM variants which used a true auxiliary stack, and so behaved much more like conventional subroutines. More recently, a GCM variant has been used in an implementation of the OP-TIM [5]. This implementation is targeted on the SPARC architecture [3], but is considerably more sophisticated than the description given in this paper, since the OP-TIM is able to store environment frames on the stack, as well as the heap.

The author’s experience of using variants of GCM is that re-using visited closures to form the resumption stack gives no performance advantage over using


a separate auxiliary stack (although any implementation that represents preservers as executable code gives better performance than a collector which interprets tables of preserver information). However, the absence of a separate stack means that correct termination of a garbage collection is guaranteed. Using a separate stack, it would need to have a maximum size proportional to the heap semispace size in order to guarantee correct termination.

What has yet to be explored is how this technique can be applied to more sophisticated garbage collection approaches, such as generational collection [6]. The author’s suspicion is that this may be easier said than done.

Acknowledgements

I would like to thank my wife Jane, without whom I could not be where I am today. Richard Jones provided the initial spur for me to get this paper written, and Mark Jones has provided many helpful comments on this and a variety of other topics.

References

[1] C.J. Cheney, A non-recursive list compacting algorithm, Comm. ACM 13 (1970) 677-678.

[2] J. Fairbairn and S. Wray, TIM: A simple, lazy abstract machine to execute supercombinators, in: Proc. 1987 Conf. on Functional Programming and Computer Architecture, Portland, OR (1987).

[3] SPARC International, Inc., The SPARC Architecture Manual (Prentice-Hall, Englewood Cliffs, NJ, 1992).

[4] S. Thomas, The pragmatics of closure reduction, Ph.D. Thesis, The Computing Laboratory, University of Kent at Canterbury, 1993.

[5] S. Thomas, The OP-TIM - A better PG-TIM, Tech. Rept. NOTTCS-TR-95-7, Dept. of Computer Science, University of Nottingham, 1995.

[6] D.M. Ungar, Generation scavenging: A non-disruptive high performance storage reclamation algorithm, ACM SIGPLAN Notices 19 (1984) 157-167; also published as ACM Software Engineering Notes 9, 3 (1984).

[7] VLSI Technology, Inc., Acorn RISC Machine Family Data Manual (Prentice-Hall, Englewood Cliffs, NJ, 1990).