J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le...

34

Transcript of J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le...

Page 1: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

J. Functional Programming 1 (1): 1{000, January 1993 c 1993 Cambridge University Press 1Deriving a Lazy Abstract MachinePeter SestoftDepartment of Mathematics and PhysicsRoyal Veterinary and Agricultural UniversityThorvaldsensvej 40, DK-1871 Frederiksberg C, DenmarkE-mail: [email protected] 6 of March 13, 1996AbstractWe derive a simple abstract machine for lazy evaluation of the lambda calculus, startingfrom Launchbury's natural semantics. Lazy evaluation here means non-strict evaluationwith sharing of argument evaluation, that is, call-by-need. The machine we derive is alazy version of Krivine's abstract machine, which was originally designed for call-by-nameevaluation. We extend it with datatype constructors and base values, so the �nal machineimplements all dynamic aspects of a lazy functional language.1 IntroductionThe development of an e�cient abstract machine for lazy evaluation usually startsfrom either a graph reduction machine or an environment machine.Graph reduction machines perform substitution by rewriting the term graph, thatis, the program itself. Sharing of argument evaluation in an application is imple-mented by sharing the subgraph representing the argument term; this is `obviouslycorrect'. However, the rewriting of program subterms precludes e�cient implement-ation on sequential imperative computers, so graph reduction machines must bere�ned to give e�cient implementations. The machines become more complicatedand less obviously correct; cf. the G-machine (Augustsson, 1984; Johnsson, 1984).Environment machines perform substitution by updating the environment, map-ping variables to terms, instead of modifying the program. They resemble sequentialimperative computers and therefore are `obviously e�cient'. However, they imple-ment call-by-name evaluation, and so must be modi�ed to implement sharing ofargument evaluation. The machines become more complicated and less obviouslye�cient; cf. the Three Instruction Machine TIM (Fairbairn & Wray, 1987).Ingenious and e�cient hybrids of graph reduction machines and environmentmachines exist, but since they are improvements of the others, they do not havetheir initial simplicity; cf. the Spineless Tagless G-machine (Peyton Jones, 1992).In this paper we take a di�erent approach. We develop an abstract machine forlazy evaluation from the natural semantics published by Launchbury (1993). Likethe graph reduction schemes, this semantics accounts for sharing from the outset,yet an abstract machine can be derived from it easily.

Page 2: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

2 P. Sestoft1.1 ContributionsWe show that known implementation techniques can be developed from a publishednatural semantics for lazy evaluation. We do so in a number of re�nement steps,proving the correctness of the non-trivial ones.Initially, we consider a simple language: the lambda calculus augmented withmutually recursive let-bindings. From a natural semantics for this language, weobtain an abstract machine, which is a naturally lazy version of the Krivine machine(Curien, 1988). The machine has four instructions, is simple, close to a sequentialcomputer, and potentially e�cient.Extending the language and its natural semantics to include datatype construct-ors and base values, we show that well-known and e�cient implementations canbe derived for these too. The latter steps may be seen as reverse engineering oftechniques used in the Spineless Tagless G-machine (Peyton Jones, 1992).Thus the development covers all dynamic aspects of a lazy functional language,but not the static aspects, such as type inference. We hope this demonstrates thata lazy functional language implementation can be at the same time clearly correct,small, understandable, and e�cient.1.2 MotivationOne goal of this paper is pedagogical: to understand lazy evaluation. The extensionalproperties of lazy functional languages are simple and elegant; lazy languages sat-isfy more laws than strict ones. Conversely, the intensional aspects, such as spaceconsumption and evaluation order, are harder to understand. In practice, intensionalaspects are as important as extensional ones; an engineer should be able to estimatethe properties, including time and space requirements, of programs, just as he orshe understands the properties of other artefacts (bridges, power lines, computerhardware, etc.). The study of a model implementation, as developed in this paper,may help students understand the intensional aspects of lazy languages.Similarly, a model implementation is useful for describing and experimenting withimplementation design choices, in particular the representation of environments.Another goal of the paper is to provide a foundation for intensional program ana-lyses. For instance, an evaluation order analysis may determine at compile-time (anapproximation to) the evaluation order of subexpressions. Any semantic foundationfor such an analysis must incorporate a notion of evaluation order, and should betrue to a real implementation. An abstract machine with close links to an operationalsemantics seems a good candidate for a foundation.The present work has already been put to such use. Sansom and Peyton Jones(1995) devised a cost semantics , an instrumentation of Launchbury's semantics withtime and space costs. Using a modi�ed version of our derivation and proof theyshow that an abstract machine with pro�ling, close to the actual implementation,is correct with respect to the cost semantics. Working at the level of the semantics,rather than that of the abstract machine, enables a precise discussion of alternativecost attributions.

Page 3: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 31.3 OutlineIn Section 2 we present a small lazy functional language and its operational se-mantics, following Launchbury. In Section 3 we �rst derive a simple abstract machinewith an evaluation stack and prove its correctness with respect to the semantics.We then introduce an environment to avoid modifying the program being evalu-ated, and replace variable names by de Bruijn indices. In Section 4 we identify andsolve a problem with space consumption in the abstract machine. We then extendthe language, the semantics, and the abstract machine with algebraic datatypes inSection 5 and with base values in Section 6. In Section 7 we evaluate the resultingabstract machine, and in Section 8 we discuss related work.2 A lazy language and its natural semanticsBy lazy evaluation we understand normal order reduction to weak head normal form(whnf), with sharing of argument evaluation.2.1 SyntaxLaunchbury (1993) presents a natural semantics for lazy evaluation of so-callednormalized lambda expressions:e ::= �x:e j e x j x j let x1 = e1; : : : ; xn = en in eThe argument in an application must be a variable x; this ensures sharing of argu-ment evaluation by requiring that non-trivial argument expressions are let-bound(and by treating let-bound expressions appropriately). General lambda expressionsmay be transformed into `normalized' ones by introduction of new let-bindings. Let-bindings are simultaneous and recursive, and all xi in a let-binding must be distinct.We write let fxi = eig e for the let-binding let x1 = e1; : : : ; xn = en in e.Throughout, � denotes syntactical identity of expressions, and e[e0=x] denotesna��ve simultaneous substitution of e0 for all free occurrences of x in e:x[e0=x] � e0y[e0=x] � y if x 6� y(�x:e)[e0=x] � �x:e(�y:e)[e0=x] � �y:e[e0=x] if x 6� y(e1 e2)[e0=x] � (e1[e0=x]) (e2[e0=x])(let fxi = eig e)[e0=x] � let fxi = eig e if 9i:x � xi(let fxi = eig e)[e0=x] � let fxi = ei[e0=x]g e[e0=x] if 8i:x 6� xiNo implicit renaming of bound variables is assumed. We write e[pi=xi] for the sim-ultaneous substitution e[p1=x1; : : : ; pn=xn], where the x1; : : : ; xn must be distinct.2.2 Launchbury's semanticsA heap or store � = f: : : ; x 7! e; : : :g is a mapping from variables x to expressionse. By dom� we denote the heap's domain (the set of variables x bound by �),

Page 4: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

4 P. Sestoft� : �x:e + � : �x:e Lam� : e + � : �y:e0 � : e0[x=y] + � : w-------------------------------------------------------------------------------------------------------------------------------------------------------------------------� : e x + � : w App� : e + � : w-------------------------------------------------------------------------------------------------------------------------�[x 7! e] : x + �[x 7! w] : bw Var�[xi 7! ei] : e + � : w-----------------------------------------------------------------------------------------------------------------� : let fxi = eig e + � : w LetFig. 1. Launchbury's natural semantics for lazy evaluationby rng � its range (the set of expressions e bound in �), and by �[x 7! e] theheap which maps x to e and any other variable y to �[y]. We write �[pi 7! ei] for�[p1 7! e1; : : : ; pn 7! en].A con�guration � : e consists of a heap � and an expression e to be evaluated. Ajudgement � : e + � : w says that in the heap �, the expression e will evaluate tothe value w, producing the new heap �. Launchbury de�nes the relation + on con-�gurations by a set of inference rules, that is, an operational semantics, reproducedin Figure 1. He proved this semantics correct with respect to a denotational one.Rule Lam: A lambda abstraction �x:e is already a value (in whnf) and thereforeevaluates to itself; the heap � is unmodi�ed.Rule App: An application (e x) is evaluated in heap � by evaluating e to a lambdaabstraction �y:e0, producing a new heap �. The argument x is substituted for theformal parameter y in e0, and e0[x=y] is evaluated in the heap � to obtain the �nalresult w and �nal heap �.Rule Var : A variable x which is bound to expression e in heap �[x 7! e] isevaluated by evaluating e in � to obtain a value w and new heap �. To achievelaziness (that is, avoid re-evaluation of e), the heap must be updated with thereduced value w, giving the heap �[x 7! w]. This duplicates the expression wand might cause name clashes later, so all bound variables in the other copy of ware replaced by fresh variables, as indicated by the renaming notation bw. Observethat e is evaluated in heap � which has no binding for x. If the evaluation of e towhnf requires x, then the evaluation fails, indicating a `black hole', a direct self-dependency in the program, as in let x = x in x or let x = y; y = x in x.Rule Let : A let-binding let fxi = eig in e is evaluated by binding the expressionsei to the variables xi, and then evaluating e in the resulting heap �[xi 7! ei].A program is a closed expression e in which all bound variables are distinct. Thevalue w of a program e is computed by �nding a derivation of fg : e + � : w. Aneasy induction shows that the result w, if any, will be a lambda abstraction �y:e0.The semantics rules are deterministic modulo variable renaming, and essentially

Page 5: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 5sequential: to build a derivation tree, one must determine the �nal heap of anyleft-hand premise before proceeding to any right-hand premise.2.3 Properties of Launchbury's semanticsThe semantics rules adequately model lazy evaluation: no function argument isevaluated more than once, and no unevaluated expression gets duplicated.Namely, in a normalized expression (see Section 2.1), any non-trivial functionargument e must be bound to a let-variable x, and no such let-bound expressionis evaluated to whnf twice. At its �rst use, the let-bound x is removed from theheap �, evaluated, and then rebound to the whnf w of e. An attempt to refer to xagain before its rebinding will fail; rule Var is not applicable unless x is bound inthe heap. After the rebinding, every subsequent use of x will just retrieve the whnffrom the heap, via rules Var and Lam.Furthermore, only expressions which are in whnf are ever duplicated. In the Apprule, e0[x=y] substitutes a variable x for another variable y, and therefore can du-plicate only variables. In the Var rule, the duplicated expression w is a whnf of form�y:e0.Thus the rules model lazy evaluation. However, the duplication of w � �y:e0in rule Var violates full laziness (Peyton Jones, 1987, Chapter 15), since e0 maycontain a redex with no free occurrences of y. Such a redex may be wastefully re-evaluated at each application of w � �y:e0. If desired, full laziness can be obtained byintroducing a new let-binding for every maximal free expression: recursively replace�y:(: : : e : : :), where e is a maximal free expression, by let x = e in �y:(: : : x : : :),where x is a fresh variable.The semantics must avoid variable capture in the na��ve substitution e0[x=y] in theApp rule, and must avoid rebinding of any variable x already bound in heap � inthe Var rule. Launchbury ensures this by requiring all bound variables in a programe to be distinct, and by renaming all bound variables at the only point where anexpression w containing bound variables may be duplicated: the Var rule.However, this renaming strategy is unsuitable as basis for an abstract machinedesign. One problem is that variable freshness is not locally checkable in the Varrule of Figure 1, because a binding x 7! e may have been deleted from � in aprevious application of the Var rule. To see this, evaluate the expressionlet s = �z:z in let p = (q s); q = (�y:let r = y in r) in pin an empty heap. When applying the Var rule to evaluate and then re-bind qduring the evaluation of p, it seems possible to rename r to p, because p occurs inno local expression or heap. But this would cause trouble later when re-binding theevaluated p in the heap �, invalidating Launchbury's (1993, page 149) Theorem 1about distinct naming.Hence to check freshness, one must inspect all of the derivation tree built so far,which is undesirable. Furthermore, it is unnecessary to rename all bound variablesof w in the Var rule; a more economical approach is adequate, and preferable in animplementation.

Page 6: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

6 P. Sestoft� : �x:e +A � : �x:e Lam� : e +A � : �y:e0 � : e0[p=y] +A � : w--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------� : e p +A � : w App� : e +A[fpg � : w------------------------------------------------------------------------------------------------------------------------------�[p 7! e] : p +A �[p 7! w] : w Var�[pi 7! ei] : e +A � : w------------------------------------------------------------------------------------------------------------------------ (�)� : let fxi = eig e +A � : w Let(�) In the Let rule, the variables p1; : : : ; pn must be distinct and fresh: they must notoccur in A or � or let fxi = eig e. The notation e means e[p1=x1; : : : ; pn=xn].Fig. 2. Revised natural semantics for lazy evaluation2.4 Revising the semanticsWe therefore change the renaming machinery, obtaining the revised semantics inFigure 2. It is identical to Launchbury's in all respects other than renaming. In thenext section we show that it avoids variable capture.First, we make freshness locally checkable by extending each judgement with theset A of the variables x left out of � in the premise of the Var rule. Intuitively, Ais the set of variables whose values are currently being computed.Secondly, we move the renaming from the Var rule to the Let rule, and renameonly let-bound variables. Renaming let x1 = e1; : : : ; xn = en in e with fresh vari-ables p1; : : : ; pn gives the expression let p1 = e1; : : : ; pn = en in e, where e de-notes the result of the na��ve substitution e[p1=x1; : : : ; pn=xn]. Since all xi are dis-tinct, this is well-de�ned. A variable p is fresh if it does not occur in A or � orlet x1 = e1; : : : ; xn = en in e. Intuitively, the introduction of fresh variables corres-ponds to allocation of unused heap addresses p1; : : : ; pn, and is therefore naturallydone in the Let rule rather than the Var rule.The value (whnf) w of a closed expression e is computed by �nding a derivationof fg : e +fg � : w using the revised semantics rules.Henceforth we distinguish the heap pointers p introduced by the Let rule from theprogram variables x originating from the expression to be evaluated. The soundnessof this distinction follows from the following observation: in a derivation tree forevaluation of a closed expression, all occurrences of heap pointers p in expressionsare free (with p possibly bound in the heap), and all occurrences of program variablesx are bound by let or lambda. This is proved below.

Page 7: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 72.5 Properties of the revised semanticsWe must show that there is no variable capture in the substitution e0[p=y] in ruleApp, and no rebinding of a variable p already bound in heap � in rule Var .For a given expression e, let Bv(e) denote its let- or lambda-bound variables andFv(e) its free variables, and extend this notation to con�gurations:Bv(� : e) = Bv(e) [ Sf Bv(e0) j e0 2 rng � gFv(� : e) = Fv(e) [ Sf Fv(e0) j e0 2 rng � gThe revised semantics satis�es: In a derivation of fg : e +fg � : w where e is closed,all bound variables are program variables x from e, and all free variables are heappointers p introduced by the Let rule. We now formalize this.De�nition 1Let A be a set of variables. The con�guration �0 : e0 is A-good if(1) A and dom�0 are disjoint;(2) Fv(�0 : e0) � A [ dom�0; and(3) Bv(�0 : e0) and A [ dom�0 are disjoint.In the context of a judgement �0 : e0 +A �1 : e1, these requirements say: (1) novariable whose value is being computed is also bound in �0; (2) every free variableof e0 is either being computed or is bound in �0: there are no dangling pointers;and (3) program variables and pointers are distinct in e0. 2De�nition 2The judgement �0 : e0 +A �1 : e1 is promising if �0 : e0 is A-good. 2In a derivation of a promising judgement, the distinction between program variablesand pointers is maintained everywhere:Lemma 1Assume the judgement �0 : e0 +A �1 : e1 has a derivation. If �0 : e0 is A-good,then �1 : e1 is A-good, dom�0 � dom�1, and every judgement in the derivation ispromising.Proof By induction on the structure of the derivation. 2Proposition 1Let e be a closed expression and consider a derivation of fg : e +fg � : w. In noinstance of rule App can there be any variable capture in e0[p=y], and in no instanceof rule Var is p already bound in �.Proof Since e is closed, fg : e is fg-good by de�nition. By Lemma 1, every judge-ment � : e0 +A � : e1 in the derivation is promising, and in every such judgement,the con�gurations � : e0 and � : e1 are A-good. It follows that:In every instance of the App rule, p 2 Fv(� : e p) � A [ dom� � A [ dom�because � : e p is A-good. Since � : �y:e0 is A-good, Bv(� : �y:e0) is disjoint fromA [ dom�, so p =2 Bv(�y:e0): variable p is not bound in �y:e0. Hence p cannot becaptured in the substitution e0[p=y].In every instance of the Var rule, A [ fpg and dom� are disjoint since � : w is(A [ fpg)-good. Hence p is not already bound in �. 2

Page 8: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

8 P. SestoftProposition 1 says that if expression e is closed, then there can be no variablecapture or rebinding in a derivation of fg : e +fg � : w which computes the value wof e. The revised renaming strategy is adequate. Note that the bound variables of eneed not be distinctly named.To illustrate renaming, let us evaluate the expression let x = x in (�y:�x:y)x, inwhich one should not na��vely substitute x for y in the beta-reduction:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Lamfp 7!pg : �y:�x:y +fg fp 7!pg : �y:�x:y ---------------------------------------------------------------------------------------------------------------------------------------------------Lamfp 7!pg : �x:p +fg fp 7!pg : �x:p-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Appfp 7!pg : (�y:�x:y)p +fg fp 7!pg : �x:p----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Letfg : let x = x in (�y:�x:y)x +fg fp 7!pg : �x:pIn the premise of the Let rule, a fresh variable p has been substituted for x in theexpressions x and (�y:�x:y)x. In the second premise of the App rule, the argumentp has been substituted for y in �x:y; there is no variable capture.3 Deriving a simple abstract machineOperationally speaking, evaluation in the natural semantics builds a derivation treefor fg : e +fg � : w from the bottom up, whereas computation by an abstractmachine builds a state sequence. We must turn the recipe for tree construction(Figure 2 above) into a recipe for state sequence construction (Figure 3 below).The challenge lies in the representation of the context of subtrees. For instance,when applying the Var rule, we build a subtree for the premise, and the context(the rule) tells us that we must update the heap � at p with the computed value wafterwards to create �[p 7! w]. This context must be made explicit in the abstractmachine: after the state sequence corresponding to the subtree, we must rememberto update the heap at p.Only the App and Var rules change the context of subtrees. In the App rule, wemust remember to build a second subtree after building the �rst one, and in the Varrule, we must remember to update the heap after building the subtree. Although asubtree is built in the Let rule, the result � : w of the subtree is also the result ofthe entire tree, so no new context is needed.The �rst step towards an abstract machine is to make the context (or continu-ation) of each subtree explicit in the state. When the context is extensible as here(e.g. if t0 is a subtree of t, then the context of t0 is an extension of the contextof t), it is convenient to use a stack for representing the context. This is a long-standing tradition: the control context of a procedure invocation in Algol or Pascalis represented by a stack of return addresses.3.1 First step: introduce a stackA state of the �rst abstract machine is a triple (�; e; S) where � and e are just theheap and the expression of a con�guration � : e, and S is a stack which representsthe context of this con�guration: part of a surrounding derivation tree. Traditionally,the e component is called the control of the abstract machine.

Page 9: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 9Heap Control Stack rule� (e p) S app1=) � e p : S� �y:e p : S app2=) � e[p=y] S�[p 7! e] p S var1=) � e #p : S� �y:e #p : S var2=) �[p 7! �y:e] �y:e S� letfxi = eige S let (�)=) �[pi 7! ei] e S(�) In the let rule, the notation e means e[p1=x1; : : : ; pn=xn], where variables p1; : : : ; pnmust be distinct and fresh: they must not occur in � or let fxi = eig e or S.Fig. 3. Abstract machine mark 1, with stack� The Lam rule does not give rise to any machine rules; no machine action isrequired to leave the heap and expression as they are.� The App rule from the natural semantics gives rise to two machine rules: app1which begins the computation corresponding to the left subtree, and app2which begins the computation corresponding to the right subtree.� The Var rule gives rise to two machine rules: var1 which begins the computa-tion corresponding to the subtree, and var2 which updates the heap with thecomputed result.� The Let rule gives rise to the machine rule let which allocates new heapaddresses and begins the computation corresponding to the subtree.The stack S is a list of arguments p and update markers #p. A reduced value wmust be a lambda abstraction �y:e. Whenever the control is a lambda abstraction�y:e, we must examine the stack top element, and act accordingly:� an argument p on the stack top is a reminder that �y:e must be applied to p,that is, e[p=y] must be evaluated. This is done by the app2 rule.� an update marker #p on the stack top is a reminder that the heap must beupdated with [p 7! �y:e]. This is done by the var2 rule. Intuitively, �y:e isthe value of some expression that was previously bound to p in the heap; toachieve sharing, the binding of p must be updated with this reduced value.Values other than lambda abstractions will be considered later; they must check forupdate markers #p also, to correctly implement the Var rule. The �rst version ofthe abstract machine is shown in Figure 3. The set of update markers #p in thestack corresponds closely to the set A used in the revised natural semantics rules.The initial heap and the stack are empty, so the initial state is (fg; e; [ ]). Themachine terminates when no rule applies. Hence a terminal state either has form

Page 10: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

10 P. SestoftHeap � Control e Stack S (rule)[ ] let y=�x:x; v=(�z:z) y in v v [ ]) fp 7!�x:x; q 7!(�z:z) pg q q [ ] (let)) fp 7!�x:x; q 7!(�z:z) pg q [q] (app1)) fp 7!�x:xg (�z:z) p [#q; q] (var1)) fp 7!�x:xg �z:z [p;#q; q] (app1)) fp 7!�x:xg p [#q; q] (app2)) fg �x:x [#p;#q; q] (var1)) fp 7!�x:xg �x:x [#q; q] (var2)) fp 7!�x:x; q 7!�x:xg �x:x [q] (var2)) fp 7!�x:x; q 7!�x:xg q [ ] (app2)) fp 7!�x:xg �x:x [#q] (var1)) fp 7!�x:x; q 7!�x:xg �x:x [ ] (var2)Fig. 4. Example evaluation with abstract machine mark 1(�; �y:e; [ ]), representing successful termination with result �y:e, or form (�; p; S)where p =2 dom �, representing the discovery of a black hole in the program. Therules are deterministic modulo the choice of fresh variables.In the terminology of Peyton Jones (1992, Section 3.2), the natural semantics inFigure 2 is an eval/apply model, whereas the abstract machine in Figure 3 is apush/enter model. The former evaluates an application (e p) by �rst evaluating e,then applying the result to p. The latter evaluates (e p) by �rst pushing the argumentp, then entering the function e. The derivation above and the correctness proof belowshow that these models are closely related.To see the machine in action, and to show how it achieves sharing of subcompu-tations, consider the evaluation of let y = �x:x; v = (�z:z) y in v v in Figure 4.The �nal result is �x:x, as expected. In the �rst step, fresh heap pointers p and qare allocated and substituted for free occurrences of y and v. The redex (�z:z) yis reduced only once, although it is used twice in the let-body. The last applica-tion of rule var2 needlessly overwrites an expression which is already in whnf. Suchidentical updates could be avoided by not pushing an update marker when accessinga whnf (by rule var1). This re�nement of the machine may be introduced at a laterstage, if desired.3.2 Correctness of the �rst abstract machineThe correctness of the abstract machine with respect to the revised semantics isestablished by Proposition 2 below, but �rst we need some auxiliary properties. Letap(S) = fq j q is in Sg stand for the set of argument pointers on the stack S, andlet similarly #(S) = fq j #q is in Sg stand for the set of update markers on thestack S.Lemma 2

Page 11: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 11For all �, e, A, S, �, and e0 such that � : e is A-good, A = #(S), and ap(S) �#(S) [ dom�, if � : e +A � : e0 is derivable then (�; e; S))� (�; e0; S).Proof By induction on the derivation of � : e +A � : e0. Lemma 1 ensures that allpremises are promising, and that ap(S) � #(S) [ dom�.Case Lam: The Lam rule says that � : �y:e +A � : �y:e; but clearly (�; �y:e; S))�(�; �y:e; S) by the empty sequence of computation steps.Case App: Assume � : e p +A � : w by rule App. Note that ap(p : S) = fpg[ap(S) �Fv(� : (e p)) [ ap(S) � #(S) [ dom� by Lemma 1, so the induction hypothesisapplies in line three below.(�; (e p); S)) (�; e; p : S) by rule app1)� (�; �y:e0; p : S) by left premise and ind. hyp.) (�; e0[p=y]; S) by rule app2)� (�; w; S) by right premise and ind. hyp.Case Var : Assume �[p 7! e] : p +A �[p 7! w] : w by rule Var . Observe that wis necessarily a lambda abstraction �y:e0. Moreover, for line three below, note thatA [ fpg = #(#p : S), so the induction hypothesis applies.(�[p 7! e]; p; S)) (�; e;#p : S) by rule var1)� (�; �y:e0;#p : S) by the premise and ind. hyp.) (�[p 7! �y:e0]; �y:e0; S) by rule var2Case Let : Assume � : let fxi = eig e +A � : w by rule Let , and that p1; : : : ; pn donot occur in A or � or let fxi = eig e. Then they do not occur in S, that is, in #(S)or ap(S), so(�; let fxi = eig e; S)) (�[pi 7! ei]; e; S) by rule let)� (�; w; S) by the premise and ind. hyp. 2The Lemma shows that the abstract machine can simulate derivations by the naturalsemantics. To show the converse, that it computes no more results than the naturalsemantics, we introduce the concept of balanced computation. The intention is thata balanced computation corresponds to a derivation (sub)tree.We say that stack S0 extends stack S if S0 = r1 : : : : : rn : S for some stackobjects r1; : : : ; rn, where n � 0.De�nition 3A balanced computation is a computation (�; e; S))� (�; e0; S) in which the initialand �nal stacks are the same, and in which every intermediate stack extends theinitial one. 2Every successful computation (fg; e0; [ ]) )� (�; �y:e0; [ ]) is balanced. We want toprove that for every balanced computation (satisfying some further restrictions)

Page 12: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

12 P. Sestoftthere is a corresponding derivation tree. First we show that a balanced computationhas a simple structure.De�nition 4The trace of a computation (�0; e0; S0) tr1=) (�1; e1; S1) tr2=) � � � trn=) (�n; en; Sn)where n � 0, is the sequence tr1; tr2; : : : ; trn of transition rules used. A balancedtrace is the trace of a balanced computation. 2What are the possible forms of balanced traces? The empty trace, having one stateand no transitions, is balanced. Assume the initial stack is S. Any non-empty bal-anced trace must begin with app1, var1, or let, since app2 or var2 would producean intermediate stack which is not an extension of S.If the trace begins with app1, producing an intermediate stack of form p : S, theneventually an app2 transition must occur which restores the stack to S; no othertransition can do this. The subtrace between app1 and the �rst occurrence of app2is balanced (with stacks which are extensions of p : S), and the subtrace followingit is balanced (with stacks which are extensions of S). Hence the trace has the formapp1 bal app2 bal, where bal stands for arbitrary balanced traces.If the trace begins with var1, producing an intermediate stack of form #p : S,then eventually a var2 transition must occur which restores the stack to S. The sub-trace between var1 and the �rst occurrence of var2 is balanced. Furthermore, sincethe control (before and) after var2 must be �y:e, only an app2 or var2 transitioncould follow, but either would remove an element from the stack and contradict thebalancedness of the trace. Hence the occurrence of var2 is the last element of thetrace, which must have the form var1 bal var2.If the trace begins with let, then the subtrace after let must be balanced, so thetrace has form let bal.In summary, all balanced traces can be derived from the following grammar:bal ::= � j app1 bal app2 bal j var1 bal var2 j let balThe four possibilities correspond to the four natural semantics rules Lam, App, Var ,and Let in Figure 2. Thus potentially, a balanced trace has the same structure asa derivation tree; the following lemma gives a formal proof, which shows that thenatural semantics can simulate any balanced computation of the machine.Lemma 3For all �0, e0, S, �1, w, and A it holds that if (�0; e0; S))� (�1; w; S) is balanced,and w � �v:e1, and A = #(S) then �0 : e0 +A �1 : w is derivable.Proof By induction on the structure of balanced traces, following the grammar.Case �: follows by rule Lam because we have �0 = �1 and e0 � w � �v:e1.Case app1 bal app2 bal: We must have e0 � (e p). The state after app1 must be(�0; e; p : S) and the state before app2 must be (�; �y:e0; p : S). Since the tracebetween these is balanced, �0 : e +A � : �y:e0 is derivable by the induction hy-pothesis. The state after app2 is (�; e0[p=y]; S), and the trace of (�; e0[p=y]; S) )�(�1; w; S) is balanced, so � : e0[p=y] +A �1 : w is derivable by the induction hypo-thesis. Using the App rule, we conclude that �0 : e p +A �1 : w is derivable.

Page 13: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 13Case var1 bal var2: We must have e0 � p and �0 = �[p 7! e] and �1 = �[p 7!�v:e1] for some p, �, e, and �. The state after var1 is (�; e;#p : S), and thestate before var2 is (�; w;#p : S), with w � �v:e1. Since the subtrace betweenthese is balanced, � : e +A[fpg � : w is derivable by the induction hypothesis,which is applicable because A [ fpg = #(#p : S). Using the Var rule we �nd that�[p 7! e] : p +A �[p 7! w] : w is derivable.Case let bal: We must have e0 � let fxi = eig e, and for renaming of e somep1; : : : ; pn have been chosen that do not occur in �0 or let fxi = eig e or S.The state after let is (�0[pi 7! ei]; e; S), and the trace after let is balanced, so�0[pi 7! ei] : e +A �1 : w is derivable by the induction hypothesis. Since thep1; : : : ; pn do not occur in �0 or A or letfxi = eige, we can use rule Let to concludethat �0 : letfxi = eige +A �1 : w is derivable. 2Proposition 2Assume w � �y:e1. Then (fg; e0; [ ]))� (�; w; [ ]) if and only if fg : e0 +fg � : w isderivable.Proof If: follows from Lemma 2 because fg : e0 +fg � : w is derivable and prom-ising. Only if: observe that the computation (fg; e0; [ ]) )� (�; w; [ ]) is necessarilybalanced; then the result follows from Lemma 3. 2Proposition 2 says that the abstract machine terminates with a value �v:e1 in thecontrol if and only if the natural semantics successfully derives this value. Bothsystems are deterministic, modulo the choice of fresh variables in the Let and letrules. As a second possibility, the machine may terminate with a heap pointer p inthe control, indicating a `black hole', in which case the natural semantics permitsno derivations at all. As a third possibility, the machine may embark on an in�nitecomputation, corresponding to an in�nite derivation tree in the natural semantics.Determinism implies that these possibilities are mutually exclusive.In retrospect it is not surprising that proving Lemma 2 requires less machinerythan Lemma 3. In the former proof we must create a sequence from a tree, which isjust a matter of recursive attening. In the latter proof we must create a tree froma sequence, which is harder: the notion of balanced trace is a device to recover thetree structure from the sequence.3.3 Second step: introduce environments and closuresThe abstract machine above uses substitution in the expression e to model applic-ation (in the app2 rule) and renaming (in the let rule). This is unsatisfactory froman implementation point of view because it modi�es the expression at run-time.To avoid this, we introduce an environment E, which is a mapping from programvariables x to heap pointers p. The environment can be thought of as a delayedsubstitution, which is not applied until we meet a program variable x in the control.When we would previously perform the substitution e[p=y], obtaining a new ex-pression e0, we will now build the pair (e; fy 7! pg) of the expression e and theenvironment fy 7! pg. In general, where we previously had an expression e0, weshall now have a pair (e; E) of an expression e and an environment E such that e0

Page 14: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

14 P. SestoftHeap Control Environment Stack rule� (ex) E[x 7!p] S app1=) � e E[x 7!p] p : S� �y:e E p : S app2=) � e E[y 7!p] S�[p 7!(e0; E0)] x E[x 7!p] S var1=) � e0 E0 #p : S� �y:e E #p : S var2=) �[p 7!(�y:e;E)] �y:e E S� letfxi=eige E S let (�)=) �[pi 7!(ei; E0)] e E0 S(�) In the let rule, the variables p1; : : : ; pn must be distinct and fresh: they must notoccur in � or letfxi=eige or S. The new environment E0 is E[x1 7!p1; : : : ; xn 7!pn].Fig. 5. Abstract machine mark 2, with stack and environmentis eE, the result of applying the substitution E to the expression e. The pair (e; E)is traditionally called a closure.Henceforth the heap maps pointers to closures (e; E) instead of expressions e.Similarly, to bind the free variables of the control e, we add an environment E tothe machine state, which becomes a four-tuple (�; e; E; S). Now an expression e inthe control or the heap may have free program variables, but these will be boundin the environment E associated with e.The resulting abstract machine is shown in Figure 5. Each of the revised app1and var1 rules performs two tasks: �rst it �nds the heap pointer p bound to x in E,and then it behaves as the old app1 or var1 rule.The correctness of this modi�cation is clear, since the computation sequencesof the mark 1 and mark 2 machines are closely related. Every state of a mark 2computation can be mapped to a corresponding state of the mark 1 machine, byreplacing every closure (e; E) with the expression eE.Thus the introduction of environments does not change the result of a computa-tion. However, it increases the set of heap addresses p transitively reachable fromthe machine components e, E, and S. Namely, in the mark 1 rule app2, the substi-tution e[p=y] would embed p in the new control only if y occurs free in e. In themark 2 rule app2, p will be retained in the new environment E[y 7! p] regardlesswhether y occurs free in e or not. This can make a considerable di�erence, sincemany heap addresses may be transitively reachable from the closure pointed to byp in the heap �. In terms of lazy language implementations, we have introduced aspace leak . The same problem appears in the mark 2 treatment of let-bindings. Weshall return to this issue in Section 4.

Page 15: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 15Heap Control Environment Stack rule� (e u) E=[p1; : : : ; pk] S app1=) � e E pu : S� �e E p : S app2=) � e p : E S�[pu 7!(e0; E0)] u E=[p1; : : : ; pk] S var1=) � e0 E0 #pu : S� �e E #p : S var2=) �[p 7!(�e;E)] �e E S� let feig e E S let (�)=) �[pi 7!(ei; E0)] e E0 SIn the app1 and var1 rules, pu is the u'th element of the environment E = [p1; : : : ; pk].(�) In the let rule, the addresses p1; : : : ; pn must be fresh: they must not occur in � or S.The new environment E0 is p1 : : : : : pn : E.Fig. 6. Abstract machine mark 3, with stack, environment, and de Bruijn indices3.4 Third step: introduce variable indicesIn this step, we get rid of program variable names, replacing them with de Bruijnindices, which are numbers similar to variable o�sets in conventional compilerterminologyy. The syntax of normalized lambda expressions with de Bruijn indicesis: e ::= �e j e u j u j let e1; : : : ; en in ewhere u is a de Bruijn index (a positive integer). An environment E is now amapping from positive integers to heap pointers, and can be represented as a list[p1; : : : ; pk] of heap pointers, such that E maps index u to E[u] = pu when 1 � u � k.More e�cient representations of E exist.The transition rules of the �nal abstract machine are shown in Figure 6. Rulesapp2 and let ensure that the environment E = [p1; : : : ; pk] contains bindings forall variables in scope, with p1 most recently bound and pk least recently bound.Thus an occurrence of variable x must be replaced by a de Bruijn index equal tothe number of lambdas between the binding lambda �x : : : and the occurrence, plusone; this is standard (Barendregt, 1984, Appendix C).y The introduction of de Bruijn indices at this point may seem premature, but it permitscomparison with the Krivine machine, and it permits the derivation of a well-knownand simple representation of data constructors in Section 5.5, which in turn leads to awell-known implementation of base values in Section 6.2.

Page 16: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

16 P. Sestoft3.5 Related abstract machinesThe lazi�ed Krivine machines of Cr�egut (1991, page 30) and Sestoft (1991, page31) may be obtained from the present one by combining the rules let and app1into a single rule which requires every application (e1 e2) to take the restrictedform let x = e2 in (e1 x). That is, those machines always allocate a closure forthe argument to ensure sharing. When evaluating an application (e1 y) in which theargument expression is already a variable y, this creates a super uous indirectionand requires an extra update. In a recursive function, a chain of indirections maybuild up, causing a space leak.Hence one advantage of the present machine is that sharing (the let rule) has beenseparated from application (the app1 rule). We introduce let-bindings only wherenecessary, by transforming the expressions to normalized form. In addition, let cande�ne recursive functions and cyclic data; the other lazi�ed Krivine machines needa separate mechanism for that.To study our machine's relation to the Three Instruction Machine TIM (Fairbairn& Wray, 1987), we �rst observe that normalized lambda expressions are closelyrelated to ordinary sequential code:(e u) reads Push u; e Push pointer onto stack�e reads Take; e Take one argument from stacku reads Enter u Enter closurelet feig e reads Let feig; e Make recursive bindingsInspection of the rules in Figure 6 shows that each of the four instructions Push,Take, Enter, and Let performs a bounded amount of computation. The three �rstinstructions are those of the TIM, except that TIM's Take n instruction can taken � 1 arguments from the stack, rather than a single one. Lazy versions of theTIM usually tie sharing to argument passing, as do the lazi�ed Krivine machines,and with the same consequences: chains of indirections will build up unless specialprecautions are made. Again, using let-bindings to separate sharing from argumentpassing is preferable.The TIM's Take n instruction is more e�cient than our single-argument Takeinstruction in the call-by-name case, but causes complications in the call-by-needcase, when there may be update markers in between the n arguments to be takeno� the stack.The abstract machine in Figure 6 is a Four Instruction Machine. Experimentalimplementations of the machine are easily written in Scheme or Standard ML, whichprovide destructive update as well as an underlying garbage collector.

Page 17: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 174 Space considerationsThe above abstract machine is quite e�cient, but uses too much memory; it has aspace leak . The environments hold on to useless closures, as hinted in Section 3.3.Here we solve this problem. 4.1 A leaky programTo see that the machine as presented so far leaks space, consider the followingexample program, where I abbreviates �y:y:let f = (�n: let x = I in (f x))in (f f)Evaluation of this expression will not terminate. However, the real problem is thatevaluation requires an unbounded amount of space. We shall evaluate the expressionusing the mark 2 machine in Figure 5; using the mark 3 machine with de Bruijnindices would obscure the presentation.Evaluation of the outer let allocates a closure ((�n: let x = I in (f x)); ff 7! pfg)for f in the heap, at address pf say. Evaluation of the call (f f) in the let-body bindsn to the pointer pf , and enters the body of f with environment fn 7! pf ; f 7! pfg.The �rst evaluation of the inner let allocates a closure (I; E1) for x, at address px;1say. The environment part E1 = fx 7! px;1; n 7! pf ; f 7! pfg contains bindings forx, n, and f , none of which is needed for evaluating I . The recursive call (f x) bindsn to the pointer px;1 and enters the body of f with environment fn 7! px;1; f 7! pfg.The second evaluation of the inner let allocates a new closure (I; E2) for x, ataddress px;2 say, where the environment part is E2 = fx 7! px;2; n 7! px;1; f 7! pfg.The recursive call (f x) binds n to px;2 and enters the body of f with environmentfn 7! px;2; f 7! pfg.The third evaluation of the inner let allocates a new closure (I; E3) for x, ataddress px;3 say, where E3 = fx 7! px;3; n 7! px;2; f 7! pfg. Then it evaluates therecursive call, and so on.The undesirable overall e�ect is to build up an ever-growing chain of closures forx, in which px;n points to a closure containing px;n�1, which points to a closurecontaining px;n�2, and so on, down to px;1. At any point of execution, all theseclosures are reachable from the abstract machine's environment or stack, so none ofthem can be garbage-collected. Consequently, the space consumption grows linearlywith the execution time of the program.4.2 Practical consequencesThe above example is contrived, but similar space leaks would appear in mostrecursive programs. To show that the problem is signi�cant in practice, we imple-mented an extended version of the leaky abstract machine and ran it using StandardML of New Jersey. The data given below are based on the heap sizes reported bythat implementation. Printing a pre�x of the list of natural numbers, de�ned by

Page 18: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

18 P. SestoftHeap Control Environment Stack rule� letfxi = (ei; ti)g(e; t) E S let0 (�)=) �[pi 7!(ei; E0jti)] e E0jt S(�) The variables p1; : : : ; pn must be fresh: they must not occur in � or let : : : or S. Thenew environment E0 (before trimming) is E[x1 7! p1; : : : ; xn 7! pn].Fig. 7. Revised let-rule with environment trimming for abstract machine mark 2let nats = cons 0 (map add1 nats) in natsrequires heap space proportional to the length of the pre�x printed. In the improvednon-leaky machine presented below, an arbitrary pre�x can be printed in a bounded(and small) amount of space. In a similar experiment, computing and printing the�rst 200 prime numbers (using Eratosthenes's sieve) requires more than 3000 KBheap space with the leaky machine, and less than 8 KB with the non-leaky one. Itis important to solve this problem.4.3 Environment trimmingThe solution is to re�ne the use of environments in the mark 2 and 3 machines,so they model more closely the substitutions performed in the mark 1 machine(Figure 3). Ideally, the environment E in the machine, and the environments storedin closures in the heap, should not bind any super uous variables.Following an idea from the Spineless Tagless G-machine (Peyton Jones, 1992,Section 5.3), we shall trim the environments E of let-bound expressions and of let-bodies: only their free variables should be included in E. To perform trimming,we annotate every let-bound expression and the let-body with the set t of its freevariables. This annotation is called a trimmer .In the mark 2 machine, an environment E is a mapping from variable names topointers, a trimmer t is just a set of names, and the trimming Ejt of E with respectto t is simply the restriction of the mapping E to the domain t. Figure 7 shows therevised mark 2 let rule with trimming. Adding mark 2 trimmers to the examplefrom the previous section, we �nd that since I has no free variables, its trimmershould be the empty set fg:let f = ((�n: let x = (I; fg) in ((f x); fx; fg)); ffg)in ((f f); ffg)We see that the environment in closures allocated for x will be empty, and no spaceleak will occur. Although many closures are allocated for x during execution, theyquickly become unreachable and therefore subject to garbage collection.In the mark 3 machine, an environment E = [p1; : : : ; pk] is a list of pointers,a trimmer t = [i1; : : : ; in] is a sub-sequence of the de Bruijn indices 1; : : : ; k, and

Page 19: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 19the trimming Ejt of E with t is the sub-list [pi1 ; : : : ; pin ]. The de Bruijn indicesof variable occurrences must be adjusted when the environment gets trimmed. Theadjustment depends on the trimmer, that is, the set of free variables of the expressionbeing compiled. In general two passes over the expression are required at compiletime, but otherwise this adjustment poses no problems.Closures for let-bound expressions are written to the heap, potentially have along life-time, and therefore are prone to cause space leaks; this is the reason forexplicitly trimming their environments.Similarly, when a lambda abstraction is the whnf of a let-bound expression, thena closure for the lambda abstraction will be written to the heap. If the lambdaabstraction ignores its argument as in let y = (�y:�z:z)x in : : :, the inclusionof x in the environment may cause a space leak. However, a lambda abstractionmust occur either in the right-hand side of a let-binding or in a let-body, both ofwhich have their environments trimmed, so although its environment may containsuper uous variables, they are likely to be few.Trimming the environment of a lambda abstraction would entail the run-timeadjustment of the de Bruijn indexes of variable occurrences in its body.4.4 Related approachesIn the intermediate language of the Spineless Tagless G-machine STG, every let-bound expression is annotated with the set of its free variables. When making aclosure for a let-bound expression, the STG let-instruction trims the environment.It does not trim the environment of the let-body, but this does not harm lambdaabstractions, which in the STG-machine can appear only as right-hand sides in let-bindings. The STG-machine seems to su�er the same potential space leak as ourmachine when a lambda abstraction ignores its argument, but in the STG-machinethis can be �xed easily.Lambda lifting (Johnsson, 1985) as used in the G-machine automatically performsenvironment trimming by turning local function bindings into top-level functionbindings, passing any free variables as arguments. This may explain why lambdalifting works so well in lazy languages, in spite of its destroying scope informationand introducing more parameter passing: the pointer copying required for parameterpassing is required anyway for environment trimming in a lazy language. The G-machine does not seem to trim the environment of let-bound expressions.Trimming incurs a run-time overhead, but it also reduces the time to accessvariables when environments are represented as linked lists. A trimmed environmentcould be represented by a vector, or by a linked list of argument values (and case-bound values, see below) followed by a vector of the values of the free variables. Forsimplicity we use linked lists throughout this paper. After introducing constructorsand case-expressions we shall return to environment trimming again.A lazy approach to environment trimming was proposed by the designers of theThree Instruction Machine (Fairbairn & Wray, 1987, page 42). Every heap-allocatedenvironment is equipped with a trimmer in the form of a bit vector, indicating whichentries in the environment are live. This bit vector is generated a compile time. At

Page 20: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

20 P. Sestoft� : c p1 : : : pa +A � : c p1 : : : pa Cons� : e +A � : ck p1 : : : pak � : ek[p1=yk1; : : : ; pak=ykak ] +A � : w----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------� : case e of fcj �yj!ejg +A � : w CaseFig. 8. Natural semantics rules for constructorsgarbage collection, only the live pointers in the environment are followed in thecopy- or mark-phase. This scheme probably spends less time trimming, but requiresmore space. It has the advantage that variable indexes need not be adjusted.E�cient environment representations which are safe for space complexity havebeen studied for strict functional languages (Shao & Appel, 1994). This work shouldbe relevant when designing environment representations for lazy languages also,although their space/e�ciency trade-o�s are di�erent: closures are created at amuch higher rate, and the lifetime of values is harder to predict.5 Algebraic datatypesMachine instructions for handling algebraic datatypes can be derived from naturalsemantics rules also. Algebraic datatypes, as known from Standard ML or Haskell,introduce two new forms of expressions: constructor applications c x1 : : : xa and caseexpressions:e ::= : : : j c x1 : : : xa j case e of fcj yj1 : : : yjaj ! ejgnj=1Constructor arguments must be variables to ensure sharing of components; addi-tional let-bindings may be introduced to satisfy this requirement. We assume a typedlanguage, so that constructors from di�erent datatypes cannot be confused at run-time, and so that every constructor cj has a �xed arity aj � 0. Constructors mustbe fully applied. In a case expression, e is the case object , and cj yj1 : : : yjaj ! ejis an alternative. There must be exactly one alternative for each constructor of thedatatype. For brevity we write fcj �yj!ejg for the list of alternatives, where �yj is avector yj1 : : : yjaj of variables.Launchbury gives the natural rules for evaluation of constructor application andcase, see Figure 8 (where we have added the A component as in Figure 2). Itfollows from the rules that a reduced value w may now be a constructor applicationc p1 : : : pa as well as a lambda abstraction �y:e.5.1 Constructors in the mark 1 machineWhat changes are needed to handle constructors in the mark 1 abstract machine,shown in Figure 3? Recall that the machine stack represents the continuation of acomputation; it says how to join subcomputations, corresponding to the atteningof the derivation tree.

Page 21: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 21Heap Control Stack rule� case e of alts S case1=) � e alts : S� ck p1 : : : pak alts : S case2=) � ek[pi=yki] S� ck p1 : : : pak #p : S var3=) �[p 7! ck p1 : : : pak ] ck p1 : : : pak SThe notation alts stands for the list fcj �yj!ejg of alternatives, and �yk stands for thevector yk1; : : : ; ykak of variables. In the case2 rule, ek is the right-hand side of the k'thalternative, and ek[pi=yki] abbreviates ek[p1=yk1; : : : ; pak=ykak ].Fig. 9. Constructor rules for abstract machine mark 1� The new Cons rule is similar to the Lam rule, and gives rise to no new machinerules: a constructor application is a value already, and requires no machineaction.� The new Case rule is similar to the App rule, and gives rise to two newmachine rules: case1 which begins the computation corresponding to the leftsubtree, and case2 which begins the computation corresponding to the rightsubtree.� A new version var3 of the var2 rule is needed, since the result w in the Varrule of Figure 2 may now be a constructor application c p1 : : : pa, not just alambda abstraction �y:e. When a constructor application c p1 : : : pa �nds anupdate marker #p on the stack, it must update the heap at p with itself.The new rules are shown in Figure 9. A new kind of stack object, similar in purposeto an argument p, is needed to store the case alternatives while evaluating the caseobject:� a case marker is a list fcj �yj!ejg of alternatives.Rule case1 pushes the case marker, then starts evaluating the case object e. Whenan application ck p1 : : : pak of constructor ck encounters a case marker, rule case2starts evaluating the k'th alternative.5.2 Correctness of the constructor rulesThe proof of Lemma 2 is easily extended to account for the new rules:Case Cons is trivial, by the empty sequence of computation steps (as for Lam).Case Case: Assume � : case e of fcj �yj!ejg +A � : w by rule Case, then

Page 22: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

22 P. Sestoft(�; case e of fcj �yj!ejg; S)) (�; e; fcj �yj!ejg : S) by rule case1)� (�; ck p1 : : : pak ; fcj �yj!ejg : S) left premise and ind. hyp.) (�; ek[pi=yki]; S) by rule case2)� (�; w; S) right premise and ind. hyp. 2The other direction is more work. First, there are more balanced traces. Arguing asin Section 3.2, we �nd that all balanced traces of the new machine can be derivedfrom the following grammar:bal ::= � j app1 bal app2 bal j var1 bal var2 j let bal jvar1 bal var3 j case1 bal case2 balThe � trace now corresponds to the Cons rule as well as the Lam rule, and the twolast possibilities correspond to the Var rule and the Case rule, respectively.Secondly, Lemma 3 must be amended slightly, because the result w of a compu-tation may now be a constructor application ck p1 : : : pak :Lemma 4For all �0, e0, S, �1, and w it holds that if (�0; e0; S))� (�1; w; S) is balanced, andw � �v:e1 or w � ck p1 : : : pak , and A = #(S) then �0 : e0 +A �1 : w is derivable.Proof By induction on the structure of balanced traces.Case �. Follows by rule Lam or Cons according as w � �v:e1 or w � ck p1 : : : pak .Case app1 bal app2 bal, case var1 bal var2, and case let bal are as in Lemma 3.Case var1 bal var3. We must have e0 � p and �0 = �[p 7! e] and �1 = �[p 7!ck p1 : : : pak ] for some p, �, e, and �. The state after var1 is (�; e;#p : S) and thestate before var3 is (�; ck p1 : : : pak ;#p : S). Since the subtrace between these isbalanced, � : e +A[fpg � : ck p1 : : : pak is derivable by the induction hypothesis,which is applicable because A [ fpg = #(#p : S). Using the Var rule, we �nd that�[p 7! e] : p +A �[p 7! ck p1 : : : pak ] : ck p1 : : : pak is derivable.Case case1 bal case2 bal: We must have e0 � case e of fcj �yj ! ejg. The stateafter case1 must be (�0; e; fcj �yj ! ejg : S) and the state before case2 must be(�; ck p1 : : : pak ; fcj �yj ! ejg : S). The trace between these is balanced, so that�0 : e +A � : ck p1 : : : pak is derivable by the induction hypothesis. The stateafter case2 is (�; ek[pi=yki]; S), and the trace of (�; ek[pi=yki]; S) )� (�1; w; S) isbalanced, so � : ek[pi=yki] +A �1 : w is derivable by the induction hypothesis. Usingthe Case rule, we �nd that �0 : case e of fcj �yj!ejg +A �1 : w is derivable. 2Proposition 2 must be amended similarly and its proof must appeal to the extendedLemmas. 5.3 Constructors in the mark 2 machineWe introduce environments to avoid doing substitutions in the code, as in Section 3.3when we derived the mark 2 machine. Because of possible free variables in the right-hand sides of alternatives, the case1 rule must store the environment E along with

Page 23: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 23Heap Control Environment Stack rule� case e of alts E S case1=) � e E (alts;E) : S� ck x1 : : : xak E0 (alts;E) : S case2=) � ek E[yki 7!pi] S� ck x1 : : : xak E0 #p : S var3=) �[p 7!(ck �x;E0)] ck �x E0 SThe notation alts stands for the list fcj �yj!ejg of alternatives, and �yk stands for thevector yk1; : : : ; ykak of variables. In the case2 rule, ek is the right-hand side of the k'thalternative, and pi = E0[xi] for i = 1; : : : ; ak. In the var3 rule, �x stands for the vectorx1 : : : xak of variables.Fig. 10. Constructor rules for abstract machine mark 2Heap Control Environment Stack rule� case e of alts E S case1=) � e E (alts;E) : S� ck u1 : : : uak E0 (alts;E) : S case2=) � ek E00 S� ck u1 : : : uak E0 #p : S var3=) �[p 7!(ck �u;E0)] ck �u E0 SThe notation alts stands for the list fcj ! ejg of alternatives. In the case2 rule, ek is theright-hand side of the k'th alternative, and E00 = E0[uak ] : : : : : E0[u1] : E is theenvironment for ek. In the var3 rule, �u stands for the vector u1 : : : uak of de Bruijnindices. Fig. 11. Na��ve constructor rules for abstract machine mark 3the alternatives in the case marker (on the stack), and rule case2 must extract itagain. Hence a case marker now has form (fcj �yj ! ejg; E). Also, the substitutionin rule case2 must be replaced by environment extension. The new rules are shownin Figure 10. 5.4 Constructors in the mark 3 machineReplacing variable names by de Bruijn indices as in Section 3.4, the syntax for analternative cj �yj ! ej becomes cj ! ej , simply. The pattern variables yj1 : : : yjajhave been replaced by the indices aj ; : : : ; 1 in the right-hand side ej . The seeminglyinverse order of the indices is natural; it corresponds to the indices in ej of the para-meters in the lambda abstraction �yj1: : : : �yjaj :ej . The resulting abstract machineis shown in Figure 11.

Page 24: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

24 P. Sestoft5.5 Improving constructors in the mark 3 machineThe handling of constructors and case expressions can still be improved, by requir-ing that every constructor application has the restricted formck ak : : : 1in which the arguments u1 : : : uak are exactly the indices of the ak most recentlybound variables. In that case the new environment E00 in the second rule of Figure 11is always E00 = E0[1] : : : : : E0[ak] : E, which is just the environment E appendedto the �rst ak variables from E0. Then there is no need to represent the argumentsexplicitly in a constructor application; an application of a constructor ck can berepresented just by ck;a, showing its tag k and arity a. We have arrived at thePackftag,arityg representation for constructors used by Peyton Jones and Lester(1992), with tag = k and arity = a.The restriction on constructor application can be enforced by de�ning namedconstructors, as illustrated by the standard list-constructors nil and cons (witharities 0 and 2): nil = c1;0cons = ��c2;2An application cons x y of a named constructor is an ordinary application, in whichthe arguments x and y must be variables to ensure sharing. To illustrate the use ofthis encoding, consider evaluating cons x y with an update marker #p on the stacktop; assume that px = E[x] and py = E[y]:� (��c2;2)x y E #p : S) � (��c2;2)x E py : #p : S (app1)) � (��c2;2) E px : py : #p : S (app1)) � �c2;2 px : E py : #p : S (app2)) � c2;2 py : px : E #p : S (app2)) �[p 7! (c2;2; [py; px])] c2;2 py : px : E S (var3)Moving cons's arguments into the environment via the stack may seem cumbersome,but this scheme is simple and permits partial application of constructor encodingssuch as cons. Moreover, in a realistic implementation, the frequent case of applying aknown constructor to all of its arguments can be detected and optimized at compile-time. The improved constructor rules are shown in Figure 12.Note that we ever only use the �rst ak elements of the constructor application'senvironment, where ak is the constructor's arity. Hence we need to store only the �rstak elements of E in the heap when meeting an update marker. This is particularlye�cient if an environment is a linked list of vectors (of heap pointers); then theconstructor arguments can be represented by a �xed size vector of heap pointers,and the append operation to build E00 in rule case2 of Figure 12 can be done inconstant time, independently of the size of E0 and E.

Page 25: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 25Heap Control Environment Stack rule� case e of alts E S case1=) � e E (alts;E) : S� ck;a E0 (alts;E) : S case2=) � ek E00 S� ck;a E0 #p : S var3=) �[p 7!(ck;a; E0[1 : : : a])] ck;a E0 SThe notation alts stands for the list fcj ! ejg of alternatives. In the case2 rule, ek is theright-hand side of the k'th alternative, and E00 = E0[1] : E0[2] : : : : : E0[a] : E. In the var3rule, E0[1 : : : a] abbreviates [E0[1]; : : : ; E0[a]].Fig. 12. Improved constructor rules for abstract machine mark 35.6 Environment trimming for constructorsThe space behaviour of the rules in Figure 12 need to be improved in two ways. Inrule case1, storing the entire environment E in the case marker on the stack maycause a space leak. Namely, it may hold on to a large value while the case objecte is being evaluated. Hence we should trim the environment E before storing it onthe stack.In rule case2, even if an alternative has no free variables at all, its environmentE00 will contain all the variables free in other alternatives. If the alternative is alambda expression �z:z, a closure for it might be written to the heap, causing aspace leak because of these free variables. Hence we should trim the environmentE00 of an alternative, just as we trim the environment of a let-body.In rule var3 we can do no better than including all the arguments when writinga constructed value (ck;a; E0[1 : : : a]) to the heap.We therefore add a separate trimmer tk for each alternative ck ! ek, and a jointtrimmer t for the list alts of all alternatives. The trimmer tk is the set of variablesfree in ek, including the variables bound on the left-hand side, and t is the set ofvariables free in one or more of fcj ! ejg, excluding the variables bound on theleft-hand sides. We arrive at the improved rules shown in Figure 13.The Spineless Tagless G-machine trims the joint environment of the alternativesof case (Peyton Jones, 1992, Section 9.4.1). The G-machine does not seem to.6 Base valuesAdditional machine instructions for handling base values, such as integers, can beintroduced in several ways. In this section we derive mechanisms for base valuehandling from the representations suggested in (Peyton Jones & Launchbury, 1991).Alternatively, one may derive nearly the same base value operations from Launch-bury's operational semantics (Sestoft, 1994, Appendix B).

Page 26: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

26 P. SestoftHeap Control Environment Stack rule� case e of alts ; t E S case1=) � e E (alts;Ejt) :S� ck;a E0 (alts;E) :S case2=) � ek E00jtk S� ck;a E0 #p : S var3=) �[p 7!(ck;a; E0[1 : : : a])] ck;a E0 SThe notation alts stands for the list fcj ! (ej ; tj)g of alternatives with trimmers. In thecase2 rule, (ek; tk) is the right-hand side of the k'th alternative, andE00 = E0[1] : : : : E0[a] : E. In the var3 rule, E0[1 : : : a] abbreviates [E0[1]; : : : ; E0[a]].Fig. 13. Final constructor rules with environment trimming, abstract machine mark 3� : n +A � : n Int� : e1 +A � : n1 � : e2 +A � : n2---------------------------------------------------------------------------------------------------------------------------------------------------------------� : e1 + e2 +A � : n1 + n2 AddFig. 14. Natural semantics rules for integers6.1 Deriving base value handlingWe use integers with addition to illustrate base values:e ::= : : : j n j e1 + e2Note that the operands of the addition need not be variables, as they cannot beshared. We shall assume that expressions are well-typed, so `+' is applied only toexpressions that evaluate to integers. Launchbury's evaluation rules are shown inFigure 14. An integer n reduces to itself immediately by rule Int . An addition isevaluated by evaluating the operands in turn and adding their results by rule Add.This re ects that `+' and other base value operators are strict : the arguments mustbe evaluated before the operation can be applied.We exploit two ideas due to Peyton Jones and Launchbury (1991): using con-structors to represent boxed integers, and using case expressions to evaluate argu-ments before the application of `+'. Recall that a boxed base value v is representedby a closure which must be evaluated to obtain v, whereas an unboxed value is justthe bit-pattern representing the value. A boxed value may turn out to be unde�ned,since an attempt to evaluate the closure may loop, whereas an unboxed value isalways de�ned.First, we represent a boxed integer by a constructor application Int n. Here Intis the only constructor in the datatype of boxed integers and has arity one, corres-ponding to the generic constructor c1;1. Using the �nal constructor representation

Page 27: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 27Heap Control Environment Stack rule� Cst n E S cst=) � Int [n] S� op+ n2 : n1 : E S add=) � Int [n1 + n2] SFig. 15. Rules for base values in abstract machine mark 3developed in Section 5.5, a boxed integer Int n would be represented by a pair(Int ; E) where E is a one-element vector containing a pointer to a closure some-how representing the unboxed integer n. But the unboxed n is just a bit-pattern,so we shall let E be a one-element vector containing n directly, giving the boxedrepresentation (Int ; [n]).Secondly, we use case expressions to force argument evaluation. For instance, theaddition e1+ e2 of two integer-valued expressions is de�ned following Peyton Jonesand Launchbury (1991, page 644), here slightly simpli�ed:case e1 ofInt n1 ! case e2 ofInt n2 ! Int (n1 + n2)Using de Bruijn indices as in Section 5.5, the case expression translates to thefollowing one (where we ignore environment trimming):case e1 of fcase e2 of fInt (2 + 1)ggHere 2 and 1 are the indices of n1 and n2, not base value constants, so (2+1) means`add the two �rst values in the environment'. The rules already given in Figure 12say how to evaluate the case expression, except for the unboxed addition (2+1). Itsu�ces to postulate a new machine instruction op+, which computes the (unboxed)sum n0 of the two �rst values in the current environment, and creates the boxedresult (Int ; [n0]). Thus we replace the innermost case alternative Int (2+1) by op+.In conclusion, a single new instruction is needed to perform addition. Anothernew instruction Cst n is needed to introduce integer constants; it must evaluate to(Int ; [n]). The required new rules are shown in Figure 15.For illustration, consider the evaluation of e1 + e2, which is compiled ascase e1 of fcase e2 of fop+ggUsing the abbreviation e0 � case e2 of fop+g for the inner case, we have

Page 28: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

28 P. Sestoft� case e1 of fe0g E S) � e1 E (fe0g; E) : S (case1))� � Int [n1] (fe0g; E) : S (evaluate e1)) � case e2 of op+ n1 : E S (case2)) � e2 n1 : E (fop+g; n1 : E) : S (case1))� � Int [n2] (fop+g; n1 : E) : S (evaluate e2)) � op+ n2 : n1 : E S (case2)) � Int [n1 + n2] S (add)There is a close correspondence between this computation sequence and a treederived from the Add rule in the natural semantics of Figure 14. The two subcom-putations starting with case1 and ending just before case2 correspond to the leftand right subtree, respectively. The �rst occurrence of case2 initiates the evaluationof the right-hand subtree, and the second occurrence of case2 activates the add step,which performs the addition in the conclusion of the rule.When a let-bound expression evaluates to an integer, the result (Int ; [n]) will en-counter an update marker #p on the machine stack. Since Int is just the constructorc1;1, this situation is handled by rule var3 in Figure 12, which will update the heapat p with the closure (Int ; [n]), as desired.6.2 Introducing a base value stackThe base value handling developed above is simple but ine�cient. Intermediate basevalues are held in the environment component of closures on the stack during thecomputation. This may prevent optimization of environment representations, andmay complicate garbage collection by mixing base values and pointers in the stack.It is better to have a separate value stack V for intermediate base values.To introduce a separate value stack V , we replace the general Int constructor(which is really c1;1) by a special constructor called Ret , which will be used only tohandle boxed base values. A boxed value will be a closure (Ret; n), and we still usethe environment component to hold the unboxed value, since this simpli�es updatingof the heap. Only the new Ret and op+ instructions will access the unboxed value,so we can use the representation (Ret; n) instead of (Ret; [n]).The new instruction Ret must push the unboxed integer n from the environmentonto the value stack. In contrast to Int it should not extend the environment, so itmust behave as an argument-less constructor c1;0 rather than c1;1. The behavior ofc1;0 is prescribed by the case2 and var3 rules of Figure 12. By rule case2, when Retencounters a case marker (feg; E) on the stack, it must activate (e; E), and pushthe base value held in E onto the value stack V . By rule var3, when Ret encountersan update marker #p on the stack, it must update the heap at p with (Ret; n).The rules for Cst n and op+ must be modi�ed too. Clearly Cst n must evaluate to(Ret; n). The addition operator op+ will �nd its arguments n2 and n1 on the valuestack V , and must evaluate to (Ret; n1+n2). The new rules are shown in Figure 16.The action performed by Ret is to `return' to the `address' kept on the stack

Page 29: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 29Heap Control Env Values Stack rule� Cst n E V S cst=) � Ret n V S� op+ E n2 : n1 : V S add=) � Ret n1 + n2 V S� Ret n V (feg; E) : S ret1=) � e E n : V S� Ret n V #p : S ret2=) �[p 7!(Ret;n)] Ret n V SFig. 16. Rules for base values in abstract machine mark 3, with value stacktop: it activates the single closure in the case marker on the stack, hence its name.Originally, Ret was introduced as a code trick by Fairbairn and Wray (1987) tohandle base values in the Three Instruction Machine; they called it Self rather thanRet . It was adopted also by Peyton Jones and Lester (1992), who called it Return.What is new here is the derivation of Ret from an ordinary constructor c1;1.Consider again the evaluation of e1 + e2 by the new rules. The expression is stillcompiled as case e1 of fe0g, where e0 abbreviates case e2 of fop+g:� case e1 of fe0g E V S) � e1 E V (fe0g; E) : S)� � Ret n1 V (fe0g; E) : S) � case e2 of fop+g E n1 : V S) � e2 E n1 : V (fop+g; E) : S)� � Ret n2 n1 : V (fop+g; E) : S) � op+ E n2 : n1 : V S) � Ret n1 + n2 V SType correctness would ensure that the only stack top objects encountered by theRet instruction are update markers #p or case markers (feg; E); never an argumentpointer p.Consider introducing the abbreviation e seq e0 for case e of fc1;0 ! e0g, where seqassociates to the right. The compilation of e1 + e2 is e1 seq e2 seq op+, the reversePolish form of the expression. The net e�ect of evaluating a base value expression,such as e1, is to push its own reduced value onto the value stack.6.3 Boolean values, conditionals, and printingBoolean values may be represented by two argument-less constructors False � c1;0and True � c2;0, which may be tested by case expressions. These constructorsmust be generated by the basic comparison instructions such as op<, op�, op=,op^, op_ which operate on the value stack. Alternatively, Boolean values may berepresented by the integers 0 and 1, and one may introduce a new conditional such

Page 30: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

30 P. Sestoftas Cond(e1; e2) from Peyton Jones and Lester (1992, Section 4.3.2) to test them,but this would require a duplication of the machinery for environment trimming.Printing of data structures can be done systematically as well; see (Peyton Jones& Lester, 1992, Section 4.6.4) and (Sestoft, 1991, Section 3.4). One new machineinstruction su�ces. It must print a string (as a side e�ect), and then activate theclosure in the case marker on the stack S. The activation of this closure correspondsto the demand for a new substructure to print; thus the print instruction forces theevaluation of composite data structures, as usual in lazy language implementations.For this reason printing is not a trivial concern: If one wants to argue the correctnessof e.g. an evaluation order analysis with respect to the abstract machine, then onemust know how printing drives evaluation.7 Assessment and further improvementsThe �nal abstract machine, consisting of Figures 6, 7, 13, and 16, plus operationsfor Booleans and printing (not shown), has twelve instructions and implements alldynamic aspects of a lazy functional language. We wrote a straightforward experi-mental implementation in Standard ML for the example language used by PeytonJones and Lester (1992). We used Standard ML references to implement the heap�, and the rely on the Standard ML system for garbage collection. Computing andprinting the �rst 300 prime numbers using Eratosthenes's sieve takes approximately21 seconds in this implementation running under Standard ML of New Jersey ver-sion 108.13 and Linux with an Intel 486DX4/100MHz processor. This is only 5.3times slower than Mark Jones's Gofer system (version 2.30b, written in C).Let us list some positive and negative properties of our abstract machine:+ it performs tail calls in constant space;+ it does not build indirection chains;� it performs many identical updates (but this is easily avoided);� all values must test the stack top;+ it can exploit strictness information to eliminate some identical updates (bynot pushing update markers for values which are already evaluated);� even with strictness information it is hard to eliminate the tests on the stacktop element;� variable access is not constant time, because the linked environment structuremust be traversed.Surprisingly, the overhead incurred by identical updates in the experimental im-plementation is negligible. In a more realistic implementation, eliminating identicalupdates and some stack top tests would probably give a signi�cant speed-up.The environment trimming performed in let- and case-expressions may appearexpensive, but this cost is present also in the G-machine and the Three Instruc-tion Machine TIM (when applying super-combinators) and in the Spineless Tag-less G-machine (where environment trimming is explicit in let-bindings and case-expressions), and is necessary to avoid space leaks.

Page 31: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 31In our experimental implementation the environment is a linked list of pointers.The other extreme in the design space is the TIM's representation of the environmentas a vector of pointers. Our representation gives fast application and slow variableaccess, and TIM's representation gives slow application and fast variable access.A linked list of vectors is an obvious compromise, which should work well withtrimming.Compile-time optimizations may be used to improve the implementation of known,fully applied constructors. Similarly, one may improve base value handling as in theB compilation scheme of Peyton Jones and Lester (1992, page 160). By maintaininga symbolic value stack at compile-time one may detect when a constant or an op+instruction will necessarily �nd another arithmetic operation in the case marker onthe stack top (at run-time). In such cases the Ret step may be skipped, and the basevalue operations can be performed directly on the value stack.In short, we believe that a realistic implementation, perhaps the Spineless TaglessG-machine, could be derived by further re�nements.8 Related workThe literature on derivation of abstract machines is rich. Closest to the presentwork is Hannan and Miller's formal derivation of the call-by-name Krivine machinefrom a natural semantics for call-by-name evaluation, and Hannan's (1991) furtherconcretizations of abstract machines. Their derivation di�ers from ours in that en-vironments and de Bruijn indices are introduced already in the natural semantics.Hannan and Miller (1990) present a general transformation from branching nat-ural semantics rules to non-branching ones, essentially by introducing an explicitstack of premises still to be proved. The further transformation from non-branchingnatural semantics rules to at abstract machine rules proceeds by several steps,more ad hoc. Altogether they achieve, in smaller steps, the same as our somewhatmonolithical correctness proof (Proposition 2). On the other hand, our notion ofbalanced traces has intuitive signi�cance and provided for an easy extension of thecorrectness proof when augmenting the language with data structures (Section 5).Cast in a logical framework, Hannan and Miller's derivation lends itself well toformalization; Hannan and Pfenning (1992) mechanically veri�ed such a transla-tion proof in Elf. We have not attempted machine veri�cation, but believe that thesupporting concepts are formalized well enough that it would be feasible.Wand (1982a; 1982b) derived a compiler and an abstract machine from a denota-tional semantics in continuation passing style. Josephs (1989) gave a denotationalsemantics for a lazy functional languages. Presumably Wand's approach could beused to derive a lazy abstract machine from that.Fairbairn and Wray (1987) developed the Three Instruction Machine TIM for call-by-name evaluation of super-combinators in 1986, and made it lazy by using updatemarkers #p on the stack. Argo studied ine�ciencies and improvements of the TIM,in particular the handling of shared partial applications and the representation ofenvironments (Argo, 1989; Argo, 1990).Krivine developed his machine for call-by-name evaluation around 1985; a descrip-

Page 32: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

32 P. Sestofttion is given by Curien (1988). Borrowing the update marker technique from theTIM, several people have made the Krivine machine lazy, including Cr�egut (1990;1991) and Sestoft (1991). The close relation between the Krivine machine and theTIM has been studied by Cr�egut (1991, page 41) and Mogensen (1992). Althoughsimple, the Krivine machine is practically useful: a strict version is at the core ofthe e�cient interpretive Caml Light system due to Leroy (1990).Developing the Spineless Tagless G-machine, Peyton Jones (1992) studies all as-pects of lazy language evaluation. The point of departure is graph reduction, anda solid operational understanding of lazy languages, rather than a formal descrip-tion. The necessary intuition can be gained by studying (Peyton Jones, 1987) and(Peyton Jones & Lester, 1992). 9 ConclusionWe developed a simple and well-known abstract machine from a natural semanticsfor lazy evaluation. We proceeded in a number of re�nement steps, proving the cor-rectness of the non-trivial steps, and demonstrated that well-known implementationtechniques can be derived from a semantics.The resulting abstract machine is less sophisticated than the G-machine (Augusts-son, 1984; Johnsson, 1984), the Spineless Tagless G-machine (Peyton Jones, 1992),and the re�ned Three Instruction Machine (Argo, 1990; Peyton Jones & Lester,1992), but it is simple and its correctness requires no separate proof. This showsthat a lazy functional language implementation can be at the same time demon-strably correct, understandable, small, and quite e�cient.In addition to having pedagogical value, the present abstract machine and devel-opment provide a foundation for program analyses which are concerned with theoperational properties of lazy languages.AcknowledgementsA visit to Andy Moran and John Hughes at Chalmers University of Technologyinspired this work. The anonymous referees provided much helpful advice on thepresentation and the subject matter. Thanks to Carsten Kehler Holst, Torben Mo-gensen, and Simon Peyton Jones for comments and suggestions.ReferencesArgo, G. (1989). Improving the Three Instruction Machine. Pages 100{112 of: Fourth in-ternational conference on functional programming languages and computer architecture.Imperial College, London. Reading, MA: Addison-Wesley.Argo, G. (1990). E�cient laziness. Ph.D. thesis, Department of Computing Science,University of Glasgow. First draft, 123+12 pages.Augustsson, L. (1984). A compiler for Lazy ML. Pages 218{227 of: 1984 ACM symposiumon Lisp and functional programming, Austin, Texas. New York: ACM.Barendregt, H.P. (1984). The lambda calculus. Its syntax and semantics. Revised edn.Amsterdam: North-Holland.

Page 33: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

Deriving a Lazy Abstract Machine 33Cr�egut, P. (1990). An abstract machine for the normalization of �-terms. Pages 333{340of: 1990 ACM conference on Lisp and functional programming, Nice, France. New York:ACM.Cr�egut, P. (1991). Machines �a environnement pour la r�eduction symbolique et l'�evaluationpartielle. Ph.D. thesis, Universit�e Paris VII, France.Curien, P.L. (1988). The ��-calculus: an abstract framework for environment machines.Rapport de Recherche LIENS-88-10. Ecole Normale Sup�erieure, Paris, France.Fairbairn, J., & Wray, S.C. (1987). TIM: A simple, lazy abstract machine to executesupercombinators. Pages 34{45 of: Kahn, G. (ed), Functional programming languagesand computer architecture, Portland, Oregon. (Lecture notes in computer science, vol.274). Berlin: Springer-Verlag.Hannan, J. (1991). Making abstract machines less abstract. Pages 618{635 of: Hughes,J. (ed), Functional programming languages and computer architecture, 5th ACM con-ference, Cambridge, Massachusetts, August 1991. (Lecture notes in computer science,vol. 523). Berlin: Springer-Verlag.Hannan, J., & Miller, D. (1990). From operational semantics to abstract machines: Pre-liminary results. Pages 323{332 of: 1990 ACM conference on Lisp and functional pro-gramming, Nice, France. New York: ACM.Hannan, J., & Pfenning, F. (1992). Compiler veri�cation in LF. Pages 407{418 of: Scedrov,A. (ed), Seventh annual IEEE symposium on logic in computer science. IEEE ComputerSociety Press.Johnsson, T. (1984). E�cient compilation of lazy evaluation. Pages 58{69 of: ACM Sigplan'84 symposium on compiler construction (SIGPLAN Notices vol. 19, no. 6). New York:ACM.Johnsson, T. (1985). Lambda lifting: Transforming programs to recursive equations. Pages190{205 of: Jouannaud, J.-P. (ed), Functional programming languages and computerarchitecture, Nancy, France, 1985. (Lecture notes in computer science, vol. 201). Berlin:Springer-Verlag.Josephs, M.B. (1989). The semantics of lazy functional languages. Theoretical computerscience, 68, 105{111.Launchbury, J. (1993). A natural semantics for lazy evaluation. Pages 144{154 of: Twen-tieth ACM symposium on principles of programming languages, Charleston, South Car-olina, January 1993. New York: ACM.Leroy, X. (1990). The Zinc experiment: An economical implementation of the ML language.Rapport Technique 117. INRIA Rocquencourt, France.Mogensen, T. (1992). Re: Is lambda lifting always necessary? Personal communication.Peyton Jones, S.L. (1987). The implementation of functional programming languages.Prentice-Hall.Peyton Jones, S.L. (1992). Implementing lazy functional languages on stock hardware: thespineless tagless G-machine. Journal of functional programming, 2(2), 127{202.Peyton Jones, S.L., & Launchbury, J. (1991). Unboxed values as �rst class citizens in a non-strict functional language. Pages 636{666 of: Hughes, J. (ed), Functional programminglanguages and computer architecture, 5th ACM conference, Cambridge, Massachusetts,August 1991. (Lecture notes in computer science, vol. 523). Berlin: Springer-Verlag.Peyton Jones, S.L., & Lester, D. (1992). Implementing functional languages. Prentice-Hall.Sansom, P.M., & Peyton Jones, S.L. (1995). Time and space pro�ling for non-strict,higher-order functional languages. Pages 355{366 of: POPL'95: 22nd ACM symposiumon principles of programming languages, San Francisco, California, January 1995. NewYork: ACM.

Page 34: J. - Semantic Scholar...ext ens ion al pro p ert ie s of lazy fu nct ion al lan guage s are s imp le an d elegan t; sa t-i sfy more laws t h an str ict on e s. Con v ers ely, e in

34 P. SestoftSestoft, P. (1991). Analysis and e�cient implementation of functional programs. Ph.D.thesis, DIKU, University of Copenhagen, Denmark. DIKU Research Report 92/6.Sestoft, P. (1994). Deriving a lazy abstract machine. Tech. rept. ID-TR 1994-146. Depart-ment of Computer Science, Technical University of Denmark. 27 pages.Shao, Z., & Appel, A. (1994). Space-e�cient closure representations. Pages 150{161 of:1994 ACM conference on Lisp and functional programming, Orlando, Florida, June1994.Wand, M. (1982a). Deriving target code as a representation of continuation semantics.ACM transactions on programming languages and systems, 4(3), 496{517.Wand, M. (1982b). Semantics-directed machine architecture. Pages 234{241 of: NinthACM symposium on principles of programming languages, Albuquerque, New Mexico,January 1982. New York: ACM.