
Compiling with Non-deterministic Choice for Expressing Pending Optimization Decisions

Matthias Blume

Toyota Technological Institute at Chicago
[email protected]

Abstract. Optimizing compilers must make many decisions while translating programs or program fragments. However, the outcome (i.e., the effect on overall code quality) of these decisions is often not immediately apparent. It would be useful if one could defer decisions about specific program transformations until later, when their impact on other optimizations becomes clear. This paper describes a simple extension to the intermediate program representation used by an optimizing compiler and shows how this makes it possible to encode pending optimization decisions in this intermediate language itself. The result is a framework that not only can serve directly as the basis for an implementation but which also permits reasoning about its theoretical and practical properties. The extension—a non-deterministic choice operator—can be added to many kinds of intermediate languages, of which this paper will show two examples. Using λ-calculus we give a thorough formal account that also shows how potential theoretical pitfalls can be circumvented. In addition, a short excursion to more traditional flow graphs with quadruples is used to demonstrate how the choice operator can help to improve the precision of flow analyses.

1 Introduction

Every optimizing compiler uses some form of intermediate language to represent programs. Transformations on these programs for the sake of improving them are governed by the laws of that language.

Intermediate representations come in many different flavors: expression trees, flow graphs with quadruples, or variants of the λ-calculus, to name a few. But regardless of what the actual representation is, the compiler always faces the same problem: performing transformations on the so-represented programs constitutes a heuristic search in optimization space, and directing this search is not easy.

An exhaustive search would (typically! [11]) be infeasible due to the very high branching factor, so most compilers settle for something that resembles a linear walk along a single path. To select that path the compiler invokes guiding heuristics every time there is more than one alternative of how to proceed. The heuristic guide assesses the current situation and makes a recommendation in a greedy fashion.

For example, to decide whether inlining a given function seems worthwhile, the compiler might look at three things: the current program size, the size of the function in question, and the number of places where the function is invoked. This is done to estimate the total program size after inlining is complete. Of course, the conclusion can be overly pessimistic because future optimization steps such as constant folding, value propagation, and dead code elimination might eventually lead to reduced code size. In fact, the hope for this effect is often the very motivation for inlining in the first place!

One can construct better heuristics by taking an additional k optimization steps into account. This leads to a somewhat more thorough search than what can be achieved by walking a single path. However, depth-first search is not ideally suited to this task because it can waste a lot of time on one branch where another branch might yield much better results quickly. It also requires the optimizer to be able to backtrack—which can be tricky to implement.

Since a breadth-first search traverses all branches simultaneously, it is much better at avoiding deep traversals when a good solution exists on another branch at a short distance. This advantage—as well as the fact that no backtracking is required—comes at the expense of losing some memory efficiency: the search must be able to represent many different situations at once. To alleviate the latter problem we want a representation for sets of programs that is as space efficient as we can make it but which also allows for aggressive pruning once it grows too big.

Space efficiency can be improved, for example, by sharing parts that are common to several programs in the set. Pruning gets along very well with the idea of breadth-first search as it permits virtually unlimited tracking of several promising paths if this is paid for by leaving out enough others on the way. In effect, the breadth-first search is not really k-limited like its depth-first cousin. Instead, one can view this technique as "walking a thick path" (of some given width w). The original optimizer is then just a special case where w = 1.

One desirable effect when common parts of different programs are being shared is that a single manipulation of such a shared part can be used by all its client programs simultaneously. However, this requires great care because in general the validity of manipulations on parts of a program depends on the context, and in the shared case a program part can have multiple contexts.

The main observation that we make here is that all of the above (breadth-first search, simultaneous representation of program sets, sharing of program parts, rules for where sharing may occur and which manipulations are permitted, ability to prune) can be achieved by adding a choice operator ./ to the intermediate language. As we will see, this makes it possible to represent sets of (ordinary) programs as a single program, and the walk along a "fat path" turns back into the familiar walk of a single path.

One can think of the new operator as a non-deterministic choice, for example a conditional branch where the condition is random. However, it is not actually our intention to deal with non-determinism here. Therefore, in each case we want to adjust the rules of the language in such a way that the introduction of a non-deterministic choice operator does not lead to semantic ambiguities. While non-deterministic choice is nothing particularly new, we consider it with the unusual purpose of aiding the optimizer for a deterministic language. For this, we will put it into very restricted settings where its use is disciplined enough to prevent programs from becoming ambiguous.

For the purpose of exposition in this paper, we will give a thorough account of what happens when choice is added to a λ-calculus (forming the λ./-calculus) and how its rules must be adjusted in order to avoid non-determinism. But choice can be added to most if not all kinds of intermediate languages. We will briefly hint at this by also having a look at possible improvements to the precision of flow analysis in the framework of flow graphs and quadruples. Finally we come back to the λ./-calculus and discuss the construction of heuristics for a compiler based on such an intermediate language.

2 λ-calculus

Many compilers for functional languages such as ML [12, 10] or Haskell [8] use one of the many variations of the λ-calculus as their internal language where (high-level) optimizations are performed. One example is the CPS language described by Appel [1], which has been used in the SML/NJ implementation of Standard ML [3]. A popular alternative to CPS is A-normal form [16, 7], and there is also a continuing trend towards typed intermediate languages [21, 20].

x ∈ Variable

Eλ ::= x | λx.Eλ | (Eλ Eλ)

Cλ[·] ::= [·] | λx.Cλ[·] | (Cλ[·] Eλ) | (Eλ Cλ[·])

Fig. 1. Syntax of expressions Eλ and contexts Cλ[·] in the untyped λ-calculus.

λx1x2 ···.M  =def  λx1.λx2. ··· .M

(M1 M2 M3 ··· Mn)  =def  (···((M1 M2) M3) ··· Mn)

true  =def  λxy.x

false  =def  λxy.y

not  =def  λx.(x false true)

xor  =def  λxy.(x (not y) y)

pair  =def  λxys.(s x y)

fst  =def  λp.(p true)

snd  =def  λp.(p false)

Fig. 2. Common abbreviations for certain λ-expressions.

x ∈ Variable

E./ ::= x | λx.E./ | (E./ E./) | (E./ ./ E./)

C./[·] ::= [·] | λx.C./[·] | (C./[·] E./) | (E./ C./[·]) | (C./[·] ./ E./) | (E./ ./ C./[·])

Fig. 3. Syntax of expressions E./ and contexts C./[·] for the untyped λ-calculus with an added operator ./ for free choice.
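For concreteness, the grammar of Figure 3 can be rendered as a datatype in the compiler's implementation language. The following is only a minimal sketch in Standard ML; the constructor names are illustrative and not taken from the paper:

  (* lambda-./ expressions: variables, abstractions, applications, free choice *)
  datatype exp
    = Var of string               (* x            *)
    | Lam of string * exp         (* lambda x . E *)
    | App of exp * exp            (* (E E)        *)
    | Choice of exp * exp         (* (E ./ E)     *)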


x[M/x] = M

y[M/x] = y                                  (if x ≠ y)

(L M)[N/x] = (L[N/x] M[N/x])

(L ./ M)[N/x] = (L[N/x] ./ M[N/x])

(λx.M)[N/x] = λx.M

(λy.M)[N/x] = λz.(M[z/y][N/x])              (if x ≠ y and z is fresh)

Fig. 4. Capture-free substitution.
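As a rough illustration of Figure 4, capture-free substitution over the datatype sketched above might be implemented as follows; gensym is an assumed helper for inventing fresh variable names and is not part of the paper:

  (* fresh-name supply; a hypothetical helper *)
  val counter = ref 0
  fun gensym () = (counter := !counter + 1; "_v" ^ Int.toString (!counter))

  (* subst m x n  computes  m[n/x]  without capturing free variables of n *)
  fun subst (Var y)         x n = if y = x then n else Var y
    | subst (App (l, m))    x n = App (subst l x n, subst m x n)
    | subst (Choice (l, m)) x n = Choice (subst l x n, subst m x n)
    | subst (Lam (y, m))    x n =
        if y = x
        then Lam (y, m)                (* x is shadowed: (lambda x.M)[N/x] = lambda x.M *)
        else let val z = gensym ()     (* always rename the binder, as in Fig. 4 *)
             in Lam (z, subst (subst m y (Var z)) x n)
             end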

If M[·] = [·]            then M[N] = N

If L[·] = λx.M[·]        then L[N] = λx.M[N]

If L[·] = (M N[·])       then L[O] = (M N[O])

If L[·] = (M[·] N)       then L[O] = (M[O] N)

If L[·] = (M ./ N[·])    then L[O] = (M ./ N[O])

If L[·] = (M[·] ./ N)    then L[O] = (M[O] ./ N)

Fig. 5. Context substitution.

(BETAλ)      (λx.M N) →λ M[N/x]

(CONTEXTλ)   N →λ N'  implies  M[N] →λ M[N']

(NULLλ)      M →λ* M

(TRANSλ)     M →λ M'  and  M' →λ* M''  implies  M →λ* M''

Fig. 6. Transition relation for the original λ-calculus. Capture-free substitution and context substitution operations are those shown in Figures 4 and 5 (but restricted to λ-terms without choice).

(BETA)        (λx.M N) →./ M[N/x]

(CONTEXT)     N →./ N'  implies  M[N] →./ M[N']

(CH-ASSOC)    (L ./ (M ./ N)) →./ ((L ./ M) ./ N)

(CH-SYMM)     (M ./ N) →./ (N ./ M)

(LAM-DIST)    λx.(M ./ N) →./ (λx.M ./ λx.N)

(APP-DIST-L)  ((L ./ M) N) →./ ((L N) ./ (M N))

(APP-DIST-R)  (L (M ./ N)) →./ ((L M) ./ (L N))

(CHOOSE)      (M ./ N) →./ M

(NULL)        M →./* M

(TRANS)       M →./ M'  and  M' →./* M''  implies  M →./* M''

Fig. 7. Transition relation for the calculus with choice ./.
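To make the flavor of Figure 7 concrete, here are top-level renderings of a few of its rules over the datatype sketched earlier. This is only a sketch; rule selection and the CONTEXT closure are left to the surrounding optimizer:

  (* top-level applications of some rules from Fig. 7; NONE means "not applicable" *)
  fun lamDist  (Lam (x, Choice (m, n))) = SOME (Choice (Lam (x, m), Lam (x, n)))   (* LAM-DIST   *)
    | lamDist  _                        = NONE

  fun appDistL (App (Choice (l, m), n)) = SOME (Choice (App (l, n), App (m, n)))   (* APP-DIST-L *)
    | appDistL _                        = NONE

  fun appDistR (App (l, Choice (m, n))) = SOME (Choice (App (l, m), App (l, n)))   (* APP-DIST-R *)
    | appDistR _                        = NONE

  fun chooseL  (Choice (m, _))          = SOME m                                   (* CHOOSE     *)
    | chooseL  _                        = NONE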


2.1 An example

In the untyped λ-calculus¹, using the abbreviations from Figure 2, consider the following expression, where A, B, and C are assumed to be arbitrary (perhaps large) subexpressions:

1  (λf.(pair
2     (pair (g f)
3           (λy.(y (pair true A)) f))
4     (λz.(z (pair w B)) f))
5   λx.((fst x) (fst x) (C (snd x))))

While an evaluator would have to eventually β-reduce the outermost application, the optimizer of a compiler might not indiscriminately want to do so. The body of the abstraction that binds f contains three separate occurrences of f. Thus, a β-reduction would triplicate the argument expression. But C might be so large that the compiler's heuristics do not consider it worthwhile to even just try.

Now let us add choice to the calculus as shown in Figure 3. Intuitively, the meaning of an expression (M ./ N) is the non-deterministic choice between evaluating M and evaluating N. Guided by this intuition, we want the choice operator to have various useful properties such as being symmetric and associative. Moreover, both abstraction as well as application should be distributive with respect to choice. A transition system for this calculus is given in Figure 7. Notice that this transition system is identical to that of the underlying λ-calculus without choice as far as expressions in which ./ does not occur are concerned.

(VAR)   x ⇓ x

(LAM)   λx.M ⇓ λx.M

(APP)   M ⇓ λx.B  and  B[N/x] ⇓ V  implies  (M N) ⇓ V

Fig. 8. Classical natural ("big step") semantics for the call-by-name λ-calculus.

(VAR')   x ⇓' x

(LAM')   λx.M ⇓' λx.M

(APP')   M ⇓' λx.B  and  N ⇓' N'  and  B[N'/x] ⇓' V  implies  (M N) ⇓' V

Fig. 9. Classical natural semantics for the call-by-value λ-calculus.

With this preparation, we can come back to our example. Let us postulate a modified version of the BETA rule that states

(BETA./)    (λx.M N) → (λy.(M[(y ./ N)/x]) N)

where y is a fresh variable.

Now let us have a closer look at each of the three occurrences of f (on lines 2, 3, and 4 in the example) if this new rule is applied to the outermost redex. (For brevity, we write λx.F{x} for the argument of this outermost application and also ignore the α-renaming of f, assuming f does not occur free in F{x}.)

¹ See Figures 1 and 8 for a quick reference.


On line 2, the application (g f) undergoes the following transition steps:

(g (f ./ λx.F{x}))   →./   ((g f) ./ (g λx.F{x}))      (by APP-DIST-R)
                     →./   (g f)                        (by CHOOSE)

Note that the choice on the last line can be made either way, but most compilers would want to choose the one shown here.

On line 3, the original application is distributed over the choice in a similar fashion. One of the alternatives then quickly arrives at (f (pair true A)), while the other undergoes the following sequence of transitions:

(λy.(y (pair true A)) λx.F{x})   →./   (λx.F{x} (pair true A))
                                 ...
                                 →./   (true true (C A))
                                 →./   true

Clearly, in this case the compiler's heuristics should favor the right-hand operand of ./ (i.e., a CH-SYMM transition before applying CHOOSE).

Line 4 looks very similar to line 3, except true is replaced with a free variable w. Thus, none of the reductions that depended on true can be performed. Like in the previous case, transformations eventually produce a choice between two expressions. One of them is (f (pair w B)), the other one is produced by this transformation sequence:

(λz.(z (pair w B)) λx.F{x})   →./   (λx.F{x} (pair w B))
                              ...
                              →./   (w w (C B))

Thus, either side of the final choice still contains B. Moreover, the second one also contains C. C is still present in the original F{x} on line 5, i.e., the "definition" of f. This is necessary since on line 2 we have settled for keeping the original expression (g f): the variable f continues to occur, and its definition cannot be dropped. Considering all this, an optimizer's heuristics would probably decide to keep the first alternative, namely (f (pair w B)), and discard the second. The final overall expression therefore is

1  (λf.(pair
2     (pair (g f)
3           true)
4     (f (pair w B)))
5   λx.((fst x) (fst x) (C (snd x))))

which would not have been reachable with the choice-free version of the calculus.


2.2 The purpose of choice

As demonstrated in this example, the purpose of the choice operator is to delay the effect of a reduction and to maintain—as part of the calculus—the alternatives of both reducing and not reducing. The CHOOSE rule always provides the possibility of finalizing pending choices by arbitrarily favoring one side. Moreover, a pending choice can be propagated to several different places, and local knowledge at those places can later be used to actually make the decision at each instance.

An expression with embedded choices represents a "snapshot" of a heuristic breadth-first search in optimization space. In practice the expression will grow very quickly in size, so the CHOOSE rule must be applied often, and the heuristics for deciding when and how will have to be similar to the heuristics in "traditional" optimizers. When viewed from this angle, it becomes clear that ./ is certainly no silver bullet.

Still, as we have promised in the introduction, the calculus with choice naturally expresses program fragments that are shared between members of the current set.²

Moreover, since the choice-based optimizer essentially performs a breadth-first instead of a depth-first search, it is easier to trade precision in one search branch for precision in another. A branch with a pending choice that "looks promising" can be explored arbitrarily far as long as this is paid for by pruning other, less promising branches. To implement such behavior in an optimizer for a language without choice is harder because there it is difficult to compare different branches "side-by-side".

3 Free choice

The choice operator ./ that we have seen in the previous section does not constrain the use of other transformation rules. We will call such a non-constraining choice operator a free choice. (Section 7 discusses a constrained variety that can also sometimes be useful.)

There is no technical reason for not using typed calculi, but their formal treatment takes more room. For the sake of economy of notation we will thus mainly consider the untyped λ-calculus with choice (henceforth called the λ./-calculus) and only briefly mention typed versions where types are relevant to the discussion.

3.1 Non-confluence

The immediate trouble with the introduction of choice into a λ-calculus (or any intermediate language, for that matter) is that the resulting system is no longer confluent, meaning that there are expressions L, M, and N with L →./ M and L →./ N but there is no O such that M →./* O and N →./* O. In fact, examples for non-confluence can be trivially constructed:

(true ./ false)

Clearly, we must avoid the possibility of different transformations arriving at programs with different meanings. Therefore, we will restrict the calculus to "well-behaved" expressions, i.e., those that restore the Church-Rosser property (confluence) of the calculus. How this is to be achieved will be our topic for the remainder of this section.

² Formally, this is the set of commits as described in section 3.2.


3.2 Commits

Consider a λ./-expression M. We define the set of commits C[[M]] as the set of expressions that can be derived from M by deciding each occurrence of ./ one way or the other (Figure 10).

C[[x]] = {x}

C[[λx.M]] = {λx.M' | M' ∈ C[[M]]}

C[[(M N)]] = {(M' N') | M' ∈ C[[M]], N' ∈ C[[N]]}

C[[(M ./ N)]] = C[[M]] ∪ C[[N]]

Fig. 10. Calculating the set of commits C[[M]] of a λ./-expression M.
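A direct—if exponential—rendering of Figure 10 over the datatype sketched in Section 2 might look as follows (commits is an illustrative name, not taken from the paper):

  (* commits m = C[[m]], the list of all ways of deciding each ./ in m *)
  fun commits (Var x)         = [Var x]
    | commits (Lam (x, m))    = map (fn m' => Lam (x, m')) (commits m)
    | commits (App (m, n))    =
        List.concat (map (fn m' => map (fn n' => App (m', n')) (commits n))
                         (commits m))
    | commits (Choice (m, n)) = commits m @ commits n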

C[[M]] has the following properties:

– If M is an ordinary λ-term (i.e., does not contain ./), then C[[M]] = {M}.
– Each M' ∈ C[[M]] can be obtained from M by only applying two rules: CHOOSE and CONTEXT.
– By only applying symmetry-, associativity-, and distribution rules (CH-SYMM, CH-ASSOC, LAM-DIST, APP-DIST-L, APP-DIST-R), one can for each M derive a "canonical" expression ((...(N1 ./ N2) ./ ... Nk−1) ./ Nk) so that C[[M]] = {N1, N2, ..., Nk−1, Nk}. In other words, M can be evolved into an expression that simultaneously represents all members of C[[M]] in a syntactic way. There can be duplicates of the form Ni = Nj for some 1 ≤ i, j ≤ k, though.

It may be tempting to tackle the ambiguity problem by postulating that an expression M be well-formed whenever C[[M]] contains pairwise equivalent expressions only.³ We call expressions that fail this test immediately inconsistent.

M immediately inconsistent  =def  ¬∃N : ∀M' ∈ C[[M]] : M' →λ* N

Obviously, avoiding immediate inconsistency is necessary. However, for our purposes it is not sufficient. The first (and lesser) problem is that the property of being immediately inconsistent is not recursively enumerable. This would not be fatal if its negation were preserved by →./. Unfortunately, this is not the case. Consider:

(λx.(xor x x) (true ./ false))

This expression has two commits that both have the same normal form false. Therefore, the expression is not immediately inconsistent, even though a simple BETA step changes that:

(xor (true ./ false) (true ./ false))

Two of the four commits of this new expression can be transformed to true, the other two to false, and there is no N so that true →λ* N and also false →λ* N.

³ As usual, M, N ∈ Eλ are considered equivalent if there is some O such that M →λ* O ∧ N →λ* O.


3.3 Non-ambiguity

We are looking for a property that rules out immediate inconsistency and which will also be preserved by program transformations. At a minimum, the →./ relation, i.e., each rule in Figure 7, must preserve it.

The trouble with the previous counterexample can be attributed to the fact that there is a subterm which—in some sense—is inconsistent but for which the difference in meaning happens to cancel with itself. After the BETA step, the two incompatible sides meet each other directly, and the original cancellation of their difference no longer occurs.

One way to ensure preservation of equivalent commits (implying confluence) is to rule out inconsistent subterms. However, simply requiring that all subterms also not be immediately inconsistent is not helpful. To see why, consider applying BETA./ to

(λx.x y)

The result is

(λz.(z ./ y) y)

which now has a subterm (z ./ y) that is immediately inconsistent—although the intention was to treat z and y as synonyms inside the abstraction. Thus, the correct way of defining what constitutes a well-formed λ./-expression must take binding context into account.

(SUB)        Γ, A ⊢ M ▷ N  and  N →λ O  implies  Γ, A ⊢ M ▷ O

(FREE)       x ∉ dom(Γ)  implies  Γ, A ⊢ x ▷ x

(BOUND)      Γ(x) = M  implies  Γ, A ⊢ x ▷ M

(CHARGE)     Γ, ⟨⟩ ⊢ N ▷ N'  and  Γ, ⟨A | N'⟩ ⊢ M ▷ M'  implies  Γ, A ⊢ (M N) ▷ M'

(DISCHARGE)  Γ[x ↦ N], A ⊢ M ▷ O  implies  Γ, ⟨A | N⟩ ⊢ λx.M ▷ O

(ALPHA)      y is fresh  and  Γ[x ↦ y], ⟨⟩ ⊢ M ▷ O  implies  Γ, ⟨⟩ ⊢ λx.M ▷ λy.O

(CONS)       Γ, A ⊢ M ▷ O  and  Γ, A ⊢ N ▷ O  implies  Γ, A ⊢ (M ./ N) ▷ O

Fig. 11. Inference rules for judgements of the form Γ, A ⊢ M ▷ N. These judgements are pronounced: "In Γ with A, M consistently reduces to N."

Let the finite map Γ ∈ Variable → Eλ be a binding environment and A ∈ Eλ* a (possibly empty) sequence of Eλ-terms ⟨M1, ..., Mk⟩. As usual, Γ[x ↦ M] denotes the environment that maps x to M and otherwise coincides with Γ. Furthermore, for any A = ⟨M1, ..., Mk⟩, the sequence ⟨M1, ..., Mk, M⟩ can be written as ⟨A | M⟩. For λ./-expressions M and λ-expressions N we then define the relation Γ, A ⊢ M ▷ N, which should be pronounced as "in Γ with A, M consistently reduces to N," as shown in Figure 11. Non-ambiguity is then defined as follows⁴:

Definition 1. An expression M ∈ E./ is called non-ambiguous if and only if there is some expression N ∈ Eλ such that ∅, ⟨⟩ ⊢ M ▷ N.

We write ⊢ M ▷ N as an abbreviation for ∅, ⟨⟩ ⊢ M ▷ N, so non-ambiguity of M can be concisely expressed as ∃N : (⊢ M ▷ N). Note that ⊢ M ▷ N implies M →./* N.

Non-ambiguity also implies that the term in question cannot be immediately inconsistent:

Proposition 1. If ⊢ M ▷ N, then ∃N' : ∀M' ∈ C[[M]] : M' →λ* N'.

The proof of Proposition 1 proceeds by straightforward induction on the structure of the expression.

3.4 Preservation of non-ambiguity

To show that an expression involving choice is non-ambiguous, one must be able to prove that two expressions M and N are equivalent, i.e., that there is an expression O that can be reached from both M and N. Of course, there is no general decision procedure for this because it would require solving the halting problem. However, non-ambiguous expressions can be enumerated, and since a compiler always starts with a non-ambiguous expression (if the source language does not have a ./ operator), it is enough to make sure that non-ambiguity is preserved by the transition relation.

Proposition 2. If ⊢ M ▷ M' and M →./ N, then ∃N' : (⊢ N ▷ N' ∧ M' →λ* N').

To prove Proposition 2 we show that a slightly stronger version holds where Γ and A are not necessarily empty: if Γ, A ⊢ M ▷ M' and M →./ N, then ∃N' : (Γ, A ⊢ N ▷ N' ∧ M' →λ* N'). For this we proceed by cases where each transition rule is considered in turn. The individual cases use simultaneous structural induction on M and N. A slight technical difficulty is that the structures of M and N are not completely identical. For example, in the case of the BETA rule even the binding structures are different. To be able to use the induction hypothesis, environments Γ and the list of pending applications A must be "fudged" according to Lemma 1 to look the same for both M and N.

Notice that the substitution that occurs during a BETA transition is mimicked by the interplay of the CHARGE, DISCHARGE, ALPHA, and BOUND rules. The roles of Γ and A in judgments of the form Γ, A ⊢ M ▷ N are the following: Γ remembers pending substitutions and A records pending applications of the BETA transition rule. As a result, the judgments treat β-redexes as if they have already been reduced and, thus, do not distinguish between expressions before or after an application of the BETA rule.

The following lemma justifies the "fudging" of environments and application lists we just mentioned:

⁴ A preliminary version of this paper called this strong non-ambiguity and also briefly discusses a weak variety [5].


Lemma 1. If Γ, A ⊢ M ▷ N and x does not occur free in M, then Γ, ⟨A | Y⟩ ⊢ λx.M ▷ N for arbitrary expressions Y.

Proof: Straightforward by induction on the structure of M.

3.5 Confluence

From Propositions 1 and 2 one can conclude that for non-ambiguous terms, →./ keeps the sets of commits consistent. This means that the absence of immediate inconsistency is also preserved:

Proposition 3. If ⊢ M ▷ M' and M →./ N, then ∃N' : ∀L ∈ C[[M]] ∪ C[[N]] : L →./* N'.

From this we conclude that →./ is indeed confluent for non-ambiguous expressions:

Proposition 4. If ⊢ M ▷ N and there are terms M', M'' ∈ E./ such that M →./ M' and M →./ M'', then ∃N' : N →λ* N' ∧ M' →./* N' ∧ M'' →./* N'.

Proof: First we use induction on the number of →./ steps in a given →./* transition sequence to show that Proposition 3 extends to →./*. Now consider M, N such that ⊢ M ▷ N. We pick an arbitrary element M₀ ∈ C[[M]]. For M' with M →./ M' we then pick M'₀ ∈ C[[M']] in a similar way and note that since M'₀ can be obtained from M' by using CHOOSE we have M' →./* M'₀ and therefore M →./* M'₀. By the extension of Proposition 3 we conclude that ∃M̄' : M₀ →./* M̄' ∧ M'₀ →./* M̄'. However, since both M₀ and M'₀ are choice-free expressions, we even have M₀ →λ* M̄' and M'₀ →λ* M̄'. By the same reasoning for an M'' with M →./ M'' we get some M̄'' so that M₀ →λ* M̄'' and M''₀ →λ* M̄''. We also know that M →./* N and that N itself is choice-free. Therefore, there is an N̄ with M₀ →λ* N̄ and N →λ* N̄. Since →λ* is already known to be confluent, we conclude that there must be an N' with N̄ →λ* N' ∧ M̄' →λ* N' ∧ M̄'' →λ* N', and therefore also N →λ* N' ∧ M' →./* N' ∧ M'' →./* N'. □

3.6 Choice-free terms

Proposition 5. Every choice-free term M ∈ Eλ is non-ambiguous.

The proof for this proceeds by induction on the structure of M, showing that Γ, A ⊢ M ▷ M.

3.7 Adding rules

In a real compiler, the initial expression will typically be an ordinary λ-term without choice⁵, but so far—without choice-introduction rules—such choice-free terms are always confluent anyway. However, the developed theory acts as a framework that makes it easy to justify the addition of choice-introduction rules, for example BETA./ as shown in the introduction.

To justify that an added rule does not disturb confluence, one must show that it also preserves non-ambiguity:

⁵ An exception to this is discussed in section 4.


Proposition 6. If ⊢ M ▷ M' and M →./ N by BETA./, then ∃N' : (⊢ N ▷ N' ∧ M' →λ* N').

The proof for Proposition 6 is very similar to that of the BETA case in the proof of Proposition 2, the main difference being that here the β-redex does not disappear (but its bound variable gets renamed).

4 A real-world example: Cross-module inlining

The SML/NJ compiler [3] represents compilation units as closed functions that map imported values to exported values [4]. This fact can be exploited to express cross-module inlining by manipulating the λ-terms that represent these closed functions in the intermediate language [6].

If the source code of compilation unit U2 has a free reference to a variable x exported from U1, then F2—the function representing U2—will have a formal parameter x, and the linker will make sure that the value for x as produced by F1 (the function representing U1) will be passed to F2.

The λ-splitting transformation modifies the code for F2 by attaching code that originally came from F1. In particular, let X be some subterm of F1 that (in its original context) produces the value of x. For simplicity, let us assume that x was the only argument of F2 and that X has only one free variable y. Then λ-splitting would turn F2 into

λy.(F2 X)

and let the existing intra-modular optimizer take care of the rest, i.e., let it propagate the expression X to all free occurrences of variable x in the body of F2 by reducing the outermost β-redex. Notice that F1 must also be modified to export y if it wasn't exported already, and the linker must pass the corresponding values.

This works well if X is small enough for the optimizer's heuristics to actually enable the β-reduction. If the reduction does not occur, then nothing is won and one suddenly has a residual copy of X in the code for compilation unit U2. (Of course, if the reduction does occur, there might be even more residual copies of X in F2—which is precisely why the optimizer would sometimes decide to hold off the reduction of the outermost β-redex.)

Since in many cases F1 itself continues to calculate the value of x anyway, an added choice in the intermediate language can solve this problem.⁶ Instead of turning F2 into λy.(F2 X), one can turn it into

λxy.(F2 (x ./ X))

Of course, the linker must now pass both x and y to the resulting code for U2.

The careful reader will observe that this code is ambiguous. Non-ambiguity can be shown only for the case where the context ensures an equivalence between X and x, i.e., where the function is being called with the correct argument.

Fortunately, we can show that there is a Y so that:

∅, ⟨y, X⟩ ⊢ λxy.(F2 (x ./ X)) ▷ Y

With this we know that if we have a correctly implemented linker, then we also get non-ambiguous behavior from λ-splitting with choice. Only a broken linker could upset us (but a broken linker would upset us anyway).

⁶ In fact, the thought of adding choice first came up during the work on λ-splitting.


5 Flow analysis with choice

[Figure 12: a flow graph containing a ./-branch and a ./-join; one path contains the assignment "x := y + z", the other the call "P(x,y,z)". The remaining nodes include "a := x", "b := y", "w := -x", "x := 1/b", and the query "y+z available?".]

Fig. 12. Suppose the invocation of an unknown procedure P could potentially define y or z. Such a situation is known as an ambiguous definition. If we interpret the ./-box as an ordinary branch with an unknown condition, taking the intersection at the join point will eliminate y+z from the set of available expressions. However, if ambiguities due to choice have been ruled out, then one can interpret the flow as going "simultaneously through both paths" and use set union instead of intersection at ./-joins. Thus, even if later optimizations eventually drop the left path in favor of the invocation of P, its temporary presence was able to improve the precision of dataflow analysis.

Consider the problem of flow analysis in the framework of flow graphs and quadruples (e.g., as described by Appel [2, Section 17.1]). We can extend this with free choice by adding a special ./-branch together with a corresponding ./-join. As we will see, dataflow analysis can benefit from adjustments made at the point where ./-induced paths merge. Maintaining special ./-join nodes makes the detection of these places easier.

When doing dataflow analysis on flow graphs with choice, a first approximation would be to interpret ./-branches as conditional branches where the condition is statically unknown. This will lead to a correct analysis, but such an analysis can be rather conservative.

Suppose we enforce a non-ambiguity rule—analogous to the λ./-calculus' consistent reduction (Definition 1)—for flow graphs with choice. In the resulting framework we would then know that two paths that merge at a ./-join will have identical behavior at all times. Therefore, flow analysis can aggressively take the sum of the flow information that enters along different paths.

Figure 12 demonstrates this for the example of available expression analysis. In essence, even though formally the procedure P that is called on the right path is not known, some or even all of its innards have been made available to the flow analyzer on the left path. Available expression analysis would normally have to take the intersection of sets at join nodes, but for the special case of ./-joins it can use set union, thereby potentially yielding better precision. The results of the analysis can be used for program transformations even if the left branch is later being dropped in favor of merely keeping a call of P.
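As a small illustration of the adjusted join rule, the merge step of an available-expression analysis might distinguish the two kinds of joins roughly as follows. This is a sketch in Standard ML; the list-based set representation and all names are assumptions, not taken from the paper:

  (* available-expression sets as plain lists; a deliberately naive sketch *)
  type avail = string list

  fun member (x, s : avail) = List.exists (fn y => y = x) s
  fun intersect (a, b)      = List.filter (fn x => member (x, b)) a
  fun union (a, b)          = a @ List.filter (fn x => not (member (x, a))) b

  datatype joinKind = OrdinaryJoin | ChoiceJoin        (* ChoiceJoin = a ./-join *)

  (* merge the sets flowing in along the predecessors of a join node *)
  fun merge OrdinaryJoin (sets : avail list) = foldl intersect (hd sets) (tl sets)
    | merge ChoiceJoin   sets                = foldl union [] sets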


6 Designing choice-introduction rules

Let us now come back to the λ./-calculus. A rule of thumb for designing new choice-introducing rules is to look at places where the original calculus leaves some freedom for decision-making. A new rule can often be designed by taking two of the alternatives, joining them via ./, and then inversely applying some of the distribution laws.

Unfortunately, our BETA./ rule is not a very good example of this because its non-choice counterpart is not present in our original λ-calculus. This counterpart is a form of partial β-rule that retains the β-redex and performs substitution in only some but not necessarily all possible places. (Observe how BETA./ substitutes a choice between a residual occurrence of the formal parameter and the corresponding actual argument.)

Partial β-reduction, while not often mentioned, is actually taken for granted in most compilers when it comes to inlining, constant folding, and value propagation. The very idea that a function may be inlined in some but not all places is a manifestation of a partial β-rule.⁷

7 Constrained choice

As we have seen, studying the mechanics of a traditional optimizer's decision-making can guide the design of rules that introduce choice. However, there are cases where this intuition breaks down and a legal program transformation would lead to an invalid introduction rule. Consider the following fragment of a program written in Standard ML:

structure F :> sig
    type t
    val f : t
    val a : t -> int * int -> int
end = struct
    type t = int * int -> int
    fun f (x, y) = ...
    fun a f (y, x) = f (x, y)
end

In this example, an ML compiler can apply the following reasoning:

– Type t is abstract (as indicated by the opaque signature match written using ":>").
– Therefore, there is no way of directly invoking any function of type t from outside structure F.
– Function f is such a function of type t.
– Therefore, one can rewrite f (together with all other functions of type t) to use a more efficient calling convention.
– In particular, one can reverse the order of the arguments and modify a accordingly (as well as all other call sites of f inside structure F).

Here is what the resulting code would look like. Notice how a was simplified:

⁷ An earlier version of this paper discusses partial β and its relationship to BETA./ in greater detail [5].


structure F' :> sig
    type t
    val f : t
    val a : t -> int * int -> int
end = struct
    type t = int * int -> int
    fun f (y, x) = ...
    fun a f = f
end

Obviously, there is freedom in the decision of whether or not to apply such a transformation. But to think that this freedom justifies a corresponding introduction rule for ./ is wrong. Consider a client module that invokes f via a. If the client had free choice between F and F', then it would be translated as something like:

(F ./ F').a ((F ./ F').f)

In cases where the two choices are decided differently, this code would match F.a with F'.f (or vice versa), which changes the semantics of the program.

7.1 Avoiding too much freedom

x ∈ Variable

E⧓ ::= x | λx.E⧓ | (E⧓ E⧓) | (E⧓ ./ E⧓) | (E⧓ ⧓ E⧓)

C⧓[·] ::= [·] | λx.C⧓[·] | (C⧓[·] E⧓) | (E⧓ C⧓[·]) | (C⧓[·] ./ E⧓) | (E⧓ ./ C⧓[·]) | (C⧓[·] ⧓ E⧓) | (E⧓ ⧓ C⧓[·])

Fig. 13. Syntax of expressions E⧓ and contexts C⧓[·] for the untyped λ-calculus with both free choice ./ and constrained choice ⧓.

Two different concrete realizations of the same abstract type must never "meet" each other. To ensure this, let us first introduce a new operator ⧓ which we refer to as constrained choice (see Figure 13). The intuition here is that it largely acts like the original ./, but the transition system will keep any two sides that are joined by ⧓ well-separated at all times.

The use of an elaborate marking scheme could give us this behavior: whenever a new term of the form (L ⧓ R) is introduced, the occurrence of ⧓ is assigned a unique tag t, the entire subterm L is "painted" with "color" t−, and R is painted with t+. Colors can be mixed, but at no time can there be a mix involving both t− and t+ for any tag t. (Notice that distribution laws and β-reduction must propagate color information.)

While this seems to be a rather general approach because it can accommodate the possibility that occurrences of ⧓ be copied, colors would clutter the resulting notation tremendously.


(F-BETA)     (λx.M N) →⧓ M[N/x]                        (if ⧓ does not occur in N)

(C-BETA)     (λx.M N) →⧓ M[N/x]                        (if x appears free at most once in M)

(CONTEXT)    N →⧓ N'  implies  M[N] →⧓ M[N']

(F-SYMM)     (M ./ N) →⧓ (N ./ M)

(C-SYMM)     (M ⧓ N) →⧓ (N ⧓ M)

(FF-ASSOC)   (L ./ (M ./ N)) →⧓ ((L ./ M) ./ N)

(FC-ASSOC)   (L ./ (M ⧓ N)) →⧓ ((L ./ M) ⧓ N)

(CF-ASSOC)   (L ⧓ (M ./ N)) →⧓ ((L ⧓ M) ⧓ N)

(CC-ASSOC)   (L ⧓ (M ⧓ N)) →⧓ ((L ⧓ M) ⧓ N)

(F-LAM)      λx.(M ./ N) →⧓ (λx.M ./ λx.N)

(C-LAM)      λx.(M ⧓ N) →⧓ (λx.M ⧓ λx.N)

(F-APP-L)    ((L ./ M) N) →⧓ ((L N) ./ (M N))          (if ⧓ does not occur in N)

(C-APP-L)    ((L ⧓ M) N) →⧓ ((L N) ⧓ (M N))

(F-APP-R)    (L (M ./ N)) →⧓ ((L M) ./ (L N))          (if ⧓ does not occur in L)

(C-APP-R)    (L (M ⧓ N)) →⧓ ((L M) ⧓ (L N))

(F-CHOOSE)   (M ./ N) →⧓ M

(C-CHOOSE)   (M ⧓ N) →⧓ M

Fig. 14. Transition relation →⧓ for the calculus with free choice ./ and constrained choice ⧓. Notice how the rules maintain the invariant that subterms separated by ⧓ before a rule was applied will still be separated by ⧓ afterwards.

Therefore, we use a much simpler (and less general) device: we make sure that if two subterms are separated by an occurrence of ⧓ before a transition, they must still be separated by a ⧓ after the transition.

Observe how the rules for →⧓ (see Figure 14) satisfy this requirement. In comparison with the original transition system for the λ./-calculus (without constrained choice) there are some interesting aspects of the new rule set:

– Constraining conditions appear in the distribution rules for the free choice operator (as well as in the rules for β-reduction) because these rules would otherwise be able to make copies of ⧓.
– No constraining conditions appear in the distribution rules for constrained choice because even though they might make copies of occurrences of ⧓, the copies always end up on opposite sides of another ⧓.
– Associativity rules for mixed ./ and ⧓ are not duals of one another. A constrained choice is permitted to pass a free choice when traveling outwards, thereby causing more subexpressions to be separated than before.⁸ A free choice, however, must turn into a constrained choice if it passes another constrained choice while traveling outwards (see also Figure 15).
– Whenever the presence of a ⧓ in some subterm inhibits one of the distribution rules for ./, the rule can always be re-enabled either by applying C-CHOOSE or by using distribution- and associativity laws until the offending ⧓ has been lifted out of the subterm.

⁸ This would not happen if we had used a coloring scheme as outlined earlier.


[Figure 15: diagram omitted — two term trees with subterms labeled 0, 1, and 2, illustrating the CF-ASSOC transition.]

Fig. 15. Rule CF-ASSOC in the calculus with free (./) and constrained (⧓) choice. When travelling outwards, a ./ turns into a ⧓ when passing another ⧓. Otherwise the subterms 2 and 1, which were on different sides of a ./ before the transition, would no longer be on different sides of any ./ afterwards. It is possible to construct contexts that are "immune" (i.e., do not exhibit an overall ambiguity) to (2 ⧓ (0 ./ 1)) but not to ((2 ⧓ 0) ./ 1). However, notice that neither of the two expressions admits consistent reduction for ./, so the importance of this rule can still be questioned.

7.2 When to use constrained choice

If one takes the code for structure F from above and translates it to the untyped lambda-calculus by simply stripping away all type- and module-information, one arrives at something like the following expression, which represents F by building the pair of the functions f and a:

(pair (λxy. ···) (λfxy.(f y x)))

Here the first component of the pair is F.f and the second is F.a. On the other hand, the code for F' is:

(pair (λyx. ···) λf.f)

with F'.f as the first and F'.a as the second component.

These two expressions are not equivalent. Contexts that distinguish between the two are easily constructed. Therefore, rewriting the first into the second is not justified by the equational theory, and neither is rewriting it into a choice between the two.

The problem is that stripping away types has discarded essential information. Any context that would be able to distinguish between the two expressions is not well-typed according to ML's typing rules. This reasoning—as opposed to equational reasoning in the untyped calculus—is what enabled the compiler to legally rewrite F into F'.

In a second-order typed λ-calculus, abstract types such as the type t in structure F are explained in terms of existentially quantified type variables [15]. Thus, we can say that whenever one wants to track the choice between two expressions that have an existentially quantified type, one must use ⧓ instead of ./. Many examples for this exist.

One abstraction that is built into every functional programming language is that of a closure. Closure representation is largely at the discretion of the compiler, since it merely needs to be consistent. However, the compiler will eventually make the construction of closures explicit. If one is interested in tracking several possible representations, then the alternatives must again be kept separated by using ⧓ instead of ./. This observation resonates well with the use of existential types for typed closure conversion as described by Minamide, Morrisett, and Harper [14].


7.3 Limitations of constrained choice

At the time constrained choices are decided, one of the two alternatives has to completely disappear from the program. This rules out any coexistence of two correct but incompatible implementations of the same abstraction within the same program.

In general, to prove that such coexistence would be harmless requires a dataflow analysis which must establish that the alternatives are always kept apart at runtime. (Of course, it would have been unreasonable to expect the simple addition of choice alone to solve such dataflow problems.)

7.4 Using wrappers

One tempting but flawed way of eliminating the need for constrained choice is to adapt Leroy's wrapping-unwrapping technique for representation analysis [9, 18, 19]. Recall the example of structure F with its abstract type F.t. We can retain knowledge of the abstract nature of F.t by keeping explicit conversions between concrete representations and abstract representations.

Assuming we have two ways of implementing F.t, we introduce conversion operations wrap1(·), unwrap1(·), wrap2(·), and unwrap2(·) from and to some "abstract" representation for each. It is then possible to translate structure F either into

(pair (wrap1(λxy. · · ·)) (λfxy.(unwrap1(f) y x)))

or into

(pair (wrap2(λyx. · · ·)) λf.unwrap2(f))

The trick is that the compiler "knows" how wrapi(·) cancels with a corresponding unwrapi(·):

wrapi(unwrapi(M)) → M

unwrapi(wrapi(M)) → M

Thus, an application of F.a to F.f is translated as (λfxy.(unwrap1(f) y x) wrap1(λxy.···)), which then turns into λxy.(unwrap1(wrap1(λxy.···)) y x), at which point the wrapper cancels with the unwrapper so that the final result is λxy.(λxy.··· y x). Similarly, had we used F', then wrap2(·) would have cancelled with unwrap2(·), the final result being the efficient version of F'.f: (λyx.···).
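The cancellation laws could be applied by a simple bottom-up rewriting pass. The sketch below assumes a small term datatype with explicit Wrap and Unwrap markers; neither the datatype nor the names are part of the paper's calculus:

  (* a small term language with explicit wrap/unwrap markers (illustrative only) *)
  datatype wexp
    = WVar of string
    | WLam of string * wexp
    | WApp of wexp * wexp
    | Wrap of int * wexp              (* wrap_i(M)   *)
    | Unwrap of int * wexp            (* unwrap_i(M) *)

  (* cancel wrap_i(unwrap_i(M)) and unwrap_i(wrap_i(M)) pairs, bottom-up *)
  fun cancel (WVar x)        = WVar x
    | cancel (WLam (x, m))   = WLam (x, cancel m)
    | cancel (WApp (m, n))   = WApp (cancel m, cancel n)
    | cancel (Wrap (i, m))   =
        (case cancel m of
             Unwrap (j, m') => if i = j then m' else Wrap (i, Unwrap (j, m'))
           | m'             => Wrap (i, m'))
    | cancel (Unwrap (i, m)) =
        (case cancel m of
             Wrap (j, m') => if i = j then m' else Unwrap (i, Wrap (j, m'))
           | m'           => Unwrap (i, m'))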

The point is that we can now join the two versions using free choice because the wrappers will sufficiently shield one concrete version of the abstract type from the other. The worst that could happen is that the final program still contains wrapper-unwrapper pairs of the form wrap1(unwrap2(·)) or some permutation thereof.

Of course, on the way to an executable form of the program one must give a concrete meaning to such residual wrap-unwrap code. To do this, we single out one representation as the canonical representation. Operation wrapi(M) implements the transformation of some expression Mi that currently uses representation i to a corresponding canonical M; unwrapi(M) is the inverse. In our example, if variant 1 is singled out as "canonical," then wrap1(·) and unwrap1(·) are identity transformations, while for wrap2(·) and unwrap2(·) one could give the following transition rules:

wrap2(M) → λxy.(M y x)

unwrap2(M) → λxy.(M y x)

(Notice that these rules can be used to derive the cancellation rules that we have seen earlier.)

The serious drawback of this approach is that it can lead to time and space leaks. It is not difficult to construct a loop where the same value repeatedly goes through wrapping and unwrapping transformations, each such transformation constructing a new closure around the previous one. As a result, closures "pile up", programs begin to leak space, and programs can suffer dramatic changes in their asymptotic time complexity. (The problem is that unwrapping is not really the inverse of wrapping in terms of what happens at a low level.)

Minamide and Garrigue have demonstrated a method of wrapping and unwrapping that—at the expense of efficiency—does not suffer from this problem [13]. Instead of passing a value in one or the other representation, they pass it as a pair of both representations. Monnier pointed out that this could be seen as a way of representing a constrained choice between the two versions at runtime—together with a compile-time guarantee that in each situation the respective correct alternative will be selected [17].

8 Heuristic measuring

The heuristics that drive a compiler's decisions typically use some kind of measuring function ("measure") to assess overall code quality and to estimate in advance what the effect of a transformation will be. The presence of choice operators does not eliminate the need for either the heuristics or the measuring that goes along with them.

Fortunately, it is (typically) possible to extend existing measures for the choice-free core to the full λ./-calculus without degrading their predictive powers.

8.1 Compositional monotone measures

For any M ∈ Eλ, let M[[M]] ∈ N be some number "measuring" M so that the compiler will consider "M better than N" whenever M[[M]] < M[[N]].

To extend M[[·]] to expressions in E./ (or E⧓) we note that M[[·]] already gives us a way of measuring each commit M' ∈ C[[M]] for a given M ∈ E./. Thus, we define:

M[[M]]  =def  min { M[[M']] : M' ∈ C[[M]] }

This definition reflects our intention of eventually selecting the "best" candidate from the commit set. However, it does not provide an efficient way of calculating M[[M]] because the size of the commit set can be exponential in the number of choices within M. (Of course, an upper bound on M[[M]] can be obtained simply by taking an arbitrary element in C[[M]] and measuring that. But the so-calculated upper bound could be far from the actual value and result in an optimizer that acts "paranoid.")

Fortunately, real measuring functions are often compositional and monotone:


compositional: The measure is defined in a "syntax-directed" way, i.e., there is a set of rules that describes M[[M]] for any M purely in terms of M[[N1]], ..., M[[Nk]], where N1, ..., Nk are non-overlapping subterms of M. Moreover, for each concrete M there is precisely one rule that applies.

monotone: In the set of rules defining M[[M]], the functions that map M[[N1]], ..., M[[Nk]] to M[[M]] are monotone in each argument.

Perhaps the most popular kind of measure for traditional (non-choice-based) intermediate languages is an estimate of code size. Appel [1, Chapter 7.2] describes in detail what the size measure O[[·]] for SML/NJ's CPS language is and how it can be used in making optimization decisions. It is easy to see that Appel's definition of O[[·]] is both compositional and monotone.

8.2 Measuring M ∈ E./ efficiently

To make the example concrete, let us define our own (somewhat arbitrary) size measure Oλ[[·]] for Eλ as follows:

Oλ[[x]] = c1
Oλ[[λx.M]] = c2 + Oλ[[M]]
Oλ[[(M N)]] = c3 + Oλ[[M]] + Oλ[[N]]

(Here, c1, c2, and c3 are small constants.) Obviously, Oλ[[·]] is compositional and monotone. With this we now define a new measure O./[[·]] for λ./-expressions:

O./[[x]] = c1
O./[[λx.M]] = c2 + O./[[M]]
O./[[(M N)]] = c3 + O./[[M]] + O./[[N]]
O./[[(M ./ N)]] = min{O./[[M]], O./[[N]]}

Based on the monotonicity of Oλ[[·]], a simple inductive argument shows that, indeed,

O./[[M]] = min { Oλ[[M']] : M' ∈ C[[M]] }.

Moreover, O./[[M]] can be efficiently calculated in time linear in the size of M. For M ∈ Eλ it coincides with Oλ[[M]]. And finally, calculation of the new measure provides an effective way of picking the right element from the commit set, i.e., the Mbest ∈ C[[M]] for which O./[[M]] = Oλ[[Mbest]].

In the same style, it is then possible to extend any compositional monotone measure M[[·]] : Eλ → N to arbitrary λ./-expressions in such a way that it can be efficiently calculated. Thus, for the rest of the discussion we will assume all measures to be of this kind.
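Over the datatype sketched in Section 2, a linear-time computation of O./[[·]]—returning along the way the element of the commit set that attains the minimum—might look roughly as follows; the constants and the function name are illustrative only:

  val (c1, c2, c3) = (1, 1, 1)        (* small, arbitrary constants *)

  (* measure m = (O./[[m]], mBest), where mBest is a member of C[[m]]
     attaining the minimum; runs in time linear in the size of m *)
  fun measure (Var x)         = (c1, Var x)
    | measure (Lam (x, m))    =
        let val (s, m') = measure m in (c2 + s, Lam (x, m')) end
    | measure (App (m, n))    =
        let val (sm, m') = measure m
            val (sn, n') = measure n
        in (c3 + sm + sn, App (m', n')) end
    | measure (Choice (m, n)) =
        let val (sm, m') = measure m
            val (sn, n') = measure n
        in if sm <= sn then (sm, m') else (sn, n') end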


8.3 Estimating savings

The intuition behind the "minimum rule" when measuring choice can also be explained in terms of how the measure is used during optimization. In a situation where the compiler has two or more alternative ways of applying →λ-transitions to a λ./-term M, it can always first go to Mbest ∈ C[[M]], decide (based on M[[Mbest]]) how it would proceed there, and then translate the decision back to M. The minimum rule provides this behavior without actually taking the detour through Mbest.

Thus, the "ballast" carried around in pending choices cannot make things worse. Some of it may currently look worse than Mbest, but if it really turns out to be worse, then it can always be dropped later. If it enables some later optimization that M[[M]] was not able to foresee, then all the better.

In summary, estimating the savings of a transformation in the λ./-calculus can work much the same way as in the original λ-calculus.

8.4 Pruning

Applying CHOOSE corresponds to pruning the search space. Since many transformations duplicate subterms, pruning must be used aggressively to keep the total size of a program's representation manageable.

In the calculus without choice, the same size measure Oλ[[·]] can be used both for estimating the size of the machine code generated for the program and for calculating the size of its current representation (because the two are closely related). With λ./ this is no longer the case, since the current representation also includes all the choices that will not make it into the final program, but O./[[·]] does not measure those.

A separate function R./[[·]] measuring representation size instead of expected code size can be defined simply by using summation to handle choice:

R./[[x]] = c1
R./[[λx.M]] = c2 + R./[[M]]
R./[[(M N)]] = c3 + R./[[M]] + R./[[N]]
R./[[(M ./ N)]] = R./[[M]] + R./[[N]]

R./[[·]] helps detect when a term M "needs pruning", but we use O./[[·]] (or whatever other measure M[[·]] is appropriate) to decide where within M to actually do the pruning. For example, one could search M to find an occurrence of (N ./ N') that maximizes |O./[[N]] − O./[[N']]| and then drop the worse alternative.

An algorithm for this approach would run in linear time, but it might not do exactly the right thing since a locally optimal pruning is not necessarily globally optimal. From the global point of view, any pruning replaces M = L[(N ./ N')] with M' = L[N] for some context L. We want to pick L (and therefore N and N') in such a way that O./[[L[N']]] is maximized. This leads to a simple quadratic-time solution. Of course, pruning might have to be repeated because a single incision will not always bring R./[[M]] back within the limit.
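Under the same assumptions as before, the locally greedy pruning described above could be sketched as follows. For simplicity this sketch recomputes measures at every node (and may collapse several choice nodes that tie for the largest gap), so it does not achieve the linear-time bound mentioned in the text; prune and gap are illustrative names:

  (* largest |O./[[N]] - O./[[N']]| over all choice nodes (N ./ N') in a term *)
  fun gap (Choice (m, n)) =
        Int.max (abs (#1 (measure m) - #1 (measure n)),
                 Int.max (gap m, gap n))
    | gap (Lam (_, m))    = gap m
    | gap (App (m, n))    = Int.max (gap m, gap n)
    | gap (Var _)         = 0

  (* collapse the choice node(s) whose gap equals g, keeping the better side *)
  fun pruneAt g (Choice (m, n)) =
        let val (sm, _) = measure m
            val (sn, _) = measure n
        in if abs (sm - sn) = g
           then (if sm <= sn then m else n)
           else Choice (pruneAt g m, pruneAt g n)
        end
    | pruneAt g (Lam (x, m)) = Lam (x, pruneAt g m)
    | pruneAt g (App (m, n)) = App (pruneAt g m, pruneAt g n)
    | pruneAt g e            = e

  fun prune e = pruneAt (gap e) e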

For certain measure functions one can improve on the quadratic-time search algorithm by taking advantage of regularities. In particular, if the compositional rules that define M[[·]] relate


the measures of M's subterms to that of M in an arithmetically linear fashion, then the search for the globally best choice to be pruned can be done in linear time because the linear relation between a local change and the corresponding global change can be maintained while walking the expression tree. Notice that our size measure Oλ[[·]] has this linearity property, and so does Appel's size measure for CPS-terms. In fact, all the factors are 1, so even the above-mentioned search for the locally best pruning would work.

8.5 Managing non-termination

One problem with the BETA./ rule is that after it has been applied, the original β-redex is still there. Therefore, if one is not careful, this could be a source of non-termination in the optimizer.

Of course, there is no shortage of potential non-termination even with the original choice-free calculus. Most of the time this is simply a consequence of the fact that an optimizer can never be “perfect” because it cannot beat the halting problem.

In the particular case of BETA./, however, turning (λx.M N) first into (λy.M[(y ./ N)/x] N) and then immediately into (λz.M[(y ./ N)/x][(z ./ N)/y] N) is not just something that can lead to non-termination, it is also useless because it never provides any progress towards the goal of optimization! Since all it does is build choice triplets of the form ((z ./ N) ./ N), there is no extra information being propagated. Therefore, the second BETA./ step does not help the optimizer in any way.

A simple solution is to mark β-redexes that have been BETA./-reduced so they will not be eligible for future applications of this rule. On the other hand, since it is possible that pruning prematurely undoes the effect of BETA./, it might be a better approach to adjust the heuristics in such a way that they make re-application of BETA./ extremely unlikely but not impossible.
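One possible way to realize the marking (a sketch of my own, not taken from the paper; the type term', the flag on application nodes, and the helper subst are invented for illustration, and the substitution shown ignores variable capture) is:

(* A variant of the term type in which application nodes carry a flag. *)
type term' =
  | Var' of string
  | Lam' of string * term'
  | App' of term' * term' * bool        (* true: BETA./ already applied here *)
  | Choice' of term' * term'

(* Naive substitution of [by] for variable [x]; variable capture is ignored
   here, which a real implementation must of course handle. *)
let rec subst (m : term') (x : string) (by : term') : term' =
  match m with
  | Var' y -> if y = x then by else m
  | Lam' (y, b) -> if y = x then m else Lam' (y, subst b x by)
  | App' (f, a, mark) -> App' (subst f x by, subst a x by, mark)
  | Choice' (l, r) -> Choice' (subst l x by, subst r x by)

(* BETA./ on a redex (λx.M N): substitute (y ./ N) for x under a renamed
   binder and keep the application, but mark it so the rule cannot be
   re-applied to the same redex. *)
let beta_choice (t : term') : term' option =
  match t with
  | App' (Lam' (x, body), n, false) ->
      let y = x ^ "'" in                 (* simplistic stand-in for a fresh name *)
      let body' = subst body x (Choice' (Var' y, n)) in
      Some (App' (Lam' (y, body'), n, true))
  | _ -> None

Whether such a mark should rule out re-application outright or, as suggested above, merely bias the heuristics strongly against it is a policy decision left to the optimizer.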

9 Conclusions and future work

We have investigated a simple addition to intermediate languages of optimizing compilers. The choice operator ./ enables compilers to encode pending optimization decisions into terms of the language itself and to perform breadth-first searches in optimization space. While born out of the work on cross-module inlining, this can be generally beneficial for inter-module and intra-module optimizers and even for dataflow analysis.

Because of the introduction of a nondeterministic operator, a representation such as the λ-calculus loses its confluence properties. However, as we have shown for this particular language, it is possible to introduce a notion of non-ambiguity that is preserved by the transition relation and provides a sufficiently strong notion of confluence for the segment of the system that a compiler would actually be interested in.

The presence of choice in the calculus does not eliminate the need for heuristics. While it will certainly be possible to design new heuristics that specifically take advantage of choice, we have argued here that existing heuristics for choice-free calculi (as far as they are based on compositional monotone measures) can be adapted and will yield results that are no worse than before. In future work, the practical advantage of using intermediate languages with choice remains to be demonstrated by an implementation in a real compiler.


References

1. Andrew W. Appel. Compiling with Continuations. Cambridge University Press, Cambridge, England, 1992.

2. Andrew W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, Cambridge, England, 1998.

3. Andrew W. Appel and David B. MacQueen. Standard ML of New Jersey. In Martin Wirsing, editor, 3rd International Symp. on Prog. Lang. Implementation and Logic Programming, pages 1–13, New York, August 1991. Springer-Verlag.

4. Andrew W. Appel and David B. MacQueen. Separate compilation for Standard ML. In Proc. SIGPLAN '94 Symp. on Prog. Language Design and Implementation, volume 29, pages 13–23. ACM Press, June 1994.

5. Matthias Blume. λ./: Using nondeterministic choice as part of a deterministic calculus. Technical Report RIMS-1283, Research Institute for Mathematical Sciences, Kyoto University, July 2000.

6. Matthias Blume and Andrew W. Appel. Lambda-splitting: A higher-order approach to cross-module optimizations. In Proc. 1997 ACM SIGPLAN International Conference on Functional Programming (ICFP '97), pages 112–124. ACM Press, June 1997.

7. Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations. In Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 237–247, New York, 1993. ACM Press.

8. Paul Hudak, Simon Peyton Jones, and Philip Wadler. Report on the programming language Haskell, a non-strict, purely functional language, version 1.2. SIGPLAN Notices, 27(5), May 1992.

9. Xavier Leroy. Unboxed objects and polymorphic typing. In 19th Annual ACM Symp. on Principles of Prog. Languages, pages 177–188, New York, January 1992. ACM Press.

10. Xavier Leroy. The Objective Caml system (release 2.04). (With Didier Remy, Jerome Vouillon and Damien Doligez.) Institut National de Recherche en Informatique et en Automatique, November 1999.

11. H. Massalin. Superoptimizer: a look at the smallest program. In Proc. of the Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS II), pages 122–126, Palo Alto, CA, 1987.

12. Robin Milner, Mads Tofte, Robert Harper, and David MacQueen. The Definition of Standard ML (Revised). MIT Press, Cambridge, MA, 1997.

13. Yasuhiko Minamide and Jacques Garrigue. On the runtime complexity of type-directed unboxing. In Proc. 1998 ACM SIGPLAN International Conference on Functional Programming (ICFP '98), pages 1–12. ACM Press, September 1998.

14. Yasuhiko Minamide, Greg Morrisett, and Robert Harper. Typed closure conversion. In POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 271–283. ACM Press, January 1996.

15. John C. Mitchell and Gordon D. Plotkin. Abstract types have existential type. ACM Trans. on Programming Languages and Systems, 10(3):470–502, July 1988.

16. Eugenio Moggi. Computational lambda-calculus and monads. In Symposium on Logic in Computer Science, pages 14–22. IEEE, 1989.

17. Stefan Monnier. Personal communication, January 2000.

18. Zhong Shao. Compiling Standard ML for Efficient Execution on Modern Machines. PhD thesis, Princeton University, Princeton, NJ, November 1994. Tech Report CS-TR-475-94.

19. Zhong Shao. Flexible representation analysis. In Proc. 1997 ACM SIGPLAN International Conference on Functional Programming (ICFP '97), pages 85–98, New York, 1997. ACM Press.

20. Zhong Shao. An overview of the FLINT/ML compiler. In Proc. 1997 ACM SIGPLAN Workshop on Types in Compilation (TIC '97), June 1997.


21. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A Type-Directed Optimizing Compiler for ML. In 1996 SIGPLAN Conference on Programming Language Design and Implementation, New York, 1996. ACM Press.