Bonsai: Synthesis-Based Reasoning for Type Systemskach/bonsai-popl18.pdf · Bonsai trees enable the...

34
62 Bonsai: Synthesis-Based Reasoning for Type Systems KARTIK CHANDRA, Henry M. Gunn High School and Stanford University, USA RASTISLAV BODIK, University of Washington, USA When designing a type system, we may want to mechanically check the design to guide its further development. We describe algorithms that perform symbolic reasoning about executable models of type systems. The algorithms support three queries. First, they check type soundness and synthesize a counterexample program if such a soundness bug is found. Second, they compare two versions of a type system, synthesizing a program accepted by one but rejected by the other. Third, they minimize the size of synthesized counterexample programs. These algorithms symbolically evaluate typecheckers and interpreters, producing formulas that characterize the set of programs that fail or succeed in the typechecker and the interpreter. However, symbolically evaluating interpreters poses efciency challenges, which are caused by having to merge execution paths of the various possible input programs. Our main contribution is the bonsai tree, a novel symbolic representation of programs and program states that addresses these challenges. Bonsai trees encode complex syntactic information in terms of logical constraints, enabling more efcient merging. We implement these algorithms in the Bonsai tool, an assistant for type system designers. We perform case studies on how Bonsai helps test and explore a variety of type systems. Bonsai efciently synthesizes counterexamples for soundness bugs previously inaccessible to automatic tools and is the frst automated tool to fnd a counterexample for the recently discovered Scala soundness bug SI-9633. CCS Concepts: Software and its engineering Automated static analysis; Search-based software engineering; Model checking; Semantics; Additional Key Words and Phrases: Type checking, synthesis, symbolic compilation ACM Reference Format: Kartik Chandra and Rastislav Bodik. 2018. Bonsai: Synthesis-Based Reasoning for Type Systems. Proc. ACM Program. Lang. 2, POPL, Article 62 (January 2018), 34 pages. https://doi.org/10.1145/3158150 1 INTRODUCTION Today’s type system designers strive to develop typecheckers that balance expressiveness and convenience with powerful static guarantees. This has led to a variety of innovations, such as polymorphism, path-dependent types, and ownership types. On their own, these features promise programmers strong static guarantees on their programs. Unfortunately, combining such features often creates intricate soundness bugs, some of which have gone unnoticed for many years. For example, soundness bugs were caused by the confuence of assignment and polymorphism in ML [Tofte 1990b; Wright and Felleisen 1994], and of path-dependent types and nullable values in Scala [Amin and Tate 2016a]. Authors’ addresses: Kartik Chandra, Henry M. Gunn High School, Palo Alto , Stanford University, Palo Alto, USA, [email protected]; Rastislav Bodik, University of Washington, Seattle, USA, [email protected]. © 2018 Copyright held by the owner/author(s). 2475-1421/2018/1-ART62 https://doi.org/10.1145/3158150 Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018. This work is licensed under a Creative Commons Attribution 4.0 International License.

Transcript of Bonsai: Synthesis-Based Reasoning for Type Systemskach/bonsai-popl18.pdf · Bonsai trees enable the...

62

Bonsai: Synthesis-Based Reasoning for Type Systems

KARTIK CHANDRA, Henry M. Gunn High School and Stanford University, USA

RASTISLAV BODIK, University of Washington, USA

When designing a type system, we may want to mechanically check the design to guide its further development.

We describe algorithms that perform symbolic reasoning about executable models of type systems. The

algorithms support three queries. First, they check type soundness and synthesize a counterexample program

if such a soundness bug is found. Second, they compare two versions of a type system, synthesizing a program

accepted by one but rejected by the other. Third, they minimize the size of synthesized counterexample

programs.

These algorithms symbolically evaluate typecheckers and interpreters, producing formulas that characterize

the set of programs that fail or succeed in the typechecker and the interpreter. However, symbolically evaluating

interpreters poses efficiency challenges, which are caused by having to merge execution paths of the various

possible input programs. Our main contribution is the bonsai tree, a novel symbolic representation of programs

and program states that addresses these challenges. Bonsai trees encode complex syntactic information in

terms of logical constraints, enabling more efficient merging.

We implement these algorithms in the Bonsai tool, an assistant for type system designers. We perform

case studies on how Bonsai helps test and explore a variety of type systems. Bonsai efficiently synthesizes

counterexamples for soundness bugs previously inaccessible to automatic tools and is the first automated tool

to find a counterexample for the recently discovered Scala soundness bug SI-9633.

CCS Concepts: • Software and its engineering → Automated static analysis; Search-based software

engineering; Model checking; Semantics;

Additional Key Words and Phrases: Type checking, synthesis, symbolic compilation

ACM Reference Format:

Kartik Chandra and Rastislav Bodik. 2018. Bonsai: Synthesis-Based Reasoning for Type Systems. Proc. ACM

Program. Lang. 2, POPL, Article 62 (January 2018), 34 pages. https://doi.org/10.1145/3158150

1 INTRODUCTION

Today’s type system designers strive to develop typecheckers that balance expressiveness andconvenience with powerful static guarantees. This has led to a variety of innovations, such aspolymorphism, path-dependent types, and ownership types. On their own, these features promiseprogrammers strong static guarantees on their programs. Unfortunately, combining such featuresoften creates intricate soundness bugs, some of which have gone unnoticed for many years. Forexample, soundness bugs were caused by the confluence of assignment and polymorphism inML [Tofte 1990b; Wright and Felleisen 1994], and of path-dependent types and nullable values inScala [Amin and Tate 2016a].

Authors’ addresses: Kartik Chandra, Henry M. Gunn High School, Palo Alto , Stanford University, Palo Alto, USA,

[email protected]; Rastislav Bodik, University of Washington, Seattle, USA, [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee

provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and

the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses,

contact the owner/author(s).

© 2018 Copyright held by the owner/author(s).

2475-1421/2018/1-ART62

https://doi.org/10.1145/3158150

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

This work is licensed under a Creative Commons Attribution 4.0 International License.

62:2 Kartik Chandra and Rastislav Bodik

Parser Typechecker Interpreter

✓ ✓ ✘counterexample!

random

program

syntactically

correct programs

well-typed

programs

Syntactic fuzzer

samples grammar derivations

Type fuzzer

samples typesafe programs

Random fuzzer

samples all programs

Bonsai checker

reasons from runtime failures

syntax rules syntax + type rules syntax + typechecker +

interpreter

Fig. 1. Searching for a counterexample using various combinations of goal-directed search and forwardexecution. For example, the type fuzzer composes type judgments in order to generate a random type-safeprogram P . It then executes P forward on the interpreter, hoping that P will fail.

To automate checking of type systems, we present a set of symbolic algorithms that aid typesystem designers in reasoning about executable language models. Using bounded program synthesistechniques, we demonstrate how users can compute answers to queries such as these:

• Soundness. If the type system is not sound, synthesize a counterexample, that is, a programthat passes the typechecker but fails in the interpreter.

• Comparison. Given two versions of a typechecker, synthesize a program accepted by oneversion but rejected by the other, elucidating the impact of changes to a type system.

• Minimization. For either of these queries, produce the smallest possible counterexample.

Successfully answering these queries requires us to efficiently explore the extremely large spaceof candidate programs that are sufficiently complex to demonstrate a harmful interaction. Fuzzers,which are commonly used to answer the first query, search these large spaces by avoiding programsthat would be rejected somewhere along the parser-typechecker-interpreter pipeline (see Figure 1).Early fuzzers explored the space of all programs, some of which may have failed in the parser [Zeller1999]. More recent łsyntax fuzzers,ž such as Redex [Klein et al. 2012a], generate only syntacticallycorrect programs, thus shrinking the candidate space. Further advancements allow us to generateonly type-safe programs [Dewey et al. 2014, 2015; Fetscher et al. 2015; Roberson and Boyapati 2010].These łtype fuzzersž use constraint solvers to search for a typesafe program in a goal-directedfashion; only the interpreter is executed forward. However, even among type-safe programs,counterexamples are scarce.The natural next step, then, is to reason about the entire pipeline in a goal-directed fashion so

that the constraint solver can directly construct a counterexample by reasoning backwards frompossible failure points in the interpreter. We develop such a goal-directed algorithm by symbolicallycompiling the entire parser-typechecker-interpreter pipeline into a single logical formula. Thisreduction also lets us answer the other preceding queries by modifying the formula’s structure.Our contributions are:

• The Bonsai tree (Sections 2 and 3): We describe Bonsai trees, a symbolic representation ofa bounded space of input programs and tree-shaped program states. Bonsai trees enable thesymbolic evaluation of language models into logical formulas. Symbolic evaluation can beperformed on standard trees, of course, but Bonsai-powered symbolic evaluation is moreefficient and more effective, generating formulas that are smaller and easier to solve. Bonsaitrees reduce formula sizes by encoding syntactic information in logical constraints ratherthan in the shape of the data structure.We demonstrate several interesting properties of Bonsai trees (Sections 3 and 4). First, Bonsaitrees make it easier to avoid symbolically evaluating the parser-typechecker-interpreter

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:3

Parser✓

Type Checker✓

Interpreter✘

symbolic treerepresents all programs

up to depth d

(define p

(make-symbolic-program!))

syntactically correct programs

(assert (parser p))

programs failing in interpreter

(assert-not (interpreter p))

well-typed programs

(assert (typechecker p))

counterexamples(define cex (solve))

(display cex)

rejected good programs

Fig. 2. Bonsai performs three independent symbolic evaluations, executing the interpreter on trees that areboth syntactically and type-incorrect.

pipeline as a monolithic composition of the three components. Instead, we evaluate themseparately, producing three simpler formulas that in turn represent: (1) syntactically correctprograms, (2) programs that pass the typechecker despite being syntactically incorrect, and(3) programs that succeed (or fail) in the interpreter even though they may be syntacticallyincorrect or not typesafe. These three sets, illustrated in Figure 2, are then intersected byconjuncting the formulas to obtain the set of counterexamples.Second, depending on the sets we intersect, we can formulate other queries. For example,the synthesis of correct programs that the typechecker rejects is also shown in Figure 3. TheCompare query is formed analogously (see the bullet łCase studiesž below for more details).The Minimize query simply adds the requirement that the solver must return a solution thatminimizes some user-supplied metric, such as tree size.Third, Bonsai trees can be added to standard symbolic evaluators. In particular, we formalizethe symbolic evaluation of Bonsai trees on a small core language (Section 3) and describehow we added this functional core language to Rosette [Torlak and Bodik 2014], a symbolicevaluator for a subset of Racket with assignments, objects, and other features (Section 4).This integration of Bonsai symbolic evaluation into Rosette is at the heart of the Bonsai tool.

• Case studies (Section 5): We study a variety of type systems with Bonsai, gauging its utilityduring a type system design.We perform three case studies, with Featherweight Java [Igarashiet al. 1999], Ownership Java [Boyapati et al. 2003], and DOT, the dependently typed model ofScala [Amin et al. 2016].We first examine how counterexamples can clearly explain soundness bugs. We demonstratethat by understanding how a counterexample fails in the interpreter, in our case by examininga stack trace of a program crash, we can better understand the cause of a soundness bug.Second, we investigate the language specification patterns that are friendly to symbolicevaluation. Interestingly, we find that Bonsai trees can be efficiently used to represent notonly ASTs, but also other auxiliary data structures, such as environments and class tables.Third, we experiment with the Compare query: we ask Bonsai to explain a static type rulethat appeared to us unnecessary (subsumed by other checks) or unnecessarily restrictive(rejecting correct programs). We removed the rule, creating a more permissive version ofthe type system, and asked Bonsai łIs there a program (i) rejected by the old type system,(ii) accepted by the new type system, and (iii) that nevertheless runs without error?ž Bonsaisynthesized such a program, suggesting that it may be useful to design a less restrictive typesystem.Finally, we use the DOT model to stress Bonsai’s expressiveness. DOT is a rich calculus withdependent types, function types, records, intersection types, and recursive types. We find

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:4 Kartik Chandra and Rastislav Bodik

DOT easy to reformulate to Bonsai patterns. We were able to express all features of DOTexcept for recursive types: early on, we decided to use call-by-value semantics, only to realizelater that DOT’s definition of recursive types is compatible only with call-by-name semantics.Overall, implementing DOT features did not hamper symbolic evaluation. In fact, in minutes,we synthesized on DOT a counterexample for the Scala soundness error SI-9633 [Amin andTate 2016a].

• Performance evaluation (Section 6): First, in under an hour, we synthesized a counterex-ample for the language with assignments and polymorphic references [Tofte 1990a], whichhad been inaccessible to automatic tools [Fetscher et al. 2015]. Second, we showed thatBonsai trees allow exploration of vastly larger spaces than standard symbolic trees. In ourexperiments with lambda calculus, Bonsai explored 1050 vs. 1025 candidate programs in 20seconds. Third, on a subset of Redex benchmarks, Bonsaiwas about 600-times faster than asyntactic fuzzer and 12-times faster than a type fuzzer. We also fiound Bonsai superior in thesize of counterexamples, synthesizing counterexamples about 10-times smaller, on average,than the type fuzzer.

Implementations of many of the case studies and evaluations can be found at https://bitbucket.org/bonsai-checker/.Overall, we believe that Bonsai complements the theorem-prover-based mechanization of a

soundness proof, especially during design space exploration. Bonsai’s strengths include automationand supporting queries that go beyond soundness checking. Among its weaknesses is the inability toguarantee the absence of bugs. This limitation is offset byBonsai’s efficiency: we observe that duringan overnight run, it exhaustively explored candidates 2- to 4-times larger than the counterexamplesconstructed by human experts (Table 3), providing a safety margin. Its relationship with other toolsis described in Section 7.

2 SYMBOLIC COMPILATION OF INTERPRETERS

This section describes the challenges of symbolically compiling interpreters, which arise becauseinputs to interpreter programs are themselves programs, typically in AST form. We present back-ground information on symbolic compilation and discuss how the choice of AST representationaffects both compilation efficiency and the size of the resulting logical formula.

Symbolic compilation. We define the symbolic evaluation of a program P given a desired outputvalue y0 as the computation of an input value x0 such that P(x0) evaluates to y0. Naturally, no suchx0 may exist. Further, y0 may be used to model a particular kind of program failure. In this paper,we perform symbolic evaluation by symbolic compilation of P to a logical formula ϕ(x ,y), whichgives the relation over P ’s inputs and outputs, followed by solving for the model of the formulaϕ(x ,y) ∧ (y = y0). If a model exists, it gives the input x0.

Language specification. Symbolic compilation is influenced by the structure of (i) compiledprograms and (ii) inputs to those programs. In our case, these are interpreters and ASTs, respectively.Together, they define the semantics of the language that we want to check. Ideally, they are tailoredto the needs of the language designer rather than to the peculiarities of the symbolic compiler. Inparticular, we assume that the interpreter code is concrete in that it assumes that its input is a singleconcrete AST, even though the symbolic compiler will execute the interpreter simultaneously on aset of concrete ASTs. We perform symbolic compilation on such concrete interpreters.We assume that the interpreter consists of a typechecker and an evaluator. We model both

as AST-manipulating programs because they transform an AST in one grammar to an AST in apotentially different grammar. We refer to typecheckers and evaluators henceforth as programs,

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:5

and we may occasionally refer to ASTs simply as trees. The language specified by the interpreter isthe source language; the language in which the interpreter is implemented is the implementationlanguage; the language of logical formulas produced by the symbolic language is the target language.

Concrete ASTs. In this background section, we assume that the interpreter uses the standardrepresentation of ASTs where nodes are labeled with grammar symbols and a node’s childrencorrespond to a grammar rule. We assume that nodes are implemented with records. As an exampleof the source language, consider the lambda calculus with the grammar e ::= v | λv .e | e e , where vranges over variable names. The AST for the term λy.x , shown as one of the three concrete ASTsin Figure 3, is constructed with the implementation code abs(id("y"),id("x")), which invokesrecord constructors abs and id. Note that grammar rules are enforced in the AST through typesignatures of records. For example, the AST for the grammatically invalid term λ(y x).x cannotbe constructed because the implementation code abs(app(id("y"),id("x")),id("x")) is notwell-typed (the first parameter to abs must be an id node instead of the app node).

Because each AST node corresponds to one grammar rule, we refer to this standard AST rep-resentation as the syntactic AST. As will be explained in the next section, bonsai trees are notsyntactic. It is not possible to examine a single bonsai node and determine the grammar rule thatgenerated it; as a result, grammar rules are enforced with non-local checks.

Concrete programs. A typical program processes a concrete tree recursively, branching based onpattern matching the AST node type. We show below the evaluator for the lambda calculus. Notethat eval returns the AST node type cl that represents a closure, which is not in the grammar ofthe input AST.

def eval(t, e) =

match t on

app f a for

match eval(f, e) on

cl b n e′ for eval(b, extend(e′, n, eval(a, e)))

end

abs n b for cl b n e

id n for lookup(e, n)

end

Symbolic ASTs. This paper considers symbolic compilation that is bounded in the number ofinput ASTs. The resulting formula captures the behavior of the interpreter on a bounded set Aof concrete ASTs as opposed to behaviors of all possible ASTs. A symbolic AST is an object thatrepresents the set A and serves as input to the compiled program during symbolic compilation.To distinguish the concrete ASTs in A, we index them with tree selectors, a set of dedicated

symbolic constants. Each assignment of concrete values to tree selectors selects a concrete tree.Mathematically, the symbolic AST is a map from the set of such assignments to the set of concreteASTs. In the implementation, the symbolic AST is specified with a program in the implementationlanguage parameterized by tree selector constants. The recursive tree generator defined below isone such program: each assignment to tree selectors c and n produces a syntactically valid concretetree. As shown, the generator defines an unbounded set of ASTs. The recursion of this generator isbounded during symbolic compilation to produce the bounded set A.

def e() =

let c = symbolic({App, Abs, Id}) # a fresh symb. constant ranging over {App, Abs, Id}

in case c in

App : app(e(), e())

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:6 Kartik Chandra and Rastislav Bodik

Abs : abs(var(), e())

Id : var()

end end

def var() =

let n = symbolic({"x", "y"})

in id(n)

end

The evaluation of this generator program is a data structure that selects an AST from A usingjoin operators. Arguments of a join are guarded with symbolic predicates, i.e., predicates over treeselectors. These mutually exclusive guards determine which argument is selected. Figure 3 showsa symbolic AST of three concrete ASTs. The join operator is denoted {ϕ1 → x ,ϕ2 → λy.x ,ϕ3 →

(x y)}. If the symbolic tree were produced by the generator introduced above, then the guard ϕ1would be c1 = Id ∧ n1 = "x". Symbolic ASTs can include smaller symbolic AST representations assubtrees, which leads to the nesting of join operators.

𝜆 .abs app abs app

sharing of subtrees

join𝜙 𝜙 𝜙

Fig. 3. Left: concrete ASTs for the terms x , λy.x , and (x y). Right: the symbolic AST representing the set ofthe three concrete ASTs under their respective guards ϕ1, ϕ2, and ϕ3. Notice that symbolic trees can savespace by sharing subtrees that are mutually exclusive with respect to their guards.

Merging symbolic values. To translate an interpreter P into a logical formula ϕ(x ,y), the symboliccompiler first produces the symbolic return value of P , which is an expression over the symbolicinputs to P . When the input to P is a symbolic AST, then the symbolic inputs are the tree selectors.

Because the compiler performs what Clarke and Richardson [1985] call global symbolic evaluation,i.e., it constructs symbolic values for all concrete trees in A, we must decide how to organize thesealternative symbolic values. One extreme design is to keep the symbolic values separate, forminga symbolic execution tree [King 1976]. The other extreme is to insert join operators wheneversymbolic values merge, for example, as in Gated SSA form [Alpern et al. 1988].There are tradeoffs between the two extremes. Keeping paths separate preserves path-specific

program properties, such as constant values of variables, which can in turn enhance the effective-ness of partial evaluation. Furthermore, path-specific facts can prove some execution paths to beinfeasible. When these proofs are accomplished during symbolic evaluation, as opposed to in thesolver, we reduce the size of the symbolic value and increase the speed of symbolic compilation.Unfortunately, these benefits come at the expense of path explosion.In this paper, we follow the tradition of preserving path-specific properties while reducing

the necessary path explosion [Ball and Rajamani 2000; Das et al. 2002; Flanagan and Saxe 2001b;Holley and Rosen 1980; Steffen 1996]. Specifically, we peek into the arguments of join operatorswhen pattern matching on a symbolic value. We process those arguments individually, effectivelydistributing pattern matching across the join. Such deconstruction of joins has been used, forexample, in [Das et al. 2002; Torlak and Bodík 2014].

To illustrate the benefits of peeking into joins, we first rely on an example that is not tree based:

def P(x) = if (x>5) "big" else "small"

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:7

The symbolic return value of P is {x > 5 → "big",¬(x > 5) → "small"}. This symbolic valuejoins two string constants under symbolic path conditions. Next, consider the symbolic value ofstrLen(P(x)). The symbolic value starts as strLen({x > 5 → "big",¬(x > 5) → "small"}),but, after distributing strLen across the join, we get {x > 5 → strLen("big"),¬(x > 5) →

strLen("small")}), which simplifies to the smaller symbolic value {x > 5 → 3,¬(x > 5) → 5}.This simplification was made possible by deconstructing the join rather than treating it as anopaque symbolic value.

Merging symbolic trees. In interpreters, the merging of symbolic trees occurs, for example, at theend of match expressions, whose case alternatives can produce different symbolic ASTs that mustbe merged.

Merging symbolic ASTs is inefficient because the join operator may double in size, as measuredby the number of its arguments. Figure 4 (left) shows that a join must preserve children from bothmerged trees. Consequently, the number of children with the same label, say app, may increaseafter the merge. This property affects pattern matching efficiency on the symbolic tree.

join𝛼 𝛽𝐺 merge(𝜙 ⇒ 𝐺 , 𝜙 ⇒ 𝐺 ):𝐺𝜙 …𝜙 𝑘𝜆 app𝛼 𝑘 𝛽 𝑘

join𝛼 𝛽

𝜙 …𝜙 𝑘𝜆 app𝛼 𝑘 𝛽 𝑘

join𝛼 𝛽

𝜙 ∧ 𝜙𝜆 𝛼 𝛽

𝜙 ∧ 𝜙 …𝜆 app𝛼 𝑘 𝛽 𝑘

join𝛼 𝛽

𝜙 𝜆 𝛼 𝛽𝜙 …𝜆 app𝛼 𝑘 𝛽 𝑘

match(𝑛, λ 𝒗 . 𝒆) =𝒗 ↦ 𝛼𝒆 ↦ 𝛽pc ↦ 𝜙 ,

𝒗 ↦ 𝛼𝒆 ↦ 𝛽pc ↦ 𝜙 ,…join𝜙 𝜙

Fig. 4. Left: A merge of two symbolic ASTs. Right: A łmatch" operation on a symbolic AST.

Pattern matching on symbolic trees. When pattern matching on a symbolic tree, the symboliccompiler must execute each alternative case. Furthermore, as Figure 4 (right) shows, each case mayneed to be executed multiple times, reducing compilation efficiency. This occurs because the matchoperator may find in a join multiple children that match a given pattern. In the figure, we matchthe pattern λv .e and find two subtrees matching this pattern. These trees have the path conditionsϕ11 and ϕ22. Both trees must be separately compiled.

Summary. The crucial insight is that syntactic ASTs are inefficient in symbolic compilationbecause their joins grow during merges, impacting the size of the symbolic value, and their matchesproduce multiple alternatives, impacting the time efficiency of symbolic compilation.This observation guides us to the central concept behind the Bonsai tree: we maintain a single

tree data structure and embed into it all possible trees, using symbolic tree selectors to decidewhich tree is embedded. As we discuss in the next section, we can enable such embedding if werepresent syntactic constraints with non-local symbolic predicates rather than with the tree shape.

3 BONSAI TREES

This section develops bonsai trees following the three presentation steps depicted in Figure 5.First, we convert concrete syntactic ASTs to concrete bonsai trees (Section 3.1). Next, we introducethe symbolic bonsai tree, a data structure that represents a bounded set of concrete bonsai trees(Section 3.2). Finally, we define symbolic compilation by giving the semantics of Bcl, a calculus forprograms that operate on symbolic bonsai trees. The evaluation of a program in this semanticscompiles the program into an SMT formula (Section 3.3). To illustrate the interaction of conceptsintroduced in this section, we provide a small end-to-end example (Section 3.4). While we do notformalize symbolic ASTsÐideally, we would present a counterpart of Section 3.3 for ASTsÐexamplescontrast bonsai trees with ASTs throughout this section.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:8 Kartik Chandra and Rastislav Bodik

𝑓

bonsai treesASTs

symbolic AST symbolic bonsai tree

symbolic semantics of 𝑓𝑎 symbolic semantics of 𝑓𝑏concrete semantics of 𝑓𝑎 concrete semantics of 𝑓𝑏

Sec. 3.1

Sec. 3.2

6, 12

13 13

7, 8, 9,

10, 14

11, 15

3, 4

3, 6, 12S

ec.

2bonsai-fy ASTs

represent sets of trees

symbolicallygive semantics

of function 𝑓Legend

𝑓𝑎 𝑓𝑏𝑓𝑎 𝑓𝑏

Fig. 5. Overview of Section 3. The presentation traverses the three solid edges (ASTs→bonsai trees→symbolicbonsai trees→symbolic semantics), first bonsai-fying syntactic ASTs, then making them symbolic, and finallygiving symbolic compilation for programs on symbolic bonsai trees. Node labels refer to figures that illustrateconcepts in these nodes. Programs fa , fb operate on ASTs and bonsai trees, respectively.

Bonsai trees build on ideas introduced by Pipal [Roberson and Boyapati 2010] and Rosette [Torlakand Bodik 2014]; we describe the evolution of ideas in Section 7. An empirical evaluation of bonsaitrees is presented in Section 6.3.

3.1 The Bonsai Form of Concrete ASTs

A bonsai tree b = (N ,E, Σ, label) is a binary tree with an alphabet Σ = {•} ∪ L and a node-labelingfunction label : N → Σ. The internal nodes of bonsai trees are always labeled with the symbol •while leaf nodes are labeled with a symbol from L. A grammar is in bonsai form if it defines bonsaitrees.

Recall the lambda calculus grammar e ::= v | λv .e | e e introduced Section 2. One possible bonsaiform for this grammar is e ::= v | •(λ, •(v, e)) | •(e, e) with the alphabet Σ = {•, λ,v}. Figure 6shows three bonsai trees generated by this grammar along with the corresponding ASTs.Note that this bonsai grammar lacks a dedicated symbol for function application. (Our ASTs

use the symbol app for this purpose). The bonsai grammar is nonetheless a suitable programrepresentation because it is unambiguous, i.e., each bonsai tree node could have been generatedonly by one grammar rule. We can thus reconstruct the underlying AST of a bonsai tree eventhough all its internal nodes have the same label.

𝑥𝑦λ 𝑥𝜆𝑦. 𝑥𝑥

𝑥 𝑦𝑥 𝑦

leaf node (labeled with symbols from L)

internal nodes (labeled with ● )

a bonsai form of grammar

for lambda calculus:𝑒 ∶≔ 𝑣 ● 𝜆, ● 𝑣, 𝑒 ● 𝑒, 𝑒 𝜆𝑦. 𝑥𝑥 𝑥 𝑦𝑥 abs𝑦 𝑥 app𝑥 𝑦

Fig. 6. Left: Concrete bonsai trees for lambda calculus terms x , λy.x , and (x y). Right: Corresponding ASTs.

3.2 The Symbolic Bonsai Tree

Analogously to the symbolic AST introduced in Section 2, a symbolic bonsai tree B is a map from the

values of tree selectors C to a bounded set B of concrete bonsai trees. To preview our development,

the key idea is to represent the label function as a symbolic expression over the tree selectors C .Tree selectors thus determine node labels which in turn determine which tree is represented.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:9

We proceed in three steps. First, we embed the concrete bonsai trees bi ∈ B into the embeddingtree te such that the roots of bi and te are aligned, as illustrated in Figure 7(left). The embeddingtree te is a binary tree of sufficient depth to accommodate all trees in B. In this paper we assumethat te is a perfect binary tree but this is in general not necessary.The node labels in te determine which bonsai tree bi ∈ B is embedded in te . The only point

worth noting is that labels of nodes below the embedded tree are undefined, as is the case for nodesn3 and n4 when te embeds •(λ, •(y,x)) which represents λy.x , as illustrated in Figure 7(right).

leaf node

internal node𝑥𝑦λ 𝑥𝜆𝑦. 𝑥𝑥 𝑥 𝑦

unused node𝑦𝑥 𝑛 𝑛𝑛𝑛 𝑛 𝑛 𝑛

if label 𝑛 = ●label 𝑛 = ●label 𝑛2 = 𝜆label 𝑛 = 𝑦label 𝑛 = 𝑥then t represents 𝜆𝑦. 𝑥.

embedding tree t:

Fig. 7. Left: Concrete bonsai trees embedded in the embedding tree te . Right: The condition under which teembeds λy.x . The numbering of embedding tree nodes that is used in the examples of the paper.

The second step is to control the label function with tree selectors. A tree selector c j ∈ C isa symbolic constant ranging over the alphabet Σ = {•} ∪ L. A fresh symbolic bonsai tree is anembedding tree whose node labels are each directly determined by a tree selector, i.e., label(ni ) = c j .A fresh symbolic bonsai tree of depth 2 is shown in Figure 8(left).

label 𝑛 = 𝑐label 𝑛 = 𝑐label 𝑛 = 𝑐 label 𝑛 = 𝑐label 𝑛 = 𝑐label 𝑛 = 𝑐label 𝑛 = 𝑐

ite

c1 , • c1 [ , ]

ite

c2 , • c2 [ , ]

c3 c4

ite

c5 , • c5 [ , ]

c6 c7

Fig. 8. Left: A fresh symbolic bonsai tree of depth 2. If the tree is used with the bonsai grammar from Figure 6,then the symbolic constants ci range over the alphabet {•, λ,v}. Right: The symbolic value for the freshsymbolic bonsai tree on the left. The symbolic value was created by evaluating the term tree 2 defined inFigure 11. The nested-ite structure mirrors that of the embedding tree on the left.

A fresh symbolic bonsai tree includes all trees from B as well as syntactically invalid concretetrees. The reason is that a fresh tree includes all Σ-labeled binary trees that can be embedded in te .For example, no concrete bonsai tree from the lambda calculus grammar can have label(n5) = λ,yet a fresh symbolic tree includes such concrete trees.In the third step, we filter out grammatically invalid trees by imposing non-local syntactic

assertions of the form shown in Figure 7(right). A complete example of syntactic constraints is inthe right table of Figure 14. The general procedure for generating syntactic constraints traversesthe embedding tree, adding constraints for all grammar rules that are legal at a particular node.Details are given in Section 4.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:10 Kartik Chandra and Rastislav Bodik

label ሚ𝐶. 𝑛1 ← { 𝜋 → label ሚ𝐴. 𝑛1 , ¬𝜋 → label ෨𝐵. 𝑛1 }ሚ𝐶 ← merge 𝜋, ሚ𝐴, ෨𝐵

label 𝐴. 𝑛1ሚ𝐴: label 𝐴. 𝑛5 label 𝐵. 𝑛1෨𝐵: label 𝐵. 𝑛5 label ሚ𝐶. 𝑛5 ← { 𝜋 → label ሚ𝐴. 𝑛5 , ¬𝜋 → label ෨𝐵. 𝑛5 }ሚ𝐶:…… …

Fig. 9. The merge of symbolic bonsai trees A and B. The resulting symbolic tree C equals A under the condition

π , and B otherwise. The merge is a node-wise join of symbolic labels. While a merge could double the size ofthe symbolic label function, the embedding tree remains constant.

test ● a s,● 𝒏, 𝒃 on ሚ for ● ● l, 𝒏 ,● 𝒃, 𝑒→෨ , 𝜋𝜋 = label 𝑛 = ● ∧ label 𝑛 = a s ∧ label 𝑛 = ● 𝑛𝒏 𝒃pattern for a s 𝒏 𝒃:

a s 𝑛𝑛 𝑛𝑛ሚ: ෨:● ● l, 𝒏 ,● 𝒃, 𝑒label ෨ . 𝑛 ← ●label ෨ . 𝑛 ← ● label ෨ . 𝑛 ← ●𝑛 label ෨ . 𝑛 ← llabel ෨ . 𝑛 ← label ሚ. 𝑛

label ෨ . 𝑛 ← label ሚ. 𝑛label ෨ . 𝑛 ← 𝑒Fig. 10. Pattern matching on a symbolic bonsai tree A. The match is illustrated with the bonsai variant of thecode fragment łabs n b for cl b n ež from the lambda calculus interpreter in Section 2. The constructtest patt on tree for result in turn (i) constructs the symbolic condition π under which the symbolic tree

matches the pattern (left); (ii) binds the pattern variables n and b to nodes A.n6 and A.n7 (middle); and

(iii) substitutes these variables into result, producing the symbolic tree B (right).

We are now ready to define merging and pattern matching, which symbolic bonsai trees performefficiently. The merge of two symbolic trees is a node-wise symbolic join of their respective labelsshown in Figure 9.

While the symbolic labels of C include the labels of both A and B, potentially doubling in size

after a merge, the embedding tree of C does not grow. In comparison, the merge of symbolic ASTsgrows both symbolic guards and the underlying tree (see Figure 4(left)).

Note that growing symbolic labels tends to be less harmful than growing the tree. First, symboliccompilation represents formulas as DAG-shaped symbolic values, allowing sharing of commonsubformulas. Second, we never have to peek into the symbolic labels during symbolic compilation.As a result, they have very small memory footprint despite their large size.

Thanks to the size invariance of the embedding tree, we guarantee that pattern matching is aconstant-time operation. The reason is that all matched concrete subtrees are represented by the(single) symbolic label function. In contrast, pattern in symbolic ASTs can match multiple trees,forcing the symbolic compiler to perform more work (see Figure 4(right)).Figure 10 illustrates pattern matching. The pattern •(abs, •(n,b)) is applied against node n1 of

tree A. The success of the match is a symbolic predicate π that checks whether labels in nodes

n1, n2, and n5 match the pattern. The substitution of pattern variables into the resulting tree B isimplemented as substitution of labels. For example, the pattern variable n is substituted into node

B.n4, which receives the label of node A.n6 to which n was bound during pattern matching.

3.3 The Bcl Language

We formalize symbolic bonsai trees through Bcl, a core language for manipulating symbolic bonsaitrees. Bcl is minimal but suffices for defining symbolic bonsai trees as well as their match andmerge operations. Most notably, Bcl lacks procedures, which makes it unable to express a recursive-descent interpreter. This limitation will not impact the presentation because adding procedures isorthogonal to compiling symbolic bonsai trees.

To compile a Bcl program into a symbolic bonsai tree, we evaluate it with rules in Figure 11. Theevaluation produces symbolic values, which represent symbolic bonsai trees in the Bcl semantics.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:11

⟨term⟩ ::= [ ⟨term⟩ , ⟨term⟩ ] New inner node with subtrees

| treei d Fresh tree of depth d ≥ 0

| c Constant ranging over L

| match ⟨term⟩ on (⟨pat ⟩ for ⟨term⟩)* end Pattern matching

| x, y, . . . Variable bound by a match

⟨pat ⟩ ::= x, y, . . . Pattern variable

| c Constant ranging over L

| [ ⟨pat ⟩ , ⟨pat ⟩ ] Inner node

⟨val⟩ ::= [ ⟨val⟩ , ⟨val⟩ ] Inner node with subtrees

| c Constant ranging over L

| c Symbolic constant ranging over L ∪ {•}

| fail Failure

| ite(⟨frm⟩, ⟨val⟩, ⟨val⟩) If-then-else

⟨frm⟩ ::= T | F

| c = c

| ⟨frm⟩ ∧ ⟨frm⟩ | ⟨frm⟩ ∨ ⟨frm⟩ | ⟨frm⟩ ⇒ ⟨frm⟩ | ¬ ⟨frm⟩

⟨e, π , ϕ ⟩ →⟨e′, π ′

, ϕ′⟩

⟨f , π , ϕ ⟩ →⟨f ′, π ′′

, ϕ′′⟩

⟨[e, f ], π , ϕ ⟩ →⟨[e′, f ′], π ′ ∧ π ′′

, ϕ′ ∧ ϕ′′⟩ SC-Inner-Node

⟨treel (i ) d − 1, π , ϕ

⟩→ ⟨v1, π , ϕ ⟩

⟨treer (i ) d − 1, π , ϕ

⟩→ ⟨v2, π , ϕ ⟩ d > 0

⟨treei d, π , ϕ ⟩ → ⟨ite(ci , •, ci , [v1, v2]), π , ϕ ⟩SC-Tree

SC-Match-Empty

⟨match t on [] end, π , ϕ ⟩ → ⟨fail, π , ϕ ∧ ¬π ⟩ ⟨treei 0, π , ϕ ⟩ → ⟨ci , π , ϕ ⟩ SC-Leaf

SC-Match-Nonempty⟨test p on t for e, π , ϕ ⟩ →

⟨v ′, π ′

, ϕ′⟩ ⟨

match t on [q for f ; . . . ] end, π ∧ ¬π ′, ϕ

⟩→

⟨v ′′

, π ′′, ϕ′′

⟨match t on [p for e ;q for f ; . . . ] end, π , ϕ ⟩ →⟨merge(π ′

, v ′, v ′′), π ′ ∨ π ′′

, ϕ′ ∧ ϕ′′⟩

SC-Test-Const-Pass

⟨test c on c for e, π , ϕ ⟩ → ⟨e, π , ϕ ⟩

SC-Test-Sym

⟨test c on α for e, π , ϕ ⟩ → ⟨e, π ∧ (α = c), ϕ ⟩

SC-Test-Const-Failc , c′

⟨test c on c′ for e, π , ϕ

⟩→ ⟨fail, π , ϕ ∧ ¬π ⟩

SC-Test-Pattern-Variable

⟨test x on t for e, π , ϕ ⟩ → ⟨e[t/x ], π , ϕ ⟩

SC-Test-Inner-Node⟨test p on t for e, π , ϕ ⟩ →

⟨e′, π ′

, ϕ′⟩ ⟨

test p′ on t ′ for e′, π ′, ϕ′

⟩→

⟨e′′, π ′′

, ϕ′′⟩

⟨test [p, p′] on [t, t ′] for e, π , ϕ

⟩→

⟨e′′, π ′′

, ϕ′′⟩

⟨t, π , ϕ ⟩ →⟨t ′, π ′

, ϕ′⟩

⟨test p on t for e, π , ϕ ⟩ →⟨test p on t ′ for e, π ′

, ϕ′⟩ SC-Test-Trans

⟨test p on w for e, π ∧ψ , ϕ ⟩ →⟨v ′, π ′

, ϕ′⟩ ⟨

test p on w ′ for e, π ∧ ¬ψ , ϕ⟩→

⟨v ′′

, π ′′, ϕ′′

⟩⟨test p on ite(ψ , w, w ′) for e, π , ϕ

⟩→

⟨merge(ψ , v ′

, v ′′), (ψ ⇒ π ′) ∧ (¬ψ ⇒ π ′′), ϕ′ ∧ ϕ′′⟩ SC-Test-Ite

merge(π , ite(ϕ, a, [v, v ′]

), ite(ψ , b, [w, w ′])) →

ite((π ⇒ ϕ) ∧ (¬π ⇒ ψ ), ite(π , a, b), [merge(π , v, w ), merge(π , v ′

, w ′)]) SC-Merge

Fig. 11. The Bcl language and its reduction rules.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:12 Kartik Chandra and Rastislav Bodik

In the Bcl grammar also shown in Figure 11, programs are constructed from terms and patterns,and symbolic values are constructed from values and formulas.The structure of the symbolic value mirrors the embedding tree as illustrated in Figure 8. Each

internal node ni of the embedding tree is represented in the symbolic value with an ite(di,li,pi)

constructor where li = label(ni ) is the symbolic label of ni ; d = (li , •) is a symbolic predicate thatholds when ni is a leaf of the embedded tree; and pi = [v,w] is a pair constructor that points tothe descendant symbolic values v andw . Leaves of the embedding tree, such as the node n3, canonly embed leaves of concrete bonsai trees and are thus represented directly with a symbolic label,omitting the ite node.Note that Bcl programs are bound to symbolic inputs using the term treed which produces

a fresh symbolic bonsai tree of depth d . Symbolic compilation of a program term t amounts toreducing the triple ⟨t ,T ,T ⟩ to a triple ⟨v,π ,ϕ⟩, wherev is the resulting symbolic value, π is the pathcondition under which the final triple is reached, and ϕ is the condition under which the programterminates, called the assertion store. (A program can only fail in a pattern match.) Formulas π andϕ are symbolic over the tree selectors created in treed . The formula π ∧ ϕ is satisfiable iff there isa concrete bonsai tree bi ∈ B on which t terminates.In the reduction rules in Figure 11, t , e, f range over terms; p,q range over patterns; x ,y range

over pattern variables; v,w range over symbolic values; c ranges over the constants from L; ci are

symbolic constants from the set C ranging over Σ = {•} ∪ L; and π ,ϕ,ψ are boolean formulas

with equality defined on C × ({•} ∪ L).The rules SC-Tree and SC-Leaf evaluate the term treei d to a symbolic value for a fresh symbolic

tree of depth d . The subscript in treei identifies the syntactic position of the term in the originalprogram. The subscript governs the selection of tree selector symbolic variables, ensuring thattreei d evaluates to an equivalent symbolic value (i.e., a value with the same tree selectors) evenwhen the term is duplicated during reduction. SC-Tree relies on functions l and r to generateunique subscripts for tree selectors in subtrees.

SC-Inner-Node evaluates tree literalsÐinner tree nodes that are created outside the term treei d .The remaining rules define pattern matching and merging. The two SC-Match-* rules turn amatchterm into a sequence of tests, one for each case. SC-Match-Empty compiles to a failure that istriggered when none of the cases has matched.The real workhorse are the SC-Test-* rules which reduce expressions that compare patterns

against trees. For example, SC-Test-Ite compares an arbitrary pattern against an ite-rooted sym-bolic value, and SC-Test-Inner-Node compares an inner-node pattern against an inner-node termor a symbolic value.

The merge function performs the node-wise merge of two symbolic trees, each represented as asymbolic value. The merge happens under the path condition π and is illustrated in Figure 9. Notethat the merge operation is defined in its most general form: a merge of two ite values. Definitionsfor merging non-ite values can be derived by setting ϕ and/or ψ to T or F and simplifying theresulting expressions.The Bcl language explains why bonsai pattern matching is efficient. The crucial invariant

maintained in symbolic bonsai trees is that each symbolic join (represented in Bcl with an ite), hasthe property that if its condition is true, the result is a leaf, else it is an inner node. This invariantcan be verified by examining rules SC-Tree and SC-Merge, the only sources of ites.

A useful result of this invariant is that a pattern-matching engine need not branch whenmatchingagainst an ite: depending on whether the pattern is a leaf or an inner node, it is clear which path ofthe ite must be taken. In contrast, recall that symbolic syntax trees require a match operation foreach member of their symbolic joins.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:13

𝐺: e ::= a | id e | swap e e

𝐺’: e ::= a | ●(id, e) | ●(e, e)

𝑡 :

𝑏 :

𝑡 :

𝑏 :

swap𝑖𝑑 aa𝑖𝑑swapaa

aid aid aaa 𝑒 𝑒id 𝑒

a 𝑖𝑑 swap𝑒 𝑒 𝑒bonsai

AST

𝑓𝑏 a a● id, 𝑥 𝑥● 𝑥, 𝑦 ● 𝑥, 𝑦

swap𝑖𝑑 aaswap𝑖𝑑aa 𝑖𝑑swapaa

swapaa

a aid a aaa ida aid

𝑓𝑎 a aid 𝑥 𝑥swap 𝑥 𝑦 swap 𝑦 𝑥

Fig. 12. Running example: Two concrete trees in AST and bonsai forms together with their grammars.

𝐺𝐺’: ● ●

𝑡𝑡′

𝑡𝑡′

swap𝑖𝑑 aa𝑖𝑑swapaa

aid aid aaa 𝑒 𝑒id 𝑒

a 𝑖𝑑 swap𝑒 𝑒 𝑒

def 𝑓𝑏(t) = match t ona for a

● id, 𝑥 for 𝑥● 𝑥, 𝑦 for ● 𝑥, 𝑦

end

swap𝑖𝑑 aaswap𝑖𝑑aa 𝑖𝑑swapaa

swapaa

a aid a aaa ida aid

def 𝑓𝑎(t) = match t ona for aid 𝑥 for 𝑥swap 𝑥 𝑦 for swap 𝑦 𝑥end

Fig. 13. Running example: A Bcl program f for ASTs (top) and for bonsai trees (bottom). Also shown are twosamples of f ’s concrete semantics in the form of input-output pairs.

3.4 Example

We illustrate symbolic compilation of tree manipulating programs with the example in Figures 12ś15. The specific problem is to invert the execution of the Bcl program f in Figure 13, solving for aninput tree that makes f produce a particular output tree. The example first bonsai-fies ASTs, thengives the concrete semantics of f , followed by making bonsai trees symbolic, and finally giving thesymbolic semantics of f . Figure 5 illustrates how these steps build on each other.

Figure 12 describes trees that f transforms. ASTs have three kinds of terms: id(x), swap(x,y),and the leaf term a. The figure shows the AST grammar as well as a bonsai form of the grammar.Two pairs of trees illustrate how ASTs are represented with bonsai trees. Note that the symbolswap from the AST alphabet Σ is not explicitly represented in bonsai trees.Figure 13 shows the Bcl program f in two forms: for ASTs (fa ) and for bonsai trees (fb ).

We also illustrate f ’s concrete semantics on two concrete input trees. Admittedly, program f

is rather contrived. Ideally, we would illustrate symbolic compilation on a program that is aninterpreter for the source language in Figure 12. The recursion in an interpreter program would,however, complicate the example. Therefore, the non-recursive program f performs only one localtransformation of the input tree.

Figure 14 shows five symbolic bonsai trees.Wewill use tree B5 as input to f when f is symbolically

compiled. Trees B1 to B4 are stepping stones in our explanation. Trees B1 and B2 are degeneratesymbolic trees because they each represent a single concrete bonsai tree, b1 and b2, respectively,

which are shown in Figure 12. Due to this restriction, all labels of B1 and B2 are (non-symbolic)constants, and thus the trees are not strictly symbolic. Note that labels of nodes that are below

the embedded tree are allowed to be undefined (e.g., label(n7) for B1). Tree B3 represents the set

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:14 Kartik Chandra and Rastislav Bodik

෨𝐵 ෪𝐵 ෪𝐵 ෪𝐵 ෪𝐵 ෪𝐵𝐿 ෨𝐵 {𝑏 } {𝑏 } {𝑏 , 𝑏 } 𝐹 𝐿 𝐺′label 𝑛 ● ● ● ǁ𝑐 ǁ𝑐label 𝑛 ● id ite ǁ𝑐, id, ● ǁ𝑐 ǁ𝑐label 𝑛 id – id ǁ𝑐 ǁ𝑐label 𝑛 a – a ǁ𝑐 ǁ𝑐label 𝑛 a ● ite ǁ𝑐, a, ● ǁ𝑐 ǁ𝑐label 𝑛 – a a ǁ𝑐 ǁ𝑐label 𝑛 – a a ǁ𝑐 ǁ𝑐

Assertion store for symbolic tree ෪𝐵 . (Syntactic constraints of grammar 𝐺′.)𝑠 𝑛 label 𝑛 = • ∨ label 𝑛 = 𝑎𝑠 𝑛 label 𝑛 = • ⇒ label 𝑛 = id ∨ label 𝑛 = a ∨ label 𝑛 = •𝑠 𝑛 label 𝑛 = • ⇒ label 𝑛 = • ⇒ label 𝑛 = id ∨ label 𝑛 = a𝑠 𝑛 label 𝑛 = • ⇒ label 𝑛 = • ⇒ label 𝑛 = a𝑠 𝑛 𝑠 𝑛 ⇒ label 𝑛 = id ∨ label 𝑛 = a ∨ label 𝑛 = •𝑠 𝑛 label 𝑛 = • ⇒ label 𝑛 = • ⇒ label 𝑛 = id ∨ label 𝑛 = a𝑠 𝑛 label 𝑛 = • ⇒ label 𝑛 = • ⇒ label 𝑛 = aFig. 14. Running example: Symbolic bonsai trees. The left table shows labels (label(n1) to label(n7)) of fivesymbolic bonsai trees. The right table shows constraints that rule out syntactically incorrect trees.

{b1,b2} of concrete trees. The tree selector is a symbolic constant c . The symbolic expression c

selects b1 while ¬c selects b2. Except for nodes n2 and n5, tree B3 has constant labels. Tree B4 is afresh symbolic tree of depth 2. Its labels are unconstrained symbolic constants, which means that

B4 represents the set F2 of bonsai trees of depth 2 or less, including those that do not adhere to the

grammar G ′ in Figure 12. Tree B5 is like B4 except that it excludes syntactically incorrect trees. It

does so through syntactic constraints s(ni ) in the assertion store. As a result, B5 represents exactlythe trees in L2(G

′), the trees described by grammar G ′ that are of depth 2 or less. The constraintss(ni ) shown in the figure have been produced manually; our implementation described in Section 4produces them automatically.

Figure 15 symbolically compiles fb . The input symbolic tree Bin with symbolic labels C ={v1, . . . ,v7} is first pattern-matched, producing three matches shown in columns 2ś4. The threecolumns also show the effect of substituting the bound pattern variables a, x , and y into the right-hand-side of the for statements. Finally, the results of these substitutions are merged to produce the

resulting symbolic bonsai tree Bout. The tree Bout gives the symbolic semantics of fb with respect

to the input tree Bin: for any concrete assignment to C , the concretizations of Bin and Bout give aninput-output pair of concrete bonsai trees that agree with the semantics of fb in Figure 13.

With Bout in hand, we are ready to solve for the concrete input tree that makes f output aparticular output tree, say swap(a,a). Translated to bonsai form, we are looking for bin such thatfb (bin) = •(a, a). (Our goal is to invert the second example from Figure 13.) We want to solve

for C = {v1, . . . ,v7}: an assignment to C that satisfies Bout = •(a, a) will concretize Bin to bin. To

obtain constraints over vi ’s, we translate the equation Bout = •(a, a) to the per-node constraint

label(Bout.n1) = •∧ label(Bout.n2) = a∧ label(Bout.n5) = a, which is further expanded per Figure 15.

The solution is v1 = •,v2 = id,v5 = •,v6 = a,v7 = a, which concretizes Bin to •(id, •(a, a)), orid(swap(a,a)) in AST form.

4 IMPLEMENTATION

This section illustrates how to implement Bonsai trees in a specific language, Racket [Flatt and PLT2010], using a specific symbolic evaluator, Rosette [Torlak 2016]. Of course, guided by the rules inFigure 11, these ideas can be implemented in other symbolic evaluation platforms as well [Sen et al.2013, 2015], although perhaps with more effort. After implementing bonsai trees, we demonstratehow to use them to check the arithmetic language in Figure 16.

Example language specification. We start by illustrating how a language is specified in ourimplementation of bonsai trees. Figure 16 shows an executable specification of the simply typed

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:15

match

expression a for a id 𝑥 for 𝑥 swap 𝑥 𝑦

for swap 𝑦 𝑥 merge 𝜋𝑎, 𝐵𝑎, merge 𝜋id, 𝐵id, merge 𝜋swap, 𝐵swap, []path

condition

𝜋a 𝜋id 𝜋swap𝑣 = a 𝑣 = ● ∧ 𝑣 = id 𝑣 = ● ∧ 𝑣 ≠ idresulting

tree 𝐵a 𝐵id 𝐵swap 𝐵𝑜𝑢𝑡label 𝑛 a 𝑣 ● {𝜋𝑎 ⊢ a, 𝜋id ⊢ 𝑣 , 𝜋swap ⊢ ●}label 𝑛 – 𝑣 𝑣 {𝜋id⊢ 𝑣 , 𝜋swap ⊢ 𝑣 }label 𝑛 – – 𝑣 {𝜋swap⊢ 𝑣 }label 𝑛 – – 𝑣 {𝜋swap⊢ 𝑣 }label 𝑛 – 𝑣 𝑣 {𝜋id⊢ 𝑣 , 𝜋swap ⊢ 𝑣 }label 𝑛 – – 𝑣 {𝜋swap⊢ 𝑣 }label 𝑛 – – 𝑣 {𝜋swap⊢ 𝑣 }

𝑛𝑛𝑛𝑛 𝑛 𝑛 𝑛

tree that matched

the pattern:

the result tree

after substituting

the matched tree:

a ida

𝑥𝑥

𝑥𝑥𝑦𝑦

def 𝒇𝒃(t) = match t ona for a

● id, 𝑥 for 𝑥● 𝑥, 𝑦 for ● 𝑥, 𝑦

end

The values 𝑣 …𝑣 refer to

the symbolic labels of the

symbolic input tree 𝐵𝑖𝑛,

that is, 𝑣𝑖 = label 𝐵𝑖𝑛 . 𝑛𝑖 .

label 𝑛 ← 𝑣𝑛𝑛

Fig. 15. Running example: The symbolic semantics of fb . The semantics is given by the tree Bout which is

the symbolic return value of fb on a symbolic input tree of depth 2. The node labels of Bout, shown in the

rightmost column, are joins over the input tree node labels, denoted v1, . . . ,v7. The output tree Bout is themerge of the symbolic trees that result from the three pattern matches, shown in columns 2ś4.

arithmetic language [Pierce 2002]. We modified the typechecker to introduce unsoundness byremoving the rules that check whether the alternative branches of an if term have the same type.We automatically synthesize a soundness counterexample by symbolically compiling the syntaxchecks, the typechecker and the evaluator; we then query a solver for a program that succeeds inthe typechecker, fails in the evaluator, and satisfies the syntactic rules (see Figure 2).

Constructing a symbolic tree. Recall rule SC-Tree from Figure 11. A fresh symbolic constanton each node dictates whether it is an internal node or a leaf, and, if it is a leaf, which symbol itrepresents. In practice, we use two symbolic constants: one Boolean that decides whether or not itis a leaf, and a second to decide which leaf (if it is indeed a leaf).We can construct such a node by symbolically evaluating the call to binary-tree! shown in

Figure 17 (left). The symbolic evaluator recursively unrolls the call until the depth limit reacheszero; at this point, we follow SC-Leaf by creating a single symbolic constant to represent whichsymbol is represented by the leaf. fresh-symbolic! is a macro that symbolically selects from a setS . Internally, it is implemented by creating symbolic integer constants and maintaining a map fromintegers to members of S . Inner nodes are represented by cons pairs so we can take advantage ofexisting Racket primitives for manipulating S-expressions.

The result of symbolically evaluating binary-tree! is a data structure that embeds the symbolicconstants si , pictured in Figure 17 (right). Note that this tree contains no syntactic information:concrete assignments of values to each si may create syntactically invalid ASTs.

Pattern matching. TheBonsai pattern-matching macro, tree-match, is implemented by followingthe rules SC-Test-{ Const-Pass, Const-Fail, Sym, Pattern-Variable, Inner-Node, Trans }, asexpected. Since the pattern matcher is written within a symbolically evaluated language, SC-Test-Ite follows automatically from the symbolic evaluator’s behavior on operations on ites. Figure 18

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:16 Kartik Chandra and Rastislav Bodik

; syntax

(define arithmetic-syntax

'([exp zero

(succ exp)

(if exp exp exp)

(zero? exp)

]))

; evaluator

(define (execute t)

(tree-match t

'zero (lambda () 0)

'(succ _) (lambda (x) (+ (execute x) 1))

'(if _ _ _) (lambda (c t f)

(if (execute c)

(execute t)

(execute f)))

'(zero? _) (lambda (x) (= 0 (execute x)))))

; type rules

(define (check t)

(tree-match t

'zero (lambda () 'nat)

'(succ _) (lambda (x)

(assert (eq? (check x) 'nat))

'nat)

'(if _ _ _) (lambda (c t f)

(assert (eq? (check c) 'bool))

(define t+ (check t))

(define f+ (check f))

;; (assert (eq? t+ f+))

;; omitted!

t+)

'(zero? _) (lambda (x)

(assert (eq? (check x) 'nat))

'bool)))

Fig. 16. An executable specification of the arithmetic language [Pierce 2002] that we would like to check forsoundness. tree-match is a pattern-matching macro, which checks an input AST against the given patterns,and, upon finding a match, executes the accompanying lambda with the contents of the blanks as arguments.

(define (binary-tree! depth)

(if (> depth 0)

; SC-Tree

(if

(eq? 'leaf (fresh-symbolic! '(leaf inner)))

(fresh-symbolic! all-leaves)

(cons

(binary-tree! (- depth 1))

(binary-tree! (- depth 1))))

; SC-Leaf

(fresh-symbolic! all-leaves)))

(define all-leaves '(zero succ zero? if))

; for arithmetic language

(ite (= s1 leaf)

s2

(cons (ite (= s2 leaf) s4 ...)

(ite (= s3 leaf) s5 ...)))

Fig. 17. A procedure that generates a fresh symbolic Bonsai tree (left) and its output (right).

illustrates the behavior of the pattern matcher using the example of syntactic constraints (seebelow). Note that per SC-Match-Nonempty, the pattern matcher is sensitive to the order in whichpatterns are presented. If a tree matches the patterns p1 and p2, the earlier one (p1) is given priority.The path condition for the p2 match will stipulate that the p1 match fails. This resolves ambiguitieswhen the patterns are not mutually exclusive.

Syntactic constraints. To assert that the Bonsai tree embeds only syntactically correct trees, wecan take advantage of pattern matching. We invoke (assert (is-program? tree)), which traverses

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:17

(define (is-program? p)

(tree-match p

'zero (lambda () #t)

'(succ _) (lambda (x) (is-program? x))

'(if _ _ _) (lambda (c t f)

(and (is-program? c)

(is-program? t)

(is-program? f)))

'(zero? _) (lambda (x) (is-program? x))

'_ (lambda () #f)))

(define (is-program? p)

(cond [(eq? p 'zero) #t]

[(and (pair? p)

(eq? (car p) 'succ)

(null? (cddr p)))

(is-program? (cadr p))]

[(and (pair? p)

(eq? (car p) 'if)

(pair? (cdr p))

(pair? (cddr p))

(pair? (cdddr p))

(null? (cddddr p)))

(and (is-program? (cadr p))

(is-program? (caddr p))

(is-program? (cadddr p)))]

[(and (pair? p)

(eq? (car p) 'zero?)

(null? (cddr p)))

(is-program? (cadr p))]))

Fig. 18. Placing syntactic constraints on the bonsai tree. The code results from the macro’s expansion of thearithmetic grammar (left). One more level of expansion (right) reveals the internal bonsai tree structure.

the tree and generates constraints required by the grammar defined in Figure 16. This call expandsinto the procedures in Figure 18. More generally, we provide the macro syntax-matches?, whichconverts a grammar in BNF to a syntax checker in the style of is-program?.

Merge. Recall from Section 2 that symbolic evaluators already perform merges on conditionalbranches. Rosette, in particular, also performs some simplification: when merging two cons pairs,it first individually merges the heads and tails, and then creates a fresh cons pair of the two merges.This results in half of the merge operation‘’for free"; we must take care to arrange for Rosette tomerge pairs separately from leaves since the simplification does not apply to merging a leaf with aninner node. This, however, is always the case as long as the ites being merged satisfy the invariantdescribed in Section 3. Thus, by carefully regulating the creation of ite values in binary-tree!,we force Rosette to always perform a Bonsai merge on trees.

4.1 The Bonsai Checker

Equipped with the bonsai tree, we now return to our goal of symbolically reasoning about thelanguage in Figure 16. We begin by searching for counterexamples to soundness. The bonsai treelets us efficiently execute the algorithm shown earlier in Figure 2. First, it creates a symbolicrepresentation of the set of all ASTs up to some maximum size m. Next, it computes symbolicrepresentations of trees that (a) are syntactically valid, (b) pass the typechecker, and (c) fail in theinterpreter. Finally, it asks the solver to find a tree in the intersection of these sets. If it exists, thisprogram is a counterexample.Recall that the typechecker in Figure 16 has a soundness bug, i.e., failing to verify that the two

clauses of ‘if’ expressions have the same type. Bonsai catches this error by executing the algorithmdescribed above:

(define program (binary-tree! 10))

(assert-pass (syntax-matches? arithmetic-syntax program))

(assert-pass (check program))

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:18 Kartik Chandra and Rastislav Bodik

(assert-fail (execute program))

(define solution-model (solve))

(echo (evaluate program solution-model))

The first line creates the formula characterizing the set A of all abstract syntax trees of depth 10or less that adhere to the grammar. The second line creates the formula S , which describes thesubset of programs in A that are syntactically valid. The third line evaluates the formula on thetypechecker to create the formulaT , which describes the set of programs that pass the typechecker.The fourth line does the same for the interpreter, negating the formula generated to produce I ′.The fifth line queries the solver for a model that satisfies the formula S ∧T ∧ I ′, and line six simplyinterprets the model to yield a concrete program that is then echoed to the user. Bonsai returns thefollowing counterexample in less than 2 seconds.

'(succ (if (zero? (succ zero)) zero (zero? zero)))

Indeed, this program passes the typechecker but causes the interpreter to crash by attempting toincrement a boolean value. If we reintroduce the omitted check in our typechecker, our systemfails to find a counterexample, showing that this check fixes the bug.

4.2 A Variation on theQuery

A small variation on the preceding counterexample-finding algorithm lets us compute a łdiffž oftwo typecheckers by querying for programs that are accepted by one but rejected by the other.Suppose a student submits the unsound arithmetic typechecker described above to a teacher. Theteacher can compare it to a correct reference implementation and give the student feedback in theform of a program accepted by the former but rejected by the latter.To synthesize such a program, we initialize T , the set of all programs that pass the student’s

implementation, andU , the set of all programs that pass the teacher’s implementation. Then, wequery the SAT solver for any member of the set S ∧T ∧U ′.

(define program (binary-tree! 10))

(assert-pass (syntax-matches? arithmetic-syntax program))

(assert-pass (student-check program))

(assert-fail (teacher-check program))

(define solution-model (solve))

(echo (evaluate program solution-model))

; ==> '(if (iszero? zero) zero (iszero? zero))

Note that this program is not a counterexample to soundness because it does not fail at runtime.However, it behaves differently on the two typecheckers and thus represents a member of theirłdiff."

5 CASE STUDIES

We now demonstrate how designers can useBonsai to explore three more advanced type systems: anobject-oriented, an ownership-based, and a dependent type system. Besides looking for soundnessissues, we also run queries that help us better understand the type system’s restrictions. Some ofthe simpler examples are available online; in such cases, footnotes contain links to repositories.

5.1 Featherweight Java

Featherweight Java [Igarashi et al. 1999] is a calculus that models essential features of the Javaprogramming language. It has classes with methods and inheritance via subclasses.

We test Bonsai’s ability to identify the classical bug in function subtyping, which is to insist thatfunction arguments are covariant, i.e., to assume that s <: t implies (s → u) <: (t → u). The correctapproach is to require contravariance in function arguments, i.e., if s <: t then (s → u) :> (t → u).

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:19

This bug, of historical interest, appeared in early versions of the Eiffel programming language, andits discovery surprised the Eiffel community [Cook 1989].After we introduced the bug into our Featherweight Java implementation, Bonsai reported the

following counterexample in a few seconds. This counterexample has been manually formatted toJava syntax:

class A extends Object {

B bar(A arg) {

return arg.bar(this);

} }

class B extends A {

B quux(C arg) {

return new A().bar(arg);

} }

class C extends B {

B bar(B arg) {

return arg.quux(this);

} }

new C().quux(new C());

The call stack execution trace:

Error: class A has no method quux

in new2 A().quux(new1 C()); // line 11

...new1 C().bar(new2 A()); // line 3

...new2 A().bar(new1 C()); // line 7

...new0 C().quux(new1 C()); // line 13

...[init]

In the stack trace, we can see that the failure is caused by new C().bar(new A()). This call occursbecause the method C.bar is allowed in place of A.bar. This usage is unsound but it is allowed bythe covariant argument types of the methods. This explains why the covariant argument rule isproblematic: whenever C <: A, the rule guarantees that we can safely use a C wherever we usean A. This intended guarantee is violated because A.bar accepts inputs that C.bar does not. Notethat this issue does not apply to true Java because shadowing an inherited method with a differentargument type overloads that method name, creating a distinct method.The symbolic evaluation of Featherweight Java is much more demanding than that of the

arithmetic language. To typecheck and evaluate Featherweight Java, we must make lookups into aclass table in order to traverse the subclass hierarchy; when the class table is symbolic, a lookupcould potentially be any of the table’s entries.. Representing the class table as a bonsai tree lets usefficiently merge all possible lookup results into small, compact formulas that the interpreter caneasily manipulate.

5.2 Ownership Java

Our next language augments Featherweight Java with an object-oriented ownership type system,Ownership Java [Boyapati 2004; Boyapati et al. 2003]. Ownership Java’s safety guarantee is thatwell-typed objects must access only objects that they own. The type system statically enforces thisencapsulation guarantee by trackingthe binary ownership relation on objects, which is declared inannotations on each class, method, and field.

A canonical example of Ownership Java is a stack implementation. The stack object is allowed toaccess its inner linked list but not the objects stored in the linked list; those can be accessed only bythe stack’s client. The stack class is thus defined with two owner parameters. The first parameter isspecial because it is the owner of the stack object at runtime. The second owner parameter denotes

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:20 Kartik Chandra and Rastislav Bodik

the client who owns the data in the list; the stack class implementation uses this parameter toannotate its methods and the encapsulated linked list. For example, the return value from the popmethod is annotated with the second owner parameter, letting the client access objects retrievedfrom the stack.To prove soundness of Ownership Java, Boyapati et al. [2003] introduce an łownership tree," a

transitive view of the ownership relation: o2 is a direct descendant of o1 if o1 owns o2. The ownershiptree is rooted at the special owner world. Using the ownership tree, the encapsulation theorem isstated: object x can access object o only if (1) x = o, (2) x is a descendant of o in the ownership tree,or (3) x is an inner class object of o. Tools like Pipal (Section 7) can check this invariant.

With the Bonsai Checker, however, we need not formulate any such invariant. Instead, we needonly one straightforward dynamic check to detect unsoundness. The check asserts that the ownerstored in the accessed object (object) is owned by the receiver of the accessor method (this):

(define (evaluate-expression ...)

...

(assert

(owns? this object)

"Error: attempted invalid access!")

...)

This check is easy to implement because it is a direct statement of the desired guarantee.We believe that a counterexample that violates this assertion provides a more intuitive under-

standing of the issue than a counterexample that violates the encapsulation theorem because atrace shows the origins of the unsafe state. We reproduce one such counterexample below, foundby Bonsai in about 90 seconds when we disabled the static owner check of method calls:

class Main<O1, O2> extends Foo {

Main<O1, O1> meth3(Main<O1, O1> arg) {

(arg.meth3(arg)).main(arg);

} }

class Foo<O1, O2> extends Object {

Main<O2, O2> main(Main<O2, O2> arg) {

(new Main<O2, O2>()).meth3(arg);

} }

new Main<world, world>().main(new Main<world, world>());

This program fails at runtime because Foo<O1, O2> should not be allowed to access methods ofa Main<O2, O2> on line 7. It should only be allowed to access an instance of Main that it owns,which could be a Main<this, O1> or a Main<this, world>.

Why have additional constraints on owners? Next, we illustrate Bonsai’s ability to compare typesystems by formulating queries involving multiple typecheckers and interpreters.

When instantiating a new object with owner parameters o0...n , the Ownership Java type systeminsists that the object’s owner o0 be a descendant of all other owner parameters o1...n in theownership tree. It is not immediately obvious why this condition is imposed Ð it is dictated byfeatures interacting with subtyping [Boyapati et al. 2002].A language designer could thus ask the questions, łWhat are the consequences of this extra

check? Specifically, does adding the check reject any correct programs that were accepted priorto adoption of this rule?" We answer this question with Bonsai. First, we create a version ofOwnership Java (OJ) called Reduced Ownership Java (ROJ), which prohibits subtyping and removesthe additional constraint on owners. As expected, Bonsai fails to find a soundness counterexamplein the ROJ language.Now we are ready to query for the program of interest. This program must (1) fail the OJ

typechecker, (2) pass the ROJ typechecker, and (3) succeed at runtime. This program is rejected by

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:21

OJ, does not use subtyping, and is correct, per the three conditions in the query. Bonsai producesthe following counterexample in just over a minute.

class Main<O1, O2> {

Main<world, O2> main(Main<O1, O2> arg) {

return new Main<world, O2>();

} }

new Main<world, world>.main(new Main<world, world>)

This program fails the Ownership Java typechecker because world is not a descendant of O2 in theownership tree. However, this program is otherwise correct: it passes our ROJ typechecker anddoes not fail at runtime. While the counterexample itself is not particularly profound, it providesevidence that a less restrictive type rule could be designed.The comparison between OJ and ROJ is made possible by the novel way in which Bonsai lets

users compose sets of constraints when querying the solver. Crucially, by comparing two versionsof a typechecker, we can better understand how the difference in type rules affects which programsare rejected.

Simplifying counterexamples with minimization. OJ programs can easily grow extremely largebecause of unused classes and methods. To obtain an easily understandable counterexample, wepresent our SMT solver with the additional goa of minimizing the size of the generated program.Thus, the preceding counterexamples are minimal: no smaller program satisfies the constraints weimposed.

5.3 Dependent Object Types (DOT)

Dependent Object Types, or DOT [Amin et al. 2014], is both a formalization of the essence ofScala [Odersky and al. 2004] and the basis for dotty [Odersky 2015], an experimental Scala compiler.This makes DOT a good candidate for a real-world type system that underlies a common, modern,and practical programming language.DOT features path-dependent types. For example, the signature

feed(pet : Animal, food: pet.FoodType) : Boolean

describes a function whose second argument has a type pet.FoodType that depends on the firstargument pet, hence the name path-dependent types. DOT also provides record types, such as{ name : String }, and intersection types, such as type Nat = Int ∧ Positive.

In this section, we examine the Scala soundness issue SI-9633 [Amin and Tate 2016a,b]. Thisissue was discovered by reasoning about extending DOT with some features of full Scala, namely,the addition of the null object. Here, we demonstrate how to add null to DOT and use the BonsaiChecker to find a counterexample to the issue.1

With the Bonsai Checker library, we need just over 400 lines of code to implement DOT. Theimplementation closely follows the specification by Amin et al. [2016]. For example, three of thesubtyping rules are defined below:

Γ ⊢ T <: ⊤ Γ ⊢ ⊥ <: TΓ ⊢ S <: T Γ ⊢ S <: U

Γ ⊢ S <: T ∧U

Their corresponding implementations in Bonsai are almost direct translations:

(define/rec (dot-subtype? sub sup)

(tree-match `(,sub . ,sup)

'(_ . Any) (lambda (T) #t)

'(Nothing . _) (lambda (T) #t)

1Our models of NanoDOT and NanoScala are available at https://bitbucket.org/bonsai-checker/dot/.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:22 Kartik Chandra and Rastislav Bodik

⟨term⟩ ::= ⟨term⟩ . ⟨name⟩ selection

| ⟨definition⟩ object instantiation

| ⟨name⟩ variable reference

| λ ( ⟨name⟩ : ⟨type⟩ ) : ⟨type⟩ { ⟨term⟩ } lambda

| ⟨term⟩ ( ⟨term⟩ ) application

⟨definition⟩ ::= { val ⟨name⟩ = ⟨term⟩ } field

| { type ⟨name⟩ = ⟨type⟩ } type

| ⟨definition⟩ ∧ ⟨definition⟩ aggregate

⟨type⟩ ::= ⊤ | ⊥ top, bottom

| ⟨term⟩ . ⟨name⟩ type projection

| { val ⟨name⟩ : ⟨type⟩ } field

| { type ⟨name⟩ >: ⟨type⟩ <: ⟨type⟩ } type

| ⟨type⟩ ∧ ⟨type⟩ intersection

| ⟨type⟩ → ⟨type⟩ function

⟨name⟩ ::= a | b | c | φ

Fig. 19. The NanoDOT language.

'(_ . (and . (_ . _))) (lambda (S T U)

(and (dot-subtype? sub T)

(dot-subtype? sub U)))

...))

The NanoDOT language. Figure 19 shows the syntax of the version of DOT we use to investigateissue SI-9633. There are two deviations. First, for convenience, we add a global variable φ of type⊤ to the environment; its value will populate records in counterexamples. This addition makescounterexamples more readable: since DOT itself does not provide any primitive types, in pure DOT,Bonsaiwould have to use type-tags as leaf values in counterexamples. Second, we omit recursivetypes because DOT’s definition of such types uses call-by-name semantics, which is incompatiblewith our call-by-value implementation.

We now discuss two counterexamples found usingBonsai. The first explains a restriction imposedby DOT, while the second is a variant on the counterexample in SI-9633.

Disjoint domains in intersection types. DOT restricts intersection types and aggregate definitionsto field-disjoint terms. That is, AndDef-I reads [Amin et al. 2016]:

Γ ⊢ d1 : T1 d2 : T2dom(d1) ∩ dom(d2) = ϕ

Γ ⊢ d1 ∧ d2 : T1 ∧T2

where di are definitions and dom(d) is the set of fields bound in d .Readers lacking expertise in type systems might not immediately see the rationale for this

restriction until they simply remove AndDef-I from our NanoDOT implementation and observethe results. Bonsai constructs the following useful counterexample:

λ ( b: { val a: { val a: ⊤ } } ): ⊤ {

b.a.a

} ( { val a = φ } ∧ { val a = { val a = φ }} )

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:23

When given this program, the interpreter crashes because b.a is undefined. Since b is bound toan intersection where both sides have a value for the field a, the value b.a could be either φ or{ val a = φ }. If the left side is chosen, we get a runtime error. If the right side is chosen, we getφ. Since our implementation gives precedence to the left member of an intersection, our interpreterthrows a runtime error, making this program a counterexample. In summary, this example showsthat for the interpreter to make a greedy choice among intersected values, these values must haveno conflicting fields, which explains AndDef-I.

Collapsing the subtype hierarchy using bad bounds (SI-9633). A common soundness issue withdependent types relates to bad bounds, which allows the creation of uninhabitable types. For instance,the type { type S >: ⊤ <: ⊥ }, which means that ⊤ <: S <: ⊥, should clearly be uninhabitable.Otherwise, we could collapse the subtype hierarchy and prove all types equivalent, much like theway a single contradiction allows proving anything in an inconsistent logical system [Amin et al.2016].DOT does not check the bounds of dependent types but ensures that values have types with

correct bounds. So, one can create the badly bounded type { type S >: ⊤ <: ⊥ }, but values aresyntactically constrained to have coinciding lower and upper bounds using the construct { type S

= ⊤ }, which expands to { type S >: ⊤ <: ⊤ }. As a result, while types with bad bounds canbe created in DOT, it is impossible to instantiate values with badly bounded dependent types. Thisrestriction makes it easy to check the bounds of a value upon its instantiation.In Scala, however, it is possible to create a value without going through a constructor, which

makes bounds checking difficult. Odersky [2016] describes three situations where this is possible:using lazy for delayed instantiation, using null for uninstantiated values, and using type projection.In these cases, the compiler cannot check for bad bounds of dependent types. In Scala, it is possibleto instantiate values with badly bounded dependent types. This is the basis for SI-9633.

Scala’s soundness bugs, including SI-9633, are traditionally demonstrated with an example thatcasts an arbitrary value to the type Nothing. The fact that Nothing is a subtype of all other typeslets us cast a value to any other value, for example, via the sequence Number→ Nothing→ String.In practice, such an operation leads to the runtime ClassCastException in the JVM.

To explore bugs related to bad bounds, we extend NanoDOT to NanoScala with a null value oftype ⊥. We also add a term to allow casting. The term cast t to T is typechecked in the usualmanner, namely, by ensuring that t : S ∧ S <:T . At runtime, the cast performs the same check todetect counterexample programs. This is sufficient to model the relevant Scala issues in NanoScala.We can now query Bonsai for a NanoScala equivalent of the SI-9633 counterexample. This

program was found in around 4 minutes:

cast

(λ (a: {type b >: ⊤ <: ⊥}): a.b {

φ

})(cast null to {type b >: ⊤ <: ⊥})

to ⊥

Why does this program typecheck? The inner cast typechecks because null is of type ⊥, which is asubtype of all types including the type { type S >: ⊤ <: ⊥ }, regardless of its bad bounds. Theouter cast to ⊥ typechecks because the type a.b is bounded above by ⊥; this is allowed becausebadly bounded types can be created in DOT.

Why does the program fail at runtime? At runtime, the issue occurs when null is cast to { type S

>: ⊤ <: ⊥ }. This type is uninhabited; yet, modeling Scala, NanoScala nevertheless casts null toit. The execution continues with a value with badly bounded dependent types. As a consequence,we can now invoke the lambda expression synthesized by Bonsai. This lambda should not be called

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:24 Kartik Chandra and Rastislav Bodik

trait A { type L <: Nothing }

trait B { type L >: Any }

def toL(b: A with B): b.L = 123

val p: B with A = null

toL(p): Nothing

trait A { type L <: Nothing }

trait B { type L >: Any }

def toL(b: B)(x: Any): b.L = x

val p: B with A = null

println(toL(p)("hello"): Nothing)

Fig. 20. Comparing Bonsai’s counterexample with SI-9633. Left: A counterexample found by Bonsai, manuallytranslated to Scala with variables renamed for clarity. Right: The counterexample reported on SI-9633 [Aminand Tate 2016b], copied verbatim.

since its argument is of an uninhabited type. However, if called, it returns φ : ⊤, which is cast to ⊥.This raises an exception.

In summary, NanoScala lets us cast a non-null value to ⊥, which is at the heart of the SI-9633counterexample. This is because the NanoScala null and cast terms violate NanoDOT’s propertythat all dependent types be backed with an instantiated value.Two important considerations affect our implementation of DOT. First, to prevent Bonsai from

synthesizing counterexamples related to null dereferencing, we must tell it to disregard bugs in-volving the selection operator. Second, our current implementation of NanoDOT does not maintaintype information at runtime. Thus, we raise an exception only on casts of non-null values to ⊥,which can be detected without any runtime type information. This is sufficient to catch the runtimeerrors of interest to us; the Scala example Number → String would fail when Number is cast toNothing, which we model as DOT ⊥.Finally, we can translate the NanoScala counterexample to Scala. We must manually make two

minor changes. First, unlike DOT, Scala does check type annotations that have bad bounds. Asimple layer of indirection suffices to avoid this check: we create our badly bounded type { type S

>: ⊤ <: ⊥ } as an intersection of two well-bounded types, P and Q:

trait P {type B <: Nothing}

trait Q {type B >: Any}

Note that Scala’s intersection types are not commutative due to shadowing, so we must take careto use either P with Q or Q with P depending on the context.

The second change is purely syntactic. Scala’s lambdas do not allow path-dependent return typeannotations in the same style as methods, so we translate the lambda to a named method.

def bad(a: P with Q): a.B = 123

bad(null: Q with P): Nothing

When run with scala, this example produces:

java.lang.ClassCastException:

java.lang.Integer cannot be cast

to scala.runtime.Nothing

After variable renaming, this counterexample corresponds almost exactly with the one reported onSI-9633. Figure 20 presents a side-by-side comparison.

Discussion. We demonstrate the use of Bonsai to study DOT, a language with a complex typesystem. We introduce two subtle bugs in DOT by modifying the language and show that in eachcase Bonsai produces small, readable counterexamples after only a couple of minutes. The lattercounterexample reproduces soundness counterexample SI-9633 for the Scala compiler.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:25

6 EMPIRICAL EVALUATION OF PERFORMANCE

In this section, we evaluate the efficiency of Bonsai. We consider the speed of symbolic evaluationand solving as well as the sizes of counterexamples and program spaces that are explored. Wefind that Bonsai reliably finds bugs in a few seconds to a few minutes. When left to run for longerperiods,Bonsai explores spaces containing programs much larger than the minimal counterexample,providing a margin of assurance.

6.1 Comparing Bonsaiwith Fuzzers

Here, we compare Bonsai to two fuzzers mentioned in Section 7: the syntactic fuzzer built intoRedex [Klein et al. 2012c], and the type fuzzer based on judgement trees [Fetscher et al. 2015].

The benchmark. We implemented in Bonsai the łstlc+lists" language2 from the Redex bench-mark [Robert Bruce Findler and Felleisen 2017] and introduced each of the nine bugs to ourimplementation (all one-line changes). We compared the performance of Bonsai to that of thesyntactic fuzzer and type fuzzer from the Redex benchmark with respect to both the time taken tofind a counterexample, and the average size of counterexamples found. All tests were run on thesame machine.

Results. Our results are plotted in Figure 21 (left), and the times and sizes, correlated by bug, arelisted in Table 1. We make two important observations below.

We first examined Bonsai’s consistency. While fuzzers were slightly faster than Bonsai on somebenchmarks, they were several orders of magnitude slower on others, taking anywhere from afew minutes to over an hour. Bonsai, on the other hand, took between 1 and 5 seconds on everybenchmark. Consider, for example, bugs 4 and 5, which were the hardest for fuzzers to find despitetheir being labeled łshallow" errors in the benchmark suite: these bugs require a specific set ofinteractions in the counterexample program, and thus the probability of randomly generating sucha program is extremely low, making the bugs orders of magnitude more difficult for fuzzers toreveal. Bug 5 changed the interpreter to return the head of a list when tail was applied, while bug4 assigned the return type of cons to int rather than to Listof int as expected. Bonsai foundthese bugs as fast as it found the others.

Second, we compared the sizes of counterexamples produced by the two tools. The type fuzzer,though efficient, generated extremely large,complex programs as counterexamples. This likely con-tributed to its effectiveness since larger programs are more likely to uncover soundness bugs [Yanget al. 2011]. However, such programs are extremely difficult for users to reason about since theymay have (for example) dozens of layers of nested lambdas. In contrast, Bonsai consistently pro-duced small counterexamples. In almost all cases, Bonsai’s counterexamples were identical to thosesuggested by the Redex benchmark authors, with one being even smaller.Inspired by the latter observation, we evaluated Bonsai’s efficiency in constructing minimal

counterexamples by querying the solver for a solution that minimizes counterexample size. Table 2shows our results. We found that most counterexamples Bonsai produced were already minimal,even without using the minimization query. However, for the three bugs that could be furtherminimized, the minimization query took only 10%-30% longer than the standard query.

As a final demonstration of Bonsai’s efficiency compared to fuzzers, we considered a historic buginvolving let-polymorphism with mutable references, in e.g., ML [Tofte 1990b; Wright and Felleisen1994]. The Redex benchmark includes this łclassic let+referencesž bug as let-poly-2. However, thesyntactic fuzzer was run on it for several days with no results. The type fuzzer could not model

2Our model of stlc+lists, with patchfiles to introduce each bug, is available at https://bitbucket.org/bonsai-checker/

stlc-benchmark/.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:26 Kartik Chandra and Rastislav Bodik

Table 1. Bonsai performance vs that for syntax and type fuzzers described in the text. Fuzzer statistics areprovided as ratios against Bonsai statistics. ł+" indicates a lower bound due to a one-hour timeout. Size ismeasured in nodes (n) as per the Redex benchmark: łthe number of pairs of parentheses and atoms in thes-expression representation of the termž [Robert Bruce Findler and Felleisen 2017]. The left side of Figure 21represents this data on a scatter plot. Note that the type fuzzer cannot model let-poly-2 due to polymorphismand CPS-transformed judgment rules [Fetscher et al. 2015].

Bug Bonsai (sec) Syntax/Bonsai Type/Bonsai Bonsai Syntax/Bonsai Type/Bonsai

Time Size

stlc-1 0.86s 0.61 1 5n 1 57

stlc-2 0.99s 56 7.7 9n 1 14

stlc-3 0.84s 0.25 0.054 5n 3.6 51

stlc-4 3s 1200+ 34 17n N/A 1.8

stlc-5 0.99s 3600+ 40 9n N/A 14

stlc-6 2s 1000 15 17n 0.76 5.7

stlc-7 1s 8 0.07 9n 2 19

stlc-8 4.6s 14 0.061 29n 0.88 6.2

stlc-9 1.2s 0.4 0.049 15n 1.4 13

Full suite 16s 600+ 12

let-poly-2 1200s 24-hr timeout Cannot find 47n N/A N/A

let-poly due polymorphism, which requires the generator to make parallel choices that must matchup, and CPS-transformed type judgements, which impede its termination heuristics [Fetscher et al.2015].Bonsai, on the other hand, found this bug in just over twenty minutes,3 with a counterexamplealmost identical to that presented in the literature [Tofte 1990b].

6.2 Comparing Bonsaiwith Pipal

We compared the scalability of Bonsai to that of Pipal as described by Roberson et al. [2008]. Thiscomparison was only an approximation because Bonsai and Pipal encode different search spaces(symbolic programs and symbolic intermediate states, respectively). Thus, for example, Bonsaimust model all heap objects, while Pipal must limit exploration to four heap objects and n integerliterals. Furthermore, the Pipal experiments were performed using different symbolic executionframeworks and solvers. Nevertheless, Pipal results for comparable trials are listed in Table 3.

These measurements illustrate that Bonsai and Pipal generally scaled to around the same orderof magnitude. Thus, we found that even though Bonsai performed symbolic execution on boththe typechecker and interpreter, it still had comparable performance to Pipal, which performedsymbolic execution only on the typechecker. That is, compared to Pipal, Bonsai did not sacrificeefficiency for its versatility and ease-of-use.

6.3 Bonsai Encoding

Here, we compared the classical łsyntax tree" described in Section 3 (with subtree sharing) tot theBonsai tree. We evaluated the time required to check a type system as a function of program spacesize and separate the times required for symbolic compiltion and solving. We found that the Bonsaitree scaled significantly better than the classical syntax tree, allowing us to explore much largersearch spaces in the same amount of time.

3Our Bonsaimodel of let-poly-2 is available at https://bitbucket.org/bonsai-checker/let-poly/.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:27

Table 2. Bonsai performance when constructing minimal counterexamples. The star (*) indicates that theminimized counterexample was smaller than that produced without use of the minimization query.

Bug Time (sec) Min. Size

stlc-1 0.92s 5n

stlc-2 1.1s 9n

stlc-3 0.99s 5n

stlc-4 2.7s 17n

stlc-5 1.0s 9n

Bug Time (sec) Min. Size

stlc-6 2.6s 13n*

stlc-7 1.2s 5n*

stlc-8 4.9s 21n*

stlc-9 1.3s 15n

10−1

100

101

102

103

Time (log sec)

0

50

100

150

200

250

Cou

nterexample

Size(n)

Tim

eout(1

hour)

Benchmark: stlc+lists

Syntax fuzzer

Type fuzzer

Bonsai

1012

1025

1050

10100

10200

10400

10800

syntactically-valid trees

102

103

104

105

106

107

Tim

etaken(m

s)

Comparison of Classical and Bonsai encodings

Bonsai symbolic execution

Classical symbolic execution

Bonsai solving

Classical solving

Fig. 21. Left: Comparing fuzzers and Bonsai on the Redex benchmark. Right: Comparing the classical syntaxtree and Bonsai on verifying the lambda calculus.

The benchmark. We implemented a simple recursive-descent interpreter for the lambda calculus,bounded at 4 recursive calls. We then added a ‘typechecker’ that checks for free variables. Wesearched for counterexamples using both encodings with the same typechecker and interpreter.Thus, the only variable between the trials was the symbolic tree encoding used and its size.

In this setup, there are no possible counterexamples. This is intentional: a satisfiable formularequires the solver to reach any valid model, whereas an unsatisfiable formula requires the solverto visit all models and prove that none are satisfiable. Thus, an unsatisfiable formula represents amore taxing stress test for the solver. In practice, we find that satisfiable queries almost alwaystake less than half the time that unsatisfiable queries (usually, they are much faster).

Since tree ‘size’ has different meanings for the AST and bonsai encodings, we normalize by thenumber of syntactically valid ASTs represented by a symbolic tree of a given depth.

Results. Figure 21 (right) shows that Bonsai encoding lets us scale to a much larger number ofASTs than classical encoding in the same amount of time. For example, if we impose a time limitof approximately 20 seconds, then classical encoding explored less than 1025 ASTs while Bonsaiencoding explored more than 1050 ASTs. Similarly, to explore 1025 trees, Bonsai was roughly athousand times faster than classical encoding. In our experience, classical encoding did not scalesufficiently to identify most of the counterexamples discussed in this paper.Note that Bonsai encoding consistently spent less time in the solver than classical encoding

even though Bonsai pushed the complexity of the formal grammar to the solver. This suggests that

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:28 Kartik Chandra and Rastislav Bodik

although Bonsai often creates larger symbolic trees than classical encoding to represent the sameAST, it has a performance advantage in both symbolic execution and the solver.

In practice, we rarely search ASTs beyond depth 10; smaller trees are always sufficient tosynthesize counterexamples identified by experts (see Table 3). Nevertheless, we provide data formuch larger trees in this graph to show that the encoding is scalable to much larger spaces ifneeded.

6.4 Summary of Case Studies

The benchmark. We implemented several different languages with Bonsaiand searched for coun-terexamples by planting bugs in the typecheckers. We collected data for the smallest parametersneeded to catch these bugs, as well as for scaled-up trials. The latter were run with the same queries,but parameters were modified in order to search larger spaces.

Results. Table 3 lists data collected for several languages implemented with Bonsai. Each coun-terexample, including those found by experts, was synthesized by Bonsaiwithin a few minutes.This suggests that Bonsai is suitable for interactive use while developing a type system.

Additionally, we found that Bonsai scaled to much larger search spaces, which translates intosearching for larger and more complex counterexample. Indeed, the new ‘scaled-up’ counterex-amples were often several times as large as the minimal counterexamples. Thus, while boundedmodel checking is not a substitute for a formal proof, these results suggest that Bonsai can be usedto search through large spaces of programs efficiently to provide confidence that the type systemis most likely sound, or, alternatively, reveal a large counterexample that may not be found by ahuman.Finally, Bonsai performed well on a wide variety of languages, letting users easily experiment

with features such as closures, environments, classes/objects, ownership types, dependent types,records, and polymorphism.

7 RELATED WORK

Most existing tools for type system designers focus on checking for soundness bugs. This sectiondiscusses various techniques for detecting such bugs. (Section 6 describes empirical evaluations ofBonsai against these tools.) Additionally, we discuss additional related work in theorem-provingand symbolic execution.

7.1 Fuzzing

A type system fuzzer generates random programs and tests them on the typechecker and interpreteruntil it finds one that (1) passes the typechecker, and (2) fails in the interpreter. If such a program isfound, it witnesses a soundness bug.

Fuzzing presents serious scalability challenges. Recall Figure 1, which shows a high-level overviewof the fuzzing process. At each step, the parser, typechecker, or interpreter rejects the vast majorityof its inputs, making the probability of a random program witnessing a soundness error very small.Modern fuzzers address this problem by carefully generating random programs that are more likelyto be counterexamples. For example, syntactic fuzzers, such as the one used by PLT Redex [Kleinet al. 2012c], generate random syntactically valid programs, bypassing the parser. Syntactic fuzzershave been shown to find a wide variety of bugs in real languages [Klein et al. 2012b]. However,they are still not sufficiently scalable enough to catch many simple bugs [Fetscher et al. 2015].A type fuzzer generates random, well-typed programs, bypassing both the parser and the type-

checker. Type fuzzing can be done in many ways: for example, by using constraint logic program-ming to express the typechecker declaratively (which has been used to fuzz Rust [Dewey et al.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:29

Table 3. An overview of the languages implemented with Bonsai. Here, tBonsai is the time required to catch thesoundness bug described in the text; |ce | is the size of the counterexample measured as in Table 1; and |S | isthe number of syntactically correct programs explored. t+, |S+ |, and |ce+ | are analogous, but for long-runningtrials. We show relevant comparisons with Pipal where possible.

Language Description Parameters |S | |ce | tBonsai tPipal |S+ | |ce+ | t+Arithmetic Typed arith-

metic [Pierce

2002]

depth = 9

no bound

104 13 1s 0.5s 1088 99 11m

Simply-typed λ-

calculus

From Re-

dex [Robert

Bruce Findler

and Felleisen

2017], bug 5

depth = 5

bound = 4

104 9 0.97s Ð 1021 43 1.7m

Featherweight

Java

A minimal core

calculus for

Java with inher-

itance [Igarashi

et al. 1999]

# classes = 3

# methods = 3

depth = 6

bound = 7

1027 42 3s 2.1s 10119 163 13m

Ownership

Java

Featherweight

Java augmented

with ownership

types [Boyapati

et al. 2003]

# classes = 2

# methods = 4

# owners = 2

depth = 8

bound = 7

1063 72 17s >250s 10342 158 30m

NanoDOT Subset of depen-

dent object type

calculus [Amin

et al. 2016]

depth = 10

obj depth = 6

no bound

1019 23 211s Ð 1039 49 8m

NanoScala NanoDOT

augmented

with casts and

null

Same as Nano-

DOT

1020 24 175s Ð 1064 39 48m

2015]) or by generating random type judgement trees and using them to derive a program (whichhas been used to fuzz GHC [Fetscher et al. 2015]).Fuzzing is an effective technique for two reasons. First, a fuzzer can easily check actual imple-

mentations of languages. This eliminates the need to formalize the language and helps users catchimplementation-dependent bugs that a formalization would miss. Second, a fuzzer produces a con-crete counterexample program, making it easy to diagnose and fix the soundness issue. This claimis supported by a recent study that used NanoMaLy [Seidel et al. 2016] to produce counterexamplesthat crashed ill-typed ML programs. Students who were given NanoMaLy’s counterexamples in anexam setting were 10% to 30% more likely to correctly explain and fix the type error than studentsgiven only a printout of the compiler error.

However, fuzzers have a critical weakness: by their nature, random fuzzers are non-exhaustive.If after 24 hours no soundness error has been discovered, we cannot assume that the typecheckeris sound. Indeed, neither syntactic fuzzers nor type fuzzers can discover certain simple bugs in thePLT Redex benchmark [Klein et al. 2012b] after many hours [Fetscher et al. 2015] because thosebugs are witnessed by only a small set of programs that are extremely unlikely to be generatedrandomly.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:30 Kartik Chandra and Rastislav Bodik

7.2 Handling Non-exhaustiveness with Pipal

One specialized kind of type fuzzer does make bounded exhaustiveness guarantees. Pipal [Robersonand Boyapati 2010; Roberson et al. 2008] requires a typechecker to be encoded as a set of constraintsimposed on a finite set of intermediate program states. At each iteration, Pipal queries a constraintsolver for a well-typed state and performs a single step of evaluation to determine whether progressor preservation were violated. To search the space of intermediate states efficiently, Pipal monitorshow the intermediate state was manipulated during the single step of evaluation. If the program isnot a counterexample, Pipal uses this information to add more pruning constraints. The searchspace has been exhaustively checked when the solver returns UNSAT.

Pipal’s symbolic representation of ASTs provided the original inspiration for bonsai trees. Pipaltrees encode syntax information with symbolic formulas, reducing the size of the tree and thus thenumber of symbolic constants.

While Pipal is efficient, it requires typecheckers to be manually rewritten as a set of declarativeconstraints, together with carefully formulated invariants on intermediate program states. Notonly is this tedious for the user, but it also prevents directly checking executable versions oftypecheckers, losing one of the primary benefits of fuzzing. In contrast, Bonsai uses symbolicexecution to automatically convert an executable interpreter into constraints. Importantly, Bonsai’scounterexamples are programs, which tend to be easier to understand than the intermediate statesreported by Pipal. Finally, unlike Pipal’s iterative algorithm, Bonsaimakes only one query to thesolver. This property allows us to ask the solver questions beyond checking soundness.

7.3 Theorem Proving

If a fuzzer or a bounded model checker fails to find a witness to a soundness bug, we cannot claimto have a proof of soundness: there is always the possibility that a program larger than the boundmight expose a bug. In practice, the small-scope hypothesis [Andoni et al. 2003] conjectures thatall soundness bugs will be revealed by small witness programs;thus, for a sufficiently large searchbound, the failure to find a counterexample is usually compelling evidence (but not proof) that thetype system is correct.

Unlike fuzzing or bounded model checking, however, a mathematical proof of soundness providescomplete assurance that a type system is sound. Such proofs can be developed manually or by usingproof assistants such as [The Coq development team 2004] and [Nipkow et al. 2002]. For instance,Drossopoulou and Eisenbach [1997] prove soundness of a subset of Java manually, while Nipkowand von Oheimb [1998] do so using Isabelle/HOL [Nipkow et al. 2002].Unfortunately, theorem proving requires a formalized model of the language, such as Feath-

erweight Java [Igarashi et al. 1999] for Java and DOT [Amin et al. 2014] for Scala. Most modernlanguages are too complex to be fully formalized, and the models that do emerge often take years ofdevelopment. Furthermore, the actual proofs are often long and tedious, requiring significant man-ual effort. For example, the mechanized proof of the DOT language [Amin et al. 2014] using the Coqproof assistant [The Coq development team 2004] consists of several thousand lines of code [Rompfand Amin 2016] even though the language can be formalized in a couple of pages [Amin et al. 2016].We intend for Bonsai to aid in model design in a manner similar to PLT Redex [Klein et al. 2012b].

7.4 Symbolic Execution and Synthesis

Bounded verification tools [Dennis et al. 2006; Dolby et al. 2007; Galeotti et al. 2010; JPF; Köksalet al. 2012; Kroening; Solar-Lezama et al. 2006; Taghdiri 2007; Torlak et al. 2010] encode the concretesemantics of a language and translate the program (or just one execution path) into a logical formula.The formula represents constraints whose solution answers queries about symbolic inputs to the

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:31

program. Symbolic execution has become competitive in advanced analysis tasks, such as analysisof high-order contracts [Nguyen et al. 2015].The generation of a counterexample program is related to program synthesis because we can

think of the type system as the specification: the desired program passes the typechecker and failsin the interpreter. There are three main kinds of synthesizers: (1) rewriting [Fischer and Schumann2003; Frigo and Johnson 2005; Püschel et al. 2005], (2) deductive [Mateev et al. 2000; McDonaldand Anton 2001; Srivastava et al. 2010], and (3) those based on searching a space of programs withcontraint solvers [Angluin and Smith 1983; Gulwani 2011; Solar-Lezama et al. 2006; Srivastava et al.2010; Vechev et al. 2006]. While the first two categories derive a program from a specification, thelast category searches a space of programs, conceptually evaluating each against the specification.This search is analogous to Bonsai’s search for the counterexample program.

Reducing the path explosion during symbolic evaluation was addressed in ESC/Java [Flanaganand Saxe 2001a] and Rosette [Torlak and Bodík 2014]. Rosette’s symbolic unions, which distributesymbolic joins deeply across other operations, provided the second inspiration for bonsai trees.When the bonsai nodes are cons structures, Rosette effectively performs the bonsai merge for free.

Bonsai trees are a so-called symbolic data structure [Bodík et al. 2017], espite producing anunusual encoding when symbolically evaluated, they offer programmers the usual AST interface,facilitating specification of compact (big step) interpreters.

8 CONCLUSION

Bonsai a type system designer’s assistant, uses symbolic evaluation to convert typecheckersand interpreters into constraints. By combining these constraints, Bonsai can query a constraintsolver for programs with specific properties. Such queries are useful for locating soundness errors(including intricate bugs that took experts many years to discover), comparing type systems, findingunnecessary restrictions, and even synthesizing simple typechecks automatically.Bonsai uses a novel data structure to encode sets of abstract syntax trees. This lets it scale to

extremely large search spaces while remaining sufficiently fast for designers to use interactivelywhen developing a type system.

REFERENCES

B. Alpern, M. N. Wegman, and F. K. Zadeck. Detecting equality of variables in programs. In Proceedings of the 15th ACM

SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’88, pages 1ś11, New York, NY, USA, 1988.

ACM. ISBN 0-89791-252-7. doi: 10.1145/73560.73561. URL http://doi.acm.org/10.1145/73560.73561.

Nada Amin and Ross Tate. Java and scala’s type systems are unsound: The existential crisis of null pointers. In Proceedings

of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and

Applications, OOPSLA 2016, pages 838ś848, New York, NY, USA, 2016a. ACM. ISBN 978-1-4503-4444-9. doi:

10.1145/2983990.2984004. URL http://doi.acm.org/10.1145/2983990.2984004.

Nada Amin and Ross Tate. Soundness issue with path-dependent type on ‘null‘ path.

https://issues.scala-lang.org/browse/SI-9633, 2016b.

Nada Amin, Tiark Rompf, and Martin Odersky. Foundations of path-dependent types. In Proceedings of the 2014 ACM

International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA ’14, pages 233ś249,

New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2585-1. doi: 10.1145/2660193.2660216. URL

http://doi.acm.org/10.1145/2660193.2660216.

Nada Amin, Samuel Grütter, Martin Odersky, Tiark Rompf, and Sandro Stucki. The Essence of Dependent Object Types, pages

249ś272. Springer International Publishing, Cham, 2016. ISBN 978-3-319-30936-1. doi: 10.1007/978-3-319-30936-1_14.

URL http://dx.doi.org/10.1007/978-3-319-30936-1_14.

Alexandr Andoni, Dumitru Daniliuc, Sarfraz Khurshid, and Darko Marinov. Evaluating the łsmall scope hypothesis".

Technical report, MIT Laboratory for Computer Science, 2003.

Dana Angluin and Carl H. Smith. Inductive inference: Theory and methods. ACM Comput. Surv., 15(3):237ś269, 1983.

Thomas Ball and Sriram K. Rajamani. Bebop: A symbolic model checker for boolean programs. In Klaus Havelund, John

Penix, and Willem Visser, editors, SPIN Model Checking and Software Verification, 7th International SPIN Workshop,

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:32 Kartik Chandra and Rastislav Bodik

Stanford, CA, USA, August 30 - September 1, 2000, Proceedings, volume 1885 of Lecture Notes in Computer Science, pages

113ś130. Springer, 2000. ISBN 3-540-41030-9. doi: 10.1007/10722468_7. URL https://doi.org/10.1007/10722468_7.

Rastislav Bodík, Kartik Chandra, Phitchaya Mangpo Phothilimthana, and Nathaniel Yazdani. Domain-specific symbolic

compilation. In Benjamin S. Lerner, Rastislav Bodík, and Shriram Krishnamurthi, editors, 2nd Summit on Advances in

Programming Languages, SNAPL 2017, May 7-10, 2017, Asilomar, CA, USA, volume 71 of LIPIcs, pages 2:1ś2:17. Schloss

Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017. ISBN 978-3-95977-032-3. doi: 10.4230/LIPIcs.SNAPL.2017.2. URL

https://doi.org/10.4230/LIPIcs.SNAPL.2017.2.

Chandrasekhar Boyapati. SafeJava: A Unified Type System for Safe Programming. PhD thesis, Massachusetts Institute of

Technology, February 2004.

Chandrasekhar Boyapati, Robert Lee, and Martin Rinard. Ownership types for safe programming: Preventing data races

and deadlocks. In Proceedings of the 17th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages,

and Applications, OOPSLA ’02, pages 211ś230, New York, NY, USA, 2002. ACM. ISBN 1-58113-471-1. doi:

10.1145/582419.582440. URL http://doi.acm.org/10.1145/582419.582440.

Chandrasekhar Boyapati, Barbara Liskov, and Liuba Shrira. Ownership types for object encapsulation. In Proceedings of the

30th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’03, pages 213ś223, New York,

NY, USA, 2003. ACM. ISBN 1-58113-628-5. doi: 10.1145/604131.604156. URL http://doi.acm.org/10.1145/604131.604156.

Lori A. Clarke and Debra J. Richardson. Applications of symbolic evaluation. Journal of Systems and Software, 5(1):15ś35,

1985. doi: 10.1016/0164-1212(85)90004-4. URL https://doi.org/10.1016/0164-1212(85)90004-4.

W. R. Cook. A proposal for making Eiffel type-safe. Comput. J., 32(4):305ś311, July 1989. ISSN 0010-4620. doi:

10.1093/comjnl/32.4.305. URL http://dx.doi.org/10.1093/comjnl/32.4.305.

Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: path-sensitive program verification in polynomial time. In Jens Knoop

and Laurie J. Hendren, editors, Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and

Implementation (PLDI), Berlin, Germany, June 17-19, 2002, pages 57ś68. ACM, 2002. ISBN 1-58113-463-0. doi:

10.1145/512529.512538. URL http://doi.acm.org/10.1145/512529.512538.

Greg Dennis, Felix Sheng-Ho Chang, and Daniel Jackson. Modular verification of code with SAT. In International

Symposium on Software Testing and Analysis (ISSTA’06), 2006.

Kyle Dewey, Jared Roesch, and Ben Hardekopf. Language fuzzing using constraint logic programming. In Ivica Crnkovic,

Marsha Chechik, and Paul Grünbacher, editors, ACM/IEEE International Conference on Automated Software Engineering,

ASE ’14, Vasteras, Sweden - September 15 - 19, 2014, pages 725ś730. ACM, 2014. ISBN 978-1-4503-3013-8. doi:

10.1145/2642937.2642963. URL http://doi.acm.org/10.1145/2642937.2642963.

Kyle Dewey, Jared Roesch, and Ben Hardekopf. Fuzzing the Rust typechecker using CLP. In Proceedings of the 2015 30th

IEEE/ACM International Conference on Automated Software Engineering (ASE), ASE ’15, pages 482ś493, Washington, DC,

USA, 2015. IEEE Computer Society. ISBN 978-1-5090-0025-8. doi: 10.1109/ASE.2015.65. URL

http://dx.doi.org/10.1109/ASE.2015.65.

J. Dolby, M. Vaziri, and F. Tip. Finding bugs efficiently with a sat solver. In 6th International Symposium on the Foundations

of Software Engineering (FSE’07), page 204. ACM, 2007.

Sophia Drossopoulou and Susan Eisenbach. Java is type safe - probably. In In European Conference On Object Oriented

Programming, pages 389ś418. Springer-Verlag, 1997.

Burke Fetscher, Koen Claessen, Michal H. Palka, John Hughes, and Robert Bruce Findler. Making random judgments:

Automatically generating well-typed terms from the definition of a type-system. In Jan Vitek, editor, Programming

Languages and Systems - 24th European Symposium on Programming, ESOP 2015, Held as Part of the European Joint

Conferences on Theory and Practice of Software, ETAPS 2015, London, UK, April 11-18, 2015. Proceedings, volume 9032 of

Lecture Notes in Computer Science, pages 383ś405. Springer, 2015. ISBN 978-3-662-46668-1. doi:

10.1007/978-3-662-46669-8_16. URL https://doi.org/10.1007/978-3-662-46669-8_16.

Bernd Fischer and Johann Schumann. Autobayes: a system for generating data analysis programs from statistical models. J.

Funct. Program., 13(3):483ś508, 2003. ISSN 0956-7968.

Cormac Flanagan and James B. Saxe. Avoiding exponential explosion: generating compact verification conditions. In Chris

Hankin and Dave Schmidt, editors, Conference Record of POPL 2001: The 28th ACM SIGPLAN-SIGACT Symposium on

Principles of Programming Languages, London, UK, January 17-19, 2001, pages 193ś205. ACM, 2001a. ISBN 1-58113-336-7.

doi: 10.1145/360204.360220. URL http://doi.acm.org/10.1145/360204.360220.

Cormac Flanagan and James B. Saxe. Avoiding exponential explosion: Generating compact verification conditions. In

Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’01, pages

193ś205, New York, NY, USA, 2001b. ACM. ISBN 1-58113-336-7. doi: 10.1145/360204.360220. URL

http://doi.acm.org/10.1145/360204.360220.

Matthew Flatt and PLT. Reference: Racket. Technical Report PLT-TR-2010-1, PLT Design Inc., 2010.

https://racket-lang.org/tr1/.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

Bonsai: Synthesis-Based Reasoning for Type Systems 62:33

Matteo Frigo and Steven G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, 93(2):216ś231,

2005. Special issue on łProgram Generation, Optimization, and Platform Adaptationž.

J.P. Galeotti, N. Rosner, C.G.L. Pombo, and M.F. Frias. Analysis of invariants for efficient bounded verification. In 2010

International Symposium on Software Testing and Analysis (ISSTA’10), 2010.

Sumit Gulwani. Automating string processing in spreadsheets using input-output examples. In POPL, pages 317ś330, 2011.

L. Howard Holley and Barry K. Rosen. Qualified data flow problems. In Proceedings of the 7th ACM SIGPLAN-SIGACT

Symposium on Principles of Programming Languages, POPL ’80, pages 68ś82, New York, NY, USA, 1980. ACM. ISBN

0-89791-011-7. doi: 10.1145/567446.567454. URL http://doi.acm.org/10.1145/567446.567454.

Atshushi Igarashi, Benjamin Pierce, and Philip Wadler. Featherweight Java: A minimal core calculus for Java and GJ. In

Proceedings of the 14th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications,

OOPSLA ’99, pages 132ś146, New York, NY, USA, 1999. ACM. ISBN 1-58113-238-7. doi: 10.1145/320384.320395. URL

http://doi.acm.org/10.1145/320384.320395.

JPF. JavaPathFinder. http://babelfish.arc.nasa.gov/trac/jpf/.

James C. King. Symbolic execution and program testing. Communications of the ACM, 19(7):385ś394, 1976. ISSN 0001-0782.

Casey Klein, John Clements, Christos Dimoulas, Carl Eastlund, Matthias Felleisen, Matthew Flatt, Jay A. McCarthy, Jon

Rafkind, Sam Tobin-Hochstadt, and Robert Bruce Findler. Run your research: on the effectiveness of lightweight

mechanization. In John Field and Michael Hicks, editors, Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on

Principles of Programming Languages, POPL 2012, Philadelphia, Pennsylvania, USA, January 22-28, 2012, pages 285ś296.

ACM, 2012a. ISBN 978-1-4503-1083-3. doi: 10.1145/2103656.2103691. URL http://doi.acm.org/10.1145/2103656.2103691.

Casey Klein, John Clements, Christos Dimoulas, Carl Eastlund, Matthias Felleisen, Matthew Flatt, Jay A. McCarthy, Jon

Rafkind, Sam Tobin-Hochstadt, and Robert Bruce Findler. Run your research: On the effectiveness of lightweight

mechanization. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming

Languages, POPL ’12, pages 285ś296, New York, NY, USA, 2012b. ACM. ISBN 978-1-4503-1083-3. doi:

10.1145/2103656.2103691. URL http://doi.acm.org/10.1145/2103656.2103691.

Casey Klein, Matthew Flatt, and Robert Bruce Findler. The racket virtual machine and randomized testing. Higher Order

Symbol. Comput., 25(2-4):209ś253, December 2012c. ISSN 1388-3690. doi: 10.1007/s10990-013-9091-1. URL

http://dx.doi.org/10.1007/s10990-013-9091-1.

Ali Sinan Köksal, Viktor Kuncak, and Philippe Suter. Constraints as control. In Proceedings of the 39th annual ACM

SIGPLAN-SIGACT symposium on Principles of programming languages, POPL ’12, pages 151ś164, New York, NY, USA,

2012. ACM. ISBN 978-1-4503-1083-3. doi: 10.1145/2103656.2103675. URL http://doi.acm.org/10.1145/2103656.2103675.

Daniel Kroening. CBMC. http://www.cprover.org/cbmc/.

Nikolay Mateev, Keshav Pingali, Paul Stodghill, and Vladimir Kotlyar. Next-generation generic programming and its

application to sparse matrix computations. In Proceedings of ICS’00, pages 88ś99, 2000.

James McDonald and John Anton. SPECWARE - producing software correct by construction. Technical Report KES.U.01.3.,

2001.

Phuc C. Nguyen, Sam Tobin-Hochstadt, and David Van Horn. Higher-order symbolic execution for contract verification

and refutation. CoRR, abs/1507.04817, 2015. URL http://arxiv.org/abs/1507.04817.

Tobias Nipkow and David von Oheimb. Javaℓiдht is type-safe Ð definitely. In Proc. 25th ACM Symp. Principles of

Programming Languages, pages 161ś170. ACM Press, 1998.

Tobias Nipkow, Markus Wenzel, and Lawrence C. Paulson. Isabelle/HOL: A Proof Assistant for Higher-order Logic.

Springer-Verlag, Berlin, Heidelberg, 2002. ISBN 3-540-43376-7.

Martin Odersky. We got liftoff! The Dotty compiler for Scala bootstraps.

http://www.scala-lang.org/blog/2015/10/23/dotty-compiler-bootstraps.html, 2015.

Martin Odersky. Scaling DOT to Scala - soundness. http://scala-lang.org/blog/2016/02/17/scaling-dot-soundness.html, 2016.

Martin Odersky and al. An overview of the Scala programming language. Technical Report IC/2004/64, EPFL, Lausanne,

Switzerland, 2004.

Benjamin C. Pierce. Types and Programming Languages. The MIT Press, 1st edition, 2002. ISBN 0262162091, 9780262162098.

Markus Püschel, José M. F. Moura, Jeremy Johnson, David Padua, Manuela Veloso, Bryan Singer, Jianxin Xiong, Franz

Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson, and Nicholas Rizzolo. SPIRAL: Code

generation for DSP transforms. Proceedings of the IEEE, special issue on łProgram Generation, Optimization, and

Adaptationž, 93(2):232ś 275, 2005.

Michael Roberson and Chandrasekhar Boyapati. Efficient modular glass box software model checking. In Proceedings of the

ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’10, pages

4ś21, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0203-6. doi: 10.1145/1869459.1869461. URL

http://doi.acm.org/10.1145/1869459.1869461.

Michael Roberson, Melanie Harries, Paul T. Darga, and Chandrasekhar Boyapati. Efficient software model checking of

soundness of type systems. In Proceedings of the 23rd ACM SIGPLAN Conference on Object-oriented Programming Systems

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.

62:34 Kartik Chandra and Rastislav Bodik

Languages and Applications, OOPSLA ’08, pages 493ś504, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-215-3. doi:

10.1145/1449764.1449803. URL http://doi.acm.org/10.1145/1449764.1449803.

Burke Fetscher Robert Bruce Findler, Casey Klein and Matthias Felleisen. Redex: Practical Semantics Engineering, 2017. URL

http://docs.racket-lang.org/redex/index.html.

Tiark Rompf and Nada Amin. From F to DOT: Type soundness proofs with definitional interpreters. Technical report,

Purdue University and EPFL, 2016.

Eric L. Seidel, Ranjit Jhala, and Westley Weimer. Dynamic witnesses for static type errors. In 21st ACM SIGPLAN

International Conference on Functional Programming, 2016. URL http://eric.seidel.io/pub/nanomaly-icfp16.pdf.

Koushik Sen, Swaroop Kalasapur, Tasneem G. Brutch, and Simon Gibbs. Jalangi: a selective record-replay and dynamic

analysis framework for javascript. In Bertrand Meyer, Luciano Baresi, and Mira Mezini, editors, Joint Meeting of the

European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering,

ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013, pages 488ś498. ACM, 2013. doi:

10.1145/2491411.2491447. URL http://doi.acm.org/10.1145/2491411.2491447.

Koushik Sen, George C. Necula, Liang Gong, and Wontae Choi. Multise: multi-path symbolic execution using value

summaries. In Elisabetta Di Nitto, Mark Harman, and Patrick Heymans, editors, Proceedings of the 2015 10th Joint

Meeting on Foundations of Software Engineering, ESEC/FSE 2015, Bergamo, Italy, August 30 - September 4, 2015, pages

842ś853. ACM, 2015. doi: 10.1145/2786805.2786830. URL http://doi.acm.org/10.1145/2786805.2786830.

Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. Combinatorial sketching for finite

programs. In Proceedings of the 12th international conference on Architectural support for programming languages and

operating systems, ASPLOS XII, pages 404ś415, New York, NY, USA, 2006. ACM. ISBN 1-59593-451-0. doi:

10.1145/1168857.1168907. URL http://doi.acm.org/10.1145/1168857.1168907.

Saurabh Srivastava, Sumit Gulwani, and Jeffrey S. Foster. From program verification to program synthesis. In POPL, 2010.

Bernhard Steffen. Property-oriented expansion. In Radhia Cousot and David A. Schmidt, editors, Static Analysis, Third

International Symposium, SAS’96, Aachen, Germany, September 24-26, 1996, Proceedings, volume 1145 of Lecture Notes in

Computer Science, pages 22ś41. Springer, 1996. ISBN 3-540-61739-6. doi: 10.1007/3-540-61739-6_31. URL

https://doi.org/10.1007/3-540-61739-6_31.

M. Taghdiri. Automating Modular Program Verification by Refining Specifications. PhD thesis, Massachusetts Institute of

Technology, 2007.

The Coq development team. The Coq proof assistant reference manual. LogiCal Project, 2004. URL http://coq.inria.fr.

Mads Tofte. Type inference for polymorphic references. Inf. Comput., 89(1):1ś34, 1990a. doi: 10.1016/0890-5401(90)90018-D.

URL https://doi.org/10.1016/0890-5401(90)90018-D.

Mads Tofte. Type inference for polymorphic references. Inf. Comput., 89(1):1ś34, September 1990b. ISSN 0890-5401. doi:

10.1016/0890-5401(90)90018-D. URL http://dx.doi.org/10.1016/0890-5401(90)90018-D.

Emina Torlak. The Rosette Guide. University of Washington, 2016. URL

http://emina.github.io/rosette/doc/rosette-guide/index.html.

Emina Torlak and Rastislav Bodík. A lightweight symbolic virtual machine for solver-aided host languages. In ACM

SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June

09 - 11, 2014, page 54, 2014. doi: 10.1145/2594291.2594340. URL http://doi.acm.org/10.1145/2594291.2594340.

Emina Torlak and Rastislav Bodik. A lightweight symbolic virtual machine for solver-aided host languages. In Proceedings

of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pages 530ś541,

New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2784-8. doi: 10.1145/2594291.2594340. URL

http://doi.acm.org/10.1145/2594291.2594340.

Emina Torlak, Mandana Vaziri, and Julian Dolby. MemSAT: Checking axiomatic specifications of memory models. In

Programming Language Design and Implementation (PLDI’10), pages 341ś350, 2010.

Martin T. Vechev, Eran Yahav, and David F. Bacon. Correctness-preserving derivation of concurrent garbage collection

algorithms. In PLDI, pages 341ś353, 2006.

A.K. Wright and M. Felleisen. A syntactic approach to type soundness. Inf. Comput., 115(1):38ś94, November 1994. ISSN

0890-5401. doi: 10.1006/inco.1994.1093. URL http://dx.doi.org/10.1006/inco.1994.1093.

Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. Finding and understanding bugs in c compilers. In Proceedings of the

32Nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’11, pages 283ś294, New

York, NY, USA, 2011. ACM. ISBN 978-1-4503-0663-8. doi: 10.1145/1993498.1993532. URL

http://doi.acm.org/10.1145/1993498.1993532.

Andreas Zeller. Yesterday, my program worked. today, it does not. why? In Oscar Nierstrasz and Michel Lemoine, editors,

Software Engineering - ESEC/FSE’99, 7th European Software Engineering Conference, Held Jointly with the 7th ACM

SIGSOFT Symposium on the Foundations of Software Engineering, Toulouse, France, September 1999, Proceedings, volume

1687 of Lecture Notes in Computer Science, pages 253ś267. Springer, 1999. ISBN 3-540-66538-2. doi:

10.1007/3-540-48166-4_16. URL https://doi.org/10.1007/3-540-48166-4_16.

Proceedings of the ACM on Programming Languages, Vol. 2, No. POPL, Article 62. Publication date: January 2018.