Functional Programming for Domain-Speci c Languages · timizing new languages. Functional...

27
Functional Programming for Domain-Specific Languages Jeremy Gibbons Department of Computer Science, University of Oxford http://www.cs.ox.ac.uk/jeremy.gibbons/ Abstract. Domain-specific languages become effective only in the pres- ence of convenient lightweight tools for defining, implementing, and op- timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners. In these lectures we will discuss FP techniques for DSLs—especially standalone versus embedded DSLs, and shallow versus deep embeddings. 1 Introduction In his book [1], Fowler defines a domain-specific language (DSL) as a computer programming language of limited expressiveness focussed on a particular domain A DSL is targetted at a specific class of programming tasks; it may indeed not be Turing-complete. By restricting scope to a particular domain, one can tailor the language specifically for that domain. Common concepts or idioms in the domain can be made more easily and directly expressible—even at the cost of making things outside the intended domain more difficult to write. The assumptions common to the domain may be encoded within the language itself, so that they need not be repeated over and over for each program within the domain—and again, those assumptions may be inconsistent with applications outside the domain. The term ‘DSL’ is rather more recent than its meaning; DSLs have been prevalent throughout the history of computing. As Mernik et al. [2] observe, DSLs have in the past been called ‘application-oriented’, ‘special-purpose’, ‘spe- cialized’, ‘task-specific’, and ‘application’ languages, and perhaps many other things too. The ‘fourth-generation languages’ (4GLs) popular in the 1980s were essentially DSLs for database-oriented applications, and were expected at the time to supercede general-purpose 3GLs such as Pascal and C. One might even say that Fortran and Cobol were domain-specific languages, focussed on scientific and business applications respectively, although they are both Turing-complete. Bentley [3] wrote influentially in his Programming Pearls column about the ‘lit- tle languages’ constituting the philosophy and much of the functionality of the Unix operating system: tools for programmers such as the shell, regular expres- sions, lex and yacc, and tools for non-programmers such as the Pic language for line drawings and a language for specifying surveys.

Transcript of Functional Programming for Domain-Speci c Languages · timizing new languages. Functional...

Page 1: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming forDomain-Specific Languages

Jeremy Gibbons

Department of Computer Science, University of Oxfordhttp://www.cs.ox.ac.uk/jeremy.gibbons/

Abstract. Domain-specific languages become effective only in the pres-ence of convenient lightweight tools for defining, implementing, and op-timizing new languages. Functional programming provides a promisingframework for such tasks; FP and DSLs are natural partners. In theselectures we will discuss FP techniques for DSLs—especially standaloneversus embedded DSLs, and shallow versus deep embeddings.

1 Introduction

In his book [1], Fowler defines a domain-specific language (DSL) as

a computer programming language of limited expressiveness focussed ona particular domain

A DSL is targetted at a specific class of programming tasks; it may indeednot be Turing-complete. By restricting scope to a particular domain, one cantailor the language specifically for that domain. Common concepts or idiomsin the domain can be made more easily and directly expressible—even at thecost of making things outside the intended domain more difficult to write. Theassumptions common to the domain may be encoded within the language itself,so that they need not be repeated over and over for each program within thedomain—and again, those assumptions may be inconsistent with applicationsoutside the domain.

The term ‘DSL’ is rather more recent than its meaning; DSLs have beenprevalent throughout the history of computing. As Mernik et al. [2] observe,DSLs have in the past been called ‘application-oriented’, ‘special-purpose’, ‘spe-cialized’, ‘task-specific’, and ‘application’ languages, and perhaps many otherthings too. The ‘fourth-generation languages’ (4GLs) popular in the 1980s wereessentially DSLs for database-oriented applications, and were expected at thetime to supercede general-purpose 3GLs such as Pascal and C. One might evensay that Fortran and Cobol were domain-specific languages, focussed on scientificand business applications respectively, although they are both Turing-complete.Bentley [3] wrote influentially in his Programming Pearls column about the ‘lit-tle languages’ constituting the philosophy and much of the functionality of theUnix operating system: tools for programmers such as the shell, regular expres-sions, lex and yacc, and tools for non-programmers such as the Pic language forline drawings and a language for specifying surveys.

Page 2: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

2 Jeremy Gibbons

There are two main approaches to implementing DSLs. The historicallyprevalent approach has been to build standalone languages, with their own cus-tom syntax. Standard compilation techniques are used to translate or interpretprograms written in the DSL into a general-purpose language (GPL) for ex-ecution. This has the advantage that the syntax of the DSL can be designedspecifically for the intended users, and need bear no relation to that of the hostlanguage—indeed, there may be many different host languages, as there are forSQL, or the DSL ‘syntax’ may be diagrammatic rather than textual.

Conversely, implementing a new standalone DSL is a significant undertaking,involving a separate parser and compiler, and perhaps an interactive editor too.Moreover, the more the DSL is a kind of ‘programming’ language, the morelikely it is that it shares some features with most GPLs—variables, definitions,conditionals, etc—which will have to be designed and integrated with the DSL.In the process of reducing repetition and raising the level of abstraction forthe DSL programmer, we have introduced repetition and lowered the level ofabstraction for the DSL implementer. That may well be a rational compromise.But is there a way of getting the best of both worlds?

The second approach to implementing DSLs attempts exactly that: to retainas much as possible of the convenient syntax and raised level of abstractionthat a DSL provides, without having to go to the trouble of defining a separatelanguage. Instead, the DSL is embedded within a host language, essentially as alibrary of definitions written in the host GPL (although it is debatable to whatextent ‘library’ and ‘language’ coincide: we return to this point in Section 2.1below). All the existing facilities and infrastructure of the host environment cancontinue to be used, and familiarity with the syntactic conventions of the hostcan be carried over to the DSL.

However, there are some downsides. DSL programs have to be written inthe syntax of the host language; this may be clumsy if the the host syntax isrigid, and daunting to non-programmer domain specialists if the host syntax isobscure. It can be difficult to preserve the abstract boundary between the DSLits host: naive users may unwittingly invoke sophisticated language features, anderror messages may be reported unhelpfully in terms of the host language ratherthan the DSL. Needless to say, these issues are hot research topics among thoseworking on embedded DSLs.

Fowler [1] calls the standalone and embedded approaches ‘external’ and ‘in-ternal’ respectively. He does this not least because ‘embedded’ suggests mislead-ingly that specialized code written in a DSL is quoted verbatim within a hostprogram written in a GPL, with the whole being expressed in a hybrid languagethat is neither the DSL nor the GPL; for example, JavaServer Pages ‘programs’are hybrids, consisting of HTML markup interspersed with fragments of Java. (Infact, Fowler calls such hybrids ‘fragmentary’, and uses the term ‘standalone’ forpure-bred DSLs, in which any program is written in just one language, whetherinternal or external.) That objection notwithstanding, we will stick in this articleto the terms ‘standalone’ and ‘embedded’.

Page 3: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 3

We will concentrate in this article on embedded DSLs, only briefly makingthe connection back to standalone DSLs. Again, there are two main approachesto embedded DSLs, which are conventionally called deep and shallow embedding.With a deep embedding, terms in the DSL are implemented simply to constructan abstract syntax tree; this tree is subsequently transformed for optimizationand traversed for evaluation. With a shallow embedding, terms in the DSL areimplemented directly as the values to which they evaluate, bypassing the inter-mediate AST and its traversal. We explore this distinction in Section 2.4.

It turns out that functional programming languages are particularly wellsuited for hosting embedded DSLs. Language features such as algebraic datatypes,higher-order functions, lazy evaluation, and rich type systems supporting typeinference all contribute. We discuss these factors in more detail in Section 3.

The syntax of the host language is another factor, albeit a relatively minorone: functional languages often have lightweight syntax, for example favouringthe use of whitespace and layout rather than punctuation for expressing programstructure, and strongly supporting orthogonality of naming in the sense that bothsymbolic as well as alphabetic identifiers may be used in definitions. Both of thesefeatures improve flexibility, so that an embedded DSL can have a syntax closeto what one might provide for a corresponding standalone DSL. Of course, thereare functional languages with noisy syntactic conventions, and non-functionallanguages with quiet ones, so this factor doesn’t map precisely onto the languageparadigm. We make no more of it in this article, simply using Haskell syntax forconvenience.

We use a number of little examples of embedded DSLs throughout the firstpart of the article. We conclude in Section 4, with a more detailed study of oneparticular embedded DSL, namely Yorgey’s Diagrams package [4].

2 Exploring the design space

In the interests of focussing on the essence of DSLs, we start with a very simpleexample: a DSL for finite sets of integers. This consists of a representation ofsets, and a number of operations manipulating that representation:

type IntegerSet = ...

empty :: IntegerSetinsert :: Integer → IntegerSet → IntegerSetdelete :: Integer → IntegerSet → IntegerSetmember :: Integer → IntegerSet → Bool

For example, one might evaluate the expression

member 3 (insert 1 (delete 3 (insert 2 (insert 3 empty))))

and get the result False (assuming the usual semantics of these operations).

Page 4: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

4 Jeremy Gibbons

2.1 Libraries

The first approach one might take to implementing this characterization of in-teger sets might be as a library, that is, as a collection of related functions.For example, one might represent the set as a list, possibly with duplicates andtreating order as insignificant:

type IntegerSet = [Integer ] -- unsorted, duplicates allowed

empty :: IntegerSetempty = [ ]

insert :: Integer → IntegerSet → IntegerSetinsert x xs = x : xs

delete :: Integer → IntegerSet → IntegerSetdelete x xs = filter (6≡ x ) xs

member :: Integer → IntegerSet → Boolmember x xs = any (≡ x ) xs

(Here, the standard library function any p = foldr ((∨) ◦ p) False determineswhether any element of a list satisfies predicate p.)

We have been writing code in this style—that is, collections of types andrelated functions—from the earliest days of computing. Indeed, compilers are socalled because they ‘compile’ (collect and assemble the pieces for) an executableby linking together the programmer’s main program with the necessary functionsfrom the library. The problem with this style is that there is no encapsulationof the data representation: it is evident to all clients of the abstraction thatthe representation uses lists, and client code may exploit this knowledge byusing other list functions on the representation. The representation is publicknowledge, and it becomes very difficult to change it.

2.2 Modules

The realisation that libraries expose data representations prompted the notionof modular programming, especially as advocated by Parnas [5]: code should bepartitioned into modules, and in particular, the modules should be chosen sothat each hides a design decision (such as, but not necessarily, a choice of datarepresentation) from all the others, allowing that decision subsequently to bechanged.

The modular style that Parnas espouses presupposes mutable state: the mod-ule hides a single data structure, and operations query and modify the value ofthat data structure. Because of this dependence on mutable state, it is a littleawkward to write in the Parnas style in a pure functional language like Haskell.To capture this behaviour using only pure features, one adapts the operationsso that each accepts the ‘current’ value of the data structure as an additionalargument, and returns the ‘updated’ value as an additional result. Thus, animpure function of type a → b acting statefully on a data structure of type scan be represented as a pure function of type (a, s)→ (b, s), or equivalently by

Page 5: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 5

currying, a → (s → (b, s)). The return part s → (b, s) of this is an instance ofthe state monad, implemented in the Haskell standard library as a type State s b.Then the set module can be written as follows:

module SetModule (Set , runSet , insert , delete,member) where

import Control .Monad .State

type IntegerSet = [Integer ]newtype Set a = S {runS :: State IntegerSet a }instance Monad Set where

return a = S (return a)m >>= k = S (runS m >>= (runS ◦ k))

runSet :: Set a → arunSet x = evalState (runS x ) [ ]

insert :: Integer → Set ()insert x = S $ do {modify (x :)}delete :: Integer → Set ()delete x = S $ do {modify (filter ( 6≡ x ))}member :: Integer → Set Boolmember x = S $ do {xs ← get ; return (any (≡ x ) xs)}

Here, the type Set of stateful operations on the set is abstract: the representationis not exported from the module, only the type and an observer function runSetare. The operations insert , delete, and member are also exported; they may besequenced together to construct larger computations on the set. But the onlyway of observing this larger computation is via runSet , which initializes the setto empty before running the computation. Haskell’s do notation convenientlyhides the plumbing required to pass the set representation from operation tooperation:

runSet $ do {insert 3; insert 2; delete 3; insert 1; member 3}

(To be precise, this stateful programming style does not really use mutable state:all data is still immutable, and each operation that ‘modifies’ the set in factconstructs a fresh data structure, possibly sharing parts of the original. Haskelldoes support true mutable state, with imperative in-place modifications; but todo this with the same interface as above requires the use of unsafe features, inparticular unsafePerformIO .)

2.3 Abstract datatypes

Parnas’s approach to modular programming favours modules that hide a sin-gle data structure; the attentive reader will note that it is easy to add a unionoperation to the library, but difficult to add one to the module. A slightly dif-ferent approach is needed to support data abstractions that encompass multipledata structures—abstract datatypes. In this case, the module exports an abstract

Page 6: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

6 Jeremy Gibbons

type, which specifies the hidden representation, together with operations to cre-ate, modify and observe elements of this type.

module SetADT (IntegerSet , empty , insert , delete,member) where

newtype IntegerSet = IS [Integer ]

empty :: IntegerSetempty = IS [ ]

insert :: IntegerSet → Integer → IntegerSetinsert (IS xs) x = IS (x : xs)

delete :: IntegerSet → Integer → IntegerSetdelete (IS xs) x = IS (filter (6≡ x ) xs)

member :: IntegerSet → Integer → Boolmember (IS xs) x = any (≡ x ) xs

Note that in addition to the three operations insert , delete and member exportedby SetModule, we now export the operation empty to create a new set, and theabstract type IntegerSet so that we can store its result, but not the constructorIS that would allow us to deconstruct sets and to construct them by other meansthan the provided operations. Note also that we can revert to a purely functionalstyle; there is no monad, and ‘modifiers’ manifestly construct new sets—this wasnot an option when there was only one set. Finally, note that we have rearrangedthe order of arguments of the three operations, so that the source set is the firstargument; this gives the feeling of an object-oriented style, whereby one ‘sendsthe insert message to an IntegerSet object’:

((((empty ‘insert ‘ 3) ‘insert ‘ 2) ‘delete‘ 3) ‘insert ‘ 1) ‘member ‘ 3

2.4 Languages

One might, in fact, think of the abstract datatype SetADT as a DSL for sets,and the set expression above as a term in this DSL—there is at best a fuzzy linebetween ADTs and embedded DSLs. If one were to make a formal distinctionbetween ‘languages’ and ‘libraries’, it would presumably be that a ‘language’privileges one particular datatype whose elements represent terms in the lan-guage, with constructors that compose terms, and observers that analyse terms;a ‘library’, on the other hand, is just a collection of related functions, and mayhave no such privileged datatype.

The SetADT implementation above can be seen as an intermediate pointon the continuum between two extreme approaches to implementing embeddedDSLs: deep and shallow embedding. In a deep embedding, the operations thatconstruct elements of the abstraction do as little work as possible—they simplypreserve their arguments in an abstract syntax tree.

module SetLanguageDeep (IntegerSet (Empty , Insert ,Delete),member) where

data IntegerSet :: ∗ where

Page 7: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 7

Empty :: IntegerSetInsert :: IntegerSet → Integer → IntegerSetDelete :: IntegerSet → Integer → IntegerSet

member :: IntegerSet → Integer → Boolmember Empty y = Falsemember (Insert xs x ) y = (y ≡ x ) ∨ member xs ymember (Delete xs x ) y = (y 6≡ x ) ∧ member xs y

Now we declare and export an algebraic datatype IntegerSet as the implementa-tion of the three operations that yield a set; we have used Haskell’s generalizedalgebraic datatype notation to emphasize their return types, even though wemake no use of the extra expressive power of GADTs. The member operation,on the other hand, is implemented as a traversal over the terms of the language,and is not itself part of the language.

((((Empty ‘Insert ‘ 3) ‘Insert ‘ 2) ‘Delete‘ 3) ‘Insert ‘ 1) ‘member ‘ 3

Whereas in a deep embedding the constructors do nothing and the observersdo all the work, in a shallow embedding it is the other way round: the observersare trivial, and all the computation is in the constructors. Given that the sole ob-server in our set abstraction is the membership function, the shallow embeddingrepresents the set directly as this membership function:

module SetLanguageShallow (IntegerSet , empty , insert , delete,member) where

newtype IntegerSet = IS (Integer → Bool)

empty :: IntegerSetempty = IS (λy → False)

insert :: IntegerSet → Integer → IntegerSetinsert (IS f ) x = IS (λy → (y ≡ x ) ∨ f y)

delete :: IntegerSet → Integer → IntegerSetdelete (IS f ) x = IS (λy → (y 6≡ x ) ∧ f y)

member :: IntegerSet → Integer → Boolmember (IS f ) = f

It is used in exactly the same way as the SetADT definition:

((((empty ‘insert ‘ 3) ‘insert ‘ 2) ‘delete‘ 3) ‘insert ‘ 1) ‘member ‘ 3

We have only a single observer, so the shallow embedding is as precisely thatobserver, and the observer itself is essentially the identity function. More gener-ally, there may be multiple observers; then the embedding would be as a tupleof values, and the observers would be projections.

In a suitable sense, the deep embedding can be seen as the most abstractimplementation possible of the given interface, and the shallow embedding asthe most concrete: there are transformations from the deep embedding to anyintermediate implementation, such as SetADT—roughly,

Page 8: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

8 Jeremy Gibbons

elements :: SetLanguageDeep.IntegerSet → SetADT .IntegerSetelements Empty = [ ]elements (Insert xs x ) = x : elements xselements (Delete xs x ) = filter (6≡ x ) (elements xs)

and from this to the shallow embedding—roughly,

membership :: SetADT .IntegerSet → SetLanguageShallow .IntegerSetmembership xs = λx → any (≡ x ) xs

Expressed categorically, there is a category of implementations and transforma-tions between them, and in this category the deep embedding is the initial objectand the shallow embedding the final object [6]. The shallow embedding arises bydeforesting the abstract syntax tree that forms the basis of the deep embedding.

Kamin [7] calls deep and shallow embedding operational and denotationaldomain modelling, respectively, and advocates the latter in preference to theformer. Erwig and Walkingshaw [8] call shallow embedding semantics-drivendesign, and also favour it over what they might call syntax-driven design.

Deep embedding makes it easier to extend the DSL with new observers, suchas new analyses of programs in the language: just define a new function byinduction over the abstract syntax. But it is more difficult to extend the syntaxof the language with new operators, because each extension entails revisiting thedefinitions of all existing observers. Conversely, shallow embedding makes newoperators easier to add than new observers. This dichotomy is reminiscent of thatbetween OO programs structured around the Visitor pattern [9] and those inthe traditional OO style with methods attached to subclasses of an abstractNode class. The challenge of getting the best of both worlds—extensibility inboth dimensions at once—has been called the expression problem [10].

2.5 Embedded and standalone

All the approaches described above have been for embedded DSLs, of one kindor another: ‘programs’ in the DSL are simply expressions in the host language.An alternative approach is given by standalone DSLs. As the name suggests,a standalone DSL is quite independent of its implementation language: it mayhave its own syntax, which need bear no relation to that of the implementationlanguage—indeed, the same standalone DSL may have many implementations,in many different languages, which need have little in common with each other.Of course, being standalone, the DSL cannot depend on any of the features ofany of its implementation languages; everything must be build independently,using more or less standard compiler technology: lexer, parser, optimizer, codegenerator, etc.

Fortunately, there is a shortcut. It turns out that a standalone DSL canshare much of the engineering of the embedded DSL—especially if one is notso worried about absolute performance, and is more concerned about ease ofimplementation. The standalone DSL can be merely a frontend for the embeddedDSL; one only needs to write a parser converting strings in the standalone DSL

Page 9: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 9

to terms in the embedded DSL. (In fact, Parnas made a similar point overforty years ago [5]: he found that many of the design decisions—and hence themodules—could be shared between a compiler and an interpreter for the samelanguage, in his case for Markov processes.)

For example, suppose that we are given a type Parser a of parsers readingvalues of type a

type Parser a = ...

and an observer that applies a parser to a string and returns either a value oran error message:

runParser :: Parser a → String → Either a String

Then one can write a parser program :: Parser Bool for little set programs ina special syntax; perhaps “{}+3+2-3+1?3” should equate to the example ex-pressions above, with “{}” denoting the empty set, “+” and “-” the insertionand deletion operations, and “?” the membership test. Strings recognized byprogram are interpreted in terms of insert , delete etc, using one of the variousimplementations of sets described above. Then a simple wrapper program readsa string from the command line, tries to parse it, and writes out the booleanresult or an error message:

main :: IO ()main = do

ss ← getArgscase ss of

[s ]→ case runParser program s of -- single argLeft b → putStrLn ("OK: " ++ show b) -- parsedRight s → putStrLn ("Failed: " ++ s) -- not parsed→ do -- zero or multiple args

n ← getProgNameputStrLn ("Usage: " ++ n ++ " <set-expr>")

Thus, from the command line:

> ./sets "{}+3+2-3+1?3"

OK: False

We will return to this example in Section 3.3. Of course, parsers can be expressedas another DSL. . .

3 Functional programming for embedded DSLs

Having looked around the design space a little, we now step back to consider whatit is about functional programming that makes it particularly convenient forimplementing embedded DSLs. After all, a good proportion of the work on DSLsexpressed in the functional paradigm focusses on embedded DSL; and conversely,most work on DSLs in other (such as OO) paradigms focusses on standalone

Page 10: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

10 Jeremy Gibbons

DSLs. Why is that? We contend that there are three main aspects of modernfunctional programming that play a part: they are both useful for implementingembedded DSLs, and absent from most other programming paradigms. These arealgebraic datatypes, higher-order functions, and (perhaps to a lesser extent) lazyevaluation. We discuss each in turn, and illustrate with more simple examplesof embedded DSLs. Some parts are left as exercises.

3.1 Algebraic datatypes

The deep embedding approach depends crucially on algebraic datatypes, whichare used to represent abstract syntax trees for programs in the language. Withouta lightweight mechanism for defining and manipulating new tree datatypes, thisapproach becomes unworkably tedious.

Algebraic datatypes are extremely convenient for representing abstract syn-tax trees within the implementation of a DSL. Operations and observers typicallyhave simple recursive definitions, inductively defined over the structure of thetree; optimizations and transformations are often simple rearrangements of thetree—for example, rotations of tree nodes to enforce right-nesting of associativeoperators.

In addition to this, algebraic datatypes are also extremely convenient formaking connections outside the DSL implementation. Often the DSL is one in-habitant of a much larger software ecosystem; while an embedded implementa-tion within a functional programming language may be the locally optimal choicefor this DSL, it may have to interface with other inhabitants of the ecosystem,which for legacy reasons or because of other constraints require completely differ-ent implementation paradigms. (For example, one might have a DSL for financialcontracts, interfacing with Microsoft Excel at the front end for ease of use bydomain specialists, and with monolithic C++ pricing engines at the back endfor performance.) Algebraic datatypes form a very useful marshalling format forintegration, parsed from strings as input and pretty-printed back to strings asoutput.

Consider a very simple language of arithmetic expressions, involving integerconstants and addition. As a deeply embedded DSL, this can be captured by thefollowing algebraic datatype:

data Expr = Val Integer| Add Expr Expr

Some people call this datatype Hutton’s Razor, because Graham Hutton has beenusing it for years as a minimal vehicle for exploring many aspects of compilation[11].

Exercises

1. Write an observer for the expression language, evaluating expressions asintegers.

eval :: Expr → Integer

Page 11: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 11

2. Write another observer, printing expressions as strings.

print :: Expr → String

3. Reimplement the expression language using a shallow embedding, such thatthe interpretation is that of evaluations.

type Expr = Integerval :: Integer → Expradd :: Expr → Expr → Expr

4. Reimplement the expression language via a shallow embedding again, butthis time such that the interpretation is that of printing.

type Expr = Stringval :: Integer → Expradd :: Expr → Expr → Expr

5. Reimplement via a shallow embedding again, such that the interpretationprovides both evaluation and printing.

6. What interpretation of the shallow embedding provides the deep embed-ding? Conversely, given the deep embedding, what additional computationis needed to obtain the various interpretations we have used as shallow em-beddings?

7. What if you wanted a third interpretation, say computing the size of anexpression? What if you wanted to allow ten different interpretations? Whatabout allowing for unforeseen future interpretations?

3.2 Generalized algebraic datatypes

The Expr DSL above is untyped, or rather “unityped”: there is only a singletype involved, namely integer expressions. Suppose that we want to representboth integer- and boolean-valued expressions:

data Expr = ValI Integer| Add Expr Expr| ValB Boolean| And Expr Expr| EqZero Expr| If Expr Expr Expr

The idea is that EqZero yields a boolean value (whether its argument evaluatesto zero), and If should take a boolean-valued expression as its first argument.But what can we do for the evaluation function? Sometimes it should return aninteger, sometimes a boolean. One simple solution is to make it return an Eithertype:

eval :: Expr → Either Integer Booleval (ValI n) = Left neval (Add x y) = case (eval x , eval y) of (Left m,Left n)→ Left (m + n)

Page 12: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

12 Jeremy Gibbons

eval (ValB b) = Right beval (And x y) = case (eval x , eval y) of (Right a,Right b)→ Right (a ∧ b)eval (EqZero x ) = case eval x of Left n → Right (n ≡ 0)eval (If x y z ) = case eval x of Right b → if b then eval y else eval z

This is rather clumsy. For one thing, eval has become a partial function; thereare improper values of type Expr such as EqZero (ValB True) on which eval isundefined. For a second, all the tagging and untagging of return types is a sourceof inefficiency. Both of these problems are familiar symptoms of dynamic typechecking; if we could statically check the types instead, then we could rule outill-typed programs, and also abolish the runtime tags—a compile-time proof ofwell-typedness prevents the former and eliminates the need for the latter.

A more sophisticated solution, and arguably The Right Way, is to use de-pendent types, as discussed by Edwin Brady elsewhere in this Summer School.There are various techniques one might use; for example, one might tuple valueswith value-level codes for types, provide an interpretation function from codesto the types they stand for, and carry around “proofs” that values do indeedinhabit the type corresponding to their type code.

Haskell provides an intermediate, lightweight solution in the form of typeindexing, through so-called generalized algebraic datatypes or GADTs. Let usrewrite the Expr datatype in an equivalent but slightly more repetitive form:

data Expr :: ∗ whereValI :: Integer → ExprAdd :: Expr → Expr → ExprValB :: Bool → ExprAnd :: Expr → Expr → ExprEqZero :: Expr → ExprIf :: Expr → Expr → Expr → Expr

This form lists the signatures of each of the constructors; of course, they are allconstructors for the datatype Expr , so they all repeat the same return type Expr .But this redundancy allows us some flexibility: we might allow the constructorsto have different return types. Specifically, GADTs allow the constructors of apolymorphic datatype to have return types that are instances of the type beingreturned, rather than the full polymorphic type.

In this case, we make Expr a polymorphic type, but only provide constructorsfor values of type Expr Integer and Expr Bool , and not for other instances of thepolymorphic type Expr a.

data Expr :: ∗ → ∗ whereValI :: Integer → Expr IntegerAdd :: Expr Integer → Expr Integer → Expr IntegerValB :: Bool → Expr BoolAnd :: Expr Bool → Expr Bool → Expr BoolEqZero :: Expr Integer → Expr BoolIf :: Expr Bool → Expr a → Expr a → Expr a

Page 13: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 13

We use the type parameter as an index: a term of type Expr a is an expressionthat evaluates to a value of type a. Evaluation becomes much simpler:

eval :: Expr a → aeval (ValI n) = neval (Add x y) = eval x + eval yeval (ValB b) = beval (And x y) = eval x ∧ eval yeval (EqZero x ) = eval x ≡ 0eval (If x y z ) = if eval x then eval y else eval z

As well as being simpler, it is also safer (there is no possibility of ill-typedexpressions, and eval is a total function again) and swifter (there are no runtimeLeft and Right tags to manipulate any more).

Exercises

8. The type parameter a in Expr a is called a phantom type: it doesn’t representcontents, as the type parameter in a container datatype such as List a does,but some other property of the type. Indeed, there need be no Boolean insidean expression of type Expr Bool ; give an expression of type Expr Bool thatcontains no Bools. Is there always an Integer inside an expression of typeExpr Integer?

9. How do Exercises 2 to 7 work out in terms of GADTs?

3.3 Higher-order functions

Deep embeddings lean rather heavily on algebraic datatypes. Conversely, shallowembeddings depend on higher-order functions—functions that accept functionsas arguments or return them as results—and more generally on functions asfirst-class citizens of the host language. A simple example where this arises is ifwe were to extend the Expr DSL to allow for let bindings and variable references:

val :: Integer → Expradd :: Expr → Expr → Exprvar :: String → Exprbnd :: (String ,Expr)→ Expr → Expr

The idea is that bnd represents let-bindings and var represents variable refer-ences, so that

bnd ("x", val 3) (add (var "x") (var "x"))

corresponds to the Haskell expression let x = 3 in x + x . The standard structureof an evaluator for languages with such bindings is to pass in and manipulatean environment of bindings from variables to values (not to expressions):

type Env = [(String , Integer)]eval :: Expr → Env → Integer

Page 14: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

14 Jeremy Gibbons

The environment is initially empty, but is augmented when evaluating the bodyof a let expression. With a shallow embedding, the interpretation is the evalu-ation function:

type Expr = Env → Integer

That is, expressions are represented not as integers, or strings, or pairs, but asfunctions (from environments to values).

Exercises

10. Complete the definition of the Expr DSL with let bindings, via a shallowembedding whose interpretation provides evaluation in an environment.

11. Look again at Exercise 7: can we define a shallow embedding that allows forunforeseen future interpretations? Hint: think about a ‘generic’ or ‘parametrized’interpretation, as a higher-order function, which can be instantiated to yieldevaluation, or printing, or any of a number of other concrete interpretations.What is common to the evaluation and printing interpretations above, andwhat is specific? What kinds of function is it sensible to consider as ‘inter-pretations’, and what should be ruled out?

A larger and very popular example of shallow embeddings with functionalinterpretations is given by parser combinators. Recall the type Parsera of parsersrecognizing values of type a from Section 2.5; such a parser is roughly a functionof type String → a. But we will want to combine parsers sequentially, so it isimportant that a parser also returns the remainder of the string after recognizinga chunk, so it would be better to use functions of type String → (a,String). Butwe will also want to allow parsers that fail to match (so that we can try a seriesof alternatives until one matches), and more generally parsers that match inmultiple ways, so it is better still to return a list of results:

type Parser a = String → [(a,String)]

(Technically, these are more than just parsers, because they combine semanticactions with recognizing and extracting structure from strings. But the termi-nology is well established.)

The runParser function introduced in Section 2.5 takes such a parser and aninput string, and returns either a successful result or an error message:

runParser :: Parser a → String → Either a StringrunParser p s = case p s of

[(a, s)]→ if all isSpace s then Left a else Right ("Leftover input: " ++ s)[ ] → Right "No parse"

x → Right ("Ambiguous, with leftovers " ++ show (map snd x ))

If the parser yields a single match, and any leftover input is all whitespace,we return that value; if there is a nontrivial remainder, no match, or multiplematches, we return an error message.

Such parsers can be assembled from the following small set of combinators:

Page 15: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 15

success :: a → Parser afailure :: Parser a(〈∗〉) :: Parser (a → b)→ Parser a → Parser b(〈|〉) :: Parser a → Parser a → Parser amatch :: (Char → Bool) → Parser Char

In other words, these are the operators of a small DSL for parsers. The intentionis that: parser success x always succeeds, consumes no input, and returns x ;failure always fails; p〈∗〉q is a kind of sequential composition, matching accordingto p (yielding a function) and then on the remaining input to q (yielding anargument), and applying the function to the argument; p 〈|〉q is a kind of choice,matching according to p or to q ; and match b matches the single character atthe head of the input, if this satisfies b, and fails if the input is empty or thehead doesn’t satisfy.

We can implement the DSL via a shallow embedding, such that the interpre-tation is the type Parser a. Each operator has a one- or two-line implementation:

success :: a → Parser asuccess x s = [(x , s)]

failure :: Parser afailure s = [ ]

(〈∗〉) :: Parser (a → b)→ Parser a → Parser b(p 〈∗〉 q) s = [(f a, s ′′) | (f , s ′)← p s, (a, s ′′)← q s ′ ]

(〈|〉) :: Parser a → Parser a → Parser a(p 〈|〉 q) s = p s ++ q s

match :: (Char → Bool)→ Parser Charmatch q [ ] = [ ]match q (c : s) = if q c then [(c, s)] else [ ]

From the basic operators above, we can derive many more, without dependingany further on the representation of parsers as functions. In each of the followingexercises, the answer is another one- or two-liner.

Exercises

12. Implement two variations of sequential composition, in which the first (re-spectively, the second) recognized value is discarded. These are useful whenone of the recognized values is mere punctuation.

(∗〉) :: Parser a → Parser b → Parser b(〈∗) :: Parser a → Parser b → Parser a

13. Implement iteration of parsers, so-called Kleene plus (some) and Kleene star(many), which recognize one or more (respectively, zero or more) occurrencesof what their argument recognizes.

some,many :: Parser a → Parser [a ]

Page 16: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

16 Jeremy Gibbons

14. Implement a whitespace parser, which recognizes a nonempty section ofwhitespace characters (you might find the Haskell standard library func-tion Data.Char .isSpace helpful). Implement a variation ows for which thewhitespace is optional. For both of these, we suppose that the actual natureof the whitespace is irrelevant, and should be discarded.

whitespace, ows :: Parser ()

15. Implement a parser token, which takes a string and recognizes exactly andonly that string at the start of the input. Again, we assume that the stringso matched is irrelevant, since we know precisely what it will be.

token :: String → Parser ()

16. Now implement the parser program from Section 2.5, which recognizes a“set program”. A set program starts with the empty set {}, has zero ormore insert (+) and delete (-) operations, and a mandatory final member(?) operation. Each operation is followed by an integer argument. Optionalwhitespace is allowed in all sensible places.

program :: Parser Bool

3.4 Lazy evaluation

A third aspect of modern functional programming that lends itself to embeddedDSLs—albeit, perhaps, less important than algebraic datatypes and higher-orderfunctions—is lazy evaluation. Evaluation is demand-driven, and function argu-ments are not evaluated until their value is needed to determine the next step(for example, to determine which of multiple clauses of a definition to apply);and moreover, once an argument is evaluated, that value is preserved and reusedrather than being discarded and recomputed for subsequent uses.

One nice consequence of lazy evaluation is that infinite data structures workjust as well as finite ones: as long as finite parts of the result of a function canbe constructued from just finite parts of the input, the complete infinite datastructure may not need ever to be constructed. This is sometimes convenient fora shallow embedding, allowing one to use a datatype of infinite data structuresfor the domain of interpretation. This can lead to simpler programs than wouldbe the case if one were restricted to finite data structures—in the latter case,some terminating behaviour has to be interwoven with the generator, whereasin the former, the two can be quite separate. But we will not study infinite datastructures further in this article.

A second consequence of lazy evaluation manifests itself even in finite data:if one component of a result is not used anywhere, it is not evaluated. This isvery convenient for shallow embeddings of DSLs with multiple observers. Theinterpretation is then as a tuple containing all the observations of a term; but ifsome of those observations are not used, they need not be computed.

Page 17: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 17

Exercises

17. Review Exercise 5, which was to define a shallow embedding interpreted as apair, providing both evaluation and printing. Convince yourself that if onlyone component of the pair is demanded, only that component gets computed.

18. Here is an alternative technique for allowing for multiple observers with ashallow embedding. It is presented here using Haskell type classes; but thegeneral idea is about having a data abstraction with an interface and achoice of implementations, and doing abstract interpretation in one of thoseimplementations. For simplicity, let us return to Hutton’s Razor

type Expr = ...val :: Integer → Expradd :: Expr → Expr → Expr

with two desired observers

eval :: Expr → Integerprint :: Expr → String

The trick is to define Expr as a type class, the class of those types suitable asrepresentations of expressions according to this little DSL. What operationsmust a type support, if it is to be suitable for representing expressions? Itneeds to have at least the val and add operations:

class Expr a whereval :: Integer → aadd :: a → a → a

Of course, it is easy to define these two operations on integers:

instance Expr Integer whereval n = nadd x y = x + y

It is also easy to define them on strings:

instance Expr String whereval n = show nadd x y = "(" ++ x ++ "+" ++ y ++ ")"

Now, a term in the expression DSL has a polymorphic type: it can be inter-preted in any type in the type class Expr .

expr :: Expr a ⇒ aexpr = add (val 3) (val 4)

Then evaluating and printing expressions amounts to interpreting the poly-morphic type at the appropriate instance:

eval Expr :: Integereval Expr = expr

Page 18: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

18 Jeremy Gibbons

print Expr :: Stringprint Expr = expr

Try this approach out. (You’ll find that you need some language extensionsfor the String instance, but the Haskell type checker should guide you inthe right direction.) It’s a bit of an idiosyncratic way of implementing dataabstraction: the implementation is chosen implicitly by fixing a type, ratherthan explicitly by passing a parameter. This is a slight problem, if you wanttwo different interpretations on the same type, such as compact and verboseprintings. What can you do to work around that?

4 An extended example: diagrams

We now turn to a larger example of an embedded DSL, inspired by BrentYorgey’s diagrams project [4] for two-dimensional diagrams. That project im-plements a very powerful language which Yorgey doesn’t name, but we’ll call itDiagrams. But it’s also rather a large language, so we won’t attempt to cover thewhole thing; instead, we build a much simpler language in the same style. Thediagrams project does, however, provide a useful backend to output ScalableVector Graphics (SVG) files, which we will borrow to save ourselves from havingto reinvent one.

4.1 Shapes, styles, and pictures

The basics of our diagram DSL can be expressed in three simpler sublanguages,for shapes, styles, and pictures. We express them first via deep embedding. First,there are primitive shapes—as a language, these aren’t very interesting, becausethey aren’t recursive.

data Shape= Rectangle Double Double| Ellipse Double Double| Triangle Double

The parameters of a Rectangle specify its width and height; those of an Ellipseits x- and y-radii. A Triangle is equilateral, with its lowest edge parallel to thex-axis; the parameter is the length of the side.

Then there are drawing styles. A StyleSheet is a (possibly empty) sequenceof stylings, each of which specifies fill colour, stroke colour, and stroke width.(The defaults are for no fill, and very thin black strokes.)

type StyleSheet = [Styling ]data Styling

= FillColour Col| StrokeColour Col| StrokeWidth Double

Page 19: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 19

Here, colours are defined in an external library, which among other things pro-vides a large number of colour constants named according to the W3C SVGRecommendation [12, §4.4].

type Col = ...red , blue, green, yellow , brown, black ... ::Col

Finally, pictures are arrangements of shapes: individual shapes, with styling; orone picture above another, or one beside another. For simplicity, we specify thathorizontal and vertical alignment of pictures is by their centres.

data Picture= Place StyleSheet Shape| Above Picture Picture| Beside Picture Picture

For example, here is a little stick figure of a woman in a red dress and bluestockings.

figure :: Picturefigure = Place [StrokeWidth 0.1,FillColour bisque ] (Ellipse 3 3) ‘Above‘

Place [FillColour red ,StrokeWidth 0] (Rectangle 10 1) ‘Above‘Place [FillColour red ,StrokeWidth 0] (Triangle 10) ‘Above‘(Place [FillColour blue,StrokeWidth 0] (Rectangle 1 5) ‘Beside‘Place [StrokeWidth 0] (Rectangle 2 5) ‘Beside‘Place [FillColour blue,StrokeWidth 0] (Rectangle 1 5)) ‘Above‘

(Place [FillColour blue,StrokeWidth 0] (Rectangle 2 1) ‘Beside‘Place [StrokeWidth 0] (Rectangle 2 1) ‘Beside‘Place [FillColour blue,StrokeWidth 0] (Rectangle 2 1))

The intention is that it should be drawn like this:

(Note that blank spaces can be obtained by rectangles with zero stroke width.)

Page 20: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

20 Jeremy Gibbons

4.2 Transformations

In order to arrange pictures, we will need to be able to translate them. Later on,we will introduce some other transformations too; with that foresight in mind,we introduce a simple language of transformations—the identity transformation,translations, and compositions of these.

type Pos = Complex Double

data Transform= Identity| Translate Pos| Compose Transform Transform

For simplicity, we borrow the Complex type from the Haskell libraries to repre-sent points in the plane; the point with coordinates (x, y) is represented by thecomplex number x :+y . Complex is an instance of the Num type class, so we getarithmetic operations on points too. For example, we can apply a Transform toa point:

transformPos :: Transform → Pos → PostransformPos Identity = idtransformPos (Translate p) = (p+)transformPos (Compose t u) = transformPos t ◦ transformPos u

Exercises

19. Transform is represented above via a deep embedding, with a separate ob-server function transformPos. Reimplement Transform via a shallow embed-ding, with this sole observer.

4.3 Simplified pictures

As it happens, we could easily translate the Picture language directly intoDiagrams: it has equivalents of Above and Beside, for example. But if we were“executing” our pictures in a less sophisticated setting—for example, if we hadto implement the SVG backend from first principles—we would eventually haveto simplify recursively structured pictures into a flatter form.

Here, we flatten the hierarchy into a non-empty sequence of transformedstyled shapes:

type Drawing = [(Transform,StyleSheet ,Shape)]

In order to simplify alignment by centres, we will arrange that each simplifiedDrawing is itself centred: that is, the combined extent of all translated shapeswill be centred on the origin. Extents are represented as pairs of points, for thelower left and upper right corners of the orthogonal bounding box.

type Extent = (Pos,Pos)

Page 21: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 21

The crucial operation on extents is to compute their union:

unionExtent :: Extent → Extent → ExtentunionExtent (llx 1 :+ lly1, urx 1 :+ ury1) (llx 2 :+ lly2, urx 2 :+ ury2)

= (min llx 1 llx 2 :+ min lly1 lly2,max urx 1 urx 2 :+ max ury1 ury2)

Now, the extent of a drawing is the union of the extents of each of its translatedshapes, where the extent of a translated shape is the translation of the twocorners of the extent of the untranslated shape:

drawingExtent :: Drawing → ExtentdrawingExtent = foldr1 unionExtent ◦map getExtent where

getExtent (t , , s) = let (ll , ur) = shapeExtent sin (transformPos t ll , transformPos t ur)

(You might have thought initially that if all Drawings are kept centred, one pointrather than two serves to define the extent. But this doesn’t work: in computingthe extent of a whole Picture, of course we have to translate its constituentDrawings off-centre.) The extents of individual shapes can be computed using alittle geometry:

shapeExtent :: Shape → ExtentshapeExtent (Ellipse xr yr) = (−(xr :+ yr), xr :+ yr)

shapeExtent (Rectangle w h) = (−(w/2 :+ h/2), w/2 :+ h/2)

shapeExtent (Triangle s) = (−(s/2 :+√

3× s/4), s/2 :+√

3× s/4)

Now to simplify Pictures into Drawings, via a straightforward traversal overthe structure of the Picture.

drawPicture :: Picture → DrawingdrawPicture (Place u s) = drawShape u sdrawPicture (Above p q) = drawPicture p ‘aboveD ‘ drawPicture qdrawPicture (Beside p q) = drawPicture p ‘besideD ‘ drawPicture q

All the work is in the individual operations. drawShape constructs an atomicstyled Drawing , centred on the origin.

drawShape :: StyleSheet → Shape → DrawingdrawShape u s = [(Identity , u, s)]

aboveD and besideD both work by forming the “union” of the two child Drawings,but first translating each child by the appropriate amount—an amount calcu-lated so as to ensure that the resulting Drawing is again centred on the origin.

aboveD , besideD :: Drawing → Drawing → Drawingpd ‘aboveD ‘ qd = transformDrawing (Translate (0 :+ qury)) pd ++

transformDrawing (Translate (0 :+ plly)) qd where(pllx :+ plly , pur) = drawingExtent pd(qll , qurx :+ qury) = drawingExtent qd

pd ‘besideD ‘ qd = transformDrawing (Translate (qllx :+ 0)) pd ++transformDrawing (Translate (purx :+ 0)) qd where

Page 22: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

22 Jeremy Gibbons

(pll , purx :+ pury) = drawingExtent pd(qllx :+ qlly , qur) = drawingExtent qd

This involves transforming the child Drawings; but that’s easy, given our repre-sentation.

transformDrawing :: Transform → Drawing → DrawingtransformDrawing t = map (λ(t ′, u, s)→ (Compose t t ′, u, s))

Exercises

20. Add Square and Circle to the available Shapes; for simplicity, you can im-plement these using rect and ellipseXY .

21. Add Blank to the available shapes; implement this as a rectangle with zerostroke width.

22. Centring and alignment, as described above, are only approximations, be-cause we don’t take stroke width into account. How would you do so?

23. Add InFrontOf ::Picture → Picture → Picture as an operator to the Picturelanguage, for placing one Picture in front of (that is, on top of) another.Using this, you can draw a slightly less childish-looking stick figure, with the“arms” overlaid on the “body”:

24. Add FlipV :: Picture → Picture as an operator to the Picture language, forflipping a Picture vertically (that is, from top to bottom, about a horizontalaxis). Then you can draw this chicken:

Page 23: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 23

You’ll need to add a corresponding operator ReflectY to the Transformlanguage; you might note that the conjugate function on complex numberstakes x :+ y to x :+ (−y). Be careful in computing the extent of a flippedpicture!

25. Picture is represented above via a deep embedding, with a separate observerfunction drawPicture. Reimplement Picture via a shallow embedding, withthis sole observer.

4.4 Generating SVG

The final step is to assemble our simplified Drawing into some expression in theDiagrams language. What we need are the following:

– A type for representing diagrams.

type DiagramSVG = ...

(This is actually a synonym for a specialization of a more flexible Diagramstype from Yorgey’s library.)

– Primitives of type DiagramSVG :

rect :: Double → Double → DiagramSVGellipseXY :: Double → Double → DiagramSVGeqTriangle :: Double → DiagramSVG

– An operator for superimposing diagrams:

atop :: DiagramSVG → DiagramSVG → DiagramSVG

– Transformations on diagrams:

translate ◦ r2 :: (Double,Double)→ DiagramSVG → DiagramSVGreflectY :: DiagramSVG → DiagramSVG

(The latter is needed for Exercise 24.)– Functions for setting fill colour, stroke colour, and stroke width attributes:

fc :: Col → DiagramSVG → DiagramSVGlc :: Col → DiagramSVG → DiagramSVGlw :: Double → DiagramSVG → DiagramSVG

– A wrapper function that writes a diagram out in SVG format to a specifiedfile:

writeSVG :: FilePath → DiagramSVG → IO ()

Then a Drawing can be assembled into a DiagramSVG by laying one translatedstyled shape on top of another:

assemble :: Drawing → DiagramSVGassemble = foldr1 atop ◦map draw where

draw (t , u, s) = transformDiagram t (diagramShape u s)

Page 24: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

24 Jeremy Gibbons

Note that Shapes earlier in the list appear “in front” of those later; you’ll needto use this fact in solving Exercise 23.

A StyleSheet represents a sequence of functions, which are composed into onestyling function:

applyStyleSheet :: StyleSheet → (DiagramSVG → DiagramSVG)applyStyleSheet = foldr (◦) id ◦map applyStyling

applyStyling :: Styling → DiagramSVG → DiagramSVGapplyStyling (FillColour c) = fc capplyStyling (StrokeColour c) = lc capplyStyling (StrokeWidth w) = lw w

A single styled shape is drawn by applying the styling function to the corre-sponding atomic diagram:

diagramShape :: StyleSheet → Shape → DiagramSVGdiagramShape u s = shape (applyStyleSheet u) s where

shape f (Ellipse xr yr) = f (ellipseXY xr yr)shape f (Rectangle w h) = f (rect w h)shape f (Triangle s) = f (translate (r2 (0,−y)) (eqTriangle s))

where y = s ×√3/12

(The odd translation of the triangle is because we place triangles by their centre,but Diagrams places them by their centroid.)

A transformed shape is drawn by transforming the diagram of the underlyingshape.

transformDiagram :: Transform → DiagramSVG → DiagramSVGtransformDiagram Identity = idtransformDiagram (Translate (x :+ y)) = translate (r2 (x , y))transformDiagram (Compose t u) = transformDiagram t ◦

transformDiagram u

And that’s it! (You can look at the source file Shapes.lhs for the definition ofwriteSVG , and some other details.)

Exercises

26. In Exercise 19, we reimplemented Transform as a shallow embedding, withthe sole observer being to transform a point. This doesn’t allow us to applythe same transformations to DiagramSVG objects, as required by the func-tion transformDiagram above. Extend the shallow embedding of Transformso that it has two observers, for transforming both points and diagrams.

27. A better solution to Exercise 26 would be to represent Transform via ashallow embedding with a single parametrized observer, which can be in-stantiated at least to the two uses we require. What are the requirements onsuch instantiations?

Page 25: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 25

28. Simplifying a Picture into a Drawing is a bit inefficient, because we have tocontinually recompute extents. A more efficient approach would be to extendthe Drawing type so that it caches the extent, as well as storing the list ofshapes. Try this.

29. It can be a bit painful to specify a complicated Picture with lots of Shapes alldrawn in a common style—for example, all blue, with a thick black stroke—because those style settings have to be repeated for every single Shape. Ex-tend the Picture language so that Pictures too may have StyleSheets; stylesshould be inherited by children, unless they are overridden.

30. Add an operator Tile to the Shape language, for square tiles with markingson. It should take a Double parameter for the length of the side, and a listof lists of points for the markings; each list of points has length at leasttwo, and denotes a path of straight-line segments between those points. Forexample, here is one such pattern of markings:

markingsP :: [[Pos ]]markingsP = [[(4 :+ 4), (6 :+ 0)],

[(0 :+ 3), (3 :+ 4), (0 :+ 8), (0 :+ 3)],[(4 :+ 5), (7 :+ 6), (4 :+ 10), (4 :+ 5)],[(11 :+ 0), (10 :+ 4), (8 :+ 8), (4 :+ 13), (0 :+ 16)],[(11 :+ 0), (14 :+ 2), (16 :+ 2)],[(10 :+ 4), (13 :+ 5), (16 :+ 4)],[(9 :+ 6), (12 :+ 7), (16 :+ 6)],[(8 :+ 8), (12 :+ 9), (16 :+ 8)],[(8 :+ 12), (16 :+ 10)],[(0 :+ 16), (6 :+ 15), (8 :+ 16), (12 :+ 12), (16 :+ 12)],[(10 :+ 16), (12 :+ 14), (16 :+ 13)],[(12 :+ 16), (13 :+ 15), (16 :+ 14)],[(14 :+ 16), (16 :+ 15)]]

In Shapes.lhs, you’ll find this definition plus three others like it. They yieldtile markings looking like this:

You can draw such tiles via the function

tile :: [[Pos ]]→ DiagramSVG

provided for you. Also add operators to the Picture and Transform languagesto support scaling by a constant factor and rotation by a quarter-turn an-ticlockwise, both centred on the origin. You can implement these on theDiagramSVG type using two Diagrams operators:

Page 26: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

26 Jeremy Gibbons

scale :: Double → DiagramSVG → DiagramSVGrotateBy (1/4) :: DiagramSVG → DiagramSVG

Then suitable placements, rotations, and scalings of the four marked tileswill produce a rough version of Escher’s “Square Limit” print, as shown inthe left-hand image below:

This construction was explored by Peter Henderson in a famous early paperon functional geometry [13, 14]; I have taken the data for the markings from anote by Frank Buß [15]. The image on the right is taken from WikiPaintings[16].

31. Morally, “Square Limit” is a fractal image: the recursive decomposition canbe taken ad infinitum. Because Haskell uses lazy evaluation, that’s not aninsurmountable obstacle. The datatype Picture includes also infinite terms;and because Diagrams is an embedded DSL, you can use a recursive Haskelldefinition to define an infinite Picture. You can’t render it directly to SVG,though; that would at best yield an infinite SVG file. But still, you can prunethe infinite picture to a finite depth, and then render the result. Constructthe infinite Picture. (You’ll probably need to refactor some code. Note thatyou can’t compute the extent of an infinite Picture either.)

References

1. Fowler, M.: Domain-Specific Languages. Addison-Wesley (2011)

2. Mernik, M., Heering, J., Sloane, A.M.: When and how to develop domain-specificlanguages. ACM Computing Surveys 37(4) (2005) 316–344

3. Bentley, J.: Little languages. Communications of the ACM 29(8) (1986) 711–721Also in ‘More Programming Pearls’ (Addison-Wesley, 1988).

4. Yorgey, B.: Diagrams 0.6. http://projects.haskell.org/diagrams/ (2012)

5. Parnas, D.L.: On the criteria to be used in decomposing systems into modules.Communications of the ACM 15(12) (1972) 1053–1058

6. Wand, M.: Final algebra semantics and data type extensions. Journal of Computerand System Sciences 19 (1979) 27–44

Page 27: Functional Programming for Domain-Speci c Languages · timizing new languages. Functional programming provides a promising framework for such tasks; FP and DSLs are natural partners.

Functional Programming for Domain-Specific Languages 27

7. Kamin, S.: An implementation-oriented semantics of Wadler’s pretty-printingcombinators. Oregon Graduate Institute, http://www-sal.cs.uiuc.edu/~kamin/pubs/pprint.ps (1998)

8. Erwig, M., Walkingshaw, E.: Semantics-driven DSL design. In Mernik, M., ed.: For-mal and Practical Aspects of Domain-Specific Languages: Recent Developments.IGI-Global (2012) 56–80

9. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements ofReusable Object-Oriented Software. Addison-Wesley (1995)

10. Wadler, P.L.: The expression problem. Posting to java-genericity mailing list(1998)

11. Hutton, G.: Fold and unfold for program semantics. In: Proceedings of the ThirdACM SIGPLAN International Conference on Functional Programming, Baltimore,Maryland (1998) 280–288

12. W3C: Scalable vector graphics (SVG) 1.1: Recognized color keyword names. http://www.w3.org/TR/SVG11/types.html#ColorKeywords (2011)

13. Henderson, P.: Functional geometry. In: Lisp and Functional Programming. (1982)179–187 http://users.ecs.soton.ac.uk/ph/funcgeo.pdf.

14. Henderson, P.: Functional geometry. Higher Order and Symbolic Computing 15(4)(2002) 349–365 Revision of [13].

15. Buß, F.: Functional geometry. http://www.frank-buss.de/lisp/functional.

html (2005)16. Escher, M.C.: Square limit. http://www.wikipaintings.org/en/m-c-escher/

square-limit (1964)