Compilers Are Databases


Compilers Are Databases
JVM Languages Summit

Martin Odersky
Typesafe and EPFL



Compilers and Data Bases


Compilers are Data Bases?


Put a square peg in a round hole?

This Talk ...

... reports on a new compiler architecture for dsc, the Dotty Scala Compiler.

It has a mostly functional architecture, but uses a lot of low-level tricks for speed.

Some of its concepts are inspired by functional databases.

My Early Involvement in Compilers

80s: Pascal, Modula-2 single pass, following the school of Niklaus Wirth.

95-96: Espresso, the 2nd Java compiler -> E Compiler -> Borland's JBuilder; used an OO AST with one class per node and all processing distributed between methods on these nodes.

96-99: Pizza -> GJ -> javac (1.3+) -> scalac (1.x); replaced OO AST with pattern matching.

Current Scala Compiler

2004-12: nsc compiler for Scala (2.0-2.10)

Made (some) use of functional capabilities of Scala

- REPL
- presentation compiler for IDEs (Eclipse, Ensime)
- run-time meta programming with toolboxes

It's the codebase for the official scalac compiler for 2.11, 2.12 and beyond.


Next Generation Scala Compiler

2012 - now: Dotty

- Rethink compiler architecture from the ground up.
- Introduce some language changes with the aim of better regularity.
- Status: Close to bootstrap, but still rough around the edges.


Compilers: Traditional View

Compilers: Traditional View

Add Separate Compilation

Challenges

A compiler for a language like Scala faces quite a few challenges.

Among the most important are:


Challenge: Complex Transformations

- Input language (Scala) is complicated.
- Output language (JVM) is also complicated.
- Semantic gap between the two is large.

Compare with compilers to simple low-level languages such as System F or SSA.

Deep Transformation Pipeline

[Figure: pipeline of phases leading from Source to Bytecode. Phases shown: Parser, Typer, FirstTransform, ValueClasses, Mixin, LazyVals, Memoize, CapturedVars, Constructors, LambdaLift, Flatten, ElimStaticThis, RestoreScopes, GenBCode, RefChecks, ElimRepeated, NormalizeFlags, ExtensionMethods, TailRec, PatternMatcher, ExplicitOuter, ExpandSAMs, Splitter, SeqLiterals, InterceptedMeths, Literalize, Getters, ClassTags, ElimByName, AugmentS2Traits, ResolveSuper, Erasure]

To achieve reliability, need

- excellent modularity
- minimized side effects

Functional code rules!

Challenge: Speed

- Current scalac achieves 500-700 loc/sec on idiomatic Scala code.
- Can be much lower, depending on input.
- Everyone would like it to be faster.
- But this is very hard to achieve.

- FP does have costs.
- Optimizations are ineffective.
- No hotspots, costs are smeared out widely.


Challenge: Latency

Some applications require fast turnaround for small changes more than high throughput.

Examples:

- REPL
- Worksheet
- IDE Presentation Compiler

Need to keep things loaded (program + data)

Challenge: Reusability

A compiler has many clients:

- Command line
- Build tools
- IDEs
- REPL
- Meta-programming

Abstractions must not leak.

(FP helps)

A Question

Every compiler has to answer questions like this:

Say I have a class

  class C[T] {
    def f(x: T): T = ...
  }

At some point I change it to:

  class C[T] {
    def f(x: T)(y: T): T = ...
  }

What is the type signature of C.f?

Clearly, it depends on the time when the question is asked!

Time-Varying Answers

Initially: (x: T): T

After erasure: (x: Any): Any

After the edit: (x: T)(y: T): T

After uncurry: (x: T, y: T): T

After erasure: (x: Any, y: Any): Any

Naive Functional Approach

World_1 -> IR_1,1 -> ... -> IR_n,1 -> Output_1

World_2 -> IR_1,2 -> ... -> IR_n,2 -> Output_2

...

World_k -> IR_1,k -> ... -> IR_n,k -> Output_k

How big is the world?
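The naive approach above can be sketched as a pure function from a whole world to an output. This is only an illustration: `World`, `IR`, `Phase`, `frontEnd` and `compile` are hypothetical names, not part of dsc, and the "IR" here is just text.

```scala
object NaivePipeline {
  // Hypothetical stand-ins: a world is every source file, an IR is opaque text.
  final case class World(sources: Map[String, String])
  final case class IR(text: String)
  type Phase = IR => IR

  // Turn the whole world into the first IR (files concatenated in name order).
  def frontEnd(w: World): IR =
    IR(w.sources.toSeq.sortBy(_._1).map(_._2).mkString("\n"))

  // Run every phase in order; nothing is shared between two worlds.
  def compile(phases: List[Phase])(w: World): IR =
    phases.foldLeft(frontEnd(w))((ir, phase) => phase(ir))
}
```

Every edit produces a new `World`, and `compile` recomputes every IR for it from scratch, which is why the size of the world matters.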


A More Practical Strategy

Taking Inspiration from FRP and Functional Databases:

Treat every value as a time-varying function.

So the question is not:

  What is the signature of C.f?

but:

  What is the signature of C.f at a given point in time?

Need to index every piece of information with the time at which it holds.


Time in dsc

Period = (RunID, PhaseID)

- RunID is incremented for each compiler run.
- PhaseID ranges from 1 (parser) to ~50 (backend).
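A minimal sketch of such a period, assuming a plain case class with two integer fields. The slide only states Period = (RunID, PhaseID); the field names, the ordering, and the helpers `nextPhase` and `nextRun` are assumptions made for illustration, not dsc's actual encoding.

```scala
object PeriodSketch {
  // Hypothetical encoding of Period = (RunID, PhaseID).
  final case class Period(runId: Int, phaseId: Int)

  object Period {
    // Time order: compare runs first, then phases within the same run.
    implicit val ordering: Ordering[Period] =
      Ordering.by((p: Period) => (p.runId, p.phaseId))
  }

  // Phase 1 is the parser, as stated on the slide.
  val FirstPhaseId = 1

  // Move to the next phase of the same run.
  def nextPhase(p: Period): Period = p.copy(phaseId = p.phaseId + 1)

  // Start a new compiler run at the first phase.
  def nextRun(p: Period): Period = Period(p.runId + 1, FirstPhaseId)
}
```

With this ordering, the last phase of one run is earlier than the first phase of the next run.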



Time-Indexed Values

sig(C.f, (Run 1, parser)) = (x: T): T

sig(C.f, (Run 1, erasure)) = (x: Any): Any

sig(C.f, (Run 2, parser)) = (x: T)(y: T): T

sig(C.f, (Run 2, uncurry)) = (x: T, y: T): T

sig(C.f, (Run 2, erasure)) = (x: Any, y: Any): Any
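The time-indexed lookup above can be sketched as a history of values keyed by the period from which each value holds. Everything here is an assumption for illustration: the `History` type, the rule that a value holds until the next entry, the use of strings for signatures, and the phase numbers chosen for parser, uncurry and erasure.

```scala
object TimeIndexed {
  final case class Period(runId: Int, phaseId: Int)

  object Period {
    // Time order: runs first, then phases within a run.
    implicit val ordering: Ordering[Period] =
      Ordering.by((p: Period) => (p.runId, p.phaseId))
  }

  // Assumption: each entry holds from its period until the next entry starts.
  final case class History[A](entries: Map[Period, A]) {
    // The value in force at period p, if any entry starts at or before p.
    def at(p: Period): Option[A] = {
      val valid = entries.filter { case (from, _) => Period.ordering.lteq(from, p) }
      if (valid.isEmpty) None else Some(valid.maxBy(_._1)._2)
    }
  }

  // Hypothetical phase ids; only "parser = 1" comes from the slides.
  val Parser = 1
  val Uncurry = 20
  val Erasure = 30

  // The signature of C.f across two runs, as in the slide.
  val sigOfF: History[String] = History(Map(
    Period(1, Parser)  -> "(x: T): T",
    Period(1, Erasure) -> "(x: Any): Any",
    Period(2, Parser)  -> "(x: T)(y: T): T",
    Period(2, Uncurry) -> "(x: T, y: T): T",
    Period(2, Erasure) -> "(x: Any, y: Any): Any"
  ))
}
```

Asking `sigOfF.at(Period(2, 25))` then answers the question "what is the signature of C.f at this point in time" with the uncurried but not yet erased signature.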

Task of the Compiler

Compute all values needed for analysis and code generation over all periods where they are relevant.

Problem: The graph of this function is humongous!

More work is needed to make it efficiently explorable.

But for a start it looks like the right model.


Core Data Types

- Abstract Syntax Trees
- Types
- References
- Denotations
- Symbols


Abstract Syntax Trees

For instance, for x * 2:

Tree Attributes

What about tree attributes?

In dsc, we simplified as much as we could. We're left with just two attributes:

- Position (intrinsic)
- Type

The job of the type checker is to transform untyped to typed trees.


Typed Abstract Syntax Trees

For instance, for x * 2:

The distinction whether a tree is typed or untyped is pretty important; it merits being reflected in the type of the AST itself.

From Untyped to Typed Trees

Idea: parameterize the type Tree of ASTs with the attribute info it carries.

Typed tree: tpd.Tree = Tree[Type]
Untyped tree: untpd.Tree = Tree[Nothing]

This leads to the following class:

  class Tree[T] {
    def tpe: T
    def withType(t: Type): Tree[Type]
  }
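A minimal sketch of how such a parameterized tree could behave, assuming a single node kind. `Type`, `IntType`, `Ident` and `Ident.untyped` are hypothetical names for illustration, and `Tree` is kept invariant in T here purely as a placeholder; the sketch does not try to settle the variance of T.

```scala
object TreeSketch {
  // Hypothetical, tiny universe of types.
  sealed trait Type
  case object IntType extends Type

  // Invariant in T in this sketch only.
  sealed abstract class Tree[T] {
    def tpe: T
    def withType(t: Type): Tree[Type]
  }

  // One node kind; the type attribute is absent until the tree is typed.
  final class Ident[T] private (val name: String, attr: Option[Type]) extends Tree[T] {
    def tpe: T = attr match {
      case Some(t) => t.asInstanceOf[T] // only reached when T is Type
      case None    => throw new UnsupportedOperationException("untyped tree has no type")
    }
    // Typing never mutates: it returns a new, typed copy of the node.
    def withType(t: Type): Tree[Type] = new Ident[Type](name, Some(t))
  }

  object Ident {
    // Untyped trees are Tree[Nothing]: asking for tpe can never yield a value.
    def untyped(name: String): Ident[Nothing] = new Ident[Nothing](name, None)
  }
}
```

Calling `withType` on an untyped `Ident` gives back a `Tree[Type]` whose `tpe` is the attached type, while `tpe` on the untyped original throws.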

Question of Variance

Question: Which of the following two subtype relationships should hold?