Compilers Are Databases
Date posted: 16-Apr-2017
Category: Technology
Compilers Are Databases
JVM Languages Summit
Martin Odersky
Typesafe and EPFL
Compilers...
Compilers and Data Bases
Compilers are Data Bases?
Put a square peg in a round hole?
This Talk ...
... reports on a new compiler architecture for dsc, the Dotty Scala Compiler.
It has a mostly functional architecture, but uses a lot of low-level tricks for speed.
Some of its concepts are inspired by functional databases.
My Early Involvement in Compilers
80s: Pascal, Modula-2: single pass, following the school of Niklaus Wirth.
95-96: Espresso, the 2nd Java compiler -> E Compiler -> Borland's JBuilder. Used an OO AST with one class per node and all processing distributed between methods on these nodes.
96-99: Pizza -> GJ -> javac (1.3+) -> scalac (1.x). Replaced the OO AST with pattern matching.
Current Scala Compiler
2004-12: nsc compiler for Scala (2.0-2.10)
Made (some) use of functional capabilities of Scala
Added:
- REPL
- presentation compiler for IDEs (Eclipse, Ensime)
- run-time meta-programming with toolboxes
It's the codebase for the official scalac compiler for 2.11, 2.12 and beyond.
Next Generation Scala Compiler
2012 - now: Dotty
Rethink compiler architecture from the ground up.
Introduce some language changes with the aim of better regularity.
Status: close to bootstrap, but still rough around the edges.
Compilers: Traditional View
Add Separate Compilation
Challenges
A compiler for a language like Scala faces quite a few challenges.
Among the most important are:
- Complexity
- Speed
- Latency
- Reusability
Challenge: Complex Transformations
Input language (Scala) is complicated.
Output language (JVM) is also complicated.
Semantic gap between the two is large.
Compare with compilers to simple low-level languages such as System F or SSA.
Deep Transformation Pipeline
[Diagram: phases between Source and Bytecode: Parser, Typer, FirstTransform, ValueClasses, Mixin, LazyVals, Memoize, CapturedVars, Constructors, LambdaLift, Flatten, ElimStaticThis, RestoreScopes, GenBCode, RefChecks, ElimRepeated, NormalizeFlags, ExtensionMethods, TailRec, PatternMatcher, ExplicitOuter, ExpandSAMs, Splitter, SeqLiterals, InterceptedMeths, Literalize, Getters, ClassTags, ElimByName, AugmentS2Traits, ResolveSuper, Erasure]
To achieve reliability, we need:
- excellent modularity
- minimized side effects
Functional code rules!
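One way to picture a deep pipeline with minimized side effects is as a list of pure tree-to-tree functions folded over the input. The following is a minimal sketch under that assumption, not dsc's code: the miniature `Tree` type, the `Phase` class, and the two toy phases are all hypothetical.

```scala
object PipelineSketch {
  // Hypothetical miniature IR; a real compiler's trees are far richer.
  sealed trait Tree
  final case class Num(value: Int) extends Tree
  final case class Add(lhs: Tree, rhs: Tree) extends Tree
  final case class Neg(arg: Tree) extends Tree

  // A phase is a named pure function from tree to tree: no shared mutable state.
  final case class Phase(name: String, transform: Tree => Tree)

  // Toy phase 1: rewrite Neg(Neg(t)) to t, bottom-up.
  def elimDoubleNeg(t: Tree): Tree = t match {
    case Neg(arg) =>
      elimDoubleNeg(arg) match {
        case Neg(inner) => inner
        case other      => Neg(other)
      }
    case Add(l, r) => Add(elimDoubleNeg(l), elimDoubleNeg(r))
    case n: Num    => n
  }

  // Toy phase 2: evaluate operations whose operands are already constants.
  def foldConstants(t: Tree): Tree = t match {
    case Add(l, r) =>
      (foldConstants(l), foldConstants(r)) match {
        case (Num(a), Num(b)) => Num(a + b)
        case (fl, fr)         => Add(fl, fr)
      }
    case Neg(arg) =>
      foldConstants(arg) match {
        case Num(a) => Num(-a)
        case other  => Neg(other)
      }
    case n: Num => n
  }

  val phases: List[Phase] = List(
    Phase("elimDoubleNeg", elimDoubleNeg),
    Phase("foldConstants", foldConstants)
  )

  // Running the compiler is a left fold of the phases over the input tree.
  def runPipeline(input: Tree): Tree =
    phases.foldLeft(input)((tree, phase) => phase.transform(tree))
}
```

Because each phase only maps an input tree to an output tree, phases can be tested, reordered, or reused in isolation, which is the modularity argument made above.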
Challenge: Speed
Current scalac achieves 500-700 loc/sec on idiomatic Scala code.
Can be much lower, depending on input.
Everyone would like it to be faster.
But this is very hard to achieve.
- FP does have costs.
- Optimizations are ineffective.
- No hotspots, costs are smeared out widely.
Challenge: Latency
Some applications require fast turnaround for small changes more than high throughput.
Examples:
- REPL
- Worksheet
- IDE Presentation Compiler
Need to keep things loaded (program + data)
Challenge: Reusability
A compiler has many clients:
- Command line
- Build tools
- IDEs
- REPL
- Meta-programming
Abstractions must not leak.
(FP helps)
A Question
Every compiler has to answer questions like this:
Say I have a class
    class C[T] { def f(x: T): T = ... }
At some point I change it to:
    class C[T] { def f(x: T)(y: T): T = ... }
What is the type signature of C.f?
Clearly, it depends on the time when the question is asked!
Time-Varying Answers
Initially: (x: T): T
After erasure: (x: Any): Any
After the edit: (x: T)(y: T): T
After uncurry: (x: T, y: T): T
After erasure: (x: Any, y: Any): Any
Naive Functional Approach
World1 -> IR1,1 -> ... -> IRn,1 -> Output1
World2 -> IR1,2 -> ... -> IRn,2 -> Output2
...
Worldk -> IR1,k -> ... -> IRn,k -> Outputk
How big is the world?
A More Practical Strategy
Taking inspiration from FRP and functional databases:
Treat every value as a time-varying function.
So the question is not:
What is the signature of C.f?
but:
What is the signature of C.f at a given point in time?
Need to index every piece of information with the time where it holds.
Time in dsc
Period = (RunID, PhaseID)
RunID is incremented for each compiler run.
PhaseID ranges from 1 (parser) to ~50 (backend).
[Diagram: timeline of Run 1, Run 2, Run 3]
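As a rough illustration of Period = (RunID, PhaseID), a period can be modelled as a small value with a total order. This is only a sketch: the field names, the lexicographic ordering, and the assumption that RunIDs start at 1 are not taken from the talk, and dsc may represent periods quite differently.

```scala
object PeriodSketch {
  // Period = (RunID, PhaseID), as stated above.
  final case class Period(runId: Int, phaseId: Int) {
    require(runId >= 1, "assumption: RunIDs start at 1")
    require(phaseId >= 1, "PhaseID 1 is the parser")
  }

  object Period {
    // Assumption: later runs come after earlier ones;
    // within a run, phases are ordered by their ID.
    implicit val ordering: Ordering[Period] =
      Ordering.by((p: Period) => (p.runId, p.phaseId))
  }
}
```

With such an order, "the last phase of Run 1" is earlier than "the parser of Run 2", which is what makes a period usable as a point in time.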
Time-Indexed Values
sig(C.f, (Run 1, parser)) = (x: T): T
sig(C.f, (Run 1, erasure)) = (x: Any): Any
sig(C.f, (Run 2, parser)) = (x: T)(y: T): T
sig(C.f, (Run 2, uncurry)) = (x: T, y: T): T
sig(C.f, (Run 2, erasure)) = (x: Any, y: Any): Any
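The time-indexed lookup can be sketched as a function from a symbol and a period to a signature. This is a toy model, not dsc's data structure: signatures are plain strings, the phase numbers for uncurry and erasure are made up (only parser = 1 comes from the talk), and it assumes that a signature holds from the phase that produced it until the next change in the same run, with the edited signature holding from Run 2's parser phase.

```scala
object TimeIndexedSketch {
  final case class Period(runId: Int, phaseId: Int)

  // Hypothetical phase numbers; only "parser = 1" is from the talk.
  val Parser  = 1
  val Uncurry = 20
  val Erasure = 30

  // Per (symbol, run): the phases at which the signature changed, in ascending order.
  private val history: Map[(String, Int), Vector[(Int, String)]] = Map(
    ("C.f", 1) -> Vector(
      Parser  -> "(x: T): T",
      Erasure -> "(x: Any): Any"
    ),
    ("C.f", 2) -> Vector(
      Parser  -> "(x: T)(y: T): T",
      Uncurry -> "(x: T, y: T): T",
      Erasure -> "(x: Any, y: Any): Any"
    )
  )

  // Signature of `sym` at period `p`: the latest entry of that run
  // at or before p.phaseId, or None if nothing is known for that run.
  def sig(sym: String, p: Period): Option[String] =
    history.get((sym, p.runId)).flatMap { entries =>
      entries
        .takeWhile { case (phase, _) => phase <= p.phaseId }
        .lastOption
        .map { case (_, signature) => signature }
    }
}
```

For example, `sig("C.f", Period(2, Uncurry))` yields `Some("(x: T, y: T): T")`, while a query for a run with no recorded history yields `None`. Storing only the change points rather than one entry per period is one possible way to keep the "humongous" graph small.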
Task of the Compiler
Compute all values needed for analysis and code generation over all periods where they are relevant.
Problem: The graph of this function is humongous!
More work is needed to make it efficiently explorable.
But for a start it looks like the right model.
Core Data Types
- Abstract Syntax Trees
- Types
- References
- Denotations
- Symbols
Abstract Syntax Trees
For instance, for x * 2:
[Figure: AST of x * 2]
Tree Attributes
What about tree attributes?
In dsc, we simplified as much as we could.
We're left with just two attributes:
- Position (intrinsic)
- Type
The job of the type checker is to transform untyped to typed trees.
Typed Abstract Syntax Trees
For instance, for x * 2:
[Figure: typed AST of x * 2]
The distinction between typed and untyped trees is important enough to merit being reflected in the type of the AST itself.
From Untyped to Typed Trees
Idea: parameterize the type Tree of ASTs with the attribute info it carries.
Typed tree: tpd.Tree = Tree[Type]
Untyped tree: untpd.Tree = Tree[Nothing]
This leads to the following class:
    class Tree[T] {
      def tpe: T
      def withType(t: Type): Tree[Type]
    }
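To make the shape of that class concrete, here is a small self-contained sketch with a single node kind. It follows the two members shown on the slide, but everything else is an assumption for illustration: the `Type` hierarchy, the `Ident` node, storing the attribute in an `Option`, and throwing when an untyped tree is asked for its type.

```scala
object TreeSketch {
  // Hypothetical stand-in for the compiler's types.
  sealed trait Type
  case object IntType extends Type

  // Same members as on the slide: T records which attribute the tree carries.
  abstract class Tree[T] {
    def tpe: T
    def withType(t: Type): Tree[Type]
  }

  // Hypothetical node kind: an identifier. `attr` is empty for untyped trees.
  final class Ident[T] private (val name: String, attr: Option[T]) extends Tree[T] {
    // An untyped tree has T = Nothing, so asking for its type can only fail.
    def tpe: T =
      attr.getOrElse(throw new UnsupportedOperationException(s"untyped tree: $name"))

    // Returns a new typed tree; the receiver is left unchanged.
    def withType(t: Type): Tree[Type] = new Ident[Type](name, Some(t))
  }

  object Ident {
    def untyped(name: String): Tree[Nothing] = new Ident[Nothing](name, None)
  }

  // The two aliases from the slide.
  object tpd   { type Tree = TreeSketch.Tree[Type] }
  object untpd { type Tree = TreeSketch.Tree[Nothing] }
}
```

A type checker in this picture is a function from `untpd.Tree` to `tpd.Tree` that calls `withType` on each node, so code that requires typed trees cannot be handed untyped ones by accident.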
Question of Variance
Question: Which of the following two subtype relationships should hold?
tpd.Tree