Post on 19-Dec-2015
Modern Compiler Internal Representations
Silvius Rus
1/23/2002
Presentation Navigator
Introduction Challenges Staged compilation Generate efficient code Case studies Conclusions
Traditional Compiler Organization
Pass: output type– Read code as text: ASCII characters– Lexical scanner: language words– Syntactic parser: language phrases– Translation: attribute grammar phrases– Output generated code: binary stream
Focus on pipelining due to memory window constraints
Traditional Compiler Internal Representation
Grammatical structure not always built explicitly
Implicit, built-in semantics Simple data structures:
– Transition tables– Token streams and stacks
Presentation Navigator
Introduction Challenges Staged compilation Generate efficient code Case studies Conclusions
Compiler Challenges
Versatile: – Understand multiple languages – Generate output for various architectures
Generated efficient code:– Fast: as fast as coded directly in the output language – Portable: runs on multiple platforms – Verifiable: runs provably within a specified class of behavior – Secure: provably respects certain security requirements
Extendable: need to extend in order to: – Incorporate new input language and/or target system– Take advantage of advances in run-time environments (such as ISA
changes, multithreading, distributed/parallel execution)
L+A < L*A
Understand Multiple Languages - Output for Multiple Targets
Abstract IR:– Same representation for Fortran, C, C++, Java, …– Possible only for conceptually similar languages
Good points:– Perform complex transformations on a single representation
Bad points:– Language semantics may either get lost or need additional
particular representation– Specific architecture characteristics are more profitable to use
than common (abstractable) ones
Presentation Navigator
Introduction Challenges Staged compilation Generate efficient code Case studies Conclusions
Staged Compilation
Stage 1:– Load source file (text) into IR1 – machine independent– Optimize IR1– Stream IR1 to text file
Save/reload, pipe, HTTP, … text file– SUIF files, Java bytecode, .NET assembly
Stage 2:– Load text file into IR2 – machine dependent– Perform machine specific optimization on IR2– Generate executable code or interpret IR2
Stage 1 Stage 2 Examples
Static Static SUIF, Promis
Static Dynamic SUN JIT, .NET JITer
Static Static + Dynamic
DyC, Quicksilver
Staged Compilation
Staged Compilation
Prepare IR1 so that stage 2 is very cheap– Quicksilver
Insert templated optimized object code in bytecode Pack speculative optimization validation predicates in bytecode Keep method dependence graphs explicitly in bytecode
– Microsoft .NET Explicit type/class information in IL Preformatted, quickly accessible metadata
– Strings, tables, heaps– Custom data
Allow embedding of native code
Presentation Navigator
Introduction Challenges Staged compilation Generate efficient code Case studies Conclusions
Generate Fast And Portable Code
Fast code– IR close to machine structure
Mapping data to registers Mapping operations to opcodes Scheduling instructions for superscalar/VLIW processors
Portable code– Machine description must be totally abstracted
QuickSilver: templated optimized code
Generate Verifiable Code
Microsoft .NET IL– Static and dynamic type safety - reflections– Managed code
Carries a minimum of information on itself Usually signed by compiler in Stage 1
– Managed data Only accessible from managed code Garbage collected
– Managed pointers
Generate Secure Code
Hard to define limits– Make sure you run what you mean to– Limit rights
Per user Per software component
QuickSilver: digests .NET IL:
– Code is signed using encrypting of hashed original– Permissions are set per module
Generate Efficient Code
IR may also provide support for:– Versioning (Quicksilver, .NET)– Culture (.NET)
Presentation Navigator
Introduction Challenges Staged compilation Generate efficient code Case studies Conclusions
Compiler Internal Representation - General Organization
High-level - completely machine independent– Abstract Syntax Tree – Control Flow Graph – Control Dependence Graph – Data Dependence Graph – Static Single Assignment
Medium-level - dependent on classes of machines– Virtual machine code, such as stack machine
Low level - dependent on particular ISA – Assembly, machine instruction graphs
Case Study: Polaris
High level representation– Abstract Syntax Tree– Control Flow Graph– Control Dependence Graph– Data Dependence Graph– Gated Static Single Assignment
Some generality– Backends for various parallel execution systems
Case Study: SUIF2
Multiple level representation– CFG, CDG, …– Quads– Machsuif– Custom annotations
Multiple frontends: Fortran, C, Java Multiple backends: SUIF VM, C, assembly Decoupled passes communicate only via SUIF Extendable: OSUIF
Case Study: Promis
Switch to Promis organization presentation Switch to Promis IR presentation
Case Study: KCC
Kook and Associates (KAI) C++ compiler:– C++ dedicated internal representation
Advanced C++ specific optimization
– Proprietary C++ specific object format Interprocedural optimization with modular compilation C++ specific debug information – usable with KDB
– Outputs C with calls to proprietary run-time library– Uses GNU gcc to generate machine code
Case Study: Jalapeno QuickSilver
Quasi-static images– Java bytecode + proprietary format
Representation allows for optimizations– Explicit method dependence graph – Templated optimized object code – Speculative optimization validation predicates
Case Study: .NET
Advertised 9 digit $$ figure project CLI (ECMA standard)
– Common type system Type info in intermediate code
– Common exception system Throw in Visual Basic, catch in C++
– Support for security, culture, versioning– Support for charging per-use– Custom info can be passed for original
language specific description
30+ languages
MSIL
native code
Other Compilers – Open Source
GNU compiler:– C, Fortran, Java, C++ front-ends– Generates code for all major architectures– Low level internal representation– New version (3.x) has SSA
SGI open source project: discontinued
Other Compilers – Commercial
Fortran, C, C++, Java produced by OS and/or hardware producers– HP, SGI, Intel, Microsoft, SUN
Other commercial compiler producers:– Borland, Watcom, etc.
Internal representation – company secret
Presentation Navigator
Introduction Challenges Staged compilation Generate efficient code Case studies Conclusions
Conclusions
Internal representation evolved– Programming paradigms– Changes in hardware– Changes in compiler/run-time system technology– New issues: security, verifiability, culture, versioning
Tendency: E Pluribus Unum