Modern Compiler Internal Representations Silvius Rus 1/23/2002.

Modern Compiler Internal Representations

Silvius Rus

1/23/2002

Presentation Navigator

Introduction Challenges Staged compilation Generate efficient code Case studies Conclusions

Traditional Compiler Organization

Pass: output type
– Read code as text: ASCII characters
– Lexical scanner: language words
– Syntactic parser: language phrases
– Translation: attribute grammar phrases
– Output generated code: binary stream

Focus on pipelining due to memory window constraints
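
The pass structure above can be sketched as a chain of generators, where each pass hands its output to the next one item at a time and no whole-program structure is kept in memory. This is only an illustrative sketch for a tiny arithmetic language; the function names `scan` and `translate` and the `PUSH`/`ADD`/`SUB`/`MUL`/`DIV` stack opcodes are assumptions made for the example, not part of the presentation.

```python
import re
from typing import Iterable, Iterator

# One token: an integer literal or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def scan(text: str) -> Iterator[str]:
    """Lexical scanner: characters in, language words (tokens) out."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = match.end()
        yield match.group(1)


def translate(tokens: Iterable[str]) -> Iterator[str]:
    """Parser with translation folded in: emits stack code, builds no tree."""
    stream = iter(tokens)
    look = next(stream, None)  # one token of lookahead

    def advance() -> None:
        nonlocal look
        look = next(stream, None)

    def factor() -> Iterator[str]:
        if look == "(":
            advance()
            yield from expr()
            if look != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif look is not None and look.isdigit():
            yield f"PUSH {look}"
            advance()
        else:
            raise SyntaxError(f"unexpected token {look!r}")

    def term() -> Iterator[str]:
        yield from factor()
        while look in ("*", "/"):
            op = look
            advance()
            yield from factor()
            yield "MUL" if op == "*" else "DIV"

    def expr() -> Iterator[str]:
        yield from term()
        while look in ("+", "-"):
            op = look
            advance()
            yield from term()
            yield "ADD" if op == "+" else "SUB"

    yield from expr()
    if look is not None:
        raise SyntaxError(f"trailing token {look!r}")


# Passes are chained; each instruction is produced on demand.
for instruction in translate(scan("1 + 2 * 3")):
    print(instruction)
```

The grammatical structure exists only implicitly, in the call stack of the parsing functions, which matches the point made on the next slide.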

Traditional Compiler Internal Representation

Grammatical structure not always built explicitly

Implicit, built-in semantics

Simple data structures:
– Transition tables
– Token streams and stacks

Compiler Challenges

Versatile:
– Understand multiple languages
– Generate output for various architectures

Generate efficient code:
– Fast: as fast as coded directly in the output language
– Portable: runs on multiple platforms
– Verifiable: runs provably within a specified class of behavior
– Secure: provably respects certain security requirements

Extensible, in order to:
– Incorporate new input languages and/or target systems
– Take advantage of advances in run-time environments (such as ISA changes, multithreading, distributed/parallel execution)

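L + A < L × A: with a shared representation, L input languages and A target architectures need L front ends plus A back ends, rather than L × A separate compilers.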
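
As a quick arithmetic check of this inequality, the sketch below counts the translators needed with and without a shared representation. The function name `translators_needed` and the example language and target lists are hypothetical.

```python
def translators_needed(languages, targets):
    """Return (direct, shared): component counts without and with a common IR."""
    direct = len(languages) * len(targets)  # one compiler per (language, target) pair
    shared = len(languages) + len(targets)  # one front end per language, one back end per target
    return direct, shared


direct, shared = translators_needed(
    ["Fortran", "C", "C++", "Java"],
    ["x86", "SPARC", "MIPS", "PowerPC", "IA-64"],
)
print(direct, shared)  # 20 9
```

The saving only appears once there are several languages and targets: with two of each, both counts are 4.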

Understand Multiple Languages - Output for Multiple Targets

Abstract IR:
– Same representation for Fortran, C, C++, Java, …
– Possible only for conceptually similar languages

Good points:
– Perform complex transformations on a single representation

Bad points:
– Language semantics may either get lost or need an additional, language-specific representation
– Specific architecture characteristics are more profitable to use than common (abstractable) ones

Staged Compilation

Stage 1:
– Load source file (text) into IR1 (machine independent)
– Optimize IR1
– Stream IR1 to text file

Save/reload, pipe, HTTP, … text file
– SUIF files, Java bytecode, .NET assembly

Stage 2:
– Load text file into IR2 (machine dependent)
– Perform machine specific optimization on IR2
– Generate executable code or interpret IR2
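
The two stages can be sketched end to end on a toy expression language. Everything here is an assumption made for illustration: IR1 is a nested list, the interchange text is JSON, IR2 is stack code for an imaginary stack machine, and the names `fold`, `stage1`, `lower` and `stage2` are hypothetical. Real systems such as SUIF, Java bytecode or .NET use their own formats.

```python
import json


def fold(node):
    """Stage 1 optimization on IR1: fold operations whose operands are constants."""
    if isinstance(node, list):
        op, left, right = node[0], fold(node[1]), fold(node[2])
        if isinstance(left, int) and isinstance(right, int):
            return left + right if op == "+" else left * right
        return [op, left, right]
    return node  # an integer constant or a variable name


def stage1(ir1) -> str:
    """Optimize machine-independent IR1 and stream it to text."""
    return json.dumps(fold(ir1))


def lower(node, code):
    """Translate the reloaded tree into IR2, here code for a stack machine."""
    if isinstance(node, list):
        lower(node[1], code)
        lower(node[2], code)
        code.append(("ADD",) if node[0] == "+" else ("MUL",))
    elif isinstance(node, int):
        code.append(("PUSH", node))
    else:
        code.append(("LOAD", node))
    return code


def stage2(text: str, env: dict) -> int:
    """Load the text, build IR2, and interpret it."""
    ir2 = lower(json.loads(text), [])
    stack = []
    for instr in ir2:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instr[0] == "ADD" else left * right)
    return stack.pop()


# x + 2 * 3: the constant part is folded before the IR leaves stage 1.
text = stage1(["+", "x", ["*", 2, 3]])
print(text)
print(stage2(text, {"x": 4}))
```

Because the only link between the stages is the text, stage 2 can run later, on another machine, or at program load time, which is what the Static/Dynamic combinations in the table below distinguish.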

Stage 1   Stage 2            Examples
Static    Static             SUIF, Promis
Static    Dynamic            Sun JIT, .NET JITer
Static    Static + Dynamic   DyC, Quicksilver

Staged Compilation

Prepare IR1 so that stage 2 is very cheap
– Quicksilver
    Insert templated optimized object code in bytecode
    Pack speculative optimization validation predicates in bytecode
    Keep method dependence graphs explicitly in bytecode
– Microsoft .NET
    Explicit type/class information in IL
    Preformatted, quickly accessible metadata
        – Strings, tables, heaps
        – Custom data
    Allow embedding of native code

Generate Fast And Portable Code

Fast code
– IR close to machine structure
    Mapping data to registers
    Mapping operations to opcodes
    Scheduling instructions for superscalar/VLIW processors

Portable code
– Machine description must be totally abstracted

QuickSilver: templated optimized code

Generate Verifiable Code

Microsoft .NET IL
– Static and dynamic type safety (reflection)
– Managed code
    Carries a minimum of information on itself
    Usually signed by compiler in Stage 1
– Managed data
    Only accessible from managed code
    Garbage collected
– Managed pointers

Generate Secure Code

Hard to define limits
– Make sure you run what you mean to
– Limit rights
    Per user
    Per software component

QuickSilver: digests

.NET IL:
– Code is signed by encrypting a hash of the original
– Permissions are set per module
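
The "encrypt a hash of the original" idea can be sketched with textbook RSA from the standard library alone. This is a toy under loudly stated assumptions: the key is built from two small Mersenne primes, there is no padding, and the names `digest`, `sign` and `verify` are hypothetical. It is not how .NET strong-name signing is implemented and it must not be used for real security. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib

# Toy RSA key (assumption for illustration only): two known Mersenne primes.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def digest(module: bytes) -> int:
    """Hash the module and reduce the hash so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(module).digest(), "big") % N


def sign(module: bytes) -> int:
    """Producer side (stage 1): transform the hash with the private key."""
    return pow(digest(module), D, N)


def verify(module: bytes, signature: int) -> bool:
    """Consumer side (stage 2): recover the hash with the public key and compare."""
    return pow(signature, E, N) == digest(module)


module = b"intermediate code of some module"
signature = sign(module)
print(verify(module, signature))
print(verify(module + b" (patched)", signature))
```

Verification answers "make sure you run what you mean to": any change to the module after signing changes its hash, so the check fails.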

Generate Efficient Code

IR may also provide support for:
– Versioning (Quicksilver, .NET)
– Culture (.NET)

Compiler Internal Representation - General Organization

High-level: completely machine independent
– Abstract Syntax Tree
– Control Flow Graph
– Control Dependence Graph
– Data Dependence Graph
– Static Single Assignment

Medium-level: dependent on classes of machines
– Virtual machine code, such as stack machine

Low-level: dependent on particular ISA
– Assembly, machine instruction graphs
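
As one illustration of the high-level structures listed above, a control flow graph can be derived from linear code by finding basic block leaders: the first instruction, every label, and every instruction that follows a branch. The tuple-based instruction format and the name `build_cfg` are assumptions made for this sketch.

```python
def build_cfg(code):
    """Split linear code into basic blocks and return (blocks, edges).

    Instructions are tuples: ("op", text), ("label", name), ("goto", name),
    ("if", condition, name) or ("ret",). Edges are pairs of block indices.
    """
    if not code:
        return [], []

    # Leaders: first instruction, branch targets, and instructions after a branch.
    leaders = {0}
    for i, instr in enumerate(code):
        if instr[0] == "label":
            leaders.add(i)
        if instr[0] in ("goto", "if", "ret") and i + 1 < len(code):
            leaders.add(i + 1)

    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    blocks = [code[s:e] for s, e in zip(starts, ends)]

    # A label is always a leader, so it is the first instruction of its block.
    block_of_label = {
        block[0][1]: index
        for index, block in enumerate(blocks)
        if block[0][0] == "label"
    }

    edges = set()
    for index, block in enumerate(blocks):
        last = block[-1]
        if last[0] in ("goto", "if"):
            edges.add((index, block_of_label[last[-1]]))  # taken branch
        if last[0] not in ("goto", "ret") and index + 1 < len(blocks):
            edges.add((index, index + 1))  # fall through
    return blocks, sorted(edges)


loop = [
    ("op", "i = 0"),
    ("label", "loop"),
    ("if", "i >= n", "done"),
    ("op", "i = i + 1"),
    ("goto", "loop"),
    ("label", "done"),
    ("ret",),
]
blocks, edges = build_cfg(loop)
print(len(blocks), edges)  # 4 [(0, 1), (1, 2), (1, 3), (2, 1)]
```

The back edge (2, 1) is the loop; structures such as the control dependence graph and SSA form are computed on top of this graph.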

Case Study: Polaris

High-level representation
– Abstract Syntax Tree
– Control Flow Graph
– Control Dependence Graph
– Data Dependence Graph
– Gated Static Single Assignment

Some generality
– Backends for various parallel execution systems

Case Study: SUIF2

Multiple-level representation
– CFG, CDG, …
– Quads
– Machsuif
– Custom annotations

Multiple frontends: Fortran, C, Java

Multiple backends: SUIF VM, C, assembly

Decoupled passes communicate only via SUIF

Extendable: OSUIF

Case Study: Promis

Switch to Promis organization presentation

Switch to Promis IR presentation

Case Study: KCC

Kuck and Associates (KAI) C++ compiler:
– C++ dedicated internal representation
    Advanced C++ specific optimization
– Proprietary C++ specific object format
    Interprocedural optimization with modular compilation
    C++ specific debug information, usable with KDB
– Outputs C with calls to proprietary run-time library
– Uses GNU gcc to generate machine code

Case Study: Jalapeno QuickSilver

Quasi-static images
– Java bytecode + proprietary format

Representation allows for optimizations
– Explicit method dependence graph
– Templated optimized object code
– Speculative optimization validation predicates

Case Study: .NET

Advertised as a nine-figure (USD) project

CLI (ECMA standard)
– Common type system
    Type info in intermediate code
– Common exception system
    Throw in Visual Basic, catch in C++
– Support for security, culture, versioning
– Support for charging per-use
– Custom info can carry descriptions specific to the original language

30+ languages → MSIL → native code

Other Compilers – Open Source

GNU compiler:
– C, Fortran, Java, C++ front-ends
– Generates code for all major architectures
– Low level internal representation
– New version (3.x) has SSA

SGI open source project: discontinued

Other Compilers – Commercial

Fortran, C, C++, Java compilers produced by OS and/or hardware vendors
– HP, SGI, Intel, Microsoft, Sun

Other commercial compiler producers:
– Borland, Watcom, etc.

Internal representation – company secret

Conclusions

Internal representation evolved
– Programming paradigms
– Changes in hardware
– Changes in compiler/run-time system technology
– New issues: security, verifiability, culture, versioning

Tendency: E Pluribus Unum