CS412/413
Introduction toCompilers and Translators
May 3, 1999
Lecture 34: Compiler-like Systems
JIT
byteco
de
interpre
ter
src-to-srctranslator
bytecodecompiler
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
2
Administration
• Project code must be turned in Friday even if you haven’t finished the programming assignment -- at minimum, turn in something that can compile programs.
• If parts of program are incomplete, include explanation of design, and components that are working
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
3
A CompilerLexical analysis
Parsing
Intermediate Code Generation
MIR Optimization
Code generation
LIR optimization
HIR Optimization
Linking & Loading
“CompilerClassic”
bytecodecompiler
JITcompiler
front end
back end
source-to-source
translator
interpreter
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
4
Other options• Compile from source to another high-level
language (source-to-source translator), let other compiler deal with code gen, etc.
• Compile from source to an intermediate code format for which a back end already exists (ucode, RTF, LCC, ...)
• Compile from source to an intermediate code format executable by an interpreter– abstract syntax tree– bytecodes– threaded code (call- or pointer-threaded)
• Advantage: Architecture independence
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
5
Source-to-source translator
• Idea: choose well-supported high-level language (e.g. C, C++, Java) as target
• Translate AST to high-level language constructs instead of to IR, pass translated code off to underlying compiler
• Advantage: easy, can leverage good underlying compiler technology. Examples: C++ (to C), PolyJ (to Java), Toba (JVM to C)
• Disadvantages: target language won’t support all features, optimization harder in target language, language may impose extra checks
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
6
Compiling to C• C doesn’t impose extra checks, is
reasonably close to assembly (but can’t support static exception tables!), widely available
• Mismatch: doesn’t allow statements underneath expressions; need to translate and canonicalize in one step
• Translation of an expression into C is:– a sequence of statements to be executed– an expression to be evaluated afterward
E S1;…;Sn , E’
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
7
Translation rules• Translation still can be performed
by recursive traversal of AST• IotaC rules:
E S , E’
id = E; S , id = E’;
Si Si’ , Ei’
{ S1;…; Sn } S1’;…; Sn’ , En’block
assignment
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
8
Translating to Java• Same problems as C, plus: Java is
type-safe (good in a HLL, not so good in an intermediate language!)
• May need to use cast and instanceof expressions in generated code: slow in Java
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
9
Back Ends• Several standard intermediate code
formats exist with back ends for various architectures -- can reuse back ends– p-code: very old stack machine format– UCODE: old Stanford/MIPS stack machine
format– Java bytecode: new stack machine format– RTL: GNU gcc, etc.– SUIF: Stanford format for optimization– LCC: Lightweight C compiler
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
10
Intermediate code formats• Quadruples
– compact, similar to machine code, good for standard optimization techniques
• Stack-machine– easy to generate code for– not compact (contrary to popular opinion)– can be converted back into quadruples– used by some (sort of) high-level
languages: FORTH, PostScript, HP calculators
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
11
Stack machine format• Code is a sequence of stack
operations (not necessarily the same stack as the call stack)
push CONST : add CONST to the top of stackpop : discard top of stackstore : in memory location specified by
top of stack store element just below.load : replace top of stack with memory
location it points to+, *, /, -, … : replace top two elems w/
result of operation
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
12
Generating code• Stack operations mostly don’t name
operands (implicit): can code in 1 byte• Expression is translated to code that
leaves its value on the top of stack
• Translation of E1 + E2:
E1 + E2 E1; E2; +
• Translation of id = E : id = E E ; push addr(id); store• Good for very simple code generation
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
13
Verbosity• Values get “trapped” down low on
stack especially with subexpression elim.
• Need instruction that re-pushes element at known stack index on top; instruction is used often
• Might as well have abstract register operands!
• Result: less compact than a register-based format; extra copies of data too
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
14
Stack machine register machine
• At each point in code, keep track of stack depth (if possible)
• Assign temporaries according to depth
• Replace stack operands with quadruples using these temporariesa b c
a + b*c
push a ; 0push b ; 1push c ; 2* ; 1+ ; 0
t0 = at1 = bt2 = ct1 = t1 * t2t0 = t0 + t1
0 1 2
a b*ca+b*c
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
15
JIT compilers• Particularly widely available back end
with well-defined intermediate code (JVM bytecode)
• Generate code by reconstructing registers as discussed
• Compilation is done on-the-fly: generating code quickly is essential generated code quality is usually low
• HotSpot: new Sun JIT. High-quality optimization (esp. inlining and specialization), but used sparingly
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
16
Interpreters• “Why generate machine code at
all? Just run it. Processors are really fast”
• Options:– token interpreters (parsing on the
fly) -- reallly slow (>1000x)– AST interpreters -- 300x– threaded interpreters -- 20-50x– bytecode interpreters -- 10-30x
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
17
AST interpreters“Yet another recursive traversal”• For every node type in AST, add
method Object evaluate(RunTimeContext r)• Evaluate method is implemented
recursivelyObject PlusNode.evaluate(r) {
return left.evaluate(r).plus(right.evaluate(r)); }
• Variables, etc. looked up in r; some help from AST yields big speed-ups (e.g. pre-computed variable locations)
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
18
Threaded interpreters• Code is represented as a bunch of blocks of
memory connected by pointers. Blocks are either code blocks or pointer blocks.
• Execution is (mostly) depth-first traversal• Each block starts with a single jump
instruction
JMP nextJMP
machinecode
RET
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
19
Threaded interpreters• Can call blocks directly regardless of
kind• Conditional branches, other control
structure embedded in special machine code blocks that munge stack
• Interpreter proper is ~10 instructions of code: very space-efficient!
• Long favored by FORTH fans: whole language runtime fits in 4K memory
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
20
Call-threaded interpreters• Instead of pointers in pointer blocks,
store CALL instructions• Depth-first traversal done by processor
automatically• Faster, trivial code generation required
CALLCALLCALL
machinecode
RETCALL
RETCALLRET
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
21
Bytecode interpreters• Problem with threaded interpreters:
operations with immediate arguments require multiple words (code bloat)
• Observation: most interpreter instructions can be encoded in less than a word, many even in just a byte (esp. with stack-machine model)
• Tradeoff: interpreter size vs. executable code size
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
22
Implementing a bytecode interpreter
• Bytecode interpreter simulates a simple architecture (either stack or register machine)
• Interpreter state:– current code pointer– current simulated return stack– current registers or stack & stack pointer
• Interpreter code is a big loop containing a switch over kinds of bytecode instructions– one big function: optimization techniques work
well. Ideally: no recursion on function calls
• Result: about 10x slowdown if done right
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
23
Summary• Building a new system for executing
code doesn’t require construction of a full compiler
• Cost-effective strategies: source-to-source translation or translation to an existing intermediate code format
• Most material covered in this course is still helpful for these approaches
• High performance: translate to C• Portability, extensibility: translate to Java
or JVM
Top Related