Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End...
-
Upload
gerald-gregory -
Category
Documents
-
view
226 -
download
4
Transcript of Chapter 14: Building a Runnable Program. - 1 - Chapter 14: Building a runnable program 14.1 Back-End...
Chapter 14: Building a Runnable Program
- 2 -
Chapter 14: Building a runnable program
14.1 Back-End Compiler Structure
14.2 Intermediate Forms
14.3 Code Generation
14.4 Address Space Organization
- 3 -
Where We Are...
Lexical Analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Gen
Source code(character stream)
token stream
abstract syntaxtree
abstract syntaxtree + symboltables, types
Intermediate code
regular expressions
grammars
static semantics
- 4 -
Chapter 14: Building a runnable program
14.1 Back-End Compiler Structure
14.2 Intermediate Forms
14.3 Code Generation
14.4 Address Space Organization
- 5 -
Intermediate Representation (aka IR)
The compilers internal representation» Is language-independent and machine-
independent
AST IR
Pentium
Java bytecode
Itanium
TI C5x
ARM
optimize
Enables machine independentand machine dependent optis
- 6 -
What Makes a Good IR?
Captures high-level language constructs» Easy to translate from AST» Supports high-level optimizations
Captures low-level machine features» Easy to translate to assembly» Supports machine-dependent optimizations
Narrow interface: small number of node types (instructions)» Easy to optimize» Easy to retarget
- 7 -
Multiple IRs
Most compilers use 2 IRs:» High-level IR (HIR): Language independent but
closer to the language
» Low-level IR (LIR): Machine independent but closer to the machine
» A significant part of the compiler is both language and machine independent!
AST HIR
Pentium
Java bytecodeItaniumTI C5xARM
optimize
LIR
optimizeoptimize
C++C
Fortran
- 8 -
High-Level IR
HIR is essentially the AST» Must be expressive for all input languages
Preserves high-level language constructs» Structured control flow: if, while, for, switch» Variables, expressions, statements, functions
Allows high-level optimizations based on properties of source language» Function inlining, memory dependence
analysis, loop transformations
- 9 -
Low-Level IR
A set of instructions which emulates an abstract machine (typically RISC)
Has low-level constructs» Unstructured jumps, registers, memory
locations Types of instructions
» Arithmetic/logic (a = b OP c), unary operations, data movement (move, load, store), function call/return, branches
- 10 -
Alternatives for LIR
3 general alternatives» Three-address code or quadruples
a = b OP c Advantage: Makes compiler analysis/opti easier
» Tree representation Was popular for CISC architectures Advantage: Easier to generate machine code
» Stack machine Like Java bytecode Advantage: Easier to generate from AST
- 11 -
Three-Address Code
a = b OP c» Originally, because instruction had at most 3
addresses or operands This is not enforced today, ie MAC: a = b * c + d
» May have fewer operands Also called quadruples: (a,b,c,OP) Example
a = (b+c) * (-e) t1 = b + ct2 = -ea = t1 * t2
Compiler-generatedtemporary variable
- 12 -
IR Instructions Assignment instructions
» a = b OP C (binary op) arithmetic: ADD, SUB,
MUL, DIV, MOD logic: AND, OR, XOR comparisons: EQ, NEQ,
LT, GT, LEQ, GEQ
» a = OP b (unary op) arithmetic MINUS,
logical NEG
» a = b : copy instruction
» a = [b] : load instruction
» [a] = b : store instruction
» a = addr b: symbolic address
Flow of control» label L: label instruction
» jump L: unconditional jump
» cjump a L : conditional jump
Function call» call f(a1, ..., an)
» a = call f(a1, ..., an)
IR describes the instruction set of an abstract machine
- 13 -
IR Operands
The operands in 3-address code can be:» Program variables» Constants or literals» Temporary variables
Temporary variables = new locations» Used to store intermediate values» Needed because 3-address code not as
expressive as high-level languages
- 14 -
Chapter 14: Building a runnable program
14.1 Back-End Compiler Structure
14.2 Intermediate Forms
14.3 Code Generation
14.4 Address Space Organization
- 15 -
Translating High IR to Low IR
May have nested language constructs» E.g., while nested within an if statement
Need an algorithmic way to translate» Strategy for each high IR construct» High IR construct sequence of low IR
instructions Solution
» Start from the high IR (AST like) representation» Define translation for each node in high IR» Recursively translate nodes
- 16 -
Notation
Use the following notation:» [[e]] = the low IR representation of high IR construct
e
[[e]] is a sequence of low IR instructions If e is an expression (or statement expression), it
represents a value» Denoted as: t = [[e]]
» Low IR representation of e whose result value is stored in t
For variable v: t = [[v]] is the copy instruction» t = v
- 17 -
Translating Expressions
Binary operations: t = [[e1 OP e2]]» (arithmetic, logical operations and comparisons)
Unary operations: t = [[OP e]]
OP
e1 e2
t1 = [[e1]]t2 = [[e2]]t1 = t1 OP t2
OP
e1
t1 = [[e1]]t = OP t1
- 18 -
Translating Array Accesses
Array access: t = [[ v[e] ]]» (type of e is array [T] and S = size of T)
t1 = addr vt2 = [[e]]t3 = t2 * St4 = t1 + t3t = [t4] /* ie load */
array
v e
- 19 -
Translating Structure Accesses
Structure access: t = [[ v.f ]]» (v is of type T, S = offset of f in T)
t1 = addr vt2 = t1 + St = [t2] /* ie load */
struct
v f
- 20 -
Translating Short-Circuit OR
Short-circuit OR: t = [[e1 SC-OR e2]]» e.g., || operator in C/C++
t = [[e1]]cjump t Lendt = [[e2]]Lend:
semantics:1. evaluate e12. if e1 is true, then done3. else evaluate e2
SC-OR
e1 e2
- 21 -
Class Problem
Short-circuit AND: t = [[e1 SC-AND e2]]» e.g., && operator in C/C++
Semantics:1. Evaluate e12. if e1 is true, then evaluate e23. else done
- 22 -
Translating Statements
Statement sequence: [[s1; s2; ...; sN]]
IR instructions of a statement sequence = concatenation of IR instructions of statements
[[ s1 ]][[ s2 ]]...[[ sN ]]
seq
s1 s2 sN...
- 23 -
Assignment Statements
Variable assignment: [[ v = e ]]
Array assignment: [[ v[e1] = e2 ]]
v = [[ e ]]
t1 = addr vt2 = [[e1]]t3 = t2 * St4 = t1 + t3t5 = [[e2][t4] = t5 /* ie store */
recall S = sizeof(T)where v is array(T)
- 24 -
Translating If-Then [-Else]
[[ if (e) then s ]] [[ if (e) then s1 else s2 ]]
t1 = [[ e ]]t2 = not t1cjump t2 LelseLthen: [[ s1 ]]jump LendLelse: [[ s2 ]]Lend:
t1 = [[ e ]]t2 = not t1cjump t2 Lend[[ s ]]Lend:
How could I do this moreefficiently??
- 25 -
While Statements
[[ while (e) s ]]
Lloop: t1 = [[ e ]]t2 = NOT t1cjump t2 Lend[[ s ]]jump LloopLend:
or
while-do translation do-while translation
t1 = [[ e ]]t2 = NOT t1cjump t2 LendLloop: [[ s ]]t3 = [[ e ]]cjump t3 LloopLend:
Which is better and why?
- 26 -
Class Problem
n = 0;while (n < 10) {
n = n+1;}
Convert the following code segment to IR
- 27 -
Switch Statements
[[ switch (e) case v1:s1, ..., case vN:sN ]]
t = [[ e ]]L1: c = t != v1cjump c L2[[ s1 ]]jump Lend /* if there is a break */L2: c = t != v2cjump c L3[[ s2 ]]jump Lend /* if there is a break */...Lend:
Can also implementswitch as table lookup.Table contains targetlabels, ie L1, L2, L3.‘t’ is used to index table.
Benefit: k branchesreduced to 1. Negative: target of branchhard to figure out inhardware
- 28 -
Call and Return Statements
[[ call f(e1, e2, ..., eN) ]]
[[ return e ]]
t1 = [[ e1 ]]t2 = [[ e2 ]]...tN = [[ eN ]]call f(t1, t2, ..., tN)
t = [[ e ]]return t
- 29 -
Statement Expressions
So far: statements which do not return values
Easy extensions for statement expressions:» Block statements» If-then-else» Assignment statements
t = [[ s ]] is the sequence of low IR code for statement s, whose result is stored in t
- 30 -
Statement Expressions
t = [[ if (e) then s1 else s2 ]]
t = [[ s1; s2; .. sN ]]
Result value of a block statement = value of last stmt in the sequence
t1 = [[ e ]]cjump t1 Lthent = [[ s2 ]]jump LendLthen: t = [[ s1 ]]Lend:
[[ s1 ]][[ s2 ]]...t = [[ sN ]]
- 31 -
Assignment Statements
t = [[ v = e ]]
Result value of an assignment statement = value of the assigned expression
v = [[ e ]]t = v
- 32 -
Nested Expressions
Translation recurses on the expression structure
Example: t = [[ (a – b) * (c + d) ]]
t1 = at2 = bt3 = t1 – t2t4 = ct5 = dt5 = t4 + t5t = t3 * t5
[[ (a – b) ]]
[[ (c + d) ]]
[[ (a-b) * (c+d) ]]
- 33 -
Nested Statements
Same for statements: recursive translation Example: t = [[ if c then if d then a = b ]]
t1 = ct2 = NOT t1cjump t2 Lend1t3 = dt4 = NOT t3cjump t4 Lend2t3 = ba = t3Lend2:Lend1:
[[ if c ... ]][[ a = b ]]
[[ if d ... ]]
- 34 -
Class Problem
for (i=0; i<100; i++) { A[i] = 0;}
if ((a > 0) && (b > 0)) c = 2;else c = 3;
Translate the following to the generic assembly code discussed
- 35 -
Chapter 14: Building a runnable program
14.1 Back-End Compiler Structure
14.2 Intermediate Forms
14.3 Code Generation
14.4 Address Space Organization
- 36 -
Issues
These translations are straightforward But, inefficient:
» Lots of temporaries» Lots of labels» Lots of instructions
Can we do this more intelligently?» Should we worry about it?
- 37 -
2 Classes of Storage in Processor
Registers» Fast access, but only a few of them» Address space not visible to programmer
Doesn’t support pointer access!
Memory» Slow access, but large» Supports pointers
Storage class for each variable generally determined when map HIR to LIR
- 38 -
Storage Class Selection
Standard (simple) approach» Globals/statics – memory
» Locals Composite types (structs, arrays, etc.) – memory Scalars
Accessed via ‘&’ operator? – memory Rest – Virtual register, later we will map virtual registers to true
machine registers. Note, as a result, some local scalars may be “spilled to memory”
All memory approach» Put all variables into memory
» Register allocation relocates some mem vars to registers
- 39 -
4 Distinct Regions of Memory
Code space – Instructions to be executed» Best if read-only
Static (or Global) – Variables that retain their value over the lifetime of the program
Stack – Variables that is only as long as the block within which they are defined (local)
Heap – Variables that are defined by calls to the system storage allocator (malloc, new)
- 40 -
Memory Organization
Code
Static Data
Stack
Heap
. . .
Code and staticdata sizes determinedby the compiler
Stack and heap sizesvary at run-time Stack grows downward Heap grows upward
Some ABI’s have stack/heap switched
- 41 -
Class Problem
Specify whether each variable is stored in register or memory.For memory which area of the memory?
int a;void foo(int b, double c){ int d; struct { int e; char f;} g; int h[10]; char i = 5; float j;}