Compiler Construction Notes
-
Upload
apoorva-bhatt -
Category
Documents
-
view
9 -
download
2
description
Transcript of Compiler Construction Notes
Introduction to Compiler ConstructionASU Textbook Chapter 1
Tsan-sheng [email protected]
http://www.iis.sinica.edu.tw/~tshsu
1
What is a compiler?
Definitions:• a recognizer ;
• a translator .
source program ⇒ compiler ⇒ target program
• Source and target must be equivalent!
Compiler writing spans:• programming languages;• machine architecture;• language theory;• algorithms and data structures;• software engineering.
History:• 1950: the first FORTRAN compiler took 18 man-years;• now: using software tools, can be done in a few months as a student’s
project.
Compiler notes #1, 20060224, Tsan-sheng Hsu 2
Applications
Computer language compilers.Translator: from one format to another.
• query interpreter• text formatter• silicon compiler• infix notation → postfix notation:
3 + 5− 6 ∗ 6 ⇒ 3 5 + 6 6 ∗ −• pretty printers• · · ·
Computational theory:• a set of grammar rules ≡ the definition of a particular machine.
. also equivalent to a set of languages recognized by this machine.
• a type of machines: a family of machines with a given set of operations,or capabilities;
• power of a type of machines≡ the set of languages that can be recognized by this type of machines.
Compiler notes #1, 20060224, Tsan-sheng Hsu 3
Flow chart of a typical compiler
source code
target code
lexical analyzer (scanner)
syntax analyzer (parser)
semantic analyzer
intermediate code generator
code optimizer
code generator
sequence of characters
sequence of tokens
abstract−syntax tree
annoted abstract−syntax tree
intermediate code
optimized intermediate code
error handlersymboltable
Compiler notes #1, 20060224, Tsan-sheng Hsu 4
Scanner
Actions:• Reads characters from the source program;
• Groups characters into lexemes , i.e., sequences of characters that
“go together”, following a given pattern ;
• Each lexeme corresponds to a token .
. the scanner returns the next token, plus maybe some additional infor-mation, to the parser;
• The scanner may also discover lexical errors, i.e., erroneous characters.
The definitions of what a lexeme , token or bad characteris depend on the definition of the source language.
Compiler notes #1, 20060224, Tsan-sheng Hsu 5
Scanner example for C
Lexeme: C sentence
L1: x = y2 + 12;
(Lexeme) L1 : x = y2 + 12 ;
(Token) ID COLON ID ASSIGN ID PLUS INT SEMI-COL
Arbitrary number of blanks between lexemes.Erroneous sequence of characters, that are not parts ofcomments, for the C language:
• control characters• @• 2abc
Compiler notes #1, 20060224, Tsan-sheng Hsu 6
Parser
Actions:• Group tokens into grammatical phrases , to discover the underlying
structure of the source• Find syntax errors , e.g., the following C source line:
(Lexeme) index = 12 * ;
(Token) ID ASSIGN INT TIMES SEMI-COL
Every token is legal, but the sequence is erroneous!
May find some static semantic errors , e.g., use of undeclaredvariables or multiple declared variables.May generate code, or build some intermediate representationof the source program, such as an abstract-syntax tree.
Compiler notes #1, 20060224, Tsan-sheng Hsu 7
Parser example for C
Source code: position = initial + rate ∗ 60;Abstract-syntax tree:
=
position +
initial *rate 60
• interior nodes of the tree are OPERATORS;• a node’s children are its OPERANDS;
• each subtree forms a logical unit .
• the subtree with ∗ at its root shows that ∗ has higher precedencethan +, the operation “rate ∗ 60” must be performed as a unit, not“initial + rate”.
Compiler notes #1, 20060224, Tsan-sheng Hsu 8
Semantic analyzer
Actions:• Check for more static semantic errors, e.g., type errors .
• May annotate and/or change the abstract syntax tree.
=
position +
initial *rate 60
=
position +
initial *rate
60
(float)
(float)(float)
int−to−float()
Compiler notes #1, 20060224, Tsan-sheng Hsu 9
Intermediate code generator
Actions: translate from abstract-syntax trees to intermediatecodes.
One choice for intermediate code is 3-address code :• Each statement contains
. at most 3 operands;
. in addition to “:=”, i.e., assignment, at most one operator.
• An”easy” and “universal” format that can be translated into mostassembly languages.
Example:
=
position +
initial *rate
60
(float)
(float)(float)
int−to−float()
temp1 := int-to-float(60)
temp2 := rate * temp1
temp3 := initial + temp2
position := temp3
Compiler notes #1, 20060224, Tsan-sheng Hsu 10
Optimizer
Improve the efficiency of intermediate code.
Goal may be to make code run faster , and/or to use leastnumber of registers · · ·
Example:
temp1 := int-to-float(60)
temp2 := rate * temp1
temp3 := initial + temp2
position := temp3
⇒ temp2 := rate * 60.0
position := initial + temp2
Current trends:• to obtain smaller, but maybe slower, equivalent code for embedded
systems;• to reduce power consumption.
Compiler notes #1, 20060224, Tsan-sheng Hsu 11
Code generation
A compiler may generate• pure machine codes (machine dependent assembly language) directly,
which is rare now ;• virtual machine code.
Example:
• PASCAL → compiler → P-code → interpreter → execution
• Speed is roughly 4 times slower than running directly generated machinecodes.
Advantages:• simplify the job of a compiler;
• decrease the size of the generated code: 1/3 for P-code ;
• can be run easily on a variety of platforms. P-machine is an ideal general machine whose interpreter can be written
easily;. divide and conquer;. recent example: JAVA and Byte-code.
Compiler notes #1, 20060224, Tsan-sheng Hsu 12
Code generation example
temp2 := rate * 60.0
position := initial + temp2=⇒
LOADF rate, R1
MULF #60.0, R1
LOADF initial, R2
ADDF R2, R1
STOREF R1, position
Compiler notes #1, 20060224, Tsan-sheng Hsu 13
Practical considerations (1/2)
Preprocessing phase:• macro substitution:
. #define MAXC 10
• rational preprocessing: add new features for old languages.. BASIC. C → C + +
• compiler directives:. #include <stdio.h>
• non-standard language extensions.. adding parallel primitives
Compiler notes #1, 20060224, Tsan-sheng Hsu 14
Practical considerations (2/2)
Passes of compiling
• First pass reads the text file once.
• May need to read the text one more time for any forward addressedobjects, i.e., anything that is used before its declaration.
• Example: C language
goto error handling;
· · ·error handling:
· · ·
Compiler notes #1, 20060224, Tsan-sheng Hsu 15
Reduce number of passes
Each pass takes I/O time.
Back-patching : leave a blank slot for missing information, and
fill in the empty slot when the information becomes available.Example: C languagewhen a label is used
• if it is not defined before, save a trace into the to-be-processed table
. label name corresponds to LABEL TABLE[i]
• code generated: GOTO LABEL TABLE[i]
when a label is defined• check known labels for redefined labels• if it is not used before, save a trace into the to-be-processed table• if it is used before, then find its trace and fill the current address into
the trace
Time and space trade-off !
Compiler notes #1, 20060224, Tsan-sheng Hsu 16