Compiler Construction Notes

16
Introduction to Compiler Construction ASU Textbook Chapter 1 Tsan-sheng Hsu [email protected] http://www.iis.sinica.edu.tw/~tshsu 1

description

Compiler Construction

Transcript of Compiler Construction Notes

Page 1: Compiler Construction Notes

Introduction to Compiler ConstructionASU Textbook Chapter 1

Tsan-sheng [email protected]

http://www.iis.sinica.edu.tw/~tshsu

1

Page 2: Compiler Construction Notes

What is a compiler?

Definitions:• a recognizer ;

• a translator .

source program ⇒ compiler ⇒ target program

• Source and target must be equivalent!

Compiler writing spans:• programming languages;• machine architecture;• language theory;• algorithms and data structures;• software engineering.

History:• 1950: the first FORTRAN compiler took 18 man-years;• now: using software tools, can be done in a few months as a student’s

project.

Compiler notes #1, 20060224, Tsan-sheng Hsu 2

Page 3: Compiler Construction Notes

Applications

Computer language compilers.Translator: from one format to another.

• query interpreter• text formatter• silicon compiler• infix notation → postfix notation:

3 + 5− 6 ∗ 6 ⇒ 3 5 + 6 6 ∗ −• pretty printers• · · ·

Computational theory:• a set of grammar rules ≡ the definition of a particular machine.

. also equivalent to a set of languages recognized by this machine.

• a type of machines: a family of machines with a given set of operations,or capabilities;

• power of a type of machines≡ the set of languages that can be recognized by this type of machines.

Compiler notes #1, 20060224, Tsan-sheng Hsu 3

Page 4: Compiler Construction Notes

Flow chart of a typical compiler

source code

target code

lexical analyzer (scanner)

syntax analyzer (parser)

semantic analyzer

intermediate code generator

code optimizer

code generator

sequence of characters

sequence of tokens

abstract−syntax tree

annoted abstract−syntax tree

intermediate code

optimized intermediate code

error handlersymboltable

Compiler notes #1, 20060224, Tsan-sheng Hsu 4

Page 5: Compiler Construction Notes

Scanner

Actions:• Reads characters from the source program;

• Groups characters into lexemes , i.e., sequences of characters that

“go together”, following a given pattern ;

• Each lexeme corresponds to a token .

. the scanner returns the next token, plus maybe some additional infor-mation, to the parser;

• The scanner may also discover lexical errors, i.e., erroneous characters.

The definitions of what a lexeme , token or bad characteris depend on the definition of the source language.

Compiler notes #1, 20060224, Tsan-sheng Hsu 5

Page 6: Compiler Construction Notes

Scanner example for C

Lexeme: C sentence

L1: x = y2 + 12;

(Lexeme) L1 : x = y2 + 12 ;

(Token) ID COLON ID ASSIGN ID PLUS INT SEMI-COL

Arbitrary number of blanks between lexemes.Erroneous sequence of characters, that are not parts ofcomments, for the C language:

• control characters• @• 2abc

Compiler notes #1, 20060224, Tsan-sheng Hsu 6

Page 7: Compiler Construction Notes

Parser

Actions:• Group tokens into grammatical phrases , to discover the underlying

structure of the source• Find syntax errors , e.g., the following C source line:

(Lexeme) index = 12 * ;

(Token) ID ASSIGN INT TIMES SEMI-COL

Every token is legal, but the sequence is erroneous!

May find some static semantic errors , e.g., use of undeclaredvariables or multiple declared variables.May generate code, or build some intermediate representationof the source program, such as an abstract-syntax tree.

Compiler notes #1, 20060224, Tsan-sheng Hsu 7

Page 8: Compiler Construction Notes

Parser example for C

Source code: position = initial + rate ∗ 60;Abstract-syntax tree:

=

position +

initial *rate 60

• interior nodes of the tree are OPERATORS;• a node’s children are its OPERANDS;

• each subtree forms a logical unit .

• the subtree with ∗ at its root shows that ∗ has higher precedencethan +, the operation “rate ∗ 60” must be performed as a unit, not“initial + rate”.

Compiler notes #1, 20060224, Tsan-sheng Hsu 8

Page 9: Compiler Construction Notes

Semantic analyzer

Actions:• Check for more static semantic errors, e.g., type errors .

• May annotate and/or change the abstract syntax tree.

=

position +

initial *rate 60

=

position +

initial *rate

60

(float)

(float)(float)

int−to−float()

Compiler notes #1, 20060224, Tsan-sheng Hsu 9

Page 10: Compiler Construction Notes

Intermediate code generator

Actions: translate from abstract-syntax trees to intermediatecodes.

One choice for intermediate code is 3-address code :• Each statement contains

. at most 3 operands;

. in addition to “:=”, i.e., assignment, at most one operator.

• An”easy” and “universal” format that can be translated into mostassembly languages.

Example:

=

position +

initial *rate

60

(float)

(float)(float)

int−to−float()

temp1 := int-to-float(60)

temp2 := rate * temp1

temp3 := initial + temp2

position := temp3

Compiler notes #1, 20060224, Tsan-sheng Hsu 10

Page 11: Compiler Construction Notes

Optimizer

Improve the efficiency of intermediate code.

Goal may be to make code run faster , and/or to use leastnumber of registers · · ·

Example:

temp1 := int-to-float(60)

temp2 := rate * temp1

temp3 := initial + temp2

position := temp3

⇒ temp2 := rate * 60.0

position := initial + temp2

Current trends:• to obtain smaller, but maybe slower, equivalent code for embedded

systems;• to reduce power consumption.

Compiler notes #1, 20060224, Tsan-sheng Hsu 11

Page 12: Compiler Construction Notes

Code generation

A compiler may generate• pure machine codes (machine dependent assembly language) directly,

which is rare now ;• virtual machine code.

Example:

• PASCAL → compiler → P-code → interpreter → execution

• Speed is roughly 4 times slower than running directly generated machinecodes.

Advantages:• simplify the job of a compiler;

• decrease the size of the generated code: 1/3 for P-code ;

• can be run easily on a variety of platforms. P-machine is an ideal general machine whose interpreter can be written

easily;. divide and conquer;. recent example: JAVA and Byte-code.

Compiler notes #1, 20060224, Tsan-sheng Hsu 12

Page 13: Compiler Construction Notes

Code generation example

temp2 := rate * 60.0

position := initial + temp2=⇒

LOADF rate, R1

MULF #60.0, R1

LOADF initial, R2

ADDF R2, R1

STOREF R1, position

Compiler notes #1, 20060224, Tsan-sheng Hsu 13

Page 14: Compiler Construction Notes

Practical considerations (1/2)

Preprocessing phase:• macro substitution:

. #define MAXC 10

• rational preprocessing: add new features for old languages.. BASIC. C → C + +

• compiler directives:. #include <stdio.h>

• non-standard language extensions.. adding parallel primitives

Compiler notes #1, 20060224, Tsan-sheng Hsu 14

Page 15: Compiler Construction Notes

Practical considerations (2/2)

Passes of compiling

• First pass reads the text file once.

• May need to read the text one more time for any forward addressedobjects, i.e., anything that is used before its declaration.

• Example: C language

goto error handling;

· · ·error handling:

· · ·

Compiler notes #1, 20060224, Tsan-sheng Hsu 15

Page 16: Compiler Construction Notes

Reduce number of passes

Each pass takes I/O time.

Back-patching : leave a blank slot for missing information, and

fill in the empty slot when the information becomes available.Example: C languagewhen a label is used

• if it is not defined before, save a trace into the to-be-processed table

. label name corresponds to LABEL TABLE[i]

• code generated: GOTO LABEL TABLE[i]

when a label is defined• check known labels for redefined labels• if it is not used before, save a trace into the to-be-processed table• if it is used before, then find its trace and fill the current address into

the trace

Time and space trade-off !

Compiler notes #1, 20060224, Tsan-sheng Hsu 16