CHAPTER 5 Compiler
5.1 Basic Compiler Concepts
Functions performed by a compiler:
Source program
-> Lexical analysis -> tokens
-> Syntax analysis -> parse tree
-> Intermediate code generation -> intermediate code
-> Code optimization -> intermediate code
-> Code generation -> machine code
Table management and error handling accompany these phases.
1. Lexical Analysis (Lexical Analyzer or Scanner)
Read the source program one character at a time, carving the source program into a sequence of atomic units called tokens.
Each token is a pair (token type, token value).
A program written in the FRANCIS language:
PROGRAM MAIN;
VARIABLE INTEGER:U,V,M;
U = 5;
V = 7;
CALL S1(U ,V , M );
ENP;
SUBROUTINE S1( INTEGER : X , Y , M ) ;
M = X + Y + 2.7;
ENS;
The FRANCIS program converted into token format:
PROGRAM MAIN ;
    (2,21) (5,3) (1,1)
VARIABLE INTEGER : U , V , M ;
    (2,25) (2,14) (1,12) (5,1) (1,11) (5,5) (1,11) (5,6) (1,1)
U = 5 ;
    (5,1) (1,4) (3,1) (1,1)
V = 7 ;
    (5,5) (1,4) (3,2) (1,1)
CALL S1 ( U , V , M ) ;
    (2,3) (5,10) (1,2) (5,1) (1,11) (5,5) (1,11) (5,6) (1,3) (1,1)
ENP ;
    (2,6) (1,1)
SUBROUTINE S1 ( INTEGER : X , Y , M ) ;
    (2,23) (5,10) (1,2) (2,14) (1,12) (5,8) (1,11) (5,4) (1,11) (5,9) (1,3) (1,1)
M = X + Y + 2.7 ;
    (5,9) (1,4) (5,8) (1,5) (5,4) (1,5) (4,1) (1,1)
ENS ;
    (2,7) (1,1)
2. Syntax Analysis (Syntax Analyzer or Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
<id-list> ::= id | <id-list>,id
<assign> ::= id:=<exp>
<exp> ::= <term> | <exp>+<term> | <exp>-<term>
<term> ::= <factor> | <term>*<factor> | <term> DIV<factor>
<factor> ::= id | int | (<exp>)
<read> ::= READ(<id-list>)
<write> ::= WRITE(<id-list>)
Part of the grammar of the Pascal language
Parse Tree
<read>
  READ
  (
  <id-list>
    id {VALUE}
  )
Parse tree for the statement READ(VALUE)
<assign>
  id {VARIANCE}
  :=
  <exp>
    <exp>
      <term>
        <term>
          <factor>
            id {SUMSQ}
        DIV
        <factor>
          int {100}
    -
    <term>
      <term>
        <factor>
          id {MEAN}
      *
      <factor>
        id {MEAN}
Parse tree for the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN
Syntax Error
The statement A + / B cannot be parsed. After id {A} and +, the grammar requires a <term>, but / cannot begin a <term> or a <factor>, so no complete parse tree exists for this statement.
3. Intermediate Code Generation
Three Address Code: (operator , operand1 , operand2 , result)
A = B + C        (+ , B , C , A)
SUM := A/B*C can be decomposed into:
T1 = A/B         (/ , A , B , T1)
T2 = T1*C        (* , T1 , C , T2)
SUM = T2         (= , T2 , , SUM)
<assign>
  id {SUM}
  :=
  <exp>
    <term>
      <term>
        <term>
          <factor>
            id {A}
        DIV
        <factor>
          id {B}
      *
      <factor>
        id {C}
Parse tree for the statement SUM := A/B*C (the grammar writes / as DIV)
4. Code Optimization
Improve the intermediate code (or machine code) so that the ultimate object program runs faster and/or takes less space.
Not optimized:
FOR I := 1 TO 10 DO
begin
  A := 10;
  B[I+1] := C[I+1] + A;
end
Optimized:
A := 10;
FOR I := 1 TO 10 DO
begin
  J := I + 1;
  B[J] := C[J] + A;
end
5. Code Generation
* Allocate memory locations
* Select machine code for each intermediate code
* Register allocation: utilize registers as efficiently as possible
From (+ , B , C , A) we obtain:
MOV AX,B
ADD AX,C
MOV A,AX
SUM := A/B*C
(/ , A , B , T1)
    MOV AX,A
    DIV B
    MOV T1,AX
(* , T1 , C , T2)
    MOV AX,T1
    MUL C
    MOV T2,AX
(= , T2 , , SUM)
    MOV AX,T2
    MOV SUM,AX
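The mapping from quadruples to instructions can be sketched in Python. This is only an illustration of the pattern used on these slides (load the first operand into AX, apply the operation, store AX into the result); the names `OPCODES` and `generate` are assumptions made for this example.

```python
# Mnemonics for the operators that appear in the quadruples on the slides.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(quads):
    """Translate (operator, operand1, operand2, result) tuples into instruction strings."""
    code = []
    for op, a, b, result in quads:
        code.append(f"MOV AX,{a}")                 # load the first operand
        if op in ("+", "-"):
            code.append(f"{OPCODES[op]} AX,{b}")   # two-operand form, as in ADD AX,C
        elif op in ("*", "/"):
            code.append(f"{OPCODES[op]} {b}")      # AX is the implicit operand, as in DIV B
        elif op != "=":
            raise ValueError(f"unknown operator {op!r}")
        code.append(f"MOV {result},AX")            # store the result
    return code

quads = [("/", "A", "B", "T1"), ("*", "T1", "C", "T2"), ("=", "T2", "", "SUM")]
for line in generate(quads):
    print(line)
```

Each quadruple is translated in isolation, which is why the output stores T1 and T2 and immediately reloads them.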
The generated code can then be optimized once more. For example, MOV T1,AX followed immediately by MOV AX,T1 stores a value and reloads it into the same register, and the same holds for T2.
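One possible form of this second optimization is a small peephole pass, sketched below in Python. The slides do not show the optimized result, so the rule used here is an assumption: a reload of AX right after a store of AX is dropped, and the store itself is dropped too when the temporary is never used again. The names `peephole`, `uses`, and the `temps` parameter are hypothetical.

```python
def uses(name, instruction):
    """True if `name` appears as an operand of the instruction."""
    operands = instruction.split(None, 1)[1].split(",")
    return name in operands

def peephole(code, temps):
    """Remove store/reload pairs such as MOV T1,AX followed by MOV AX,T1."""
    out = []
    for i, ins in enumerate(code):
        if out and ins.startswith("MOV AX,"):
            name = ins[len("MOV AX,"):]
            if out[-1] == f"MOV {name},AX":
                # AX already holds the value, so the reload is redundant.
                later = code[i + 1:]
                if name in temps and not any(uses(name, x) for x in later):
                    out.pop()  # the temporary is dead, so drop the store as well
                continue
        out.append(ins)
    return out

code = ["MOV AX,A", "DIV B", "MOV T1,AX",
        "MOV AX,T1", "MUL C", "MOV T2,AX",
        "MOV AX,T2", "MOV SUM,AX"]
print(peephole(code, temps={"T1", "T2"}))
```

Under the simplified instruction model used on these slides, the eight instructions shrink to four.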
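6. Table Management and Error Handling
Token, symbol table, reserved word table, delimiter table, constant table, etc.
* If each of the five major functions is performed as a separate pass, the compiler makes five passes.
* Several functions can also be combined into a single pass.
* At least two passes are needed.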
5.2 Grammar
1. Grammar (Backus-Naur Form)
A grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
<id-list> ::= id | <id-list>,id
<assign> ::= id:=<exp>
<exp> ::= <term> | <exp>+<term> | <exp>-<term>
<term> ::= <factor> | <term>*<factor> | <term> DIV<factor>
<factor> ::= id | int | (<exp>)
<read> ::= READ(<id-list>)
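<write> ::= WRITE(<id-list>)
Part of the grammar of the Pascal language
Names enclosed in angle brackets, such as <exp>, are non-terminal symbols. The remaining symbols, such as id and READ, are terminal symbols.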
2. Parse Tree (Syntax Tree)
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree.
<read>
  READ
  (
  <id-list>
    id {VALUE}
  )
Parse tree for the statement READ(VALUE)
3. Precedence and Associativity
Precedence: * and / have higher precedence than + and -.
Associativity: under left associativity, a + b + c is grouped as ((a + b) + c). Under right associativity it would be grouped as (a + (b + c)).
4. Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement. For example, the grammar
<start> ::= <term>
<term> ::= id | <term>+<term> | <term>-<term>
yields two parse trees for id + id - id.
Tree 1, grouping (id + id) - id:
<start>
  <term>
    <term>
      <term>
        id
      +
      <term>
        id
    -
    <term>
      id
Tree 2, grouping id + (id - id):
<start>
  <term>
    <term>
      id
    +
    <term>
      <term>
        id
      -
      <term>
        id
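The ambiguity of the grammar <term> ::= id | <term>+<term> | <term>-<term> can be checked mechanically by counting how many distinct parse trees derive a statement. The Python sketch below does this by trying every operator as the top-level split; the function name `count_parse_trees` is an assumption made for this example.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees deriving `tokens` from <term> in the ambiguous grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <term>.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "-"):
                # <term> ::= <term> op <term>, split at operator position k
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "-", "id"]))
```

A count greater than 1 for any statement shows that the grammar is ambiguous.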
5.3 Lexical Analysis
A program contains the following kinds of tokens:
a. Identifier
b. Delimiter
c. Reserved Word
d. Constant: integer, float, string
1. Identifier
<ident> ::= <letter> | <ident> <letter> | <ident> <digit>
<letter> ::= A | B | C | ...
<digit> ::= 0 | 1 | 2 | ...
An identifier is a multiple character token.
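The identifier rule can be checked with a few lines of Python. This is a sketch that assumes <letter> means the uppercase letters A to Z and <digit> means 0 to 9, since the grammar elides the full lists; the name `is_identifier` is chosen for this example.

```python
import string

LETTERS = set(string.ascii_uppercase)  # assumption: <letter> is A..Z
DIGITS = set(string.digits)            # assumption: <digit> is 0..9

def is_identifier(text):
    """True if `text` matches <ident>: a letter followed by letters or digits."""
    if not text or text[0] not in LETTERS:
        return False
    return all(ch in LETTERS or ch in DIGITS for ch in text[1:])

print(is_identifier("S1"), is_identifier("1S"))
```

The left-recursive rule <ident> ::= <ident> <letter> | <ident> <digit> is implemented here as a loop over the remaining characters.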
2. Token and Tables
 1. AND        2. BOOLEAN    3. CALL         4. DIMENSION   5. ELSE
 6. ENP        7. ENS        8. EQ           9. GE         10. GT
11. GTO       12. IF        13. INPUT       14. INTEGER    15. LABEL
16. LE        17. LT        18. NE          19. OR         20. OUTPUT
21. PROGRAM   22. REAL      23. SUBROUTINE  24. THEN       25. VARIABLE
Table 2 (Reserved Word Table)
1 5
2 7
Table 3 (Integer Table)
1 2.7
Table 4 (Real Number Table)
Index  Identifier  Subroutine  Type  Pointer
1      U           3
2
3      MAIN
4      Y           10
5      V           3
6      M           3
7
8      X           10
9      M           10
10     S1
Table 5 (Identifier Table)
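Token Specifier: (Token Type, Token Value)
The token type is the number of the table, and the token value is the entry within that table. For example, (2,21) is entry 21 of Table 2, the reserved word PROGRAM.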
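A scanner that produces such (token type, token value) pairs can be sketched in Python as follows. Several details are assumptions: the delimiter numbers are inferred from the token listing for the FRANCIS program (the delimiter table itself is not shown), identifiers and constants are numbered in order of first appearance rather than by the slot positions of Table 5, and one identifier table is shared by all subroutines. The function name `scan` is chosen for this example.

```python
RESERVED = ["AND", "BOOLEAN", "CALL", "DIMENSION", "ELSE", "ENP", "ENS", "EQ",
            "GE", "GT", "GTO", "IF", "INPUT", "INTEGER", "LABEL", "LE", "LT",
            "NE", "OR", "OUTPUT", "PROGRAM", "REAL", "SUBROUTINE", "THEN",
            "VARIABLE"]                                   # Table 2
DELIMITERS = {";": 1, "(": 2, ")": 3, "=": 4, "+": 5, ",": 11, ":": 12}  # inferred

def scan(source):
    """Return a list of (token type, token value) pairs for `source`."""
    integers, reals, identifiers = [], [], []   # Tables 3, 4 and 5
    tokens = []

    def entry(table, value):
        # Add the value on first sight and return its 1-based position.
        if value not in table:
            table.append(value)
        return table.index(value) + 1

    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch in DELIMITERS:
            tokens.append((1, DELIMITERS[ch]))
            i += 1
        elif ch.isalpha():
            j = i
            while j < len(source) and source[j].isalnum():
                j += 1
            word = source[i:j]
            if word in RESERVED:
                tokens.append((2, RESERVED.index(word) + 1))
            else:
                tokens.append((5, entry(identifiers, word)))
            i = j
        elif ch.isdigit():
            j = i
            while j < len(source) and (source[j].isdigit() or source[j] == "."):
                j += 1
            text = source[i:j]
            if "." in text:
                tokens.append((4, entry(reals, text)))
            else:
                tokens.append((3, entry(integers, text)))
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens

print(scan("U = 5 ;"))
```

Reserved words and delimiters come out with the same numbers as in the listing above; identifier numbers differ because the sketch does not reproduce the slot assignment of Table 5.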
5.4 Syntax Analysis
1. Building the Parse Tree
a. Top down method
Begin with the top-level rule of the grammar (the root of the tree), and attempt to construct the tree so that the terminal nodes match the statement being analyzed.
For id + id - id, the tree grows downward from <start> through <term> nodes toward the terminals.
b. Bottom up method
Begin with the terminal nodes of the tree, and attempt to combine these into successively higher-level nodes until the root is reached.
For id + id - id, each id is first reduced to <term>, and the <term> nodes are then combined.
2. Operator Precedence Parser (a bottom up parser)
Precedence Matrix
Column tokens: READ ; := + - ( ) id
Relations listed in each row, left to right:
READ   =
;      < > <
:=     > < < < <
+      > > > < > <
-      > > > < > <
(      < < < = <
)      > > >
id     > > > >
Parsing READ(id); with the precedence matrix (stack and remaining input at each step):
< READ ( id ) ;
< READ ( id )
< READ = ( id )
< READ = ( < id )
< READ = ( < id > )
< READ = ( = id-list )
< READ = ( = id-list ) >
read
The result is the parse tree for READ(VALUE): <read> with children READ, (, <id-list> and ), where <id-list> derives id {VALUE}.
Parsing id + id - id with the precedence matrix (stack and remaining input at each step):
< id + id - id
< id + id - id
< id > + id - id
< term + id - id
< term + < id > - id
< term + term > - id
< term - < id
< term - < id >
< term - term >
term
The resulting parse tree groups the statement as (id + id) - id.
The parser generally uses a stack to save tokens that have been scanned but not yet parsed.
The grammar used for id + id - id:
<start> ::= <term>
<term> ::= id | <term>+<term> | <term>-<term>
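The stack-driven parse of id + id - id can be sketched in Python. The relation table below covers only id, + and -, and it adds an end marker `$` that is not part of the precedence matrix on the slides; both the marker and the names `PREC` and `parse` are assumptions made for this example.

```python
# (terminal on stack, next input terminal) -> precedence relation
PREC = {
    ("$", "id"): "<", ("$", "+"): "<", ("$", "-"): "<",
    ("+", "id"): "<", ("-", "id"): "<",
    ("id", "+"): ">", ("id", "-"): ">", ("id", "$"): ">",
    ("+", "+"): ">", ("+", "-"): ">", ("+", "$"): ">",
    ("-", "+"): ">", ("-", "-"): ">", ("-", "$"): ">",
}

def parse(tokens):
    """Parse with operator precedence and return the reductions in order."""
    stack = ["$"]
    inp = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        # The topmost terminal decides the action; the nonterminal is skipped.
        top = next(s for s in reversed(stack) if s != "term")
        look = inp[pos]
        if top == "$" and look == "$":
            if stack != ["$", "term"]:
                raise SyntaxError("incomplete statement")
            return reductions
        rel = PREC.get((top, look))
        if rel is None:
            raise SyntaxError(f"no relation between {top!r} and {look!r}")
        if rel == "<":
            stack.append(look)          # shift
            pos += 1
        elif stack[-1] == "id":
            stack[-1] = "term"          # reduce <term> ::= id
            reductions.append("term ::= id")
        elif (len(stack) >= 4 and stack[-1] == "term"
              and stack[-2] in ("+", "-") and stack[-3] == "term"):
            op = stack[-2]
            del stack[-3:]
            stack.append("term")        # reduce <term> ::= <term> op <term>
            reductions.append(f"term ::= term{op}term")
        else:
            raise SyntaxError("no rule matches the top of the stack")

print(parse(["id", "+", "id", "-", "id"]))
```

Because + and - are related by > in both directions, term + term is reduced before - is shifted, which gives the left-associative grouping.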
3. Recursive Descent Parser (a top down method)
a. Leftmost derivation
It must be possible to decide which alternative to use by examining the next input token. For <stmt>, the next token id selects <assign>, READ selects <read>, and WRITE selects <write>.
<stmt> ::= <assign> | <read> | <write>
<id-list> ::= id | <id-list>,id
<assign> ::= id:=<exp>
<exp> ::= <term> | <exp>+<term> | <exp>-<term>
<term> ::= <factor> | <term>*<factor> | <term> DIV<factor>
<read> ::= READ(<id-list>)
<write> ::= WRITE(<id-list>)
Part of the grammar of the Pascal language
b. Left recursion
A top down parser cannot be used with a grammar that contains left recursion, because the parser is unable to decide between the alternatives by examining the next token. For example, in
<id-list> ::= id | <id-list>,id
both id and <id-list> can begin with id. The rules for <exp> and <term> are left recursive in the same way.
Grammar modified for a recursive descent parser ({ } denotes zero or more repetitions):
<id-list> ::= id {, id}
<assign> ::= id:=<exp>
<exp> ::= <term> { +<term> | -<term> }
<term> ::= <factor> { *<factor> | DIV<factor> }
<factor> ::= id | int | (<exp>)
<read> ::= READ(<id-list>)
<write> ::= WRITE(<id-list>)
Part of the grammar of the Pascal language
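A recursive descent parser has one procedure per nonterminal, and each { ... } repetition becomes a loop. The Python sketch below follows the modified grammar; the input is assumed to be a list of token strings that has already been scanned, and the class name `Parser`, the helper `parse`, and the nested-tuple tree format are assumptions made for this example.

```python
KEYWORDS = {"READ", "WRITE", "DIV"}

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, text):
        if self.peek() != text:
            raise SyntaxError(f"expected {text!r}, found {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or tok in KEYWORDS or not tok.isidentifier():
            raise SyntaxError(f"expected id, found {tok!r}")
        self.pos += 1
        return tok

    def stmt(self):                       # <stmt> ::= <assign> | <read> | <write>
        if self.peek() in ("READ", "WRITE"):
            keyword = self.advance()
            self.expect("(")
            names = self.id_list()
            self.expect(")")
            return (keyword, names)
        target = self.ident()             # <assign> ::= id := <exp>
        self.expect(":=")
        return (":=", target, self.exp())

    def id_list(self):                    # <id-list> ::= id {, id}
        names = [self.ident()]
        while self.peek() == ",":
            self.advance()
            names.append(self.ident())
        return names

    def exp(self):                        # <exp> ::= <term> { +<term> | -<term> }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):                       # <term> ::= <factor> { *<factor> | DIV<factor> }
        node = self.factor()
        while self.peek() in ("*", "DIV"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):                     # <factor> ::= id | int | (<exp>)
        tok = self.peek()
        if tok == "(":
            self.advance()
            node = self.exp()
            self.expect(")")
            return node
        if tok is not None and tok.isdigit():
            self.advance()
            return int(tok)
        return self.ident()

def parse(tokens):
    parser = Parser(tokens)
    tree = parser.stmt()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse(["VARIANCE", ":=", "SUMSQ", "DIV", "100", "-", "MEAN", "*", "MEAN"]))
```

Every decision is made by looking at the next token only, which is possible because the modified grammar no longer contains left recursion.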
5.5 Code Generation
When the parser recognizes a portion of the source program according to some rule of the grammar, the corresponding routine is executed. These routines are called semantic routines or code generation routines.
1. Operator precedence parser: the routine is called when a sub-string is reduced to a nonterminal.
2. Recursive descent parser: the routine is called when a procedure returns to its caller, indicating success.
<start> ::= <term>
<term> ::= id | <term>+<term> | <term>-<term>
Code generated for each rule:
<term> ::= <term>1 + <term>2
    MOV AX, <term>1
    ADD AX, <term>2
    MOV <term>, AX
<term> ::= <term>1 - <term>2
    MOV AX, <term>1
    SUB AX, <term>2
    MOV <term>, AX
<term> ::= id
    add id to <term>
5.6 Intermediate Form
Three Address Code (Quadruple Form): (operator , operand1 , operand2 , result)
<term> ::= <term>1 + <term>2
(+, <term>1, <term>2, <term>)
<term> ::= <term>1 - <term>2
(-, <term>1, <term>2, <term>)
<term> ::= id
add id to <term>
Variance := sumsq DIV 100 - mean * mean
(DIV, sumsq, #100, i1)
(*, mean, mean, i2)
(-, i1, i2, i3)
(:=, i3, , variance)
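These quadruples can be produced by a post-order walk of the expression tree. The Python sketch below assumes the tree is given as nested tuples (operator, left, right), with strings for identifiers and Python integers for constants; that representation and the name `to_quadruples` are assumptions made for this example.

```python
def to_quadruples(target, tree):
    """Return the quadruples that evaluate `tree` and assign the result to `target`."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, int):
            return f"#{node}"          # constants are written #100, as on the slide
        if isinstance(node, str):
            return node                # an identifier is its own operand
        op, left, right = node
        a = walk(left)                 # operands first, operator last
        b = walk(right)
        counter += 1
        temp = f"i{counter}"
        quads.append((op, a, b, temp))
        return temp

    result = walk(tree)
    quads.append((":=", result, "", target))
    return quads

tree = ("-", ("DIV", "sumsq", 100), ("*", "mean", "mean"))
for quad in to_quadruples("variance", tree):
    print(quad)
```

The intermediate results i1, i2 and i3 are created in the order in which the sub-expressions are completed.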
5.7 Machine Independent Compiler Features
1. Storage Allocation
* Static Allocation: allocate at compile time
* Dynamic Allocation: allocate at run time
  Auto: allocated on the STACK at a function call
  Controlled: allocated on the HEAP with malloc( ) and free( )
2. Activation Record
Each function call creates an activation record that contains storage for all the variables used by the function, the return address, etc.
Each activation record on the stack holds: Variables, Return Address, Next, Previous.
Activation record examples:
(1) MAIN is running: the stack holds one record with MAIN's variables and a return address to the OS.
(2) MAIN calls SUB: a record with SUB's variables and a return address into MAIN is pushed on top of MAIN's record.
(3) SUB calls SUB: a second record for SUB, with a return address into SUB, is pushed on top.
3. Prologue and Epilogue
The compiler must generate additional code to manage the activation records themselves.
a. Prologue
The code to create a new activation record
b. Epilogue
The code to delete the current activation record
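The effect of the prologue and epilogue can be modelled in Python as pushing and popping records on a linked stack. This is a simplified sketch: it keeps only the Previous link of each record, and the class and method names are assumptions made for this example.

```python
class ActivationRecord:
    def __init__(self, name, return_address, previous):
        self.name = name
        self.variables = {}                 # storage for the function's variables
        self.return_address = return_address
        self.previous = previous            # link to the caller's record

class RuntimeStack:
    def __init__(self):
        self.top = None

    def prologue(self, name, return_address):
        """Create a new activation record on entry to a function."""
        self.top = ActivationRecord(name, return_address, self.top)
        return self.top

    def epilogue(self):
        """Delete the current activation record and return its return address."""
        record = self.top
        self.top = record.previous
        return record.return_address

    def names(self):
        result, record = [], self.top
        while record is not None:
            result.append(record.name)
            record = record.previous
        return result

stack = RuntimeStack()
stack.prologue("MAIN", "OS")     # MAIN returns to the OS
stack.prologue("SUB", "MAIN")    # MAIN calls SUB
stack.prologue("SUB", "SUB")     # SUB calls itself
print(stack.names())
```

Each recursive call gets its own record, so the two activations of SUB have separate variables.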
4. Structure Variables
Array, Record, String, Set, ...
B: array[0..3,0..1] of integer
B[0][0] B[0][1]
B[1][0] B[1][1]
B[2][0] B[2][1]
B[3][0] B[3][1]
Row-major order:
B[0][0] B[0][1] B[1][0] B[1][1] B[2][0] B[2][1] B[3][0] B[3][1]
Column-major order:
B[0][0] B[1][0] B[2][0] B[3][0] B[0][1] B[1][1] B[2][1] B[3][1]
For an array declared as Type B[a..b][c..d], the address of B[s][t] is:
Row major:
[(s - a) * (d - c + 1) + (t - c)] * sizeof(Type) + base address
Column major:
[(t - c) * (b - a + 1) + (s - a)] * sizeof(Type) + base address
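The two address formulas can be written directly as Python functions. The element size of 4 bytes and the base address 1000 in the usage lines are arbitrary values picked for illustration, and the function names are assumptions made for this example.

```python
def row_major_address(s, t, a, b, c, d, size, base):
    """Address of B[s][t] for Type B[a..b][c..d] stored row by row."""
    return ((s - a) * (d - c + 1) + (t - c)) * size + base

def column_major_address(s, t, a, b, c, d, size, base):
    """Address of B[s][t] for Type B[a..b][c..d] stored column by column."""
    return ((t - c) * (b - a + 1) + (s - a)) * size + base

# B: array[0..3,0..1] of integer, assuming 4-byte integers and base address 1000
print(row_major_address(2, 1, 0, 3, 0, 1, size=4, base=1000))
print(column_major_address(2, 1, 0, 3, 0, 1, size=4, base=1000))
```

B[2][1] is the sixth element in row-major order and the seventh in column-major order, which corresponds to offsets of 5 and 6 elements from the base address.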
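5. Code Optimization
Not optimized:
For I := 1 to 10
Begin
  x[I, 2*J-1] := T[I, 2*J];
  Table[I] := 2**I;
END
Optimized:
T1 := 2*J;
T2 := T1 - 1;
K := 1;
For I := 1 to 10
Begin
  x[I, T2] := T[I, T1];
  K := K * 2;
  Table[I] := K;
END
a. Common sub-expression: 2*J is computed once and reused.
b. Loop invariants: 2*J and 2*J-1 do not change inside the loop, so they are computed before it.
c. Reduction in strength: the exponentiation 2**I is replaced by a multiplication K := K * 2.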
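That the optimized loop computes the same values as the original can be checked with a Python transcription of both versions. The arrays are modelled as dictionaries keyed by index tuples, and the contents of T and the value of J in the usage lines are made up for illustration.

```python
def original(t, j):
    x, table = {}, {}
    for i in range(1, 11):
        x[i, 2 * j - 1] = t[i, 2 * j]
        table[i] = 2 ** i
    return x, table

def optimized(t, j):
    x, table = {}, {}
    t1 = 2 * j          # common sub-expression, hoisted out of the loop
    t2 = t1 - 1         # loop invariant
    k = 1
    for i in range(1, 11):
        x[i, t2] = t[i, t1]
        k = k * 2       # strength reduction: multiplication replaces 2**i
        table[i] = k
    return x, table

# Hypothetical test data: a 10 x 8 array T and J = 3
t = {(i, c): i * 100 + c for i in range(1, 11) for c in range(1, 9)}
print(original(t, 3) == optimized(t, 3))
```

The comparison covers both the x array and the Table array.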
5.8 Compiler Design Option
1. Interpreter
An interpreter processes a source program written in a high level language, just as a compiler does. The main difference is that an interpreter executes a version of the source program directly, instead of translating it into machine code.
An interpreter can be viewed as a set of functions whose execution is driven by the internal form of the program.
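The idea of a set of functions driven by the internal form can be sketched in Python, taking the quadruples of section 5.6 as the internal form. The names `FUNCTIONS`, `value` and `interpret`, and the sample values of sumsq and mean, are assumptions made for this example.

```python
import operator

# One function per operator of the internal form.
# floordiv matches Pascal DIV for non-negative operands only.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "DIV": operator.floordiv,
}

def value(operand, memory):
    """Fetch a constant (#100) or the current value of a variable."""
    if operand.startswith("#"):
        return int(operand[1:])
    return memory[operand]

def interpret(quads, memory):
    """Execute the quadruples one by one, updating `memory`."""
    for op, a, b, result in quads:
        if op == ":=":
            memory[result] = value(a, memory)
        else:
            memory[result] = FUNCTIONS[op](value(a, memory), value(b, memory))
    return memory

quads = [
    ("DIV", "sumsq", "#100", "i1"),
    ("*", "mean", "mean", "i2"),
    ("-", "i1", "i2", "i3"),
    (":=", "i3", "", "variance"),
]
memory = interpret(quads, {"sumsq": 2900, "mean": 5})
print(memory["variance"])
```

No machine code is produced: the loop in `interpret` selects the function for each quadruple and calls it directly.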
2. P Code Compiler
* P code is byte code, a machine-independent language.
* It can be executed across platforms on different kinds of computers.
Source Program -> Java Interpreter -> Byte Code
Byte Code -> Java Run Module -> Run
3. Compiler-Compiler
A software tool that can be used to help in the task of compiler construction.
Uses finite state automata.
On Unix:
YACC: parser generator
LEX: scanner generator