Chapter 2 A Simple Compiler

Post on 02-Jan-2016

65 views 4 download

Tags:

description

Chapter 2 A Simple Compiler. Prof Chung. 1. 1. Outlines. 2.1 The Structure of a Micro Compiler 2.2 A Micro Scanner 2.3 The Syntax of Micro 2.4 Recursive Descent Parsing 2.5 Translating Micro. 2. Outlines. 2.1 The Structure of a Micro Compiler 2.2 A Micro Scanner - PowerPoint PPT Presentation

Transcript of Chapter 2 A Simple Compiler

11

Chapter 2 A Simple Compiler

Prof Chung.

1 NTHU SSLAB 04/20/23

22

Outlines 2.1 The Structure of a Micro Compiler 2.2 A Micro Scanner 2.3 The Syntax of Micro 2.4 Recursive Descent Parsing 2.5 Translating Micro

NTHU SSLAB 04/20/23

33

Outlines 2.1 The Structure of a Micro Compiler 2.2 A Micro Scanner 2.3 The Syntax of Micro 2.4 Recursive Descent Parsing 2.5 Translating Micro

NTHU SSLAB 04/20/23

44

The Structure of a Micro compiler(1)

Micro : a very simple language Only integers No declarations Variables consist of A..Z, 0..9 At most 32 characters long Comments begin with “--” and end with end-of-line. Three kinds of statements:

assignments, e.g., a := b + c read(list of ids), e.g., read(a, b) write(list of exps), e.g., write(a+b)

Begin, end, read, and write are reserved words. Tokens may not extend to the following line.

NTHU SSLAB 04/20/23

One-pass type, no explicit intermediate representations used

The interface Parser is the main routine. Parser calls scanner to get the next token. Parser calls semantic routines at appropriate times. Semantic routines produce output in assembly language. A simple symbol table is used by the semantic routines.

The Structure of a Micro compiler(2)

04/20/23NTHU SSLAB5

66

Outlines 2.1 The Structure of a Micro Compiler 2.2 A Micro Scanner 2.3 The Syntax of Micro 2.4 Recursive Descent Parsing 2.5 Translating Micro

NTHU SSLAB 04/20/23

A Micro Scanner (1) The Micro Scanner

will be a function of no arguments that returns token values

There are 14 tokens.

04/20/23NTHU SSLAB7

typedef enum token_types {BEGIN, END, READ, WRITE, ID, INTLITERAL,LPAREN, RPAREN, SEMICOLON, COMMA,

ASSIGNOP,PLUOP, MINUSOP, SCANEOF} token;

Extern token scanner(void);

A Micro Scanner (2) The Micro Scanner

Returns the longest string that constitutes a token, e.g., in

abcdef

ab, abc, abcdef are all valid tokens.

The scanner will return the longest one e.g.,

abcdef

04/20/23NTHU SSLAB8

A Micro Scanner (3) The Micro Scanner

Continue: skip one iteration getchar() , isspace(), isalpha(), isalnum(), isdigital()

ungetc(): push back one character

04/20/23NTHU SSLAB9

A Micro Scanner (4) The Micro Scanner

Continue: skip one iteration getchar() , isspace(), isalpha(), isalnum(), isdigital()

ungetc(): push back one character

04/20/23NTHU SSLAB10

A Micro Scanner (5) The Micro Scanner

How to handle RESERVED words? Reserved words are similar to identifiers.

Two approaches: Use a separate table of reserved words Put all reserved words into symbol table initially.

04/20/23NTHU SSLAB11

A Micro Scanner (6) The Micro Scanner

Provision for saving the characters of a token as they are scanned token_buffer buffer_char() clear_buffer() check_reserved()

Handle end of file feof(stdin)

04/20/23NTHU SSLAB12

1313

Outlines 2.1 The Structure of a Micro Compiler 2.2 A Micro Scanner 2.3 The Syntax of Micro 2.4 Recursive Descent Parsing 2.5 Translating Micro

NTHU SSLAB 04/20/23

The Syntax of Micro (1) The Syntax of Micro

Micro's syntax is defined by a context-free grammar (CFG) CFG is also called BNF (Backus-Naur Form) grammar

CFG consists of a set of production rules,

04/20/23NTHU SSLAB14

AB C D Z

LHS must be a single nonterminal

RHS consists 0 or more terminals or nonterminals

The Syntax of Micro (2) Two kinds of symbols

Nonterminals Delimited by “<“ and “>” Represent syntactic structures

Terminals Represent tokens

E.g. <program> begin <statement list> end

Start or goal symbol : empty or null string E.g.

<statement list> <statement><statement tail> <statement tail> <statement tail> <statement><statement tail>

04/20/23NTHU SSLAB15

The Syntax of Micro (3) Extended BNF: some abbreviations

optional: [ ] 0 or 1 <stmt> if <exp> then <stmt>

<stmt> if <exp> then <stmt> else <stmt>

<stmt> if <exp> then <stmt> [ else <stmt> ]

repetition: { } 0 or more <stmt list> <stmt> <tail>

<tail> <tail> <stmt> <tail>

<stmt list> <stmt> { <stmt> }

04/20/23NTHU SSLAB16

The Syntax of Micro (4) Extended BNF: some abbreviations

alternative: | or <stmt> <assign>

<stmt> <if stmt>

<stmt> <assign> | <if stmt>

Extended BNF == BNF Either can be transformed to the other.

Extended BNF is more compact and readable

04/20/23NTHU SSLAB17

The Syntax of Micro (5) Extended CFG Defining Micro1. <program> begin <statement list> end2. <statement list> <statement> { <statement> }3. <statement> ID := <expression> ;4. <statement> read ( <id list> ) ;5. <statement> write ( <expr list> ) ;6. <id list> ID { , ID }7. <expr list> <expression> { ,

<expression> }8. <expression> <primary> { <add op> <primary> }9. <primary> ( <expression> )10.<primary> ID11.<primary> INTLITERAL12.<add op> +13.<add op> -14.<system goal> <program> SCANEOF

04/20/23NTHU SSLAB18

19

The Syntax of Micro (5)

<system goal> (Apply rule 14)

<program> SCANEOF(Apply rule 1)

begin <statement list> end SCANEOF (Apply rule 2)

begin <statement> { <statement> } end SCANEOF (Apply rule 3)

begin ID := <expression> ; { <statement> } end SCANEOF (Apply rule 8)

begin ID := <primary> { <add op> <primary> }; { <statement> } end SCANEOF (Apply rule 10)

begin ID := ID {<add op> <primary> }; { <statement> } end SCANEOF (Apply rule 12)

begin ID := ID + <primary>; { <statement> } end SCANEOF (Apply rule 9)

begin ID := ID + ( <expression> ) ;{ <statement> } end SCANEOF (Apply rule 8)

begin ID := ID + ( <primary> { <add op> <primary> } ) ; { <statement> } end SCANEOF (Apply rule 11)

begin ID := ID + ( INTLITERAL <add op> <primary> ); { <statement> } end SCANEOF (Apply rule 13)

begin ID := ID + (NTLITERAL - <primary> ); { <statement> } end SCANEOF (Apply rule 10)

begin ID := ID + ( INTLITERAL - ID); { <statement> } end SCANEOF

begin ID := ID + ( INTLITERAL - ID ) ; end SCANEOF

NTHU SSLAB 04/20/23

begin ID := ID + ( INTLITERAL - ID ) ; end

The Syntax of Micro (6) A CFG

defines a language, which is a set of sequences of tokens

Syntax errors & semantic errors A:=‘X’+True;

Associativity A-B-C

Operator precedence A+B*C

04/20/23NTHU SSLAB20

The Syntax of Micro (7) A grammar fragment

defines such a precedence relationship <expression> <factor> {<add op><factor>} <factor> <primary> {<mult op><primary>} <primary>(< expression >) <primary>ID <primary>INTLITERAL

04/20/23NTHU SSLAB21

<expression>

<factor> <add op> <factor>

<primary> + <primary> <mult op> <primary>

ID ID * ID

Derivation Tree for A+B*C

<primary>

<factor>

<primary>

ID

<expression>

<mult op>

*

<primary>

<expression>

<factor> <add op> <factor>

+ <primary>

ID ID

Derivation Tree for (A+B)*C

)(

2222

Outlines 2.1 The Structure of a Micro Compiler 2.2 A Micro Scanner 2.3 The Syntax of Micro 2.4 Recursive Descent Parsing 2.5 Translating Micro

NTHU SSLAB 04/20/23

Recursive Descent Parsing (1) There are many parsing techniques.

Recursive descent is one of the simplest parsing techniques

Basic idea (1) Each nonterminal has a parsing procedure For symbol on the RHS : a sequence of matching

To Match a nonterminal A Call the parsing procedure of A

To match a terminal symbol t Call match(t)

match(t) calls the scanner to get the next token. If is it t, everything is correct. If it is not t, we have found a syntax error.

04/20/23NTHU SSLAB23

Recursive Descent Parsing (2) Basic idea (2)

If a nonterminal has several productions, choose an appropriate one based on the next input token.

Parser is started by invoking system_goal().

04/20/23NTHU SSLAB24

void system_goal(void){

/* <system goal> ::= <program> SCANEOF */

program();match(SCANEOF);

}

Recursive Descent Parsing (3) void program(void)

04/20/23NTHU SSLAB25

void program(void) {

/* <program> ::= BEGIN <statement list>

END */match(BEGIN)statement_list();match(END);

}

Recursive Descent Parsing (4) void statement_list(void)

04/20/23NTHU SSLAB26

void statement_list(void){

/* <statement list> ::= <statement> { <statement> } */statement();while (TRUE) {

switch (next_token()) {

case ID:case READ:case WRITE:

statement();break;

default:return;

}}

}

處理{<statement>}

不出現的情形

next_token() a function that returns the next token. It does not call scanner(void).

Recursive Descent Parsing (5)

04/20/23NTHU SSLAB27

void statement(void)void statement(void) {

token tok = next_token();switch (tok) {case ID:

/* <statement> ::= ID := <expression> ; */

match(ID); match(ASSIGNOP);expression(); match(SEMICOLON);break;

case READ:/* <statement> ::= READ ( <id

list> ) ; */match(READ); match(LPAREN);id_list(); match(RPAREN);

match(SEMICOLON);break;

case WRITE:/* <statement> ::= WRITE (<expr

list>) ; */match(WRITE); match(LPAREN);expr_list(); match(RPAREN);match(SEMICOLON);break;

default:syntax_error(tok);break;

}}

<statement>

必出現一次

Recursive Descent Parsing (6) void id_list(void)

04/20/23NTHU SSLAB28

void id_list(void){

/* <id list> ::= ID { , ID } */match(ID);while (next_token() == COMMA) {

match(COMMA);match(ID);

}}

Recursive Descent Parsing (7) void expression(void)

04/20/23NTHU SSLAB29

void expression(void){

/* <expression> ::= <primary> { <add op> <primary> } */token t;primary();for (t = next_token(); t == PLUSOP || t == MINUSOP; t = next_token()) {

add_op();primary();

}}

Recursive Descent Parsing (8)

04/20/23NTHU SSLAB30

void expr_list(void){

/* <expr list> ::= <expression> { ,

<expression> } */expression();while (next_token() == COMMA) {

match(COMMA);expression();

}}

Recursive Descent Parsing (9)

04/20/23NTHU SSLAB31

void add_op(void){

/* <addop> ::= PLUSOP I MINUSOP */token tok = next_token();if (tok == PLUSOP || tok ==MINUSOP)

match(tok);else

syntax_error(tok);}

Recursive Descent Parsing (10) void primary(void)

04/20/23NTHU SSLAB32

void primary(void){

token tok = next_token();switch (tok) {case LPAREN:/* <primary> ::= ( <expression> ) */

match(LPAREN); expression();

match(RPAREN);break;

case ID:/* <primary> ::= ID */match(ID);break;

case INTLITERAL:/* <primary> ::=

INTLITERAL */match(INTLITERAL);break;

default:syntax_error(tok);break;

}}

3333

Outlines 2.1 The Structure of a Micro Compiler 2.2 A Micro Scanner 2.3 The Syntax of Micro 2.4 Recursive Descent Parsing 2.5 Translating Micro

NTHU SSLAB 04/20/23

Translating Micro (1) Target language: 3-addr code (quadruple)

OP A, B, C

Note that we did not worry about registers at this time. temporaries: Sometimes we need to hold temporary values.

E.g. A+B+CADD A,B,TEMP&1ADD TEMP&1,C,TEMP&2

04/20/23NTHU SSLAB34

Translating Micro (2) Action Symbols

The bulk of a translation is done by semantic routines

Action symbols can be added to a grammar to specify when semantic processing should take place Be placed anywhere in the RHS of a production

translated into procedure call in the parsing procedures #add corresponds to a semantic routine named add()

No impact on the languages recognized by a parser driven by a CFG

04/20/23NTHU SSLAB35

Translating Micro (3) Grammar for Mirco with Action Symbols

1. <program> #start begin <statement list> end2. <statement list> <statement> { <statement> }3. <statement> <ident> := <expression> #assign ;4. <statement> read ( <id list> ) ;5. <statement> write ( <expr list> ) ;6. <id list> <ident> #read_id { , <ident> #read_id }7. <expr list> <expression> #write_expr { ,

<expression> #write_expr }8. <expression> <primary> { <add op> <primary>

#gen_infix }9. <primary> ( <expression> )10. <primary> <ident> 11. <primary> INTLITERAL #process_literal 12. <add op> + #process_op13. <add op> - #process_op14. <ident> ID #process_id15. <system goal> <program> SCANEOF #finish

04/20/23NTHU SSLAB36

Semantic Information (1) Semantic routines need certain information to

do their work. These information is stored in semantic records. Each kind of grammar symbol has a semantic record

04/20/23NTHU SSLAB37

#define MAXIDLEN 33typedef char string[MAXIDLEN];/* for operators */typedef struct operator {

enum op { PLUS, MINUS } operator;} op_rec;/* for <primary> and <expression> */enum expr { IDEXPR, LITERALEXPR, TEMPEXPR };typedef struct expression {

enum expr kind;union {

string name; /* for IDEXPR, TEMPEXPR */int val; /* for LITERALEXPR */

};} expr_rec; semantic records for Micro Symbols

Semantic Information (2) Parsing procedure

04/20/23NTHU SSLAB38

void expression(void){

token t;

/* <expression> ::= <primary> { <add op> <primary> } */

primary();for (t = next_token(); t == PLUSOP ||

t == MINUSOP; t = next_token()) {

add_op();primary();

}}

Old parsing procedure ( P. 37 )

void expression (expr_rec *result){

expr_rec left_operand, right_operand;op_rec op;

/* <expression> ::= <primary> { <add op> <primary> #gen_infix }*/

primary( &left_operand)while (next_token() == PLUSOP ||

next_token() == MINUSOP) {add_op( &op);primary( &right_operand);left_operand

=gen_infix(left_operand, op, right_operand);

}*result = left_operand;

}

New parsing procedure which involved semantic routines

<expression> <primary> {<add op> <primary> #gen_infix}

Semantic Information (3) Subroutines for symbol table

04/20/23NTHU SSLAB39

Symbol table

Is s in the symbol ?

extern int lookup(string s);

/* Put s unconditionally into symbol table. */extern void enter(string s);

void check_id(string s){

if (! lookup(s)) {enter(s);generate("Declare", s,

"Integer“," ");}

}

Semantic Information (4) Subroutines for temporaries

04/20/23NTHU SSLAB40

Temporaries

char *get_temp(void) {/* max temporary allocated so far*/

static int max_temp = 0; static char tempname[MAXIDLEN]; max_temp++; sprintf(tempname, "Temp&%d", max_temp); check_id(tempname); return tempname;}

Semantic Information (5) Subroutines

04/20/23NTHU SSLAB41

start

void start(void)

{

/* Semantic initializations, none needed. */

}

Semantic Information (6) Subroutines

04/20/23NTHU SSLAB42

assign

void assign(expr_rec target, expr_rec source){

/* Generate code for assignment. */generate("Store", extract(source),

target.name, "");}

Semantic Information (7) Subroutines

04/20/23NTHU SSLAB43

read_id

void read_id(expr_rec in_var){

/* Generate code for read. */generate("Read", in_var.name,"Integer",

"");}

Semantic Information (8) Subroutines

04/20/23NTHU SSLAB44

write_expr

void write_expr(expr_rec out_expr){

/* Generate code for write. */generate("Write",

extract(out_expr),"Integer", "");}

Semantic Information (9) Subroutines

04/20/23NTHU SSLAB45

gen_infix

expr_rec gen_infix(expr_rec e1, op_rec op, expr_rec e2){

/* Generate code for infix operation. * Get result temp and set up semantic * record for result. */expr_rec e_rec;/* An expr_rec with temp variant set. */e_rec.kind = TEMPEXPR;

strcpy(e_rec.name, get_temp());generate(extract(op), extract(e1),extract(e2), e_rec.name);return e_rec;

}

Semantic Information (10) Subroutines

04/20/23NTHU SSLAB46

process_literal

expr_rec process_literal(void){

/* Convert literal to a numeric representation and build semantic record. */

expr_rec t;t.kind = LITERALEXPR;(void) sscanf(token_buffer, “%d",&t.val);return t;

}

Semantic Information (11) Subroutines

04/20/23NTHU SSLAB47

process_op

op_rec process_op(void){

/* Produce operator descriptor. */op_rec o;if (current_token == PLUSOP)

o.operator = PLUS;else

o.operator = MINUS;return o;

}

Semantic Information (12) Subroutines

04/20/23NTHU SSLAB48

process_id

expr_rec process_id(void){

/* Declare ID and build a corresponding semantic record. */

expr_rec t;check_id(token_buffer);t.kind = IDEXPR;strcpy(t.name, token_buffer);return t;

}

Semantic Information (13) Subroutines

04/20/23NTHU SSLAB49

finish

void finish(void)

{

/* Generate code to finish program. */

generate("Halt", "", "");

}

Tracing Example (1)

04/20/23NTHU SSLAB50

Tracing Example (2)

04/20/23NTHU SSLAB51

Tracing Example (3)

04/20/23NTHU SSLAB52

Tracing Example (4)

04/20/23NTHU SSLAB53

Tracing Example (5)

04/20/23NTHU SSLAB54

Tracing Example (6)

04/20/23NTHU SSLAB55

Tracing Example (7)

04/20/23NTHU SSLAB56

Tracing Example (8)

04/20/23NTHU SSLAB57

Tracing Example (9)

04/20/23NTHU SSLAB58

Tracing Example (10)

04/20/23NTHU SSLAB59

Chapter 2 End

Any Question?

04/20/23NTHU SSLAB60