4.1 4. Phase 2 : Syntax Analysis Part II The unit directory. What you must do. Example run....

4.1

4. Phase 2 : Syntax Analysis Part II

• The unit directory.

• What you must do.

• Example run.

• syner.cxx

• The lookahead convention.

• Error detection and recovery.

• Symbol table lookup.

• Parsing declarations.

• Parsing statements.

• Parsing expressions.

4.2

The Unit Directory

• The unit directory for phase 2 is :

/usr/users/staff/aosc/cm049icp/phase2

• Among other things, it contains the following :

– synprog.cxx : the test bed program for phase 2.

– syner.template : A template file for your phase 2 program.

– syner.h : the header file for phase 2. It contains AST (the abstract syntax tree data structure), SymTab (the symbol table data structure) and three arrays of error messages : syntax (syntax errors), statics (static semantic errors) and type (type errors).

– makefile : The makefile for phase 2.

– syner : An executable for my phase 2 program.

4.3

The Unit Directory II

– tests/test*.c-- : Testing programs for the demo.

– printers.cxx : Printing subprograms.

void printAST(AST *ast,   // AST
              int &line,  // Line number
              int indent) // Indentation

void printST(SymTab *st)  // Symbol Table

– utilities.cxx : General utilities

void error(int number,        // Error no.
           LexToken lexToken) // Token

bool lookup(LexToken lexToken, // Token
            SymTab *st,        // Sym. Tab.
            SymTab *&match)    // Entry

4.4

synprog.cxx

• The test bed program is as follows :

#include ".../phase2/syner.h"

int main()
{
  SymTab *st = NULL ;
  AST *ast = NULL ;
  int line = 1 ;
  int indent = 0 ;
  int label = 0 ;

  synAnal(st, ast, label) ;
  printAST(ast, line, indent) ;
  printST(st) ;
  return 0 ;
}

• printAST : Pretty prints the AST.

• printST : Pretty prints the symbol table.

• They ensure that your program’s output format is the same as mine.

• synAnal : You must write this subprogram.

4.5

What You Must Do

• Your implementation of synAnal must be in a file called syner.cxx in your directory.

• Take a copy of makefile and syner.template.

• Print out a copy of syner.h.

• Print out a copy of utilities.cxx.

• Print out a copy of printers.cxx.

• Useful commands : testphase2, demophase2.

– Shell scripts for running the phase 2 demo.

– They rely on your synAnal using the printing subprograms from printers.cxx.

– You must #include printers.cxx and utilities.cxx into your syner.cxx file.

4.6

Example Run

• Assuming we have the following valid C-- program in a file called prog.c-- :

int i = 10 ;
{
  while (i > 0)
  {
    cout << i ;
  } ;
}

• Make and run syner :

jaguar> make syner
jaguar> syner < prog.c--
1 -- {
2 -- while (i > 0) /* Start label : 0 */
3 -- {
4 -- cout << i ;
5 -- } /* End label : 1 */ ;
6 -- }

Name   Type   Constant   Initial Value
i      int    False      10
jaguar>

• Output format may change. Make sure printers.cxx is #included in your syner.cxx.

4.7

Example Run II

• Assuming we have the following invalid C-- program in a file called prog.c-- :

bool b = true ;
{
  cin >> b ;
}

• Run syner :

jaguar> syner < prog.c--
Type error 218.
Attempt to input to a non-integer variable.
IDENTIFIER : b
jaguar>

4.8

Top Level Structure For syner.cxx

• Contents of syner.template :

#include <iostream.h>
#include <iomanip.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

#include ".../lib/cstring.h"
#include ".../phase2/syner.h"
#include ".../phase1/lexer.h"

#include ".../phase2/printers.cxx"
#include ".../phase2/utilities.cxx"

void synDec(SymTab *&st, LexToken &lexToken)
{ cout << "synDec\n" ; }

void synDeclarations(SymTab *&st, LexToken &lexToken)
{ cout << "synDeclarations\n" ; }

4.9

Top Level Structure For syner.cxx II

// Forward declaration.
void synExpression(SymTab *st, Expression *&expr, LexToken &lexToken, DataType &type) ;

void synFactor(SymTab *st, Factor *&fact, LexToken &lexToken)
{ cout << "synFactor\n" ; }

void synTerm(SymTab *st, Term *&term, LexToken &lexToken, DataType &type)
{ cout << "synTerm\n" ; }

void synBasicExp(SymTab *st, BasicExp *&bexp, LexToken &lexToken, DataType &type)
{ cout << "synBasicExp\n" ; }

4.10

Top Level Structure For syner.cxx III

void synExpression(SymTab *st, Expression *&expr, LexToken &lexToken, DataType &type)
{ cout << "synExpression\n" ; }

// Forward declaration.
void synStatements(SymTab *st, AST *&ast, int &label, LexToken &lexToken) ;

void synIfSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{ cout << "synIfSt\n" ; }

void synWhileSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{ cout << "synWhileSt\n" ; }

4.11

Top Level Structure For syner.cxx IV

void synCinSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{ cout << "synCinSt\n" ; }

void synCoutSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{ cout << "synCoutSt\n" ; }

void synAssignSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{ cout << "synAssignSt\n" ; }

4.12

Top Level Structure For syner.cxx V

void synStatement(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{ cout << "synStatement\n" ; }

void synStatements(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{ cout << "synStatements\n" ; }

4.13

Top Level Structure For syner.cxx VI

void synAnal(SymTab *&st, AST *&ast, int &label)
{
  LexToken lexToken ; // Current token

  skipWhiteComments() ;
  lexAnal(lexToken) ;

  st = NULL ;
  ast = NULL ;

  synDeclarations(st, lexToken) ;
  synStatements(st, ast, label, lexToken) ;
}

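• Note the strange way pointers are passed as reference parameters : in SymTab *&st and AST *&ast the pointer itself is passed by reference, so synAnal can assign to st and ast and the caller (main) sees the new values.

• label holds the number of the next M68K label to be used. All labels are of the form L0, L1, L2, etc.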
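To see why the reference matters, here is a minimal sketch in standard C++. Entry, addByValue and addByReference are hypothetical names, not part of syner.h : only the version whose parameter is a reference to a pointer changes the caller's variable.

```cpp
#include <string>

// Hypothetical, cut-down symbol table node (not the real SymTab from syner.h).
struct Entry {
    std::string ident;
    Entry *next;
};

// Pointer passed by value: the function assigns to its own copy of the
// pointer, so the caller's variable is unchanged (and the new node leaks).
void addByValue(Entry *st, const std::string &name) {
    st = new Entry{name, st};
}

// Pointer passed by reference (Entry *&): assigning to st updates the
// caller's variable, so the new node becomes the head of the caller's list.
void addByReference(Entry *&st, const std::string &name) {
    st = new Entry{name, st};
}
```

This is why synAnal, synDeclarations and synDec take SymTab *&st : each may need to change which entry the caller's list starts with.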

4.14

Lookahead Conventions

• C-- has an LL(1) grammar.

– Sometimes we must read one token beyond the end of a syntactic construct, e.g. when parsing an if statement, to determine whether or not there’s an else part.

• Conventions :

– 1. All the syntax analysis subprograms assume that they will be passed their first token as a parameter.

– 2. All the syntax analysis subprograms will read one token beyond the end of the syntactic construct they are attempting to parse and will pass that token back to their caller.

– Exception : synStatements does not follow convention 2; it returns with lexToken holding the closing }.

– Otherwise, after the program's final } it would try to read past the end of the input file.

– Luckily, synStatements does not need to look ahead.
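The two conventions can be tried out on a toy token stream. The sketch below is hypothetical throughout : Tag, TokenStream, expect, parseBlock and parseIf are illustrative stand-ins for the phase 1 lexer and the real synStatements and synIfSt, and it throws an exception where the real code would call error. parseBlock stops on the closing brace, like synStatements; parseIf reads one token past its construct and uses that token to decide whether an else part follows.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical token tags for a cut-down language (not the real lexer.h).
enum Tag { IF, LPAREN, BOOLLIT, RPAREN, LBRACE, RBRACE, ELSE, TERMINATOR, ENDFILE };

// Hypothetical token source standing in for lexAnal from phase 1.
struct TokenStream {
    std::vector<Tag> tokens;
    std::size_t pos = 0;
    // Reads the next token into tag; yields ENDFILE once input is exhausted.
    void lexAnal(Tag &tag) {
        tag = pos < tokens.size() ? tokens[pos++] : ENDFILE;
    }
};

// Throws instead of calling error, so the sketch can be tested.
void expect(Tag actual, Tag wanted) {
    if (actual != wanted) throw std::runtime_error("syntax error");
}

// Like synStatements: parses "{ }" and stops ON the closing brace,
// so it never reads past the end of the input.
void parseBlock(TokenStream &in, Tag &tag) {
    expect(tag, LBRACE);
    in.lexAnal(tag);
    expect(tag, RBRACE);   // statements inside the block omitted here
}

// Like synIfSt: receives its first token (convention 1), parses
// "if ( literal ) { } [ else { } ]" and returns with the token AFTER the
// construct in tag (convention 2). Returns true if there was an else part.
bool parseIf(TokenStream &in, Tag &tag) {
    expect(tag, IF);
    in.lexAnal(tag); expect(tag, LPAREN);
    in.lexAnal(tag); expect(tag, BOOLLIT);
    in.lexAnal(tag); expect(tag, RPAREN);
    in.lexAnal(tag); parseBlock(in, tag);
    in.lexAnal(tag);               // block stopped on '}', so lex one more
    bool hasElse = false;
    if (tag == ELSE) {             // the lookahead token decides
        hasElse = true;
        in.lexAnal(tag); parseBlock(in, tag);
        in.lexAnal(tag);
    }
    return hasElse;
}
```

For the input if (true) { } else { } ; parseIf returns true and leaves TERMINATOR in tag, ready for the caller to check for the statement terminator.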

4.15

Error Detection & Recovery

• Detecting errors is one thing that we’ll award marks for.

– RTFC for syner.cxx to see what errors you have to detect.

• Syntax errors.

27 = A ;

• Type errors.

bool b ;{ cin >> b ; }

• Static semantic errors.

string s ;

• As usual, upon detection of an error simply call error with an error number and the offending lexical token.

error(101, lexToken) ;

• error prints an error message and calls exit.

4.16

Symbol Table Lookup

• One of the most common static semantic errors is the use of an undeclared variable.

• The following subprogram is in utilities.cxx :

bool lookup(LexToken lexToken, SymTab *st, SymTab *&match)

• lookup inspects the symbol table to see if the identifier held in lexToken has already been declared.

– If it has not been declared, lookup returns false and sets match to NULL.

– If it has been declared, lookup returns true and sets match to a copy of the Entry for the identifier.
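A minimal sketch of how such a lookup could work, assuming the symbol table is a singly linked list. The cut-down LexToken and SymTab types below are assumptions (the real ones in lexer.h and syner.h have more fields), and for simplicity match is pointed at the entry found rather than at a copy of it.

```cpp
#include <string>

// Hypothetical, cut-down types; the real LexToken and SymTab are richer.
struct LexToken {
    std::string ident;   // identifier text (assumed field)
};

struct SymTab {
    std::string ident;   // declared name
    SymTab *next;        // next entry in the list
};

// Walks the list looking for lexToken.ident. On success returns true and
// sets match to the entry found; otherwise returns false and sets match
// to nullptr. The result is returned AND a reference parameter is set.
bool lookup(LexToken lexToken, SymTab *st, SymTab *&match) {
    for (SymTab *p = st; p != nullptr; p = p->next) {
        if (p->ident == lexToken.ident) {
            match = p;
            return true;
        }
    }
    match = nullptr;
    return false;
}
```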

4.17

Symbol Table Lookup II

• lookup is also useful for checking that identifiers have the correct type.

• Note that lookup is a value-returning subprogram which may also assign a value to a reference parameter.

– i.e. it is a side-effecting ‘function’.

• This is disgusting programming practice.

– It’s also standard programming practice in this instance.

– Don’t do this kind of thing yourself unless you really want a zero mark.

4.18

Code For synDeclarations

void synDeclarations(SymTab *&st, LexToken &lexToken)
{
  while (lexToken.tag != LBRACE)
  {
    synDec(st, lexToken) ;
    if (lexToken.tag != TERMINATOR)
      error(3, lexToken) ;
    else
      lexAnal(lexToken) ;
  }
}

• error is in utilities.cxx.

– Prints out the error message corresponding to the number : 0..99 are syntax errors, 100..199 are static semantic errors and 200..299 are type errors.

– Prints out the offending token (using writeToken).

– Calls exit to terminate the program.
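The number ranges are what let a single error subprogram label each message. The helpers below (errorKind and errorHeading) are hypothetical and not part of utilities.cxx; the heading format is copied from the example run ("Type error 218.") and assumed to be the same for the other two categories.

```cpp
#include <string>

// Maps an error number onto its category using the ranges above:
// 0..99 syntax, 100..199 static semantic, 200..299 type.
std::string errorKind(int number) {
    if (number >= 0 && number <= 99)    return "Syntax error";
    if (number >= 100 && number <= 199) return "Static semantic error";
    if (number >= 200 && number <= 299) return "Type error";
    return "Unknown error";
}

// Builds the first line of the message, e.g. "Type error 218."
std::string errorHeading(int number) {
    return errorKind(number) + " " + std::to_string(number) + ".";
}
```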

4.19

Top Level Code For synDec

void synDec(SymTab *&st, LexToken &lexToken)
{
  SymTab *newEntry ; // For this declaration
  SymTab *dummy ;    // For lookup
  LexToken idToken ; // Var/const name

  // Create new SymTab entry and initialise it (set type to VOIDDATA).

  if (lexToken.tag == CONST)
  {
    newEntry->constFlag = true ;
    // Lex another token into lexToken.
  }

  // lexToken is the type. Set type field. If not a valid type, raise an error.

  // Lex the identifier token. Copy it to idToken for better error reporting.
  if (lexToken.tag == IDENT)
    newEntry->ident = lexToken.ident ;
  else
    error(1, idToken) ;
  if (lookup(lexToken, st, dummy))
    error(101, idToken) ;

4.20

Top Level Code For synDec II

  // Lex next token.
  if (lexToken.tag == ASSIGN)
  {
    newEntry->initialise = new Factor ;
    // Lex next token.
    if (lexToken.tag == BOOLLIT)
    {
      if (newEntry->type != BOOLDATA)
        error(217, idToken) ;
      newEntry->initialise->literal = true ;
      newEntry->initialise->type = BOOLDATA ;
      newEntry->initialise->litBool = lexToken.boolLit ;
    }
    else if (lexToken.tag == STRINGLIT)
    {
      // As above but for a string.
    }
    else if (lexToken.tag == INTLIT)
    {
      // As above but for an int.
    }
    else
    {
      // Not a literal, so call error.
    }

    // Lex next token.
  }

4.21

Top Level Code For synDec III

  if ((newEntry->constFlag) && (newEntry->initialise == NULL))
    error(103, idToken) ;

  if ((newEntry->type == STRINGDATA) && (!newEntry->constFlag))
    error(104, idToken) ;

  // Add newEntry to the front of st.
} // synDec

• Note that this builds the symbol table in the reverse of the order in which the declarations appear.

– This is no problem.

– When parsing some languages (e.g. C, C++) it’s actually an advantage.
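Adding to the front of a linked list takes constant time but reverses the order. In this sketch (a cut-down, hypothetical SymTab type; addToFront and names are illustrative helpers) declaring a, b and c in that order yields the list c, b, a.

```cpp
#include <string>
#include <vector>

// Hypothetical, cut-down symbol table entry.
struct SymTab {
    std::string ident;
    SymTab *next;
};

// "Add newEntry to the front of st": constant time, no need to walk the list.
void addToFront(SymTab *&st, SymTab *newEntry) {
    newEntry->next = st;
    st = newEntry;
}

// Collects the identifiers in list order, to show the resulting order.
std::vector<std::string> names(const SymTab *st) {
    std::vector<std::string> result;
    for (const SymTab *p = st; p != nullptr; p = p->next)
        result.push_back(p->ident);
    return result;
}
```

One plausible reason this helps for C or C++ : with nested scopes, a search from the front of the list meets the most recent declaration of a name first, which gives the shadowing rule for free.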

4.22

Top Level Code For synStatements

void synStatements(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{
  AST *newast = NULL ; // Next statement
  AST *temp1 = NULL ;  // For reversing
  AST *temp2 = NULL ;  // the statement
  AST *temp3 = NULL ;  // list

  if (lexToken.tag != LBRACE)
    error(4, lexToken) ;
  // Lex the next token.

  while (lexToken.tag != RBRACE)
  {
    newast = new AST ;

    synStatement(st, newast, label, lexToken) ;

    if (lexToken.tag != TERMINATOR)
      error(8, lexToken) ;

4.23

Top Level Code For synStatements II

    newast->next = ast ;
    ast = newast ;

    // Lex the next token.
  }

  temp1 = ast ;
  while (temp1 != NULL)
  {
    temp3 = temp1->next ;
    temp1->next = temp2 ;
    temp2 = temp1 ;
    temp1 = temp3 ;
  }
  ast = temp2 ;
} // synStatements

• The statement list is built in reverse order, so it must be reversed again after it has been parsed.
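The three-pointer reversal loop from synStatements can be exercised on its own. The AST type here is a hypothetical cut-down node with only an id and the next link; reverseList is an illustrative name.

```cpp
// Hypothetical, cut-down AST node: only the link field matters here.
struct AST {
    int id;      // stands in for the statement's contents
    AST *next;
};

// In-place reversal using the same three pointers as synStatements:
// temp1 walks the old list, temp2 is the already-reversed prefix, and
// temp3 saves the rest of the old list before temp1->next is overwritten.
void reverseList(AST *&ast) {
    AST *temp1 = ast, *temp2 = nullptr, *temp3 = nullptr;
    while (temp1 != nullptr) {
        temp3 = temp1->next;
        temp1->next = temp2;
        temp2 = temp1;
        temp1 = temp3;
    }
    ast = temp2;
}
```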

4.24

synStatement

• synStatement will call the following subprograms :

– synIfSt, synWhileSt, synCinSt, synCoutSt, synAssignSt.

• synStatement decides which to call via an if statement which examines lexToken.tag.

– IF token : call synIfSt.

– WHILE token : call synWhileSt.

– CIN token : call synCinSt.

– COUT token : call synCoutSt.

– IDENT token : call synAssignSt.

– None of the above : call error.

• synIfSt and synWhileSt will call synStatements to syntax analyse their compound statements.

– Recursive Descent parsing : a set of mutually recursive subprograms, one per production rule in the EBNF syntax.
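Because C-- is LL(1), one token is enough to choose the right statement parser. The sketch below uses a hypothetical subset of the token tags and returns the name of the subprogram that synStatement would call, instead of calling it, so the decision can be checked in isolation.

```cpp
#include <string>

// Hypothetical subset of token tags (the real ones are in lexer.h).
enum Tag { IF, WHILE, CIN, COUT, IDENT, LBRACE, RBRACE, TERMINATOR };

// Mirrors the if statement in synStatement: the current token's tag
// alone selects the subprogram.
std::string chooseStatementParser(Tag tag) {
    if (tag == IF)         return "synIfSt";
    else if (tag == WHILE) return "synWhileSt";
    else if (tag == CIN)   return "synCinSt";
    else if (tag == COUT)  return "synCoutSt";
    else if (tag == IDENT) return "synAssignSt";
    else                   return "error";   // synStatement would call error here
}
```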

4.25

Parsing Input Statements

• By far the easiest of the statement parsing subprograms are synCinSt and synCoutSt :

void synCinSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{
  ast->tag = CINST ;
  ast->cinst = new CinSt ;

  lexAnal(lexToken) ;
  if (lexToken.tag != INOP)
    error(9, lexToken) ;

  lexAnal(lexToken) ;
  if (lexToken.tag != IDENT)
    error(11, lexToken) ;

4.26

Parsing Input Statements II

  if (!lookup(lexToken, st, ast->cinst->invar))
    error(102, lexToken) ;

  if (ast->cinst->invar->type != INTDATA)
    error(216, lexToken) ;

  if (ast->cinst->invar->constFlag)
    error(105, lexToken) ;

  lexAnal(lexToken) ;
}

• Most of the code is to check for syntax, static semantic and type errors.
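The three semantic checks in synCinSt can be isolated in a small function. Everything here is a hypothetical sketch : the cut-down SymTab and DataType, the inline search standing in for lookup, and the name checkCinTarget. The error numbers are the ones used in the slide code; the example run on slide 4.7 reports 218 for this situation, so check which number syner.template expects.

```cpp
#include <string>

enum DataType { VOIDDATA, INTDATA, BOOLDATA, STRINGDATA };

// Hypothetical, cut-down symbol table entry.
struct SymTab {
    std::string ident;
    DataType type;
    bool constFlag;
    SymTab *next;
};

// Applies the checks from synCinSt, in the same order, to the identifier
// after ">>". Returns the error number synCinSt would pass to error,
// or 0 if "cin >> name" is acceptable.
int checkCinTarget(const std::string &name, SymTab *st) {
    SymTab *invar = nullptr;
    for (SymTab *p = st; p != nullptr; p = p->next)   // stands in for lookup
        if (p->ident == name) { invar = p; break; }
    if (invar == nullptr)       return 102;  // undeclared variable
    if (invar->type != INTDATA) return 216;  // only integers can be input
    if (invar->constFlag)       return 105;  // cannot input to a constant
    return 0;
}
```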

4.27

Parsing if-else Statements

• This is slightly more complex :

– Need to use lookahead.

– Must handle the label field.

– Must parse the conditional expression.

– Must parse the enclosed statements.

void synIfSt(SymTab *st, AST *&ast, int &label, LexToken &lexToken)
{
  DataType type = VOIDDATA ;
  Expression *expr = NULL ;
  AST *thenpart = NULL ;
  AST *elsepart = NULL ;

4.28

Parsing if-else Statements II

  ast->tag = IFST ;
  ast->ifst = new IfSt ;
  ast->ifst->elselabel = label++ ;
  ast->ifst->endlabel = label++ ;

  lexAnal(lexToken) ;
  if (lexToken.tag != LPAREN)
    error(6, lexToken) ;

  lexAnal(lexToken) ;
  synExpression(st, expr, lexToken, type) ;

  if (type != BOOLDATA)
    error(202, lexToken) ;

  if (lexToken.tag != RPAREN)
    error(7, lexToken) ;

4.29

Parsing if-else Statements III

  lexAnal(lexToken) ;
  synStatements(st, thenpart, label, lexToken) ;
  lexAnal(lexToken) ;

  if (lexToken.tag == ELSE)
  {
    lexAnal(lexToken) ;
    synStatements(st, elsepart, label, lexToken) ;
    lexAnal(lexToken) ;
  }

  ast->ifst->condition = expr ;
  ast->ifst->thenstats = thenpart ;
  ast->ifst->elsestats = elsepart ;
} // synIfSt

• synStatements doesn’t lex ahead, so we must lex another token after each call to it.
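Label numbering can be traced with a sketch of the two label++ statements. IfSt (cut down to its two label fields), allocateIfLabels and labelName are hypothetical. Because synIfSt reserves its labels before it parses the enclosed statements, an if nested in the then part receives the next two numbers.

```cpp
#include <string>

// Hypothetical, cut-down if-statement node: just the two labels.
struct IfSt {
    int elselabel;
    int endlabel;
};

// As in synIfSt: reserve two consecutive label numbers, advancing the
// counter that label (passed by reference) holds for the whole program.
IfSt allocateIfLabels(int &label) {
    IfSt ifst;
    ifst.elselabel = label++;
    ifst.endlabel = label++;
    return ifst;
}

// Labels are written L0, L1, L2, ...
std::string labelName(int n) {
    return "L" + std::to_string(n);
}
```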

4.30

Expressions

• synIfSt must call synExpression to parse the conditional expression.

– So must synWhileSt and synAssignSt.

• The next lecture covers how to write synExpression.

• For now, assume that all expressions are simply literal constants, e.g. :

if (true) { ... } ;
while (false) { ... } ;
x = 42 ;

• Code to parse literals is on slide 4.20.

• Handling expressions can be a bit fiddly.

– Not difficult, just fiddly.
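While expressions are restricted to single literals, the type of an expression is decided by the literal's token tag alone. The sketch uses hypothetical subsets of the Tag and DataType enumerations; literalType is an illustrative name, and VOIDDATA stands for "not a literal", where the real code would call error.

```cpp
// Hypothetical subsets of the token tags and data types.
enum Tag { BOOLLIT, INTLIT, STRINGLIT, IDENT, RPAREN };
enum DataType { VOIDDATA, INTDATA, BOOLDATA, STRINGDATA };

// Stand-in for a literal-only synExpression: the expression's type is
// determined by the tag of its single token.
DataType literalType(Tag tag) {
    switch (tag) {
        case BOOLLIT:   return BOOLDATA;
        case INTLIT:    return INTDATA;
        case STRINGLIT: return STRINGDATA;
        default:        return VOIDDATA;   // not a literal
    }
}
```

With this rule if (42) would be rejected by synIfSt with error 202, since the condition's type is INTDATA rather than BOOLDATA.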

4.31

Summary

• Copy syner.template, makefile and syner into your directory, renaming syner to dhsyner so that it is not overwritten when you make your own syner.

• Print out syner.h, utilities.cxx and printers.cxx.

• Rename syner.template to syner.cxx.

• Complete the stubs in syner.cxx in the following order :

– synDeclarations, synDec, synStatements, synStatement, synCinSt, synCoutSt, synAssignSt, synWhileSt, synIfSt.

• For now, assume all expressions are simply literal constants.

– synExpression uses the literal-parsing code from slide 4.20.