Syntax Analysis
description
Transcript of Syntax Analysis
![Page 1: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/1.jpg)
Syntax Analysis
Mooly Sagiv
html://www.math.tau.ac.il/~msagiv/courses/wcc01.htmlTextbook:Modern Compiler Implementation in C
Chapter 3
![Page 2: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/2.jpg)
A motivating example• Create a desk calculator• Challenges
– Non trivial syntax
– Recursive expressions (semantics)• Operator precedence
![Page 3: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/3.jpg)
Solution (lexical analysis)/* desk.l */
%%[0-9]+ { yylval = atoi(yytext);
return NUM ;}“+” { return PLUS ;}“-” { return MINUS ;}“/” { return DIV ;}“*” { return MUL ;} “(“ { return LPAR ;}“)” { return RPAR ;}“//”.*[\n] ; /* comment */[\t\n ] ; /* whitespace */. { error (“illegal symbol”, yytext[0]); }
![Page 4: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/4.jpg)
Solution (syntax analysis)/* desk.y */
%token NUM%left PLUS, MINUS%left MUL, DIV% start P%%P : E {printf(“%d\n”, $1) ;}E : NUM { $$ = $1 ;} | LPAR e RPAR { $$ = $2; } | e PLUS e { $$ = $1 + $3 ; } | e MINUS e { $$ = $1 - $3 ; } | e MUL e { $$ = $1 * $3; } | e DIV e {$$ = $1 / $3; } ;%%#include “lex.yy.c”
flex desk.l
bison desk.y
cc y.tab.c –ll -ly
![Page 5: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/5.jpg)
Solution (syntax analysis)
// input 7 + 5 * 3
a.out <input
22
![Page 6: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/6.jpg)
Subjects• The task of syntax analysis
• Automatic generation
• Error handling
• Context free grammars and derivations
• Ambiguous Grammars
• Predictive Parsing (sketch only)
• Bottom-up syntax analysis
• Bison A parser generator Next week
![Page 7: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/7.jpg)
Basic Compiler Phases
Source program (string)
Fin. Assembly
lexical analysis
syntax analysis
semantic analysis
Tokens
Abstract syntax tree
Front-End
Back-End
![Page 8: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/8.jpg)
• input– Sequence of tokens
• output– Abstract Syntax Tree
• Report syntax errors• unbalanced parenthesizes
• [Create “symbol-table” ]• [Create pretty-printed version of the program]• In some cases the tree need not be generated
(one-pass compilers)
Syntax Analysis (Parsing)
![Page 9: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/9.jpg)
• Report and locate the error
• Diagnose the error
• Correct the error
• Recover from the error in order to discover more errors– without reporting too many “strange” errors
Handling Syntax Errors
![Page 10: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/10.jpg)
Example
a := a * ( b + c * d ;
![Page 11: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/11.jpg)
The Valid Prefix Property
• For every prefix tokens– t1, t2, …, ti that the parser identifies as legal:
• there exists tokens ti+1, ti+2, …, tn
such that t1, t2, …, tn
is a syntactically valid program• If every token is considered as a single character:
– For every prefix word u that the parser identifies as legal:• there exists a word w such that• u.w is a valid program
![Page 12: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/12.jpg)
Error Diagnosis
• Line number – may be far from the actual error
• The current token• The expected tokens• Parser configuration
![Page 13: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/13.jpg)
Error Recovery
• Becomes less important in interactive environments
• Example heuristics:– Search for a semi-column and ignore the statement– Try to “replace” tokens for common errors– Refrain from reporting 3 subsequent errors
• Globally optimal solutions – For every input w, find a valid program w’ with a
“minimal-distance” from w
![Page 14: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/14.jpg)
Designing a parser
language design
context-free grammar design
Bison
parser (c program)
![Page 15: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/15.jpg)
Context Free Grammar (Review)
• What is a grammar
• Derivations and Parsing Trees
• Ambiguous grammars
• Resolving ambiguity
![Page 16: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/16.jpg)
Context Free Grammars
• Non-terminals– Start non-terminal
• Terminals (tokens)
• Context Free Rules<Non-Terminal> Symbol Symbol … Symbol
![Page 17: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/17.jpg)
Example Context Free Grammar
1 S S ; S2 S id := E 3 S print (L)4 E id 5 E num6 E E + E7 E (S, E)8 L E9 L L, E
![Page 18: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/18.jpg)
Derivations
• Show that a sentence is in the grammar (valid program)– Start with the start symbol– Repeatedly replace one of the non-terminals by a
right-hand side of a production– Stop when the sentence contains terminals only
• Rightmost derivation• Leftmost derivation
![Page 19: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/19.jpg)
Example Derivations
1 S S ; S2 S id := E 3 S print (L)4 E id 5 E num6 E E + E7 E (S, E)8 L E9 L L, E
S
S ; S
S ; id := E
id := E ; id := E
id := num ; id := E
id := num ; id := E + E
id := num ; id := E + num
id := num ; id :=num + num
a 56 b 77 16
![Page 20: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/20.jpg)
Parse Trees
• The trace of a derivation
• Every internal node is labeled by a non-terminal
• Each symbol is connected to the deriving non-terminal
![Page 21: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/21.jpg)
Example Parse TreeS
S ; S
S ; id := E
id := E ; id := E
id := num ; id := E
id := num ; id := E + E
id := num ; id := E + num
id := num ; id :=num + num
;s s
s
id := E id := E
id := E id := E
num num
![Page 22: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/22.jpg)
Ambiguous Grammars
• Two leftmost derivations
• Two rightmost derivations
• Two parse trees
![Page 23: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/23.jpg)
A Grammar for Arithmetic Expressions
1 E E + E2 E E * E3 E id4 E (E)
![Page 24: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/24.jpg)
Non Ambiguous Grammarfor Arithmetic Expressions
Ambiguous grammar
1 E E + T2 E T3 T T * F4 T F5 F id6 F (E)
1 E E + E2 E E * E3 E id4 E (E)
![Page 25: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/25.jpg)
Non Ambiguous Grammarsfor Arithmetic Expressions
Ambiguous grammar
1 E E + T2 E T3 T T * F4 T F5 F id6 F (E)
1 E E + E2 E E * E3 E id4 E (E)
1 E E * T2 E T3 T F + T4 T F5 F id6 F (E)
![Page 26: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/26.jpg)
Efficient Parsers • Pushdown automata
• Deterministic
• Report an error as soon as the input is not a prefix of a valid program
• Not usable for all context free grammars
bison
context free grammar
tokens parser
“Ambiguity errors”
parse tree
![Page 27: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/27.jpg)
Kinds of Parsers
• Top-Down (Predictive Parsing) LL– Construct parse tree in a top-down matter
– Find the leftmost derivation
– For every non-terminal and token predict the next production
• Bottom-Up LR– Construct parse tree in a bottom-up manner
– Find the rightmost derivation in a reverse order
– For every potential right hand side and token decide when a production is found
![Page 28: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/28.jpg)
Example Grammar for Predictive Parsing
1 S if E then S else S2 S begin S L 3 S print (E)4 L end5 L ; S L6 E num = num
![Page 29: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/29.jpg)
enum token IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, LP, RP;
extern enum token getToken(void);
void advance(void) { tok= getToken(); }void eat(enum token t) { if (tok==t) advance(); else error(); }
Predictive Parser(utility functions)
![Page 30: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/30.jpg)
void S(void) {switch(tok) { case IF: eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break; case BEGIN: eat(BEGIN); S(); L(); break; case PRINT: eat(PRINT); eat(LP); E(); eat(RP); break; default: error(tok, “Expecting if, begin, or print”'); }}
Predictive Parser (S)
1 S if E then S else S2 S begin S L 3 S print (E)
![Page 31: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/31.jpg)
Predictive Parser (L)
4 L end5 L ; S L
void L(void) {switch(tok) { case END: eat(END); break; case SEMI: eat(SEMI); S(); L(); break; default: error(tok, “Expecting end or semicolumn''); } }
![Page 32: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/32.jpg)
Predictive Parser (E)
6 E num = num
void E(void) {switch(tok) { case NUM: eat(NUM); eat(EQ) eat(NUM); break; default: error(tok, “Expecting a number”); }}
![Page 33: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/33.jpg)
Predictive Parser for Arithmetic Expressions
• Grammar
• C-code?
1 E E + T2 E T3 T T * F4 T F5 F id6 F (E)
![Page 34: Syntax Analysis](https://reader035.fdocuments.us/reader035/viewer/2022062517/568131ae550346895d981c87/html5/thumbnails/34.jpg)
Summary
• Context free grammars provide a natural way to define the syntax of programming languages
• Ambiguity may be resolved• Predictive parsing is natural
– Good error messages
– Natural error recovery
– But not expressive enough
• Next lesson LR bottom-up parsing