Homework 2: Parser and Lexer
Remi Meier
Compiler Design – 08.10.2015
Compiler phases
Javali → Compiler → x86 Assembly
Compiler: Front-end → IR → Optimizations → IR → Back-end
Front-end: Lexical Analysis → Syntactic Analysis → Semantic Analysis (AST)
[Diagram: the pipeline is annotated with "Machine independent" and "Machine dependent" regions]
Homework 2
Text:
class Main {
    void main() {
        write(222);
        writeln();
    }
}
Abstract Syntax Tree:
Class
  Method
    Seq
      Write
        IntConst
      WriteLn
Text → ? → Abstract Syntax Tree
How do we…
• check if a program follows the syntax of Javali?
• extract meaning / structure?
Homework 2
Lexer → Token Stream → Parser → Parse Tree → Javali AST
Part 2a: Lexer and Parser (up to the parse tree)
Part 2b: Parse Tree → Javali AST
Lexical Analysis
Text → Lexer → Token Stream → Parser → Parse Tree / AST
Lexer
• Read input character by character
• Recognize character groups → tokens
Token
• Sequence of characters with a collective meaning → grammar terminals
• E.g. constants, identifiers, keywords, …
Lexical Analysis
Lexer rules:
ID : [a-zA-Z]+ ;
NUM : [0-9]+ ;
MISC : [{()};] ;
WS : ('\n'|' ') -> skip ;
Input:
class Main {
    void main() {
        write(222);
        writeln();
    }
}
Token stream:
ID: class  ID: Main  MISC: {  ID: void  ID: main  MISC: (  MISC: )  …
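The four lexer rules above can be imitated by hand to see where the token stream comes from. The following is a minimal sketch, not ANTLR-generated code: the class name `MiniLexer` and the `"TYPE: text"` output format are assumptions chosen to mirror the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the ID / NUM / MISC / WS rules (hypothetical class, not generated by ANTLR).
class MiniLexer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\n') {            // WS: skipped, produces no token
                i++;
            } else if (isLetter(c)) {               // ID : [a-zA-Z]+
                int start = i;
                while (i < input.length() && isLetter(input.charAt(i))) i++;
                tokens.add("ID: " + input.substring(start, i));
            } else if (isDigit(c)) {                // NUM : [0-9]+
                int start = i;
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add("NUM: " + input.substring(start, i));
            } else if ("{()};".indexOf(c) >= 0) {   // MISC : [{()};]
                tokens.add("MISC: " + c);
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at offset " + i);
            }
        }
        return tokens;
    }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    public static void main(String[] args) {
        String program = "class Main {\nvoid main() {\nwrite(222);\nwriteln();\n}\n}";
        for (String token : tokenize(program)) System.out.println(token);
    }
}
```

Note that `class` and `void` come out as plain `ID` tokens here, exactly as on the slide, because these four rules define no keyword tokens.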
Syntactic Analysis
Text → Lexer → Token Stream → Parser → Parse Tree / AST
Parser
• Check if token stream follows the grammar
• Group tokens hierarchically (extract structure) → Parse Tree / Abstract Syntax Tree
TOP-DOWN PARSER
Top-down parsers
Input: return a + b ;
Parse tree:
statement
  return
    'return'
    expr
      'a' '+' 'b'
    ';'
Grammar in Extended Backus-Naur Form (EBNF):
statement: return | assign
return: 'return' expr ';'
assign: ID '=' expr ';'
expr: ID '+' ID
Implementation
Grammar in Extended Backus-Naur Form (EBNF):
statement: return | assign
return: 'return' expr ';'
assign: ID '=' expr ';'
expr: ID '+' ID
void statement() {
    return();
    assign();
}
void return() {
    match('return');
    expr();
    match(';');
}
void expr() {
    match(ID);
    match('+');
    match(ID);
}
How to deal with alternatives?
Lookahead: LL(1)
Grammar in Extended Backus-Naur Form (EBNF):
statement: return | assign
return: 'return' expr ';'
assign: ID '=' expr ';'
expr: ID '+' ID
void statement() {
    if (next() is 'return') {
        return();
    } else if (next() is ID) {
        assign();
    }
}
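The lookahead idea can be made concrete as a small recursive-descent recognizer with one method per grammar rule. This is a sketch under assumptions: tokens are separated by whitespace instead of coming from a real lexer, the class name `Ll1Parser` and helper `accepts` are hypothetical, and the rule method is called `returnStmt` because `return` is a reserved word in Java.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of an LL(1) recursive-descent recognizer for the slide grammar (hypothetical class).
class Ll1Parser {
    private final List<String> tokens;
    private int pos = 0;

    Ll1Parser(String input) { tokens = Arrays.asList(input.trim().split("\\s+")); }

    // One token of lookahead: inspect the next token without consuming it.
    private String next() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private static boolean isId(String t) { return t.matches("[a-zA-Z]+") && !t.equals("return"); }

    private void match(String expected) {
        if (!next().equals(expected)) throw new IllegalStateException("expected " + expected + " but found " + next());
        pos++;
    }

    private void matchId() {
        if (!isId(next())) throw new IllegalStateException("expected ID but found " + next());
        pos++;
    }

    // statement : return | assign   (the lookahead token selects the alternative)
    void statement() {
        if (next().equals("return")) returnStmt();
        else if (isId(next())) assign();
        else throw new IllegalStateException("no alternative for " + next());
    }

    // return : 'return' expr ';'
    private void returnStmt() { match("return"); expr(); match(";"); }

    // assign : ID '=' expr ';'
    private void assign() { matchId(); match("="); expr(); match(";"); }

    // expr : ID '+' ID
    private void expr() { matchId(); match("+"); matchId(); }

    // True if the whole input is exactly one statement.
    static boolean accepts(String input) {
        Ll1Parser p = new Ll1Parser(input);
        try {
            p.statement();
            return p.pos == p.tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For example, `Ll1Parser.accepts("return a + b ;")` follows the `return` alternative, while `Ll1Parser.accepts("x = a + b ;")` follows `assign`; a single token of lookahead is enough because the two alternatives start with different tokens.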
ANTLR
Top-down parser generator
● ALL(*) adaptive, arbitrary lookahead
● handles any non-left-recursive context-free grammar
Token specifications + Grammar → ANTLR → MyLexer.java, MyParser.java
ANTLR – Grammar description
/* This is an example */
grammar Example;

/* Parser rules = Non-terminals */
program : statement* EOF ;
statement : assignment ';' | expression ';' ;

/* Lexer rules = Terminals */
Identifier : Letter (Letter | Digit)* ;
Letter : '\u0024' | '\u0041'..'\u005a';

Annotations:
• Lower-case initial: Parser
• Upper-case initial: Lexer
• Literals → Tokens
• Start rule matching end-of-file
ANTLR – Operators
program : statement* EOF;
statement : assignment ';' | expression ';' ;
method : type name '(' params? ')' ;

Extended Backus-Naur Form (EBNF) operators
x | y | z     (ordered) alternative
x?            at most once (optional)
x*            0 .. n times
x+            1 .. n times
Lexer-only operators
[charset]     one of the chars, e.g.: [a-zA-Z]
'x'..'y'      characters in range
Demo 1
ANTLR – Troubleshooting
ANTLR does not warn about ambiguous rules
● resolves ambiguity at runtime → requires lots of testing
ANTLR does not handle indirect left-recursion
● direct left-recursion supported
ANTLR – Lexer ambiguity
What if some input is matched by multiple lexer rules?
parserRule : 'enum' parserRule ;
    creates implicit lexer rule T123 : 'enum'
fragment Letter : [a-z] ;
    fragment enforces that the rule never produces a token, but can be used in other lexer rules (e.g., a)
Identifier : Letter+ ;
    can never match enum, but e.g., enums
Lexer decides based on:
1. rule with the longest match first
2. literal tokens before all regular Lexer rules
3. document order
4. fragment rules never match on their own
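The first three decision criteria can be sketched as a loop that tries every rule at the current position and keeps the longest match, preferring the earlier rule on a tie. This is an illustration built on `java.util.regex`, not how ANTLR is implemented; the class name `LongestMatchLexer` and the `WS` rule are assumptions added so that the input can contain spaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of "longest match first, then priority order" (hypothetical class, not ANTLR code).
class LongestMatchLexer {
    // Priority order: the implicit literal token first, then lexer rules in document order.
    // The fragment rule Letter is not listed because it never produces a token on its own.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'enum'", Pattern.compile("enum"));
        RULES.put("Identifier", Pattern.compile("[a-z]+"));
        RULES.put("WS", Pattern.compile("[ \\n]+"));   // assumed whitespace rule, skipped below
    }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule with higher priority is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) throw new IllegalArgumentException("no rule matches at offset " + pos);
            if (!bestRule.equals("WS")) out.add(bestRule + ": " + input.substring(pos, pos + bestLen));
            pos += bestLen;
        }
        return out;
    }
}
```

With the input `enum enums`, both the literal and `Identifier` match four characters of `enum`, so the literal wins by priority; for `enums`, `Identifier` matches five characters and wins by length.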
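ANTLR – Parser ambiguity
stmt: 'if' expr 'then' stmt 'else' stmt   (1)
    | 'if' expr 'then' stmt               (2)
    | ID '=' expr ;
Input: if a then if c then d else e
Two parse trees are possible:
• (1), (2): the outer 'if' uses alternative (1) and the inner 'if' uses (2), so 'else e' belongs to the outer 'if'
• (2), (1): the outer 'if' uses alternative (2) and the inner 'if' uses (1), so 'else e' belongs to the inner 'if'
The grammar is ambiguous because more than one parse tree exists for the same input.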
ANTLR – Parser ambiguity
At decision points, if more than one alternative matches a given input, follow document order.
stmt: 'if' expr 'then' stmt 'else' stmt   (1)
    | 'if' expr 'then' stmt               (2)
    | ID '=' expr ;
Input: if a then if c then d else e
Candidate parses: (1), (2) or (2), (1)
ANTLR – Parser ambiguity
At decision points, if more than one alternative matches a given input, follow document order.
Solution: reorder the alternatives.
Before:
stmt: 'if' expr 'then' stmt 'else' stmt   (1)
    | 'if' expr 'then' stmt               (2)
    | ID '=' expr ;
After:
stmt: 'if' expr 'then' stmt
    | 'if' expr 'then' stmt 'else' stmt
    | ID '=' expr ;
ANTLR – Parser ambiguity
At decision points, if more than one alternative matches a given input, follow document order.
Alternative solution:
stmt: 'if' expr 'then' stmt ('else' stmt)?
    | ID '=' expr ;
Sub-rules introduce additional decision points: (…)? → (…| )
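To see how the optional sub-rule settles the dangling else, here is a hand-written recursive-descent sketch of the `('else' stmt)?` form. It rests on simplifying assumptions: `expr` and the non-`if` statement are each reduced to a single identifier, tokens are separated by whitespace, and the class name `DanglingElse` is hypothetical. The optional part is taken whenever the next token is `else`, which mirrors the greedy choice made at that extra decision point.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: stmt : 'if' expr 'then' stmt ('else' stmt)? | ID   (simplified, hypothetical class)
class DanglingElse {
    private final List<String> tokens;
    private int pos = 0;

    DanglingElse(String input) { tokens = Arrays.asList(input.trim().split("\\s+")); }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private String takeAny() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void take(String expected) {
        if (!peek().equals(expected)) throw new IllegalStateException("expected " + expected + " but found " + peek());
        pos++;
    }

    // Returns a textual tree such as if(cond, then) or if(cond, then, else).
    String stmt() {
        if (peek().equals("if")) {
            take("if");
            String cond = takeAny();          // expr: a single identifier in this sketch
            take("then");
            String thenPart = stmt();
            // Optional sub-rule: entered whenever the next token is 'else',
            // so the 'else' attaches to the innermost 'if' that is still open.
            if (peek().equals("else")) {
                take("else");
                String elsePart = stmt();
                return "if(" + cond + ", " + thenPart + ", " + elsePart + ")";
            }
            return "if(" + cond + ", " + thenPart + ")";
        }
        return takeAny();                     // placeholder for ID '=' expr
    }

    static String parse(String input) { return new DanglingElse(input).stmt(); }
}
```

Calling `DanglingElse.parse("if a then if c then d else e")` yields `if(a, if(c, d, e))`: the `else` branch ends up inside the inner `if`.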
ANTLR – Left-recursion
Input: "a, b, c"
Without left-recursion:
list : LETTER (',' LETTER)* ;
Direct:
list : list ',' LETTER | LETTER ;
Indirect:
list : LETTER
     | longlist ;
longlist : list ',' LETTER ;
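The reason left-recursion matters for a top-down parser shows up as soon as the rules are written as methods. The sketch below implements the non-left-recursive form as a loop; the class name `LetterList` is hypothetical, `LETTER` is assumed to be a single lower-case letter, and tokens are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: list : LETTER (',' LETTER)* ;   (hypothetical class, LETTER assumed to be [a-z])
class LetterList {
    static List<String> parse(String input) {
        List<String> tokens = Arrays.asList(input.trim().split("\\s+"));
        List<String> letters = new ArrayList<>();
        int pos = 0;
        letters.add(letter(tokens, pos++));
        // The (...)* sub-rule becomes a loop that runs while the lookahead is ','.
        while (pos < tokens.size() && tokens.get(pos).equals(",")) {
            pos++;                                    // consume ','
            letters.add(letter(tokens, pos++));
        }
        if (pos != tokens.size()) throw new IllegalArgumentException("unexpected token " + tokens.get(pos));
        return letters;
    }

    private static String letter(List<String> tokens, int pos) {
        if (pos >= tokens.size() || !tokens.get(pos).matches("[a-z]"))
            throw new IllegalArgumentException("expected LETTER at token " + pos);
        return tokens.get(pos);
    }

    // A literal translation of  list : list ',' LETTER | LETTER ;  would begin with
    //     List<String> list() { list(); ... }
    // and call itself again before consuming any token, so it would never terminate.
}
```

`LetterList.parse("a , b , c")` collects the three letters in order, which is the same language the left-recursive rules describe.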
ANTLR – Direct left-recursion
exp: exp '*' exp
   | exp '+' exp
   | ID ;
Example inputs: a + a * a and a * a + a * a
ANTLR rewrites such a rule into a grammar that implicitly assigns priorities to alternatives in document order.
https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Left-recursive+rules
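One way to picture "priorities in document order" is a precedence-climbing parser, where an earlier alternative gets a higher priority. The sketch below is in the spirit of that rewrite, not ANTLR's actual generated code; the class name `PrecedenceParser`, the numeric priorities, and the whitespace-separated tokens are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: exp : exp '*' exp | exp '+' exp | ID ;  parsed by precedence climbing (hypothetical class).
class PrecedenceParser {
    private final List<String> tokens;
    private int pos = 0;

    PrecedenceParser(String input) { tokens = Arrays.asList(input.trim().split("\\s+")); }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    // Earlier alternative = higher priority: '*' (first) binds tighter than '+' (second).
    private static int priority(String op) {
        if (op.equals("*")) return 2;
        if (op.equals("+")) return 1;
        return 0;                                   // not a binary operator
    }

    private String id() {
        if (!peek().matches("[a-zA-Z]+")) throw new IllegalStateException("expected ID but found " + peek());
        return tokens.get(pos++);
    }

    // Parses an expression containing only operators of at least minPriority.
    String exp(int minPriority) {
        String left = id();
        while (priority(peek()) >= minPriority) {
            String op = tokens.get(pos++);
            // Left-associative: the right operand may only contain tighter-binding operators.
            String right = exp(priority(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    static String parse(String input) {
        PrecedenceParser p = new PrecedenceParser(input);
        String tree = p.exp(1);
        if (p.pos != p.tokens.size()) throw new IllegalStateException("unexpected token " + p.peek());
        return tree;
    }
}
```

`PrecedenceParser.parse("a + a * a")` returns `(a + (a * a))`, and `PrecedenceParser.parse("a * a + a * a")` returns `((a * a) + (a * a))`, so `*` groups before `+` in both slide inputs.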
Demo 2
Homework
Lexer → Token Stream → Parser → Parse Tree → Javali AST
Part 2a: Parser grammar: Javali.g4
Part 2b: JavaliAstVisitor.java
Generated files
ANTLR generates:
JavaliLexer/Parser.java
● the real thing
Javali(Base)Visitor.java
● base class for parse-tree visitor
Javali(Lexer).tokens
● token → number mapping for debugging
Generated visitor
Without labels (one method per rule):
start : exp EOF ;
exp : exp '*' exp
    | exp '+' exp
    | ID ;
With labels (one method per label / rule):
start : exp EOF ;
exp : exp '*' exp # MULT
    | exp '+' exp # ADD
    | ID # TERM ;
https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parser+Rules
Constructing the Javali AST
start : exp EOF ;
exp : exp '*' exp # MULT
    | exp '+' exp # ADD
    | ID # TERM ;
Input: "a * a + a * a"
Resulting AST:
BinaryOp(+)
  BinaryOp(*)
    Var('a')
    Var('a')
  BinaryOp(*)
    Var('a')
    Var('a')
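The step from parse tree to AST can be sketched as a visitor with one method per label. Everything below is a hand-written stand-in so that the example runs without ANTLR: the context classes `MultCtx`, `AddCtx`, `TermCtx`, the `Visitor` interface and the `AstBuilder` are hypothetical and only imitate the shape of a generated visitor, while `BinaryOp` and `Var` take their names from the slide but are simplified here.

```java
// Sketch of a label-based visitor that builds an AST (all names hypothetical, no ANTLR dependency).
class AstDemo {
    // One visit method per label: MULT, ADD, TERM.
    interface Visitor<T> {
        T visitMULT(MultCtx ctx);
        T visitADD(AddCtx ctx);
        T visitTERM(TermCtx ctx);
    }

    // Parse-tree nodes: one context class per labeled alternative of exp.
    interface ExpCtx { <T> T accept(Visitor<T> v); }

    static class MultCtx implements ExpCtx {
        final ExpCtx left, right;
        MultCtx(ExpCtx left, ExpCtx right) { this.left = left; this.right = right; }
        public <T> T accept(Visitor<T> v) { return v.visitMULT(this); }
    }

    static class AddCtx implements ExpCtx {
        final ExpCtx left, right;
        AddCtx(ExpCtx left, ExpCtx right) { this.left = left; this.right = right; }
        public <T> T accept(Visitor<T> v) { return v.visitADD(this); }
    }

    static class TermCtx implements ExpCtx {
        final String id;
        TermCtx(String id) { this.id = id; }
        public <T> T accept(Visitor<T> v) { return v.visitTERM(this); }
    }

    // Simplified AST nodes.
    static abstract class Expr { }

    static class Var extends Expr {
        final String name;
        Var(String name) { this.name = name; }
        @Override public String toString() { return "Var('" + name + "')"; }
    }

    static class BinaryOp extends Expr {
        final char op;
        final Expr left, right;
        BinaryOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
        @Override public String toString() { return "BinaryOp(" + op + ")[" + left + ", " + right + "]"; }
    }

    // Children are converted first (left, then right); the parent node is created afterwards.
    static class AstBuilder implements Visitor<Expr> {
        public Expr visitMULT(MultCtx ctx) { return new BinaryOp('*', ctx.left.accept(this), ctx.right.accept(this)); }
        public Expr visitADD(AddCtx ctx) { return new BinaryOp('+', ctx.left.accept(this), ctx.right.accept(this)); }
        public Expr visitTERM(TermCtx ctx) { return new Var(ctx.id); }
    }
}
```

Building the parse tree for "a * a + a * a" by hand, as an `AddCtx` over two `MultCtx` nodes, and calling `accept(new AstDemo.AstBuilder())` on it produces a `BinaryOp(+)` whose children are two `BinaryOp(*)` nodes over `Var('a')` leaves, matching the tree on the slide.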
Demo 3
Notes
• You are not allowed to use syntactic predicates.
• Look on our website for more material.
• Due date is October 22nd.