Automated Parser Generation (via CUP)
High-level structure
[Diagram: text → Lexical analyzer → tokens → Parser → AST]
• Lexer spec (TPL.lex) → JFlex → Lexer.java → javac → Lexical analyzer
• Parser spec (TPL.cup) → CUP → Parser.java, sym.java → javac → Parser
• (Token.java)
Expression calculator
expr → expr + expr
     | expr - expr
     | expr * expr
     | expr / expr
     | - expr
     | ( expr )
     | number
Goals of expression calculator parser:
• Is 2+3+4+5 a valid expression?
• What is the meaning (value) of this expression?
Syntax analysis with CUP
[Diagram: tokens → Parser → AST; Parser spec → CUP → .java → javac → Parser]
CUP – parser generator
• Generates an LALR(1) parser
• Input: spec file
• Output: a syntax analyzer
CUP spec file
• Package and import specifications
• User code components
• Symbol (terminal and non-terminal) lists
  – Terminals go to sym.java
  – Types of AST nodes
• Precedence declarations
• The grammar
  – Semantic actions to construct AST
Expression Calculator – 1st Attempt
terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV;
terminal LPAREN, RPAREN;

non terminal Integer expr;

expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr
       | LPAREN expr RPAREN
       | NUMBER
       ;
Symbol type (Integer): explained later
Ambiguities
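[Figure: two parse trees for each expression]
• a * b + c can be parsed as (a * b) + c or as a * (b + c)
• a + b + c can be parsed as (a + b) + c or as a + (b + c)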
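To make the cost of the ambiguity concrete, here is a minimal Java sketch that evaluates both readings of a * b + c. The class name `AmbiguityDemo`, the method names, and the sample values are illustrative assumptions and do not come from the slides.

```java
// Illustrative only: the two readings of "a * b + c" allowed by the ambiguous grammar.
class AmbiguityDemo {
    // Tree 1: '*' is grouped first, i.e. (a * b) + c
    static int timesFirst(int a, int b, int c) {
        return (a * b) + c;
    }

    // Tree 2: '+' is grouped first, i.e. a * (b + c)
    static int plusFirst(int a, int b, int c) {
        return a * (b + c);
    }

    public static void main(String[] args) {
        int a = 2, b = 3, c = 4;
        System.out.println("(a * b) + c = " + timesFirst(a, b, c)); // 10
        System.out.println("a * (b + c) = " + plusFirst(a, b, c)); // 14
    }
}
```

The two trees give different values, so a calculator has to commit to one of them. For a + b + c both trees happen to give the same value, but the same grouping question decides the result for a - b - c.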
Ambiguities as conflicts for LR(1)
[Figure: the same two pairs of parse trees, for a * b + c and for a + b + c]
Expression Calculator – 2nd Attempt
terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV;
terminal LPAREN, RPAREN;
terminal UMINUS;
non terminal Integer expr;

precedence left PLUS, MINUS;
precedence left DIV, MULT;
precedence left UMINUS;

expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr %prec UMINUS
       | LPAREN expr RPAREN
       | NUMBER
       ;
• Increasing precedence: the precedence declarations run from lowest (PLUS, MINUS) to highest (UMINUS)
• Contextual precedence: MINUS expr %prec UMINUS
Parsing ambiguous grammars using precedence declarations
• Each terminal is assigned a precedence
  – By default, all terminals have the lowest precedence
  – The user can assign other precedences
  – CUP assigns each production a precedence
    • The precedence of the rightmost terminal in the production
    • Or a user-specified contextual precedence
• On a shift/reduce conflict, CUP compares the precedence of the terminal with that of the production and decides whether to shift (terminal is higher) or reduce (production is higher)
• When the precedences are equal, left/right associativity resolves the conflict
  – left means reduce
  – right means shift
• More information on precedence declarations is in CUP's manual
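The decision rule in the bullets above can be sketched in a few lines of Java. This is only an illustration of the rule, not CUP's internal code: the names `PrecedenceResolver`, `Assoc`, `Action`, and `resolve` are hypothetical, and the integer precedence levels are an assumed encoding in which a larger number means higher precedence. The `NONASSOC` case follows the CUP manual, which treats equal precedence on a non-associative level as an error.

```java
// Sketch of precedence-based shift/reduce conflict resolution.
// All names are hypothetical; this is not taken from CUP's sources.
class PrecedenceResolver {
    enum Assoc { LEFT, RIGHT, NONASSOC }
    enum Action { SHIFT, REDUCE, ERROR }

    /**
     * @param productionPrec precedence of the production that could be reduced
     * @param terminalPrec   precedence of the lookahead terminal that could be shifted
     * @param assoc          associativity declared for that precedence level
     */
    static Action resolve(int productionPrec, int terminalPrec, Assoc assoc) {
        if (productionPrec > terminalPrec) return Action.REDUCE;
        if (productionPrec < terminalPrec) return Action.SHIFT;
        // Equal precedence: associativity decides.
        switch (assoc) {
            case LEFT:  return Action.REDUCE;
            case RIGHT: return Action.SHIFT;
            default:    return Action.ERROR;
        }
    }

    public static void main(String[] args) {
        int plus = 1, mult = 2; // assumed levels: later declarations are higher
        // "a + b" parsed so far, lookahead '*': shift, so '*' is grouped first
        System.out.println(resolve(plus, mult, Assoc.LEFT));
        // "a * b" parsed so far, lookahead '+': reduce
        System.out.println(resolve(mult, plus, Assoc.LEFT));
        // "a + b" parsed so far, lookahead '+': left associativity reduces
        System.out.println(resolve(plus, plus, Assoc.LEFT));
    }
}
```

Run as written, the three calls print SHIFT, REDUCE, and REDUCE, which correspond to the groupings a + (b * c), (a * b) + c, and (a + b) + c.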
Resolving ambiguity
a + b + c
[Figure: the two parse trees, (a + b) + c and a + (b + c)]
precedence left PLUS
Left associativity means reduce, so (a + b) + c is selected
Resolving ambiguity
a * b + c
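[Figure: the two parse trees, (a * b) + c and a * (b + c)]
precedence left PLUS
precedence left MULT
MULT is declared after PLUS, so it has higher precedence and (a * b) + c is selected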
Resolving ambiguity
- a * b
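[Figure: the two parse trees, -(a * b) and (-a) * b]
precedence left MULT
MINUS expr %prec UMINUS
When UMINUS has higher precedence than MULT, (-a) * b is selected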
Resolving ambiguity
terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV;
terminal LPAREN, RPAREN;
terminal UMINUS;

precedence left PLUS, MINUS;
precedence left DIV, MULT;
precedence left UMINUS;

expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr %prec UMINUS
       | LPAREN expr RPAREN
       | NUMBER
       ;
• The rule MINUS expr has the precedence of UMINUS
• UMINUS is never returned by the scanner (it is used only to define precedence)
More CUP directives
• precedence nonassoc NEQ
  – Non-associative operators: < > == != etc.
  – 1<2<3 identified as an error (semantic error?)
• start with non-terminal
  – Specifies a start non-terminal other than the first non-terminal
  – Can be changed to test parts of the grammar
• Getting the internal representation
  – Command-line options: -dump_grammar, -dump_states, -dump_tables, -dump
Scanner integration
import java_cup.runtime.*;
%%
%cup
%eofval{
  return new Symbol(sym.EOF);
%eofval}
NUMBER=[0-9]+
%%
<YYINITIAL>"+" { return new Symbol(sym.PLUS); }
<YYINITIAL>"-" { return new Symbol(sym.MINUS); }
<YYINITIAL>"*" { return new Symbol(sym.MULT); }
<YYINITIAL>"/" { return new Symbol(sym.DIV); }
<YYINITIAL>"(" { return new Symbol(sym.LPAREN); }
<YYINITIAL>")" { return new Symbol(sym.RPAREN); }
<YYINITIAL>{NUMBER} {
  return new Symbol(sym.NUMBER, Integer.valueOf(yytext()));
}
<YYINITIAL>\n { }
<YYINITIAL>. { }
• The parser gets terminals from the scanner
• The sym constants are generated from the token declarations in the .cup file
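What the scanner hands to the parser can be illustrated with a small hand-written stand-in in plain Java. This swaps the JFlex-generated lexer for a manual loop, purely for illustration: `MiniScanner`, `Token`, and the numeric constants are hypothetical, whereas a real CUP setup returns `java_cup.runtime.Symbol` objects whose ids come from the generated `sym` class.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written stand-in for the generated scanner (illustration only).
// Token and the kind constants are hypothetical, not CUP's Symbol/sym.
class MiniScanner {
    static final int EOF = 0, PLUS = 1, MINUS = 2, MULT = 3, DIV = 4,
                     LPAREN = 5, RPAREN = 6, NUMBER = 7;

    static final class Token {
        final int kind;      // which terminal this token is
        final Integer value; // attached value, set only for NUMBER
        Token(int kind, Integer value) { this.kind = kind; this.value = value; }
    }

    private static boolean isDigit(char ch) { return ch >= '0' && ch <= '9'; }

    static List<Token> scan(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            if (isDigit(ch)) {
                // NUMBER = [0-9]+ : take the longest run of digits
                int start = i;
                while (i < text.length() && isDigit(text.charAt(i))) i++;
                tokens.add(new Token(NUMBER, Integer.valueOf(text.substring(start, i))));
                continue;
            }
            switch (ch) {
                case '+': tokens.add(new Token(PLUS, null)); break;
                case '-': tokens.add(new Token(MINUS, null)); break;
                case '*': tokens.add(new Token(MULT, null)); break;
                case '/': tokens.add(new Token(DIV, null)); break;
                case '(': tokens.add(new Token(LPAREN, null)); break;
                case ')': tokens.add(new Token(RPAREN, null)); break;
                default: break; // every other character is skipped, as in the spec
            }
            i++;
        }
        tokens.add(new Token(EOF, null)); // mirrors the %eofval block
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : scan("12 + 3")) {
            System.out.println(t.kind + (t.value != null ? " " + t.value : ""));
        }
    }
}
```

Only NUMBER tokens carry a value, which matches the `terminal Integer NUMBER;` declaration: the other terminals are declared without a type.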
Recap
• Package and import specifications and user code components
• Symbol (terminal and non-terminal) lists
  – Define building-blocks of the grammar
• Precedence declarations
  – May help resolve conflicts
• The grammar
  – May introduce conflicts that have to be resolved
Assigning meaning
• So far, only validation
• Add Java code implementing semantic actions
expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr %prec UMINUS
       | LPAREN expr RPAREN
       | NUMBER
       ;
Assigning meaning
• Symbol labels are used to name variables
• RESULT names the left-hand side symbol
expr ::= expr:e1 PLUS expr:e2
         {: RESULT = Integer.valueOf(e1.intValue() + e2.intValue()); :}
       | expr:e1 MINUS expr:e2
         {: RESULT = Integer.valueOf(e1.intValue() - e2.intValue()); :}
       | expr:e1 MULT expr:e2
         {: RESULT = Integer.valueOf(e1.intValue() * e2.intValue()); :}
       | expr:e1 DIV expr:e2
         {: RESULT = Integer.valueOf(e1.intValue() / e2.intValue()); :}
       | MINUS expr:e1
         {: RESULT = Integer.valueOf(0 - e1.intValue()); :} %prec UMINUS
       | LPAREN expr:e1 RPAREN
         {: RESULT = e1; :}
       | NUMBER:n
         {: RESULT = n; :}
       ;
Building an AST
• More useful representation of the syntax tree
  – Less clutter
  – Actual level of detail depends on your design
• Basis for semantic analysis
• Later annotated with various information
  – Type information
  – Computed values
• Technically: a class hierarchy of abstract syntax tree nodes
Parse tree vs. AST
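[Figure: parse tree and AST side by side. The parse tree has an expr node for every reduction and keeps the parentheses as leaves; the AST keeps only the two + nodes and the leaves 1, 2, 3]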
AST hierarchy example
[Diagram: expr is the base class; int_const, plus, minus, times and divide are its subclasses]
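A minimal Java sketch of such a hierarchy is shown below. The slide's node names (expr, int_const, plus, ...) are mapped to Java-style class names; the `BinaryExpr` helper, the `eval()` method, and the `AstDemo` driver are assumptions added for illustration and are not part of the slides.

```java
// Sketch of an AST class hierarchy for the calculator.
// BinaryExpr, eval() and AstDemo are illustrative additions.
abstract class Expr {
    abstract int eval();
}

class IntConst extends Expr {
    final int val;
    IntConst(int val) { this.val = val; }
    int eval() { return val; }
}

// Shared base for the four binary operators: holds the two children.
abstract class BinaryExpr extends Expr {
    final Expr e1, e2;
    BinaryExpr(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
}

class Plus extends BinaryExpr {
    Plus(Expr e1, Expr e2) { super(e1, e2); }
    int eval() { return e1.eval() + e2.eval(); }
}

class Minus extends BinaryExpr {
    Minus(Expr e1, Expr e2) { super(e1, e2); }
    int eval() { return e1.eval() - e2.eval(); }
}

class Times extends BinaryExpr {
    Times(Expr e1, Expr e2) { super(e1, e2); }
    int eval() { return e1.eval() * e2.eval(); }
}

class Divide extends BinaryExpr {
    Divide(Expr e1, Expr e2) { super(e1, e2); }
    int eval() { return e1.eval() / e2.eval(); } // integer division
}

class AstDemo {
    public static void main(String[] args) {
        // AST for (1 + 2) + 3
        Expr tree = new Plus(new Plus(new IntConst(1), new IntConst(2)), new IntConst(3));
        System.out.println(tree.eval()); // 6
    }
}
```

Putting `eval()` on the nodes is only one design option; a visitor over the same hierarchy would keep the node classes free of evaluation logic.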
AST construction
• AST nodes are constructed during parsing
  – Stored in a push-down stack
• Bottom-up parser
  – Grammar rules annotated with actions for AST construction
  – When a node is constructed, all its children are available (already constructed)
  – Node (RESULT) pushed on stack
• Top-down parser
  – More complicated
AST construction
Reductions for the input 1 + (2) + (3):
1 + (2) + (3)
expr + (2) + (3)
expr + (expr) + (3)
expr + (3)
expr + (expr)
expr
[Figure: the parse tree next to the resulting AST: plus(e1 = plus(e1 = int_const(val = 1), e2 = int_const(val = 2)), e2 = int_const(val = 3))]
expr ::= expr:e1 PLUS expr:e2
         {: RESULT = new plus(e1,e2); :}
       | LPAREN expr:e RPAREN
         {: RESULT = e; :}
       | INT_CONST:i
         {: RESULT = new int_const(…, i); :}
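The way the actions cooperate with the parser's stack can be imitated by hand. The Java sketch below replays the reductions for 1 + (2) + (3) on an explicit stack of nodes. It is a simulation under assumptions, not generated parser code: `AstStackDemo`, the node classes, and the `reduce...` methods are hypothetical, and the reduction order assumes PLUS is left-associative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hand simulation of the reductions for 1 + (2) + (3).
// All names are hypothetical; a CUP-generated parser manages its own
// stack and runs the {: ... :} action on each reduction.
class AstStackDemo {
    interface Node { String show(); }

    static final class IntConstNode implements Node {
        final int val;
        IntConstNode(int val) { this.val = val; }
        public String show() { return Integer.toString(val); }
    }

    static final class PlusNode implements Node {
        final Node e1, e2;
        PlusNode(Node e1, Node e2) { this.e1 = e1; this.e2 = e2; }
        public String show() { return "plus(" + e1.show() + ", " + e2.show() + ")"; }
    }

    private final Deque<Node> stack = new ArrayDeque<>();

    // expr ::= INT_CONST:i  (RESULT is a fresh leaf node)
    void reduceIntConst(int i) { stack.push(new IntConstNode(i)); }

    // expr ::= LPAREN expr:e RPAREN  (RESULT = e, the same node goes back on the stack)
    void reduceParens() { Node e = stack.pop(); stack.push(e); }

    // expr ::= expr:e1 PLUS expr:e2  (both children are already constructed)
    void reducePlus() {
        Node e2 = stack.pop();
        Node e1 = stack.pop();
        stack.push(new PlusNode(e1, e2));
    }

    static Node build() {
        AstStackDemo p = new AstStackDemo();
        p.reduceIntConst(1); // 1           -> expr
        p.reduceIntConst(2); // 2           -> expr
        p.reduceParens();    // ( expr )    -> expr
        p.reducePlus();      // expr + expr -> expr
        p.reduceIntConst(3); // 3           -> expr
        p.reduceParens();    // ( expr )    -> expr
        p.reducePlus();      // expr + expr -> expr
        return p.stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(build().show()); // plus(plus(1, 2), 3)
    }
}
```

The parentheses leave no trace in the result: the LPAREN expr RPAREN rule passes its child through unchanged, which is why the AST is smaller than the parse tree.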
Example of lists
terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV, LPAREN, RPAREN, SEMI;
terminal UMINUS;
non terminal Integer expr;
non terminal expr_list, expr_part;

precedence left PLUS, MINUS;
precedence left DIV, MULT;
precedence left UMINUS;

expr_list ::= expr_list expr_part
            | expr_part
            ;
expr_part ::= expr:e {: System.out.println("= " + e); :} SEMI
            ;
expr ::= expr PLUS expr
       | expr MINUS expr
       | expr MULT expr
       | expr DIV expr
       | MINUS expr %prec UMINUS
       | LPAREN expr RPAREN
       | NUMBER
       ;