Grammar Variation in Compiler Design
Carl Wu
Three topics
• Syntax Grammar vs. AST
• Component(?)-based grammar
• Aspect-oriented grammar
Grammar vs. AST (I)
How to automatically generate a tree from a grammar?
Stmt ::= Block
| “if” Exp “then” Stmt
| IdUse “:=” Exp
JastAdd Specification (Tree)
abstract Stmt;
BlockStmt : Stmt ::= Block;
IfStmt : Stmt ::= Exp Stmt;
AssignStmt : Stmt ::= IdUse Exp;
Restricted CFG Definition
A ::= B C D √ => aggregation
A ::= B | C | D √ => inheritance
A ::= B C | D ×
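The three cases above can be checked mechanically. The following is a minimal sketch, not part of the original slides: `RuleKind` and `RcfgRule.classify` are hypothetical names, and the right-hand side is assumed to be whitespace-separated symbols with "|" as its own token.

```java
import java.util.Arrays;
import java.util.List;

/** How a restricted-CFG right-hand side maps onto a class hierarchy. */
enum RuleKind { AGGREGATION, INHERITANCE, REJECTED }

final class RcfgRule {
    /** Classifies a right-hand side such as "B C D" or "B | C | D". */
    static RuleKind classify(String rhs) {
        List<String> symbols = Arrays.asList(rhs.trim().split("\\s+"));
        if (!symbols.contains("|")) {
            // Pure sequence (including a single symbol): children of one node.
            return RuleKind.AGGREGATION;
        }
        // Pure alternation: symbols and bars must strictly alternate,
        // starting and ending with a symbol.
        if (symbols.size() % 2 == 0) {
            return RuleKind.REJECTED;
        }
        for (int i = 0; i < symbols.size(); i++) {
            boolean isBar = symbols.get(i).equals("|");
            boolean barExpected = (i % 2 == 1);
            if (isBar != barExpected) {
                return RuleKind.REJECTED;   // e.g. "B C | D" mixes both forms
            }
        }
        return RuleKind.INHERITANCE;        // each alternative becomes a subclass
    }
}
```

Under this sketch, a mixed rule such as A ::= B C | D has to be split into A ::= X | D and X ::= B C before it is accepted.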
RCFG Specification
Stmt :: Block | IfStmt | AssignStmt
IfStmt :: “if” Exp “then” Stmt
AssignStmt :: IdUse “:=” Exp
Figure: class diagram. Stmt is the superclass of AssignStmt, Block, and IfStmt; IfStmt contains Exp and Stmt; AssignStmt contains IdUse and Exp.
Grammar vs. AST (II)
Parse tree vs. IR tree
• In an IDE, there are multiple visitors for the same source code (more than 12!).
• Different requirements for the tree structure:
  – Syntax vs. semantics
  – Immutable vs. transformable (optimization)
  – Parse tree vs. IR tree
• Generate two tree structures from the same grammar!
• One immutable, strongly typed, concrete parse tree – Read only!
• One transformable, untyped, abstract IR tree – Read and write!
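IfStmt :: “if” Exp “then” Stmt
class ASTNode {
    // IR view: untyped child list that later passes may rewrite.
    protected ASTNode[] children;
}
class IfStmt extends ASTNode {
    // Parse-tree view: typed, final fields. Token, Exp, and Stmt are
    // declared elsewhere; Exp and Stmt must extend ASTNode.
    protected final Token token_if;
    protected final Exp exp;
    protected final Token token_then;
    protected final Stmt stmt;

    IfStmt(Token token_if, Exp exp, Token token_then, Stmt stmt) {
        // parse tree construction
        this.token_if = token_if;
        this.exp = exp;
        this.token_then = token_then;
        this.stmt = stmt;
        // IR tree construction: the array must be allocated, tokens are dropped
        children = new ASTNode[] { exp, stmt };
    }
}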
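To see the two views side by side, here is a self-contained sketch under stated assumptions: `Tok`, `Node`, `ExpNode`, `StmtNode`, `IfNode`, and `DualViewDemo` are hypothetical stand-ins for the types the slide leaves undeclared, and the "optimization" is just a hand-written replacement of one child.

```java
// Hypothetical stand-ins; not the classes from the slides.
class Tok {
    final String text;
    Tok(String text) { this.text = text; }
}

class Node {
    // IR view: untyped and writable.
    protected Node[] children = new Node[0];
    Node child(int i) { return children[i]; }
    void setChild(int i, Node replacement) { children[i] = replacement; }
}

class ExpNode extends Node {
    final String source;
    ExpNode(String source) { this.source = source; }
}

class StmtNode extends Node { }

class IfNode extends StmtNode {
    // Parse-tree view: typed and final, so it cannot be reassigned.
    private final Tok tokenIf;
    private final ExpNode exp;
    private final Tok tokenThen;
    private final StmtNode stmt;

    IfNode(Tok tokenIf, ExpNode exp, Tok tokenThen, StmtNode stmt) {
        this.tokenIf = tokenIf;
        this.exp = exp;
        this.tokenThen = tokenThen;
        this.stmt = stmt;
        // The IR view keeps only the abstract children, not the tokens.
        this.children = new Node[] { exp, stmt };
    }

    ExpNode exp() { return exp; }
    StmtNode stmt() { return stmt; }
}

class DualViewDemo {
    public static void main(String[] args) {
        IfNode ifNode = new IfNode(new Tok("if"), new ExpNode("1 + 2"),
                                   new Tok("then"), new StmtNode());
        // A transformation pass rewrites the IR view only.
        ifNode.setChild(0, new ExpNode("3"));
        System.out.println(ifNode.exp().source);                 // prints "1 + 2"
        System.out.println(((ExpNode) ifNode.child(0)).source);  // prints "3"
    }
}
```

The typed accessors keep returning the original source structure, which suits syntax-oriented visitors, while optimizing passes work on the child array.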
Component(?)-based grammar
Component vs. module
• What is the difference between a component and a module?
• What is a modularized grammar?
• What is an ideal component-based grammar?
Figure: Modularized grammar (Grammar Modules → Grammar → Parser) vs. component-based grammar (Grammar Components → Parsers).
Benefits
• Benefits of a modularized grammar
  – Easy to read, write, and change
  – Eliminates naming conflicts
• Additional benefits of a component-based grammar
  – Each component can be designed, developed, and tested individually.
  – A change to one component does not require recompiling the other components.
  – Different grammar types and parsing algorithms can be used for different components, e.g., one component can be LL and another LALR.
Difficulty in designing component-based grammar
• No clear guards between two components.
  – Switch control to a new parser or stay in the same one?
  – Suitable for embedded languages, e.g., JScript in HTML
  – Not suitable for an integral language, e.g., Java
• Too much coupling between two components.
  – Reuse applies not just to a component as a whole but also to its internal productions and symbols.
  – Not applicable to LR parsers: once the table is built, you can’t reuse the internal productions (there is no way to jump into a table).
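The embedded-language case can be sketched as follows. This is an illustration under stated assumptions, not a design from the slides: `ComponentParser`, `ScriptComponent`, and `HostComponent` are hypothetical names, the guards are the literal strings `<script>` and `</script>`, and "parsing" is reduced to recording which component consumed which span.

```java
import java.util.ArrayList;
import java.util.List;

/** A component consumes input from pos and returns the position where it stopped. */
interface ComponentParser {
    int parse(String input, int pos, List<String> log);
}

/** Embedded-language component: consumes everything up to its closing guard. */
final class ScriptComponent implements ComponentParser {
    static final String CLOSE = "</script>";

    public int parse(String input, int pos, List<String> log) {
        int end = input.indexOf(CLOSE, pos);
        if (end < 0) {
            end = input.length();   // unterminated block: consume the rest
        }
        log.add("script:" + input.substring(pos, end));
        return end;
    }
}

/** Host component: hands control over when it sees the opening guard. */
final class HostComponent implements ComponentParser {
    static final String OPEN = "<script>";
    private final ComponentParser embedded;

    HostComponent(ComponentParser embedded) { this.embedded = embedded; }

    public int parse(String input, int pos, List<String> log) {
        while (pos < input.length()) {
            int guard = input.indexOf(OPEN, pos);
            int end = (guard < 0) ? input.length() : guard;
            if (end > pos) {
                log.add("host:" + input.substring(pos, end));
            }
            if (guard < 0) {
                return input.length();
            }
            // Switch control to the embedded component, then resume after its guard.
            pos = embedded.parse(input, guard + OPEN.length(), log);
            if (input.startsWith(ScriptComponent.CLOSE, pos)) {
                pos += ScriptComponent.CLOSE.length();
            }
        }
        return pos;
    }
}

class GuardDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        new HostComponent(new ScriptComponent())
            .parse("<p>hi</p><script>x=1;</script><b>", 0, log);
        System.out.println(log);
    }
}
```

The hand-off works here only because the two languages are separated by explicit delimiters. An integral language such as Java has no comparable token that tells a statement parser where an expression parser should take over.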
Ideal vs. reality
Figure: the same grammar components drawn twice, once for the ideal and once for the reality: JavaClass, Interface, Object_type, Statement, Expression, Type, Binary_expr, Unary_expr, Primary, Array.
Suggestions?
Aspect-oriented grammar
• Join-point: grammar patterns that crosscut multiple productions
• Punctuation, identifiers, modifiers…
Example
• “;” appears 25 times in one of the Java grammars
• “.” appears 74 times in one of the COBOL grammars
• Every one of them should be carefully placed!
<Sentence> ::= <Accept Stm> '.'
             | <Add Stm> '.'
             | <Add Stm Ex> <End-Add Opt> '.'
             | <Call Stm> '.'
             | <Call Stm Ex> <End-Call Opt> '.'
             | <Close Stm> '.'
             | <Compute Stm> '.'
             | <Compute Stm Ex> <End-Compute Opt> '.'
             | <Display Stm> '.'
             | <Divide Stm> '.'
             | <Divide Stm Ex> <End-Divide Opt> '.'
             | <Evaluate Stm> <End-Evaluate Opt> '.'
             | <If Stm> <End-If Opt> '.'
             | <Move Stm> '.'
             | <Move Stm Ex> <End-Move Opt> '.'
             | <Multiply Stm> '.'
             | <Multiply Stm Ex> <End-Multiply Opt> '.'
             | <Open Stm> '.'
             | <Perform Stm> '.'
             | <Perform Stm Ex> <End-Perform Opt> '.'
             | <Read Stm> '.'
             | <Read Stm Ex> <End-Read Opt> '.'
             | <Release Stm> '.'
             | <Rewrite Stm> '.'
             | <Rewrite Stm Ex> <End-Rewrite Opt> '.'
             | <Set Stm> '.'
             | <Start Stm> '.'
             | <Start Stm Ex> <End-Start Opt> '.'
             | <String Stm> '.'
             | <String Stm Ex> <End-String Opt> '.'
             | <Subtract Stm> '.'
             | <Subtract Stm Ex> <End-Substract Opt> '.'
             | <Write Stm> '.'
             | <Write Stm Ex> <End-Write Opt> '.'
             | <Unstring Stm> '.'
             | <Unstring Stm Ex> <End-Unstring Opt> '.'
             | <Misc Stm> '.'
pointcut PreDot(): <Sentence>;
after PreDot(): '.';
Another example
pointcut Content(): … …
before Content(): “(”;
after Content(): “)”;
Guarantee they match!
Grammar weaving
Figure: Base Grammar + Grammar Aspect → Result grammar → Parser.
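One way to picture the weaving step is a function from a base grammar and an aspect to a result grammar. The sketch below is an assumption-laden illustration, not the tool behind the slides: `GrammarWeaver.weave` is a hypothetical helper, a grammar is modelled as a map from nonterminal to its alternatives, and a pointcut is simply the name of one nonterminal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GrammarWeaver {
    /**
     * Returns a new grammar in which every alternative of the pointcut
     * nonterminal is wrapped in the given before/after symbols.
     * An empty string means "no advice on that side".
     */
    static Map<String, List<String>> weave(Map<String, List<String>> base,
                                           String pointcut,
                                           String before,
                                           String after) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> rule : base.entrySet()) {
            List<String> woven = new ArrayList<>();
            for (String alternative : rule.getValue()) {
                if (rule.getKey().equals(pointcut)) {
                    String prefix = before.isEmpty() ? "" : before + " ";
                    String suffix = after.isEmpty() ? "" : " " + after;
                    woven.add(prefix + alternative + suffix);
                } else {
                    woven.add(alternative);   // rules outside the pointcut are copied
                }
            }
            result.put(rule.getKey(), woven);
        }
        return result;
    }
}

class WeaveDemo {
    public static void main(String[] args) {
        // Base grammar without the terminating dots.
        Map<String, List<String>> base = new LinkedHashMap<>();
        base.put("<Sentence>", Arrays.asList("<Accept Stm>", "<Add Stm Ex> <End-Add Opt>"));
        base.put("<Accept Stm>", Arrays.asList("ACCEPT"));

        // Equivalent of: pointcut PreDot(): <Sentence>;  after PreDot(): '.';
        Map<String, List<String>> woven = GrammarWeaver.weave(base, "<Sentence>", "", "'.'");
        System.out.println(woven.get("<Sentence>"));
    }
}
```

Because the before and after symbols are supplied in one call, a pair such as “(” and “)” is added to every matched alternative together, which is the matching guarantee the second example asks for.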
What do you think?