Grammar Variation in Compiler Design

Carl Wu

Three topics

• Syntax Grammar vs. AST

• Component(?)-based grammar

• Aspect-oriented grammar

Grammar vs. AST (I)

How to automatically generate a tree from a grammar?

Stmt ::= Block

| “if” Exp “then” Stmt

| IdUse “:=” Exp

Stmt ::= Block | “if” Exp “then” Stmt | IdUse “:=” Exp

JastAdd Specification (Tree)

abstract Stmt;
BlockStmt : Stmt ::= Block;
IfStmt : Stmt ::= Exp Stmt;
AssignStmt : Stmt ::= IdUse Exp;

Restricted CFG Definition

A ::= B C D √ => aggregation

A ::= B | C | D √ => inheritance

A ::= B C | D ×
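The three cases above can be checked mechanically. The following is a minimal sketch, assuming productions are given as plain strings with symbols separated by spaces and alternatives by "|"; the class RcfgCheck and its classify method are hypothetical names, not part of JastAdd or any parser generator, and quoted terminals that contain "|" are not handled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JastAdd or any parser generator.
final class RcfgCheck {
    enum Kind { AGGREGATION, INHERITANCE, REJECTED }

    // Classify the right-hand side of one production, e.g. "B C D" or "B | C | D".
    static Kind classify(String rhs) {
        String[] alternatives = rhs.trim().split("\\s*\\|\\s*");
        if (alternatives.length == 1) {
            return Kind.AGGREGATION;          // A ::= B C D
        }
        for (String alt : alternatives) {
            // Each alternative must be exactly one symbol: A ::= B | C | D
            if (alt.isEmpty() || alt.split("\\s+").length != 1) {
                return Kind.REJECTED;         // A ::= B C | D mixes both forms
            }
        }
        return Kind.INHERITANCE;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("B C D", "B | C | D", "B C | D");
        for (String rhs : samples) {
            System.out.println("A ::= " + rhs + "  =>  " + classify(rhs));
        }
    }
}
```

A rejected production such as A ::= B C | D can usually be brought into the restricted form by naming the sequence, e.g. A ::= E | D with E ::= B C.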

RCFG Specification

Stmt :: Block | IfStmt | AssignStmt

IfStmt :: “if” Exp “then” Stmt

AssignStmt :: IdUse “:=” Exp

[Figure: class diagram. Block, IfStmt and AssignStmt inherit from Stmt; IfStmt aggregates Exp and Stmt; AssignStmt aggregates IdUse and Exp.]
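One possible answer to "how to automatically generate a tree from a grammar" is a small generator that reads the RCFG rules and prints class skeletons. This is only a sketch under stated assumptions: AstClassGenerator, the rule map, and the child0/child1 field names are all made up for illustration, terminals are written with plain ASCII quotes, and symbols without a rule of their own (Block, Exp, IdUse) get no class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator sketch; class and method names are illustrative only.
final class AstClassGenerator {
    // Turn RCFG rules (left-hand side -> right-hand side) into Java class skeletons.
    static String generate(Map<String, String> rules) {
        Map<String, String> parentOf = new LinkedHashMap<>();
        StringBuilder out = new StringBuilder();
        // Pass 1: alternation rules define inheritance.
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (rule.getValue().contains("|")) {
                out.append("abstract class ").append(rule.getKey()).append(" {}\n");
                for (String alt : rule.getValue().split("\\|")) {
                    parentOf.put(alt.trim(), rule.getKey());
                }
            }
        }
        // Pass 2: sequence rules define aggregation; quoted terminals are skipped.
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (rule.getValue().contains("|")) continue;
            String parent = parentOf.get(rule.getKey());
            out.append("class ").append(rule.getKey());
            if (parent != null) out.append(" extends ").append(parent);
            out.append(" {");
            int index = 0;
            for (String symbol : rule.getValue().trim().split("\\s+")) {
                if (symbol.startsWith("\"")) continue;
                out.append(" ").append(symbol).append(" child").append(index++).append(";");
            }
            out.append(" }\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("Stmt", "Block | IfStmt | AssignStmt");
        rules.put("IfStmt", "\"if\" Exp \"then\" Stmt");
        rules.put("AssignStmt", "IdUse \":=\" Exp");
        System.out.print(generate(rules));
    }
}
```

For the three rules above, the generator emits an abstract Stmt class plus IfStmt and AssignStmt subclasses whose fields are the nonterminals of each sequence.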

Grammar vs. AST (II)

Parse tree vs. IR tree


• In an IDE, many visitors traverse the same source code (more than 12!).

• Different requirements for the tree structure:
– Syntax vs. semantics
– Immutable vs. transformable (optimization)
– Parse tree vs. IR tree


• Generate two tree structures from the same grammar!

• One immutable, strongly typed, concrete parse tree – Read only!

• One transformable, untyped, abstract IR tree – Read and write!

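IfStmt :: “if” Exp “then” Stmt

class ASTNode {
    // IR tree view: untyped, writable child list
    protected ASTNode[] children;
}

class IfStmt extends ASTNode {
    // Parse tree view: one typed, final field per grammar symbol
    // (assumes Exp and Stmt are ASTNode subclasses and Token is defined elsewhere)
    protected final Token token_if;
    protected final Exp exp;
    protected final Token token_then;
    protected final Stmt stmt;

    IfStmt(Token token_if, Exp exp, Token token_then, Stmt stmt) {
        // parse tree construction
        this.token_if = token_if;
        this.exp = exp;
        this.token_then = token_then;
        this.stmt = stmt;
        // IR tree construction: tokens are dropped, the array is allocated here
        children = new ASTNode[] { exp, stmt };
    }
}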
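To show why the two views are useful together, here is a self-contained sketch of a transformation that rewrites the IR view while the parse tree view stays intact. Everything in it is an assumption made for illustration: the Token, Exp and Stmt stubs, the camelCase field names, and the toy foldCondition "optimization" are not taken from the slides.

```java
// All class names below are illustrative assumptions; they mirror the slide's IfStmt.
class Token {
    final String text;
    Token(String text) { this.text = text; }
}

class ASTNode {
    protected ASTNode[] children = new ASTNode[0];   // IR view: untyped, writable
}

class Exp extends ASTNode {
    final String source;
    Exp(String source) { this.source = source; }
}

class Stmt extends ASTNode { }

class IfStmt extends Stmt {
    // Parse tree view: typed and final, so it can only be read.
    final Token tokenIf;
    final Exp exp;
    final Token tokenThen;
    final Stmt stmt;

    IfStmt(Token tokenIf, Exp exp, Token tokenThen, Stmt stmt) {
        this.tokenIf = tokenIf;
        this.exp = exp;
        this.tokenThen = tokenThen;
        this.stmt = stmt;
        children = new ASTNode[] { exp, stmt };      // IR view drops the tokens
    }
}

final class DualTreeDemo {
    // A toy "optimization": replace the IR condition with an already folded constant.
    static void foldCondition(ASTNode node, Exp folded) {
        node.children[0] = folded;
    }

    public static void main(String[] args) {
        IfStmt ifStmt = new IfStmt(new Token("if"), new Exp("1 < 2"),
                                   new Token("then"), new Stmt());
        foldCondition(ifStmt, new Exp("true"));
        System.out.println("parse tree condition: " + ifStmt.exp.source);
        System.out.println("IR tree condition:    " + ((Exp) ifStmt.children[0]).source);
    }
}
```

A source-level tool such as a formatter can keep reading the final fields, while an optimizer works on the children array without disturbing it.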

Component(?)-based grammar

Component vs. module

• What is the difference between a component and a module?

• What is a modularized grammar?

• What is an ideal component-based grammar?

[Figure: component vs. module. Modularized grammar: four grammar modules are combined into one grammar, from which one parser is generated. Component-based grammar: four grammar components and parsers, with no merged grammar in between.]

Benefits

• Benefits of a modularized grammar
– Easy to read, write, and change
– Eliminates naming conflicts

• Additional benefits of a component-based grammar
– Each component can be designed, developed, and tested individually.
– A change to one component does not require recompiling the other components.
– Different grammar classes or parsing algorithms can be used for different components, e.g., one component can be LL and another LALR.

Difficulty in designing component-based grammar

• No clear guards between two components.
– Switch control to a new parser, or stay in the same one?
– Suitable for embedded languages, e.g., JScript in HTML
– Not suitable for an integral language, e.g., Java

• Too much coupling between two components.
– Reuse is not limited to the component as a whole; internal productions and symbols may also be reused.
– Not applicable to LR parsers: once the table is built, you can’t reuse the internal productions (there is no way to jump into a table).
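The embedded-language case works because the guards are explicit. The sketch below illustrates that idea only: the ComponentParser interface, the HostParser class, and the use of script tags as guards are assumptions for this example, and the two component parsers are stand-in lambdas rather than real LL or LALR parsers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: each grammar component ships its own parser.
interface ComponentParser {
    String parse(String input);
}

final class HostParser {
    private static final String OPEN = "<script>";    // guard that enters the embedded component
    private static final String CLOSE = "</script>";  // guard that returns to the host component
    private final ComponentParser host;
    private final ComponentParser embedded;

    HostParser(ComponentParser host, ComponentParser embedded) {
        this.host = host;
        this.embedded = embedded;
    }

    // Split the input at the guards and hand each region to the matching component.
    List<String> parse(String input) {
        List<String> results = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf(OPEN, pos);
            if (open < 0) {
                results.add(host.parse(input.substring(pos)));
                break;
            }
            if (open > pos) {
                results.add(host.parse(input.substring(pos, open)));
            }
            int bodyStart = open + OPEN.length();
            int close = input.indexOf(CLOSE, bodyStart);
            if (close < 0) {
                throw new IllegalArgumentException("unclosed " + OPEN);
            }
            results.add(embedded.parse(input.substring(bodyStart, close)));
            pos = close + CLOSE.length();
        }
        return results;
    }

    public static void main(String[] args) {
        HostParser parser = new HostParser(s -> "html(" + s + ")", s -> "script(" + s + ")");
        System.out.println(parser.parse("<p>hi</p><script>x = 1</script><p>bye</p>"));
    }
}
```

An integral language such as Java offers no comparable pair of guards between, say, statements and expressions, which is why the same delegation scheme does not carry over directly.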

Ideal vs. reality

[Figure: two diagrams, ideal and reality, each drawn over the same grammar components: JavaClass, Interface, Object_type, Statement, Expression, Type, Binary_expr, Unary_expr, Primary, Array.]

Suggestions?

Aspect-oriented grammar

• Join-point: grammar patterns that crosscut multiple productions

• Punctuations, identifiers, modifiers…

Example

• “;” appears 25 times in one of the Java grammars

• “.” appears 74 times in one of the Cobol grammars

• Every one of them should be carefully placed!

Another example

pointcut Content(): … …

before Content(): “(”;

after Content(): “)”;

Guarantee they match!
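The before/after pair can be read as a grammar-to-grammar rewrite. The sketch below is one hedged interpretation: GrammarWeaver, the weave method, and the nonterminals ArgList, ArgTail and Condition are invented for illustration, and the pointcut here simply selects productions by name, since the slide leaves the pointcut body unspecified.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical weaver sketch; not an existing tool or API.
final class GrammarWeaver {
    // Weave one aspect into a base grammar (left-hand side -> right-hand side).
    // Every production selected by the pointcut gets "before" and "after" added
    // together, so the two terminals cannot get out of step.
    static Map<String, String> weave(Map<String, String> base,
                                     Predicate<String> pointcut,
                                     String before, String after) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> rule : base.entrySet()) {
            String rhs = rule.getValue();
            if (pointcut.test(rule.getKey())) {
                rhs = before + " " + rhs + " " + after;
            }
            result.put(rule.getKey(), rhs);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("ArgList", "Exp ArgTail");
        base.put("Condition", "Exp");
        base.put("AssignStmt", "IdUse \":=\" Exp");

        Predicate<String> content = name -> name.equals("ArgList") || name.equals("Condition");
        Map<String, String> woven = weave(base, content, "\"(\"", "\")\"");
        for (Map.Entry<String, String> rule : woven.entrySet()) {
            System.out.println(rule.getKey() + " ::= " + rule.getValue());
        }
    }
}
```

The base grammar never mentions parentheses; they exist only in the aspect, so a change to the bracketing style touches one place instead of every selected production.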

Grammar weaving

[Figure: the base grammar and the grammar aspect are woven into a result grammar, from which the parser is generated.]

What do you think?
