CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 –...

29
CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 2008 1 CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Transcript of CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 –...

Page 1: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

CSE 3302 Programming Languages

Chengkai Li, Weimin HeSpring 2008

Syntax (cont.)

Lecture 4 – Syntax (Cont.), Spring 2008 1CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 2: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

What is Parsing?

• Given a grammar and a token string:– determine if the grammar can generate the token

string?– i.e., is the string a legal program in the language?

• In other words, to construct a parse tree for the token string.

Lecture 4 – Syntax (Cont.), Spring 2008 2CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 3: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

What’s significant about parse tree?

A parse tree gives a unique syntactic structure

• Leftmost, rightmost derivation• There is only one leftmost derivation for a parse tree, and

symmetrically only one rightmost derivation for a parse tree.

Lecture 4 – Syntax (Cont.), Spring 2008 3CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 4: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Exampleexpr expr + expr | expr expr | ( expr ) | numbernumber number digit | digitdigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

expr

expr

number

digit

3

*

expr

number

digit

4

expr

number

digit

5

expr

+

parse tree leftmost derivation expr

number

digit

3

number

digit

4

number

digit

5

exprexpr +

*expr expr

Lecture 4 – Syntax (Cont.), Spring 2008 4CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 5: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

What’s significant about parse tree?

A parse tree has a unique meaning, thus provides basis for semantic analysis.(Syntax-directed semantics: semantics are attached to syntactic structure.)

expr

expr1 expr2+

expr.val = expr1.val + expr2.val

Lecture 4 – Syntax (Cont.), Spring 2008 5CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 6: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Relationship among language, grammar, parser

• Chomsky Hierarchyhttp://en.wikipedia.org/wiki/Chomsky_hierarchy

• A language can be described by multiple grammars, while a grammar defines one language.

• A grammar can be parsed by multiple parsers, while a parser accepts one grammar, thus one language.

• Should design a language that allows simple grammar and efficient parser

• For a language, we should construct a grammar that allows fast parsing

• Given a grammar, we should build an efficient parser

Lecture 4 – Syntax (Cont.), Spring 2008 6CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 7: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Ambiguity

• Ambiguous grammar: There can be multiple parse trees for the same sentence (program)– In other words, multiple leftmost derivations.

• Why is it bad?– Multiple meanings

Lecture 4 – Syntax (Cont.), Spring 2008 7CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 8: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Ambiguity

• Was this ambiguous?

• How about this?

number number digit | digitdigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

expr expr - expr | number

Lecture 4 – Syntax (Cont.), Spring 2008 8CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 9: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Deal with Ambiguity

• Unambiguous Grammar– Rewrite the grammar to avoid ambiguity.

Lecture 4 – Syntax (Cont.), Spring 2008 9CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 10: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Example of Ambiguity: Precedenceexpr expr + expr | expr expr | ( expr ) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

expr

expr

3 *

expr

4

expr

5

expr

+

parse tree 1

Two different parse tress for expression 3+4*5

expr

expr

5

expr *

parse tree 2

+

3

expr

4

expr

Lecture 4 – Syntax (Cont.), Spring 2008 10CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 11: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Eliminating Ambiguity for Precedence

• Establish “precedence cascade”: using different structured names for different constructs, adding grammar rules.– Higher precedence : lower in cascade

expr expr + expr | term

term term term | ( expr ) | number

expr expr + expr | expr expr | ( expr ) | number

Lecture 4 – Syntax (Cont.), Spring 2008 11CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 12: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Example of Ambiguity: Associativityexpr expr - expr | ( expr ) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

expr

expr

5 -

expr

2

expr

1

expr

-

parse tree 1

Two different parse tress for expression 5-2-1

expr

expr

1

expr -

parse tree 2

-

5

expr

2

expr

expr.val = 4 expr.val = 2

- is right-associative, which is against common practice in integer arithmetic

- is left-associative, which is what we usually assume

Lecture 4 – Syntax (Cont.), Spring 2008 12CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 13: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Associativity

• Left-Associative: + - * /• Right-Associative: =

What is meant by a=b=c=1?

Lecture 4 – Syntax (Cont.), Spring 2008 13CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 14: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Eliminating Ambiguity for Associativity

• left-associativity: left-recursion

expr expr - term | termterm (expr) | number

expr expr - expr | ( expr ) | number

• right-associativity: right-recursionexpr expr = expr | a | b | c

expr term = expr | termterm a | b | c

Lecture 4 – Syntax (Cont.), Spring 2008 14CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 15: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Putting Together

expr expr - expr | expr / expr | ( expr ) | numbernumber number digit | digitdigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

expr expr - term | termterm term / factor | factorfactor ( expr )| numbernumber number digit | digitdigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

We want to make - left-associative and / has precedence over -

Lecture 4 – Syntax (Cont.), Spring 2008 15CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 16: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Example of Ambiguity: Dangling-Else

stmt if( expr ) stmt | if( expr ) stmt else stmt

| other-stmt

Two different parse trees for “if( expr ) if( expr ) stmt else stmt”

stmt

stmt

(expr)

if

if stmt

(expr)

parse tree 1 parse tree 2

stmt else

stmt

(expr)

if

if stmt

(expr) stmtstmt else

Lecture 4 – Syntax (Cont.), Spring 2008 16CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 17: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Eliminating Dangling-Elsestmt matched_stmt | unmatched_stmtmatched_stmt if( expr ) matched_stmt else matched_stmt

| other-stmtunmatched_stmt if( expr ) stmt

| if( expr ) matched_stmt else unmatched_stmt

Lecture 4 – Syntax (Cont.), Spring 2008 17CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 18: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

EBNF• Repetition { }

• Option [ ]

expr expr - term | term

if-stmt if ( expr ) stmt | if ( expr ) stmt else stmt

if-stmt if ( expr ) stmt [ else stmt ]

number digit | number digit number { digit }

signed-number sign number

| number signed-number [ sign ] number

expr term term

Lecture 4 – Syntax (Cont.), Spring 2008 18CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 19: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Syntax Diagrams• Written from EBNF, not BNF• If-statement

(more examples on page 101)

if

elsestatement

expression

statement

( )if-statement

Lecture 4 – Syntax (Cont.), Spring 2008 19CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 20: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Parsing Techniques

• Intuitive analysis and conclusion• No formal theorems and rigorous proofs• More details: compilers, automata theory

Lecture 4 – Syntax (Cont.), Spring 2008 20CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 21: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Parsing

• Parsing:– Determine if a grammar can generate a given token string.– That is, to construct a parse tree for the token string.

• Two ways of constructing the parse tree– Top-down (from root towards leaves)

• Can be constructed more easily by hand– Bottom-up (from leaves towards root)

• Can handle a larger class of grammars• Parser generators tend to use bottom-up methods

Lecture 4 – Syntax (Cont.), Spring 2008 21CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 22: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Top-Down Parser

• Recursive-descent parser: – A special kind of top-down parser: single left-to-right scan,

with one lookahead symbol.– Backtracking (trial-and-error) may happen

• Predictive parser: – The lookahead symbol determines which production to

apply, without backtracking

Lecture 4 – Syntax (Cont.), Spring 2008 22CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 23: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Recursive-Descent Parser• Types in Pascal typesimple | array [ simple ] of typesimple char | integer

Input: array [ integer ] of char

Parse treetype

simplearray type[ ] of

integer

char

simple

Lecture 4 – Syntax (Cont.), Spring 2008 23CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 24: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Challenge 1: Top-Down Parser Cannot Handle Left-Recursion

expr expr - term | termterm 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Input: 3 – 4 – 5Parse tree expr

term-expr

term-expr

expr term-

Lecture 4 – Syntax (Cont.), Spring 2008 24CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 25: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Eliminating Left-Recursion

void expr(void){ term(); while (token == ‘-’) { match(‘-’); term(); }}

expr expr - term | term term 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

expr term { - term } term 0 | 1 | … | 9

( EBNF )

expr term expr’ expr’ term expr’ | ε term 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Lecture 4 – Syntax (Cont.), Spring 2008 25CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 26: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Challenge 2: Backtracking is Inefficient

• Backtracking: trial-and-errortypesimple | array [ simple ] of typesimple char | integer

Input: array [ integer ] of char

Parse tree

simple

char

type

simplearray type[ ] of

X integer

(Types in Pascal )

Lecture 4 – Syntax (Cont.), Spring 2008 26CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 27: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Challenge 2: Backtracking is Inefficient

• We cannot avoid backtracking if the grammar has multiple productions to apply, given a lookahead symbol.

• Solution: – Change the grammar so that there is only one applicable production

that is unambiguously determined by lookahead symbol.

subscription term | term .. termterm 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Lecture 4 – Syntax (Cont.), Spring 2008 27CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 28: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Avoiding Backtracking by Left Factoring

expr term | term .. term term 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

expr term restrest term | εterm 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

A

A A’ A’

Lecture 4 – Syntax (Cont.), Spring 2008 28CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008

Page 29: CSE 3302 Programming Languages Chengkai Li, Weimin He Spring 2008 Syntax (cont.) Lecture 4 – Syntax (Cont.), Spring 20081 CSE3302 Programming Languages,

Left Factoring Using EBNFexpr term @ expr | term

expr term [@ expr]

if-statement if ( expr ) statement | if ( expr ) statement else statement

if-statement if ( expr ) statement [ else statement ]

Lecture 4 – Syntax (Cont.), Spring 2008 29CSE3302 Programming Languages, UT-Arlington ©Chengkai Li, Weimin He, Weimin He 2008