Syntax Analysis (parsing)

Monday, April 8, 2013

Where We Are

Compiler pipeline, from input to output:

• Source Code

• Lexical Analysis

• Syntax Analysis

• Semantic Analysis

• IR Generation

• IR Optimization

• Code Generation

• Optimization

• Machine Code


Source code:

    while (ip < z)
        ++ip;

Character stream seen by the lexer:

    w h i l e ( i p < z ) \n \t + + i p ;

Token stream produced by the lexer:

    T_While ( T_Ident(ip) < T_Ident(z) ) ++ T_Ident(ip)

Tree built by the parser: a While node whose children are a < node (over Ident ip and Ident z) and a ++ node (over Ident ip).

Source code:

    do[for] = new 0;

Character stream:

    d o [ f o r ] = n e w 0 ;

Token stream:

    T_Do [ T_For ] = T_New T_IntConst(0)

Every piece is a valid token, so the lexer accepts this input; whether the token sequence has a valid structure is for the parser to decide.

What is Parsing?

• Parsing: recognizing whether a sentence (or program) is grammatically well formed and identifying the function of each component

• Example sentence: "I gave him the book"

    • subject: I
    • verb: gave
    • indirect object: him
    • object: noun phrase (article: the, noun: book)

Overview of Syntax Analysis

• Input: stream of tokens from the lexer

• Output: Abstract Syntax Tree (AST)

Source code: if (b==0) a = "Hi";

AST: [Figure: root node if with two children, == (over b and 0) and = (over a and "Hi"); the figure also shows the ; token]
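As an illustration of what "output: AST" can mean in code, here is a minimal sketch of the tree for `if (b==0) a = "Hi";` built from nested node objects. The class names (`Var`, `Const`, `BinOp`, `Assign`, `If`) and the `show` helper are assumptions made up for this example; they are not taken from the slides or from any particular compiler.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical node classes, invented for this illustration only.
@dataclass
class Var:
    name: str

@dataclass
class Const:
    value: Union[int, str]

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

@dataclass
class Assign:
    target: Var
    value: "Expr"

@dataclass
class If:
    cond: "Expr"
    then: Assign

Expr = Union[Var, Const, BinOp]

# AST for: if (b==0) a = "Hi";
tree = If(
    cond=BinOp("==", Var("b"), Const(0)),
    then=Assign(Var("a"), Const("Hi")),
)

def show(node, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    pad = "  " * depth
    if isinstance(node, If):
        return [pad + "if"] + show(node.cond, depth + 1) + show(node.then, depth + 1)
    if isinstance(node, BinOp):
        return [pad + node.op] + show(node.left, depth + 1) + show(node.right, depth + 1)
    if isinstance(node, Assign):
        return [pad + "="] + show(node.target, depth + 1) + show(node.value, depth + 1)
    if isinstance(node, Var):
        return [pad + node.name]
    return [pad + repr(node.value)]

print("\n".join(show(tree)))
```

Parentheses and the semicolon do not appear as nodes here: the nesting of the objects already records the structure they expressed in the source text.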

Goal of Syntax Analysis

• Recover the structure described by the token stream

• Report errors if the tokens do not properly encode a structure

What Parsing Doesn't Do

• Doesn't check: type agreement, variable declaration, variable initialization, etc.

    • int x = true;
    • int y;
    • z = f(y);

• Deferred until semantic analysis

Outline

• Today: Formalism for syntax analysis

    • Grammars
    • Derivation
    • Ambiguity
    • Concrete vs. Abstract Syntax Tree

• Wednesday: Parsing algorithms

    • Top Down
    • Bottom up


Specify Language Syntax

• First problem: how to describe language syntax precisely and conveniently

• Last time: can describe tokens using ???

• Regular expressions easy to implement, efficient (by converting to DFA)

• Can we use regular expressions (on tokens) to specify programming language syntax?


To answer this...

• Consider: language of all strings that contain balanced parentheses

• Balanced: () (()) ()()() (())()((()()))

• Not balanced: (( )( ()) (()()

• Construct a Finite Automaton for this...?


• Limits of regular languages: a DFA has only a finite number of states, so it cannot perform unbounded counting

• Need a more powerful representation

[Figure: attempted automaton, a row of states with repeated "(" and ")" transitions]
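To make the "unbounded counting" point concrete, here is a small sketch of a balanced-parenthesis check that uses one integer counter. The function name `is_balanced` is made up for this example. The counter `depth` may grow as large as the input requires, which is exactly the kind of memory a DFA with a fixed number of states does not have.

```python
def is_balanced(s: str) -> bool:
    """Check balanced parentheses using a single unbounded counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:       # a ")" with no matching "(" before it
                return False
        else:
            return False        # the alphabet here is just "(" and ")"
    return depth == 0           # every "(" must have been closed

for s in ["()", "(())", "()()()", "(())()((()()))", "((", ")(", "())", "(()()"]:
    print(repr(s), is_balanced(s))
```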


Context Free Grammar (CFG)

• A specification of the balanced-parenthesis language:

    • S → ( S ) S
    • S → ε

• The definition is recursive

• A context-free grammar: more expressive than regular expressions

• Example derivation of (()), one production per step:

    S ⇒ (S)S ⇒ (S)ε ⇒ ((S)S)ε ⇒ ((ε)S)ε ⇒ ((ε)ε)ε = (())

• If a grammar accepts a string, there is a derivation of that string using the productions of the grammar
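The recursion in the grammar S → ( S ) S | ε can be mirrored directly by a recursive function. The sketch below is only an illustration for this one grammar; the names `parse_S` and `in_language` are assumptions, and the approach works here because the next character alone decides which production to use. General parsing algorithms are a separate topic.

```python
def parse_S(s: str, i: int = 0) -> int:
    """Match S starting at index i and return the index just past the match.

    S -> ( S ) S  is used when the next character is "(".
    S -> ε        is used otherwise and consumes nothing.
    Raises ValueError when a "(" has no matching ")".
    """
    if i < len(s) and s[i] == "(":
        i = parse_S(s, i + 1)              # the S inside the parentheses
        if i >= len(s) or s[i] != ")":
            raise ValueError(f"expected ')' at position {i}")
        return parse_S(s, i + 1)           # the S after the closing ")"
    return i                               # S -> ε

def in_language(s: str) -> bool:
    """True iff the whole string is derivable from S."""
    try:
        return parse_S(s) == len(s)
    except ValueError:
        return False

print(in_language("(())"), in_language("())"))
```

Each call to `parse_S` corresponds to one use of a production, so the chain of calls made while accepting a string traces out a derivation of that string.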


Formal Language Terminology Review

• Language is a set of strings

• Each string is a finite sequence of symbols taken from a finite alphabet

• Parsing:

    • the strings are source programs,
    • the symbols are lexical tokens,
    • the alphabet is the set of token-types returned by the lexical analyzer

• CFG describes a language

• For scanning, the alphabet is ASCII

CFG Terminology

• Terminals

    • Token or ε

• Non-terminals

    • Variables

• Start symbol

    • Begins the derivation

• Productions

    • Specify how non-terminals may be expanded to form strings (replacement rules)
    • LHS: single non-terminal; RHS: string of terminals (including ε) and/or non-terminals
    • S → ( S ) S
    • S → ε
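The four ingredients above (terminals, non-terminals, start symbol, productions) can be written down as a small data structure. This is a sketch under assumptions: the class name `Grammar`, its field names, and the `is_well_formed` check are invented for illustration, with ε represented as an empty right-hand side.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset
    terminals: frozenset
    start: str
    productions: tuple  # (lhs, rhs) pairs; rhs is a tuple of symbols, () stands for ε

    def is_well_formed(self) -> bool:
        """Check the restrictions listed in the terminology above."""
        if self.start not in self.nonterminals:
            return False
        if self.nonterminals & self.terminals:
            return False                       # a symbol cannot be both kinds
        symbols = self.nonterminals | self.terminals
        return all(
            lhs in self.nonterminals           # LHS: a single non-terminal
            and all(sym in symbols for sym in rhs)
            for lhs, rhs in self.productions
        )

# The balanced-parenthesis grammar: S → ( S ) S and S → ε
balanced = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"(", ")"}),
    start="S",
    productions=(
        ("S", ("(", "S", ")", "S")),
        ("S", ()),
    ),
)
print(balanced.is_well_formed())
```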

Another Example: Arithmetic Expressions

● Suppose we want to describe all legal arithmetic expressions using addition, subtraction, multiplication, and division.

● Here is one possible CFG:

E → int
E → E Op E
E → (E)
Op → +
Op → -
Op → *
Op → /

● A sample derivation:

E
⇒ E Op E
⇒ E Op (E)
⇒ E Op (E Op E)
⇒ E * (E Op E)
⇒ int * (E Op E)
⇒ int * (int Op E)
⇒ int * (int Op int)
⇒ int * (int + int)

● In this grammar:

    • Non-terminal symbols: E, Op
    • Terminal symbols: int, (, ), +, -, *, /
    • Production rules: the seven rules listed above
    • Start symbol: E


A Notational Shorthand

The productions

E → int
E → E Op E
E → (E)
Op → +
Op → -
Op → *
Op → /

can be written more compactly as

E → int | E Op E | (E)
Op → + | - | * | /

• Vertical bar | is shorthand for multiple productions

CFGs for Programming Languages

BLOCK → STMT
      | { STMTS }

STMTS → ε
      | STMT STMTS

STMT  → EXPR;
      | if (EXPR) BLOCK
      | while (EXPR) BLOCK
      | do BLOCK while (EXPR);
      | BLOCK
      | …

EXPR  → identifier
      | constant
      | EXPR + EXPR
      | EXPR - EXPR
      | EXPR * EXPR
      | ...

RegExp: A Subset of CFG

• Regular expression definition of real numbers:

    • digit → [0-9]
    • posint → digit+
    • int → -? posint
    • real → int . (ε | posint)

• RE symbolic names are only shorthand: no recursion, so all symbols can be fully expanded:

    • real → -? [0-9]+ . (ε | ([0-9]+))
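The "names are only shorthand" point can be checked mechanically: because none of the definitions refers to itself, plain string substitution expands them into a single regular expression. The sketch below uses Python's standard `re` module; the variable names are assumptions, and (ε | posint) is written as an optional group.

```python
import re

# Each name is defined only in terms of earlier names, so there is no recursion.
DIGIT = r"[0-9]"
POSINT = rf"{DIGIT}+"
INT = rf"-?{POSINT}"
REAL = rf"{INT}\.(?:{POSINT})?"   # (ε | posint) becomes an optional group

print(REAL)  # the fully expanded pattern, with no symbolic names left

for text in ["3.14", "-2.", "7", ".5"]:
    print(text, bool(re.fullmatch(REAL, text)))
```

The same trick fails for S → ( S ) S, since substituting S into its own definition never terminates; that recursion is what a CFG adds.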

Some CFG Notation

● Capital letters at the beginning of the alphabet will represent nonterminals.

● e.g. A, B, C, D

● Lowercase letters at the end of the alphabet will represent terminals.

● e.g. t, u, v, w

● Lowercase Greek letters will represent arbitrary strings of terminals and nonterminals.

● e.g. α, γ, ω

Examples

● We might write an arbitrary production as

A → ω

● We might write a string of a nonterminal followed by a terminal as

At

● We might write an arbitrary production containing a nonterminal followed by a terminal as

B → αAtω


Derivations

E
⇒ E Op E
⇒ E Op (E)
⇒ E Op (E Op E)
⇒ E * (E Op E)
⇒ int * (E Op E)
⇒ int * (int Op E)
⇒ int * (int Op int)
⇒ int * (int + int)

● This sequence of steps is called a derivation.

● A string αAω yields string αγω iff A → γ is a production.

● If α yields β, we write α ⇒ β.

● We say that α derives β iff there is a sequence of strings where α ⇒ α₁ ⇒ α₂ ⇒ … ⇒ β

● If α derives β, we write α ⇒* β.

• Terminals: no replacement rules for them

• Terminals: tokens from the lexer
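The definition of "yields" translates almost word for word into code. In this sketch a sentential form is a tuple of symbols, and `PRODUCTIONS`, `yields`, and `is_derivation` are names made up for the example, using the arithmetic grammar E → int | E Op E | (E), Op → + | - | * | /.

```python
PRODUCTIONS = {
    "E": [("int",), ("E", "Op", "E"), ("(", "E", ")")],
    "Op": [("+",), ("-",), ("*",), ("/",)],
}

def yields(alpha, beta):
    """True iff alpha ⇒ beta: one non-terminal A in alpha is replaced by γ, with A → γ."""
    for i, sym in enumerate(alpha):
        for rhs in PRODUCTIONS.get(sym, []):
            if alpha[:i] + rhs + alpha[i + 1:] == beta:
                return True
    return False

def is_derivation(steps):
    """True iff every sentential form yields the next one."""
    return all(yields(a, b) for a, b in zip(steps, steps[1:]))

steps = [tuple(line.split()) for line in [
    "E",
    "E Op E",
    "E Op ( E )",
    "E Op ( E Op E )",
    "E * ( E Op E )",
    "int * ( E Op E )",
    "int * ( int Op E )",
    "int * ( int Op int )",
    "int * ( int + int )",
]]
print(is_derivation(steps))
```

Checking α ⇒* β in general would require searching over possible sequences; the sketch only verifies a sequence that is already given.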

• Which of the strings are in the language of the given CFG?

• abcba

• acca

• aba

• abcbcba

• S → aXa

• X → ε | bY

• Y → ε | cXc
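One way to check candidate strings against this grammar is a small recognizer with one function per non-terminal. This is a sketch written for this grammar only; the function names are assumptions. Each function returns every position where a match of its non-terminal could end, so both alternatives of a production are explored.

```python
def ends_X(s, i):
    """All indices where a match of X starting at index i can end."""
    ends = {i}                          # X → ε
    if s.startswith("b", i):            # X → bY
        ends |= ends_Y(s, i + 1)
    return ends

def ends_Y(s, i):
    """All indices where a match of Y starting at index i can end."""
    ends = {i}                          # Y → ε
    if s.startswith("c", i):            # Y → cXc
        for j in ends_X(s, i + 1):
            if s.startswith("c", j):
                ends.add(j + 1)
    return ends

def in_language(s):
    """S → aXa: an 'a', then X, then a final 'a' that ends the string."""
    if not s.startswith("a"):
        return False
    return any(s.startswith("a", j) and j + 1 == len(s) for j in ends_X(s, 1))

for s in ["abcba", "acca", "aba", "abcbcba"]:
    print(s, in_language(s))
```

Every recursive call starts at a later index than its caller, so the recursion always terminates.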


Leftmost Derivations

Productions:

BLOCK → STMT
      | { STMTS }

STMTS → ε
      | STMT STMTS

STMT  → EXPR;
      | if (EXPR) BLOCK
      | while (EXPR) BLOCK
      | do BLOCK while (EXPR);
      | BLOCK
      | …

EXPR  → identifier
      | constant
      | EXPR + EXPR
      | EXPR - EXPR
      | EXPR * EXPR
      | EXPR = EXPR
      | ...

Derivation:

STMTS
⇒ STMT STMTS
⇒ EXPR; STMTS
⇒ EXPR = EXPR; STMTS
⇒ id = EXPR; STMTS
⇒ id = EXPR + EXPR; STMTS
⇒ id = id + EXPR; STMTS
⇒ id = id + constant; STMTS
⇒ id = id + constant;

Leftmost Derivations

● A leftmost derivation is a derivation in which each step expands the leftmost nonterminal.

● A rightmost derivation is a derivation in which each step expands the rightmost nonterminal.

● These will be of great importance when we talk about parsing next week.
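The two definitions differ only in which non-terminal gets expanded at each step, which a short sketch can make explicit. It uses the arithmetic grammar E → int | E Op E | (E), Op → + | - | * | /; the names `PRODUCTIONS`, `expand`, and `derive` are assumptions made for this example.

```python
PRODUCTIONS = {
    "E": ["int", "E Op E", "( E )"],
    "Op": ["+", "-", "*", "/"],
}

def expand(form, rhs, leftmost=True):
    """Replace the leftmost (or rightmost) non-terminal in form by rhs."""
    positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
    if not positions:
        raise ValueError("no non-terminal left to expand")
    i = positions[0] if leftmost else positions[-1]
    if rhs not in PRODUCTIONS[form[i]]:
        raise ValueError(f"{form[i]} → {rhs} is not a production")
    return form[:i] + rhs.split() + form[i + 1:]

def derive(choices, leftmost=True):
    """Apply the chosen right-hand sides in order, starting from E."""
    form = ["E"]
    steps = [" ".join(form)]
    for rhs in choices:
        form = expand(form, rhs, leftmost)
        steps.append(" ".join(form))
    return steps

left = ["E Op E", "int", "*", "( E )", "E Op E", "int", "+", "int"]
right = ["E Op E", "( E )", "E Op E", "int", "+", "int", "*", "int"]

print("\n".join(derive(left, leftmost=True)))
print()
print("\n".join(derive(right, leftmost=False)))
```

Both runs end in the same string; only the order in which the productions are applied differs.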


Related Derivations

Leftmost derivation:

E
⇒ E Op E
⇒ int Op E
⇒ int * E
⇒ int * (E)
⇒ int * (E Op E)
⇒ int * (int Op E)
⇒ int * (int + E)
⇒ int * (int + int)

Rightmost derivation:

E
⇒ E Op E
⇒ E Op (E)
⇒ E Op (E Op E)
⇒ E Op (E Op int)
⇒ E Op (E + int)
⇒ E Op (int + int)
⇒ E * (int + int)
⇒ int * (int + int)

Derivations Revisited

● A derivation encodes two pieces of information:

● What productions were applied to produce the resulting string from the start symbol?

● In what order were they applied?

● Multiple derivations might use the same productions, but apply them in a different order.


• Derivation: also a process of constructing a parse tree


Parse Trees

Grammar:

E → int | E Op E | (E)
Op → + | - | * | /

Leftmost derivation:

E
⇒ E Op E
⇒ int Op E
⇒ int * E
⇒ int * (E)
⇒ int * (E Op E)
⇒ int * (int Op E)
⇒ int * (int + E)
⇒ int * (int + int)

Parse tree (children indented under their parent):

E
    E
        int
    Op
        *
    E
        (
        E
            E
                int
            Op
                +
            E
                int
        )

• Start symbol is the root

• Non-leaf nodes are non-terminals

• Leaf nodes are terminals

• Inorder walk of the leaves is the generated string
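The last property, that reading the leaves from left to right gives back the generated string, is easy to check in code. In this sketch a node is a pair of a symbol and a list of children; the helper names `node` and `leaves` are made up for the example, and the tree is the one for int * (int + int).

```python
def node(symbol, *children):
    """A parse-tree node: (symbol, children). Leaves have no children."""
    return (symbol, list(children))

# Parse tree for: int * ( int + int )
tree = node("E",
    node("E", node("int")),
    node("Op", node("*")),
    node("E",
        node("("),
        node("E",
            node("E", node("int")),
            node("Op", node("+")),
            node("E", node("int"))),
        node(")")))

def leaves(t):
    """Collect the leaf symbols from left to right."""
    symbol, children = t
    if not children:
        return [symbol]
    return [leaf for child in children for leaf in leaves(child)]

print(" ".join(leaves(tree)))
```

The root is the start symbol E, every node with children is E or Op, and every leaf is a terminal, matching the properties listed above.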


Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( )

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 79: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( )

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 80: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 81: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 82: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( int )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 83: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( int )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 84: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( + int )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 85: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( + int )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 86: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( int + int )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 87: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

E

( int + int )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 88: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

*

E

( int + int )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 89: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

*

E

( int + int )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

Monday, April 8, 13

Page 90: Syntax Analysis (parsing) · Lexical Analysis Syntax Analysis Semantic Analysis IR Generation IR Optimization Code Generation Optimization Source Code Machine Code Flex-pert Monday,

Parse Trees

⇒ E

⇒ E Op E

⇒ E Op (E)

⇒ E Op (E Op E)

⇒ E Op (E Op int)

⇒ E Op (E + int)

⇒ E Op (int + int)

⇒ E * (int + int)

⇒ int * (int + int)

E

E EOp

int *

E

( int + int )

E EOp

A Notational Shorthand

E → int | E Op E | (E)

Op → + | - | * | /

For Comparison

[Figure: the parse tree built by the leftmost derivation beside the parse tree built by the rightmost derivation. Both are the tree shown here.]

E
├── E ── int
├── Op ── *
└── E
    ├── (
    ├── E
    │   ├── E ── int
    │   ├── Op ── +
    │   └── E ── int
    └── )

Left-most derivation and right-most derivation generate the same parse tree

The order of the construction is different

Grammar:

E → int | E Op E | (E)

Op → + | - | * | /
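To make the comparison concrete, here is a minimal Python sketch that replays both derivations from one and the same parse tree. The tuple encoding of the tree and the names `TREE`, `render`, and `derivation` are assumptions made for this illustration; they do not come from the slides.

```python
# A parse tree node is either a terminal (a plain string) or a
# (nonterminal, [children]) pair. This encoding is an assumption of the sketch.
TREE = ("E", [
    ("E", ["int"]),
    ("Op", ["*"]),
    ("E", ["(",
           ("E", [("E", ["int"]), ("Op", ["+"]), ("E", ["int"])]),
           ")"]),
])


def render(form):
    """Print a sentential form: nonterminals by name, terminals as-is."""
    return " ".join(n[0] if isinstance(n, tuple) else n for n in form)


def derivation(tree, leftmost=True):
    """Replay the derivation encoded by `tree`, expanding the leftmost
    (or rightmost) nonterminal at every step."""
    form = [tree]                      # current sentential form
    steps = [render(form)]
    while any(isinstance(n, tuple) for n in form):
        idxs = [i for i, n in enumerate(form) if isinstance(n, tuple)]
        i = idxs[0] if leftmost else idxs[-1]
        form[i:i + 1] = form[i][1]     # replace the nonterminal by its children
        steps.append(render(form))
    return steps


for step in derivation(TREE, leftmost=True):
    print("=>", step)
for step in derivation(TREE, leftmost=False):
    print("=>", step)
```

Both loops start from E and end at the same string. Only the intermediate sentential forms differ, which is the sense in which the tree records the productions but not their order.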


Parse Trees

● A parse tree is a tree encoding the steps in a derivation.

● Internal nodes represent nonterminal symbols used in the production.

● Inorder walk of the leaves contains the generated string.

● Encodes what productions are used, not the order in which those productions are applied.
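The properties above can be sketched in a few lines of Python. The `Node` class and the helpers `leaves` and `productions` are hypothetical names chosen for this example, not part of any course code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # nonterminal, or terminal at a leaf
    children: list = field(default_factory=list)  # empty for leaves

    def is_leaf(self):
        return not self.children


def leaves(node):
    """Left-to-right walk that collects the terminals at the leaves."""
    if node.is_leaf():
        return [node.symbol]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def productions(node):
    """The set of (head, body) productions used in the tree.
    A set carries no order, mirroring the last bullet above."""
    if node.is_leaf():
        return set()
    used = {(node.symbol, tuple(c.symbol for c in node.children))}
    for child in node.children:
        used |= productions(child)
    return used


# Small constructors, assumed only for readability of the example tree.
def E(*kids):
    return Node("E", list(kids))


def Op(tok):
    return Node("Op", [Node(tok)])


def t(tok):
    return Node(tok)


tree = E(E(t("int")), Op("*"),
         E(t("("), E(E(t("int")), Op("+"), E(t("int"))), t(")")))
print(" ".join(leaves(tree)))   # int * ( int + int )
```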


The Goal of Parsing

● Goal of syntax analysis: Recover the structure described by a series of tokens.

● If language is described as a CFG, goal is to recover a parse tree for the input string.

● Usually we do some simplifications on the tree; more on that later.

● We will discuss how to do this more this Wednesday.


Challenges in Parsing

A Serious Problem

Two parse trees for int * int + int:

E                        E
├── E ── int             ├── E
├── Op ── *              │   ├── E ── int
└── E                    │   ├── Op ── *
    ├── E ── int         │   └── E ── int
    ├── Op ── +          ├── Op ── +
    └── E ── int         └── E ── int

int * (int + int)        (int * int) + int


Ambiguity

● A CFG is said to be ambiguous if there is at least one string with two or more parse trees.

● Note that ambiguity is a property of grammars, not languages.

● There is no algorithm for converting an arbitrary ambiguous grammar into an unambiguous one.

● Some languages are inherently ambiguous, meaning that no unambiguous grammar exists for them.

● There is no algorithm for detecting whether an arbitrary grammar is ambiguous.
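Although ambiguity of an arbitrary grammar cannot be decided in general, counting the parse trees of one particular string under one particular grammar is straightforward. The sketch below does this for the expression grammar E → int | E Op E | (E) used earlier; the function name `count_parse_trees` and the token-list input format are assumptions of this example.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/"}


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> int | E Op E | (E)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_e(i, j):
        """Number of ways E can derive tokens[i:j]."""
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "int":          # E -> int
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count_e(i + 1, j - 1)             # E -> (E)
        for k in range(i + 1, j - 1):                  # E -> E Op E, Op at k
            if tokens[k] in OPS:
                total += count_e(i, k) * count_e(k + 1, j)
        return total

    return count_e(0, len(tokens))


print(count_parse_trees(["int", "*", "int", "+", "int"]))            # 2
print(count_parse_trees(["int", "*", "(", "int", "+", "int", ")"]))  # 1
```

A count above 1 for any single string is enough to show that the grammar is ambiguous. A count of 1 for the strings you happened to try proves nothing about the grammar as a whole.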

Is Ambiguity a Problem?

● Depends on semantics.

E → int | E + E

Two parse trees for 5 + 3 + 7:

E                        E
├── E ── 5               ├── E
├── +                    │   ├── E ── 5
└── E                    │   ├── +
    ├── E ── 3           │   └── E ── 3
    ├── +                ├── +
    └── E ── 7           └── E ── 7

5 + (3 + 7)              (5 + 3) + 7

Both trees evaluate to 15, so here the ambiguity does not change the meaning.

Is Ambiguity a Problem?

● Depends on semantics.

E → int | E + E | E - E

Two parse trees for 5 - 3 + 7:

E                        E
├── E ── 5               ├── E
├── -                    │   ├── E ── 5
└── E                    │   ├── -
    ├── E ── 3           │   └── E ── 3
    ├── +                ├── +
    └── E ── 7           └── E ── 7

5 - (3 + 7)              (5 - 3) + 7

The first tree evaluates to -5 and the second to 9, so here the ambiguity changes the meaning.
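The two groupings of 5 + 3 + 7 and of 5 - 3 + 7 can be checked numerically. This is a small sketch; the triple encoding `(left, op, right)` and the name `evaluate` are assumptions made for the example.

```python
import operator

APPLY = {"+": operator.add, "-": operator.sub}


def evaluate(tree):
    """Evaluate a tree given as an int leaf or a (left, op, right) triple."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return APPLY[op](evaluate(left), evaluate(right))


# 5 + 3 + 7: both groupings agree.
add_right = (5, "+", (3, "+", 7))
add_left = ((5, "+", 3), "+", 7)

# 5 - 3 + 7: the groupings disagree.
sub_right = (5, "-", (3, "+", 7))
sub_left = ((5, "-", 3), "+", 7)

print(evaluate(add_right), evaluate(add_left))   # 15 15
print(evaluate(sub_right), evaluate(sub_left))   # -5 9
```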

Different Parse Trees

S → S + S | S * S | number

•  Consider expression 1 + 2 * 3

•  Derivation 1: S ⇒ S + S ⇒ 1 + S ⇒ 1 + S * S ⇒ 1 + 2 * S ⇒ 1 + 2 * 3

•  Derivation 2: S ⇒ S * S ⇒ S * 3 ⇒ S + S * 3 ⇒ S + 2 * 3 ⇒ 1 + 2 * 3

Derivation 1, 1 + (2 * 3):        Derivation 2, (1 + 2) * 3:

    +                                 *
   / \                               / \
  1   *                             +   3
     / \                           / \
    2   3                         1   2

Eliminating Ambiguity

•  Often can eliminate ambiguity by adding non-terminals & allowing recursion only on right or left

     S → S + T | T
     T → T * num | num

•  T non-terminal enforces precedence

•  Left-recursion: left-associativity

Parse tree for 1 + 2 * 3:

S
├── S ── T ── 1
├── +
└── T
    ├── T ── 2
    ├── *
    └── 3
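As a rough illustration of how the layered grammar S → S + T | T, T → T * num | num fixes precedence and associativity, here is a small hand-written parser. It is only a sketch: the left-recursive rules are turned into loops that fold to the left (a common rewrite for hand-written parsers, not something stated on the slide), and the name `parse_sum` and the token format (Python ints plus the strings "+" and "*") are assumptions.

```python
def parse_sum(tokens):
    """Parse tokens under S -> S + T | T and T -> T * num | num.
    Returns nested (left, op, right) triples with ints at the leaves."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_num():
        nonlocal pos
        tok = peek()
        if not isinstance(tok, int):
            raise SyntaxError(f"expected a number at position {pos}")
        pos += 1
        return tok

    def parse_t():                       # T -> T * num | num
        nonlocal pos
        node = parse_num()
        while peek() == "*":
            pos += 1
            node = (node, "*", parse_num())   # fold left: left-associative
        return node

    def parse_s():                       # S -> S + T | T
        nonlocal pos
        node = parse_t()
        while peek() == "+":
            pos += 1
            node = (node, "+", parse_t())     # fold left: left-associative
        return node

    tree = parse_s()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


print(parse_sum([1, "+", 2, "*", 3]))   # (1, '+', (2, '*', 3))
print(parse_sum([1, "+", 2, "+", 3]))   # ((1, '+', 2), '+', 3)
```

Because a `*` can only be consumed inside `parse_t`, a product is always finished before the enclosing sum sees it, which is how the extra nonterminal T enforces precedence.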


Another example:

Context-Free Grammars

● A regular expression can be

    ● Any letter

    ● ε

    ● The concatenation of regular expressions.

    ● The union of regular expressions.

    ● The Kleene closure of a regular expression.

    ● A parenthesized regular expression.


Context-Free Grammars

● This gives us the following CFG:

R → a | b | c | …

R → “ε”

R → RR

R → R “|” R

R → R*

R → (R)

An Ambiguous Grammar

R → a | b | c | …

R → “ε”

R → RR

R → R “|” R

R → R*

R → (R)

Two parse trees for a | b*:

R                        R
├── R ── a               ├── R
├── |                    │   ├── R ── a
└── R                    │   ├── |
    ├── R ── b           │   └── R ── b
    └── *                └── *

a | (b*)                 (a | b)*


Resolving Ambiguity

● We can try to resolve the ambiguity via layering:

Ambiguous grammar:

R → a | b | c | …

R → “ε”

R → RR

R → R “|” R

R → R*

R → (R)

Layered grammar:

R → S | R “|” S

S → T | ST

T → U | T*

U → a | b | c | …

U → “ε”

U → (R)

Why is this unambiguous?

R → S | R “|” S        Unions concatenated expressions

S → T | ST             Concatenates starred expressions

T → U | T*             Puts stars onto atomic expressions

U → a | b | …          Only generates “atomic” expressions

U → “ε”

U → (R)
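One way to see the layering at work is a hand-written parser with one function per nonterminal. The following is a sketch under stated assumptions: the left-recursive rules for R, S, and T are written as loops, single letters stand for the alphabet, whitespace is ignored, and the output is a simplified tuple tree (`"or"`, `"concat"`, `"star"`) rather than a full parse tree. The name `parse_regex` is hypothetical.

```python
def parse_regex(text):
    """Parse a regular expression with the layered grammar
    R -> S | R "|" S,  S -> T | S T,  T -> U | T*,  U -> letter | ε | (R)."""
    chars = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return chars[pos] if pos < len(chars) else None

    def parse_u():                       # atomic expressions only
        nonlocal pos
        c = peek()
        if c == "(":
            pos += 1
            inner = parse_r()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        if c is not None and c.isalpha():    # letters, including ε
            pos += 1
            return c
        raise SyntaxError(f"unexpected {c!r}")

    def parse_t():                       # stars on atomic expressions
        nonlocal pos
        node = parse_u()
        while peek() == "*":
            pos += 1
            node = ("star", node)
        return node

    def parse_s():                       # concatenation of starred expressions
        node = parse_t()
        while peek() is not None and peek() not in "|)":
            node = ("concat", node, parse_t())
        return node

    def parse_r():                       # union of concatenations
        nonlocal pos
        node = parse_s()
        while peek() == "|":
            pos += 1
            node = ("or", node, parse_s())
        return node

    tree = parse_r()
    if pos != len(chars):
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


print(parse_regex("a | b*"))    # ('or', 'a', ('star', 'b'))
print(parse_regex("(a | b)*"))  # ('star', ('or', 'a', 'b'))
```

Each function can only call the layer below it (or restart at R inside parentheses), so a star can never wrap a union unless the union is parenthesized.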

R → S | R “|” S

S → T | ST

T → U | T*

U → a | b | c | …

U → “ε”

U → (R)

Parse tree for a b | c | a * under the layered grammar:

R
├── R
│   ├── R ── S
│   │        ├── S ── T ── U ── a
│   │        └── T ── U ── b
│   ├── |
│   └── S ── T ── U ── c
├── |
└── S ── T
         ├── T ── U ── a
         └── *

Precedence Declarations

● If we leave the world of pure CFGs, we can often resolve ambiguities through precedence declarations.

● e.g. multiplication has higher precedence than addition, but lower precedence than exponentiation.

● Allows for unambiguous parsing of ambiguous grammars.

● We'll see how this is implemented later on.
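As a rough preview of what a precedence table buys you, here is a sketch of precedence climbing, one well-known technique for parsing with declared precedences. It is not necessarily the mechanism this course will present later. The table `PRECEDENCE`, the operator symbols, and the name `parse_expr` are assumptions made for the example.

```python
# Hypothetical precedence declarations: higher numbers bind tighter.
PRECEDENCE = {
    "+": (1, "left"),
    "*": (2, "left"),
    "^": (3, "right"),
}


def parse_expr(tokens):
    """Parse a flat token list (ints and operator strings) into nested
    (left, op, right) triples using the PRECEDENCE table."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_atom():
        nonlocal pos
        tok = peek()
        if not isinstance(tok, int):
            raise SyntaxError(f"expected a number at position {pos}")
        pos += 1
        return tok

    def climb(min_prec):
        nonlocal pos
        left = parse_atom()
        while True:
            op = peek()
            if op not in PRECEDENCE or PRECEDENCE[op][0] < min_prec:
                return left
            prec, assoc = PRECEDENCE[op]
            pos += 1
            # Left-associative operators require strictly tighter operators
            # on their right; right-associative ones accept their own level.
            right = climb(prec + 1 if assoc == "left" else prec)
            left = (left, op, right)

    tree = climb(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


print(parse_expr([1, "+", 2, "*", 3]))   # (1, '+', (2, '*', 3))
print(parse_expr([2, "^", 3, "^", 2]))   # (2, '^', (3, '^', 2))
```

The underlying grammar here is the ambiguous E → E Op E | int. The table alone decides which of the possible trees is produced.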

The Structure of a Parse Tree

R → S | R “|” S

S → T | ST

T → U | T*

U → a | b | …

U → “ε”

U → (R)

Input: a a | b *

Parse tree:

R
├── R ── S
│        ├── S ── T ── U ── a
│        └── T ── U ── a
├── |
└── S ── T
         ├── T ── U ── b
         └── *

Underlying structure:

or
├── concat
│   ├── a
│   └── a
└── *
    └── b

R → S | R “|” S

S → T | ST

T → U | T*

U → a | b | …

U → “ε”

U → (R)

Input: a ( b | c )

Parse tree:

R ── S
     ├── S ── T ── U ── a
     └── T ── U
              ├── (
              ├── R
              │   ├── R ── S ── T ── U ── b
              │   ├── |
              │   └── S ── T ── U ── c
              └── )

Underlying structure:

concat
├── a
└── or
    ├── b
    └── c

If-Then-Else

•  How do we write a grammar for if statements?

     S → if (E) S
     S → if (E) S else S
     S → X = E

•  Is this grammar OK?

No! Ambiguous

•  if (E1) if (E2) S1 else S2

•  S ⇒ if (E) S
     ⇒ if (E) if (E) S else S

•  S ⇒ if (E) S else S
     ⇒ if (E) if (E) S else S

•  Which “if” is the “else” attached to?

S → if (E) S
S → if (E) S else S
S → other

else attached to the inner if:        else attached to the outer if:

if                                    if
├── E1                                ├── E1
└── if                                ├── if
    ├── E2                            │   ├── E2
    ├── S1                            │   └── S1
    └── S2                            └── S2


Grammar for closest-if rule

• Want to rule out the parse of if (E) if (E) S else S in which the else attaches to the outer if

• Constraint: an unmatched if may not occur as the “then” (consequent) clause of a containing “if” that has an else

statement → matched | unmatched

matched → if (E) matched else matched | other

unmatched → if (E) statement | if (E) matched else unmatched
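In a hand-written parser the closest-if rule is usually obtained by letting each if grab an else greedily. The sketch below takes that greedy approach rather than implementing the matched/unmatched grammar directly, though it should select the same tree. To keep it short, conditions are single tokens without parentheses and any other token counts as an "other" statement; these simplifications and the name `parse_statement` are assumptions of the example.

```python
def parse_statement(tokens):
    """Parse a token list such as ["if", "E1", "if", "E2", "S1", "else", "S2"].
    Returns ("if", cond, then) or ("if", cond, then, otherwise) tuples,
    with plain strings for 'other' statements."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def statement():
        nonlocal pos
        tok = peek()
        if tok is None or tok == "else":
            raise SyntaxError(f"expected a statement at position {pos}")
        pos += 1
        if tok != "if":
            return tok                    # an 'other' statement
        cond = peek()
        if cond is None:
            raise SyntaxError("expected a condition after 'if'")
        pos += 1
        then = statement()
        if peek() == "else":              # greedy: the innermost open if wins
            pos += 1
            return ("if", cond, then, statement())
        return ("if", cond, then)

    tree = statement()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


print(parse_statement(["if", "E1", "if", "E2", "S1", "else", "S2"]))
# ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
```

The else ends up inside the inner if because the recursive call for the inner statement returns only after it has had the first chance to consume it.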


Abstract Syntax Trees (ASTs)

● A parse tree is a concrete syntax tree; it shows exactly how the text was derived.

● A more useful structure is an abstract syntax tree, which retains only the essential structure of the input.

Parse Tree vs. AST

•  Parse tree, aka concrete syntax, for (1 + 2 + (3 + 4)) + 5:

S
├── E
│   ├── (
│   ├── S
│   │   ├── E ── 1
│   │   ├── +
│   │   └── S
│   │       ├── E ── 2
│   │       ├── +
│   │       └── S ── E
│   │                ├── (
│   │                ├── S
│   │                │   ├── E ── 3
│   │                │   ├── +
│   │                │   └── S ── E ── 4
│   │                └── )
│   └── )
├── +
└── S ── E ── 5

•  Abstract Syntax Tree: discards/abstracts unneeded information

+
├── +
│   ├── 1
│   └── +
│       ├── 2
│       └── +
│           ├── 3
│           └── 4
└── 5
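The step from parse tree to AST can be sketched as a small tree rewrite: drop the parentheses, collapse chains of single-child nodes, and hoist each operator above its operands. The tuple encoding, the grammar S → E + S | E, E → num | (S) read off the tree on this slide, and the names `to_ast`, `S`, and `E` are assumptions of this example.

```python
def to_ast(node):
    """Collapse a parse tree into an AST.
    Terminals are strings; nonterminals are (symbol, [children]) pairs."""
    if isinstance(node, str):
        return node
    _, children = node
    # Grouping parentheses only guided the parse, so they are dropped.
    kids = [to_ast(c) for c in children if c not in ("(", ")")]
    if len(kids) == 1:                 # unary chain such as S -> E or E -> num
        return kids[0]
    left, op, right = kids             # binary production such as S -> E + S
    return (op, left, right)


# Small constructors, assumed only to keep the example tree readable.
def S(*kids):
    return ("S", list(kids))


def E(*kids):
    return ("E", list(kids))


# Parse tree for (1 + 2 + (3 + 4)) + 5.
inner = S(E("3"), "+", S(E("4")))
middle = S(E("1"), "+", S(E("2"), "+", S(E("(", inner, ")"))))
tree = S(E("(", middle, ")"), "+", S(E("5")))

print(to_ast(tree))
# ('+', ('+', '1', ('+', '2', ('+', '3', '4'))), '5')
```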


Summary

● Syntax analysis (parsing) extracts the structure from the tokens produced by the scanner.

● Languages are usually specified by context-free grammars (CFGs).

● A parse tree shows how a string can be derived from a grammar.

● A grammar is ambiguous if it can derive the same string multiple ways.

● There is no algorithm for eliminating ambiguity; it must be done by hand.

● Abstract syntax trees (ASTs) contain an abstract representation of a program's syntax.

● Semantic actions associated with productions can be used to build ASTs.
