CSCI 565 - Compiler Design, Spring 2011
Pedro C. Diniz
Syntactic Analysis
Introduction
Copyright 2011, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California (USC) have explicit permission to make copies of these materials for their personal use.

The Front End

Parser
• Checks the stream of words and their parts of speech (produced by the scanner) for grammatical correctness
• Determines if the input is syntactically well formed
• Guides checking at deeper levels than syntax
• Builds an IR representation of the code

[Figure: Source code → Scanner → tokens → Parser → IR, with an Errors output]

The Study of Parsing

The process of discovering a derivation for some sentence
• Need a mathematical model of syntax: a grammar G
• Need an algorithm for testing membership in L(G)
• Need to keep in mind that our goal is building parsers, not studying the mathematics of arbitrary languages

Outline

• Overview of Lexical Analysis
• What is Syntax Analysis?
• Context-Free Grammars
• Derivation and Parse Trees
• Top-down vs. Bottom-up Parsing
• Ambiguous Grammars
• Implementing a Parser

Anatomy of a Compiler

[Figure: compiler pipeline, in order]
Program (character stream)
→ Lexical Analyzer (Scanner)
→ Token Stream
→ Syntax Analyzer (Parser)
→ Parse Tree
→ Intermediate Code Generator
→ Intermediate Representation
→ Intermediate Code Optimizer
→ Optimized Intermediate Representation
→ Code Generator
→ Assembly code
Overview of Lexical Analysis
• Lexical Analyzer Creates Tokens out of a Text Stream
• Tokens are defined using Regular Expressions

Reg. Expr., Grammars and Languages

• A regular expression can be written using only:
  – characters in the alphabet
  – regular expression operators: ‘*’ ‘·’ ‘|’ ‘+’ ‘?’ ‘(’ ‘)’
  – Example:
    (-| ε) · (0|1|2|3|4|5|6|7|8|9)+ · (. · (0|1|2|3|4|5|6|7|8|9)*)?
• A regular language is a language defined by a regular expression

Reg. Expr., Grammars and Languages

• What about symbolic variables?
  – Example:
    num = 0|1|2|3|4|5|6|7|8|9
    posint = num · num*
    int = (ε | -) · posint
    real = int · (ε | (. · posint))
• They are just shorthand or “syntactic sugar”
  – Example:
    (-| ε) · (0|1|2|3|4|5|6|7|8|9)+ · (. · (0|1|2|3|4|5|6|7|8|9)*)?
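
The named definitions can be spliced together mechanically, which is all the "shorthand" amounts to. Below is a minimal sketch using Python's standard `re` module. The variable names mirror the definitions num, posint, int and real, the helper `is_real` is hypothetical, and it follows the named definitions literally, so at least one digit is required after the decimal point.

```python
import re

# Each definition is an ordinary string that is pasted into the next one.
num = r"[0-9]"                      # 0|1|2|...|9
posint = rf"{num}{num}*"            # num · num*
int_ = rf"-?{posint}"               # (ε | -) · posint
real = rf"{int_}(?:\.{posint})?"    # int · (ε | (. · posint))

REAL = re.compile(real)


def is_real(text: str) -> bool:
    """Return True when the whole string matches the `real` definition."""
    return REAL.fullmatch(text) is not None
```

For instance, `is_real("123.3")` and `is_real("-42")` hold, while `is_real("12.")` does not.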

Syntax and Semantics of a Programming Language?

• Syntax
  – What a program looks like
  – Textual representation or structure
  – A precise mathematical definition is possible
• Semantics
  – What a program means
  – Harder to give a mathematical definition

Why do Syntax Analysis?

• Can provide a precise, easy-to-understand definition
• A proper grammar imparts a structure to a programming language
• Can automatically construct a parser that can determine if the program is syntactically well formed
• Helps in the translation process
• Easy to modify/add to the language

Input to and Output of a Parser

Input: - (123.3 + 23.6)

Token Stream: minus_op left_paren_op num(123.3) plus_op num(23.6) right_paren_op

[Figure: the Syntactic Analyzer (Parser) consumes the token stream and produces a parse tree whose leaves are -, (, 123.3, +, 23.6, )]

Syntax Definition

• We need to provide a precise, easy-to-understand definition of the syntax of a programming language
• Can we use Regular Expressions?
  – Can we use a regular language to describe a programming language?

Example: Hierarchical Scope

procedure foo(integer m, integer n, integer j) {
  for i = 1 to n do {
    if (i == j) {
      j = j + 1;
      m = i*j;
    }
    for k = i to n {
      m = m + k;
    }
  }
}

• Balanced Parentheses Problem
  – Example: {{}{{{}{{}}}}}

Balanced Parentheses Problem

• Can we define this using a Regular Expression?
  NO!
• Intuition:
  – The number of open parentheses must match the number of close parentheses
  – Need to keep track of the count, or need recursion
  – Also: NFAs and DFAs cannot perform unbounded counting

Balanced Parentheses Problem

• Is there a grammar that can define this?
  <S> → ( <S> ) <S> | ε
• The definition is recursive
• This is a Context-Free Grammar
  – It is more expressive than Regular Expressions
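
The recursion in the grammar maps directly onto a recursive function. The following is a minimal sketch of a recognizer for <S> → ( <S> ) <S> | ε; the function names `balanced` and `parse_s` are made up for illustration, and choosing the first alternative whenever the next symbol is ‘(’ is an assumption that happens to work for this particular grammar.

```python
def balanced(text: str) -> bool:
    """Recognize the language of <S> → ( <S> ) <S> | ε by recursive descent."""

    def parse_s(pos: int) -> int:
        # <S> → ( <S> ) <S>: chosen only when the next symbol is '('.
        if pos < len(text) and text[pos] == "(":
            pos = parse_s(pos + 1)
            if pos < len(text) and text[pos] == ")":
                return parse_s(pos + 1)
            raise ValueError(f"expected ')' at position {pos}")
        # <S> → ε: consume nothing.
        return pos

    try:
        # Accept only if <S> derives the entire input.
        return parse_s(0) == len(text)
    except ValueError:
        return False
```

The call stack plays the role of the unbounded counter that an NFA or DFA lacks.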

Defining Context-Free Grammars (CFGs)

Formally, a grammar is a four-tuple, G = (S, N, T, P)
• Start symbol S
  – A special nonterminal designated as the symbol from which every string in L(G) is derived
• Nonterminals N
  – Syntactic variables
• Terminals T (words)
  – Symbols for strings or tokens
• Productions P (P : N → (N ∪ T)*)
  – The manner in which terminals and nonterminals are combined to form strings
  – A nonterminal on the LHS and a string of terminals and nonterminals on the RHS
  – The RHS may be the empty string ε
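
As a rough illustration, the four components can be written down as a small data structure. The class name `Grammar`, its field names and the `check` method are assumptions made for this sketch, not notation from the course; ε is represented as an empty right-hand side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    start: str               # S, the start symbol
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    productions: tuple       # P, as (lhs, rhs) pairs where rhs is a tuple of symbols

    def check(self) -> bool:
        """Verify that the four components fit together."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and not (self.nonterminals & self.terminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


# <S> → ( <S> ) <S> | ε, with ε written as the empty tuple.
PARENS = Grammar(
    start="S",
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"(", ")"}),
    productions=(("S", ("(", "S", ")", "S")), ("S", ())),
)
```

`PARENS.check()` returns True; a production that mentions a symbol outside N ∪ T makes it return False.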

Example of a CFG

<S> → ( <S> ) <S>
<S> → ε

• Terminals: ( and )
• Nonterminals: <S>
• Start Symbol: <S>
• Productions: the two rules above

Regular Languages a Subset of CFL

If p and q are regular expressions with CFGs <P> and <Q>:

Regular Expression    Context-Free Grammar
a                     <A> → a
p · q                 <S> → <P> <Q>
p | q                 <S> → <P>
                      <S> → <Q>
p *                   <S> → <S> <P>
                      <S> → ε

Why use Regular Expressions?

• Separating syntax analysis into lexical and non-lexical parts is a nice modularization boundary
• Lexical rules are simple and can be expressed using regular expressions
• Regular Expressions are more concise
• Lexical Analyzer implementations for Regular Expressions are more efficient

Creating a CFG

• We need to create a CFG from a given language definition
• There are many issues involved
  – We’ll address some of them in this class
• Let’s look at a simple language

Example: A CFG for expressions

• Simple arithmetic expressions with + and *
  – 8.2 + 35.6
  – 8.32 + 86 * 45.3
  – (6.001 + 6.004) * (6.035 * -(6.042 + 6.046))
• Terminals (or tokens)
  – num for all the numbers
  – plus_op (‘+’), minus_op (‘-’), times_op (‘*’), left_paren_op (‘(’), right_paren_op (‘)’)
• What is the grammar for all possible expressions?

Example: A CFG for expressions

<expr> → <expr> <op> <expr>
<expr> → ( <expr> )
<expr> → - <expr>
<expr> → num
<op> → +
<op> → *

• Terminals: num, +, *, -, (, )
• Nonterminals: <expr>, <op>
• Start Symbol: <expr>
• Productions: the six rules above, or more compactly:

<expr> → <expr> <op> <expr> | ( <expr> ) | - <expr> | num
<op> → + | *

What is the language defined by this CFG?

<S> → a<S>a | aa
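
One way to explore the question is to enumerate the short sentences the grammar derives. The sketch below does this with a hypothetical helper `sentences`; it relies on the fact that every sentential form of this grammar contains at most one <S>, written here as the single character "S".

```python
def sentences(max_len: int) -> list[str]:
    """List the sentences of <S> → a<S>a | aa that have at most max_len symbols."""
    found = []
    form = "S"  # current sentential form; it always contains exactly one S
    # Finishing with <S> → aa makes the sentence one symbol longer than the form.
    while len(form) + 1 <= max_len:
        found.append(form.replace("S", "aa"))   # stop here: <S> → aa
        form = form.replace("S", "aSa")         # keep going: <S> → a<S>a
    return found
```

`sentences(6)` yields "aa", "aaaa" and "aaaaaa", which suggests the non-empty even-length strings of a's, a language that the regular expression (aa)+ also describes.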

Derivation

• How do we show that a sequence of tokens is accepted by a CFG?
• Productions are used to derive a sequence of tokens from the start symbol
• For given strings α, β and γ and a production A → β, a single step of derivation is αAγ ⇒ αβγ
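
A single derivation step is just a splice on a list of symbols. The sketch below assumes sentential forms are Python lists of strings and a production is an (lhs, rhs) pair; the name `derive_step` is made up for illustration.

```python
def derive_step(form, index, production):
    """Rewrite the nonterminal at form[index] with production A → β: αAγ ⇒ αβγ."""
    lhs, rhs = production
    if form[index] != lhs:
        raise ValueError(f"symbol at {index} is {form[index]!r}, not {lhs!r}")
    alpha, gamma = form[:index], form[index + 1:]  # everything left and right of A
    return alpha + list(rhs) + gamma


# <expr> <op> <expr> ⇒ num <op> <expr>, using <expr> → num on the first symbol
step = derive_step(["<expr>", "<op>", "<expr>"], 0, ("<expr>", ["num"]))
```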

Example Derivation

• Grammar
  <expr> → <expr> <op> <expr> | ( <expr> ) | - <expr> | num
  <op> → + | *
• Input
  36 * ( 8 + 23.4)
• Token Stream
  num ‘*’ ‘(’ num ‘+’ num ‘)’

Example Derivation

Each step rewrites the leftmost nonterminal with the production shown on the right:

<expr> ⇒ <expr> <op> <expr>                     <expr> → <expr> <op> <expr>
       ⇒ num <op> <expr>                        <expr> → num
       ⇒ num ‘*’ <expr>                         <op> → *
       ⇒ num ‘*’ ‘(’ <expr> ‘)’                 <expr> → ( <expr> )
       ⇒ num ‘*’ ‘(’ <expr> <op> <expr> ‘)’     <expr> → <expr> <op> <expr>
       ⇒ num ‘*’ ‘(’ num <op> <expr> ‘)’        <expr> → num
       ⇒ num ‘*’ ‘(’ num ‘+’ <expr> ‘)’         <op> → +
       ⇒ num ‘*’ ‘(’ num ‘+’ num ‘)’            <expr> → num

The last line is the token stream num ‘*’ ‘(’ num ‘+’ num ‘)’, so the input is accepted.
Parse Tree
• Graphical Representation of the Parsed Structure
• Shows the Sequence of Derivations Performed– Internal Nodes are Non-Terminals– Leaves are Terminals– Each Parent Node is LHS and the Children are RHS of a Production

Parse Tree Example

Parse tree for num ‘*’ ‘(’ num ‘+’ num ‘)’, grown one production at a time in the order of the derivation (children are indented under their parent):

<expr>
  <expr>
    num
  <op>
    *
  <expr>
    (
    <expr>
      <expr>
        num
      <op>
        +
      <expr>
        num
    )

Left-most vs. Right-most Derivation

• Leftmost Derivation
  – In the string, find the leftmost nonterminal and apply a production to it
  – The previous example was a leftmost derivation
• Rightmost Derivation
  – Find the rightmost nonterminal and apply a production to it
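
The two strategies differ only in which nonterminal is picked at each step. A minimal sketch, assuming sentential forms are lists of symbols and the caller supplies the productions to apply in order; `derive` and `NONTERMINALS` are hypothetical names.

```python
NONTERMINALS = {"<expr>", "<op>"}


def derive(start, choices, leftmost=True):
    """Apply each (lhs, rhs) production to the leftmost or rightmost nonterminal."""
    form = [start]
    steps = [form]
    for lhs, rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"cannot apply {lhs} here, found {form[i]}")
        form = form[:i] + list(rhs) + form[i + 1:]
        steps.append(form)
    return steps


E, OP = "<expr>", "<op>"
choices = [(E, [E, OP, E]), (E, ["num"]), (OP, ["*"]), (E, ["num"])]
left = derive(E, choices, leftmost=True)    # ... ⇒ num <op> <expr> ⇒ ...
right = derive(E, choices, leftmost=False)  # ... ⇒ <expr> <op> num ⇒ ...
```

For num ‘*’ num the same four productions work in both orders; the intermediate sentential forms differ, but the final sentence is the same.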

Right-Derivation Example

Each step rewrites the rightmost nonterminal with the production shown on the right:

<expr> ⇒ <expr> <op> <expr>                         <expr> → <expr> <op> <expr>
       ⇒ <expr> <op> ‘(’ <expr> ‘)’                 <expr> → ( <expr> )
       ⇒ <expr> <op> ‘(’ <expr> <op> <expr> ‘)’     <expr> → <expr> <op> <expr>
       ⇒ <expr> <op> ‘(’ <expr> <op> num ‘)’        <expr> → num
       ⇒ <expr> <op> ‘(’ <expr> ‘+’ num ‘)’         <op> → +
       ⇒ <expr> <op> ‘(’ num ‘+’ num ‘)’            <expr> → num
       ⇒ <expr> ‘*’ ‘(’ num ‘+’ num ‘)’             <op> → *
       ⇒ num ‘*’ ‘(’ num ‘+’ num ‘)’                <expr> → num

[Figure: the parse tree grows with each step and ends as the same tree as in the leftmost derivation]

Top-down vs. Bottom-up Parsing

• We normally scan from left to right
• Left-most derivation reflects top-down parsing
  – Start with the start symbol
  – End with the string of tokens
• Right-most derivation, read in reverse, reflects bottom-up parsing
  – Start with the string of tokens
  – End with the start symbol

Top-down Parsing

• Left-most derivation
  <expr> ⇒ <expr> <op> <expr>
         ⇒ num <op> <expr>
         ⇒ num ‘*’ <expr>
         ⇒ num ‘*’ ‘(’ <expr> ‘)’
         ⇒ num ‘*’ ‘(’ <expr> <op> <expr> ‘)’
         ⇒ num ‘*’ ‘(’ num <op> <expr> ‘)’
         ⇒ num ‘*’ ‘(’ num ‘+’ <expr> ‘)’
         ⇒ num ‘*’ ‘(’ num ‘+’ num ‘)’

Bottom-up Parsing

• Right-most derivation in reverse: each line is reduced to the line below it
  num ‘*’ ‘(’ num ‘+’ num ‘)’
  <expr> ‘*’ ‘(’ num ‘+’ num ‘)’
  <expr> <op> ‘(’ num ‘+’ num ‘)’
  <expr> <op> ‘(’ <expr> ‘+’ num ‘)’
  <expr> <op> ‘(’ <expr> <op> num ‘)’
  <expr> <op> ‘(’ <expr> <op> <expr> ‘)’
  <expr> <op> ‘(’ <expr> ‘)’
  <expr> <op> <expr>
  <expr>

Another Example

• Input: 124 + 23.5 * 86
• Token Stream: num ‘+’ num ‘*’ num

Another Example

<expr> ⇒ <expr> <op> <expr>               <expr> → <expr> <op> <expr>
       ⇒ num <op> <expr>                  <expr> → num
       ⇒ num ‘+’ <expr>                   <op> → +
       ⇒ num ‘+’ <expr> <op> <expr>       <expr> → <expr> <op> <expr>
       ⇒ num ‘+’ num <op> <expr>          <expr> → num
       ⇒ num ‘+’ num ‘*’ <expr>           <op> → *
       ⇒ num ‘+’ num ‘*’ num              <expr> → num

[Figure: parse tree with ‘+’ at the root; its right operand is the subtree for num ‘*’ num]

Another Example

String: num ‘+’ num ‘*’ num

• How about a different derivation?
  After <expr> ⇒ <expr> <op> <expr>, instead of <expr> → num we can apply <expr> → <expr> <op> <expr> to the leftmost <expr>

<expr> ⇒ <expr> <op> <expr>                   <expr> → <expr> <op> <expr>
       ⇒ <expr> <op> <expr> <op> <expr>       <expr> → <expr> <op> <expr>
       ⇒ num <op> <expr> <op> <expr>          <expr> → num
       ⇒ num ‘+’ <expr> <op> <expr>           <op> → +
       ⇒ num ‘+’ num <op> <expr>              <expr> → num
       ⇒ num ‘+’ num ‘*’ <expr>               <op> → *
       ⇒ num ‘+’ num ‘*’ num                  <expr> → num

[Figure: parse tree with ‘*’ at the root; its left operand is the subtree for num ‘+’ num]

Same string - Two derivations

num ‘+’ num ‘*’ num

• Tree with ‘+’ at the root, whose right operand is num ‘*’ num
  124 + (23.5 * 86) = 2145
• Tree with ‘*’ at the root, whose left operand is num ‘+’ num
  (124 + 23.5) * 86 = 12685
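
The two values can be checked by evaluating each tree bottom-up. In this sketch a tree is either a number or an (operator, left, right) tuple; that encoding and the names `evaluate`, `plus_at_root` and `times_at_root` are assumptions for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree given as a number or an (op, left, right) tuple."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree


plus_at_root = ("+", 124, ("*", 23.5, 86))    # 124 + (23.5 * 86)
times_at_root = ("*", ("+", 124, 23.5), 86)   # (124 + 23.5) * 86
```

Evaluating `plus_at_root` gives 2145.0 and `times_at_root` gives 12685.0, so the choice of parse tree changes the meaning of the program.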

The Grammar is Ambiguous

• Different Derivation Orders Produce Different Parse Trees
• This is Not Good!
  – Leads to Ambiguous Results
  – Most probably will produce unexpected results
• Sometimes Rewriting a Grammar with Additional Non-Terminals will Eliminate the Ambiguity

The Ambiguous Grammar

<expr> → <expr> <op> <expr>
<expr> → ( <expr> )
<expr> → - <expr>
<expr> → num
<op> → +
<op> → *

Eliminating Ambiguity

Ambiguous grammar:
<expr> → <expr> <op> <expr>
<expr> → ( <expr> )
<expr> → - <expr>
<expr> → num
<op> → +
<op> → *

Rewritten grammar:
<expr> → <expr> + <term>
<expr> → <term>
<term> → <term> * <unit>
<term> → <unit>
<unit> → num
<unit> → ( <expr> )
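
The layering of <expr>, <term> and <unit> is what gives ‘*’ higher precedence than ‘+’. The sketch below evaluates a token list with one function per nonterminal of the rewritten grammar. It is not a transliteration: a recursive-descent function cannot call itself on a left-recursive rule, so the left-recursive productions are realized as loops, which is an assumption beyond the slides. Tokens are assumed to be Python numbers or the strings "+", "*", "(" and ")".

```python
def parse_and_eval(tokens):
    """Evaluate tokens using the structure of <expr>, <term> and <unit>."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():                      # <expr> → <expr> + <term> | <term>
        value = term()
        while peek() == "+":
            advance()
            value += term()
        return value

    def term():                      # <term> → <term> * <unit> | <unit>
        value = unit()
        while peek() == "*":
            advance()
            value *= unit()
        return value

    def unit():                      # <unit> → num | ( <expr> )
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if isinstance(tok, (int, float)):
            advance()
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

With this structure `parse_and_eval([124, "+", 23.5, "*", 86])` can only produce 2145.0, the value of the tree with ‘+’ at the root.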

Eliminating Ambiguity

String: num ‘+’ num ‘*’ num

[Figure: the only parse tree in the rewritten grammar; children are indented under their parent]
<expr>
  <expr>
    <term>
      <unit>
        num
  +
  <term>
    <term>
      <unit>
        num
    *
    <unit>
      num

First example in the new grammar

num ‘*’ ‘(’ num ‘+’ num ‘)’

[Figure: parse tree in the rewritten grammar; children are indented under their parent]
<expr>
  <term>
    <term>
      <unit>
        num
    *
    <unit>
      (
      <expr>
        <expr>
          <term>
            <unit>
              num
        +
        <term>
          <unit>
            num
      )

Question: Is this Grammar ambiguous?

<stmt> → if <expr> then <stlist>
<stmt> → if <expr> then <stlist> else <stlist>

How do you make this unambiguous?

Ambiguous Grammars

Definitions
• If a grammar has more than one leftmost derivation for a single sentential form, the grammar is ambiguous
• If a grammar has more than one rightmost derivation for a single sentential form, the grammar is ambiguous
• The leftmost and rightmost derivations for a sentential form may differ, even in an unambiguous grammar

Classic example: the if-then-else problem
Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | … other stmts …

This ambiguity is entirely grammatical in nature

Ambiguity

This sentential form has two derivations
if Expr1 then if Expr2 then Stmt1 else Stmt2

• Production 2, then production 1: the else belongs to the outer if
  if E1 then (if E2 then S1) else S2
• Production 1, then production 2: the else belongs to the inner if
  if E1 then (if E2 then S1 else S2)
Ambiguity
Removing the Ambiguity
• Must rewrite the grammar to avoid generating the problem
• Match each else to innermost unmatched if (common sense rule)

1  Stmt     → WithElse
2           | NoElse
3  WithElse → if Expr then WithElse else WithElse
4           | OtherStmt
5  NoElse   → if Expr then Stmt
6           | if Expr then WithElse else NoElse

Intuition: a NoElse always has no else on its last cascaded else if statement
With this grammar, the example has only one derivation
Ambiguity
if Expr1 then if Expr2 then Stmt1 else Stmt2
This binds the else controlling S2 to the inner if
Rule  Sentential Form
-     Stmt
2     NoElse
5     if Expr then Stmt
?     if E1 then Stmt
1     if E1 then WithElse
3     if E1 then if Expr then WithElse else WithElse
?     if E1 then if E2 then WithElse else WithElse
4     if E1 then if E2 then S1 else WithElse
4     if E1 then if E2 then S1 else S2
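The "innermost unmatched if" rule can also be sketched procedurally. The code below is an assumption-laden illustration, not an implementation of the WithElse/NoElse grammar itself: it is a hand-written recursive parser (the name `parse_stmt` and the tuple layout are hypothetical) in which each if greedily claims the next else, which produces the same binding as the derivation above. Expressions and other statements are treated as single tokens.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos].
    Returns (tree, next_pos), where an if statement becomes
    ("if", condition, then_branch, else_branch_or_None)
    and any other statement is its own token."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        # Any non-if token stands for an "other statement".
        return tokens[pos], pos + 1
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then' after the condition")
    condition = tokens[pos + 1]
    then_branch, pos = parse_stmt(tokens, pos + 3)
    else_branch = None
    # Greedy choice: the most recently opened if takes the else,
    # so an else always binds to the innermost unmatched if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
    return ("if", condition, then_branch, else_branch), pos


tree, end = parse_stmt("if E1 then if E2 then S1 else S2".split())
print(tree)
```

In the resulting tuple the outer if has no else branch, while the inner if carries S2, matching the claim that the else controlling S2 binds to the inner if.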
Outline
• Overview of Lexical Analysis
• What is Syntactic Analysis?
• Context-Free Grammars
• Derivation and Parse Trees
• Top-down vs. Bottom-up Parsing
• Ambiguous Grammars
• Implementing a Parser
Implementing a Parser
• Implementing a parser for some CFGs can be very difficult
  – Need to look at the input and choose a production
  – Cannot choose the right production without looking far ahead
• Key: Decision Algorithm
  – If the input sentence does not belong to the language, then
    • Algorithm must prove it cannot be derived using the grammar
    • For any possible production sequence…
  – If the input sentence belongs to the language
    • Algorithm must present a derivation sequence
Example of look ahead
• Grammar
  <stmt> → a <long> b
  <stmt> → a <long> c
  <long> → x <long> | x
• Input string "axxxxxxxxxxxxxxxxx……."
• May need to look ahead a long while before deciding on a production
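The amount of look ahead needed for this grammar can be made explicit with a short sketch. The function name `choose_production` and the convention of numbering the two <stmt> productions 1 and 2 in the order listed are assumptions made for illustration; input symbols are single characters.

```python
def choose_production(text):
    """Decide which <stmt> production derives `text`:
    1 for <stmt> -> a <long> b, 2 for <stmt> -> a <long> c.
    Returns (production_number, symbols_examined)."""
    if not text or text[0] != "a":
        raise SyntaxError("a <stmt> must start with 'a'")
    i = 1
    # <long> derives one or more x's; skip over all of them.
    while i < len(text) and text[i] == "x":
        i += 1
    if i == 1:
        raise SyntaxError("<long> needs at least one 'x'")
    if i == len(text):
        raise SyntaxError("input ended before 'b' or 'c'")
    if i != len(text) - 1:
        raise SyntaxError("unexpected symbols after the final 'b' or 'c'")
    if text[i] == "b":
        return 1, i + 1
    if text[i] == "c":
        return 2, i + 1
    raise SyntaxError(f"expected 'b' or 'c', found {text[i]!r}")


print(choose_production("a" + "x" * 17 + "b"))
```

The second element of the result equals the length of the input: the very first decision (production 1 or 2) cannot be made until the last symbol has been seen, so no fixed amount of look ahead is enough for a parser that must commit at the first symbol.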
Implementing a Parser
• Implementing a parser for some CFGs can be very difficult
  – Need to look at the input and choose a production
  – Cannot choose the production without look ahead
• Different Techniques
  – Each can handle some set of CFGs
  – Categorization of techniques
Implementing a Parser
• Categorization of techniques: first letter, the direction in which the input is read
  – L - parse from left to right
  – R - parse from right to left
Implementing a Parser
• Categorization of techniques: second letter, the kind of derivation constructed
  – L - leftmost derivation
  – R - rightmost derivation
Implementing a Parser
• Categorization of techniques: number in parentheses
  – Number of lookahead symbols (tokens)
Implementing a Parser
• Categorization of techniques
  – Examples: LL(0), LR(1)
Next Lecture
• How to Implement a Parser
• How to build a Parser Engine for a Shift-Reduce Parser
• We will look at
  – LR(0)
  – LR(1)
  – LALR(1)
Summary
• What is Syntactic Analysis?
• Difference between Lexical and Syntactic Analyses
• All about Context-Free Grammars
• Parse Trees
• Left-most and Right-most Derivations
• Top-down and Bottom-up Parsing
• Ambiguous Grammars
• Issues in Parser Implementation