Midterm Review€¦ · The Structure of a Modern Compiler Lexical Analysis Syntax Analysis Semantic...
Transcript of Midterm Review€¦ · The Structure of a Modern Compiler Lexical Analysis Syntax Analysis Semantic...
Midterm ReviewApril 28th
Monday, April 29, 13
Monday, April 29, 13
Midterm
• Close Book
• Close Note
• 80 mins
• 20%
Monday, April 29, 13
Previously on CSE 131b...
The Structure of a Modern Compiler
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
SourceCode
Machine
Code
Structure of a modern compiler
Wednesday, April 3, 13
Midterm
Monday, April 29, 13
• Basic compiler concepts
• Lexical Analysis
• Goal, Input, Output
• Formal Language: regular expression/DFA/NFA
• Conflict Resolution
• Syntax Analysis
• Goal, input, output
• Formal Language: context free grammar, leftmost/rightmost derivation, ambiguity
• Parsing: parse tree construction, top down parsing (bfs,dfs, predictive, LL(1))
• Semantic Analysis
• Goal
• Scope: static scope, dynamic scope, symbol table
• Type checking: type inference, function overloading, function overriding
Monday, April 29, 13
Basic Concepts
• What’s a compiler?
• What’s an interpreter?
• What are the main phases of a compiler?
Monday, April 29, 13
• Compiler
Compilerprogram in source language
program in target language
Error message
Input
Output
Monday, April 29, 13
• Compiler
Compilerprogram in source language
program in target language
Error message
Input
Output
• Translates
Monday, April 29, 13
• Compiler
Compilerprogram in source language
program in target language
Error message
Input
Output
• Translates
• Typically lower the level of abstraction of the program
Monday, April 29, 13
• Compiler
Compilerprogram in source language
program in target language
Error message
Input
Output
• Translates
• Typically lower the level of abstraction of the program • High-level programming language -> machine code (C/C++ -> X86, etc)
Monday, April 29, 13
• Compiler
Compilerprogram in source language
program in target language
Error message
Input
Output
• Translates
• Typically lower the level of abstraction of the program • High-level programming language -> machine code (C/C++ -> X86, etc)
• We expect the program produced by the compiler to be better, (faster, consumes less memory, etc) than the original program
Monday, April 29, 13
• Compiler
Interpreter
Compilerprogram in source language
program in target language
Error message
ProgramInput
output
Input
Output
• Interpreter
Static: before runtime
Dynamic: during runtime
Examples: python, shell, javascript, PHP
Monday, April 29, 13
Previously on CSE 131b...
The Structure of a Modern Compiler
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
SourceCode
Machine
Code
Structure of a modern compiler
Wednesday, April 3, 13
Monday, April 29, 13
Lexical Analysis• Goal, Input, Output
• Formal Language
• regular expression (need to know what language a regular expression means, and vice versa)
• DFA/NFA
• Conflict Resolution
• Maximal Munch
Monday, April 29, 13
w h i l e ( i < z ) \n \t + i p ;
while (ip < z) ++ip;
p + +
T_While ( T_Ident < T_Ident ) ++ T_Ident
ip z ip
Goal of Lexical AnalysisOutput: Token Stream
• For each lexeme, identify token type
• < Token type, attribute>Monday, April 29, 13
Scanning a Source File
w h i l e ( i < z ) \n \t + i p ;p + +( 1 < i ) \n \t + i ;3 + +7
Token Type
• Keyword: for int if else while
• Punctuation: ( ) { } ;
• Operand: + - ++
• Relation: < > =
• Identifier: (variable name,function name) foo foo_2
• Integer, float point, string: 2345 2.0 “hello world”
• Whitespace, comment /* this code is awesome */Monday, April 29, 13
• How do we describe which (potentially infinite) set of lexemes is associated with each token type?
• What type of formal language should we use to describe tokens?
Monday, April 29, 13
Regular Expressions
● Regular expressions are a family of descriptions that can be used to capture certain languages (the regular languages).
● Often provide a compact and human-readable description of the language.
● Used as the basis for numerous software systems, including the flex tool we will use in this course.
Monday, April 29, 13
Simple Regular Expressions
● Suppose the only characters are 0 and 1.
● Here is a regular expression for strings containing 00 as a substring:
(0 | 1)*00(0 | 1)*
Monday, April 29, 13
• More examples
• Whitespace: [ \t\n]+
• Integers: [+\-]?[0-9]+
• Hex numbers: 0x[0-9a-f]+
• identifier
• [A-Za-z]([A-Za-z]|[0-9])*
Wednesday, April 3, 13
Monday, April 29, 13
Recognizing Regular Language
What is the machine that recognize regular language??
Monday, April 29, 13
Recognizing Regular Language
• Finite Automata
What is the machine that recognize regular language??
Monday, April 29, 13
Recognizing Regular Language
• Finite Automata
• DFA (Deterministic Finite Automata)
What is the machine that recognize regular language??
Monday, April 29, 13
Recognizing Regular Language
• Finite Automata
• DFA (Deterministic Finite Automata)
• NFA (Non-deterministic Finite Automata)
What is the machine that recognize regular language??
Monday, April 29, 13
Recognizing Regular Language
• Finite Automata
• DFA (Deterministic Finite Automata)
• NFA (Non-deterministic Finite Automata)
• DFA and NFA have equal power
What is the machine that recognize regular language??
Monday, April 29, 13
Recognizing Regular Language
• Finite Automata
• DFA (Deterministic Finite Automata)
• NFA (Non-deterministic Finite Automata)
• DFA and NFA have equal power
• Need to be able to construct DFA/NFA for given regular expressions
What is the machine that recognize regular language??
Monday, April 29, 13
Recognizing Regular Language
• Finite Automata
• DFA (Deterministic Finite Automata)
• NFA (Non-deterministic Finite Automata)
• DFA and NFA have equal power
• Need to be able to construct DFA/NFA for given regular expressions
• Need to understand what language a DFA/NFA recoganize
What is the machine that recognize regular language??
Monday, April 29, 13
Lexing Ambiguities
T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
Monday, April 29, 13
Lexing Ambiguities
T_For forT_Identifier [A-Za-z_][A-Za-z0-9_]*
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
f o tr
Monday, April 29, 13
Conflict Resolution
● Assume all tokens are specified as regular expressions.
● Algorithm: Left-to-right scan.
● Tiebreaking rule one: Maximal munch.
● Always match the longest possible prefix of the remaining text.
Monday, April 29, 13
Implementing Maximal Munch
● Given a set of regular expressions, how can we use them to implement maximum munch?
● Idea:
● Convert expressions to NFAs.
● Run all NFAs in parallel, keeping track of the last match.
● When all automata get stuck, report the last match and restart the search at that point.
Monday, April 29, 13
Implementing Maximal Munch
● Given a set of regular expressions, how can we use them to implement maximum munch?
● Idea:
● Convert expressions to NFAs.
● Run all NFAs in parallel, keeping track of the last match.
● When all automata get stuck, report the last match and restart the search at that point.
Monday, April 29, 13
• Example
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
T_Do doT_Double doubleT_Mystery [A-Za-z]
Implementing Maximal Munch
start d o
start d o u b l e
start Σ
D O U B L ED O U B
Monday, April 29, 13
A Minor Simplification
d o
d o u b l e
Σ
ε
ε
εstart
Monday, April 29, 13
Syntax Analysis
• Goal, input, output
• Formal Language
• context free grammar
• leftmost/rightmost derivation
• Ambiguity
• Parsing
• parse tree construction
• Abstract Syntax Tree
• Top down parsing (bfs,dfs,predictive, LL(1))
Monday, April 29, 13
w h i l e ( i < z ) \n \t + i p ;
while (ip < z) ++ip;
p + +
T_While ( T_Ident < T_Ident ) ++ T_Ident
ip z ip
While
++
Ident
<
Ident Ident
ip z ip
Monday, April 29, 13
Overview of Syntax Analysis
• Input: stream of tokens from the lexer
• Output: Abstract Syntax Tree (AST)
if
==
b 0
=
a �Hi�
;
Source code: if (b==0) a = “Hi”;
AST:
Monday, April 29, 13
Goal of Syntax Analysis
Monday, April 29, 13
Goal of Syntax Analysis
• Recovered the structure described by the token stream
• Report errors if the tokens do not properly encode a structure
• Differentiate parsing errors and semantic errors
Monday, April 29, 13
What%Parsing%Doesn�t%Do%• %Doesn�t%check:%type%agreement,%variable%declara;on,%
variables%ini;aliza;on,%etc.%
• int x = true; • int y; • z = f(y);
• %Deferred%un;l%seman;c%analysis%
Monday, April 29, 13
Formal Language for Syntax Analysis
Monday, April 29, 13
Formal Language for Syntax Analysis
Monday, April 29, 13
Formal Language for Syntax Analysis
• Can we use regular expressions (on tokens) to specify programming language syntax?
Monday, April 29, 13
• Language of all strings that contain balanced parentheses?
• () (()) ()()() (())()((()()))
• (( )( ()) (()()
• Construct a Finite Automaton for this...?
Monday, April 29, 13
• Language of all strings that contain balanced parentheses?
• () (()) ()()() (())()((()()))
• (( )( ()) (()()
• Construct a Finite Automaton for this...?
) ) ) ) )
( ( ( ( (
Monday, April 29, 13
• Language of all strings that contain balanced parentheses?
• () (()) ()()() (())()((()()))
• (( )( ()) (()()
• Construct a Finite Automaton for this...?
) ) ) ) )
( ( ( ( (
• Limits of regular language: DFA has only finite number of states; cannot perform unbounded counting
Monday, April 29, 13
• Language of all strings that contain balanced parentheses?
• () (()) ()()() (())()((()()))
• (( )( ()) (()()
• Construct a Finite Automaton for this...?
) ) ) ) )
( ( ( ( (
• Limits of regular language: DFA has only finite number of states; cannot perform unbounded counting
• Need a More Powerful Representa3on: Context Free Grammar
Monday, April 29, 13
CFG Terminology• Terminals
• Token or ε
• Non-terminals
• variables
• Start symbol
• Begins the derivation
• Productions
• Specify how non-terminals may be expanded to form strings (replacement rules)
• LHS: single non-terminal, RHS: string of terminals (including ε) or non-terminals
• S → ( S ) S
• S → ε
Monday, April 29, 13
• Need to understand CFG grammar
CFGs for Programming Languages
BLOCK → STMT | { STMTS }
STMTS → ε
| STMT STMTS
STMT → EXPR; | if (EXPR) BLOCK
| while (EXPR) BLOCK | do BLOCK while (EXPR); | BLOCK | …
EXPR → identifier | constant
| EXPR + EXPR | EXPR – EXPR | EXPR * EXPR | ...
Monday, April 29, 13
Leftmost Derivations
→ STMTS
⇒ STMT STMTS
⇒ EXPR; STMTS
⇒ EXPR = EXPR; STMTS
⇒ id = EXPR; STMTS
⇒ id = EXPR + EXPR; STMTS
⇒ id = id + EXPR; STMTS
⇒ id = id + constant; STMTS
⇒ id = id + constant;
BLOCK → STMT | { STMTS }
STMTS → ε
| STMT STMTS
STMT → EXPR; | if (EXPR) BLOCK | while (EXPR) BLOCK
| do BLOCK while (EXPR); | BLOCK | …
EXPR → identifier | constant | EXPR + EXPR | EXPR – EXPR | EXPR * EXPR | EXPR = EXPR | ...
Productions Derivations
Monday, April 29, 13
Leftmost Derivations
→ STMTS
⇒ STMT STMTS
⇒ EXPR; STMTS
⇒ EXPR = EXPR; STMTS
⇒ id = EXPR; STMTS
⇒ id = EXPR + EXPR; STMTS
⇒ id = id + EXPR; STMTS
⇒ id = id + constant; STMTS
⇒ id = id + constant;
BLOCK → STMT | { STMTS }
STMTS → ε
| STMT STMTS
STMT → EXPR; | if (EXPR) BLOCK | while (EXPR) BLOCK
| do BLOCK while (EXPR); | BLOCK | …
EXPR → identifier | constant | EXPR + EXPR | EXPR – EXPR | EXPR * EXPR | EXPR = EXPR | ...
Productions Derivations
• Need to be able to conduct derivations
• What are the two important types of derivations that we talked about?
Monday, April 29, 13
Leftmost Derivations
● A leftmost derivation is a derivation in which each step expands the leftmost nonterminal.
● A rightmost derivation is a derivation in which each step expands the rightmost nonterminal.
● These will be of great importance when we talk about parsing next week.
Monday, April 29, 13
Related Derivations
⇒ E
⇒ E Op E
⇒ int Op E
⇒ int * E
⇒ int * (E)
⇒ int * (E Op E)
⇒ int * (int Op E)
⇒ int * (int + E)
⇒ int * (int + int)
⇒ E
⇒ E Op E
⇒ E Op (E)
⇒ E Op (E Op E)
⇒ E Op (E Op int)
⇒ E Op (E + int)
⇒ E Op (int + int)
⇒ E * (int + int)
⇒ int * (int + int)
Monday, April 29, 13
• Do leftmost and rightmost derivations generate different parsing trees?
Monday, April 29, 13
For ComparisonE
E EOp
int *
E
( int + int )
E EOp
E
E EOp
int *
E
( int + int )
E EOp
A Notational Shorthand
E → int | E Op E | (E)
Op → + | - | * | /
Monday, April 29, 13
For ComparisonE
E EOp
int *
E
( int + int )
E EOp
E
E EOp
int *
E
( int + int )
E EOp
Left-most derivation and right-most derivation generate the same parse tree
A Notational Shorthand
E → int | E Op E | (E)
Op → + | - | * | /
Monday, April 29, 13
For ComparisonE
E EOp
int *
E
( int + int )
E EOp
E
E EOp
int *
E
( int + int )
E EOp
Left-most derivation and right-most derivation generate the same parse tree
The order of the construction is different
A Notational Shorthand
E → int | E Op E | (E)
Op → + | - | * | /
Monday, April 29, 13
• Need to be able to construct the parsing tree
For ComparisonE
E EOp
int *
E
( int + int )
E EOp
E
E EOp
int *
E
( int + int )
E EOp
Left-most derivation and right-most derivation generate the same parse tree
The order of the construction is different
A Notational Shorthand
E → int | E Op E | (E)
Op → + | - | * | /
Monday, April 29, 13
E
E EOp
int * int + int
E EOp
E
E EOp
int * int
E EOp
+ int
A Serious Problem
int * (int + int) (int * int) + int
Ambiguity
• Need to be able to tell if CFG is ambiguous
Monday, April 29, 13
Parse&Tree&vs.&AST&• Parse&Tree,&aka&
concrete&syntax&S
E S +
S E ) (
E S + 5
1 E S +
2 E
S ) (
E S +
3 E
4
+
+ 5
1 +
2 +
3 4
Abstract Syntax Tree
Discards/abstracts unneeded information
• Need to be able to construct AST
Monday, April 29, 13
Parsing Algorithms
• Top Down ( will be tested)
• Beginning with the start symbol, try to guess the productions to apply to end up at the user’s program
• Bottom up (will not be tested)
• Beginning with the user's program, try to apply productions in reverse to convert the program back into the start symbol.
Monday, April 29, 13
• Topdown parsing
• parsing as a search
• breath first search (using what data structure?)
• leftmost depth first search (using what data structure?)
• problem with leftmost depth-first search?
• how to eliminate the problem
• Predictive Parsing
• LL(1)
• Algorithm, parsing table
• Limitation
Monday, April 29, 13
Parsing as a SearchE → TE → T + ET → intT → (E)
E
T
T + E
int
int + E
T + T
T + T + E
(E)
int + T
T + (E)
int + (E)
(T)
(T + E)
…
…
…
…int + T + E
int + int
…
…
……
…
Monday, April 29, 13
Our First Top-Down Algorithm
● Breadth-First Search
● Maintain a worklist of sentential forms, initially just the start symbol S.
● While the worklist isn't empty:
● Remove an element from the worklist.
● If it matches the target string, you're done.
● Otherwise, for each possible string that can be derived in one step, add that string to the worklist.
● Can recover a parse tree by tracking what productions we applied at each step.
Monday, April 29, 13
Breadth-First Search Parsing
Worklist
int + int
E → TE → T + ET → intT → (E)
Monday, April 29, 13
Breadth-First Search Parsing
Worklist
int + int
E → TE → T + ET → intT → (E)
E
Monday, April 29, 13
Breadth-First Search Parsing
Worklist
int + int
E → TE → T + ET → intT → (E)
E
Monday, April 29, 13
Breadth-First Search Parsing
Worklist
int + int
E → TE → T + ET → intT → (E)
E
T T + E
Monday, April 29, 13
Breadth-First Search Parsing
Worklist
int + int
E → TE → T + ET → intT → (E)
E
T T + E
T T + E
Monday, April 29, 13
Breadth-First Search Parsing
Worklist
int + int
E → TE → T + ET → intT → (E)
T T + E
Monday, April 29, 13
Breadth-First Search Parsing
Worklist
int + int
E → TE → T + ET → intT → (E)
T + E
T
Monday, April 29, 13
Breadth-First Search Parsing
Worklist
int + int
E → TE → T + ET → intT → (E)
T + E
T
int (E)
Monday, April 29, 13
• ...
Monday, April 29, 13
Leftmost DFS
● Idea: Use depth-first search.
● Advantages:
● Lower memory usage: Only considers one branch at a time.
● High performance: On many grammars, runs very quickly.
● Easy to implement: Can be written as a set of mutually recursive functions.
Monday, April 29, 13
Leftmost DFS
int + int
E → TE → T + ET → intT → (E)
Monday, April 29, 13
Leftmost DFS
int + int
E
E → TE → T + ET → intT → (E)
Monday, April 29, 13
Leftmost DFS
int + int
E
TE → TE → T + ET → intT → (E)
Monday, April 29, 13
Leftmost DFS
int + int
E
T
intE → TE → T + ET → intT → (E)
Monday, April 29, 13
Leftmost DFS
int + int
E
T
intE → TE → T + ET → intT → (E)
Monday, April 29, 13
Leftmost DFS
int + int
E
TE → TE → T + ET → intT → (E)
Monday, April 29, 13
Leftmost DFS
int + int
E
T
(E)E → TE → T + ET → intT → (E)
Monday, April 29, 13
Leftmost DFS
int + int
E
TE → TE → T + ET → intT → (E)
Monday, April 29, 13
• ...
Monday, April 29, 13
Problems with Leftmost DFS
A → Aa | c
c
A
Aa
Aaa
Aaaa
Aaaaa
Monday, April 29, 13
Left Recursion
● A nonterminal A is said to be left-recursive iff
A ⇒* Aω
for some string ω.
● Leftmost DFS may fail on left-recursive grammars.
● Fortunately, in many cases it is possible to eliminate left recursion (see Handout 08 for details).
Saturday, April 13, 13
Monday, April 29, 13
Left Recursion
● A nonterminal A is said to be left-recursive iff
A ⇒* Aω
for some string ω.
● Leftmost DFS may fail on left-recursive grammars.
● Fortunately, in many cases it is possible to eliminate left recursion (see Handout 08 for details).
Saturday, April 13, 13
• need to know what left recursion is
• and why it causes problems for leftmost DFS
Monday, April 29, 13
Eliminating Left Recursion
● In general, left recursion can be converted into right recursion by a mechanical transformation.
● Consider the grammar
A → Aω | α
● This will produce α followed by some number of ω's.
● Can rewrite the grammar as
A → αB
B → ε | ωB
Monday, April 29, 13
Summary of Leftmost BFS/DFS
● Leftmost BFS works on all grammars.
● Worst-case runtime is exponential.
● Worst-case memory usage is exponential.
● Rarely used in practice.
● Leftmost DFS works on grammars without left recursion.
● Worst-case runtime is exponential.
● Worst-case memory usage is linear.
● Often used in a limited form as recursive descent.
Monday, April 29, 13
Predictive Parsing
● The leftmost DFS/BFS algorithms are backtracking algorithms.
● Guess which production to use, then back up if it doesn't work.
● Try to match a prefix by sheer dumb luck.
● There is another class of parsing algorithms called predictive algorithms.
● Based on remaining input, predict (without backtracking) which production to use.
Monday, April 29, 13
A Simple Predictive Parser: LL(1)
● Top-down, predictive parsing:
● L: Left-to-right scan of the tokens
● L: Leftmost derivation.
● (1): One token of lookahead
● Construct a leftmost derivation for the sequence of tokens.
● When expanding a nonterminal, we predict the production to use by looking at the next token of the input. The decision is forced.
Monday, April 29, 13
A Simple Predictive Parser: LL(1)
● Top-down, predictive parsing:
● L: Left-to-right scan of the tokens
● L: Leftmost derivation.
● (1): One token of lookahead
● Construct a leftmost derivation for the sequence of tokens.
● When expanding a nonterminal, we predict the production to use by looking at the next token of the input. The decision is forced.
Monday, April 29, 13
• LL (1)
• how does it work
• how to construct the parsing table for LL(1)
• limitations
Monday, April 29, 13
LL(1) Parse Tables
Monday, April 29, 13
LL(1) Parse Tables
E → intE → (E Op E)Op → +Op → *
Monday, April 29, 13
LL(1) Parse Tables
E → intE → (E Op E)Op → +Op → *
int ( ) + *
E
Op
int (E Op E)
*+
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))E
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))E
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$E$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$E$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
The $ symbol is the end-of-input
marker and is used by the parser
to detect when we have reached
the end of the input. It is not
a part of the grammar.
The $ symbol is the end-of-input
marker and is used by the parser
to detect when we have reached
the end of the input. It is not
a part of the grammar.
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$E$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$E$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
The first symbol of our guess is a
nonterminal. We then look at our
parsing table to see what
production to use.
This is called a predict step.
The first symbol of our guess is a
nonterminal. We then look at our
parsing table to see what
production to use.
This is called a predict step.
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$E$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
The first symbol of our guess is a
nonterminal. We then look at our
parsing table to see what
production to use.
This is called a predict step.
The first symbol of our guess is a
nonterminal. We then look at our
parsing table to see what
production to use.
This is called a predict step.
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$E$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$
(int + (int * int))$
E$
(E Op E)$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$
(int + (int * int))$
E$
(E Op E)$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
The first symbol of our guess is
now a terminal symbol. We thus
match it against the first symbol
of the string to parse.
This is called a match step.
The first symbol of our guess is
now a terminal symbol. We thus
match it against the first symbol
of the string to parse.
This is called a match step.
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$
(int + (int * int))$
int + (int * int))$
E$
(E Op E)$
E Op E)$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$
(int + (int * int))$
int + (int * int))$
E$
(E Op E)$
E Op E)$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$
(int + (int * int))$
int + (int * int))$
int + (int * int))$
E$
(E Op E)$
E Op E)$
int Op E)$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$
(int + (int * int))$
int + (int * int))$
int + (int * int))$
+ (int * int))$
E$
(E Op E)$
E Op E)$
int Op E)$
Op E)$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$
(int + (int * int))$
int + (int * int))$
int + (int * int))$
+ (int * int))$
E$
(E Op E)$
E Op E)$
int Op E)$
Op E)$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
• ......
Monday, April 29, 13
(1) E → int(2) E → (E Op E)(3) Op → +(4) Op → *
(int + (int * int))$
(int + (int * int))$
int + (int * int))$
int + (int * int))$
+ (int * int))$
+ (int * int))$
(int * int))$
(int * int))$
int * int))$
int * int))$int * int))$
* int))$
* int))$
int))$
int))$
))$
)$
$
E$
(E Op E)$
E Op E)$
int Op E)$
Op E)$
+ E)$
E)$
(E Op E))$
E Op E))$
int Op E))$
Op E))$
* E))$
E))$int))$
))$
)$
$
int ( ) + *
E
Op
1 2
43
LL(1) Parsing
Monday, April 29, 13
• Need to know how LL(1) works
• How to construct the parsing table for LL(1) ( language that does not have epsilon transition)
• Limitations of LL(1)
Monday, April 29, 13
STMT → if EXPR then STMT (1)
| while EXPR do STMT (2)
| EXPR ; (3)
EXPR → TERM -> id (4)
| zero? TERM (5)
| not EXPR (6)
| ++ id (7)
| -- id (8)
TERM → id (9)
| constant (10)
Constructing LL(1) Parse Tables
STMT
EXPR
TERM
if then while zero? not ++ -- id const=
1 3 3 3 3 3 3
5 6 7 8 4 4
9 10
→ ;while
2
do
Monday, April 29, 13
STMT → if EXPR then STMT (1)
| while EXPR do STMT (2)
| EXPR ; (3)
EXPR → TERM -> id (4)
| zero? TERM (5)
| not EXPR (6)
| ++ id (7)
| -- id (8)
TERM → id (9)
| constant (10)
Constructing LL(1) Parse Tables
STMT
EXPR
TERM
if then while zero? not ++ -- id const=→ ;while do
Monday, April 29, 13
STMT → if EXPR then STMT (1)
| while EXPR do STMT (2)
| EXPR ; (3)
EXPR → TERM -> id (4)
| zero? TERM (5)
| not EXPR (6)
| ++ id (7)
| -- id (8)
TERM → id (9)
| constant (10)
Constructing LL(1) Parse Tables
STMT
EXPR
TERM
if then while zero? not ++ -- id const=→ ;while do
Monday, April 29, 13
STMT → if EXPR then STMT (1)
| while EXPR do STMT (2)
| EXPR ; (3)
EXPR → TERM -> id (4)
| zero? TERM (5)
| not EXPR (6)
| ++ id (7)
| -- id (8)
TERM → id (9)
| constant (10)
Constructing LL(1) Parse Tables
STMT
EXPR
TERM
if then while zero? not ++ -- id const=
9 10
→ ;while do
Monday, April 29, 13
STMT → if EXPR then STMT (1)
| while EXPR do STMT (2)
| EXPR ; (3)
EXPR → TERM -> id (4)
| zero? TERM (5)
| not EXPR (6)
| ++ id (7)
| -- id (8)
TERM → id (9)
| constant (10)
Constructing LL(1) Parse Tables
STMT
EXPR
TERM
if then while zero? not ++ -- id const=
9 10
→ ;while do
Monday, April 29, 13
STMT → if EXPR then STMT (1)
| while EXPR do STMT (2)
| EXPR ; (3)
EXPR → TERM -> id (4)
| zero? TERM (5)
| not EXPR (6)
| ++ id (7)
| -- id (8)
TERM → id (9)
| constant (10)
Constructing LL(1) Parse Tables
STMT
EXPR
TERM
if then while zero? not ++ -- id const=
5 6 7 8
9 10
→ ;while do
Monday, April 29, 13
STMT → if EXPR then STMT (1)
| while EXPR do STMT (2)
| EXPR ; (3)
EXPR → TERM -> id (4)
| zero? TERM (5)
| not EXPR (6)
| ++ id (7)
| -- id (8)
TERM → id (9)
| constant (10)
Constructing LL(1) Parse Tables
STMT
EXPR
TERM
if then while zero? not ++ -- id const=
5 6 7 8
9 10
→ ;while do
Monday, April 29, 13
STMT → if EXPR then STMT (1)
| while EXPR do STMT (2)
| EXPR ; (3)
EXPR → TERM -> id (4)
| zero? TERM (5)
| not EXPR (6)
| ++ id (7)
| -- id (8)
TERM → id (9)
| constant (10)
Constructing LL(1) Parse Tables
STMT
EXPR
TERM
if then while zero? not ++ -- id const=
5 6 7 8
9 10
→ ;while do
Monday, April 29, 13
• ...
Monday, April 29, 13
Semantic Analysis
• Goal
• Scope
• static scope, dynamic scope
• symbol table
• Type checking
• calculate the type of expressions and statements
• function overloading/overriding
Monday, April 29, 13
Why$Separate$Seman-c$Analysis?$• Parsing$cannot$catch$some$errors$• Why?$
• Some$language$constructs$are$not$context9free$– Example:$All$used$variables$must$have$been$declared$(scoping)$
– Example:$A$method$must$be$invoked$with$arguments$of$proper$type$(typing)$
Monday, April 22, 13Monday, April 29, 13
What%Does%Seman-c%Analysis%Do?%• Checks%of%many%kinds:%
1. All%iden-fiers%are%declared%2. Types%%3. Inheritance%rela-onships%4. Classes%defined%only%once%5. Methods%in%a%class%defined%only%once%6. Reserved%iden-fiers%are%not%misused%And%others%.%.%.%
• The%requirements%depend%on%the%language%
Monday, April 22, 13
Need to be able to identify semantic errors in sample code
Monday, April 29, 13
Scoping:)General)Rules)• The)scope)rules)of)a)language:)
– Determine)which)declara:on)of)a)named)object)corresponds)to)each)use)of)the)object)
– Scoping)rules)map)uses)of)objects)to)their)declara:ons)
• C++)and)Java)use)sta$c&scoping:)– Mapping)from)uses)to)declara:ons)at)compile):me)– C++)uses)the)"most)closely)nested")rule)
• a)use)of)variable)x)matches)the)declara:on)in)the)most)closely)enclosing)scope))
• such)that)the)declara:on)precedes)the)use)
Monday, April 22, 13Monday, April 29, 13
Dynamic(Scoping(Revisited(• Each(func8on(has(an(environment(of(defini8ons(• If(a(name(that(occurs(in(a(func8on(is(not(found(in(its(environment,(its(caller�s(environment(is(searched(
• And(if(not(found(there,(the(search(con8nues(back(through(the(chain(of(callers((
• Now,(with(that(cleared(up…(– Assuming(that(dynamic(scoping(is(used,(what(is(output(by(the(
following(program?((
• void main() { int x = 0; f1(); g(); f2(); } • void f1() { int x = 10; g(); } • void f2() { int x = 20; f1(); g(); } • void g() { print(x); }
Monday, April 22, 13
“most recently called still active” Monday, April 29, 13
Dynamic(Scoping(Revisited(• Each(func8on(has(an(environment(of(defini8ons(• If(a(name(that(occurs(in(a(func8on(is(not(found(in(its(environment,(its(caller�s(environment(is(searched(
• And(if(not(found(there,(the(search(con8nues(back(through(the(chain(of(callers((
• Now,(with(that(cleared(up…(– Assuming(that(dynamic(scoping(is(used,(what(is(output(by(the(
following(program?((
• void main() { int x = 0; f1(); g(); f2(); } • void f1() { int x = 10; g(); } • void f2() { int x = 20; f1(); g(); } • void g() { print(x); }
Monday, April 22, 13
“most recently called still active”
Need to be able to write the output for
sample programs
Monday, April 29, 13
Symbol Tables: The Intuition
0: int x = 137;
1: int z = 42;
2: int MyFunction(int x, int y) {
3: printf("%d,%d,%d\n", x@2, y@2, z@1); 4: {
5: int x, z;
6: z@5 = y@2; 7: x@5 = z@5; 8: {
9: int y = x@5;10: {
11: printf("%d,%d,%d\n", x@5, y@9, z@5);12: }
13: printf("%d,%d,%d\n", x@5, y@9, z@5);14: }
15: printf("%d,%d,%d\n", x@5, y@2, z@5);16: }17: }
Symbol Table
x 0
z 1
x 2
y 2
x 5
z 5
Monday, April 29, 13
Symbol Tables: The Intuition
0: int x = 137;
1: int z = 42;
2: int MyFunction(int x, int y) {
3: printf("%d,%d,%d\n", x@2, y@2, z@1); 4: {
5: int x, z;
6: z@5 = y@2; 7: x@5 = z@5; 8: {
9: int y = x@5;10: {
11: printf("%d,%d,%d\n", x@5, y@9, z@5);12: }
13: printf("%d,%d,%d\n", x@5, y@9, z@5);14: }
15: printf("%d,%d,%d\n", x@5, y@2, z@5);16: }17: }
Symbol Table
x 0
z 1
x 2
y 2
x 5
z 5
Need to be able to construct the symbol table specifying the decls in each scope, and
the current scopeMonday, April 29, 13
Scoping with Inheritance
public class Base {
public int publicBaseInt = 1;
protected int baseInt = 2;
}
public class Derived extends Base {
public int derivedInt = 3;
public int publicBaseInt = 4;
public void doSomething() {
System.out.println(publicBaseInt);
System.out.println(baseInt);
System.out.println(derivedInt);
int publicBaseInt = 6;
System.out.println(publicBaseInt);
}
}
Root Scope
Base
publicBaseInt 1
baseInt 2
Derived
derivedInt 3
publicBaseInt 4
> 4
2
3
6
doSomething
publicBaseInt 6
Monday, April 29, 13
Scoping with Inheritance
public class Base {
public int publicBaseInt = 1;
protected int baseInt = 2;
}
public class Derived extends Base {
public int derivedInt = 3;
public int publicBaseInt = 4;
public void doSomething() {
System.out.println(publicBaseInt);
System.out.println(baseInt);
System.out.println(derivedInt);
int publicBaseInt = 6;
System.out.println(publicBaseInt);
}
}
Root Scope
Base
publicBaseInt 1
baseInt 2
Derived
derivedInt 3
publicBaseInt 4
> 4
2
3
6
doSomething
publicBaseInt 6
Same for scoping with inheritance
Monday, April 29, 13
• Type inference
• infer the type for expression, statement,
• check “well-formed”
Monday, April 29, 13
• Function overloading/overriding
Monday, April 29, 13
Overloading with Inheritancevoid Function();
void Function(int x);
void Function(double x);
void Function(Base b);
void Function(Derived d);
Function(new Derived);
() (int x) (double x) (Base b) (Derived d)
Monday, April 29, 13
Overloading with Inheritancevoid Function();
void Function(int x);
void Function(double x);
void Function(Base b);
void Function(Derived d);
Function(new Derived);
() (int x) (double x) (Base b) (Derived d)
Need to know how overloading decides what function to call (best
match, hierarchical function overloads for ambiguity, etc)
Monday, April 29, 13
Relaxing our Restrictions
class Base {
Base clone() {
return new Base;
}
}
class Derived extends Base {
Derived clone() {
return new Derived;
}
}Is this safe?
Monday, April 29, 13
Is this safe?
Relaxing our Restrictions (Again)
class Base {
bool equalTo(Base B) {
/* … */
}
}
class Derived extends Base {
bool equalTo(Derived D) {
/* … */
}
}
Monday, April 29, 13
A Concrete Exampleclass Fine {
void nothingFancy(Fine f) {
/* … do nothing … */
}
}
class Borken extends Fine {
int missingFn() {
return 137;
}
void nothingFancy(Borken b) {
Print(b.missingFn());
}
}
int main() {
Fine f = new Borken;
f.nothingFancy(new Fine);
}
(That calls this
one)
Monday, April 29, 13
Monday, April 29, 13
• Good luck!
Monday, April 29, 13