Compiler Construction
Chapter 4: Syntax Analysis
Topics to cover:
  Context-Free Grammars: Concepts and Notation
  Writing and rewriting a grammar
  Syntax Error Handling and Recovery
Introduction
  Why CFG
    A CFG gives a precise syntactic specification of a programming language
    Efficient parsers can be generated automatically
    It enables automatic translator generators
    Language extension becomes easier
  The role of the parser
    Taking tokens from the scanner, parsing, and reporting syntax errors
    Not just parsing: in a syntax-directed translator, the parser also conducts type checking, semantic analysis, and IR generation
Example of CFG
  A C– program is made out of functions, a function out of declarations and blocks, a block out of statements, a statement out of expressions, etc.
    <program> → <global_decl_list>
    <global_decl_list> → <global_decl_list> <global_decl> | ε
    <global_decl> → <decl_list> <function_decl>
    <function_decl> → <type> id ( <param_list> ) { <block> }
    <block> → <decl_list> <statement_list> | ε
    <decl_list> → <decl_list> <decl> | <decl> | ε
    <decl> → <type_decl> | <var_decl>
    <type> → void | int | float
    <statement_list> → …
    <statement> → { <block> }
Notational Conventions
  The following symbols are terminals:
    Lower-case letters early in the alphabet, such as a, b, c
    Operators (+, -, etc.) and punctuation symbols (parentheses, commas, etc.)
    Digits such as 0, 1, 2, etc.
    Boldface strings such as id or if
Notational Conventions
  Nonterminals
    Upper-case letters such as A, B, C
    The letter S, which is the start symbol
    Lower-case italic names such as expr or stmt
  Grammar symbols
    Upper-case letters late in the alphabet, such as X, Y, Z
  Strings of terminals
    Lower-case letters late in the alphabet, such as u, v, …, z
  Strings of grammar symbols
    Lower-case Greek letters, such as α, β, γ
Example
    expr → expr op expr
    expr → ( expr )
    expr → - expr
    expr → id
    op → +
    op → -
    op → *
    op → /
    op → ↑
  Using the notational shorthand:
    E → E A E | (E) | -E | id
    A → + | - | * | / | ↑
  Nonterminals: E and A
  Start symbol: E
Derivation
  Given a string αAβ, if A → γ is a production, then we can replace A by γ, written as αAβ ⇒ αγβ
    ⇒ means derives in one step
    ⇒+ means derives in one or more steps
    ⇒* means derives in zero or more steps
  The language L(G) generated by G is the set of terminal strings w such that S ⇒+ w. The string w is called a sentence of G.
  If S ⇒* α, where α may contain nonterminals, we say α is a sentential form of G.
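The one-step relation ⇒ can be sketched in a few lines of Python. This is only an illustration: the names GRAMMAR, leftmost_step and is_sentence are made up for the sketch, the grammar is the shorthand E/A grammar from the Example slide with the operator set abbreviated to four operators, and the step function always rewrites the leftmost nonterminal.

```python
# Grammar from the Example slide (operator list abbreviated for this sketch).
GRAMMAR = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
    "A": [["+"], ["-"], ["*"], ["/"]],
}

def leftmost_step(form, body):
    """Apply one => step: replace the leftmost nonterminal of `form` by `body`."""
    for pos, symbol in enumerate(form):
        if symbol in GRAMMAR:
            if body not in GRAMMAR[symbol]:
                raise ValueError(f"{symbol} -> {' '.join(body)} is not a production")
            return form[:pos] + body + form[pos + 1:]
    raise ValueError("no nonterminal left to replace")

def is_sentence(form):
    """A sentential form is a sentence when it contains terminals only."""
    return all(symbol not in GRAMMAR for symbol in form)

# Replay a derivation of  - ( id + id )  one step at a time.
history = [["E"]]
for body in (["-", "E"], ["(", "E", ")"], ["E", "A", "E"], ["id"], ["+"], ["id"]):
    history.append(leftmost_step(history[-1], body))

for form in history:
    print(" ".join(form), "(sentence)" if is_sentence(form) else "(sentential form)")
```

Every entry of history is a sentential form; only the last one contains no nonterminal and is therefore a sentence.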
Exercise
  What is a sentence of the language L defined by the C++ grammar G?
    Answer: a C++ program
  Is the following string a sentence or a sentential form?
    int parse(<parameter_list>) {}
    Answer: a sentential form
Derivation (cont.)
  Consider the following grammar G0:
    E → E + E | E * E | (E) | -E | id
  The string -(id + id) is a sentence of G0 because there is a derivation
    E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
  Leftmost derivation: only the leftmost nonterminal is replaced at each step
  Rightmost derivation: only the rightmost nonterminal is replaced at each step
  Exercise: is id-id a sentence of G0? Is -id+id a sentence?
    Answers: No (G0 has no binary minus). Yes.
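The exercise answers can be checked mechanically. The sketch below is a small memoized recognizer written specifically for G0, not a general parsing algorithm; the function name is_sentence_of_g0 and the token spelling (a list of strings) are assumptions made for the illustration.

```python
from functools import lru_cache

def is_sentence_of_g0(tokens):
    """Decide whether E =>* tokens for G0: E -> E + E | E * E | ( E ) | - E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derives_e(i, j):
        # True when E derives the token span tokens[i:j].
        if j - i == 1:
            return tokens[i] == "id"                      # E -> id
        if j - i < 2:
            return False                                  # E never derives the empty string
        if tokens[i] == "-" and derives_e(i + 1, j):      # E -> - E
            return True
        if tokens[i] == "(" and tokens[j - 1] == ")" and derives_e(i + 1, j - 1):
            return True                                   # E -> ( E )
        for k in range(i + 1, j - 1):                     # E -> E + E | E * E
            if tokens[k] in ("+", "*") and derives_e(i, k) and derives_e(k + 1, j):
                return True
        return False

    return derives_e(0, len(tokens))

print(is_sentence_of_g0(["id", "-", "id"]))        # binary minus is not in G0
print(is_sentence_of_g0(["-", "id", "+", "id"]))
```

Each recursive call works on a strictly shorter span, so the search terminates; memoization keeps it polynomial in the input length.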
Parse Tree and Derivation
  A parse tree can be viewed as a graphical representation of a derivation that ignores replacement order.
    E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

          E
         / \
        -   E
          / | \
         (  E  )
          / | \
         E  +  E
         |     |
         id    id

  Interior node: nonterminal
  Leaves: terminals
  Children: the right-hand side of the production applied at the node
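As a rough illustration of these three properties, the tree for -(id+id) can be written as nested Python tuples. The representation (label, children) and the helper names leaf, frontier and is_valid are choices made for this sketch only.

```python
G0 = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]]}

def leaf(symbol):
    """A leaf is a node with no children."""
    return (symbol, [])

# Parse tree of  - ( id + id )  as (label, children) pairs.
TREE = ("E", [
    leaf("-"),
    ("E", [
        leaf("("),
        ("E", [("E", [leaf("id")]), leaf("+"), ("E", [leaf("id")])]),
        leaf(")"),
    ]),
])

def frontier(node):
    """Read the leaves from left to right; this is the derived sentence."""
    label, children = node
    if not children:
        return [label]
    return [symbol for child in children for symbol in frontier(child)]

def is_valid(node, grammar):
    """Interior nodes are nonterminals whose children spell a right-hand side."""
    label, children = node
    if not children:
        return label not in grammar            # leaves must be terminals
    body = [child[0] for child in children]
    return body in grammar.get(label, []) and all(is_valid(c, grammar) for c in children)

print(" ".join(frontier(TREE)), is_valid(TREE, G0))
```

The tree records which production was applied at each node but not the order in which the nodes were expanded, which is why a leftmost and a rightmost derivation of the same sentence can share one tree.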
CFG is more powerful than RE
  Every RE can be described by a CFG
    Example: (a|b)*abb
      A → aA | bA | abb
  Converting an NFA into a CFG
    For each state i of the NFA, create a nonterminal symbol Ai
    If state i goes to state j on input a, add the production Ai → aAj
    Add Ai → Aj if state i goes to state j on ε
    Add Ai → ε if state i is an accepting state
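The conversion rules above can be sketched directly. The encoding is an assumption of this example: transitions are (state, symbol, next_state) triples, None stands for an ε-move, and an empty list stands for an ε right-hand side. The NFA used is the usual four-state machine for (a|b)*abb.

```python
def nfa_to_cfg(transitions, accepting):
    """Build productions Ai -> a Aj, Ai -> Aj (epsilon move) and Ai -> epsilon."""
    productions = {}
    for i, symbol, j in transitions:
        body = [f"A{j}"] if symbol is None else [symbol, f"A{j}"]
        productions.setdefault(f"A{i}", []).append(body)
        productions.setdefault(f"A{j}", [])       # make sure every state has an entry
    for i in accepting:
        productions.setdefault(f"A{i}", []).append([])   # Ai -> epsilon
    return productions

# NFA for (a|b)*abb: state 0 loops on a and b, then reads a, b, b to reach state 3.
NFA = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
CFG = nfa_to_cfg(NFA, accepting=[3])

for head, bodies in CFG.items():
    print(head, "->", " | ".join(" ".join(body) or "ε" for body in bodies))
```

The start symbol of the resulting grammar is the nonterminal of the NFA's start state, A0 here.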
Why do we need RE?
  RE is sufficiently powerful for lexical rules
  RE is more concise and easier to understand
  A more efficient lexical analyzer can be constructed from an RE than from a CFG
  Separating the lexical from the nonlexical part has a few advantages, such as modularization and easier porting
  Exercise: what if we don’t have token definitions?
Defects in CFG
  Useless nonterminals
    S → A | B
    A → a
    B → Bb    <derives no terminal string>
    C → c     <unreachable>
  Ambiguity
  Top-down parsing issues
    Left recursion
    Left factoring
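Both kinds of useless nonterminal in the example can be found with two fixed-point passes. The following is a minimal sketch; the function name find_useless and the dictionary encoding of the grammar are assumptions made for illustration.

```python
def find_useless(grammar, start):
    """Return (nonterminals deriving no terminal string, unreachable nonterminals)."""
    def usable(body, generating):
        # A right-hand side is usable when all its nonterminals are generating.
        return all(sym not in grammar or sym in generating for sym in body)

    # Pass 1: nonterminals that derive at least one terminal string.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(usable(b, generating) for b in bodies):
                generating.add(head)
                changed = True

    # Pass 2: nonterminals reachable from the start symbol via usable productions.
    reachable = set()
    todo = [start] if start in generating else []
    while todo:
        head = todo.pop()
        if head in reachable:
            continue
        reachable.add(head)
        for body in grammar[head]:
            if usable(body, generating):
                todo.extend(sym for sym in body if sym in grammar)

    return set(grammar) - generating, generating - reachable

GRAMMAR = {"S": [["A"], ["B"]], "A": [["a"]], "B": [["B", "b"]], "C": [["c"]]}
non_generating, unreachable = find_useless(GRAMMAR, "S")
print(non_generating, unreachable)
```

The order of the passes matters: reachability is computed only over productions that survive the first pass, so a nonterminal reachable solely through B would also be reported as unreachable.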
Ambiguity
  A grammar is ambiguous if it produces more than one parse tree for some sentence
  Example 1: A+B+C (is it (A+B)+C or A+(B+C)?)
    Improper production: expr → expr + expr | id
  Example 2: A+B*C (is it (A+B)*C or A+(B*C)?)
    Improper production: expr → expr + expr | expr * expr
  Example 3: if E1 then if E2 then S1 else S2 (which then does the else match with?)
    Improper production:
      stmt → if expr then stmt
           | if expr then stmt else stmt
Two parse trees of example 3
  Tree 1 (the else matches the inner if):
    stmt
      if E1 then stmt
                   if E2 then S1 else S2
  Tree 2 (the else matches the outer if):
    stmt
      if E1 then stmt else S2
                   if E2 then S1
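Ambiguity can also be observed by counting parse trees. The sketch below counts the trees that the improper grammar of examples 1 and 2, expr → expr + expr | expr * expr | id, assigns to a token list; the function name count_parse_trees and the use of id for every operand are assumptions of the sketch.

```python
from functools import lru_cache

def count_parse_trees(tokens, operators=("+", "*")):
    """Count parse trees under  expr -> expr op expr | id  for the given operators."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct trees deriving tokens[i:j] from expr.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in operators:
                # Choose tokens[k] as the operator applied at the root.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id + id".split()))   # example 1
print(count_parse_trees("id + id * id".split()))   # example 2
```

Any count above 1 for some sentence shows that the grammar is ambiguous. The count grows quickly with the number of operators, since each operator may serve as the root.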
Eliminating Ambiguity
  Operator associativity
    expr → expr + term | term
  Operator precedence
    expr → expr + term | term
    term → term * factor | factor
  Dangling else
    stmt → matched | unmatched
    matched → if expr then matched else matched
            | other
    unmatched → if expr then stmt
              | if expr then matched else unmatched
    (other stands for any statement that is not an if statement)
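To see the effect of the layered expr/term/factor grammar, here is a small hand-written parser that returns the grouping as nested tuples. It is a sketch under two assumptions: factor is taken to be a single operand name (the slide does not define it), and the left-recursive productions are realized as loops, which builds the same left-associative trees without recursing on the left.

```python
def parse_expr(tokens):
    """Parse with  expr -> expr + term | term,  term -> term * factor | factor."""
    pos = 0

    def factor():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("+", "*"):
            raise SyntaxError(f"operand expected at token {pos}")
        name = tokens[pos]          # assumption: a factor is one operand name
        pos += 1
        return name

    def term():
        nonlocal pos
        node = factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, factor())     # left-associative grouping
        return node

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, term())       # left-associative grouping
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree

print(parse_expr("A + B + C".split()))
print(parse_expr("A + B * C".split()))
```

Because * is only reachable through term, it binds tighter than +, and because each loop folds operands in from the left, both operators associate to the left. Each input now has exactly one tree.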
Eliminating Left Recursion
  Immediate left recursion
    Example: A → Aα | β
  Transformation
    A → Aα1 | Aα2 | … | β1 | β2 | …
    where no βi begins with A. We replace the A-productions by
    A → β1A’ | β2A’ | …
    A’ → α1A’ | α2A’ | … | ε
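The transformation can be sketched as a short function. The name eliminate_immediate, the list-of-symbols encoding, and the convention that an empty list means ε are assumptions of this example; the sketch also assumes the grammar has no production A → A.

```python
def eliminate_immediate(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b1 A' | ...,  A' -> a1 A' | ... | epsilon."""
    new_head = head + "'"
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}                 # nothing to do: no left recursion
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],   # [] is epsilon
    }

result = eliminate_immediate("expr", [["expr", "+", "term"], ["term"]])
for lhs, rhs in result.items():
    print(lhs, "->", " | ".join(" ".join(body) or "ε" for body in rhs))
```

Applied to expr → expr + term | term, this yields expr → term expr' and expr' → + term expr' | ε, which generates the same strings with right recursion only.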
Indirect Left Recursion
  Example:
    S → Aa | b
    A → Ac | Sd | ε
  Transformation (assuming no cycles A ⇒+ A)
    1. Arrange the nonterminals in order A1, A2, …, An
    2. for i := 1 to n do begin
         for j := 1 to i-1 do begin
           Replace Ai → Ajγ by Ai → δ1γ | δ2γ | …, where Aj → δ1 | δ2 | … are the current Aj-productions
         end
         Eliminate the immediate left recursion among the Ai-productions
       end
  In the above example,
    S → Aa | b
    A → Ac | Sd | ε
  A → Sd will be replaced, giving
    A → Ac | Aad | bd | ε
  Eliminating the immediate left recursion among the A-productions then yields
    S → Aa | b
    A → bdA’ | A’
    A’ → cA’ | adA’ | ε
Algorithm 4.1 Eliminating Left Recursion
  This algorithm systematically eliminates left recursion from a grammar, including indirect left recursion.
  Precondition: the grammar has no cycles and no ε-productions. A cycle means A ⇒+ A.
    The precondition avoids producing productions of the form A → A during nonterminal replacement.
    For example, with A → BA and B → Ab | ε, when A → BA is derived to A → A a cycle shows up.
    ε-productions also make the algorithm more complex, because A → BCD may be derived to A → CD, so handling only the leftmost nonterminal is not sufficient.
Indirect Left Recursion
    A → Bb | a
    B → Cc | b
    C → Dd | c
    D → Aa | d
  A ⇒ Bb ⇒ Ccb ⇒ Ddcb ⇒ Aadcb
  C ⇒ Dd ⇒ Aad ⇒ Bbad ⇒ Ccbad
  We need to expose immediate left recursions and then eliminate them. Some ordering is needed. Suppose we replace A → Bb by A → Ccb and then start with B ⇒ Cc ⇒ Ddc ⇒ Aadc ⇒ Ccbadc; this would never expose the immediate left recursion in this example.
Algorithm 4.1
  for i := 1 to n do begin
    for j := 1 to i-1 do begin
      replace each production of the form Ai → Ajγ by the productions Ai → δ1γ | δ2γ | …, where Aj → δ1 | δ2 | … are the current Aj-productions
    end
    eliminate the immediate left recursion among the Ai-productions
  end
  Key idea: for each nonterminal Ai, every production that begins with a lower-numbered nonterminal Aj (where j < i) is rewritten so that it begins with a terminal or a higher-numbered nonterminal.
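A compact Python sketch of Algorithm 4.1 follows. The function names and the grammar encoding (lists of symbols, with an empty list for ε) are assumptions of this example, and the nonterminal ordering is passed in explicitly. The sample grammar contains an ε-production, which the stated precondition excludes; the algorithm still happens to work on it, as the worked example on the earlier slide shows.

```python
def eliminate_immediate(head, bodies):
    """A -> A a | b   becomes   A -> b A',  A' -> a A' | epsilon."""
    new_head = head + "'"
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }

def eliminate_left_recursion(grammar, order):
    """Algorithm 4.1 over the nonterminals listed in `order` (A1, ..., An)."""
    g = {head: [list(body) for body in bodies] for head, bodies in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            expanded = []
            for body in g[a_i]:
                if body and body[0] == a_j:
                    # Ai -> Aj gamma  becomes  Ai -> delta gamma  for each Aj -> delta
                    expanded.extend(delta + body[1:] for delta in g[a_j])
                else:
                    expanded.append(body)
            g[a_i] = expanded
        g.update(eliminate_immediate(a_i, g[a_i]))
    return g

GRAMMAR = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], []]}
RESULT = eliminate_left_recursion(GRAMMAR, ["S", "A"])
for head, bodies in RESULT.items():
    print(head, "->", " | ".join(" ".join(body) or "ε" for body in bodies))
```

The output reproduces the worked example: S → Aa | b, A → bdA' | A', A' → cA' | adA' | ε. A different ordering of the nonterminals generally gives a different but equivalent grammar.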
  [Figure: the nonterminals A1, A2, …, Ai-1, Ai, …, Ai+k, …, An listed in order, with productions of Ai referring back to lower-numbered nonterminals such as Ai-1 and A2.]
  After replacement, there will be no backward references.
Left Factoring
  Consider the following grammar:
    A → αβ1 | αβ2
  On seeing input that matches α, a top-down parser cannot tell whether to expand A to αβ1 or to αβ2.
  A transformation called left factoring can be applied. The grammar becomes:
    A → αA’
    A’ → β1 | β2
Exercise
    stmt → if expr then stmt
         | if expr then stmt else stmt
  For the following grammar form:
    A → αβ1 | αβ2
  What is α? β1? β2?
    α: if expr then stmt
    β1: ε
    β2: else stmt
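Left factoring can be sketched as a single pass over the alternatives of one nonterminal. The names common_prefix and left_factor are invented for this example, alternatives are grouped by their first symbol, and only one round of factoring is performed; a full implementation would repeat the step on the new nonterminals until no two alternatives share a prefix.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if any(sym != symbols[0] for sym in symbols):
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(head, bodies):
    """One round: A -> alpha b1 | alpha b2  becomes  A -> alpha A',  A' -> b1 | b2."""
    groups = {}
    for body in bodies:
        groups.setdefault(body[0] if body else None, []).append(body)
    result = {head: []}
    fresh = head
    for first, group in groups.items():
        if first is None or len(group) < 2:
            result[head].extend(group)        # nothing to factor in this group
            continue
        fresh += "'"                          # new nonterminal A', A'', ...
        alpha = common_prefix(group)
        result[head].append(alpha + [fresh])
        result[fresh] = [body[len(alpha):] for body in group]   # [] is epsilon
    return result

STMT = [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
]
FACTORED = left_factor("stmt", STMT)
for lhs, rhs in FACTORED.items():
    print(lhs, "->", " | ".join(" ".join(body) or "ε" for body in rhs))
```

For the exercise grammar, the common prefix found is if expr then stmt, and the new nonterminal derives either ε or else stmt.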
Syntax Error Handling
  Different types of errors
    Lexical
    Syntactic
    Semantic
    Logical
  Error handling goals
    Report errors clearly and accurately
    Recover quickly so that subsequent errors can be detected
    Fast: do not significantly slow down the processing of correct programs
Error Handling Strategies
  Don’t quit after detecting the 1st error
  Avoid introducing “spurious” errors
  Inhibit error messages that stem from errors uncovered too close together
  Simple error repair will be sufficient due to the increasing emphasis on interactive computing and good programming environments
Error Recovery Strategies
  Panic mode
    Delete input tokens until one of a designated set of synchronizing tokens is found
  Phrase level
    Local correction to repair punctuation errors
  Error productions
    Augment the grammar with error productions
  Global correction
    Globally least-cost correction to a string; costly to implement
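Panic mode is the simplest of these strategies to sketch. The toy language below is an assumption made purely for illustration: every statement has the form name = number ; and the semicolon is the only synchronizing token. On an error the parser records a message, discards tokens up to and including the next semicolon, and carries on.

```python
def parse_program(tokens, sync_tokens=(";",)):
    """Parse statements of the toy form  name = number ;  with panic-mode recovery."""
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        window = tokens[pos:pos + 4]
        if (len(window) == 4 and window[0].isidentifier() and window[1] == "="
                and window[2].isdigit() and window[3] == ";"):
            statements.append((window[0], int(window[2])))
            pos += 4
            continue
        errors.append(f"syntax error at token {pos}: {tokens[pos]!r}")
        # Panic mode: discard input until a synchronizing token is found.
        while pos < len(tokens) and tokens[pos] not in sync_tokens:
            pos += 1
        pos += 1                       # also consume the synchronizing token
    return statements, errors

TOKENS = "x = 1 ; y = = 2 ; z = 3 ;".split()
STATEMENTS, ERRORS = parse_program(TOKENS)
print(STATEMENTS)
print(ERRORS)
```

Only one error is reported for the malformed middle statement, and the statement after it is still parsed, which matches the goals of not quitting after the first error and not emitting spurious follow-on messages.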