Context free languages 1. Equivalence of context free grammars 2. Normal forms.
Context-Free Analysis - softech.cs.uni-kl.de · Context-Free Syntax Analysis (2) Remarks: •...
Transcript of Context-Free Analysis - softech.cs.uni-kl.de · Context-Free Syntax Analysis (2) Remarks: •...
Context-Free AnalysisLecture Compilers SS 2009
Dr.-Ing. Ina Schaefer
Software Technology GroupTU Kaiserslautern
Ina Schaefer Context-Free Analysis 1
Content of Lecture
1. Introduction: Overview and Motivation2. Syntax- and Type Analysis
2.1 Lexical Analysis2.2 Context-free Syntax Analysis2.3 Context-sensitive Syntax Analysis
3. Translation to Target Language3.1 Translation of Imperative Language Constructs3.2 Translation of Object-Oriented Language Constructs
4. Selected Aspects of Compilers4.1 Intermediate Languages4.2 Optimization4.3 Command Selection4.4 Register Allocation4.5 Code Generation
5. Garbage Collection6. XML Processing (DOM, SAX, XSLT)
Ina Schaefer Context-Free Analysis 2
Outline of Context-Free Analysis
1. Specification of Parsers
2. Implementation of ParsersTop-Down Syntax Analysis
Recursive DescentLL(k) Parsing TheoryLL Parser Generation
Bottom-Up Syntax AnalysisPrinciples of LR ParsingLR Parsing TheorySLR, LALR, LR(k) ParsingLALR-Parser Generation
3. Error Handling
4. Concrete and Abstract Syntax
Ina Schaefer Context-Free Analysis 3
Context-Free Syntax Analysis
Tasks• Check, if Token Stream (from Scanner) matches context-free
syntax of languages! Error Case: Error handling! Correctness: Construction of Syntax Tree
Parser
Token Stream
Abstract / Concrete Syntax Tree
Ina Schaefer Context-Free Analysis 4
Context-Free Syntax Analysis (2)
Remarks:• Parsing can be interleaved with other actions processing the
program (e.g. attributation).• Syntax tree controls important parts of translation. Hence, we
distinguish! Concrete syntax tree corresponds to context-free grammar! Abstract syntax tree aims at further processing steps, compact
representation of essential information.
Ina Schaefer Context-Free Analysis 5
Specification of Parsers
Specification of Parsers
2 general specification techniques• Syntax Diagrams• Context-Free Grammars (often in extended form)
Ina Schaefer Context-Free Analysis 6
Specification of Parsers
Context-Free Grammars
DefinitionLet N and T be two alphabets, with N ! T = " and ! a finite subset ofN # (N $ T )! and S % N. Then " = (N, T ,!, S) is a context-freegrammar (CFG) where
• N is the set of non-terminals• T is the set of terminals• ! is the set of productions rules• S is the start symbol (axiom)
Ina Schaefer Context-Free Analysis 7
Specification of Parsers
Context-Free Grammars (2)
Notations:• A, B, C, . . . denote non-terminals• a, b, c, . . . denote terminals• x , y , z, . . . denote strings of terminals, i.e. x % T !
• !,", #, $,%,&, ' are strings of terminals and non-terminals, i.e.! % (N $ T )!
Productions are denoted by A & !.
The notation A & !|"|#| . . . is an abbreviation forA & !, A & ", A & #, . . .
Ina Schaefer Context-Free Analysis 8
Specification of Parsers
Derivation
Let " =( N, T ,!, S) be a CFG:• $ is directly derivable from % in ", % produces $ directly, % ' $, if
there exists &A' = % and &!' = $ and A & ! % !
• $ is derivable from % in ",% '! $, if there exists %0, . . . ,%n with% = %0 and $ = %n and for all i % {0, . . . , n ( 1} it holds that%i ' %i+1. %0, . . . ,%n is the derivation of $ from %.
• '! is the reflexive, transitive closure of '.
Ina Schaefer Context-Free Analysis 9
Specification of Parsers
Derivation (2)
• The derivation %0, . . . ,%n is a left derivation (right derivation), if in%i the left-most (right-most) non-terminal is replaced. Leftderivation steps are denoted by % 'lm $. Right derivation stepsare denoted by % 'rm $.
• The tree-like representation of a derivation is a syntax tree.
• L(") = {z % T ! |S '! z} is the language generated by ".
• x % L(") is a sentence of ".
• % % (N $ T )! with S '! % is a sentential form of ".
Ina Schaefer Context-Free Analysis 10
Specification of Parsers
Ambiguity in Grammars
• A sentence is unambiguous if it has exactly one syntax tree. Asentence is ambiguous if it has more than one syntax tree.
• For each syntax tree, there exists exactly on left derivation andexactly one right derivation.
• Thus it holds: A sentence is unambiguous if-and-only-if it hasexactly one left (right) derivation.
• A grammar is ambiguous if it contains an ambiguous sentence,else it is unambiguous.
• For programming languages, unambiguous grammars areessential, as the semantics and the translation are defined by thesyntactic structure.
Ina Schaefer Context-Free Analysis 11
Specification of Parsers
Ambiguity in Grammars (2)
Example 1: Grammar for Expressions "0
• S & E• E & E + E• E & E ) E• E & (E)
• E & ID
Consider the input string (av + av) ) bv + cv + dv
which is the following input to the context-free analysis(ID + ID) ) ID + ID + ID
Ina Schaefer Context-Free Analysis 12
Specification of Parsers
Ambiguity in Grammars (3)Syntax tree for (ID + ID) ) ID + ID + ID
54© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
Beispiele: (Mehrdeutigkeit)
1. Beispiel einer Ausdrucksgrammatik:
!0: S E, E E + E, E E * E, E ( E ), E ID
Betrachte die Eingabe: (av+av) * bv + cv +dv)
Eingabe zur kf-Analyse: ( ID + ID ) * ID + ID + ID
S
"
" "
" "
E E E E E
( ID + ID ) * ID + ID + ID
- Syntaxbaum entspricht nicht den üblichen
Rechenregeln.
- Es gibt mehrere Syntaxbäume gemäß !0,
insbesondere ist die Grammatik mehrdeutig.
• Syntax tree does not match conventional rules of arithmetic.• There are several syntax trees according to "0 for this input,
hence "0 is ambiguous.
Ina Schaefer Context-Free Analysis 13
Specification of Parsers
Ambiguity in Grammars (4)Example 2: Ambiguity in if-then-else construct
if B1 then if B2 then A:= 9 else A:= 7
First Derivation
55© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
2. Mehrdeutigkeit beim if-then-else-Konstrukt:
if B1 then if B2 then A:=8 else A:= 7
IFTHENELSE
ANW
IFTHEN
ANW ANW
ZW ZW
IF ID THEN IF ID THEN ID EQ CO ELSE ID EQ CO
ZW ZW
ANW ANW
IFTHENELSE
ANW
IFTHEN
Ina Schaefer Context-Free Analysis 14
Specification of Parsers
Ambiguity in Grammars (5)
Second Derivation
55© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
2. Mehrdeutigkeit beim if-then-else-Konstrukt:
if B1 then if B2 then A:=8 else A:= 7
IFTHENELSE
ANW
IFTHEN
ANW ANW
ZW ZW
IF ID THEN IF ID THEN ID EQ CO ELSE ID EQ CO
ZW ZW
ANW ANW
IFTHENELSE
ANW
IFTHEN
Ina Schaefer Context-Free Analysis 15
Specification of Parsers
Ambiguity in Grammars (6)
Remarks:• Each derivation corresponds to exactly one syntax tree. In
reverse, for each syntax tree, there can be several derivations.• Instead of the term "syntax tree" often also the terms "structure
tree" or "derivation tree" are used.• For each language, there can be several generating grammars, i.e.
the mapping L: Grammar & Language is in general not injective.
Ina Schaefer Context-Free Analysis 16
Specification of Parsers
Ambiguity as Grammar Property
Ambiguity is a grammar property. The grammar for expressions "0 isan example of an ambiguous grammar.
"0:• S & E• E & E + E• E & E ) E• E & (E)
• E & ID
57© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
Beispiel: (Mehrdeutigkeit als Grammatikeig.)
Die obige Ausdrucksgrammatik
!0: S E, E E + E | E * E | E ( E ) | E ID
ist ein Beispiel für eine mehrdeutige Grammatik:
S
E
E E
E E
ID + ID * ID
E E
E E
E
S
Mehrdeutigkeit ist zunächst einmal eine Grammatik-
eigenschaft.
Ina Schaefer Context-Free Analysis 17
Specification of Parsers
Ambiguity as Grammar Property (2)
But there exists an unambiguous grammar for the same language:
"1:• S & E• E & T + E |T• T & F ) T |F• F & (E) |ID
58© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
Aber es gibt eine eindeutige Grammatik für die Sprache:
!1: S E, E T + E | T, T F * T | F, F ( E ) | ID
S
E
E
E
E
F
F
F
F
T
T
T
T
( ID + ID ) * ID + ID
F
T
Lesen Sie zu Abschnitt 2.2.1:
Wilhelm, Maurer:
• aus Kap. 8, Syntaktische Analyse, die S. 271 - 283
Appel:
• aus Chap. 3, S. 40 - 47
(Es gibt aber auch kontextfreie Sprachen, die nur durch
mehrdeutige Grammatiken beschrieben werden.)Ina Schaefer Context-Free Analysis 18
Specification of Parsers
Literature
Recommended Reading:• Wilhelm, Maurer: Chapter 8, pp. 271 - 283 (Syntactic Analysis)• Appel: Chapter 3, pp. 40-47
Ina Schaefer Context-Free Analysis 19
Implementation of Parsers
Implementation of Parsers
Overview• Top-Down Parsing
! Recursive Descent! LL-Parsing! LL-Parser Generation
• Bottom-Up Parsing! LR-Parsing! LALR, SLR, LR(k)-Parsing! LALR-Parser Generation
Ina Schaefer Context-Free Analysis 20
Implementation of Parsers
Methods for Context-Free Analysis
• Manually developed, grammar-specific implementation(error-prone, inflexible)
• Backtracking (simple, but inefficient)• Cocke-Younger-Kasami-Algorithm (1967):
! for all CFGs in Chomsky Normalform! based on idea of dynamic programming! Time Complexity O(n3), however linear complexity desired
• Top-Down-Methods: from Axiom to Token stream• Bottom-up-Methods: from Token stream to Axiom
Ina Schaefer Context-Free Analysis 21
Implementation of Parsers
Example: Top-Down AnalysisAccording to "1: Result is a left derivation.
60© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
Beispiel: (Top-down-Analyse)
S
E =>
T + E =>
F * T + E =>
( E ) * T + E =>
( T + E ) * T + E =>
( F + E ) * T + E =>
( ID + E ) * T + E =>
( ID + T ) * T + E =>
( ID + F ) * T + E =>
( ID + ID ) * T + E =>
( ID + ID ) * F + E =>
( ID + ID ) * ID + E =>
( ID + ID ) * ID + T =>
( ID + ID ) * ID + F =>
( ID + ID ) * ID + ID
Ergebnis der td-Analyse ist eine Linksableitung.
Gemäß !1 :
Ina Schaefer Context-Free Analysis 22
Implementation of Parsers
Example: Bottom-Up AnalysisAccording to "1: Result is a right derivation.
61© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
Beispiel: (Bottom-up-Analyse)
( ID + ID ) * ID + ID <=
( F + ID ) * ID + ID <=
( T + ID ) * ID + ID <=
( T + F ) * ID + ID <=
( T + T ) * ID + ID <=
( T + E ) * ID + ID <=
( E ) * ID + ID <=
F * ID + ID <=
F * F + ID <=
F * T + ID <=
T + ID <=
T + F <=
T + T <=
T + E <=
E <=
S <=
Ergebnis der bu-Analyse ist eine Rechtsableitung.
Gemäß !1 :
Ina Schaefer Context-Free Analysis 23
Implementation of Parsers
Context-free Analysis with linear complexity
• Restrictions on Grammar (not every CFG has a linear parser)• Use of push-down automata or systems of recursive procedures• Usage of look ahead to remaining input in order to select next
production rule to be applied
Ina Schaefer Context-Free Analysis 24
Implementation of Parsers
Syntax Analysis Methods and Parser Generators
• Basic Knowledge of Syntax Analysis is essential for use of ParserGenerators.
• Parser generators are not always applicable.• Often, error handling has to be done manually.• Methods underlying parser generation is a good example for a
generic technique (and a highlight of computer science!).
Ina Schaefer Context-Free Analysis 25
Implementation of Parsers Top-Down Syntax Analysis
Top-Down Syntax Analysis
Educational Objectives• General Principle of Top-Down Syntax Analysis• Recursive Descent Parsing (at an Example)• Expressiveness of Top-Down Parsing• Basic Concepts of LL(k) Parsing
Ina Schaefer Context-Free Analysis 26
Implementation of Parsers Top-Down Syntax Analysis
Recusive Descent
Basic Idea
• Each non-terminal A is associated with a procedure. Thisprocedure accepts a partial sentence derived from A.
• The procedure implements a finite automaton constructed fromthe productions starting from A
• Recursion of grammar is mapped to mutual recursive proceduressuch that stack of higher programing languages can be used forimplementation.
Ina Schaefer Context-Free Analysis 27
Implementation of Parsers Top-Down Syntax Analysis
Construction of Recursive Descent Parser
Let ""1 be a CFG (like "1) with a terminal # denoting the end of theinput.
""1:• S & E#
• E & T + E |T• T & F ) T |F• F & (E) |ID
Construct item automaton for each non-terminal A. The itemautomaton describes the analysis of the productions with left side A.
Ina Schaefer Context-Free Analysis 28
Implementation of Parsers Top-Down Syntax Analysis
Item Automata
S & E#
64© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
Konstruktion eines Parsers mit der Methode
des rekursiven Abstiegs (exemplarisch):
Sei !‘ wie !1, aber mit Randzeichen #, d.h.
S E #, E T + E | T, T F * T | F, F ( E ) | ID
Konstruiere für jedes Nichtterminal A den sogenannten
Item-Automaten. Er beschreibt die Analyse derjenigen
Produktionen, deren linke Seite A ist:
1
[S .E #] [S E.# ] [S E#.]
[E .T+E]
[E .T ]
[T .F*T]
[ T .F ]
[F .(E)]
[F .ID ]
[E T+.E] [E T+E.][E T.+E]
[E T.]
[ T F.][T F.*T]
[T F*.T] [T F*T.]
[F ID.]
[F (.E)] [F (E.)] [F (E).]
E #
T + E
F*
T
(
ID
E )
E & T + E |T
64© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
Konstruktion eines Parsers mit der Methode
des rekursiven Abstiegs (exemplarisch):
Sei !‘ wie !1, aber mit Randzeichen #, d.h.
S E #, E T + E | T, T F * T | F, F ( E ) | ID
Konstruiere für jedes Nichtterminal A den sogenannten
Item-Automaten. Er beschreibt die Analyse derjenigen
Produktionen, deren linke Seite A ist:
1
[S .E #] [S E.# ] [S E#.]
[E .T+E]
[E .T ]
[T .F*T]
[ T .F ]
[F .(E)]
[F .ID ]
[E T+.E] [E T+E.][E T.+E]
[E T.]
[ T F.][T F.*T]
[T F*.T] [T F*T.]
[F ID.]
[F (.E)] [F (E.)] [F (E).]
E #
T + E
F*
T
(
ID
E )
T & F ) T |F
64© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
Konstruktion eines Parsers mit der Methode
des rekursiven Abstiegs (exemplarisch):
Sei !‘ wie !1, aber mit Randzeichen #, d.h.
S E #, E T + E | T, T F * T | F, F ( E ) | ID
Konstruiere für jedes Nichtterminal A den sogenannten
Item-Automaten. Er beschreibt die Analyse derjenigen
Produktionen, deren linke Seite A ist:
1
[S .E #] [S E.# ] [S E#.]
[E .T+E]
[E .T ]
[T .F*T]
[ T .F ]
[F .(E)]
[F .ID ]
[E T+.E] [E T+E.][E T.+E]
[E T.]
[ T F.][T F.*T]
[T F*.T] [T F*T.]
[F ID.]
[F (.E)] [F (E.)] [F (E).]
E #
T + E
F*
T
(
ID
E )Ina Schaefer Context-Free Analysis 29
Implementation of Parsers Top-Down Syntax Analysis
Item Automata (2)
F & (E) |ID
64© A. Poetzsch-Heffter, TU Kaiserslautern25.04.2007
Konstruktion eines Parsers mit der Methode
des rekursiven Abstiegs (exemplarisch):
Sei !‘ wie !1, aber mit Randzeichen #, d.h.
S E #, E T + E | T, T F * T | F, F ( E ) | ID
Konstruiere für jedes Nichtterminal A den sogenannten
Item-Automaten. Er beschreibt die Analyse derjenigen
Produktionen, deren linke Seite A ist:
1
[S .E #] [S E.# ] [S E#.]
[E .T+E]
[E .T ]
[T .F*T]
[ T .F ]
[F .(E)]
[F .ID ]
[E T+.E] [E T+E.][E T.+E]
[E T.]
[ T F.][T F.*T]
[T F*.T] [T F*T.]
[F ID.]
[F (.E)] [F (E.)] [F (E).]
E #
T + E
F*
T
(
ID
E )
Ina Schaefer Context-Free Analysis 30
Implementation of Parsers Top-Down Syntax Analysis
Recursive Descent Parsing Procedures
• Item Automata can be mapped to recursive procedures.• The input is a token stream terminated by #.• The variable currSymbol contains one token look ahead,
i.e. the first symbol of the stream.
Ina Schaefer Context-Free Analysis 31
Implementation of Parsers Top-Down Syntax Analysis
Recursive Descent Parsing Procedures (2)
Production: S & E#
void S() {E ();if (currSymbol == ’#’){
accept();} else {
error();}
}
Ina Schaefer Context-Free Analysis 32
Implementation of Parsers Top-Down Syntax Analysis
Recursive Descent Parsing Procedures (3)Production: E & T + E |Tvoid E() {
T();if (currSymbol == ’+’){
readSymbol();E();}
}
Production: T & F ) T |Fvoid T() {
F();if (currSymbol == ’*’){
readSymbol();T();
}}
Ina Schaefer Context-Free Analysis 33
Implementation of Parsers Top-Down Syntax Analysis
Recursive Descent Parsing Procedures (4)
Production: F & (E) |IDvoid F() {
if (currSymbol == ’(’){readSymbol();E();if (currSymbol == ’)’){readSymbol();}else error();}else if (currSymbol == ID ){
readSymbol();}else error();
}
Ina Schaefer Context-Free Analysis 34
Implementation of Parsers Top-Down Syntax Analysis
Recursive Descent Parsing Procedures (5)
Remarks:• Recursive Descent
! is relatively easy to implement! can easily be used with other tasks (see following example)! is a typical example for syntax-directed methods (see also following
example)• Example uses one token look ahead.
Ina Schaefer Context-Free Analysis 35
Implementation of Parsers Top-Down Syntax Analysis
Recursive Descent and Evaluation
Example: Interpreter for Expressions using recursive descent
int env(Ident); // ID -> int// local variable int_result stores intermediate results
int S() {int int_result : = E();if (currSymbol == ’#’) {return int_result;}
else {error();return error_result;
}}
Ina Schaefer Context-Free Analysis 36
Implementation of Parsers Top-Down Syntax Analysis
Recursive Descent and Evaluation (2)
int E() {int int_result := T();if (currSymbol == ’+’){
readSymbol();return int_result + E();
}}
int T() {int int_result := F();if (currSymbol == ’*’){
readSymbol();return int_result * T();
}}
Ina Schaefer Context-Free Analysis 37
Implementation of Parsers Top-Down Syntax Analysis
Recursive Descent and Evaluation (3)
int F() {int int_result;if (currSymbol == ’(’){
readSymbol();int_result := E();if (currSymbol == ’)’){
readSymbol();return int_result;
}else { error();
return error_result; }}else if (currSymbol == ID) {
readSymbol();return env(code(ID));}
else { error();return error_result; }
}Ina Schaefer Context-Free Analysis 38
Implementation of Parsers Top-Down Syntax Analysis
Recursive Descent and Evaluation (4)
• Extension of Parser with Actions/Computations can easily beimplemented, but mixing of conceptually different tasks andcauses programs hard to maintain.
• For which grammars does the recursive descent technique work?& LL(k) Parsing Theory
Ina Schaefer Context-Free Analysis 39
Implementation of Parsers Top-Down Syntax Analysis
LL Parsing
• Basis for Town-Down Syntax Analysis• First L: Read Input from Left to Right• Second L: Search for Left Derivations
Ina Schaefer Context-Free Analysis 40
Implementation of Parsers Top-Down Syntax Analysis
LL(k) Grammars
Definition (LL(k) Grammar)Let " =( N, T ,!, S) be a CFG and k % N." is an LL(k) grammar, if for any two left derivations
S '!lm uA! 'lm u"! '!
lm ux
and
S '!lm uA! 'lm u#! '!
lm uy
it holds that if prefix(k , x) = prefix(k , y) then " = #.
Ina Schaefer Context-Free Analysis 41
Implementation of Parsers Top-Down Syntax Analysis
LL(k) Grammars (2)
Remarks:• A grammar is an LL(k) grammar if for a left derivation with k token
look ahead the correct production for the next derivation step canbe found.
• A Language L * #! is LL(k) if there exists LL(k) grammar " withL(") = L.
• The definition of LL(k) grammars provides no method to test if agrammar has the LL(k) property.
Ina Schaefer Context-Free Analysis 42
Implementation of Parsers Top-Down Syntax Analysis
Non LL(k) Grammars
Example 1: Grammar with Left Recursion "2:• S & E#
• E & E + T |T• T & T ) F |F• F & (E) |ID
Elimination of Left Recursion:Replace productions of form A & A!|" where " does not start with Aby A & "A" and A" & !A"|(.
Ina Schaefer Context-Free Analysis 43
Implementation of Parsers Top-Down Syntax Analysis
Non LL(k) Grammars (2)
Elimination of Left Recursion: From "2 we obtain "3.
"2:• S & E#
• E & E + T |T• T & T ) F |F• F & (E) |ID
"3
• S & E#
• E & TE "
• E " & +TE "|(• T & FT "
• T " & )FT |(• F & (E) |ID
Ina Schaefer Context-Free Analysis 44
Implementation of Parsers Top-Down Syntax Analysis
Non LL(k) Grammars (3)
Example 2: Grammar "4 with unlimited look ahead
• STM & VAR := VAR|ID(IDLIST )
• VAR & ID|ID(IDLIST )
• IDLIST & ID|ID, IDLIST
"4 is not an LL(k) grammar for any k.(Proof: cf. Wilhelm, Maurer, Example 8.3.4, p. 319)
Transformation to LL(2) grammar ""4:• STM & ASS_CALL|ID := VAR• ASS_CALL & ID(IDLIST )ASS_CALL_REST• ASS_CALL_REST &:= VAR|(
Ina Schaefer Context-Free Analysis 45
Implementation of Parsers Top-Down Syntax Analysis
Non LL(k) Grammars (4)
Remark:
The transformed grammars accept the same language, but provideother syntax trees.
From a theoretical point of view, this is acceptable.
From a programming language implementation perspective, this is ingeneral not acceptable.
There are languages L for which no LL(k) grammar " exists thatgenerates the language, i.e. L(") = L. (Example: Grammar "5)
Ina Schaefer Context-Free Analysis 46
Implementation of Parsers Top-Down Syntax Analysis
Non LL(k) Grammars (5)
Example 3: For L("5), there exists no LL(k) grammar.
• S & A|B• A & aAb|0• B & aBbb|1
We show that there is no ksuch that "5 is an LL(k)grammar.
Proof.Let k be arbitrary but fixed. Choose two derivations according to theLL(k) definition and show that desipite of equal prefixes of length k theresuts from " and # are not equal:
S '!lm S ' Alm '!
lm ak0bk
S '!lm S 'lm B '!
lm ak1b2k
Then: prefix(k , ak0bk ) = prefix(k , ak1b2k ) = ak , but " = A += B = #.
Ina Schaefer Context-Free Analysis 47
Implementation of Parsers Top-Down Syntax Analysis
FIRST and FOLLOW Sets
DefinitionLet " =( N, T ,!, S) be a CFG, k % N. T#k = {u % T ! | length(u) , k}denotes all prefixes of length at least k .
We define:• FIRSTk : (N $ T )! & P(T#k )
with FIRSTk (!) = {prefix(k , u) |! '! u}where prefix(n, u) = u for all u with length(U) , n.
• FOLLOWk : (N $ T )! & P(T#k )with FOLLOWk (!) = {w |S '! "!# - w % FIRSTk (#)}
Ina Schaefer Context-Free Analysis 48
Implementation of Parsers Top-Down Syntax Analysis
FIRST and FOLLOW Sets in Parse Trees
X
S
FIRSTk(X) FOLLOWk(X)
Ina Schaefer Context-Free Analysis 49
Implementation of Parsers Top-Down Syntax Analysis
Characterization of LL(1) Grammars
Definition (Reduced CFG)A CFG " = (N, T ,!, S) is reduced if each non-terminal occurs in aderivation and each non-terminal derives at least one word.
LemmaA reduced CFG is LL(1) if-and-only-if for any two productions A & "and A & # it holds that
FIRST1(").1 FOLLOW1(A) ! FIRST1(#).1 FOLLOW1(A) = "
where L1 .1 L2 = {prefix(1, vw) | v % L1, w % L2}
Remark: FIRST and FOLLOW sets are computable, such this criterioncan be checked automatically.
Ina Schaefer Context-Free Analysis 50
Implementation of Parsers Top-Down Syntax Analysis
Examples: FIRSTk and FOLLOWk
Check that modified expression grammar "3 is LL(1).• S & E#
• E & TE "
• E " & +TE "|(• T & FT "
• T " & )FT |(• F & (E) |ID
Compute FIRST1 and FOLLOW1 for each non-terminal.
Ina Schaefer Context-Free Analysis 51
Implementation of Parsers Top-Down Syntax Analysis
Examples: FIRSTk and FOLLOWk (2)
• F & (E) |ID:FI1((E)).1 FOLLOW1(F ) ! FIRST1(ID). FOLLOW1(F )
= {(}.1 FOLLOW1(F ) ! {ID}. FOLLOW1(F )= "
• E " & +TE "|(:FIRST1(+TE ").1 FOLLOW1(E ") ! FIRST1((). FOLLOW1(E ")
= {+}.1 FOLLOW1(E ") ! {(}. FOLLOW1(E ")= {+} !{ #, )}= "
• T " & )FT |( :FIRST1()FT ").1 FOLLOW1(T ") ! FIRST1((). FOLLOW1(T ")
= {)}.1 FOLLOW1(T ") ! {(}. FOLLOW1(T ")= {)} ! {+,#, )}= "
Ina Schaefer Context-Free Analysis 52
Implementation of Parsers Top-Down Syntax Analysis
Proof of LL Characterization Lemma
• Left-To-Right Direction: " is LL(1) implies FIRST and FOLLOWcharacterization.
Proof by Contradiction:Suppose two productions A & " and A & # with " += # and $ =FIRST1(").1 FOLLOW1(A) ! FIRST1(#).1 FOLLOW1(A) += ".
Then there exists z % $ with length(z) = 1.
As " is reduced, there are two derivations:
S '! $A! ' $"! '! $zxS '! $A! ' $#! '! $zy
Ina Schaefer Context-Free Analysis 53
Implementation of Parsers Top-Down Syntax Analysis
Proof of LL Characterization Lemma (2)
Thus, we can construct the following left derivations :
S '!lm uA! 'lm u"! '!
lm uzxS '!
lm uA! 'lm u#! '!lm uzy
with prefix(1, zx) = z = prefix(1, zy) which contradicts the LL(1)property of ".
Ina Schaefer Context-Free Analysis 54
Implementation of Parsers Top-Down Syntax Analysis
Proof of LL Characterization Lemma (3)
• Right-To-Left Direction: FIRST and FOLLOW characterizationimplies " is LL(1) .
Proof by Contradiction:Suppose " is not LL(1). Then there are two different derivationswith length(z) = 1:
S '! $A! ' $"! '! $zxS '! $A! ' $#! '! $zy
But z % FIRST1(").1 FIRST1(#) which is a contradiction.
Ina Schaefer Context-Free Analysis 55
Implementation of Parsers Top-Down Syntax Analysis
Parser Generation for LL(k) Languages
LL(k) Parser Generator
Grammar
Table for Push-Down Automaton/Parser Program
Error: Grammar is not LL(k)
Ina Schaefer Context-Free Analysis 56
Implementation of Parsers Top-Down Syntax Analysis
Parser Generation for LL(k) Languages (2)
Remarks:• Use of push-down automata with look ahead• Select Production from Tables• Advantages over bottom-up techniques in error analysis and error
handling
Example System: ANTLR (http://www.antlr.org/)
Recommended Reading for Top-Down Analysis:• Wilhelm, Maurer: Chapter 8, Sections 8.3.1. to Sections 8.3.4, pp.
312 - 329
Ina Schaefer Context-Free Analysis 57