Context-Free Analysis

Lecture Compilers SS 2009

Dr.-Ing. Ina Schaefer

Software Technology Group
TU Kaiserslautern

Ina Schaefer Context-Free Analysis 1

Content of Lecture

1. Introduction: Overview and Motivation
2. Syntax- and Type Analysis
   2.1 Lexical Analysis
   2.2 Context-free Syntax Analysis
   2.3 Context-sensitive Syntax Analysis
3. Translation to Target Language
   3.1 Translation of Imperative Language Constructs
   3.2 Translation of Object-Oriented Language Constructs
4. Selected Aspects of Compilers
   4.1 Intermediate Languages
   4.2 Optimization
   4.3 Command Selection
   4.4 Register Allocation
   4.5 Code Generation
5. Garbage Collection
6. XML Processing (DOM, SAX, XSLT)

Outline of Context-Free Analysis

1. Specification of Parsers

2. Implementation of Parsers
   Top-Down Syntax Analysis
   - Recursive Descent
   - LL(k) Parsing Theory
   - LL Parser Generation
   Bottom-Up Syntax Analysis
   - Principles of LR Parsing
   - LR Parsing Theory
   - SLR, LALR, LR(k) Parsing
   - LALR-Parser Generation

3. Error Handling

4. Concrete and Abstract Syntax

Context-Free Syntax Analysis

Tasks
• Check whether the token stream (from the scanner) matches the context-free syntax of the language.
  - Error case: error handling
  - Correct case: construction of the syntax tree

[Figure: Token Stream → Parser → Abstract / Concrete Syntax Tree]

Context-Free Syntax Analysis (2)

Remarks:
• Parsing can be interleaved with other actions that process the program (e.g. attribute evaluation).
• The syntax tree controls important parts of the translation. Hence, we distinguish:
  - The concrete syntax tree corresponds to the context-free grammar.
  - The abstract syntax tree is geared to the further processing steps and is a compact representation of the essential information.

Specification of Parsers

Two general specification techniques:
• Syntax diagrams
• Context-free grammars (often in extended form)

Context-Free Grammars

Definition
Let $N$ and $T$ be two alphabets with $N \cap T = \emptyset$, let $\Pi$ be a finite subset of $N \times (N \cup T)^*$, and let $S \in N$. Then $\Gamma = (N, T, \Pi, S)$ is a context-free grammar (CFG), where

• $N$ is the set of non-terminals
• $T$ is the set of terminals
• $\Pi$ is the set of production rules
• $S$ is the start symbol (axiom)

Context-Free Grammars (2)

Notation:
• $A, B, C, \ldots$ denote non-terminals
• $a, b, c, \ldots$ denote terminals
• $x, y, z, \ldots$ denote strings of terminals, i.e. $x \in T^*$
• $\alpha, \beta, \gamma, \psi, \varphi, \sigma, \tau$ denote strings of terminals and non-terminals, i.e. $\alpha \in (N \cup T)^*$

Productions are denoted by $A \to \alpha$.

The notation $A \to \alpha \mid \beta \mid \gamma \mid \ldots$ is an abbreviation for $A \to \alpha$, $A \to \beta$, $A \to \gamma$, $\ldots$

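Derivation

Let $\Gamma = (N, T, \Pi, S)$ be a CFG:
• $\psi$ is directly derivable from $\varphi$ in $\Gamma$ ($\varphi$ produces $\psi$ directly), written $\varphi \Rightarrow \psi$, if there exist $\sigma, \tau$ with $\sigma A \tau = \varphi$, $\sigma \alpha \tau = \psi$ and $A \to \alpha \in \Pi$.
• $\psi$ is derivable from $\varphi$ in $\Gamma$, written $\varphi \Rightarrow^* \psi$, if there exist $\varphi_0, \ldots, \varphi_n$ with $\varphi = \varphi_0$ and $\psi = \varphi_n$ such that $\varphi_i \Rightarrow \varphi_{i+1}$ holds for all $i \in \{0, \ldots, n-1\}$. The sequence $\varphi_0, \ldots, \varphi_n$ is a derivation of $\psi$ from $\varphi$.
• $\Rightarrow^*$ is the reflexive, transitive closure of $\Rightarrow$.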

Derivation (2)

• The derivation $\varphi_0, \ldots, \varphi_n$ is a left derivation (right derivation) if in each $\varphi_i$ the left-most (right-most) non-terminal is replaced. Left derivation steps are denoted by $\varphi \Rightarrow_{lm} \psi$, right derivation steps by $\varphi \Rightarrow_{rm} \psi$.
• The tree-like representation of a derivation is a syntax tree.
• $L(\Gamma) = \{ z \in T^* \mid S \Rightarrow^* z \}$ is the language generated by $\Gamma$.
• $x \in L(\Gamma)$ is a sentence of $\Gamma$.
• $\varphi \in (N \cup T)^*$ with $S \Rightarrow^* \varphi$ is a sentential form of $\Gamma$.
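The notions of production, sentential form and left derivation step can be made concrete with a small sketch. The toy grammar and the names `PRODUCTIONS`, `leftmost_step` and `is_sentence` are assumptions made for illustration only; they do not come from the lecture.

```python
# Hypothetical toy grammar: N = {"S"}, T = {"(", ")", "ID"}, start symbol "S".
# Productions: S -> ( S ) | ID
PRODUCTIONS = {"S": [["(", "S", ")"], ["ID"]]}


def leftmost_step(form, rhs):
    """Apply one left derivation step: replace the left-most non-terminal by rhs."""
    for i, symbol in enumerate(form):
        if symbol in PRODUCTIONS:
            if rhs not in PRODUCTIONS[symbol]:
                raise ValueError(f"{symbol} -> {rhs} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("sentential form contains no non-terminal")


def is_sentence(form):
    """A sentential form is a sentence if it consists of terminals only."""
    return all(symbol not in PRODUCTIONS for symbol in form)


# Left derivation S => ( S ) => ( ( S ) ) => ( ( ID ) )
derivation = [["S"]]
for rhs in (["(", "S", ")"], ["(", "S", ")"], ["ID"]):
    derivation.append(leftmost_step(derivation[-1], rhs))
```

Every element of `derivation` is a sentential form; only the last one is a sentence of the toy grammar.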

Ambiguity in Grammars

• A sentence is unambiguous if it has exactly one syntax tree. A sentence is ambiguous if it has more than one syntax tree.
• For each syntax tree, there exists exactly one left derivation and exactly one right derivation.
• Thus, a sentence is unambiguous if and only if it has exactly one left (right) derivation.
• A grammar is ambiguous if it generates an ambiguous sentence; otherwise it is unambiguous.
• For programming languages, unambiguous grammars are essential, as the semantics and the translation are defined by the syntactic structure.

Ambiguity in Grammars (2)

Example 1: Grammar for expressions $\Gamma_0$

• $S \to E$
• $E \to E + E$
• $E \to E * E$
• $E \to (E)$
• $E \to ID$

Consider the input string (av + av) * bv + cv + dv, which reaches the context-free analysis as the token stream (ID + ID) * ID + ID + ID.

Ambiguity in Grammars (3)

Syntax tree for (ID + ID) * ID + ID + ID

[Figure: one syntax tree for (ID + ID) * ID + ID + ID according to $\Gamma_0$]

• The syntax tree does not match the conventional rules of arithmetic.
• There are several syntax trees for this input according to $\Gamma_0$; hence $\Gamma_0$ is ambiguous.

Ambiguity in Grammars (4)

Example 2: Ambiguity in the if-then-else construct

if B1 then if B2 then A := 9 else A := 7

The else branch can be attached either to the inner or to the outer if, which yields two different syntax trees.

First Derivation

[Figure: first syntax tree for the token stream IF ID THEN IF ID THEN ID EQ CO ELSE ID EQ CO]

Ambiguity in Grammars (5)

Second Derivation

[Figure: second syntax tree for the token stream IF ID THEN IF ID THEN ID EQ CO ELSE ID EQ CO]

Ambiguity in Grammars (6)

Remarks:
• Each derivation corresponds to exactly one syntax tree. Conversely, one syntax tree can correspond to several derivations.
• Instead of the term "syntax tree", the terms "structure tree" or "derivation tree" are also often used.
• For each language, there can be several generating grammars, i.e. the mapping $L$: Grammar $\to$ Language is in general not injective.

Ambiguity as Grammar Property

Ambiguity is a grammar property. The grammar for expressions $\Gamma_0$ is an example of an ambiguous grammar.

$\Gamma_0$:
• $S \to E$
• $E \to E + E$
• $E \to E * E$
• $E \to (E)$
• $E \to ID$

[Figure: two different syntax trees for ID + ID * ID according to $\Gamma_0$]
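The ambiguity of the expression grammar can be checked mechanically by counting the syntax trees of a token sequence. The following is a minimal sketch that hard-codes the productions $E \to E + E \mid E * E \mid (E) \mid ID$; the function name `count_trees` is an assumption for illustration.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count the distinct syntax trees that derive `tokens` from E."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of syntax trees with root E for tokens[i:j].
        total = 0
        if j - i == 1 and tokens[i] == "ID":                          # E -> ID
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":  # E -> ( E )
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                                 # E -> E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

A sentence is ambiguous exactly when the count is greater than one: `count_trees("ID + ID * ID".split())` yields 2, matching the two trees for this input.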

Ambiguity as Grammar Property (2)

But there exists an unambiguous grammar for the same language:

$\Gamma_1$:
• $S \to E$
• $E \to T + E \mid T$
• $T \to F * T \mid F$
• $F \to (E) \mid ID$

[Figure: syntax tree for (ID + ID) * ID + ID according to $\Gamma_1$]

(However, there are also context-free languages that can only be described by ambiguous grammars.)

Literature

Recommended Reading:
• Wilhelm, Maurer: Chapter 8 (Syntactic Analysis), pp. 271-283
• Appel: Chapter 3, pp. 40-47

Implementation of Parsers

Overview
• Top-Down Parsing
  - Recursive Descent
  - LL-Parsing
  - LL-Parser Generation
• Bottom-Up Parsing
  - LR-Parsing
  - LALR, SLR, LR(k)-Parsing
  - LALR-Parser Generation

Methods for Context-Free Analysis

• Manually developed, grammar-specific implementation (error-prone, inflexible)
• Backtracking (simple, but inefficient)
• Cocke-Younger-Kasami algorithm (1967):
  - for all CFGs in Chomsky normal form
  - based on the idea of dynamic programming
  - time complexity $O(n^3)$, whereas linear complexity is desired
• Top-down methods: from the axiom to the token stream
• Bottom-up methods: from the token stream to the axiom
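The dynamic-programming idea behind the Cocke-Younger-Kasami algorithm can be sketched as follows. The grammar encoding (`unit_rules`, `pair_rules`) and the example grammar for the words $a^n b^n$ with $n \ge 1$ are assumptions chosen for illustration, not taken from the lecture.

```python
def cyk(word, start, unit_rules, pair_rules):
    """Decide whether `word` is derivable from `start` for a grammar in Chomsky normal form.

    unit_rules: terminal -> set of non-terminals A with a production A -> terminal
    pair_rules: (B, C)   -> set of non-terminals A with a production A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty word is not handled in this sketch
    # table[i][length] holds the non-terminals that derive word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, terminal in enumerate(word):
        table[i][1] = set(unit_rules.get(terminal, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for b in table[i][split]:
                    for c in table[i + split][length - split]:
                        table[i][length] |= pair_rules.get((b, c), set())
    return start in table[0][n]


# Hypothetical grammar in Chomsky normal form: S -> A B | A X, X -> S B, A -> a, B -> b
UNIT = {"a": {"A"}, "b": {"B"}}
PAIR = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
```

The three nested loops over `length`, `i` and `split` are the source of the cubic running time for a fixed grammar.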

Example: Top-Down Analysis

According to $\Gamma_1$. The result of the top-down analysis is a left derivation.

S =>
E =>
T + E =>
F * T + E =>
( E ) * T + E =>
( T + E ) * T + E =>
( F + E ) * T + E =>
( ID + E ) * T + E =>
( ID + T ) * T + E =>
( ID + F ) * T + E =>
( ID + ID ) * T + E =>
( ID + ID ) * F + E =>
( ID + ID ) * ID + E =>
( ID + ID ) * ID + T =>
( ID + ID ) * ID + F =>
( ID + ID ) * ID + ID

Example: Bottom-Up Analysis

According to $\Gamma_1$. The result of the bottom-up analysis is a right derivation, found in reverse order.

( ID + ID ) * ID + ID <=
( F + ID ) * ID + ID <=
( T + ID ) * ID + ID <=
( T + F ) * ID + ID <=
( T + T ) * ID + ID <=
( T + E ) * ID + ID <=
( E ) * ID + ID <=
F * ID + ID <=
F * F + ID <=
F * T + ID <=
T + ID <=
T + F <=
T + T <=
T + E <=
E <=
S

Context-free Analysis with linear complexity

• Restrictions on the grammar (not every CFG has a linear-time parser)
• Use of push-down automata or systems of recursive procedures
• Use of look ahead into the remaining input in order to select the next production rule to be applied

Syntax Analysis Methods and Parser Generators

• Basic knowledge of syntax analysis is essential for the use of parser generators.
• Parser generators are not always applicable.
• Often, error handling has to be done manually.
• The methods underlying parser generation are a good example of a generic technique (and a highlight of computer science!).

Top-Down Syntax Analysis

Educational Objectives
• General principle of top-down syntax analysis
• Recursive descent parsing (by example)
• Expressiveness of top-down parsing
• Basic concepts of LL(k) parsing

Recursive Descent

Basic Idea

• Each non-terminal A is associated with a procedure. This procedure accepts a partial sentence derived from A.
• The procedure implements a finite automaton constructed from the productions with left side A.
• Recursion in the grammar is mapped to mutually recursive procedures, so that the call stack of a higher-level programming language can be used for the implementation.

Construction of Recursive Descent Parser

Let $\Gamma_1'$ be a CFG (like $\Gamma_1$) with an additional terminal # denoting the end of the input.

$\Gamma_1'$:
• $S \to E\#$
• $E \to T + E \mid T$
• $T \to F * T \mid F$
• $F \to (E) \mid ID$

Construct an item automaton for each non-terminal A. The item automaton describes the analysis of the productions with left side A.

Item Automata

Item automaton for $S \to E\#$:

[S -> .E#] --E--> [S -> E.#] --#--> [S -> E#.]

Item automaton for $E \to T + E \mid T$:

{[E -> .T+E], [E -> .T]} --T--> {[E -> T.+E], [E -> T.]} --+--> {[E -> T+.E]} --E--> {[E -> T+E.]}

Item automaton for $T \to F * T \mid F$:

{[T -> .F*T], [T -> .F]} --F--> {[T -> F.*T], [T -> F.]} --*--> {[T -> F*.T]} --T--> {[T -> F*T.]}

Item Automata (2)

Item automaton for $F \to (E) \mid ID$:

{[F -> .(E)], [F -> .ID]} --(--> {[F -> (.E)]} --E--> {[F -> (E.)]} --)--> {[F -> (E).]}
{[F -> .(E)], [F -> .ID]} --ID--> {[F -> ID.]}

Recursive Descent Parsing Procedures

• Item automata can be mapped to recursive procedures.
• The input is a token stream terminated by #.
• The variable currSymbol contains one token of look ahead, i.e. the first symbol of the remaining stream.

Recursive Descent Parsing Procedures (2)

Production: $S \to E\#$

void S() {
    E();
    if (currSymbol == '#') {
        accept();   // the whole input up to the end marker was derived from E
    } else {
        error();
    }
}

Recursive Descent Parsing Procedures (3)

Production: $E \to T + E \mid T$

void E() {
    T();
    if (currSymbol == '+') {   // look ahead selects E -> T + E
        readSymbol();
        E();
    }
}

Production: $T \to F * T \mid F$

void T() {
    F();
    if (currSymbol == '*') {   // look ahead selects T -> F * T
        readSymbol();
        T();
    }
}

Recursive Descent Parsing Procedures (4)

Production: $F \to (E) \mid ID$

void F() {
    if (currSymbol == '(') {
        readSymbol();
        E();
        if (currSymbol == ')') {
            readSymbol();
        } else {
            error();   // closing parenthesis expected
        }
    } else if (currSymbol == ID) {
        readSymbol();
    } else {
        error();       // '(' or ID expected
    }
}
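For experimentation, the four procedures can be put together into one runnable recognizer. The sketch below follows the same structure; the class name `Recognizer`, the exception `ParseError`, the helper `accepts` and the choice of plain strings as tokens are assumptions for illustration.

```python
class ParseError(Exception):
    """Raised where the procedures of the lecture call error()."""


class Recognizer:
    """Recursive descent recognizer for S -> E#, E -> T+E | T, T -> F*T | F, F -> (E) | ID."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["#"]  # append the end marker
        self.pos = 0

    @property
    def curr_symbol(self):
        return self.tokens[self.pos]        # one token of look ahead

    def read_symbol(self):
        self.pos += 1

    def S(self):
        self.E()
        if self.curr_symbol != "#":
            raise ParseError("end of input expected")

    def E(self):
        self.T()
        if self.curr_symbol == "+":
            self.read_symbol()
            self.E()

    def T(self):
        self.F()
        if self.curr_symbol == "*":
            self.read_symbol()
            self.T()

    def F(self):
        if self.curr_symbol == "(":
            self.read_symbol()
            self.E()
            if self.curr_symbol != ")":
                raise ParseError("')' expected")
            self.read_symbol()
        elif self.curr_symbol == "ID":
            self.read_symbol()
        else:
            raise ParseError("'(' or ID expected")


def accepts(tokens):
    """Return True if the token list (without end marker) is a sentence."""
    try:
        Recognizer(tokens).S()
        return True
    except ParseError:
        return False
```

Using an exception instead of an `error()` procedure is a design choice of this sketch: it stops the analysis at the first error, so no procedure continues on an inconsistent state.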

Recursive Descent Parsing Procedures (5)

Remarks:
• Recursive descent
  - is relatively easy to implement
  - can easily be combined with other tasks (see the following example)
  - is a typical example of syntax-directed methods (see also the following example)
• The example uses one token of look ahead.

Recursive Descent and Evaluation

Example: Interpreter for expressions using recursive descent

int env(Ident);   // environment: maps an identifier to its value (ID -> int)
// the local variable int_result stores intermediate results

int S() {
    int int_result = E();
    if (currSymbol == '#') {
        return int_result;
    } else {
        error();
        return error_result;
    }
}

Recursive Descent and Evaluation (2)

int E() {
    int int_result = T();
    if (currSymbol == '+') {
        readSymbol();
        return int_result + E();
    }
    return int_result;   // production E -> T
}

int T() {
    int int_result = F();
    if (currSymbol == '*') {
        readSymbol();
        return int_result * T();
    }
    return int_result;   // production T -> F
}

Recursive Descent and Evaluation (3)

int F() {
    int int_result;
    if (currSymbol == '(') {
        readSymbol();
        int_result = E();
        if (currSymbol == ')') {
            readSymbol();
            return int_result;
        } else {
            error();
            return error_result;
        }
    } else if (currSymbol == ID) {
        readSymbol();
        return env(code(ID));   // code(ID): the identifier belonging to the ID token
    } else {
        error();
        return error_result;
    }
}
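A runnable version of this interpreter might look as follows. It is a sketch under stated assumptions: tokens are plain strings, every token that is a key of the dictionary `env` counts as an identifier, and errors are reported with Python's built-in `SyntaxError` instead of `error_result`. The function name `evaluate` is hypothetical.

```python
def evaluate(tokens, env):
    """Evaluate an expression over +, *, parentheses and identifiers bound in env."""
    tokens = list(tokens) + ["#"]  # append the end marker
    pos = 0

    def curr():
        return tokens[pos]

    def read():
        nonlocal pos
        pos += 1

    def E():
        result = T()
        if curr() == "+":
            read()
            return result + E()
        return result

    def T():
        result = F()
        if curr() == "*":
            read()
            return result * T()
        return result

    def F():
        if curr() == "(":
            read()
            result = E()
            if curr() != ")":
                raise SyntaxError("')' expected")
            read()
            return result
        if curr() in env:
            value = env[curr()]  # look up the identifier before moving on
            read()
            return value
        raise SyntaxError("'(' or identifier expected")

    result = E()
    if curr() != "#":
        raise SyntaxError("end of input expected")
    return result
```

With `env = {"av": 1, "bv": 2, "cv": 3, "dv": 4}`, the input `( av + av ) * bv + cv + dv` evaluates to 11. Note that the value of the identifier is looked up before the next symbol is read.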

Recursive Descent and Evaluation (4)

• Extending the parser with actions or computations is easy to implement, but it mixes conceptually different tasks and makes programs hard to maintain.
• For which grammars does the recursive descent technique work? → LL(k) parsing theory

LL Parsing

• Basis for top-down syntax analysis
• First L: read the input from left to right
• Second L: search for left derivations

LL(k) Grammars

Definition (LL(k) Grammar)
Let $\Gamma = (N, T, \Pi, S)$ be a CFG and $k \in \mathbb{N}$. $\Gamma$ is an LL(k) grammar if for any two left derivations

$$S \Rightarrow_{lm}^* uA\alpha \Rightarrow_{lm} u\beta\alpha \Rightarrow_{lm}^* ux$$

and

$$S \Rightarrow_{lm}^* uA\alpha \Rightarrow_{lm} u\gamma\alpha \Rightarrow_{lm}^* uy$$

it holds that if $\mathrm{prefix}(k, x) = \mathrm{prefix}(k, y)$ then $\beta = \gamma$.

LL(k) Grammars (2)

Remarks:
• A grammar is an LL(k) grammar if, in a left derivation, k tokens of look ahead suffice to find the correct production for the next derivation step.
• A language $L \subseteq \Sigma^*$ is LL(k) if there exists an LL(k) grammar $\Gamma$ with $L(\Gamma) = L$.
• The definition of LL(k) grammars provides no method to test whether a grammar has the LL(k) property.

Non LL(k) Grammars

Example 1: Grammar with left recursion $\Gamma_2$:
• $S \to E\#$
• $E \to E + T \mid T$
• $T \to T * F \mid F$
• $F \to (E) \mid ID$

Elimination of left recursion:
Replace productions of the form $A \to A\alpha \mid \beta$, where $\beta$ does not start with $A$, by $A \to \beta A'$ and $A' \to \alpha A' \mid \varepsilon$.
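The replacement rule can be applied mechanically. The sketch below handles immediate left recursion only and assumes that the name `A'` is not yet used as a non-terminal; the encoding of productions as tuples and the names `eliminate_left_recursion`, `GAMMA_2` and `GAMMA_3` are assumptions for illustration.

```python
EPSILON = ()  # the empty right-hand side


def eliminate_left_recursion(productions):
    """Remove immediate left recursion.

    productions: dict mapping a non-terminal to a list of right-hand sides,
    each right-hand side being a tuple of symbols.
    """
    result = {}
    for a, alternatives in productions.items():
        # A -> A alpha: collect the alphas; all other alternatives are the betas.
        alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == a]
        betas = [rhs for rhs in alternatives if not rhs or rhs[0] != a]
        if not alphas:
            result[a] = list(alternatives)
            continue
        a_prime = a + "'"
        result[a] = [beta + (a_prime,) for beta in betas]                   # A  -> beta A'
        result[a_prime] = [alpha + (a_prime,) for alpha in alphas] + [EPSILON]  # A' -> alpha A' | eps
    return result


GAMMA_2 = {
    "S": [("E", "#")],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("ID",)],
}
GAMMA_3 = eliminate_left_recursion(GAMMA_2)
```

For the left-recursive expression grammar this produces $E \to TE'$, $E' \to +TE' \mid \varepsilon$, $T \to FT'$ and $T' \to *FT' \mid \varepsilon$, while the productions for $S$ and $F$ stay unchanged.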

Non LL(k) Grammars (2)

Elimination of left recursion: From $\Gamma_2$ we obtain $\Gamma_3$.

$\Gamma_2$:
• $S \to E\#$
• $E \to E + T \mid T$
• $T \to T * F \mid F$
• $F \to (E) \mid ID$

$\Gamma_3$:
• $S \to E\#$
• $E \to TE'$
• $E' \to +TE' \mid \varepsilon$
• $T \to FT'$
• $T' \to *FT' \mid \varepsilon$
• $F \to (E) \mid ID$

Non LL(k) Grammars (3)

Example 2: Grammar $\Gamma_4$ with unlimited look ahead

• $STM \to VAR := VAR \mid ID(IDLIST)$
• $VAR \to ID \mid ID(IDLIST)$
• $IDLIST \to ID \mid ID, IDLIST$

$\Gamma_4$ is not an LL(k) grammar for any k.
(Proof: cf. Wilhelm, Maurer, Example 8.3.4, p. 319)

Transformation to LL(2) grammar $\Gamma_4'$:
• $STM \to ASS\_CALL \mid ID := VAR$
• $ASS\_CALL \to ID(IDLIST)\ ASS\_CALL\_REST$
• $ASS\_CALL\_REST \to\ := VAR \mid \varepsilon$

Non LL(k) Grammars (4)

Remark:

The transformed grammars generate the same language, but yield different syntax trees.

From a theoretical point of view, this is acceptable.

From a programming language implementation perspective, this is in general not acceptable.

There are languages $L$ for which no LL(k) grammar $\Gamma$ exists that generates the language, i.e. with $L(\Gamma) = L$. (Example: the language of grammar $\Gamma_5$)

Non LL(k) Grammars (5)

Example 3: For $L(\Gamma_5)$, there exists no LL(k) grammar.

$\Gamma_5$:
• $S \to A \mid B$
• $A \to aAb \mid 0$
• $B \to aBbb \mid 1$

We show that there is no k such that $\Gamma_5$ is an LL(k) grammar.

Proof.
Let k be arbitrary but fixed. Choose two derivations according to the LL(k) definition and show that, despite equal prefixes of length k, the right-hand sides $\beta$ and $\gamma$ are not equal:

$$S \Rightarrow_{lm}^* S \Rightarrow_{lm} A \Rightarrow_{lm}^* a^k 0 b^k$$

$$S \Rightarrow_{lm}^* S \Rightarrow_{lm} B \Rightarrow_{lm}^* a^k 1 b^{2k}$$

Then $\mathrm{prefix}(k, a^k 0 b^k) = \mathrm{prefix}(k, a^k 1 b^{2k}) = a^k$, but $\beta = A \neq B = \gamma$.

FIRST and FOLLOW Sets

Definition
Let $\Gamma = (N, T, \Pi, S)$ be a CFG and $k \in \mathbb{N}$. $T^{\le k} = \{ u \in T^* \mid \mathrm{length}(u) \le k \}$ denotes the set of all terminal strings of length at most $k$.

We define:
• $\mathrm{FIRST}_k : (N \cup T)^* \to \mathcal{P}(T^{\le k})$ with $\mathrm{FIRST}_k(\alpha) = \{ \mathrm{prefix}(k, u) \mid \alpha \Rightarrow^* u \}$, where $\mathrm{prefix}(n, u) = u$ for all $u$ with $\mathrm{length}(u) \le n$.
• $\mathrm{FOLLOW}_k : (N \cup T)^* \to \mathcal{P}(T^{\le k})$ with $\mathrm{FOLLOW}_k(\alpha) = \{ w \mid S \Rightarrow^* \beta\alpha\gamma \wedge w \in \mathrm{FIRST}_k(\gamma) \}$.

FIRST and FOLLOW Sets in Parse Trees

[Figure: parse tree with root $S$ and a subtree rooted at $X$; $\mathrm{FIRST}_k(X)$ marks the beginning of the terminal string below $X$, $\mathrm{FOLLOW}_k(X)$ the terminal string directly after it]

Characterization of LL(1) Grammars

Definition (Reduced CFG)
A CFG $\Gamma = (N, T, \Pi, S)$ is reduced if each non-terminal occurs in a derivation starting from $S$ and each non-terminal derives at least one word.

Lemma
A reduced CFG is LL(1) if and only if for any two distinct productions $A \to \beta$ and $A \to \gamma$ it holds that

$$(\mathrm{FIRST}_1(\beta) \oplus_1 \mathrm{FOLLOW}_1(A)) \cap (\mathrm{FIRST}_1(\gamma) \oplus_1 \mathrm{FOLLOW}_1(A)) = \emptyset$$

where $L_1 \oplus_1 L_2 = \{ \mathrm{prefix}(1, vw) \mid v \in L_1, w \in L_2 \}$.

Remark: FIRST and FOLLOW sets are computable, so this criterion can be checked automatically.

Examples: $\mathrm{FIRST}_k$ and $\mathrm{FOLLOW}_k$

Check that the modified expression grammar $\Gamma_3$ is LL(1).
• $S \to E\#$
• $E \to TE'$
• $E' \to +TE' \mid \varepsilon$
• $T \to FT'$
• $T' \to *FT' \mid \varepsilon$
• $F \to (E) \mid ID$

Compute $\mathrm{FIRST}_1$ and $\mathrm{FOLLOW}_1$ for each non-terminal.

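Examples: $\mathrm{FIRST}_k$ and $\mathrm{FOLLOW}_k$ (2)

• $F \to (E) \mid ID$:
  $(\mathrm{FIRST}_1((E)) \oplus_1 \mathrm{FOLLOW}_1(F)) \cap (\mathrm{FIRST}_1(ID) \oplus_1 \mathrm{FOLLOW}_1(F))$
  $= (\{\,(\,\} \oplus_1 \mathrm{FOLLOW}_1(F)) \cap (\{ID\} \oplus_1 \mathrm{FOLLOW}_1(F))$
  $= \emptyset$

• $E' \to +TE' \mid \varepsilon$:
  $(\mathrm{FIRST}_1(+TE') \oplus_1 \mathrm{FOLLOW}_1(E')) \cap (\mathrm{FIRST}_1(\varepsilon) \oplus_1 \mathrm{FOLLOW}_1(E'))$
  $= (\{+\} \oplus_1 \mathrm{FOLLOW}_1(E')) \cap (\{\varepsilon\} \oplus_1 \mathrm{FOLLOW}_1(E'))$
  $= \{+\} \cap \{\#, )\}$
  $= \emptyset$

• $T' \to *FT' \mid \varepsilon$:
  $(\mathrm{FIRST}_1(*FT') \oplus_1 \mathrm{FOLLOW}_1(T')) \cap (\mathrm{FIRST}_1(\varepsilon) \oplus_1 \mathrm{FOLLOW}_1(T'))$
  $= (\{*\} \oplus_1 \mathrm{FOLLOW}_1(T')) \cap (\{\varepsilon\} \oplus_1 \mathrm{FOLLOW}_1(T'))$
  $= \{*\} \cap \{+, \#, )\}$
  $= \emptyset$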
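The sets used in these checks can be computed by a fixpoint iteration, and the LL(1) criterion can then be tested automatically. The following is a minimal sketch for k = 1; the grammar encoding and the names `first_of`, `first_follow` and `is_ll1` are assumptions for illustration. Because the grammar contains the end marker #, the sketch does not add the empty word to the FOLLOW set of the start symbol.

```python
EPS = ""  # stands for the empty word

GAMMA_3 = {
    "S": [["E", "#"]],
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["ID"]],
}


def first_of(seq, first):
    """FIRST_1 of a symbol sequence, given the FIRST_1 sets of the non-terminals."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in first else {sym}  # a terminal starts with itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}  # every symbol of seq can derive the empty word


def first_follow(grammar):
    """Compute FIRST_1 and FOLLOW_1 of all non-terminals by fixpoint iteration."""
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                updates = [(first[a], first_of(rhs, first))]
                for i, sym in enumerate(rhs):
                    if sym in grammar:
                        rest = first_of(rhs[i + 1:], first)
                        extra = follow[a] if EPS in rest else set()
                        updates.append((follow[sym], (rest - {EPS}) | extra))
                for target, new in updates:
                    if not new <= target:
                        target |= new
                        changed = True
    return first, follow


def is_ll1(grammar, first, follow):
    """Check that the look-ahead sets of the alternatives of each non-terminal are disjoint."""
    for a, alternatives in grammar.items():
        seen = set()
        for rhs in alternatives:
            f = first_of(rhs, first)
            look = (f - {EPS}) | (follow[a] if EPS in f else set())
            if look & seen:
                return False
            seen |= look
    return True


FIRST, FOLLOW = first_follow(GAMMA_3)
```

For this grammar the computation gives, for example, `FOLLOW["E'"] == {"#", ")"}` and `FOLLOW["T'"] == {"+", "#", ")"}`, the sets that appear in the checks above.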

Proof of LL Characterization Lemma

• Left-to-right direction: $\Gamma$ is LL(1) implies the FIRST and FOLLOW characterization.

Proof by contradiction:
Suppose there are two productions $A \to \beta$ and $A \to \gamma$ with $\beta \neq \gamma$ and

$$\psi = (\mathrm{FIRST}_1(\beta) \oplus_1 \mathrm{FOLLOW}_1(A)) \cap (\mathrm{FIRST}_1(\gamma) \oplus_1 \mathrm{FOLLOW}_1(A)) \neq \emptyset.$$

Then there exists $z \in \psi$ with $\mathrm{length}(z) = 1$.

As $\Gamma$ is reduced, there are two derivations:

$$S \Rightarrow^* \psi A \alpha \Rightarrow \psi\beta\alpha \Rightarrow^* \psi z x$$

$$S \Rightarrow^* \psi A \alpha \Rightarrow \psi\gamma\alpha \Rightarrow^* \psi z y$$

Proof of LL Characterization Lemma (2)

Thus, we can construct the following left derivations:

$$S \Rightarrow_{lm}^* uA\alpha \Rightarrow_{lm} u\beta\alpha \Rightarrow_{lm}^* uzx$$

$$S \Rightarrow_{lm}^* uA\alpha \Rightarrow_{lm} u\gamma\alpha \Rightarrow_{lm}^* uzy$$

with $\mathrm{prefix}(1, zx) = z = \mathrm{prefix}(1, zy)$, which contradicts the LL(1) property of $\Gamma$.

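Proof of LL Characterization Lemma (3)

• Right-to-left direction: the FIRST and FOLLOW characterization implies that $\Gamma$ is LL(1).

Proof by contradiction:
Suppose $\Gamma$ is not LL(1). Then there are two different derivations with $\beta \neq \gamma$ and $\mathrm{length}(z) = 1$:

$$S \Rightarrow^* \psi A \alpha \Rightarrow \psi\beta\alpha \Rightarrow^* \psi z x$$

$$S \Rightarrow^* \psi A \alpha \Rightarrow \psi\gamma\alpha \Rightarrow^* \psi z y$$

But then $z$ lies in both $\mathrm{FIRST}_1(\beta) \oplus_1 \mathrm{FOLLOW}_1(A)$ and $\mathrm{FIRST}_1(\gamma) \oplus_1 \mathrm{FOLLOW}_1(A)$, so their intersection is not empty, which is a contradiction.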

Parser Generation for LL(k) Languages

[Figure: Grammar → LL(k) Parser Generator → Table for Push-Down Automaton / Parser Program, or Error: Grammar is not LL(k)]

Parser Generation for LL(k) Languages (2)

Remarks:
• Use of push-down automata with look ahead
• The production to apply is selected from tables
• Advantages over bottom-up techniques in error analysis and error handling
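How a push-down automaton selects productions from a table can be sketched for the grammar $S \to E\#$, $E \to TE'$, $E' \to +TE' \mid \varepsilon$, $T \to FT'$, $T' \to *FT' \mid \varepsilon$, $F \to (E) \mid ID$. The table below was written out by hand for this illustration, not produced by a generator; the names `TABLE`, `NON_TERMINALS` and `ll1_parse` are assumptions.

```python
# (non-terminal, look ahead) -> right-hand side of the production to apply
TABLE = {
    ("S", "("): ["E", "#"], ("S", "ID"): ["E", "#"],
    ("E", "("): ["T", "E'"], ("E", "ID"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "#"): [],
    ("T", "("): ["F", "T'"], ("T", "ID"): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "#"): [],
    ("F", "("): ["(", "E", ")"], ("F", "ID"): ["ID"],
}
NON_TERMINALS = {"S", "E", "E'", "T", "T'", "F"}


def ll1_parse(tokens):
    """Return the applied productions (a left derivation) or None on a syntax error."""
    tokens = list(tokens) + ["#"]  # append the end marker
    stack = ["S"]                  # the top of the stack is the end of the list
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NON_TERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return None        # no table entry: syntax error
            applied.append((top, rhs))
            stack.extend(reversed(rhs))  # left-most symbol of rhs ends up on top
        elif top == look:
            pos += 1               # terminal on the stack matches the input
        else:
            return None
    return applied if pos == len(tokens) else None
```

The list of applied productions corresponds to a left derivation of the input, and a missing table entry pinpoints the position of a syntax error.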

Example System: ANTLR (http://www.antlr.org/)

Recommended Reading for Top-Down Analysis:
• Wilhelm, Maurer: Chapter 8, Sections 8.3.1 to 8.3.4, pp. 312-329