Formal Languages
Chapter 5 Context-Free Languages
Wuu Yang
National Chiao-Tung University, Taiwan, R.O.C.
September 15, 2008
Chapter Outline
1. Context-Free Grammars
2. Parsing and Ambiguity
3. Context-Free Grammars and Programming Languages
We have seen many languages that are not regular, for instance, $\{(^n)^n \mid n \ge 0\}$, which is a special case of the properly nested parentheses widely used in conventional programming languages.
Context-free languages are mostly used in the specification of high-level computer programming languages, such as Java and Perl.
Deciding the membership problem (whether a string belongs to a context-free language) is called parsing, which is performed by the front end of a compiler.
§5.1 Context-Free Grammars
Definition. A grammar $G =_{\mathrm{def}} (V, T, S, P)$ is a context-free grammar if all production rules in P have the form A → α, where A ∈ V and $\alpha \in (V \cup T)^*$. A language L is context-free if and only if L = L(G) for some context-free grammar G.
Note that a regular grammar satisfies the above definition and, hence, is also a context-free grammar. Consequently, a regular language is also a context-free language.
Example 5.1. The following grammar is context-free, but is not regular.
S → aSa
S → bSb
S → λ
Here is a sample derivation: S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa. The language generated by this grammar is $\{ww^R \mid w \in \Sigma^*\}$, where Σ = {a, b}, which is context-free, but not regular. □
Note that this grammar is linear (see slide 3-20) in that the right-hand side of each production rule contains at most one nonterminal. However, it is neither right-linear nor left-linear.
From this example, we conclude that the family of regular languages is a proper subclass of the family of context-free languages.
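The three rules of Example 5.1 can be read directly as a recursive membership test. The following is a minimal sketch under that reading; the function name `derivable` is hypothetical and not part of the original slides.

```python
def derivable(s):
    """Return True if S =>* s under the rules S -> aSa | bSb | lambda."""
    if s == "":
        return True                      # S -> lambda
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        return derivable(s[1:-1])        # S -> aSa  or  S -> bSb
    return False


print(derivable("aabbaa"))  # the sentence derived in Example 5.1
print(derivable("aba"))     # odd length, so not of the form w w^R
```

Each recursive call undoes one application of S → aSa or S → bSb, so the recursion depth equals the number of derivation steps before S → λ is applied.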
Example 5.2. The following grammar is context-free, but is not regular.
S → abB
A → aaBb
B → bbAa
A → λ
The language generated by this grammar is $\{ab(bbaa)^n bba(ba)^n \mid n \ge 0\}$. This language, which is similar to $\{e^n f^n \mid n \ge 0\}$, is not regular.
Note that, as in a right-linear grammar, the right-hand side of each production rule contains at most one nonterminal. Nevertheless, the grammar is neither right-linear nor left-linear (hence, not regular), because the nonterminal may appear in the middle of a right-hand side. □
Example. The language $L =_{\mathrm{def}} \{w \in \{a, b\}^* \mid n_a(w) = n_b(w)\}$ is context-free, but is not regular. We can derive a grammar for L:
V → VaVbV
V → VbVaV
V → λ
There are at least three derivations of the sentence abab:
V ⇒ VaVbV ⇒ aVbV ⇒ abV ⇒ abVaVbV ⇒ abaVbV ⇒ ababV ⇒ abab,
V ⇒ VaVbV ⇒ VaVb ⇒ Vab ⇒ VaVbVab ⇒ aVbVab ⇒ abVab ⇒ abab, and
V ⇒ VaVbV ⇒ VaVb ⇒ aVb ⇒ aVbVaVb ⇒ aVbVab ⇒ abVab ⇒ abab.
These derivations correspond to three different derivation trees, so we say this grammar is ambiguous. □
Example. The language $L =_{\mathrm{def}} \{w \in \{a, b\}^* \mid n_a(w) \ge n_b(w)\}$ is context-free, but is not regular. We can derive a grammar for L:
T → TaT
T → V
V → VaVbV | VbVaV | λ
This grammar is also ambiguous. □
Example. The language $L =_{\mathrm{def}} \{w \in \{a, b\}^* \mid n_a(w) > n_b(w)\}$ is context-free, but is not regular. We can derive a grammar for L:
S → TaT
T → TaT | V
V → VaVbV | VbVaV | λ
This grammar is also ambiguous. □
Example. The language $L =_{\mathrm{def}} \{w \in \{a, b\}^* \mid n_a(w) \ne n_b(w)\}$ is context-free, but is not regular. We can derive a grammar for L:
S → TaT | UbU
T → TaT | V
U → UbU | V
V → VaVbV | VbVaV | λ
This grammar is also ambiguous.
This language is the complement of an earlier context-free language, namely $\{w \in \{a, b\}^* \mid n_a(w) = n_b(w)\}$. □
Example. The language $L =_{\mathrm{def}} \{a^n b^m \mid n = m\}$ is context-free, but is not regular. We can derive a grammar for L:
V → aVb
V → λ
This grammar is unambiguous. □
Example. The language $L =_{\mathrm{def}} \{a^n b^m \mid n \ge m\}$ is context-free, but is not regular. We can derive a grammar for L:
T → aT
T → V
V → aVb | λ
Each string derived from T contains at least as many a's as b's. This grammar is unambiguous. □
Example. The language $L =_{\mathrm{def}} \{a^n b^m \mid n > m\}$ is context-free, but is not regular. We can derive a grammar for L:
S → aT
T → aT | V
V → aVb | λ
Each string derived from S contains more a's than b's. This grammar is unambiguous. □
Example 5.3. The language $L =_{\mathrm{def}} \{a^n b^m \mid n \ne m\}$ is context-free, but is not regular. We can derive a grammar for L:
S → aT | Ub
T → aT | V
U → Ub | V
V → aVb | λ
Either (1) the string derived from S contains more a's than b's (if we apply S → aT in the first derivation step) or (2) it contains more b's than a's (if we apply S → Ub in the first derivation step). This grammar is unambiguous.
(2nd solution). Here is the grammar from the textbook:
S → AV | V B
A → aA | a
B → Bb | b
V → aVb | λ
This grammar is unambiguous. □
How can we show that the two grammars generate the same language?
Exercise 25. Find a linear grammar for this language.
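One empirical way to approach the question of whether the two grammars of Example 5.3 generate the same language is to enumerate all sentences up to a length bound and compare the results. The sketch below assumes single-character symbols, with uppercase letters as nonterminals and the empty string standing for λ; the names `G1`, `G2`, and `sentences` are hypothetical. Agreement up to a bound is evidence, not a proof.

```python
from collections import deque

# The two grammars of Example 5.3 ("" stands for lambda).
G1 = {"S": ["aT", "Ub"], "T": ["aT", "V"], "U": ["Ub", "V"], "V": ["aVb", ""]}
G2 = {"S": ["AV", "VB"], "A": ["aA", "a"], "B": ["Bb", "b"], "V": ["aVb", ""]}


def sentences(grammar, start, max_len):
    """Collect every sentence with at most max_len symbols, using leftmost expansion."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None for a sentence.
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            found.add(form)
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals never disappear, so forms with too many can be dropped.
            n_terminals = sum(1 for s in new if s not in grammar)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


print(sentences(G1, "S", 6) == sentences(G2, "S", 6))
```

The pruning rule guarantees termination for these two grammars because every sentential form contains at most two nonterminals. It is not sufficient for grammars such as S → SS | λ, where forms can grow without producing terminals.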
Example 5.4. Consider the following grammar:
S → aSb | SS | λ
The language generated by this grammar is $\{w \in \{a, b\}^* \mid n_a(w) = n_b(w);\ n_a(v) \ge n_b(v) \text{ for any prefix } v \text{ of } w\}$. This is the language of properly nested parentheses commonly used in computer programming languages and mathematical expressions. □
This language is not regular.
Question. Is there a linear grammar for this language? (See chapter 8.)
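The set description in Example 5.4 translates into a single left-to-right scan with a counter: a plays the role of an opening parenthesis and b of a closing one. The following is a minimal sketch of that check; the function name `is_properly_nested` is hypothetical.

```python
def is_properly_nested(w):
    """Check n_a(w) = n_b(w) and n_a(v) >= n_b(v) for every prefix v of w."""
    balance = 0                      # n_a(v) - n_b(v) for the prefix read so far
    for symbol in w:
        if symbol == "a":
            balance += 1
        elif symbol == "b":
            balance -= 1
        else:
            return False             # symbol outside the alphabet {a, b}
        if balance < 0:
            return False             # some prefix has more b's than a's
    return balance == 0


print(is_properly_nested("aababb"))
print(is_properly_nested("abba"))
```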
Leftmost and Rightmost Derivations
A derivation is a sequence of steps. In each step we expand a nonterminal A by replacing A with the right-hand side of an A-production rule. For example, consider the following grammar:
S → AB
A → aaA
A → λ
B → Bb
B → λ
The language generated by this grammar is $\{a^{2n} b^m \mid n \ge 0, m \ge 0\}$. The string aab is a sentence (or an element) of this language. Here are two derivations of this sentence:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
The result of each derivation step is called a sentential form. The derivation stops when no nonterminal is left. A sentential form without nonterminals is called a sentence.
The first derivation is a leftmost derivation, in which the leftmost nonterminal is expanded at each step. Similarly, the second derivation is a rightmost derivation, in which the rightmost nonterminal is expanded at each step.
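The two derivations of aab can be replayed mechanically. The sketch below assumes the grammar S → AB, A → aaA | λ, B → Bb | λ from the previous slide; the names `NONTERMINALS` and `derive` are hypothetical. It applies a given list of rules while always rewriting either the leftmost or the rightmost nonterminal.

```python
NONTERMINALS = {"S", "A", "B"}


def derive(start, rules, leftmost=True):
    """Apply (lhs, rhs) rules in order and return the list of sentential forms."""
    form, forms = start, [start]
    for lhs, rhs in rules:
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"rule for {lhs} does not apply to {form}")
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms


# "" stands for lambda.
left = derive("S", [("S", "AB"), ("A", "aaA"), ("A", ""), ("B", "Bb"), ("B", "")])
right = derive("S", [("S", "AB"), ("B", "Bb"), ("B", ""), ("A", "aaA"), ("A", "")],
               leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs use the same five rules, only in a different order, which is why they correspond to the same derivation tree.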
A derivation can be drawn as a derivation tree. A derivation tree is also called a syntax tree. For example:
[Fig 5.1: (a) a derivation tree; (b) a partial derivation tree.]
A derivation tree is an ordered tree, which means that there is an ordering among siblings. The root of a derivation tree is labelled with the start symbol of the grammar. Each leaf is labelled with an element of T ∪ {λ}. Each internal node is labelled with a nonterminal (also called a variable, which is an element of V). A subtree of the derivation tree with some sub-subtrees removed is called a partial derivation tree.
Example 5.6. Consider the following grammar:
S → aAB
A → bBb
B → A | λ
The language generated by this grammar is $\{a(bb)^m \mid m \ge 1\}$. The string abbbb is a sentence of this language. Here is a leftmost derivation of this sentence:
S ⇒ aAB ⇒ abBbB ⇒ abbB ⇒ abbA ⇒ abbbBb ⇒ abbbb
[Fig 5.2: (a) a derivation tree; (b) a partial derivation tree.]
Theorem 5.1. There is an obvious correspondence between a derivation of a sentence w ∈ L(G) and its derivation tree.
§5.2 Parsing and Ambiguity
There are two sides of a (context-free) grammar:
• We may use a grammar to generate sentences (derivation).
• We may ask whether a string can be generated by a grammar (parsing).
A simple parsing method is to try all possible derivations and see if the string can be derived.
We use a top-down, breadth-first, left-to-right approach.
Algorithm: exhaustive search
Input is a string w and a grammar G.
T = {S} (the start symbol of the grammar)
repeat
    for each sentential form f in T do
        locate the leftmost nonterminal, say A,
        expand A with every A-rule
        T := T − {f} ∪ { new sentential forms }
        delete those sentential forms that cannot generate the required string.
    end for
until we find a leftmost derivation of the string or the collection of sentential forms becomes empty.
If w ∈ L(G), then this algorithm always terminates and returns a leftmost derivation of w. If w ∉ L(G), this algorithm may not terminate.
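The exhaustive search above can be sketched in Python as follows. This is an illustration under stated assumptions, not the definitive procedure: symbols are single characters, uppercase letters are nonterminals, the empty string stands for λ, and the names `can_still_match`, `exhaustive_parse`, and the `max_rounds` safeguard are hypothetical additions. The pruning test is one possible reading of "cannot generate the required string".

```python
def can_still_match(form, w, grammar):
    """Cheap necessary condition for form =>* w."""
    if sum(1 for s in form if s not in grammar) > len(w):
        return False                      # too many terminals already
    pos = next((i for i, s in enumerate(form) if s in grammar), None)
    if pos is None:
        return form == w                  # a sentence must be w itself
    return w.startswith(form[:pos])       # terminal prefix must agree with w


def exhaustive_parse(grammar, start, w, max_rounds=20):
    """Breadth-first search for a leftmost derivation of w; None if not found."""
    frontier = [[start]]                  # each entry is a derivation so far
    for _ in range(max_rounds):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            pos = next(i for i, s in enumerate(form) if s in grammar)
            for rhs in grammar[form[pos]]:
                new = form[:pos] + rhs + form[pos + 1:]
                if new == w:
                    return derivation + [new]
                if can_still_match(new, w, grammar) and any(s in grammar for s in new):
                    next_frontier.append(derivation + [new])
        if not next_frontier:
            return None                   # the collection became empty
        frontier = next_frontier
    return None                           # gave up after max_rounds rounds


G = {"S": ["SS", "aSb", "bSa", ""]}
print(exhaustive_parse(G, "S", "aabb"))
```

The `max_rounds` cutoff is there only because, as noted above, the search may otherwise run forever when w ∉ L(G); returning None after the cutoff therefore means "not found within the limit", not a definite NO.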
An alternative strategy of exhaustive search. We may follow one leftmost derivation at a time. When expanding a nonterminal A, we try each A-rule in turn. A derivation stops as soon as it is possible to decide whether the result is the required string.
This exhaustive search method may not terminate, even if w ∈ L(G), due to left-recursive rules (that is, rules of the form L → Lα). The same problem occurs if we follow the rightmost derivation, due to right-recursive rules.
This is a top-down, depth-first approach. □
Recall the reverse $G^R$ of a grammar G defined in §3.3. A leftmost derivation in G corresponds to a rightmost derivation in $G^R$.
Example 5.7. Consider the string aabb and the grammar
S → SS | aSb | bSa | λ
In the 1st round, we will try the following derivations in turn:
S ⇒ SS
S ⇒ aSb
S ⇒ bSa
S ⇒ λ
The last two derivations cannot lead to the string aabb. In the 2nd round, we have 8 sentential forms:
S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒ bSaS
S ⇒ SS ⇒ λS
S ⇒ aSb ⇒ aSSb
S ⇒ aSb ⇒ aaSbb
S ⇒ aSb ⇒ abSab
S ⇒ aSb ⇒ aλb
The 3rd, 7th, and 8th derivations cannot lead to the required string aabb. There are 5 sentential forms left. We may conduct the 3rd round and will find a leftmost derivation:
S ⇒ aSb ⇒ aaSbb ⇒ aaλbb
Problems with exhaustive search:
• It is inefficient.
• It may not terminate if w ∉ L(G). If we impose the additional constraint that there are no λ-rules (that is, rules of the form A → λ) and no rules of the form A → B, then the above exhaustive search method always terminates with a correct, definite answer, whether or not w ∈ L(G). We will see later that this constraint does not affect the power of context-free grammars in any significant way.
Example 5.8. The grammar in Example 5.7 is equivalent (except for the empty sentence) to the following grammar, which satisfies the above constraint (no λ-rules and no rules of the form A → B):
T → TT | aTb | bTa | ab | ba
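□
Corollary. Let G be a context-free grammar that does not include rules of the forms A → λ and A → B, where A, B ∈ V. Then the derivation of a sentence w ∈ L(G) takes at most 2|w| − 1 steps.
Proof. In such a grammar, every derivation step either increases the length of the sentential form by at least 1 or replaces a nonterminal with a terminal (by a rule A → a). The derivation starts with a form of length 1 and ends with a form of length |w|, so at most |w| − 1 steps can increase the length. Each of the |w| symbols of w is produced by at most one step of the form A → a, so there are at most |w| such steps. Hence the derivation takes at most 2|w| − 1 steps. □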
Theorem 5.2. Let G be a context-free grammar that does not include rules of the forms A → λ and A → B, where A, B ∈ V. Then the exhaustive search method always terminates with a correct answer.
Proof. Due to the above corollary, we can limit our search to at most 2|w| − 1 rounds (there is one derivation step per round), where w is the given string. If w ∈ L(G), we will find a (leftmost) derivation. Otherwise, the search will terminate with a NO answer. □
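The round limit from the proof turns exhaustive search into a decision procedure. The sketch below assumes a grammar without λ-rules and without rules of the form A → B, encoded with single-character symbols and uppercase nonterminals; the name `member` is hypothetical. It uses the grammar of Example 5.8.

```python
def member(grammar, start, w):
    """Decide w in L(G) for a grammar with no lambda-rules and no unit rules."""
    forms = {start}
    for _ in range(max(2 * len(w) - 1, 0)):       # at most 2|w| - 1 rounds
        new_forms = set()
        for form in forms:
            pos = next((i for i, s in enumerate(form) if s in grammar), None)
            if pos is None:
                continue                          # a sentence other than w
            for rhs in grammar[form[pos]]:
                new = form[:pos] + rhs + form[pos + 1:]
                if new == w:
                    return True
                # Without lambda-rules a form never shrinks, so longer forms are dead.
                if len(new) <= len(w):
                    new_forms.add(new)
        forms = new_forms
    return False                                  # a definite NO answer


G = {"T": ["TT", "aTb", "bTa", "ab", "ba"]}       # grammar of Example 5.8
print(member(G, "T", "aabb"), member(G, "T", "aab"))
```

Unlike the unrestricted search, a False result here is conclusive, because no derivation of w can be longer than the round limit.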
Next we will consider the time complexity of exhaustive search.
Initially, there is a single sentential form (which consists of the single start symbol S). In each round, a sentential form is expanded into at most |P| new sentential forms. There are at most 2|w| − 1 rounds. Hence an upper bound on the number of sentential forms is
$$|P| + |P|^2 + |P|^3 + \cdots + |P|^{2|w|-1} = \frac{|P|^{2|w|} - |P|}{|P| - 1} = O(|P|^{2|w|})$$
This bound is exponential in the length |w| of the input string. There are more efficient general parsers, such as the CYK and Earley parsers.
Theorem 5.3. Every context-free grammar has an $O(n^3)$-time parser, where n is the length of the input string.
Context-free grammars and parsing are used mostly in programming languages and compilers.
In practice we usually require a linear-time parser.
Not all context-free grammars have a linear-time parser.
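As an illustration of a cubic-time method, here is a minimal sketch of the CYK algorithm. It assumes a grammar in Chomsky normal form (rules A → BC and A → a only), which the slides have not introduced yet. The example grammar is a hand-made normal-form version of the grammar of Example 5.8, and the names `cyk`, `UNARY`, and `BINARY` are hypothetical.

```python
def cyk(w, start, unary, binary):
    """Return True if start derives w; unary maps a terminal to {A | A -> a},
    binary lists triples (A, B, C) for rules A -> B C."""
    n = len(w)
    if n == 0:
        return False                     # no lambda-rules in this normal form
    # table[i][l] holds the nonterminals that derive the substring w[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = set(unary.get(ch, ()))
    for length in range(2, n + 1):               # three nested loops over
        for i in range(n - length + 1):          # positions give O(n^3)
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for a, b, c in binary:
                    if b in left and c in right:
                        table[i][length].add(a)
    return start in table[0][n]


# T -> TT | AX | BY | AB | BA,  X -> TB,  Y -> TA,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = [("T", "T", "T"), ("T", "A", "X"), ("T", "B", "Y"), ("T", "A", "B"),
          ("T", "B", "A"), ("X", "T", "B"), ("Y", "T", "A")]
print(cyk("aabb", "T", UNARY, BINARY))
```

For a fixed grammar the running time is dominated by the three loops over `length`, `i`, and `split`, which is where the cubic bound comes from.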
Definition. A (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that has two or more leftmost derivations.
Equivalently, a (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that has two or more rightmost derivations.
Equivalently, a (context-free) grammar G is ambiguous if and only if there is a sentence w ∈ L(G) that has two or more derivation trees.
Example 5.10. The grammar S → aSb | SS | λ is ambiguous since the sentence aabb has the following two leftmost derivations:
S ⇒ SS ⇒ S ⇒ aSb ⇒ aaSbb ⇒ aabb
S ⇒ aSb ⇒ aaSbb ⇒ aabb
[Fig 5.4: the two derivation trees of aabb corresponding to these derivations.]
Sometimes it is possible to transform an ambiguous grammar into an unambiguous one. For instance, the above grammar is equivalent to the following unambiguous grammar:
S → T | λ
T → U | UT
U → ab | aTb
It is very difficult (in fact, undecidable in general) to determine whether a context-free grammar is ambiguous. (We will discuss this later.)
Example 5.11. The following grammar
E → E + E | E ∗ E | (E) | a
is ambiguous. This grammar is used to model the usual arithmetic expressions.
Usually, we impose the additional stipulation that ∗ is performed before + (that is, ∗ has a higher precedence than +). We may use the following (unambiguous) grammar to express this precedence:
Example 5.12.
E → E + T | T
T → T ∗ F | F
F → (E) | a
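To see how the grammar of Example 5.12 encodes precedence, here is a small recursive-descent sketch with one function per nonterminal. It is an assumption-laden illustration: the left-recursive rules E → E + T and T → T ∗ F are handled with loops (a standard transformation that the slides do not cover), the ASCII character `*` stands for ∗, and the name `parse` and the nested-tuple tree format are hypothetical.

```python
def parse(text):
    """Parse per E -> E + T | T, T -> T * F | F, F -> (E) | a; return a nested tuple."""
    tokens = list(text.replace(" ", ""))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {pos}")
        pos += 1

    def parse_e():                       # E -> T (+ T)*
        tree = parse_t()
        while peek() == "+":
            expect("+")
            tree = ("+", tree, parse_t())
        return tree

    def parse_t():                       # T -> F (* F)*
        tree = parse_f()
        while peek() == "*":
            expect("*")
            tree = ("*", tree, parse_f())
        return tree

    def parse_f():                       # F -> (E) | a
        if peek() == "(":
            expect("(")
            tree = parse_e()
            expect(")")
            return tree
        expect("a")
        return "a"

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree


print(parse("a+a*a"))
print(parse("(a+a)*a"))
```

Because `*` is only consumed inside `parse_t`, a product always ends up deeper in the tree than the sum that contains it, which is exactly the precedence the grammar expresses.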
The above examples show that a context-free grammar can be used to impose precedence. Similarly, associativity can also be enforced by context-free grammars.
For left-associative operations, such as +:
L → L + E | E
For right-associative operations, such as ∗∗:
R → E ∗∗ R | E
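The shape of the derivation tree is what distinguishes the two rules. As a minimal sketch (the function names and the tuple format are hypothetical), the following builds the tree that each rule imposes on a list of operands: the left-recursive rule L → L + E nests to the left, and the right-recursive rule R → E ∗∗ R nests to the right.

```python
def left_assoc(operands, op):
    """Tree imposed by L -> L op E | E: the recursion is on the left operand."""
    tree = operands[0]
    for operand in operands[1:]:
        tree = (op, tree, operand)
    return tree


def right_assoc(operands, op):
    """Tree imposed by R -> E op R | E: the recursion is on the right operand."""
    if len(operands) == 1:
        return operands[0]
    return (op, operands[0], right_assoc(operands[1:], op))


print(left_assoc(["x", "y", "z"], "+"))    # groups as (x + y) + z
print(right_assoc(["x", "y", "z"], "**"))  # groups as x ** (y ** z)
```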
We have shown that ambiguity can sometimes be removed by properly transforming the grammar. However, this is not always possible.
Certain context-free languages have only ambiguous grammars. They are called inherently ambiguous languages.
Definition. Let L be a context-free language. If L has an unambiguous grammar, it is unambiguous. Otherwise, it is inherently ambiguous.
Example 5.13. Consider the following language
$L =_{\mathrm{def}} \{a^n b^n c^m\} \cup \{a^n b^m c^m\}$
The left part $\{a^n b^n c^m\}$ can be generated by a grammar:
S → Sc | A
A → aAb | λ
Similarly, the right part $\{a^n b^m c^m\}$ can be generated by a grammar:
T → aT | B
B → bBc | λ
Their union is described by one additional rule:
Q → S | T
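Each string $a^n b^n c^n$, which belongs to both parts, has two leftmost derivations (one through S and one through T), so this grammar is ambiguous.
Although this does not show that L is inherently ambiguous, it suggests that the two parts cannot be combined in a single unambiguous grammar. (L is in fact inherently ambiguous, but a rigorous proof is technical.)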
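The double derivation of strings belonging to both parts can be checked by counting leftmost derivations in the combined grammar of Example 5.13. This is a minimal sketch with hypothetical names (`GRAMMAR`, `count_leftmost_derivations`); it relies on the fact that, in this particular grammar, every rule either adds terminals or moves strictly down the chain Q, S, A or Q, T, B, so the recursion always ends.

```python
# Combined grammar of Example 5.13 ("" stands for lambda).
GRAMMAR = {
    "Q": ["S", "T"],
    "S": ["Sc", "A"],
    "A": ["aAb", ""],
    "T": ["aT", "B"],
    "B": ["bBc", ""],
}


def count_leftmost_derivations(form, target):
    """Count the leftmost derivations form =>* target."""
    pos = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
    if pos is None:
        return 1 if form == target else 0
    n_terminals = sum(1 for s in form if s not in GRAMMAR)
    # Prune forms with too many terminals or a wrong terminal prefix.
    if n_terminals > len(target) or not target.startswith(form[:pos]):
        return 0
    return sum(
        count_leftmost_derivations(form[:pos] + rhs + form[pos + 1:], target)
        for rhs in GRAMMAR[form[pos]]
    )


print(count_leftmost_derivations("Q", "aabbcc"))  # in both parts
print(count_leftmost_derivations("Q", "aabbc"))   # only in the left part
```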
§5.3 Context-Free Grammars and Programming Languages
The syntax of a programming language is usually specified by a context-free grammar. For reasons of parsing efficiency, we usually restrict ourselves to the subclass of LL(1) or LR(1) grammars.
The following page contains C’s LALR(1) grammar.