Computational Models — Lecture 41
Handout Mode
Ronitt Rubinfeld and Iftach Haitner.
Tel Aviv University.
March 21/23, 2016
1Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice Herlihy, Brown University.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 1 / 36
Talk Outline
I Context Free Grammars/Languages (CFG/CFL)
I Algorithmic issues for CFL
Next two weeks:
I Pumping Lemma for context free languages
I Push Down Automata (PDA)
I Equivalence of CFGs and PDAs
I Sipser’s book, 2.1, 2.2 & 2.3
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 2 / 36
Short Overview of the Course
So far we saw
I finite automata,
I regular languages,
I regular expressions,
I Myhill-Nerode theorem
I pumping lemma for regular languages.
We now introduce stronger machines and languages with more expressivepower:
I pushdown automata,
I context-free languages,
I context-free grammars,
I pumping lemma for context-free languages.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 3 / 36
Context Free Grammars (CFG)
An example of a context free grammar, G1:
I A→ 0A1
I A→ B
I B → #
Terminology:
I Each line is a substitution rule or production.
I Each rule has the form: symbol→ string.The left-hand symbol is a variable (usually upper-case).
I A string consists of variables and terminals.
I One variable is the start variable (lhs of top rule). In this case, it is A.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 4 / 36
Rules for Generating Strings
I Write down the start variable.
I Pick a variable written down in current string and a derivation that startswith that variable.
I Replace that variable with right-hand side of that derivation.
I Repeat until no variables remain.
I Return final string (concatenation of terminals).
Process is inherently non deterministic.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 5 / 36
Example
Grammar G1:
I A→ 0A1
I A→ B
I B → #
Derivation with G1:
A → 0A1→ 00A11→ 000A111→ 000B111→ 000#111
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 6 / 36
Parsing Trees
A
A
A
A
B
#0 10 0 1 1
Indifferent to derivation order.
Question 1What strings can be generated in this way from the grammar G1?
Answer: Exactly those of the form 0n#1n (n ≥ 0).
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 7 / 36
Context-Free Languages (CFL)
The language generated in this way is called the language of the grammar.
For example, L(G1) = {0n#1n : n ≥ 0}.
Any language generated by a context-free grammar is called a context-freelanguage.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 8 / 36
A Useful Abbreviation
Rules with same variable on left hand side
A → 0A1A → B
are written as:
A→ 0A1 | B
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 9 / 36
English-like Sentences
A grammar G2 to describe a few English sentences:
< SENTENCE > → NP < VERB >
NP → < ARTICLE >< NOUN >
< NOUN > → boy | girl | flower< ARTICLE > → a | the
< VERB > → touches | likes | sees
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 10 / 36
Deriving English-like Sentences
A specific derivation in G2:
< SENTENCE > → NP < VERB >
→ < ARTICLE >< NOUN >< VERB >
→ a < NOUN >< VERB >
→ a boy < VERB >
→ a boy sees
More strings generated by G2:
a flower seesthe girl touches
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 11 / 36
Derivation and Parse Tree
< SENTENCE > → NP < VERB >
→ < ARTICLE >< NOUN >< VERB >
→ a < NOUN >< VERB >
→ a boy < VERB >
→ a boy sees
SENTENCE
NOUN-PHRASE VERB
ARTICLE NOUN
a boy sees
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 12 / 36
Formal Definition
A context-free grammar is a 4-tuple (V ,Σ,R,S), where
I V is a finite set of variables
I Σ is a finite set of terminals
I R is a finite set of rules: each rule is a variable and a finite string ofvariables and terminals.
I S is the start symbol.
I If u and v are strings of variables and terminals,and A→ w is a rule of the grammar,then uAv yields uwv , written uAv → uwv .
I u ∗→ v if u = v , or u → u1 → . . .→ uk → v for some sequenceu1,u2, . . . ,uk
Definition 2
The language of the grammar G, denoted L(G), is {w ∈ Σ∗ : S ∗→ w}
where ∗→ is determined by G.Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 13 / 36
Example 1
G3 = ({S}, {a,b},R,S).
R (Rules): S → aSb | SS | ε
Some words in the language: aabb, aababb.
Question 3What is this language?
Hint: Think of parentheses: i.e., a is "(" and b is ")". (()), (()())
Using larger alphabet (i.e., more terminals), ([]()), represent well formedprograms with many kinds of nested loops, "if then/else" statements.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 14 / 36
Example 2
G4 = ({S}, {a,b},R,S).
R (Rules): S → aSa | bSb | ε
Some words in the language: abba,aabaabaa.
Question 4What is this language?
L(G4) = {wwR : w ∈ {a,b}∗} (almost but not quite the set of palindromes)
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 15 / 36
Proving L(G4) = {wwR : w ∈ {a,b}∗}What do we need to show?
1. If z = wwR then z is generated by G4.
2. If z generated by G4 then z = wwR .
Part 1:Proof by induction on the length of z.Base: |z| = 0, then S → ε. Hence ε ∈ L(G4).Inductive claim:Let |z| = 2k .Let z = wwR and w = σw ′. Then z = σw ′(w ′)Rσ.Consider the derivation S → σSσ.By the induction hypothesis S ∗→ w ′(w ′)R , since |w ′(w ′)R | = 2k − 2 < |z|.Therefore, G5 generates S → σSσ ∗→ σw ′(w ′)Rσ = z.Hence z ∈ L(G4).
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 16 / 36
Proving L(G4) = {wwR : w ∈ {a,b}∗}Part 2:Assume that S ∗→ z.We prove by induction on the number of derivations that z = wwR .Base: If we have a single derivation, then the only possible derivation isS → ε, and we are done.Inductive claim:Assume S ∗→ z in k derivations.Assume we have S → σSσ as the first derivation.Then by the inductive hypothesis, S → w ′(w ′)R .Therefore, z = σw ′(w ′)Rσ.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 17 / 36
Example 3
G5 = ({S,A,B}, {a,b},R,S)
R (Rules):S → aB | bA | εA→ a | aS | bAAB → b | bS | aBB
Some words in the language: aababb, baabba.
Question 5What is this language?
L(G5) = {w ∈ {a,b}∗ : #a(w) = #b(w)}
#x (w) — number of occurrences of x in w
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 18 / 36
Proving L(G5) = {w ∈ {a,b}∗ : #a(w) = #b(w)}What do we need to show?
1. If w generated by G5 then #a(w) = #b(w).
2. If #a(w) = #b(w) then w is generated by G5.
Claim 6
If S ∗→ w , then #a(w) + #A(w) = #b(w) + #B(w).
Claim 7
For w ∈ {a,b}∗ let k = k(w) = #a(w)−#b(w). Then:
1. If k = 0, then S ∗→ wS
2. If k > 0, then S ∗→ wB|k|
3. If k < 0, then S ∗→ wA|k|
Are we done?! How we prove these claims?
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 19 / 36
Proving Claim 7
Proof by induction on |w |:
I Basis: w = ε then k(w) = 0. Since S ∗→ S, then S ∗→ εS
I Induction step: for w ∈ {a,b}n write w = w ′σ with |w ′| = n − 1
Suppose:
1. k ′ = k(w ′) = (#a(w ′)−#b(w ′)) > 02. σ = a
Note that if both are the case, we have k ′ = k − 1
By the induction hypothesis S ∗→ w ′Bk ′
Since B → aBB, thenS ∗→ w ′Bk ′
= w ′BBk ′−1 ∗→ w ′(aBB)Bk ′−1 = wBk ′+1 = wBk(w)
To complete the proof, need to show for σ = b and for k ′ ≤ 0, k ′ = 0 (6 casesin all).
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 20 / 36
Designing Context-Free Grammars
No recipe in general, but few rules-of-thumb
I If CFG is the union of several CFGs, rename variables (not terminals) sothey are disjoint, and add new rule S → S1 | S2 | . . . | Si .
I For languages (like {0n#1n : n ≥ 0} ), with linked substrings, a rule ofform R → uRv is helpful to force desired relation between substrings.
I For a regular language, grammar “follows” a DFA for the language (seenext frame).
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 21 / 36
How expressive are CFG’s
Are they more expressive or less expressive than regular languages?
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 22 / 36
CFG for Regular Languages
Given a DFA : M = (Q,Σ, δ,q0,F )
CFG G for L(M):
1. Let R0 be the starting variable
2. Add rule Ri → aRj , for any qi ,qj ∈ Q and a ∈ Σ with δ(qi ,a) = qj
3. Add rule Ri → ε, for any qi ∈ F
Claim 8L(G) = L(M)
Proof?
Claim 9
R0∗→ wRj iff M(w) = qj .
Proof? Next class we’ll see alternative proof via “Push-Down Automata"
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 23 / 36
Ambiguity in CFLs
Grammar G: E → E + E | E × E | (E) | a
aXa+a
EEE
E
E
aXa+a
EEE
E
E
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 24 / 36
Arithmetic Example
Consider the grammar G′ = (V ,Σ,R,E), where
I V = {E ,T ,F}I Σ = {a,+,×, (, )}
I Rules:E → E + T | TT → T × F | FF → (E) | a
Claim 10L(G′) = L(G)
Proof? but G′ is not ambiguous.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 25 / 36
Parsing Tree of G′ for a + a× a
G′:
E → E + T | TT → T × F | FF → (E) | a
aXa+a
FFF
TT
TE
E
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 26 / 36
Parsing Tree of G′ for (a + a)× a
G′:
E → E + T | TT → T × F | FF → (E) | a
( a + aX)a
F F F
T T
E
E
F
T
T
E
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 27 / 36
Ambiguity
Definition 11A string w is derived ambiguously from grammar G, if w has two or moredifferent parse trees that generate it from G. A CFG is ambiguous, if itambiguously derives a string.
I Ambiguity is usually not only a syntactic notion but also semantic,implying multiple meanings for the same string. Think of a + a× a fromlast grammar.
I It is sometime possible to eliminate ambiguity by finding a differentcontext free grammar generating the same language. This is true for thearithmetic expressions grammar.
Some languages are inherently ambiguous.
Example: {1i2j3k : i = j ∨ j = k}
Proof? Book!
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 28 / 36
Part I
Checking Membership
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 29 / 36
Checking Membership in a CFL
Challenge
Given a CFG G and a string w , decide whether w ∈ L(G)?
Initial Idea: Design an algorithm that tries all derivations.
Problem: If G does not generate w , we’ll never stop.
Possible solution: Use special grammars that are:
I just as expressive!
I better for checking membership.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 30 / 36
Chomsky Normal Form (CNF)
A simplified, canonical form of context free grammars.G = (V ,Σ,R,S) is in a CNF, if every rule in in R has one of the followingforms:
A→ a, A ∈ V ∧ a ∈ ΣA→ BC, A ∈ V ∧ B,C ∈ V \ {S}S → ε.
Simpler to analyze: each derivation adds (at most) a single terminal, S onlyappears once, ε appears only at the empty word
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 31 / 36
CNF: Theorem
Theorem 12Any context-free language is generated by a context-free grammar inChomsky Normal Form.
Proof Idea: [Next time] Give algorithms which does the following:
I Add new start symbol S0.
I Eliminate all ε rules of the form A→ ε.
I Eliminate all “unit” rules of the form A→ B.
I Patch up rules so that grammar generates the same language.
I Convert remaining “long rules” to proper form.
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 32 / 36
CNF: Bounded Derivation Length
Lemma 13For a CNF grammar G and w ∈ L(G) with |w | = n ≥ 1, it holds that w has aderivation of length 2n − 1.
Proof? consider the parsing tree for w – degree at most 2 and n leaves.
Advantage: Easier to check whether w ∈ L(G) – only need to try allderivations of length 2n − 1. This is finite, but there are a lot of these!
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 33 / 36
Checking Membership for Grammars in CNF Form
Given a CNF grammar G = (V ,Σ,R,S), we build a function Derive(A, x) thatreturns TRUE iff A ∗→ x .
Algorithm 14 (Derive(A, x))
I If x = ε: if A→ ε ∈ R (i.e., A = S) return TRUE, otherwise return FALSE.
I If |x | = 1: if A→ x ∈ R return TRUE, otherwise return FALSE.
I For each A→ BC and each partition x = x1x2:
I Call Derive(B, x1) and Derive(C, x2).I Return TRUE if both return TRUE.
I Return FALSE.
Test whether w ∈ L(G) by calling Derive(S,w)
Correctness?
I Procedure Derive can also output a parse tree for w
I Have we critically used that G is in CNF?
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 34 / 36
Time Complexity of Derive
What is the time complexity T : N 7→ N of Derive?
I Each recursive call tests |R| rules and n partitions.I T (n) ≤ |R| · n · 2T (n − 1)I T (n) ∈ O((|R| · n)n).
Still exponential...
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 35 / 36
Efficient Algorithm
I Keep in memory the results of Derive(A, x).
I Number of different inputs: |V | · n2.I Only |V | · n2 calls, each takes O(|R| · n).I T (n) ∈ O(|R| · n3 · |V |).
I Polynomial time!
I This approach is called Dynamic Programming
Basic idea:
I If number of different inputs is limited, say I(n).I Each run (excluding recursive calls) takes at most R(n) timeI Total running time is bounded by T (n) ≤ R(n)I(n).
Ronitt Rubinfeld and Iftach Haitner (TAU) Computational Models, Lecture 4 March 21/23, 2016 36 / 36
Top Related