MELJUN CORTES automata9
CSC 3130: Automata theory and formal languages
Normal forms and parsing
Fall 2008
MELJUN P. CORTES, MBA, MPA, BSCS, ACS
Testing membership and parsing
• Given a grammar, how can we tell whether a string x is in its language?
• If so, can we reconstruct a parse tree for x?
  S → 0S1 | 1S0S1 | T
  T → S | ε
First attempt
• Maybe we can try all possible derivations:
  S → 0S1 | 1S0S1 | T
  T → S | ε          x = 00111
  S ⇒ 0S1, which expands to 00S11, 01S0S11, 0T1, ...
  S ⇒ 1S0S1, which expands to 10S10S1, ...
  S ⇒ T, which expands to S, ε
  When do we stop?
Problems
• How do we know when to stop?
Problems
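• Idea: stop a derivation once the sentential form is longer than |x|
• Not right because of ε-productions: a sentential form can grow longer than x and shrink again later
• We might want to eliminate ε-productions too
  S → 0S1 | 1S0S1 | T
  T → S | ε          x = 01011
  S ⇒ 0S1 ⇒ 01S0S11 ⇒* 01S011 ⇒* 01011
  (sentential-form lengths 3, 7, 6, 5)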
Problems
• Loops among the variables (S → T → S) might make us go forever
• We might want to eliminate such loops
  S → 0S1 | 1S0S1 | T
  T → S | ε          x = 00111
Unit productions
• A unit production is a production of the form
    A1 → A2
  where A1 and A2 are both variables
• Example
  grammar:
    S → 0S1 | 1S0S1 | T
    T → S | R | ε
    R → 0SR
  unit productions: S → T, T → S, T → R
Removal of unit productions
• If there is a cycle of unit productions
    A1 → A2 → ... → Ak → A1
  delete it and replace A2, ..., Ak by A1 everywhere
• Example
    S → 0S1 | 1S0S1 | T
    T → S | R | ε
    R → 0SR
  T is replaced by S in the {S, T} cycle:
    S → 0S1 | 1S0S1 | R | ε
    R → 0SR
Removal of unit productions
• For other unit productions, replace every chain
    A1 → A2 → ... → Ak → α
  (where α is not a single variable) by the productions A1 → α, ..., Ak → α
• Example
    S → 0S1 | 1S0S1 | R | ε
    R → 0SR
  S → R → 0SR is replaced by S → 0SR, R → 0SR:
    S → 0S1 | 1S0S1 | 0SR | ε
    R → 0SR
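The unit-production removal above can be sketched in Python (3.9+). This is a minimal sketch, not the course's own code: it assumes a grammar is a dict mapping each variable to a list of bodies (tuples of symbols, with () for ε) and treats exactly the dict keys as variables; `Grammar` and `remove_unit_productions` are hypothetical names. Instead of collapsing cycles first, it uses the standard equivalent formulation: for each variable A it collects every variable reachable from A through unit productions (cycles included) and gives A all of their non-unit bodies.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def remove_unit_productions(g: Grammar) -> Grammar:
    """Replace each chain A1 -> A2 -> ... -> Ak -> alpha by A1 -> alpha."""

    def is_unit(body: tuple[str, ...]) -> bool:
        # A unit production's body is a single variable.
        return len(body) == 1 and body[0] in g

    result: Grammar = {}
    for a in g:
        # Variables reachable from a through unit productions (a included).
        reach, stack = {a}, [a]
        while stack:
            v = stack.pop()
            for body in g[v]:
                if is_unit(body) and body[0] not in reach:
                    reach.add(body[0])
                    stack.append(body[0])
        # Give a every non-unit body of every reachable variable.
        bodies: list[tuple[str, ...]] = []
        for v in g:
            if v in reach:
                for body in g[v]:
                    if not is_unit(body) and body not in bodies:
                        bodies.append(body)
        result[a] = bodies
    return result


# Grammar from the slide: S -> 0S1 | 1S0S1 | T, T -> S | R | ε, R -> 0SR
g = {
    "S": [("0", "S", "1"), ("1", "S", "0", "S", "1"), ("T",)],
    "T": [("S",), ("R",), ()],
    "R": [("0", "S", "R")],
}
new_g = remove_unit_productions(g)
print(new_g["S"])  # bodies 0S1, 1S0S1, ε, 0SR
```

For this grammar S ends up with 0S1 | 1S0S1 | ε | 0SR and R keeps 0SR, as on the slide; T survives only as an unreachable variable and can be dropped.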
Removal of ε-productions
• A variable N is nullable if there is a derivation
    N ⇒* ε
• How to remove ε-productions (except from S):
    Find all nullable variables N1, ..., Nk
    For i = 1 to k
      For every production of the form A → αNiβ,
        add another production A → αβ
      If Ni → ε is a production, remove it
    Remove any other production A → ε created along the way
    If S is nullable, add the special production S → ε
Example
• Find the nullable variables
  grammar:
    S → ACD
    A → a
    B → ε
    C → ED | ε
    D → BC | b
    E → b
  nullable variables: B, C, D
Finding nullable variables
• To find nullable variables, we work backwards
  – First, mark all variables A such that A → ε as nullable
  – Then, as long as there are productions of the form
      A → A1… Ak
    where all of A1, …, Ak are marked as nullable, mark A as nullable
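The backwards marking procedure is a fixpoint computation. A minimal Python sketch, assuming the same hypothetical dict-of-tuples grammar representation (each body a tuple of symbols, () for ε, dict keys as variables); `nullable_variables` is a name chosen for illustration. Because `all()` of an empty body is true, the ε-productions are picked up by the same test that handles A → A1… Ak.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def nullable_variables(g: Grammar) -> set[str]:
    """Mark A nullable while some A -> A1...Ak has every Ai already marked."""
    nullable: set[str] = set()
    changed = True
    while changed:  # repeat until a full pass marks nothing new
        changed = False
        for a, bodies in g.items():
            if a in nullable:
                continue
            # all() of the empty body () is True, so A -> ε marks A on the first pass.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(a)
                changed = True
    return nullable


# Example grammar from the slide (uppercase = variables, lowercase = terminals).
g = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}
print(sorted(nullable_variables(g)))  # ['B', 'C', 'D']
```

S is not marked because every body of S contains A, and A → a is its only production.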
Eliminating ε-productions
  grammar:
    S → ACD
    A → a
    B → ε
    C → ED | ε
    D → BC | b
    E → b
  nullable variables: B, C, D
  Ni = B: add D → C; remove B → ε
  Ni = C: add S → AD, D → B, D → ε; remove C → ε
  Ni = D: add S → AC, S → A, C → E; remove D → ε
  result (S is not nullable, so no S → ε; B has no productions left):
    S → ACD | AD | AC | A
    A → a
    C → ED | E
    D → BC | C | B | b
    E → b
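ε-production removal can be sketched in Python under the same hypothetical grammar representation. Note the technique swap: instead of the slide's one-variable-at-a-time loop, this sketch uses the common equivalent formulation that, for each body, emits every version obtained by dropping some subset of its nullable occurrences, and never emits an empty body except the special S → ε. That makes the result independent of the order in which N1, ..., Nk are processed; on this example it yields the same productions as the worked steps.

```python
from itertools import product

Grammar = dict[str, list[tuple[str, ...]]]


def nullable_variables(g: Grammar) -> set[str]:
    """Fixpoint: A is nullable if some body of A consists only of nullable variables."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for a, bodies in g.items():
            if a not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(a)
                changed = True
    return nullable


def remove_epsilon_productions(g: Grammar, start: str = "S") -> Grammar:
    nullable = nullable_variables(g)
    result: Grammar = {}
    for a, bodies in g.items():
        new_bodies: list[tuple[str, ...]] = []
        for body in bodies:
            # Each nullable occurrence may be kept or dropped; other symbols are kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body and new_body not in new_bodies:  # never emit A -> ε here
                    new_bodies.append(new_body)
        result[a] = new_bodies
    if start in nullable:
        result[start].append(())  # the special production S -> ε
    return result


g = {
    "S": [("A", "C", "D")],
    "A": [("a",)],
    "B": [()],
    "C": [("E", "D"), ()],
    "D": [("B", "C"), ("b",)],
    "E": [("b",)],
}
for head, bodies in remove_epsilon_productions(g).items():
    print(head, "->", " | ".join("".join(b) for b in bodies) or "(no productions)")
```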
Recap
• After eliminating ε-productions and unit productions, we know that every derivation
    S ⇒* a1…ak   where a1, …, ak are terminals
  doesn't shrink in length and doesn't go into cycles
• Exception: S → ε
  – We will not use this rule at all, except to check whether ε ∈ L
• Note: ε-productions must be eliminated before unit productions, because removing ε-productions can create new unit productions (e.g. D → C, D → B, S → A in the example above)
Example: testing membership
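  S → 0S1 | 1S0S1 | T
  T → S | ε          x = 00111
  eliminate unit and ε-productions:
  S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1
  search, pruning forms longer than |x| = 5:
    S ⇒ 01, 101: no match
    S ⇒ 0S1 ⇒ 0011, 01011: no match; 00S11 and the rest give only strings of length ≥ 6
    S ⇒ 10S1 ⇒ 10011: no match; the rest give only strings of length ≥ 6
    S ⇒ 1S01 ⇒ 10101: no match; the rest give only strings of length ≥ 6
    S ⇒ 1S0S1: only strings of length ≥ 6
  Every branch fails, so x = 00111 is not in the language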
Algorithm 1 for testing membership
• We can now use the following algorithm to check if a string x is in the language of G
  Eliminate all ε-productions and unit productions
  If x = ε and S → ε, accept; else delete S → ε
  Let X := S
  While some new production P can be applied to X
    Apply P to X
    If X = x, accept
    If |X| > |x|, backtrack
  If no more productions can be applied to X, reject
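Algorithm 1 can be sketched as a recursive backtracking search. Assumptions that are ours rather than the slides': the grammar is already free of unit productions and of ε-productions other than S → ε, terminals are single characters, the grammar uses the same hypothetical dict-of-tuples representation, and only the leftmost variable is expanded at each step (this loses nothing, because every derivation has a leftmost equivalent). `derives` is a hypothetical name.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def derives(g: Grammar, x: str, start: str = "S") -> bool:
    """Algorithm 1: backtracking search with the |X| > |x| cutoff."""
    if x == "":
        return () in g[start]  # S -> ε is used only for this check
    # Delete S -> ε (and any stray empty body) before searching.
    rules = {a: [b for b in bodies if b] for a, bodies in g.items()}

    def search(form: tuple[str, ...]) -> bool:
        if len(form) > len(x):
            return False  # backtrack: forms never shrink
        # Position of the leftmost variable, or None if form is all terminals.
        i = next((k for k, sym in enumerate(form) if sym in rules), None)
        if i is None:
            return "".join(form) == x  # accept if X = x
        # Try every production for the leftmost variable.
        return any(search(form[:i] + body + form[i + 1:]) for body in rules[form[i]])

    return search((start,))


# S -> ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1  (example grammar after elimination)
G = {"S": [(), ("0", "1"), ("1", "0", "1"), ("0", "S", "1"),
           ("1", "0", "S", "1"), ("1", "S", "0", "1"), ("1", "S", "0", "S", "1")]}
print(derives(G, "00111"))  # False
print(derives(G, "01011"))  # True
```

Each expansion either lengthens the form or turns a variable into a terminal, so with the length cap the search always terminates; the number of branches, however, can still grow exponentially with |x|.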
Practical limitations of Algorithm 1
• The previous algorithm can be very slow if x is long
  Example: G = CFG of the Java programming language, x = code for a 200-line Java program; the algorithm might take about 10^200 steps!
• There is a faster algorithm, but it requires that we do some more transformations on the grammar
Chomsky Normal Form
• A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type
    A → BC   or   A → a
  where B and C are variables and a is a terminal
• Conversion to Chomsky Normal Form is easy once ε-productions and unit productions have been eliminated:
    A → BcDE
  replace terminals with new variables:
    A → BCDE
    C → c
  break up sequences with new variables:
    A → BX1
    X1 → CX2
    X2 → DE
    C → c
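The two conversion steps can be sketched in Python, assuming ε-productions (other than S → ε) and unit productions are already gone and using the same hypothetical dict-of-tuples representation. The naming scheme is a hypothetical choice: a terminal c inside a long body becomes a fresh variable T_c (the slide reuses the letter C), and fresh variables X1, X2, … break up long bodies; the sketch assumes these names do not clash with existing variables.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def to_cnf(g: Grammar) -> Grammar:
    """Convert g to CNF; assumes no ε-productions (except S -> ε) and no unit productions."""
    out: Grammar = {a: [] for a in g}
    term_var: dict[str, str] = {}  # terminal -> fresh variable deriving it
    counter = 0

    def var_for(t: str) -> str:
        # Step 1: replace terminal t by a fresh variable T_t with T_t -> t.
        if t not in term_var:
            term_var[t] = f"T_{t}"
            out[term_var[t]] = [(t,)]
        return term_var[t]

    for a, bodies in g.items():
        for body in bodies:
            if len(body) <= 1:
                out[a].append(body)  # A -> a and S -> ε are already allowed
                continue
            syms = [s if s in g else var_for(s) for s in body]
            # Step 2: break A -> B1 B2 ... Bn into A -> B1 X1, X1 -> B2 X2, ...
            head = a
            while len(syms) > 2:
                counter += 1
                fresh = f"X{counter}"
                out[head].append((syms[0], fresh))
                out[fresh] = []
                head, syms = fresh, syms[1:]
            out[head].append(tuple(syms))
    return out


# Slide example A -> BcDE (B, D, E get dummy productions so they count as variables).
g = {"A": [("B", "c", "D", "E")], "B": [("b",)], "D": [("d",)], "E": [("e",)]}
for head, bodies in to_cnf(g).items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```

For A → BcDE this produces A → B X1, X1 → T_c X2, X2 → D E and T_c → c, matching the slide up to the name of the terminal variable.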
Exercise
• Convert this CFG into Chomsky Normal Form:
S → ε | ADDA
A → a
C → c
D → bCb
Algorithm 2 for testing membership
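  S → AB | BC
  A → BA | a
  B → CC | b
  C → AB | a          x = baaba
Idea: We generate each substring of x bottom up
(each entry lists the variables that derive that substring)
  length 1:  b: B        a: A, C    a: A, C     b: B     a: A, C
  length 2:  ba: S, A    aa: B      ab: S, C    ba: S, A
  length 3:  baa: –      aab: B     aba: B
  length 4:  baab: –     aaba: S, A, C
  length 5:  baaba: S, A, C
S is in the entry for the whole string, so baaba is in the language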
Parse tree reconstruction
  S → AB | BC
  A → BA | a
  B → CC | b
  C → AB | a          x = baaba
Tracing back the derivations, we obtain the parse tree
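Tracing back can be automated by storing, for each variable in each table entry, one way it was produced (a back-pointer). This Python sketch is an illustration under assumptions of our own: 0-based indices, only the first split found is kept per variable, trees are returned as nested tuples, and `parse_tree` is a hypothetical name. In this grammar baaba has more than one parse tree (S over the whole string arises both from B·C and from A·B), so the result is just one of them.

```python
Grammar = dict[str, list[tuple[str, ...]]]

# CNF grammar from the slide: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
G: Grammar = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}


def parse_tree(g: Grammar, x: str, start: str = "S"):
    """Return one parse tree of x as nested tuples, or None if x is not in L(g)."""
    k = len(x)
    # back[(i, j)][A] says how A derives x[i..j] (0-based, inclusive): either a
    # terminal, or (B, C, m) meaning A -> BC with B over i..m and C over m+1..j.
    back: dict[tuple[int, int], dict[str, object]] = {}
    for i, ch in enumerate(x):
        back[(i, i)] = {a: ch for a, bodies in g.items() if (ch,) in bodies}
    for length in range(2, k + 1):
        for i in range(k - length + 1):
            j = i + length - 1
            cell: dict[str, object] = {}
            for m in range(i, j):
                left, right = back[(i, m)], back[(m + 1, j)]
                for a, bodies in g.items():
                    for body in bodies:
                        if (a not in cell and len(body) == 2
                                and body[0] in left and body[1] in right):
                            cell[a] = (body[0], body[1], m)  # keep the first way found
            back[(i, j)] = cell

    def build(a: str, i: int, j: int):
        how = back[(i, j)][a]
        if isinstance(how, str):
            return (a, how)  # leaf: A -> terminal
        b, c, m = how
        return (a, build(b, i, m), build(c, m + 1, j))

    if k == 0 or start not in back[(0, k - 1)]:
        return None
    return build(start, 0, k - 1)


print(parse_tree(G, "baaba"))
```

For baaba it returns the tree S(B(b), C(A(a), B(C(A(a), B(b)), C(a)))), whose leaves spell baaba.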
Cocke-Younger-Kasami algorithm
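Input: Grammar G in CNF, string x = x1…xk
Cell ij remembers all variables A that can derive the substring xi…xj (A ⇒* xi…xj)
  For i = 1 to k
    If there is a production A → xi
      Put A in table cell ii
  For b = 2 to k
    For s = 1 to k – b + 1
      Set t = s + b – 1
      For j = s to t – 1
        If there is a production A → BC where B is in cell sj and C is in cell (j+1)t
          Put A in cell st
  Accept x if S is in cell 1k
Table cells: the bottom row holds cells 11, 22, …, kk (one per symbol x1 x2 … xk), the next row holds 12, 23, …, and row b holds the cells st with t – s + 1 = b, up to the single top cell 1k.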
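The pseudocode translates almost line for line into Python. A minimal sketch, assuming the same hypothetical dict-of-tuples grammar representation (bodies of length 1 are terminals, bodies of length 2 are pairs of variables) and single-character terminals; `cyk_table` and `cyk` are hypothetical names, and indices stay 1-based to match the slide.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def cyk_table(g: Grammar, x: str) -> dict[tuple[int, int], set[str]]:
    """Cell (s, t) holds the variables A with A =>* x_s ... x_t (1-based, inclusive)."""
    k = len(x)
    cell: dict[tuple[int, int], set[str]] = {}
    for i in range(1, k + 1):
        # Cell ii: every A with a production A -> x_i.
        cell[(i, i)] = {a for a, bodies in g.items() if (x[i - 1],) in bodies}
    for b in range(2, k + 1):              # b = length of the substring
        for s in range(1, k - b + 2):      # s = 1, ..., k - b + 1
            t = s + b - 1
            cell[(s, t)] = set()
            for j in range(s, t):          # split into x_s..x_j and x_(j+1)..x_t
                for a, bodies in g.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in cell[(s, j)]
                                and body[1] in cell[(j + 1, t)]):
                            cell[(s, t)].add(a)
    return cell


def cyk(g: Grammar, x: str, start: str = "S") -> bool:
    if x == "":
        return () in g.get(start, [])  # only the special production S -> ε derives ε
    return start in cyk_table(g, x)[(1, len(x))]


G = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}
print(sorted(cyk_table(G, "baaba")[(1, 5)]))  # ['A', 'C', 'S']
print(cyk(G, "baaba"))  # True
```

On the Algorithm 2 example the top cell is {S, A, C}, matching the table there. The loops over b, s and j run O(k³) times, each scanning the grammar once, so the running time is polynomial in |x| instead of the exponential blow-up of Algorithm 1.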