Formalising the Normal Forms of CFGs in HOL4
Aditi Barthwal (1), Michael Norrish (2)
(1) Australian National University, (2) NICTA
Technische Universität München
September 2010
Aditi Barthwal CFG Normal Forms 1/31
Context-free grammars
$G = (V, T, P, S)$, where
$V$ = finite set of variables or nonterminals
$T$ = finite set of terminals
$P$ = finite set of productions, each of the form $A \to \alpha$, where $A \in V$ and $\alpha$ is a string of symbols such that $\alpha \in (V \cup T)^*$
$S$ = start symbol
A word is a string over terminals.
The language of $G$, $L(G)$, is the set of all words derivable from the start
symbol.
CFGs — The HOL Version
Types:
('nts, 'ts) symbol = NTS of 'nts | TS of 'ts
('nts, 'ts) rule
= rule of 'nts => ('nts, 'ts) symbol list
('nts, 'ts) grammar
= G of ('nts, 'ts) rule list => 'nts
A grammar's language:
L g = { tsl |
(derives g)^* [NTS (startSym g)] tsl ∧ isWord tsl }
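As an informal illustration only, the HOL types and the language definition can be mirrored in Python. This is a sketch, not the HOL4 development: the names `Sym`, `Grammar`, `derives`, `is_word` and `language_upto` are hypothetical, and the enumeration is bounded by word length and assumes the grammar has no ε-rules, so sentential forms never shrink.

```python
from dataclasses import dataclass

# Hypothetical Python mirror of the HOL types: a symbol is tagged as a
# nonterminal ("NTS") or a terminal ("TS").
@dataclass(frozen=True)
class Sym:
    kind: str  # "NTS" or "TS"
    name: str

def NTS(name):
    return Sym("NTS", name)

def TS(name):
    return Sym("TS", name)

@dataclass(frozen=True)
class Grammar:
    rules: tuple  # tuple of (lhs_name, rhs) pairs, rhs a tuple of Sym
    start: str

def derives(g, form):
    """All sentential forms reachable from `form` in exactly one step."""
    out = []
    for i, s in enumerate(form):
        if s.kind == "NTS":
            for lhs, rhs in g.rules:
                if lhs == s.name:
                    out.append(form[:i] + rhs + form[i + 1:])
    return out

def is_word(form):
    """A word contains terminals only."""
    return all(s.kind == "TS" for s in form)

def language_upto(g, max_len):
    """Words of L(g) of length <= max_len (assumes no epsilon-rules)."""
    start = (NTS(g.start),)
    seen = {start}
    frontier = [start]
    words = set()
    while frontier:
        form = frontier.pop()
        if is_word(form):
            words.add("".join(s.name for s in form))
            continue
        for nxt in derives(g, form):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return words
```

The search follows the reflexive-transitive closure of `derives` from the start symbol and keeps only the forms that are words, which is the shape of the set comprehension in the HOL definition.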
Results I will not talk about
Simplification/normalisation of CFGs by
removing symbols that do not generate a terminal string or
are not reachable from the start symbol of the grammar
(useless symbols);
removing $\epsilon$-productions (as long as $\epsilon$ is not in the language
generated by the grammar);
removing unit productions, i.e. ones of the form $A \to B$ where
$B$ is a nonterminal symbol.
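The talk does not detail these simplifications. Purely as an illustration of the first one, here is a textbook-style sketch of useless-symbol removal in Python. The representation (rules as `(lhs, rhs)` pairs of strings, nonterminals recognised by an upper-case first letter) and all function names are assumptions of this sketch, not part of the HOL development.

```python
def is_nt(sym):
    # Assumed convention for this sketch: nonterminals start with an
    # upper-case letter, terminals do not.
    return sym[0].isupper()

def generating(rules):
    """Nonterminals that can derive some terminal string (fixpoint)."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in gen and all(not is_nt(s) or s in gen for s in rhs):
                gen.add(lhs)
                changed = True
    return gen

def reachable(rules, start):
    """Nonterminals reachable from the start symbol."""
    seen = {start}
    todo = [start]
    while todo:
        a = todo.pop()
        for lhs, rhs in rules:
            if lhs == a:
                for s in rhs:
                    if is_nt(s) and s not in seen:
                        seen.add(s)
                        todo.append(s)
    return seen

def remove_useless(rules, start):
    """Drop non-generating symbols first, then unreachable ones."""
    gen = generating(rules)
    kept = [(l, r) for l, r in rules
            if l in gen and all(not is_nt(s) or s in gen for s in r)]
    reach = reachable(kept, start)
    return [(l, r) for l, r in kept if l in reach]
```

The order matters: removing non-generating symbols can make further symbols unreachable, so reachability is computed on the already filtered rules.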
Chomsky Normal Form
A grammar $G$ is in Chomsky Normal Form if every rule is of the
form
$A \to A_1 A_2$
where each $A_i$ is a non-terminal
or
$A \to a$
where $a$ is a terminal.
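The definition translates directly into a check over the rule list. The following Python sketch is only an illustration of the definition; `is_cnf` and `is_nt` are hypothetical names, rules are `(lhs, rhs)` pairs of strings, and nonterminals are assumed to start with an upper-case letter.

```python
def is_nt(sym):
    # Assumed convention for this sketch: nonterminals start with an
    # upper-case letter, terminals do not.
    return sym[0].isupper()

def is_cnf(rules):
    """True if every rule is A -> A1 A2 (two nonterminals) or A -> a."""
    for _, rhs in rules:
        two_nonterminals = len(rhs) == 2 and all(is_nt(s) for s in rhs)
        one_terminal = len(rhs) == 1 and not is_nt(rhs[0])
        if not (two_nonterminals or one_terminal):
            return False
    return True
```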
The Chomsky Normal Form Theorem
Language Equivalence
INFINITE U(:'nts) ∧ [] ∉ L g ⇒ ∃g'. isCnf g' ∧ L g = L g'
Proof:
H&U’s proof is 3.5 pages long with examples
The HOL proof is 1444 lines of code
Translation from H&U to HOL is straightforward
Point of Difference from Text Proof
Assumption: INFINITE U(:'nts)
Required because we need to introduce a new nonterminal not in g:
(S1) The universal set of nonterminals is infinite +
(S2) The nonterminals in g are finite
⇒ We can pick a nonterminal that is in S1 but not in S2
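The same argument can be seen in miniature in Python, where strings play the role of an infinite type of nonterminals. This is only an analogy to the HOL assumption; `fresh_nonterminal` and the `X0, X1, ...` naming scheme are hypothetical.

```python
from itertools import count

def fresh_nonterminal(used, prefix="X"):
    """Return a name that is not in the finite set `used`.

    The candidates prefix0, prefix1, ... form an infinite supply (S1) and
    `used` is finite (S2), so the loop must find an unused name.
    """
    for i in count():
        candidate = f"{prefix}{i}"
        if candidate not in used:
            return candidate
```

If the type of nonterminals were finite, the grammar could already use every value and no such choice would exist, which is why the theorem carries the INFINITE assumption.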
The Relational Approach to Grammar Transformation
Both normalisations feature “non-determinism”:
choice of fresh non-terminals
order in which rules are transformed
Rather than define a function, use a “one-step” relation:
R : grammar → grammar → bool
(Additional parameters possible: e.g. fresh symbols)
Show:
Each application of R preserves language equality
There is always a step possible while grammar has not
reached final form
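The relational idiom can be sketched in Python by letting a function return every grammar related to the current one by a single step, and then following any chain of steps until none applies. The particular step used here (lifting one terminal out of a right-hand side of length two or more) is chosen only for illustration; `successors`, `normalise`, `fresh` and the upper-case convention for nonterminals are assumptions of this sketch.

```python
from itertools import count

def is_nt(sym):
    # Assumed convention: nonterminals start with an upper-case letter.
    return sym[0].isupper()

def fresh(rules):
    """A nonterminal name not used anywhere in `rules`."""
    used = {l for l, _ in rules} | {s for _, r in rules for s in r}
    return next(f"X{i}" for i in count() if f"X{i}" not in used)

def successors(rules):
    """All grammars related to `rules` by one step of the relation."""
    out = []
    for idx, (lhs, rhs) in enumerate(rules):
        if len(rhs) < 2:
            continue
        for pos, s in enumerate(rhs):
            if not is_nt(s):
                x = fresh(rules)
                new_rhs = rhs[:pos] + (x,) + rhs[pos + 1:]
                out.append(rules[:idx] + [(lhs, new_rhs)]
                           + rules[idx + 1:] + [(x, (s,))])
    return out

def normalise(rules):
    """Follow the relation (taking the first option) until no step applies."""
    while True:
        options = successors(rules)
        if not options:
            return rules
        rules = options[0]
```

Each element of `successors(rules)` is one way of resolving the non-determinism. The two proof obligations on the slide correspond to showing that every such element has the same language as `rules`, and that `successors` is non-empty whenever the grammar is not yet in final form.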
Greibach Normal Form (GNF)
A grammar $G$ is in Greibach Normal Form if every rule is of the
form
$A \to a A_1 A_2 \ldots A_n$
where $a$ is a terminal, each $A_i$ is a non-terminal, and $n \ge 0$.
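As with Chomsky Normal Form, the definition can be read as a check over the rules. The sketch below is an illustration only; `is_gnf` and `is_nt` are hypothetical names, rules are `(lhs, rhs)` pairs of strings, and nonterminals are assumed to start with an upper-case letter.

```python
def is_nt(sym):
    # Assumed convention for this sketch: nonterminals start with an
    # upper-case letter, terminals do not.
    return sym[0].isupper()

def is_gnf(rules):
    """True if every rule is A -> a A1 ... An with n >= 0."""
    return all(
        len(rhs) >= 1
        and not is_nt(rhs[0])
        and all(is_nt(s) for s in rhs[1:])
        for _, rhs in rules
    )
```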
The GNF Destination
Language Equivalence
INFINITE U(:'nts) ∧ [] ∉ L g ⇒ ∃g'. isGnf g' ∧ L g = L g'
Proof (in H&U):
3 pages long
Includes a crucial picture
The Crux of GNF
The central issue in the proof is dealing with left-recursion: rules
of the form
$A \to A\alpha$
or loops such as
$A \to B\beta$, $B \to C\gamma$, $C \to A\delta$
GNF: Step 0
Convert grammar to Chomsky Normal Form.
GNF: Step 1
Order the non-terminals. (Another source of non-determinism!)
“Substitute out” variable references so that
$A_i \to A_j\alpha$ only occurs if $j > i$
(Hard in presence of left-recursion!)
GNF: Step 1 (The Easy Case)
Working on $A_i$.
Assume that all $A_j$ with $j < i$ have been done.
In order ($j = 1 \ldots i-1$), if the rule is $A_i \to A_j\alpha$:
take all possible RHSes for $A_j$ ($\beta_1 \ldots \beta_n$)
replace the rule above with $A_i \to \beta_k\alpha$ ($k \in \{1 \ldots n\}$)
(Each replacement preserves the language (H&U Lemma 4.3))
May result in a rule $A_i \to A_i \ldots$
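One replacement step of this kind can be sketched in Python as follows. It is an illustration of the substitution described in H&U Lemma 4.3, not the HOL definition; `substitute` is a hypothetical name, rules are `(lhs, rhs)` pairs of strings, and the sketch is intended for the case used here, where the substituted nonterminal differs from the rule's left-hand side.

```python
def substitute(rules, rule, pos):
    """Replace `rule` by one copy per production of the nonterminal at `pos`.

    With rule = (A, alpha1 + (B,) + alpha2) and pos = len(alpha1), the
    result drops `rule` and adds A -> alpha1 beta alpha2 for every
    production B -> beta in `rules`.
    """
    lhs, rhs = rule
    b = rhs[pos]
    b_rhss = [r for l, r in rules if l == b]
    out = [r for r in rules if r != rule]
    out += [(lhs, rhs[:pos] + beta + rhs[pos + 1:]) for beta in b_rhss]
    return out
```

For Step 1 the position is always 0, since only the leading nonterminal of a right-hand side is substituted out.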
GNF: Step 1 (The Hard Bit)
May now have a left-recursive rule $A \to A\alpha$
(No left-recursive cycles possible though.)
Hopcroft & Ullman Lemma 4.4: the “left to right” lemma
Change the left recursive rules into right recursive rules.
Lemma (“left to right lemma”)
Let $g = (V, T, P, S)$ be a CFG. Let $A \to A\alpha_1 \mid A\alpha_2 \mid \ldots \mid A\alpha_r$ be
the set of left recursive $A$-productions. Let $A \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_s$
be the remaining $A$-productions. Then we can construct
$g' = (V \cup \{B\}, T, P_1, S)$ such that $L(g) = L(g')$ by replacing all
the left recursive $A$-productions by the following productions:
Rule 1: $A \to \beta_i$ and $A \to \beta_i B$, for $1 \le i \le s$
Rule 2: $B \to \alpha_i$ and $B \to \alpha_i B$, for $1 \le i \le r$
Here, $B$ is a fresh nonterminal that does not belong to $g$.
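The construction in the lemma can be sketched in Python as below. This illustrates Rule 1 and Rule 2 only and is not the HOL relation `left2Right`; `left_to_right` is a hypothetical name, rules are `(lhs, rhs)` pairs of strings, and the caller is assumed to supply a nonterminal `b` that does not occur in the grammar.

```python
def left_to_right(rules, a, b):
    """Replace the left-recursive `a`-productions using fresh nonterminal `b`."""
    # alpha_i: tails of the left-recursive productions a -> a alpha_i
    alphas = [rhs[1:] for lhs, rhs in rules if lhs == a and rhs[:1] == (a,)]
    # beta_i: right-hand sides of the remaining a-productions
    betas = [rhs for lhs, rhs in rules if lhs == a and rhs[:1] != (a,)]
    others = [(lhs, rhs) for lhs, rhs in rules if lhs != a]
    if not alphas:
        return list(rules)  # nothing to do: `a` is not left recursive
    new = []
    for beta in betas:    # Rule 1: a -> beta_i and a -> beta_i b
        new += [(a, beta), (a, beta + (b,))]
    for alpha in alphas:  # Rule 2: b -> alpha_i and b -> alpha_i b
        new += [(b, alpha), (b, alpha + (b,))]
    return others + new
```

For the grammar A → A a | b, whose words are b followed by any number of a's, the result is A → b | b B and B → a | a B, which generates the same words with the recursion moved to the right.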
Hopcroft & Ullman’s Picture
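Any derivation in the left-recursive grammar can be mimicked in
the right-recursive grammar, and vice versa:
[Figure: two derivation trees with the same yield $b\, a_n \ldots a_2 a_1$, one built from the left-recursive $A$-productions and one from the right-recursive $B$-productions]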
Derivation $A \to A a_1 \to A a_2 a_1 \to \cdots \to A a_n \ldots a_2 a_1 \to b a_n \ldots a_2 a_1$
can be transformed into derivation
$A \to bB \to b a_n B \to \cdots \to b a_n \ldots a_2 B \to b a_n \ldots a_2 a_1$.
Realising the Picture Formally
[Figure: the two derivation trees again, annotated with an A-block and a B-block]
Proof by induction on block.
The “left to right” lemma
Result: Language Equivalence
∀g g'. left2Right A B g g' ⇒ L g = L g'
GNF: Step 2 (A-productions to a-productions)
a-productions: Let a-productions be rules of the form $A \to a\alpha$, where $a$ is a terminal symbol.
Rules $A_i \to A_j\alpha$ in $g_1$ are replaced by $A_i \to a\beta\alpha$, where $A_j \to a\beta$.
GNF: Step 2 (A-productions to a-productions)
Look at nonterminals in decreasing order, $A_j, A_{j-1}, \ldots, A_1$
Grammar must have $A_j \to a\alpha$, for some terminal $a$
If $A_{j-1} \to b\beta$ for some terminal $b$ and some symbols $\beta$: done
If $A_{j-1} \to A_j\beta$ for some symbols $\beta$: replace $A_j$ with $a\alpha$
Repeat for $A_{j-2}$ to $A_1$
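The decreasing-order pass can be sketched in Python as follows. This is an informal illustration, not the HOL formalisation; `back_substitute` is a hypothetical name, rules are `(lhs, rhs)` pairs of strings, nonterminals are assumed to start with an upper-case letter, and the input is assumed to satisfy the Step 1 invariant (a rule for the i-th nonterminal in `order` starts either with a terminal or with a later nonterminal in `order`).

```python
def is_nt(sym):
    # Assumed convention: nonterminals start with an upper-case letter.
    return sym[0].isupper()

def back_substitute(rules, order):
    """Make every production of the nonterminals in `order` start with a terminal.

    `order` lists the nonterminals A1..An in increasing order; they are
    processed from last to first, so the productions being substituted
    in already start with a terminal.
    """
    rules = list(rules)
    for a in reversed(order):
        new = []
        for lhs, rhs in rules:
            if lhs == a and rhs and is_nt(rhs[0]):
                aj = rhs[0]
                # Replace the leading Aj by each of its right-hand sides.
                new += [(a, r + rhs[1:]) for l, r in rules if l == aj]
            else:
                new.append((lhs, rhs))
        rules = new
    return rules
```

The last nonterminal in the order needs no work, because the invariant leaves it nothing but a terminal to start with. Every earlier one inherits terminal-first right-hand sides from the nonterminals after it.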
GNF: Step 3 (B-productions to a-productions)
Rules $B_k \to A_i\alpha$ in $g_2$ are replaced with $B_k \to a\beta\alpha$, where $A_i \to a\beta$.
The Proof Effort in Summary
~1 year
~14000 lines of code
~700 lemmas and theorems
+ library of common definitions and theorems
Conclusion
Relational idiom for non-determinism
Mechanisation of Chomsky Normal Form
Mechanisation of Greibach Normal Form
Lemma 4.3 — substituting out non-terminal references
Lemma 4.4 — removal of left-recursion
Translation of H&U’s picture into an induction
Hopcroft & Ullman Lemma 4.3
Let A-productions be those productions whose LHS is the
nonterminal A.
Lemma (“aProds lemma”)
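Let $G = (V, T, P, S)$ be a CFG. Let $A \to \alpha_1 B \alpha_2$ be a production in
$P$ and $B \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_r$ be the set of all $B$-productions. Let
$G_1 = (V, T, P_1, S)$ be obtained from $G$ by deleting the production
$A \to \alpha_1 B \alpha_2$ from $P$ and adding the productions
$A \to \alpha_1\beta_1\alpha_2 \mid \alpha_1\beta_2\alpha_2 \mid \ldots \mid \alpha_1\beta_r\alpha_2$. Then $L(G) = L(G_1)$.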