Chomsky Normal Form of CFGs: Definition, Purpose, Method of Construction


1

Chomsky Normal Form of CFGs

Definition

Purpose

Method of Construction

2

Chomsky Normal Form: Purpose

A construct used to establish properties of context-free languages (CFLs).

Every CFL that does not contain ε can be generated by a CFG in Chomsky normal form.

To show that a language without ε is a CFL, it is sufficient to show that it has a CFG in Chomsky normal form.

This is the typical approach to closure properties.

3

Chomsky Normal Form: Definition

A context-free grammar (CFG) in which all productions are of the form A -> BC or A -> a, where A, B, and C are variables and a is a terminal.

4

Chomsky Normal Form: method of construction

Eliminate “useless” symbols: variables or terminals that do not appear in any derivation of a terminal string from the start symbol.

Eliminate ε-productions A -> ε.

Eliminate unit productions A -> B for variables A and B.

5

Chomsky Normal Form: method of construction - 2

For each elimination task, a method will be defined recursively, by induction.

The order in which the tasks are performed is important.

6

Generating and Reachable Symbols

X is generating if X =>* w for some terminal string w.

If X is a terminal, then it can generate itself in zero steps.

X is reachable if S =>* αXβ for some α and β (S is the start symbol).

Any symbol that is not both generating and reachable is useless.

7

Induction to find generating variables

Basis: If there is a production A -> w, where w is a terminal string, then A is generating.

Induction: If there is a production A -> α, where α consists only of terminals and variables known to derive a terminal string, then A derives a terminal string; hence A is generating.

8

Algorithm to eliminate non-generating variables

1. Discover all variables that derive terminal strings.

2. For all other variables, remove all productions in which they appear either on the LHS or RHS of ->.

9

Example: finding generating variables

S->AB|C, A->aA|a, B->bB, C->c

Basis: A and C are generating due to productions A->a and C->c.

Induction: S is generating due to production S->C.

Eliminate B->bB and S->AB.

Result: S->C, A->aA|a, C->c

Still have unreachable variables.
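The example above can be checked with a small sketch. It assumes a representation that is not part of the slides: a grammar is a dict from each variable to a list of right sides, each right side is a string of one-character symbols, and uppercase letters are the variables. The function names are hypothetical.

```python
def generating_variables(grammar):
    """Return the set of variables that derive some terminal string."""
    generating = set()
    changed = True
    while changed:  # repeat the inductive step until nothing new is found
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            for body in bodies:
                # Terminals generate themselves; variables must already be known.
                if all(not s.isupper() or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return generating


def remove_non_generating(grammar):
    """Drop every production that mentions a non-generating variable."""
    gen = generating_variables(grammar)
    return {
        head: [b for b in bodies
               if all(not s.isupper() or s in gen for s in b)]
        for head, bodies in grammar.items()
        if head in gen
    }


example = {"S": ["AB", "C"], "A": ["aA", "a"], "B": ["bB"], "C": ["c"]}
cleaned = remove_non_generating(example)
```

Here `cleaned` contains only the productions S->C, A->aA|a, and C->c, matching the result on the slide.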

10

Finding reachable symbols

Basis: The start symbol is reachable.

Induction: If we can reach A, and there is a production A -> α, then we can reach all symbols of α.

In the result from the previous slide, S->C, A->aA|a, C->c, only the variables S and C are reachable, so A and its productions are eliminated.
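The reachability induction can be sketched as a worklist search. The dict-of-strings grammar representation and the function names are assumptions made for this illustration, not something the slides define.

```python
def reachable_symbols(grammar, start):
    """Return every symbol (variable or terminal) reachable from start."""
    reachable = {start}          # basis: the start symbol
    worklist = [start]
    while worklist:
        head = worklist.pop()
        for body in grammar.get(head, []):
            for sym in body:     # induction: all symbols of a reachable body
                if sym not in reachable:
                    reachable.add(sym)
                    if sym in grammar:   # only variables have productions
                        worklist.append(sym)
    return reachable


def remove_unreachable(grammar, start):
    """Keep only the productions of reachable variables."""
    reach = reachable_symbols(grammar, start)
    return {h: list(b) for h, b in grammar.items() if h in reach}


grammar = {"S": ["C"], "A": ["aA", "a"], "C": ["c"]}
trimmed = remove_unreachable(grammar, "S")
```

The reachable set also includes the terminal c; among the variables, only S and C survive, so `trimmed` holds S->C and C->c.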

11

Epsilon Productions

Theorem: If L is a CFL that does not contain the empty string, then it has a CFG with no ε-productions, which can be put in Chomsky normal form.

A -> ε is clearly an ε-production.

To eliminate all ε-productions, we must first discover the nullable variables, i.e. variables A such that A =>* ε.

12

Inductive definition of nullable symbols

Basis: If there is a production A -> ε, then A is nullable.

Induction: If there is a production A -> α, and all symbols of α are nullable, then A is nullable.

13

Example: Nullable Symbols

S->AB, A->aA|ε, B->bB|A

A is nullable because of A -> ε.

B is nullable because of B -> A.

S is nullable because of S -> AB.
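A minimal sketch of the nullable-symbol induction, applied to the example above. It assumes, for illustration only, that a grammar is a dict from variables to lists of right-side strings and that ε is written as the empty string.

```python
def nullable_variables(grammar):
    """Return the set of variables A with A =>* ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # An empty body is the basis case A -> ε: all() over it is True.
            # Otherwise every symbol of the body must already be nullable.
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


example = {"S": ["AB"], "A": ["aA", ""], "B": ["bB", "A"]}
found = nullable_variables(example)
```

For this grammar `found` contains A, B, and S, as on the slide.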

14

Algorithm to eliminate ε-productions

Identify all nullable symbols.

Consider each production A -> X1…Xn that contains nullable symbols.

Suppose A -> X1…Xn contains m ≤ n nullable symbols.

Construct a family of productions with 2^m members, covering all combinations of the nullable symbols being present or absent.

If m = n, exclude the case with all symbols absent (it would be A -> ε).

15

Eliminating ε-productions

The new CFG with no ε-productions consists of all families of productions derived from productions with nullable symbols, plus all productions from the original CFG that did not contain nullable symbols.

16

Example: Eliminating ε-Productions

S->ABC, A->aA|ε, B->bB|ε, C->ε

A, B, C, and S are all nullable.

Productions S->ABC|AB|AC|BC|A|B|C come from S->ABC.

Productions A->aA|a come from A->aA.

Productions B->bB|b come from B->bB.
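The families of productions in this example can be generated mechanically. The following is a sketch under assumptions of my own (grammar as a dict of right-side strings, ε as the empty string, hypothetical function names); it uses `itertools.product` to enumerate the present/absent choices for each nullable symbol.

```python
from itertools import product


def nullable_variables(grammar):
    """Variables A with A =>* ε (ε is the empty string)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(grammar):
    """Build the grammar with no ε-productions."""
    nullable = nullable_variables(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # A nullable symbol may be kept or dropped; others must be kept.
            choices = [(s, "") if s in nullable else (s,) for s in body]
            for combo in product(*choices):
                candidate = "".join(combo)
                # Skip the all-absent case and duplicates.
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        if new_bodies:
            result[head] = new_bodies
    return result


example = {"S": ["ABC"], "A": ["aA", ""], "B": ["bB", ""], "C": [""]}
new_grammar = eliminate_epsilon(example)
```

In `new_grammar`, C has no productions left at all, which is why it shows up as a non-generating variable in the next clean-up step.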

17

Eliminating ε-Productions continued

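S->ABC, A->aA|ε, B->bB|ε, C->ε

No production of the original CFG carries over unchanged, since each one contains a nullable symbol.

The new CFG is:

S -> ABC | AB | AC | BC | A | B | C
A -> aA | a
B -> bB | b

C is not generating in the new CFG, because its only production, C->ε, has been removed. Eliminate C in productions of the new CFG, which leaves S -> AB | A | B.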

18

Define Unit Productions

A unit production is a production whose right side consists of exactly one variable.

A->a is not a unit production if a is a terminal.

Elimination by expansion is the most common approach.

19

Eliminate by expansion

In the CFG defined by

E->T|E+T, T->F|T*F, F->I|(E), I->a|Ia

E->T is eliminated by E->F|T*F|E+T
E->F is eliminated by E->I|(E)|T*F|E+T
E->I is eliminated by E->a|Ia|(E)|T*F|E+T

20

Eliminate by expansion

Expansion will not work on cycles of unit productions, e.g. A->B, B->C, C->A, because each expansion reintroduces a unit production.

Alternative: find all pairs (A,B) such that A=>*B by a sequence of unit productions.

Works in all cases.

21

Alternative to expansion in eliminating unit productions

Basic idea: If A=>*B by a series of unit productions, and B -> α is a non-unit production, then add the production A -> α and drop the unit productions.

Example

22

Example of basic idea

In the CFG defined by

E->T|E+T, T->F|T*F, F->I|(E), I->a|Ia

E=>*I by the series of unit productions E->T, T->F, F->I.

I->a is a non-unit production, so add E->a.

Repeating this for every non-unit production of E, T, F, and I gives E->a|Ia|(E)|T*F|E+T (same as the expansion method).

23

Pair search defined by induction

Find all pairs (A,B) such that A=>*B by a sequence of unit productions only.

Basis: A=>*A, therefore (A,A) is a pair.

Induction: If we have found (A,B), and B->C is a unit production, then add (A,C).

24

Example of pair search

In the CFG defined by

E->T|E+T, T->F|T*F, F->I|(E), I->a|Ia

Basis: (E,E), (T,T), (F,F), (I,I).

From the unit productions directly: (E,T), (T,F), (F,I).

By induction: (E,F), (T,I), and (E,I) also.
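The pair search and the resulting unit-production elimination can be sketched together. The representation is an assumption for illustration: a grammar is a dict from one-character variables to lists of right-side strings, every variable has at least one production, and a right side is a unit production exactly when it is itself a key of the dict.

```python
def unit_pairs(grammar):
    """All pairs (A, B) with A =>* B using unit productions only."""
    pairs = {(a, a) for a in grammar}            # basis: (A, A)
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in grammar[b]:
                # body is a unit production B -> C when C is a variable
                if body in grammar and (a, body) not in pairs:
                    pairs.add((a, body))
                    changed = True
    return pairs


def eliminate_unit_productions(grammar):
    """For each pair (A, B), give A every non-unit production of B."""
    result = {a: [] for a in grammar}
    for a, b in sorted(unit_pairs(grammar)):
        for body in grammar[b]:
            if body not in grammar and body not in result[a]:
                result[a].append(body)
    return result


expr = {
    "E": ["T", "E+T"],
    "T": ["F", "T*F"],
    "F": ["I", "(E)"],
    "I": ["a", "Ia"],
}
pairs = unit_pairs(expr)
no_units = eliminate_unit_productions(expr)
```

For E the sketch yields the five right sides a, Ia, (E), T*F, and E+T, and it terminates even when the unit productions form a cycle, because each pair is added at most once.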

25

Cleaning up a Grammar

Theorem: If L is a CFL, then there is a CFG for L – {ε} that has:

1. No useless symbols.
2. No ε-productions.
3. No unit productions.

That is, every right side of a production is either a single terminal or has length ≥ 2.

26

Clean-up continued

Proof: Start with a CFG for L. Perform the following steps in order:

1. Eliminate ε-productions. (Must be first. Can create unit productions and useless variables.)
2. Eliminate unit productions.
3. Eliminate variables that derive no terminal string.
4. Eliminate variables not reached from the start symbol.

27

Chomsky Normal Form

A CFG is said to be in Chomsky Normal Form if every production is of one of these two forms:

1. A -> BC (right side is two variables).
2. A -> a (right side is a single terminal).

Theorem: If L is a CFL, then L – {ε} has a CFG in CNF.
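The two permitted forms translate directly into a checker. This is a sketch under an assumed representation: a grammar is a dict from variable names to lists of right sides, each right side is a tuple of symbols, and a symbol counts as a variable exactly when it is a key of the dict.

```python
def is_cnf(grammar):
    """True if every production is A -> BC or A -> a."""
    variables = set(grammar)
    for bodies in grammar.values():
        for body in bodies:
            two_variables = len(body) == 2 and all(s in variables for s in body)
            one_terminal = len(body) == 1 and body[0] not in variables
            if not (two_variables or one_terminal):
                return False
    return True


in_cnf = {"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}
has_unit = {"S": [("A",)], "A": [("a",)]}
has_epsilon = {"S": [()]}
```

With these inputs, `is_cnf(in_cnf)` holds, while the grammars with a unit production or an ε-production are rejected.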

28

Proof by construction

Step 1: “Clean” the grammar, so every production has a right side that is either a single terminal or has length ≥ 2.

Step 2: For each right side that is not a single terminal, make the right side all variables.

For each terminal a, create a new variable A_a and production A_a -> a (not a unit production, since a is a terminal).

Replace a by A_a in every right side of length ≥ 2.

29

Example: Step 2

Consider production A -> BcDe.

We need variables A_c and A_e, with productions A_c -> c and A_e -> e.

Note: you create at most one variable for each terminal, and use it everywhere it is needed.

Replace A -> BcDe by A -> B A_c D A_e.
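Step 2 can be sketched as follows. The representation is assumed for illustration: right sides are tuples of symbols, variables are the keys of the dict, and the new variable for terminal x is named "A_" + x, a naming scheme that is taken not to clash with any existing variable. The productions given to B and D are hypothetical filler so the grammar is complete.

```python
def lift_terminals(grammar):
    """Replace terminals by new variables in right sides of length >= 2."""
    variables = set(grammar)
    new_vars = {}                 # terminal -> its single new variable
    result = {}
    for head, bodies in grammar.items():
        result[head] = []
        for body in bodies:
            if len(body) == 1:    # a single terminal stays as it is
                result[head].append(tuple(body))
                continue
            new_body = []
            for s in body:
                if s in variables:
                    new_body.append(s)
                else:
                    # Create at most one variable per terminal and reuse it.
                    new_body.append(new_vars.setdefault(s, "A_" + s))
            result[head].append(tuple(new_body))
    for terminal, var in new_vars.items():
        result[var] = [(terminal,)]
    return result


example = {"A": [("B", "c", "D", "e")], "B": [("b",)], "D": [("d",)]}
lifted = lift_terminals(example)
```

After the call, `lifted["A"]` holds the single right side B A_c D A_e, and the grammar gains the productions A_c -> c and A_e -> e.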

30

CNF construction: final step

Step 3: Break right sides longer than 2 into a chain of productions with right sides of two variables.

Example: A -> BCDE is replaced by A -> BF, F -> CG, and G -> DE. F and G must be used nowhere else.
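A sketch of Step 3, under the same assumed representation (right sides as tuples of variable names). The fresh variables are named X1, X2, ... here, a hypothetical scheme that is taken not to clash with existing names; they play the role of F and G in the example.

```python
def break_long_bodies(grammar):
    """Split right sides longer than 2 into chains of two-variable bodies."""
    result = {}
    counter = 0
    for head, bodies in grammar.items():
        result.setdefault(head, [])
        for body in bodies:
            current = head
            rest = list(body)
            while len(rest) > 2:
                counter += 1
                fresh = f"X{counter}"     # used nowhere else
                result.setdefault(current, []).append((rest[0], fresh))
                current = fresh
                rest = rest[1:]
            result.setdefault(current, []).append(tuple(rest))
    return result


chained = break_long_bodies({"A": [("B", "C", "D", "E")]})
```

For A -> BCDE this gives A -> B X1, X1 -> C X2, and X2 -> D E, the same chain as on the slide with X1 and X2 in place of F and G.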

31

Example (text, p. 266)

S -> AB, A -> aAA | ε, B -> bBB | ε

32

Assignment 11, Due 11-19-14

Exercise 7.1.2 text p 275 and 277

33