Properties of Context-Free Languages - Computer Science

Post on 19-Feb-2022

10 views 0 download

Transcript of Properties of Context-Free Languages - Computer Science

Properties ofContext-Free Languages

COMP 455 โ€“ 002, Spring 2019

Jim Anderson (modified by Nathan Otterness) 1

Simplification of CFGs

We can simplify CFGs by removing:

Useless symbols.

โ–๐‘‹ is generating if ๐‘‹ ึœโˆ—๐‘ค, where ๐‘ค โˆˆ ๐‘‡โˆ—.

โ–๐‘‹ is reachable if ๐‘† ึœโˆ—๐›ผ๐‘‹๐›ฝ (๐‘† is the start symbol).

โ–๐‘‹ is useful only if it is both reachable and generating.

๐œบ-productions, of the form ๐ด โ†’ ํœ€.

โ–If ํœ€ is in the language, we will still need oneํœ€-production.

Unit productions, of the form ๐ด โ†’ ๐ต.

Jim Anderson (modified by Nathan Otterness) 2

Finding Generating Variables

Theorem 7.4: The following algorithm correctly finds all generating variables.

Jim Anderson (modified by Nathan Otterness) 3

old_vars := โˆ…;new_vars := ๐ด | ๐ด โ†’ ๐‘ค exists, and ๐‘ค โˆˆ ๐‘‡โˆ— ;while old_vars โ‰  new_vars:

old_vars = new_vars;new_vars = old_vars โˆช ๐ด | ๐ด โ†’ ๐›ผ, ๐›ผ โˆˆ ๐‘‡ โˆช old_vars โˆ— ;

return new_vars;

Finding Generating Variables

Proof of Theorem 7.4:

We want to show that ๐‘‹ is added to new_vars if and only

if ๐‘‹ ึœโˆ—๐‘ค for some ๐‘ค โˆˆ ๐‘‡โˆ—.

โ€œOnly ifโ€: We must show that if ๐‘‹ is added to new_vars

then ๐‘‹ ึœโˆ—๐‘ค.

This can be proven by induction on the number of iterations of the algorithm (specifics are left as an exercise).

Jim Anderson (modified by Nathan Otterness) 4

old_vars := โˆ…;new_vars := ๐ด | ๐ด โ†’ ๐‘ค exists, and ๐‘ค โˆˆ ๐‘‡โˆ— ;while old_vars โ‰  new_vars:

old_vars = new_vars;new_vars = old_vars โˆช ๐ด | ๐ด โ†’ ๐›ผ, ๐›ผ โˆˆ ๐‘‡ โˆช old_vars โˆ— ;

return new_vars;

Finding Generating Variables

Proof of Theorem 7.4:

We want to show that ๐‘‹ is added to new_vars if and only

if ๐‘‹ ึœโˆ—๐‘ค for some ๐‘ค โˆˆ ๐‘‡โˆ—.

โ€œIfโ€: We must show that if ๐‘‹ ึœโˆ—๐‘ค, then ๐‘‹ is eventually

added to new_vars.

This can be proven by induction on the length of the derivation (specifics are left as an exercise).

Jim Anderson (modified by Nathan Otterness) 5

old_vars := โˆ…;new_vars := ๐ด | ๐ด โ†’ ๐‘ค exists, and ๐‘ค โˆˆ ๐‘‡โˆ— ;while old_vars โ‰  new_vars:

old_vars = new_vars;new_vars = old_vars โˆช ๐ด | ๐ด โ†’ ๐›ผ, ๐›ผ โˆˆ ๐‘‡ โˆช old_vars โˆ— ;

return new_vars;

Finding Reachable Variables

Theorem 7.5: There exists an iterative algorithm that will correctly find all reachable symbols.

This is similar to the previous algorithm, except this time youโ€™ll start with a set containing the start symbol and look for new reachable symbols in each iteration.

Jim Anderson (modified by Nathan Otterness) 6

old_vars := โˆ…;new_vars := ๐‘† ;while old_vars โ‰  new_vars:

old_vars = new_vars;new_vars = old_vars โˆช๐ด | ๐ด is produced by something in old_vars ;

return new_vars;

Eliminating Useless Symbols

Theorem 7.2: (Abbreviated) Every nonempty CFL is generated by a CFG with no useless symbols.

Proof:

Let ๐ฟ be the language of some CFG ๐บ, where ๐ฟ โ‰  โˆ….

Define:

๐บ ๐บ1 ๐บ2

Jim Anderson (modified by Nathan Otterness) 7

RemoveNon-generating

using Theorem 7.4

RemoveNon-reachable

using Theorem 7.5

This order matters!

Eliminating Useless SymbolsProof, continued:

Assume ๐บ2 contains a useless variable, ๐‘‹.

By Theorem 7.5, ๐‘†ึœ๐บ2

โˆ—๐›ผ๐‘‹๐›ฝ.

โ– In other words, we know that ๐‘‹ is reachable in ๐บ2.

Any production in ๐บ2 must be a production in ๐บ1, so ๐‘†ึœ๐บ1

โˆ—๐›ผ๐‘‹๐›ฝ must be a production in ๐บ1.

By Theorem 7.4, ๐‘†ึœ๐บ1

โˆ—๐›ผ๐‘‹๐›ฝึœ

๐บ1

โˆ—๐‘ค.

โ– In other words, we know that ๐‘‹ is producing in ๐บ1.

Every symbol in this derivation is reachable from ๐‘†, so none will be eliminated by Theorem 7.5. So, ๐‘†ึœ

๐บ2

โˆ—๐›ผ๐‘‹๐›ฝึœ

๐บ2

โˆ—๐‘ค.

This contradicts the assumption that ๐‘‹ was useless in ๐บ2.

Jim Anderson (modified by Nathan Otterness) 8

๐บ

Removenonโˆ’generating

๐บ1

Removeunreachable

๐บ2

It should be intuitively clear that removing useless

symbols wonโ€™t change the language of a grammar.

Eliminating Useless Symbols: Example

Jim Anderson (modified by Nathan Otterness) 9

๐‘† โ†’ ๐ด๐ต | ๐‘Ž๐ด โ†’ ๐‘Ž

๐‘† โ†’ ๐ด๐ต | ๐‘Ž๐ด โ†’ ๐‘Ž

๐‘† โ†’ ๐‘Ž๐ด โ†’ ๐‘Ž

Remove Unreachable

Remove Non-generating

๐‘† โ†’ ๐ด๐ต | ๐‘Ž๐ด โ†’ ๐‘Ž

๐‘† โ†’ ๐‘Ž๐ด โ†’ ๐‘Ž

๐‘† โ†’ ๐‘Ž

Remove Non-generating

Remove Unreachable

Incorrect order: Correct order:

Note that these are not the same!

Removing Nullable Symbols

A symbol ๐ด is nullable if ๐ด ึœโˆ—ํœ€.

Theorem 7.7: There exists an algorithm that will correct identify all nullable symbols.

We will not prove thisโ€”it should be intuitively similar to what weโ€™ve done before.

Jim Anderson (modified by Nathan Otterness) 10

old_vars := โˆ…;new_vars := ๐ด | ๐ด โ†’ ํœ€ exists ;while old_vars โ‰  new_vars:

old_vars = new_vars;new_vars = old_vars โˆช ๐ด | ๐ด โ†’ ๐›ผ, ๐›ผ โˆˆ old_varsโˆ— ;

return new_vars;

Removing ํœ€-Productions

Theorem 7.9, reworded: If ๐ฟ = ๐ฟ ๐บ for a CFG ๐บ, then there exists a CFG ๐บ1 with no ํœ€-Productions such that ๐ฟ ๐บ1 = ๐ฟ ๐บ โˆ’ ํœ€ .

Proof:

To construct ๐‘ฎ๐Ÿ: If ๐ด โ†’ ๐‘‹1โ€ฆ๐‘‹๐‘˜ is in ๐‘ƒ, then add all productions of the form ๐ด โ†’ ๐›ผ1โ€ฆ๐›ผ๐‘˜ to ๐‘ƒ1, where:

1. If ๐‘‹๐‘– is not nullable, then ๐›ผ๐‘– = ๐‘‹๐‘–,

2. If ๐‘‹๐‘– is nullable, then ๐›ผ๐‘– is either ๐‘‹๐‘– or ํœ€, and

3. Not all ๐›ผ๐‘–โ€™s are ํœ€.

Jim Anderson (modified by Nathan Otterness) 11

This requires adding two production rules for each nullable ๐‘‹๐‘–.

Removing ํœ€-Productions: Example

CFG ๐‘ƒ:

๐‘† โ†’ ๐ด๐ต๐ถ

๐ด โ†’ ํœ€

๐ต โ†’ ๐‘ | ํœ€

๐ถ โ†’ ๐‘

๐ด and ๐ต are nullable. CFG ๐‘ƒ1:

๐‘† โ†’ ๐ด๐ต๐ถ ๐ต๐ถ ๐ด๐ถ | ๐ถ

๐ต โ†’ ๐‘

๐ถ โ†’ ๐‘

Jim Anderson (modified by Nathan Otterness) 12

๐ด is now useless in ๐‘ƒ1. (If you want to eliminate both ํœ€-productions and useless symbols, you must remove

ํœ€-productions first.

Removing ํœ€-Productions: Proof

Proof, continued:

Claim: For all ๐ด โˆˆ ๐‘‰ and ๐‘ค โˆˆ ๐‘‡โˆ—, ๐ดึœ๐บ1

โˆ—๐‘ค if and only if

๐‘ค โ‰  ํœ€ and ๐ดึœ๐บ

โˆ—๐‘ค.

โ€œIfโ€: Assume ๐ดึœ๐บ

๐‘–๐‘ค and ๐‘ค โ‰  ํœ€. We prove by

induction on ๐‘– that ๐ดึœ๐บ1

โˆ—๐‘ค.

Base case: ๐‘– = 1 (one derivation step)

๐ด โ†’ ๐‘ค must be a production in ๐‘ƒ. Because ๐‘ค โ‰  ํœ€,๐ด โ†’ ๐‘ค is also a production in ๐บ1.

Jim Anderson (modified by Nathan Otterness) 13

๐บ1: The CFG without ํœ€-productions producing

๐ฟ ๐บ โˆ’ ํœ€ .

Removing ํœ€-Productions: ProofInductive step: ๐‘– > 1 (more than one derivation step)

Assume ๐ดึœ๐บ๐‘Œ1โ€ฆ๐‘Œ๐‘š ึœ

๐บ

๐‘–โˆ’1๐‘ค. Then ๐‘Œ๐‘—ึœ

๐บ

โˆ—๐‘ค๐‘— and

๐‘ค = ๐‘ค1โ€ฆ๐‘ค๐‘š.

If ๐‘ค๐‘— โ‰  ํœ€, then ๐‘Œ๐‘—ึœ๐บ1

โˆ—๐‘ค๐‘—, by the induction hypothesis.

If ๐‘ค๐‘— = ํœ€, then ๐‘Œ๐‘— is nullable.

Therefore, ๐ด โ†’ ๐›ฝ1โ€ฆ๐›ฝ๐‘š is in the productions of ๐บ1, where

๐›ฝ๐‘— = ๐‘Œ๐‘— if ๐‘ค๐‘— โ‰  ํœ€,

๐›ฝ๐‘— = ํœ€ if ๐‘ค๐‘— = ํœ€.

And we have the following derivation in ๐บ1:

๐ด ึœ ๐›ฝ1๐›ฝ2โ€ฆ๐›ฝ๐‘š ึœโˆ—๐‘ค1๐›ฝ2โ€ฆ๐›ฝ๐‘š ึœ

โˆ—๐‘ค1๐‘ค2โ€ฆ๐‘ค๐‘š = ๐‘ค.

Jim Anderson (modified by Nathan Otterness) 14

Claim: For all ๐ด โˆˆ ๐‘‰ and ๐‘ค โˆˆ ๐‘‡โˆ—,

๐ดึœ๐บ1

โˆ—๐‘ค if and only if ๐‘ค โ‰  ํœ€ and ๐ดึœ

๐บ

โˆ—๐‘ค.

Removing ํœ€-Productions: Proof

โ€œOnly ifโ€: Assume ๐ดึœ๐บ1

๐‘–๐‘ค. Then, ๐‘ค โ‰  ํœ€.

We will prove by induction on ๐‘– that ๐ดึœ๐บ

โˆ—๐‘ค.

Base case: ๐‘– = 1.

๐ด โ†’ ๐‘ค is in the productions of ๐บ1. Therefore, ๐ด โ†’ ๐›ผis in the productions of ๐บ where ๐‘ค = ๐›ผ with nullable symbols replaced by ํœ€.

Jim Anderson (modified by Nathan Otterness) 15

Claim: For all ๐ด โˆˆ ๐‘‰ and ๐‘ค โˆˆ ๐‘‡โˆ—,

๐ดึœ๐บ1

โˆ—๐‘ค if and only if ๐‘ค โ‰  ํœ€ and ๐ดึœ

๐บ

โˆ—๐‘ค.

Removing ํœ€-Productions: Proof

โ€œOnly ifโ€, continued:

We must show that the derivation ๐ดึœ๐บ๐›ผึœ

๐บ

โˆ—๐‘ค exists in ๐บ.

Inductive step: Suppose that ๐ดึœ๐บ1๐‘‹1โ€ฆ๐‘‹๐‘˜ ึœ

๐บ1

๐‘–โˆ’1๐‘ค.

Then, ๐ด โ†’ ๐›ฝ is in the productions of ๐บ, where๐‘‹1โ€ฆ๐‘‹๐‘˜ = ๐›ฝ with some nullable symbols removed.

As in the base case, ๐ดึœ๐บ

โˆ—๐‘‹1โ€ฆ๐‘‹๐‘˜. And, by the inductive

hypothesis, we can show that ๐‘‹1โ€ฆ๐‘‹๐‘˜ึœ๐บ

โˆ—๐‘ค.

Jim Anderson (modified by Nathan Otterness) 16

This is where nullable symbols are replaced by ํœ€.

โ€œAssume the derivation with ๐‘– โˆ’ 1 steps is correctโ€ฆโ€ etc.

Removing Unit Productions

Theorem 7.13 (reworded): If ๐ฟ = ๐ฟ ๐บ for a CFG ๐บ, then there exists a CFG ๐บ1 with no unit productions such that ๐ฟ = ๐ฟ ๐บ1 .

Proof: Let ๐บ = ๐‘‰, ๐‘‡, ๐‘ƒ, ๐‘†

To construction ๐บ1, first add all non-unit productions in ๐‘ƒ to ๐‘ƒ1.

Next, if ๐ด ึœโˆ—

๐บ๐ต and ๐ต โ†’ ๐›ผ is a non-unit production in

๐‘ƒ, then add ๐ด โ†’ ๐›ผ to ๐‘ƒ1.

Jim Anderson (modified by Nathan Otterness) 17

We can find all pairs of ๐ด, ๐ต where

๐ดึœ๐บ

โˆ—๐ต using an iterative algorithm like

before (see Section 7.1.4 of the book).

Removing Unit Productions: Example

Consider the CFG ๐บ with the following productions:

๐‘† โ†’ ๐ด | ๐‘

๐ด โ†’ ๐ด๐ด๐‘Ž

After removing the unit production ๐‘† โ†’ ๐ด, this becomes:

๐‘† โ†’ ๐ด๐ด๐‘Ž | ๐‘

๐ด โ†’ ๐ด๐ด๐‘Ž

Jim Anderson (modified by Nathan Otterness) 18

Removing Unit Productions: Proof

Claim: ๐ฟ ๐บ1 โŠ† ๐ฟ ๐บ

Proof:

If ๐ด โ†’ ๐›ผ is in ๐‘ƒ1, then ๐ดึœ๐บ

โˆ—๐›ผ. Therefore, ๐ดึœ

๐บ1

โˆ—๐›ผ implies

๐ดึœ๐บ

โˆ—๐›ผ.

Jim Anderson (modified by Nathan Otterness) 19

๐บ: The grammar containing unit productions.๐บ1: The grammar with unit productions removed.

Removing Unit Productions: Proof

Claim: ๐ฟ ๐บ โŠ† ๐ฟ ๐บ1

Proof:

Suppose ๐‘ค โˆˆ ๐ฟ ๐บ .

Let ๐‘† = ๐›ผ0ึœ๐บ๐›ผ1ึœ

๐บโ€ฆึœ

๐บ๐›ผ๐‘› = ๐‘ค be a leftmost derivation.

If ๐›ผ๐‘–ึœ๐บ๐›ผ๐‘–+1 is due to a non-unit production, then ๐›ผ๐‘–ึœ

๐บ1๐›ผ๐‘–+1.

Jim Anderson (modified by Nathan Otterness) 20

๐บ: The grammar containing unit productions.๐บ1: The grammar with unit productions removed.

Removing Unit Productions: Proof

Claim: ๐ฟ ๐บ โŠ† ๐ฟ ๐บ1

Consider the following leftmost derivation with unit productions in ๐บ: ๐›ผ๐‘–โˆ’1ึœ

๐บ๐›ผ๐‘–ึœ

๐บ๐›ผ๐‘–+1ึœ

๐บโ€ฆึœ

๐บ๐›ผ๐‘—ึœ

๐บ๐›ผ๐‘—+1.

Unit productions just replace the symbol at the same (leftmost) position, so ๐›ผ๐‘–โˆ’1 ึœ ๐›ผ๐‘—+1 will also hold by

some production in ๐‘ƒ1 โˆ’ ๐‘ƒ.

Jim Anderson (modified by Nathan Otterness) 21

๐บ: The grammar containing unit productions.๐บ1: The grammar with unit productions removed.

Non-unit production

Non-unit production

Unit productions

Putting it all Together

Theorem 7.14: If ๐ฟ is the language of a CFG ๐บ, and ๐ฟcontains at least one string other than ํœ€, then there exists a CFG ๐บ1 with no ํœ€-productions, unit productions, or useless symbols such that ๐ฟ ๐บ1 = ๐ฟ โˆ’ ํœ€ .

(See the book for a formal proof.)

The order in which we apply the previous results is important.

Jim Anderson (modified by Nathan Otterness) 22

Putting it all Together

The order of previous results:

We saw on slide 12 that eliminating ํœ€-productions may cause some symbols to become useless, so we must eliminate ํœ€-productions before removing useless symbols.

In the same example on slide 12, removing ํœ€-productions also introduced a unit production (๐‘† โ†’๐ถ), so ํœ€-productions must be eliminated before unit productions.

Jim Anderson (modified by Nathan Otterness) 23

Putting it all Together

Finally, unit productions must be eliminated before useless symbols. Consider this example:

So, the only viable order is: 1) Eliminate ํœ€-productions, 2) Eliminate unit productions, and 3) Eliminate useless symbols.

Jim Anderson (modified by Nathan Otterness) 24

๐‘† โ†’ ๐ด๐ต๐ด โ†’ ๐ต๐ต โ†’ ๐ถ๐ถ โ†’ ๐‘

๐‘† โ†’ ๐ด๐ต๐ด โ†’ ๐‘๐ต โ†’ ๐‘๐ถ โ†’ ๐‘

Remove useless symbols

Remove unit productions

๐ถ is now useless, but is still in the

grammar

Chomsky Normal Form (CNF)

A CFG is in Chomsky Normal Form if all of its productions are of the form ๐ด โ†’ ๐ต๐ถ or ๐ด โ†’ ๐‘Ž, and it contains no useless symbols.

Theorem 7.16 (reworded): Any CFL that doesnโ€™t include ํœ€ can be generated by a CFG in CNF.

Proof: Let ๐ฟ = ๐ฟ ๐บ for some CFG ๐บ, where ํœ€ โˆ‰ ๐ฟ. Use Theorem 7.14 to convert ๐บ into ๐บ1 = ๐‘‰, ๐‘‡, ๐‘ƒ, ๐‘† , where ๐บ1 contains no ํœ€-productions, unit productions, or useless symbols.

Jim Anderson (modified by Nathan Otterness) 25

Converting to Chomsky Normal Form

Proof (continued):

If ๐ด โ†’ ๐‘‹ is a production, then ๐‘‹ โˆˆ ๐‘‡ (which is already in the correct form).

Otherwise, consider ๐ด โ†’ ๐‘‹1๐‘‹2โ€ฆ๐‘‹๐‘š, where ๐‘š โ‰ฅ 2.

If ๐‘‹๐‘– is a terminal, introduce a new variable ๐ถ๐‘Ž and a new production ๐ถ๐‘Ž โ†’ ๐‘Ž, then replace ๐‘‹๐‘– by ๐ถ๐‘Ž.

Call the resulting grammar ๐บ2 (after making all such replacements).

Claim: ๐ฟ ๐บ1 = ๐ฟ ๐บ2 . (The proof is left as an exercise.)

Jim Anderson (modified by Nathan Otterness) 26

Otherwise ๐ด โ†’ ๐‘‹ would be a unit production, and have

been eliminated already.

Converting to Chomsky Normal Form

The remaining problem is that we need to replace productions of the form ๐ด โ†’ ๐ต1๐ต2โ€ฆ๐ต๐‘š (where๐‘š โ‰ฅ 3).

Replace such a production by:

๐ด โ†’ ๐ต1๐ท1, ๐ท1 โ†’ ๐ต2๐ท2, โ€ฆ, ๐ท๐‘šโˆ’1 โ†’ ๐ต๐‘šโˆ’1๐ต๐‘š, using newly added ๐ท๐‘– variables.

Call the resulting grammar ๐บ3.

Claim: ๐ฟ ๐บ3 = ๐ฟ ๐บ2 . (The proof is left as an exercise.)

Jim Anderson (modified by Nathan Otterness) 27

Conversion to CNF: Example

Jim Anderson (modified by Nathan Otterness) 28

๐‘† โ†’ ๐‘๐ด | ๐‘Ž๐ต๐ด โ†’ ๐‘๐ด๐ด | ๐‘Ž๐‘† | ๐‘Ž๐ต โ†’ ๐‘Ž๐ต๐ต | ๐‘๐‘† | ๐‘

๐‘† โ†’ ๐ถ๐‘๐ด | ๐ถ๐‘Ž๐ต๐ด โ†’ ๐ถ๐‘๐ด๐ด | ๐ถ๐‘Ž๐‘† | ๐‘Ž๐ต โ†’ ๐ถ๐‘Ž๐ต๐ต | ๐ถ๐‘๐‘† | ๐‘๐ถ๐‘Ž โ†’ ๐‘Ž๐ถ๐‘ โ†’ ๐‘

Replace ๐ด โ†’ ๐ถ๐‘๐ด๐ด by:๐ด โ†’ ๐ถ๐‘๐ท1, ๐ท1 โ†’ ๐ด๐ด

Replace ๐ต โ†’ ๐ถ๐‘Ž๐ต๐ต by:๐ต โ†’ ๐ถ๐‘Ž๐ท2, ๐ท2 โ†’ ๐ต๐ต

๐‘† โ†’ ๐ถ๐‘๐ด | ๐ถ๐‘Ž๐ต๐ด โ†’ ๐ถ๐‘๐ท1 | ๐ถ๐‘Ž๐‘† | ๐‘Ž๐ต โ†’ ๐ถ๐‘Ž๐ท2 | ๐ถ๐‘๐‘† | ๐‘๐ท1 โ†’ ๐ด๐ด๐ท2 โ†’ ๐ต๐ต๐ถ๐‘Ž โ†’ ๐‘Ž๐ถ๐‘ โ†’ ๐‘

Starting CFG with no ํœ€-productions, unit productions, or useless symbols.

CFG in CNF.

The Pumping Lemma for CFLs

Theorem 7.18 (the Pumping Lemma for CFLs):Let ๐ฟ be any CFL. Then there exists a value ๐‘› such that for all ๐‘ง โˆˆ ๐ฟ, where ๐‘ง โ‰ฅ ๐‘›, there exist strings๐‘ข, ๐‘ฃ, ๐‘ค, ๐‘ฅ, ๐‘ฆ such that:

1. ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆ,

2. ๐‘ฃ๐‘ฅ โ‰ฅ 1,

3. ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›, and

4. For all ๐‘– โ‰ฅ 0, ๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ.

Jim Anderson (modified by Nathan Otterness) 29

Proof of the Pumping Lemma for CFLs

If ๐ฟ is a CFL, let ๐บ be a CFG generating ๐ฟ โˆ’ ํœ€ .

Claim: Let ๐‘ง โˆˆ ๐ฟ โˆ’ ํœ€ . If a parse tree for ๐‘ง in ๐บ hasno path longer than ๐‘›, then ๐‘ง โ‰ค 2๐‘›โˆ’1. (This is Theorem 7.17 in the book.)

Proof, by induction on ๐‘›:

Base case: ๐‘› = 1. String ๐‘ง = ๐‘Ž. Tree: ๐‘† ๐‘ง = 1 = 2๐‘›โˆ’1

๐‘ŽJim Anderson (modified by Nathan Otterness) 30

The Pumping Lemma only applies to strings longer

than ๐‘›, so removing ํœ€doesnโ€™t matter.

Proof of the Pumping Lemma for CFLs

Inductive step: ๐‘› > 1.

Suppose that a tree exists with some path of length ๐‘›, but no path exceeding a length ๐‘›. It looks like this:

Jim Anderson (modified by Nathan Otterness) 31

๐‘†

๐ด ๐ต

โ‰ค 2๐‘›โˆ’2 โ‰ค 2๐‘›โˆ’2

๐‘‡1 ๐‘‡2

โ‰ค 2๐‘›โˆ’1

Proof of the Pumping Lemma for CFLs

Let ๐‘š equal the number of variables in the CFG ๐บ.

Let ๐‘› = 2๐‘š.

Suppose ๐‘ง โˆˆ ๐ฟ ๐บ , where ๐‘ง โ‰ฅ ๐‘›.

โ–Note: ๐‘ง > 2๐‘šโˆ’1

We claim that any parse tree for ๐‘ง has a path of length โ‰ฅ ๐‘š + 1.

โ–To see this, suppose all paths in a tree are shorter than ๐‘š + 1 (no path is > ๐‘š). Then, ๐‘ง โ‰ค 2๐‘šโˆ’1, contradicting the claim that ๐‘ง > 2๐‘šโˆ’1.

Jim Anderson (modified by Nathan Otterness) 32

โ€ข ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆโ€ข ๐‘ฃ๐‘ฅ โ‰ฅ 1โ€ข ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›โ€ข For all ๐‘– โ‰ฅ 0,

๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ.

Proof of the Pumping Lemma for CFLs

As stated on the previous slide, any parse tree for ๐‘งhas a path of length โ‰ฅ ๐‘š + 1.

Such a path has at least ๐‘š + 2 nodes, ๐‘š + 1 of which are variables.

Jim Anderson (modified by Nathan Otterness) 33

โ€ข ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆโ€ข ๐‘ฃ๐‘ฅ โ‰ฅ 1โ€ข ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›โ€ข For all ๐‘– โ‰ฅ 0,

๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ.

๐‘†

๐‘ง

Some path from the start symbol to a terminal in ๐‘ง

must contain ๐‘š + 1 variables.

The CNF grammar requires replacing variables with a

terminal at the end of the path.

Proof of the Pumping Lemma for CFLs

Since the CFG contains ๐‘š variables, but the path in ๐‘งโ€™s parse tree contains ๐‘š + 1 variables, at least one variable must be repeated in the path.

If ๐ด is the repeated variable, the path looks like this:

Jim Anderson (modified by Nathan Otterness) 34

โ€ข ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆโ€ข ๐‘ฃ๐‘ฅ โ‰ฅ 1โ€ข ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›โ€ข For all ๐‘– โ‰ฅ 0,

๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ.

๐‘†

๐‘ง

๐ด

๐ด

Proof of the Pumping Lemma for CFLs

Consider the subtrees rooted at each occurrence of ๐ด:

Jim Anderson (modified by Nathan Otterness) 35

โ€ข ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆโ€ข ๐‘ฃ๐‘ฅ โ‰ฅ 1โ€ข ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›โ€ข For all ๐‘– โ‰ฅ 0,

๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ.

๐‘†

๐‘ง

๐ด

๐ด

๐‘ข ๐‘ฃ ๐‘ค ๐‘ฅ ๐‘ฆ

Proof of the Pumping Lemma for CFLs

A subtree rooted at ๐ด has (at least) two possible yields: ๐‘ค๐‘ค๐‘ฅ and ๐‘ค.

Jim Anderson (modified by Nathan Otterness) 36

โ€ข ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆโ€ข ๐‘ฃ๐‘ฅ โ‰ฅ 1โ€ข ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›โ€ข For all ๐‘– โ‰ฅ 0,

๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ.๐‘†

๐ด

๐ด

๐‘ข ๐‘ฃ ๐‘ค ๐‘ฅ ๐‘ฆ

๐‘ฃ๐‘ค๐‘ฅ = yield of first subtree of ๐ด๐‘ค = yield of second subtree of ๐ด

Proof of the Pumping Lemma for CFLs

We can replace the possible subtrees rooted at ๐ด with each other to generate different strings.

Jim Anderson (modified by Nathan Otterness) 37

โ€ข ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆโ€ข ๐‘ฃ๐‘ฅ โ‰ฅ 1โ€ข ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›โ€ข For all ๐‘– โ‰ฅ 0,

๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ.๐‘†

๐‘ข

๐‘ค

๐‘ฆ

This tree has a yield ๐‘ข๐‘ค๐‘ฆ = ๐‘ข๐‘ฃ0๐‘ค๐‘ฅ0๐‘ฆThis string must also be in ๐ฟ.

๐ด

Proof of the Pumping Lemma for CFLs

Jim Anderson (modified by Nathan Otterness) 38

โ€ข ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆโ€ข ๐‘ฃ๐‘ฅ โ‰ฅ 1โ€ข ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›โ€ข For all ๐‘– โ‰ฅ 0,

๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ.๐‘†

๐ด

๐ด

๐‘ข ๐‘ฃ ๐‘ฅ ๐‘ฆ

We can replace the possible subtrees rooted at ๐ด with each other to generate different strings.

๐‘ฃ ๐‘ค ๐‘ฅ

This tree has a yield ๐‘ข๐‘ฃ๐‘ฃ๐‘ค๐‘ฅ๐‘ฅ๐‘ฆ = ๐‘ข๐‘ฃ2๐‘ค๐‘ฅ2๐‘ฆThis string must also be in ๐ฟ.

๐ดNote that this introduces another occurrence of ๐ด.

Proof of the Pumping Lemma for CFLs

Jim Anderson (modified by Nathan Otterness) 39

โ€ข ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆโ€ข ๐‘ฃ๐‘ฅ โ‰ฅ 1โ€ข ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›โ€ข For all ๐‘– โ‰ฅ 0,

๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ.

๐‘†

๐ด

๐ด

๐‘ข ๐‘ฃ ๐‘ฅ ๐‘ฆ

๐‘ฃ ๐‘ค ๐‘ฅ

๐ด

๐ด

๐ด

๐‘ฃ ๐‘ฅ

๐‘ฃ ๐‘ฅ

โ€ฆ

We can repeat this process indefinitely to keep generating strings in ๐ฟ of the form ๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ.

So, ๐‘ข๐‘ฃ๐‘–๐‘ค๐‘ฅ๐‘–๐‘ฆ โˆˆ ๐ฟ for all ๐‘– โ‰ฅ 0.

Application of the CFL Pumping Lemma

๐ฟ = ๐‘Ž๐‘–๐‘๐‘–๐‘๐‘– | ๐‘– โ‰ฅ 1 .

Let the string ๐‘ง = ๐‘Ž๐‘›๐‘๐‘›๐‘๐‘›, for an arbitrary ๐‘› > 0. Clearly, ๐‘ง โˆˆ ๐ฟ for any ๐‘›.

Let ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆ, ๐‘ฃ๐‘ฅ โ‰ฅ 1, and ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›. This means:

โ– ๐‘ฃ๐‘ฅ is all ๐‘Žโ€™s, ๐‘โ€™s, or ๐‘โ€™s, or

โ– ๐‘ฃ๐‘ฅ is all ๐‘Žโ€™s and ๐‘โ€™s or all ๐‘โ€™s and ๐‘โ€™s

โ– (The key point is that ๐‘ฃ๐‘ฅ can not possibly contain ๐‘Žโ€™s, ๐‘โ€™s, and ๐‘โ€™s all at the same time.)

If ๐‘ฃ๐‘ฅ contains only one type of symbol, ๐‘ข๐‘ฃ0๐‘ค๐‘ฅ0๐‘ฆ will contain too few of that symbol.

If ๐‘ฃ๐‘ฅ contains two types of symbols, ๐‘ข๐‘ฃ0๐‘ค๐‘ฅ0๐‘ฆ will contain too many of the symbol not in ๐‘ฃ๐‘ฅ.

Therefore, ๐‘ข๐‘ฃ0๐‘ค๐‘ฅ0๐‘ฆ โˆ‰ ๐ฟ, and the Pumping Lemma for CFLs does not hold for ๐ฟ.

Jim Anderson (modified by Nathan Otterness) 40

Application of the CFL Pumping Lemma ๐ฟ = ๐‘ค๐‘ค| ๐‘ค โˆˆ ๐ŸŽ + ๐Ÿ โˆ— .

Let ๐‘ง = 0๐‘›1๐‘›0๐‘›1๐‘›. Clearly, ๐‘ง โˆˆ ๐ฟ for any ๐‘›.

Let ๐‘ง = ๐‘ข๐‘ฃ๐‘ค๐‘ฅ๐‘ฆ, ๐‘ฃ๐‘ฅ โ‰ฅ 1, and ๐‘ฃ๐‘ค๐‘ฅ โ‰ค ๐‘›. This means:

โ– ๐‘ฃ๐‘ฅ is all 0s or all 1s

โ– ๐‘ฃ๐‘ฅ is some 0s followed by some 1s

โ– ๐‘ฃ๐‘ฅ is some 1s followed by some 0s.

If ๐‘ฃ๐‘ฅ contains only 0s or only 1s, ๐‘ข๐‘ฃ0๐‘ค๐‘ฅ0๐‘ฆ will contain too few 0s or 1s in one half of the string.

If ๐‘ฃ๐‘ฅ contained 0s followed by 1s, then either the first or second half of ๐‘ข๐‘ฃ0๐‘ค๐‘ฅ0๐‘ฆ will contain fewer 0s and 1s than the other half.

If ๐‘ฃ๐‘ฅ contained 1s followed by 0s (๐‘ฃ๐‘ฅ is in the middle of ๐‘ง), then ๐‘ข๐‘ฃ0๐‘ค๐‘ฅ0๐‘ฆwill have more 0s in the first group of 0s than the second group of 0s. (The number of 1s will also be similarly imbalanced.)

In any of these cases, ๐‘ข๐‘ฃ0๐‘ค๐‘ฅ0๐‘ฆ โˆ‰ ๐ฟ.

Jim Anderson (modified by Nathan Otterness) 41

Closure of CFLs under Substitution

Recall that a homomorphism maps characters in some alphabet ฮฃ to strings over another alphabet ฮ”.

A homomorphism is actually a special case of substitution, which maps characters in one alphabet to any string in a language over another alphabet.

Consider this example substitution ๐‘“:

โ–ฮฃ = 0,1 , ฮ” = ๐‘Ž, ๐‘ , ๐‘“ 0 = ๐š + ๐›โˆ—, ๐‘“ 1 = ๐šโˆ—๐›.

โ–๐‘“ ๐ŸŽโˆ—๐Ÿโˆ— = ๐š + ๐›โˆ— โˆ— ๐šโˆ—๐› โˆ—.

โ–(Note that this particular example uses regular languages.)

Jim Anderson (modified by Nathan Otterness) 42

Closure of CFLs under Substitution

Theorem 7.23 (reworded): The CFLs are closed under substitution and, by extension, homomorphism.

Proof:

The main idea is to replace all terminals in a CFG with start symbols of another CFG.

Let ๐ฟ be a CFL, and ๐ฟ โŠ† ฮฃโˆ—. For all ๐‘Ž โˆˆ ฮฃ, let ๐ฟ๐‘Ž be a CFL.

Let ๐ฟ = ๐ฟ ๐บ . For all ๐‘Ž โˆˆ ฮฃ, let ๐ฟ๐‘Ž = ๐ฟ ๐บ๐‘Ž .

Assume these grammars have distinct variables.

Jim Anderson (modified by Nathan Otterness) 43

Closure under Substitution, Proof contd.

Let ๐บ = ๐‘‰, ๐‘‡, ๐‘ƒ, ๐‘† , and for all ๐‘Ž โˆˆ ฮฃ, ๐บ๐‘Ž = ๐‘‰๐‘Ž, ๐‘‡๐‘Ž, ๐‘ƒ๐‘Ž , ๐‘†๐‘Ž

Define ๐บโ€ฒ = ๐‘‰โ€ฒ, ๐‘‡โ€ฒ, ๐‘ƒโ€ฒ, ๐‘†โ€ฒ , where:

๐‘‰โ€ฒ = ๐‘Žโˆˆฮฃ๐‘‰๐‘Žฺ‚ โˆช ๐‘‰

๐‘‡โ€ฒ = ๐‘Žโˆˆฮฃ๐‘‡๐‘Žฺ‚

๐‘†โ€ฒ = ๐‘†

๐‘ƒโ€ฒ = ๐‘Žโˆˆฮฃ๐‘ƒ๐‘Žฺ‚ โˆช {๐ด โ†’ ๐›ผโ€ฒ | ๐ด โ†’ ๐›ผ is in ๐‘ƒ, and๐›ผโ€ฒ = ๐›ผ with each ๐‘Ž โˆˆ ฮฃ replaced by ๐‘†๐‘Ž}.

The language defined by substitution equals ๐ฟ ๐บโ€ฒ . (The proof is left as an exerciseโ€”or see Theorem 7.23 in the book.)

Jim Anderson (modified by Nathan Otterness) 44

Union, Concatenation, and Closure

Theorem 7.24: CFLs are closed under Union, Concatenation, โˆ—-closure, and +-closure.

Proof:

Union: Let ๐ฟ1 and ๐ฟ2 be CFLs. ๐ฟ1 โˆช ๐ฟ2 = ๐‘  ๐ฟ , where ๐ฟ = 1, 2 (which is clearly a CFL), and ๐‘  is the substitution defined by ๐‘  1 = ๐ฟ1 and ๐‘  2 = ๐ฟ2.

The proofs for the others are similar.

Jim Anderson (modified by Nathan Otterness) 45

Notation similar to โˆ—, but meaning โ€œ1 or more repetitionsโ€.

Closure under Reversal

Theorem 7.25: CFLs are closed under reversal.

Proof:

If ๐ฟ = ๐ฟ ๐บ , where ๐บ = ๐‘‰, ๐‘‡, ๐‘ƒ, ๐‘† , then ๐ฟ๐‘… = ๐ฟ ๐บ๐‘… , where ๐บ๐‘… = ๐‘‰, ๐‘‡, ๐‘ƒ๐‘… , ๐‘† , and ๐‘ƒ๐‘… = {๐ด โ†’ ๐›ผ๐‘… | A โ†’ ๐›ผ is a production in ๐‘ƒ}.

(The full formal proof is left as an exercise.)

Jim Anderson (modified by Nathan Otterness) 46

The productions in ๐บ๐‘…

are just the productions in ๐บ written backwards.

(Lack of) Closure under Intersection

Theorem: CFLs are not closed under intersection.

Proof:

๐ฟ1 = ๐‘Ž๐‘–๐‘๐‘–๐‘๐‘– | ๐‘– โ‰ฅ 1 . We know ๐ฟ1 isnโ€™t a CFL from earlier slides.

๐ฟ2 = ๐‘Ž๐‘–๐‘๐‘–๐‘๐‘— | ๐‘– โ‰ฅ 1, ๐‘— โ‰ฅ 1 . This is a CFL.

๐ฟ3 = ๐‘Ž๐‘–๐‘๐‘—๐‘๐‘— | ๐‘– โ‰ฅ 1, ๐‘— โ‰ฅ 1 . This is also a CFL.

However, ๐ฟ1 = ๐ฟ2 โˆฉ ๐ฟ3.

Jim Anderson (modified by Nathan Otterness) 47

See example 7.26 in the book for CFGs.

(Lack of) Closure under Complementation

Corollary to the previous theorem: CFLs are notclosed under complementation.

Proof:

CFLs are closed under union.

๐ฟ1 โˆฉ ๐ฟ2 โ‰ก ๐ฟ1 โˆช ๐ฟ2.

So, if CFLs are closed under complementation, they would be closed under intersection, too.

Jim Anderson (modified by Nathan Otterness) 48

Intersection with Regular Languages

Theorem 7.26: If ๐ฟ is a CFL and ๐‘… is a regular language, then ๐ฟ โˆฉ ๐‘… is a CFL.

Proof:

Let ๐ฟ be the language of some PDA ๐‘ƒ = ๐‘„๐‘ƒ, ฮฃ, ฮ“, ๐›ฟ๐‘ƒ, ๐‘ž๐‘ƒ , ๐‘0, ๐น๐‘ƒ .

Let ๐‘… be the language of some DFA ๐ด = ๐‘„๐ด, ฮฃ, ๐›ฟ๐ด, ๐‘ž๐ด, ๐น๐ด .

Idea: Create a new PDA combining the states of ๐‘ƒ and ๐ด, similar to combining two DFAs.

Jim Anderson (modified by Nathan Otterness) 49

Proof: Intersection with Reg. Languages

Let ๐‘ƒโ€ฒ = ๐‘„๐‘ƒ ร— ๐‘„๐ด, ฮฃ, ฮ“, ๐›ฟ, ๐‘ž๐‘ƒ , ๐‘ž๐ด , ๐‘0, ๐น๐‘ƒ ร— ๐น๐ด , where:

๐›ฟ ๐‘ž, ๐‘ , ๐‘Ž, ๐‘‹ contains ๐‘Ÿ, ๐‘  , ๐›พ if and only if๐›ฟ๐ด ๐‘, ๐‘Ž = ๐‘  and ๐›ฟ๐‘ƒ ๐‘ž, ๐‘Ž, ๐‘‹ contains ๐‘Ÿ, ๐›พ .

Claim: ๐‘ž๐‘ƒ , ๐‘ž๐ด , ๐‘ค, ๐‘0 โ”œ๐‘–

๐‘ƒโ€ฒ๐‘ž, ๐‘ , ํœ€, ๐›พ if and only if

๐‘ž๐‘ƒ , ๐‘ค, ๐‘0 โ”œ๐‘–

๐‘ƒ๐‘ž, ํœ€, Y and ๐›ฟ ๐‘ž๐ด, ๐‘ค = ๐‘.

(This can be proven by induction on ๐‘–.)

Jim Anderson (modified by Nathan Otterness) 50

Closure under Inverse Homomorphism

Theorem 7.30: CFLs are closed under inverse homomorphism.

Proof:

Consider ๐ฟ, where ๐ฟ is the language of some PDA ๐‘ƒ.

๐‘ƒ = ๐‘„, ฮ”, ฮ“, ๐›ฟ, ๐‘ž0, ๐‘0, ๐น .

Let โ„Ž: ฮฃ โ†’ ฮ”โˆ—.

Construct a PDA ๐‘ƒโ€ฒ that accepts โ„Žโˆ’1 ๐ฟ .

Jim Anderson (modified by Nathan Otterness) 51

Closure under Inverse Homomorphism

Proof, continued:

The key idea is the same as for regular languages: When processing an input ๐‘Ž, ๐‘ƒโ€ฒ simulates ๐‘ƒ on the input โ„Ž ๐‘Ž .

In A DFA, simulation required just a single state transition.

However, ๐‘ƒ, being a PDA, does more than just change state on input โ„Ž ๐‘Ž --it may change the stack contents or make nondeterministic choices.

Solution: Use a buffer to hold the symbols of โ„Ž ๐‘Ž .

Jim Anderson (modified by Nathan Otterness) 52

This buffer will really be part of ๐‘ƒโ€ฒโ€™s (finite!) state.

Closure under Inverse Homomorphism

Conceptually:

Jim Anderson (modified by Nathan Otterness) 53

โ„Žโ„Ž ๐‘Ž

Buffer

Input๐‘Ž

Original PDA state

Stack

Accept/Reject

The buffer must be large enough to hold the longest

string produced by โ„Ž.

Closure under Inverse Homomorphism

Short example: โ„Ž ๐‘Ž = 01, and โ„Ž ๐‘ = 111

Jim Anderson (modified by Nathan Otterness) 54

โ„Ž

BufferInput string:๐‘Ž

Original PDA state

Stack

Accept/Reject

๐‘๐‘Ž11011 Etc.

Closure under Inverse Homomorphism

We have a homomorphism โ„Ž: ฮฃ โ†’ ฮ”โˆ—.

Recall that ๐‘ƒ = ๐‘„, ฮ”, ฮ“, ๐›ฟ, ๐‘ž0, ๐‘0, ๐น

Let ๐‘ƒโ€ฒ = ๐‘„โ€ฒ, ฮฃ, ฮ“, ๐›ฟโ€ฒ, ๐‘ž0, ํœ€ , ๐‘0, ๐น ร— ํœ€ .

States in ๐‘„โ€ฒ is a set of pairs ๐‘ž, ๐‘ฅ such that

โ–๐‘ž is a state in ๐‘„, and

โ–๐‘ฅ is the finite โ€œbufferโ€โ€”โ„Ž ๐‘Ž or a suffix of โ„Ž ๐‘Žfor some symbol ๐‘Ž โˆˆ ฮฃ.

Jim Anderson (modified by Nathan Otterness) 55

Start in the original start state, with an empty buffer.

Accept in any of the original accepting states, if the buffer is empty.

Closure under Inverse Homomorphism

Transitions in ๐‘ƒโ€ฒ:

๐›ฟโ€ฒ ๐‘ž, ๐‘ฅ , ํœ€, ๐‘‹ contains all ๐‘, ๐‘ฅ , ๐›พ such that ๐›ฟ ๐‘ž, ํœ€, ๐‘‹

contains ๐‘, ๐›พ .

โ– โ€œSimulate ํœ€-transitionsโ€

๐›ฟโ€ฒ ๐‘ž, ๐‘๐‘ฅ , ํœ€, ๐‘‹ contains all ๐‘, ๐‘ฅ , ๐›พ such that ๐›ฟ ๐‘ž, ๐‘, ๐‘‹contains ๐‘, ๐›พ .

โ– โ€œSimulate non-ํœ€ transitionsโ€

๐›ฟโ€ฒ ๐‘ž, ํœ€ , ๐‘Ž, ๐‘‹ contains ๐‘ž, โ„Ž ๐‘Ž , ๐‘‹ for all ๐‘Ž โˆˆ ฮฃ and ๐‘‹ โˆˆ ฮ“.

โ– โ€œLoad the bufferโ€Jim Anderson (modified by Nathan Otterness) 56

Closure under Inverse Homomorphism

Claim: If ๐‘ž0, โ„Ž ๐‘ค , ๐‘0 โ”œ๐‘ƒ

โˆ—๐‘, ํœ€, ๐›พ , then

๐‘ž0, ํœ€ , ๐‘ค, ๐‘0 โ”œ๐‘ƒโ€ฒ

โˆ—๐‘, ํœ€ , ํœ€, ๐›พ .

Proof sketch:

For each ๐‘Ž โˆˆ ฮฃ, whatever sequence of moves ๐‘ƒ makes on โ„Ž ๐‘Ž , ๐‘ƒโ€ฒ can make a corresponding sequence of moves on input ๐‘Ž.

Jim Anderson (modified by Nathan Otterness) 57

Closure under Inverse Homomorphism

Claim: If ๐‘ž0, ํœ€ , ๐‘ค, ๐‘0 โ”œ๐‘ƒโ€ฒ

โˆ—๐‘, ํœ€ , ํœ€, ๐›พ , then

๐‘ž0, โ„Ž ๐‘ค , ๐‘0 โ”œ๐‘ƒ

โˆ—๐‘, ํœ€, ๐›พ .

Proof sketch:

๐‘ƒโ€ฒ can only process ๐‘ค one character at a time. For each character ๐‘Ž, ๐‘ƒโ€ฒ does what ๐‘ƒ does on โ„Ž ๐‘Ž .

The first claim implied that ๐ฟ ๐‘ƒโ€ฒ โŠ‡ โ„Žโˆ’1 ๐ฟ ๐‘ƒ .

This claim implies that ๐ฟ ๐‘ƒโ€ฒ โŠ† โ„Žโˆ’1 ๐ฟ ๐‘ƒ .

Jim Anderson (modified by Nathan Otterness) 58

Decision Properties of CFLs

As with regular languages, weโ€™ll focus less on efficiency and more on simplicity than whatโ€™s done in the book.

Theorem: There exist algorithms to determine if a CFL is (a) empty, (b) finite, or (c) infinite.

Jim Anderson (modified by Nathan Otterness) 59

Detecting if a CFL is Empty

Nonemptiness:

Use the iterative algorithm for detecting useless

symbols to test if ๐ด ึœโˆ—๐‘ค for all ๐ด โˆˆ ๐‘‰.

The CFL is nonempty if and only if ๐‘† ึœโˆ—๐‘ค for some

terminal string ๐‘ค.

Jim Anderson (modified by Nathan Otterness) 60

Detecting if a CFL is Finite or Infinite

Assume ๐ฟ does not contain ํœ€. (If ํœ€ โˆˆ ๐ฟ, then consider ๐ฟ โˆ’ ํœ€ instead.)

Suppose ๐ฟ = ๐ฟ ๐บ , where ๐บ = ๐‘‰, ๐‘‡, ๐‘ƒ, ๐‘† is in CNF (and therefore has no useless symbols).

Consider the directed graph ๐‘‰, ๐ธ , where ๐ด, ๐ต โˆˆ ๐ธ if ๐ด โ†’ ๐ต๐ถ or ๐ด โ†’ ๐ถ๐ต is in ๐‘ƒ for some variable ๐ถ.

Claim: ๐ฟ is finite if and only if ๐‘‰, ๐ธ has no cycles.Jim Anderson (modified by Nathan Otterness) 61

๐ฟ โˆ’ ํœ€ will be finite if and only if ๐ฟ is finite.

Detecting if a CFL is Finite or Infinite

Claim (again): ๐ฟ is finite if and only if ๐‘‰, ๐ธ has no cycles.

โ€œOnly ifโ€: A cycle has the form ๐ด0, ๐ด1, โ€ฆ, ๐ด๐‘›, ๐ด0.

Therefore, we have ๐ด0 ึœ ๐›ผ1๐ด1๐›ฝ1 ึœ ๐›ผ2๐ด2๐›ฝ2 ึœ โ‹ฏ ึœ ๐›ผ๐‘›๐ด๐‘›๐›ฝ๐‘› ึœ๐›ผ๐‘›+1๐ด0๐›ฝ๐‘›+1, where ๐›ผ๐‘–๐›ฝ๐‘– = ๐‘–.

Since ๐บ has no useless symbols:

๐›ผ๐‘›+1 ึœโˆ—๐‘ค,

๐›ฝ๐‘›+1 ึœโˆ—๐‘ฅ,

๐‘† ึœโˆ—๐‘ฆ๐ด0๐‘ง, and

๐ด0 ึœโˆ—๐‘ฃ,

where ๐‘ค, ๐‘ฅ, ๐‘ฆ, ๐‘ง, and ๐‘ฃ are terminal strings.Jim Anderson (modified by Nathan Otterness) 62

This is due to the grammar being in CNFโ€”we can

only produce ๐‘– symbols in ๐‘– derivation steps.

Detecting if a CFL is Finite or Infinite

โ€œOnly ifโ€ proof, continued:

From:

๐›ผ๐‘›+1 ึœโˆ—๐‘ค

๐›ฝ๐‘›+1 ึœโˆ—๐‘ฅ

๐‘† ึœโˆ—๐‘ฆ๐ด0๐‘ง

๐ด0 ึœโˆ—๐‘ฃ

we have:

๐‘† ึœโˆ—๐‘ฆ๐ด0๐‘ฅ ึœ

โˆ—๐‘ฆ๐‘ค๐ด0๐‘ฅ๐‘ง ึœ

โˆ—๐‘ฆ๐‘ค2๐ด0๐‘ฅ

2๐‘ง ึœโˆ—โ€ฆึœ

โˆ—๐‘ฆ๐‘ค๐‘–๐ด0๐‘ฅ

๐‘–๐‘ง ึœโˆ—๐‘ฆ๐‘ค๐‘–๐‘ฃ๐‘ฅ๐‘–๐‘ง.

Therefore, ๐ฟ is infinite.

Jim Anderson (modified by Nathan Otterness) 63

Detecting if a CFL is Finite or Infinite

Claim (again): ๐ฟ is finite if and only if ๐‘‰, ๐ธ has no cycles.

โ€œIfโ€: Suppose ๐‘‰, ๐ธ has no cycles.

Definition: The rank of ๐ด โˆˆ ๐‘‰ is the longest path beginning from ๐ด. If a graph has no cycles, then all ranks are finite.

Claim: If ๐ด has a rank ๐‘Ÿ, then ๐ด derives no terminal string of length exceeding 2๐‘Ÿ.

โ–This can be proven by induction on ๐‘Ÿ, similarly to some claims prior to the Pumping Lemma.

Since ๐‘‰, ๐ธ has no cycles, ๐‘† has a finite rank. Therefore, the longest string derivable from ๐‘† has a finite length, so ๐ฟ must be finite.

Jim Anderson (modified by Nathan Otterness) 64

Membership: CYK Algorithm

We want to know if a string ๐‘ฅ is in ๐ฟ.

Let ๐ฟ โˆ’ ํœ€ = ๐ฟ ๐บ for a CFG ๐บ in CNF.

Let ๐‘ฅ = ๐‘›.

Let ๐‘ฅ๐‘–๐‘— be a substring of ๐‘ฅ of length ๐‘— beginning at

position ๐‘–.

Inductively determine all variables ๐ด such that

๐ด ึœโˆ—๐‘ฅ๐‘–๐‘—.

๐‘ฅ โˆˆ ๐ฟ if and only if ๐‘† ึœโˆ—๐‘ฅ1๐‘›.

Jim Anderson (modified by Nathan Otterness) 65

Membership: CYK Algorithm

Base case: ๐‘— = 1. ๐ด ึœโˆ—๐‘ฅ๐‘–๐‘— if and only if ๐ด โ†’ ๐‘ฅ๐‘–๐‘—.

Inductive step: ๐‘— > 1. ๐ด ึœโˆ—๐‘ฅ๐‘–๐‘— if and only if there

exists ๐ด โ†’ ๐ต๐ถ and ๐‘˜, 1 โ‰ค ๐‘˜ < ๐‘—, such that ๐ต ึœโˆ—๐‘ฅ๐‘–๐‘˜ and

๐ถ ึœโˆ—๐‘ฅ๐‘–+๐‘˜ ๐‘—โˆ’๐‘˜.

Jim Anderson (modified by Nathan Otterness) 66

๐‘— = 1, so ๐‘ฅ๐‘–๐‘— = ๐‘ฅ๐‘–1. ๐‘ฅ๐‘–1has a length of 1, so

itโ€™s a terminal symbol.

Membership: CYK Algorithm

Jim Anderson (modified by Nathan Otterness) 67

for i := 1 to n:Vi1 := {A | A โ†’ xi1 }

for j := 2 to n:for i := 1 to n โ€“ j + 1:

Vij := ;for k := 1 to j โ€“ 1:

Vij := Vij {A | A โ†’ BC, B Vik, C Vi+k j-k}

The time complexity is ๐‘‚ ๐‘›3 .

Pseudocode for determining the sets of variables, ๐‘‰๐‘–๐‘—,

that can produce the substring ๐‘ฅ๐‘–๐‘—:

CYK Algorithm: Example

Jim Anderson (modified by Nathan Otterness) 68

๐‘† โ†’ ๐ด๐ต | ๐ต๐ถ๐ด โ†’ ๐ต๐ด | ๐‘Ž๐ต โ†’ ๐ถ๐ถ | ๐‘๐ถ โ†’ ๐ด๐ต | ๐‘Ž๐‘ฅ = ๐‘๐‘Ž๐‘Ž๐‘๐‘Ž (๐‘› = 5)

for i := 1 to n:Vi1 := {A | A โ†’ xi1 }

for j := 2 to n:for i := 1 to n โ€“ j + 1:

Vij := ;for k := 1 to j โ€“ 1:

Vij := Vij {A | A โ†’ BC, B Vik, C Vi+k j-k}

1 2 3 4 512345

i โ†’

j

b a a b a

CYK Algorithm: Example

Jim Anderson (modified by Nathan Otterness) 69

๐‘† โ†’ ๐ด๐ต | ๐ต๐ถ๐ด โ†’ ๐ต๐ด | ๐‘Ž๐ต โ†’ ๐ถ๐ถ | ๐‘๐ถ โ†’ ๐ด๐ต | ๐‘Ž๐‘ฅ = ๐‘๐‘Ž๐‘Ž๐‘๐‘Ž (๐‘› = 5)

for i := 1 to n:Vi1 := {A | A โ†’ xi1 }

for j := 2 to n:for i := 1 to n โ€“ j + 1:

Vij := ;for k := 1 to j โ€“ 1:

Vij := Vij {A | A โ†’ BC, B Vik, C Vi+k j-k}

1 2 3 4 51 B A, C A, C B A,C2345

i โ†’

j

b a a b a

CYK Algorithm: Example

Jim Anderson (modified by Nathan Otterness) 70

๐‘† โ†’ ๐ด๐ต | ๐ต๐ถ๐ด โ†’ ๐ต๐ด | ๐‘Ž๐ต โ†’ ๐ถ๐ถ | ๐‘๐ถ โ†’ ๐ด๐ต | ๐‘Ž๐‘ฅ = ๐‘๐‘Ž๐‘Ž๐‘๐‘Ž (๐‘› = 5)

for i := 1 to n:Vi1 := {A | A โ†’ xi1 }

for j := 2 to n:for i := 1 to n โ€“ j + 1:

Vij := ;for k := 1 to j โ€“ 1:

Vij := Vij {A | A โ†’ BC, B Vik, C Vi+k j-k}

1 2 3 4 51 B A, C A, C B A,C2 S, A B S, C S, A345

i โ†’

j

b a a b a

CYK Algorithm: Example

Jim Anderson (modified by Nathan Otterness) 71

๐‘† โ†’ ๐ด๐ต | ๐ต๐ถ๐ด โ†’ ๐ต๐ด | ๐‘Ž๐ต โ†’ ๐ถ๐ถ | ๐‘๐ถ โ†’ ๐ด๐ต | ๐‘Ž๐‘ฅ = ๐‘๐‘Ž๐‘Ž๐‘๐‘Ž (๐‘› = 5)

for i := 1 to n:Vi1 := {A | A โ†’ xi1 }

for j := 2 to n:for i := 1 to n โ€“ j + 1:

Vij := ;for k := 1 to j โ€“ 1:

Vij := Vij {A | A โ†’ BC, B Vik, C Vi+k j-k}

1 2 3 4 51 B A, C A, C B A,C2 S, A B S, C S, A3 B B45

i โ†’

j

b a a b a

CYK Algorithm: Example

Jim Anderson (modified by Nathan Otterness) 72

๐‘† โ†’ ๐ด๐ต | ๐ต๐ถ๐ด โ†’ ๐ต๐ด | ๐‘Ž๐ต โ†’ ๐ถ๐ถ | ๐‘๐ถ โ†’ ๐ด๐ต | ๐‘Ž๐‘ฅ = ๐‘๐‘Ž๐‘Ž๐‘๐‘Ž (๐‘› = 5)

for i := 1 to n:Vi1 := {A | A โ†’ xi1 }

for j := 2 to n:for i := 1 to n โ€“ j + 1:

Vij := ;for k := 1 to j โ€“ 1:

Vij := Vij {A | A โ†’ BC, B Vik, C Vi+k j-k}

1 2 3 4 51 B A, C A, C B A,C2 S, A B S, C S, A3 B B4 S, A, C5

i โ†’

j

b a a b a

CYK Algorithm: Example

Jim Anderson (modified by Nathan Otterness) 73

๐‘† โ†’ ๐ด๐ต | ๐ต๐ถ๐ด โ†’ ๐ต๐ด | ๐‘Ž๐ต โ†’ ๐ถ๐ถ | ๐‘๐ถ โ†’ ๐ด๐ต | ๐‘Ž๐‘ฅ = ๐‘๐‘Ž๐‘Ž๐‘๐‘Ž (๐‘› = 5)

for i := 1 to n:Vi1 := {A | A โ†’ xi1 }

for j := 2 to n:for i := 1 to n โ€“ j + 1:

Vij := ;for k := 1 to j โ€“ 1:

Vij := Vij {A | A โ†’ BC, B Vik, C Vi+k j-k}

1 2 3 4 51 B A, C A, C B A,C2 S, A B S, C S, A3 B B4 S, A, C5 S, A, C

i โ†’

j

b a a b a

๐‘† โˆˆ ๐‘‰1๐‘›, so ๐‘ฅ โˆˆ ๐ฟ.