
LANGUAGES AND AUTOMATA

e-book

Vaclav NYDL, Vivian WHITE, Anna MALCEVA

Department of Applied Mathematics and Informatics

Faculty of Economy, University of South Bohemia in Ceske Budejovice


This publication was created as a part of the project IP16-18 12/EF-Klufova/1/Nydl and is targeted to parallel lecturing in the English language.

Ceske Budejovice, November 2016, the authors

© Vaclav Nydl, 2016


TOPIC 1 - The Algebra on Strings over a Given Alphabet

The basic concepts are an alphabet and a string (or a word) over the alphabet. The set of all finite strings over an alphabet Σ is usually denoted by Σ∗ (using ‘the Kleene star’). The empty string ε always belongs to Σ∗; the symbol Σ+ is used for the set of all non-empty strings, i.e. for Σ∗ − {ε}. The strings of length 1 are the symbols of the alphabet. Longer strings are chains (sequences) of the symbols; in fact, they are permutations with repetition of any length. The notation #(w) or |w| means the number of all symbols in the string w, i.e. the length of the string w. If the alphabet Σ consists of n different symbols, then the number of strings of length k over Σ equals n^k (the permutations with repetition of n elements taken k at a time); for example, over Σ = {a, b} there are 2^3 = 8 strings of length 3. The notation #b(w) means the number of occurrences (repetitions) of the symbol b in the string w.

On Σ∗, various (unary, binary, etc.) operations can be defined, and we can talk about the string algebra. Some of them are shown below.¹

(1) The reversal of the string w is obtained when we write the string in the reverse order of its symbols. It is denoted by w^R. If w = w^R then the string w is called a palindrome (e.g. ROTOR).

Example 1.1. Let Σ = {a, b, c}, w = abcabc; then w^R = cbacba. Further, ε^R = ε, a^R = a.

(2) The concatenation (i.e. the joining one after another) of two or more strings is denoted by concat(w1, w2) or concat(w1, w2, w3) or w1 · w2 or w1 · w2 · w3; also simply w1w2 or w1w2w3.

Example 1.2. Let Σ = {a, b, c}, w1 = acc, w2 = abab, w3 = abcabc; then w2w1 = ababacc, w1w2w3 = accabababcabc, #(w2w1) = 7, #a(w1w2w3) = 5.

Especially, the i-th power of the string w, denoted by w^i, is the result of i-times repeated concatenation of the string w; for example, (baaa)^3 = baaabaaabaaa.

We say that the string u is a substring of the string w if there exist two strings w1, w2 such that w = w1uw2. Especially, if w = uv, then we say that u is a prefix and v is a suffix of the string w. The notation #u(w) means the number of occurrences of the string u as a substring of w.

For example, for w = aaababbbab the string bbb is a substring of w, and, further, aaab is one possible prefix of w and bab is one possible suffix of w. Also #ab(w) = 3.
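The counting notation #u(w) can be tried out with a small Python sketch. The function name `occurrences` is only illustrative, and the sketch assumes that overlapping occurrences are counted separately (the text does not say so explicitly); note that Python's built-in `str.count` skips overlapping matches.

```python
def occurrences(u: str, w: str) -> int:
    """Count the positions at which u occurs as a substring of w.

    Assumption: overlapping occurrences are counted separately,
    e.g. 'aa' occurs twice in 'aaa'.
    """
    if not u:
        raise ValueError("u must be a non-empty string")
    return sum(1 for i in range(len(w) - len(u) + 1) if w.startswith(u, i))


w = "aaababbbab"
print(occurrences("ab", w))                      # the value #ab(w) from the text
print(w.startswith("aaab"), w.endswith("bab"))   # prefix and suffix checks
```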

In our course, we will define some special operations:

(3) For i ≤ j, a cut of the string w of the form sub(i..j, w) means the substring of w from the i-th to the j-th symbol of w, provided j ≤ #(w). The length of this new string is j − i + 1. If j > #(w) then sub(i..j, w) = ε by definition.

Example 1.3. Σ = {a, b, c}, w = abcabcab; then sub(2..3, w) = bc, sub(2..8, w) = bcabcab.

(4) For i ≤ j, the pumping of w of the form pump(i..j, w, k) means the following:

let u = sub(i..j, w) and n = #(w); then w = w1 u w2, where w1 = sub(1..i−1, w) is a prefix and w2 = sub(j+1..n, w) is a suffix; now, pump(i..j, w, k) = w1 u^k w2. It means that, inside w, we place the chain of k copies of u instead of u. The length of the new string equals #(w) + (k−1) · #(u).

Especially, pump(i..j, w, 0) = w1w2 is the result of deleting the substring u = sub(i..j, w) from the string w; it will also be denoted by del(i..j, w).
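The operations (1) to (4) can be sketched in Python as follows. This is only a minimal illustration: the function names `rev`, `sub`, `pump` and `delete` are chosen here (`del` is a reserved word in Python), positions are 1-based as in the text, and `pump` assumes 1 ≤ i ≤ j ≤ #(w).

```python
def rev(w: str) -> str:
    """Reversal w^R."""
    return w[::-1]


def sub(i: int, j: int, w: str) -> str:
    """Cut sub(i..j, w) with 1-based inclusive positions; ε if j > #(w)."""
    if j > len(w):
        return ""
    return w[i - 1:j]


def pump(i: int, j: int, w: str, k: int) -> str:
    """pump(i..j, w, k) = w1 u^k w2, assuming 1 <= i <= j <= #(w)."""
    u = sub(i, j, w)
    return w[:i - 1] + u * k + w[j:]


def delete(i: int, j: int, w: str) -> str:
    """del(i..j, w) is pumping with k = 0."""
    return pump(i, j, w, 0)


w = "acbacccc"
print(pump(3, 4, w, 8), len(pump(3, 4, w, 8)))
print(delete(3, 5, w))
```

The length formula #(w) + (k−1) · #(u) can be checked on any of the results with `len`.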

¹ In the Python programming language, u + v is the concatenation, w[i-1:j] is the cut sub(i..j, w) (Python slices count from 0 and exclude the end index), and w * 3 is the third power.


Example 1.4. Let Σ = {a, b, c}, w = acbacccc; then pump(3..4, w, 8) = acbabababababababacccc and the length of this string is 8 + (8 − 1) · 2 = 22. Further, pump(3..5, w, 0) = del(3..5, w) = acccc and the length of the result is 8 + (0 − 1) · 3 = 5.

Example 1.5. The operations can be combined: if Σ = {a, b, c}, w1 = acbabc, w2 = abba, then w1^R [sub(2..3, w2)]^2 pump(4..6, w1, 3) = cbabcabbbbacbabcabcabc.

SAMPLE EXERCISES

Exercise 1.1. Find some examples of palindromes in an ordinary language.

Solution. (by Martina CHUCHLOVA)
Czech: nepotopen; nezarazen; zemansenodonesnamez,
English: madam; wasitacatisaw.

Exercise 1.2. For two strings u = abb and v = bbcacb over the alphabet Σ = {a, b, c} create the string w = pump(2..3, v, 4) [del(3..5, u^2)]^R and then determine the value of #bc(w).

Solution. (by Frantisek DRDAK)
pump(2..3, v, 4) = bbcbcbcbcacb, u^2 = abbabb, del(3..5, u^2) = abb, [del(3..5, u^2)]^R = bba, w = pump(2..3, v, 4) [del(3..5, u^2)]^R = bbcbcbcbcacbbba; #bc(w) = 4.

Exercise 1.3. Which of the following formulas is correct for any string w over Σ = {a, b}?

FORMULA I. #a(w^4) = (#a(w))^4.

FORMULA II. sub(1..2, sub(1..4, w)) = sub(1..2, w) for #(w) ≥ 4.

Solution. The first formula is incorrect. For example, if w = aab then w^4 = aabaabaabaab. Hence #a(w) = 2 and #a(w^4) = 8, but (#a(w))^4 = 2^4 = 16.

F.II is correct: if w = x1x2x3x4 . . . xk then sub(1..4, w) = x1x2x3x4 and sub(1..2, sub(1..4, w)) = sub(1..2, x1x2x3x4) = x1x2 = sub(1..2, w).

PROBLEMS TO SOLVE

Problem 1.1. Find some examples of palindromes in the Czech, the English, the French, the German, the Slovak, the Spanish, the Russian, or other languages.

Problem 1.2. For the given two words u and v over the alphabet Σ = {a, b, c}, first create two new words w1 = u^2 v^R sub(3..5, v) and w2 = pump(1..2, v, 3) [del(2..5, u^2)]^R, and then solve the tasks T1 and T2:

T1: Write down the words w1 and w2.
T2: Determine the values of #b(w1) and #ca(w2).

(i) u = cab, v = abbbca (ii) u = aab, v = abccca (iii) u = acb, v = abbcba.

Problem 1.3. Which of the following formulas is correct for any string w over Σ = {a, b, c}?

FORMULA I. #c(w^R) = #c(w). FORMULA II. #a(w) = 2 · #aa(w).

FORMULA III. #bc(w^R) = #cb(w). FORMULA IV. #a(w) + #b(w) ≥ #ab(w).

FORMULA V. pump(1..2, pump(3..4, w, 2), 2) = pump(1..4, w, 2), for #(w) ≥ 4.


TOPIC 2 - Operations on Languages²

A language over the alphabet Σ is any set (finite or infinite) L of strings over Σ, i.e. any subset L ⊆ Σ∗. If Σ is a non-empty set then Σ∗ is infinite (more precisely, it is a countable set), and the number of languages over Σ is also infinite (more precisely, the set of all languages is uncountable: set theory proves that the set of all subsets of a countably infinite set is uncountable).

Example 2.1. Let Σ = {a, b}. An example of a finite language over Σ is J1 = {a, ab, aab}. An example of an infinite language over Σ is J2 = {w ∈ Σ∗; #a(w) = 3}, which consists of all words containing exactly 3 copies of the symbol ‘a’ and any number of copies of the symbol ‘b’, for example aaa, ababa, bbbbabbaa. Another infinite language, J3 = {a^i b^i; i ∈ N∪{0}}, consists of all words in which the first part is a chain of copies of ‘a’ and the second part is a chain of the same number of copies of ‘b’; e.g. ε, ab, aabb, aaabbb, . . . .

One of the fundamental problems of the theory of formal languages is finding methods for describing an infinite language by finite means.

The first method was used above on the languages J2, J3. These descriptions use formulas and consist of a finite number of symbols. They have the form (also called set builders or set-formers) J = {w ∈ Σ∗; P(w)}, where P is a predicate in the variable w. The language J consists of all strings w such that P(w) is a true proposition.

We introduce several basic operations (some of them are the standard set operations) onlanguages over a chosen alphabet Σ:

(1) The complement of language L is co(L) = {w; w ∈ Σ∗ ∧ w /∈ L} = Σ∗ − L.

Example 2.1 continued. For the language J1, the language co(J1) consists of all strings over Σ with the exception of a, ab, aab. Further, co(J2) = {w ∈ Σ∗; #a(w) ≠ 3}.

(2) The reversal of language L is L^R = {w^R; w ∈ L}.

Example 2.1 continued. For the languages from Example 2.1: J1^R = {a, ba, baa}, J2^R = J2, and further J3^R = {b^i a^i; i ∈ N∪{0}}.

(3) The intersection of languages K and L is K ∩ L = {w; w ∈ K ∧ w ∈ L}.

Example 2.1 continued. For the languages from Ex. 2.1: J1 ∩ J3 = {ab}, J2 ∩ J3 = {aaabbb}.

(4) The union of languages K and L is K ∪ L = {w; w ∈ K ∨ w ∈ L}.

The union or the intersection of more than 2 languages is also possible.

(5) The concatenation of languages K and L is KL = {w1w2; w1 ∈ K, w2 ∈ L}.

We can also define the concatenation of more than 2 languages; for any language L its powers L^k, where k = 0, 1, 2, 3, . . ., are: L^0 = {ε}, L^1 = L, L^k = LL . . . L (k-times).

(6) The iteration of language L is L∗ = L^0 ∪ L^1 ∪ L^2 ∪ L^3 ∪ . . ..

Example 2.1 continued. For the language J2, the iteration J2∗ consists of ε and of all strings w such that #a(w) is positive and divisible by 3 (every string from the k-th power of J2 contains exactly 3k copies of ‘a’).
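For finite languages, the operations above can be tried out directly with Python sets. The following is only a sketch: the function names are chosen here, and because L∗ is usually infinite, `star_up_to` returns just the strings of L∗ up to a chosen length.

```python
def concat(K: set, L: set) -> set:
    """Concatenation KL = {w1 w2; w1 in K, w2 in L}."""
    return {u + v for u in K for v in L}


def power(L: set, k: int) -> set:
    """k-th power of L; the 0-th power is {ε}."""
    result = {""}
    for _ in range(k):
        result = concat(result, L)
    return result


def reversal(L: set) -> set:
    """Reversal of a language: reverse every string."""
    return {w[::-1] for w in L}


def star_up_to(L: set, max_len: int) -> set:
    """All strings of the iteration L* whose length is at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        longer = {w for w in concat(frontier, L) if len(w) <= max_len}
        frontier = longer - result   # only strings not seen before
        result |= frontier
    return result


J1 = {"a", "ab", "aab"}
print(sorted(reversal(J1)))
print(sorted(power(J1, 2)))
print(sorted(star_up_to({"ab"}, 6)))
```

Bounding the length is a design choice that keeps the computation finite; it is enough for checking small examples and for finding counterexamples to proposed formulas.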

² There are roughly 6,500 spoken languages in the world today. However, about 2,000 of those languages have fewer than 1,000 speakers. In Nigeria, about 515 languages can be found. According to the Worldwatch Institute, the most popular languages in the world are: Mandarin Chinese - 955 million, Spanish - 405 million, English - 360 million, Arabic - 295 million, Portuguese - 215 million, Bengali - 200 million, Russian - 155 million, etc. It is estimated that, if nothing is done, half the languages spoken today will disappear by the end of this century (Wikipedia and other sources).


SAMPLE EXERCISES

Exercise 2.1 Solve the tasks T1, T2, T3 below for the languages over Σ = {a, b}:
L1 = {w; |#a(w) − #b(w)| = 1}, L2 = {w; w = w^R}, L3 = {w; w = u u^R ∧ u ∈ Σ∗}.

T1: In L1∗ − L1, find a word w such that #a(w) = 6 and write it down.

T2: In co(L2), find a word w such that #ab(w) > 3 and write it down.

T3: In (L2 ∪ L3 − L2 ∩ L3) find two words and write them down.

Solution. (by Anna STEPURA) ad T1: ababababababbb, ad T2: abababab, ad T3: aba, bab.

Exercise 2.2 Let Σ = {a, b} and let the languages

L1 = {w; w = w^R}, L2 = {w; |#a(w) − #b(w)| ≤ 1}, L3 = {w; w = (ab)^i (ba)^i, i ∈ N}

be three subsets of Σ∗.

Use a Venn diagram to express their relationship.

Solution. (by Martin SIK) [Figure: Venn diagram of the languages L1, L2, L3.]

Exercise 2.3 Which of the following formulas is correct for any languages L1, L2?

FORMULA I. (L1L2)∗ = L1∗L2∗, FORMULA II. co(L1^R) = (co(L1))^R.

Solution. The first formula is not correct. For example, if L1 = {a}, L2 = {b}, then L1L2 = {ab} and (L1L2)∗ = {w; w = (ab)^i, i ∈ N∪{0}}. On the other hand, L1∗ = {a^i; i ∈ N∪{0}}, L2∗ = {b^j; j ∈ N∪{0}} and L1∗L2∗ = {w; w = a^i b^j, i, j ∈ N∪{0}} ≠ (L1L2)∗; for instance, the string a belongs to L1∗L2∗ but not to (L1L2)∗.

FORM. II is correct: w ∈ co(L1^R) ⇔ w ∉ L1^R ⇔ w^R ∉ L1 ⇔ w^R ∈ co(L1) ⇔ w ∈ (co(L1))^R.

PROBLEMS TO SOLVE

Problem 2.1 For the languages L1, L2, L3, L4 over Σ = {a, b}, solve the following tasks:

T1: In L1∗ − L1, find a word w such that #a(w) = 6 and write it down.

T2: In co(L2), find a word w such that #ab(w) > 3 and write it down.

T3: In (L2 ∪ L4 − L2 ∩ L4) find two words and write them down.

T4: In L2 ∩ L3, find two words and write them down.

T5: In Σ∗, find a word which does not belong to co(L4) ∪ L2L3 and write it down.

(a) L1 = {w; |#a(w) − #b(w)| ≤ 1}, L2 = {w; w = (ab)^i a (ba)^i, i ∈ N},
L3 = {w; w = w^R}, L4 = {w; w = u u^R ∧ u ∈ Σ∗}.

(b) L1 = {w; |#a(w) − #b(w)| > 1}, L2 = {w; w = a^i bb a^j, i, j ∈ N},
L3 = {w; w = w^R}, L4 = {w; w = u u^R ∧ u ∈ Σ∗}.

Problem 2.2 Σ = {a, b} and languages K, L, M are three subsets of Σ∗. Use a Venn diagram to express their relationship.

(a) K = {w; |#a(w) − #b(w)| = 1}, L = {w; w = w^R}, M = {w; w = (ab)^i a (ba)^i, i ∈ N},
(b) K = {w; w = (ab)^i b (ba)^i, i ∈ N}, L = {w; |#a(w) − #b(w)| = 2}, M = {w; w = w^R},
(c) K = {w; w = w^R}, L = {w; |#a(w) − #b(w)| = 1}, M = {w; w = (ab)^i (ba)^i, i ∈ N}.

Problem 2.3 Which of the following formulas is correct for all languages L1, L2, L3?

FORMULA I. L1L2 = L2L1, FORMULA II. L1(L2 ∪ L3) = L1L2 ∪ L1L3,

FORMULA III. (L^2)∗ = (L∗)^2, FORMULA IV. co(L1 ∪ L2) = co(L1) ∪ co(L2),

FORMULA V. (L1 ∪ L2)∗ = L1∗ ∪ L2∗, FORMULA VI. L1∗ ∩ L2∗ = (L1 ∩ L2)∗.


TOPIC 3 - The Language Accepted by a Finite Automaton (Acceptor)

The second method of finite description of languages uses finite automata.

Basic ideas. A finite automaton³ A is a system consisting of a processing unit with a finite number of states and a reading head connected to an input tape; the input string (the data) is written on the tape from left to right (each symbol is placed into one cell of the tape).

At the beginning, the processing unit is in the start state and the head is over the cell containing the first symbol of the input string. The automaton processes the string in steps; we talk about the calculation or the run of the automaton. In each step, the head reads the symbol on the tape and then the automaton consults the transition function that is a part of the processing unit. In accordance with its current state and the current symbol, the processing unit changes its state to a new (unique) one. Then the reading head moves one cell right and everything repeats.

After the last symbol of the input string has been processed, the calculation stops. If the resulting state of the processing unit belongs to the set of accepting states, the automaton releases the signal that the string was accepted; we also say that it was an accepting calculation or an accepting run. Otherwise, the string is not accepted, i.e. it is rejected.

The set of all strings accepted by A is called the language accepted by A and is denoted by L(A).

Example 3.1. The finite deterministic automaton B over Σ = {a, b} is described by its state diagram. It consists of:

• Σ = {a, b} . . . the (input) alphabet,

• Q = {q0, q1, q2} . . . the set of all states; the start state q0 is marked by an arrow coming in from nowhere,

• δ : Q × Σ → Q . . . the transition function (in the state diagram, the circles (nodes) are the states, a-arcs and b-arcs are the transitions), defined as follows:

δ(q0, a) = q0, δ(q0, b) = q1, δ(q1, a) = q2, δ(q1, b) = q1, δ(q2, a) = δ(q2, b) = q1,

• F = {q1} . . . the set of all accept states (only one accept state in this example), drawn as double circles.

[Figure: state diagram of automaton B with the states q0, q1, q2.]

Example 3.2. The run of the automaton B (above) on the string w1 = ababab is the following:

At the beginning, B is in the start state q0 and the configuration of the automaton is (q0, ababab). The first symbol (i.e. a) of the input ababab is read. The transition function determines the new state, i.e. δ(q0, a) = q0, and the input string will be babab (because the head moves one cell right; now it is over the second symbol of the original string). The new configuration is (q0, babab).

In the second step, we read the current symbol (i.e. b), the transition function gives δ(q0, b) = q1 and then the input string is abab, etc. We use a special notation ⊢ (‘goes to’) for the change of the configuration; we talk about the transition of a configuration (state, string) to a new configuration in one step. The iteration symbol ⊢∗ (‘gradually goes to’) means a transition in several subsequent steps:

(q0, ababab)⊢(q0, babab)⊢(q1, abab)⊢(q2, bab)⊢(q1, ab)⊢(q2, b)⊢(q1, ε), i.e. (q0, ababab) ⊢∗ (q1, ε).

In short (using the labels of states) we can write: q0 − q0 − q1 − q2 − q1 − q2 − q1 or simply q0 −w1→ q1. The resulting state q1 ∈ F, i.e. the automaton B accepts the string w1 = ababab.

Example 3.3. The run of the automaton B (above) on w2 = abba (the start state of B is q0):

(q0, abba) ⊢ (q0, bba) ⊢ (q1, ba) ⊢ (q1, a) ⊢ (q2, ε), i.e. q0 − q0 − q1 − q1 − q2 or q0 −w2→ q2.

The resulting state q2 ∉ F, i.e. the automaton does not accept the string w2 = abba.
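The runs in Examples 3.2 and 3.3 can be reproduced with a few lines of Python. This is a minimal sketch: the transition function of B is copied from Example 3.1, while the dictionary representation and the names `DELTA`, `run` and `accepts` are choices made here for illustration.

```python
# Transition function of automaton B from Example 3.1.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q1", ("q2", "b"): "q1",
}
START = "q0"
ACCEPT = {"q1"}


def run(w: str, state: str = START) -> list:
    """Return the list of configurations (state, remaining input)."""
    configs = [(state, w)]
    while w:
        state = DELTA[(state, w[0])]   # consult the transition function
        w = w[1:]                      # the head moves one cell right
        configs.append((state, w))
    return configs


def accepts(w: str) -> bool:
    """True if the run on w ends in an accept state."""
    return run(w)[-1][0] in ACCEPT


for word in ("ababab", "abba"):
    print(" ⊢ ".join(f"({q}, {rest or 'ε'})" for q, rest in run(word)),
          "accepted" if accepts(word) else "rejected")
```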

³ The full phrase is the Deterministic Finite Automaton (DFA).


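After some observation, we can see that the language accepted by B is infinite and

L(B) = {w ∈ Σ∗; B accepts the string w} = {w ∈ Σ∗; w = u b a^i, where u ∈ Σ∗ and i = 0, 2, 4, 6, . . .}.

Indeed, every b-arc leads to q1, and from q1 an even number of symbols a leads back to q1.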
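A description of L(B) obtained ‘by observation’ can be sanity-checked by brute force. The sketch below compares automaton B from Example 3.1 with the pattern ‘any string, then b, then an even number of symbols a’ on all strings up to a chosen length. The regular expression and the bound 8 are assumptions made for this illustration; such a test supports a description but does not prove it.

```python
import re
from itertools import product

# Transition function of automaton B from Example 3.1; F = {q1}.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q1", ("q2", "b"): "q1",
}


def accepts(w: str) -> bool:
    state = "q0"
    for x in w:
        state = DELTA[(state, x)]
    return state == "q1"


def matches_description(w: str) -> bool:
    """w = u b a^i with u arbitrary and i even."""
    return re.fullmatch(r"[ab]*b(aa)*", w) is not None


def agree_up_to(max_len: int) -> bool:
    """Compare both tests on every string over {a, b} of length <= max_len."""
    return all(
        accepts("".join(t)) == matches_description("".join(t))
        for n in range(max_len + 1)
        for t in product("ab", repeat=n)
    )


print(agree_up_to(8))
```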

‘Small’ automata over the alphabet Σ = {a, b}

Two possible (up to isomorphism, of course) one-state automata over Σ exist.

[Figure: two one-state automata; each has the single state q0 with a loop labelled a,b. In the first one q0 is an accept state, in the second one it is not.]

The first one accepts all strings from Σ∗ while the other one accepts no string.

There are quite many (up to isomorphism) two-state automata over Σ with states q0, q1:

• if F = {q0, q1} then such automata accept all strings from Σ∗,
• if F = ∅ then such automata accept no string,
• a special group of two-state automata is formed by all automata satisfying δ(q0, a) = δ(q0, b) = q0. In them, the state q1 is unreachable; thus, in fact, they can be reduced to one-state automata.

In the table below, we show some ‘interesting’ (up to isomorphism) cases:

AUTOMATON A | LANGUAGE L(A)
[state diagram 1, states q0, q1] | all strings containing at least one symbol b, i.e. {w ∈ Σ∗; #b(w) ≥ 1}
[state diagram 2, states q0, q1] | all strings containing an odd number of the symbols b, i.e. {w ∈ Σ∗; #b(w) is odd}
[state diagram 3, states q0, q1] | all the strings having the suffix b, i.e. {ub; u ∈ Σ∗}
[state diagram 4, states q0, q1] | {b^n; n is odd} ∪ {u a b^n; u ∈ Σ∗ ∧ n is odd}
[state diagram 5, states q0, q1] | all strings from Σ∗ but ε, i.e. Σ∗ − {ε}
[state diagram 6, states q0, q1] | {b^n; n is odd} ∪ {u a b^n; u ∈ Σ∗ ∧ n is even, including n = 0}
[state diagram 7, states q0, q1] | all strings of an odd length, i.e. {w ∈ Σ∗; #(w) is an odd number}


SAMPLE EXERCISE

Exercise 3.1. Over the alphabet Σ = {a, b, c}, two strings u = abb, v = bbcacb are given. First, create the string w = pump(2..3, v, 4) [sub(3..5, u^2)]^R. Then solve the following tasks:

(i) Show the run of the automaton B on the input string w. Then decide if the string belongs to the language L(B).

(ii) Find a string w1 such that w1 ∈ co(L(B)) and #ab(w1) = 6. Show the run of the automaton B on this string.

[Figure: state diagram of automaton B with the states q1, q0, q2.]

Solution. (by Martin HOREJS)
v1 = pump(2..3, v, 4) = bbcbcbcbcacb, u^2 = abbabb, v2 = [sub(3..5, u^2)]^R = (bab)^R = bab, w = v1v2 = bbcbcbcbcacbbab.

ad (i) (q0, bbcbcbcbcacbbab)⊢(q0, bcbcbcbcacbbab)⊢(q0, cbcbcbcacbbab)⊢(q1, bcbcbcacbbab)⊢(q0, cbcbcacbbab)⊢(q1, bcbcacbbab)⊢(q0, cbcacbbab)⊢(q1, bcacbbab)⊢(q0, cacbbab)⊢(q1, acbbab)⊢(q1, cbbab)⊢(q1, bbab)⊢(q0, bab)⊢(q0, ab)⊢(q2, b)⊢(q2, ε), i.e. (q0, w) ⊢∗ (q2, ε). It means that w ∈ L(B).

ad (ii) We can take w1 = ababababababcb and the computation on it gives q0 − q2 − q2 − q0 − q0 − q2 − q2 − q0 − q0 − q2 − q2 − q0 − q0 − q1 − q0, i.e. (q0, w1) ⊢∗ (q0, ε), and it means that w1 ∉ L(B), i.e. w1 ∈ co(L(B)).

PROBLEMS TO SOLVE

Problem 3.1. Over the alphabet Σ = {a, b, c}, two strings u = acb and v = abbcba and two automata A1 and A2 are given:

[Figure: state diagram of automaton A1 with the states q0, q1.]

[Figure: state diagram of automaton A2 with the states q1, q0, q2.]

Now, create two new strings w1 = u^2 v^R sub(2..5, v), w2 = pump(2..3, v, 3) [sub(3..5, u^2)]^R and then, for each automaton Ai, i = 1, 2, solve the following four tasks:

(i) Show the run of the automaton Ai on the input string w1 or w2, respectively. Then decide if the string belongs to the language L(Ai).

(ii) Find a string w such that w ∈ co(L(Ai)) and #(w) > 20. Show the run of the automaton Ai on this string.

(iii) Find a string w such that w ∉ co(L(Ai)) and #ab(w) = 6. Show the run of the automaton Ai on this string.

(iv) In your own words, describe the language L(Ai).

Problem 3.2. Over the alphabet Σ = {a, b}, design a two-state automaton A such that

(a) L(A) = {ε}, (b) L(A) = {b^n; n ≥ 0},

(c) L(A) = {w ∈ Σ∗; #a(w) ≥ 1}, (d) L(A) = {w ∈ Σ∗; #(w) = 0, 2, 4, . . .},

(e) L(A) = {ua; u ∈ Σ∗}, (f) L(A) = {a^n; n is odd} ∪ {u b a^n; u ∈ Σ∗ ∧ n is odd}.


TOPIC 4 - The Analysis of the Work of an Automaton (DFA)

Let A be a deterministic finite automaton over an alphabet Σ with the set of states Q, the transition function δ : Q × Σ → Q and the set of accept states F ⊆ Q. The language accepted by A is L(A). The iteration δ∗ of the transition function δ acts on the set of all configurations as (state, string) ⊢∗ (state′, string′). Such an iterated transition process is called a partial calculation or a partial run of A.

Example 4.1. The automaton A over Σ = {a, b} is given by its transition diagram.

[Figure: transition diagram of automaton A with the states q0, q1, q2, q3.]

(1) We will show the partial run of our automaton on the string w = aababba started from the state q1 made in 4 steps:

(q1, aababba) ⊢ (q2, ababba) ⊢ (q2, babba) ⊢ (q3, abba) ⊢ (q2, bba);

in short: (q1, aababba) ⊢∗ (q2, bba).

(2) For i = 0, 1, 2, 3, we describe the sets of strings M0, M1, M2, M3, where Mi = {w ∈ Σ∗; (q0, w) ⊢∗ (qi, ε)}. We get:

M0 = {ε, a, a^2, a^3, . . .} = {a^n; n = 0, 1, 2, 3, . . .},

M1 = {b, ab, a^2 b, a^3 b, . . .} = {a^n b; n = 0, 1, 2, 3, . . .},

M2 = {ba, aba, b^2 a, . . .} = {w ∈ Σ∗; w = ua ∧ #b(w) ≥ 1},

M3 = {bb, abb, bab, bbb, . . .} = {w ∈ Σ∗; w = ub ∧ #b(w) ≥ 2}.

[Figure: the set Σ∗ divided into the four classes M0 (ε, a, aa, a^3, . . .), M1 (b, ab, aab, a^3 b, . . .), M2 (ba, aba, bba, aaba, . . .) and M3 (bb, abb, bab, b^3, . . .).]

We can see that we have obtained a partition of Σ∗ into 4 classes and, moreover, L(A) = M1 ∪ M2.
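The classes M0, . . . , M3 can also be listed mechanically by running the automaton on all short strings. The transition table below is an assumption: it is reconstructed from the partial run and from the descriptions of M0 to M3 in Example 4.1 (the diagram itself is not used), and the function names are illustrative.

```python
from itertools import product

# Transition function reconstructed from Example 4.1 (an assumption, see above).
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q3",
    ("q2", "a"): "q2", ("q2", "b"): "q3",
    ("q3", "a"): "q2", ("q3", "b"): "q3",
}
STATES = ("q0", "q1", "q2", "q3")


def final_state(w: str, state: str = "q0") -> str:
    """State reached after reading the whole string w."""
    for x in w:
        state = DELTA[(state, x)]
    return state


def classes(max_len: int) -> dict:
    """For each state qi, the strings of length <= max_len that belong to Mi."""
    result = {q: [] for q in STATES}
    for n in range(max_len + 1):
        for t in product("ab", repeat=n):
            w = "".join(t)
            result[final_state(w)].append(w)
    return result


for q, words in classes(3).items():
    print(q, [w or "ε" for w in words])
```

Every string lands in exactly one list, which is the partition property observed above.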

This example demonstrates a general property of any finite deterministic automaton, namely that:

Every finite deterministic automaton A over an alphabet Σ defines, on the set Σ∗, the relation ∼A described as: w1 ∼A w2 if and only if there exists i such that (q0, w1) ⊢∗ (qi, ε) and (q0, w2) ⊢∗ (qi, ε).

∼A is an equivalence relation and its partition classes M0, M1, M2, . . . are related to the states q0, q1, q2, . . . of A. The following theorem is very important:

Nerode Theorem (1958). For every finite deterministic automaton A over an alphabet Σ, the relation ∼A is a right congruence on Σ∗, which means that for any three strings w1, w2, u the following holds: w1 ∼A w2 ⇒ w1u ∼A w2u. On the set Σ∗, the relation ∼A defines a partition into a finite system of sets Mi related to the states qi of the automaton A; especially, L(A) is the union of the sets Mi over all i with qi ∈ F.

Note. The Theorem is used to prove that some languages cannot be defined by an automaton; for example, {a^(n^2); n = 0, 1, 2, 3, . . .}, or {w ∈ {a, b}∗; w = w^R}.

Example 4.2. K = {a^i b^i; i = 1, 2, . . .} over Σ = {a, b} is not a language of any automaton.

Proof (by contradiction). Let us assume that K = L(A), where A is an n-state automaton with states q0, q1, . . ., qn−1. The automaton A defines, on Σ∗, the right congruence ∼A with the partition classes M0, M1, . . . , Mn−1. Let us take n+1 strings over Σ, namely the strings a^1, a^2, a^3, . . . , a^n, a^(n+1). By the Dirichlet (pigeonhole) principle, there must exist two natural numbers i, j (i < j ≤ n + 1) and an index ℓ such that a^i ∈ Mℓ ∧ a^j ∈ Mℓ. It means that a^i ∼A a^j. Now, using the Nerode Theorem on w1 = a^i, w2 = a^j, and u = b^i, we get a^i b^i ∼A a^j b^i.

Now, it is evident that A accepts the string a^i b^i (see the description of the language K above), but it does not accept the string a^j b^i (because j ≠ i). This contradicts the fact obtained in the previous paragraph, that these two words belong to the same equivalence class.
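The pigeonhole step of the proof can be watched on any concrete automaton. The sketch below is an illustration, not a part of the proof: it takes automaton B from Example 3.1 as a stand-in for an arbitrary n-state automaton, finds i < j with a^i and a^j ending in the same state, and confirms that appending b^i leads both strings to the same state again. The function names are hypothetical.

```python
# Automaton B from Example 3.1, used here only as a sample 3-state automaton.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q1", ("q2", "b"): "q1",
}


def final_state(delta: dict, start: str, w: str) -> str:
    state = start
    for x in w:
        state = delta[(state, x)]
    return state


def collision(delta: dict, start: str, n_states: int) -> tuple:
    """Return (i, j), i < j <= n_states + 1, with a^i and a^j in the same class."""
    seen = {}
    for i in range(1, n_states + 2):
        q = final_state(delta, start, "a" * i)
        if q in seen:
            return seen[q], i
        seen[q] = i
    raise AssertionError("impossible for an automaton with n_states states")


i, j = collision(DELTA, "q0", 3)
same = final_state(DELTA, "q0", "a" * i + "b" * i) == final_state(DELTA, "q0", "a" * j + "b" * i)
print(i, j, same)
```

Since both strings end in the same state, the automaton either accepts both or rejects both, so it cannot separate a^i b^i from a^j b^i.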


SAMPLE EXERCISE

Exercise 4.1. The automaton A over the alphabet Σ = {a, b} is given by its transition diagram.

[Figure: transition diagram of automaton A with the states q0, q1, q2.]

(i) On the string aaabbbabba, perform the 4-step partial calculation started from state q1. Describe its run.

(ii) To each of the states q0, q1, q2 there is its associated set of strings M0, M1, M2, respectively. For each of these sets, find three different strings belonging to it.

(iii) Describe the language L(A). Justify your answer.

Solution. (by Ondrej LEVY)

(i) (q1, aaabbbabba) ⊢ (q1, aabbbabba) ⊢ (q1, abbbabba) ⊢ (q1, bbbabba) ⊢ (q2, bbabba),
in short: (q1, aaabbbabba) ⊢∗ (q2, bbabba).

(ii) M0 = {ε, bbb, bbbbb, . . .}, M1 = {bba, baaa, bbaa, . . .}, M2 = {baba, abbb, aabb, . . .}.

(iii) L(A) = {b^i a^j; i ≥ 0, j > 0}. Explanation: First of all, M0 = {b^i; i ≥ 0}. Then, it is clear that M1 = {w; w = ua where u ∈ M0 or u ∈ M1} = {b^i a^j; i ≥ 0, j > 0}, and L(A) = M1.

PROBLEMS TO SOLVE

Problem 4.1. Two automata A1, A2 over the alphabet Σ = {a, b} are given by their transition diagrams.

[Figure: transition diagram of automaton A1 with the states q0, q1.]

[Figure: transition diagram of automaton A2 with the states q0, q1, q2.]

(i) On the string bbaaabbabba, perform the 5-step partial calculation started from state q1. Describe its run.

(ii) In A1 (or in A2), to each of the states q0, q1 (or q0, q1, q2) there is its associated set of strings M0, M1 (or M0, M1, M2). For each of these sets, find three different strings belonging to it.

(iii) Choose the language L(Ai) from among the list of languages Q1 through Q6 offered below. Justify your answer.

A Table of Chosen Languages over the Alphabet Σ

Language Q1 . . . {waa; w ∈ Σ∗},
Language Q2 . . . {wbb; w ∈ Σ∗},
Language Q3 . . . {w b a^i; w ∈ Σ∗ ∧ i = 0, 2, 4, 6, . . .},
Language Q4 . . . Q3 ∪ {a^i; i = 0, 2, 4, 6, . . .},
Language Q5 . . . {w b a^i; w ∈ Σ∗ ∧ i = 1, 3, 5, 7, . . .},
Language Q6 . . . Q5 ∪ {a^i; i = 1, 3, 5, 7, . . .}.

Problem 4.2.

(a) Using the Nerode Theorem, prove by contradiction that L = {w ∈ {a, b}∗; w = w^R} is not a language of any automaton.

(b) Using the Nerode Theorem, prove by contradiction that L = {a^(n^2); n = 0, 1, 2, 3, . . .} is not a language of any automaton.


TOPIC 5 - Finite and Infinite Regular Languages

Definition. A language L over the alphabet Σ is called a regular language if there exists a finite deterministic automaton A such that L = L(A). The class of all regular languages over Σ will be denoted by REG(Σ).

Observation. Every finite language L is regular, i.e. there exists an automaton A such that L = L(A).

The construction of A uses a so-called partial transition function that shows only the important transitions, while the unimportant transitions are omitted [replaced by the symbol ‘−’ (hyphen)]. To complete the transition table we use an extra state, a garbage state.

Example 5.1. For Σ = {k, o, s} and L = {k, o, s, kos, kok}, we will construct an automaton A using a table of a partial transition function δ and an incomplete transition diagram.

         k    o    s
→ q0     q1   q4   q4
  q1∗    −    q2   −
  q2     q3   −    q3
  q3∗    −    −    −
  q4∗    −    −    −

[Figure: incomplete transition diagram with the states q0, q1, q2, q3, q4.]

Now, we will show how to complete the partial transition function. One introduces a new garbage state q5 and all the missing transition arcs will be directed to q5:

         k    o    s
→ q0     q1   q4   q4
  q1∗    q5   q2   q5
  q2     q3   q5   q3
  q3∗    q5   q5   q5
  q4∗    q5   q5   q5
  q5     q5   q5   q5

[Figure: complete transition diagram with the states q0, q1, q2, q3, q4 and the garbage state q5.]
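Completing a partial transition function is a mechanical step, so it can be sketched in a few lines of Python. The partial table and the accept states q1, q3, q4 (the starred rows) are taken from Example 5.1; the function names and the dictionary representation are choices made here.

```python
from itertools import product

# Partial transition function of Example 5.1 (missing entries are the '−' cells).
PARTIAL = {
    ("q0", "k"): "q1", ("q0", "o"): "q4", ("q0", "s"): "q4",
    ("q1", "o"): "q2",
    ("q2", "k"): "q3", ("q2", "s"): "q3",
}
STATES = ["q0", "q1", "q2", "q3", "q4"]
SIGMA = "kos"
ACCEPT = {"q1", "q3", "q4"}


def complete(partial: dict, states: list, sigma: str, garbage: str = "q5") -> dict:
    """Direct every missing transition (also those of the garbage state) to garbage."""
    delta = dict(partial)
    for q in list(states) + [garbage]:
        for x in sigma:
            delta.setdefault((q, x), garbage)
    return delta


def accepts(delta: dict, w: str, start: str = "q0") -> bool:
    state = start
    for x in w:
        state = delta[(state, x)]
    return state in ACCEPT


DELTA = complete(PARTIAL, STATES, SIGMA)
accepted = {"".join(t) for n in range(5) for t in product(SIGMA, repeat=n)
            if accepts(DELTA, "".join(t))}
print(sorted(accepted))
```

Listing the accepted strings up to length 4 is a quick check that the completed automaton accepts the intended finite language and nothing longer.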

Theorem. Let A be an n-state automaton and let there exist a string w ∈ L(A) such that #(w) ≥ n. Then the language L(A) is infinite.

The idea of the proof will be illustrated by an example.

Example 5.2. A 4-state automaton A over Σ = {a, b} is described by its state diagram. This automaton accepts the string w = aabaaa; the computation is:

(q0, aabaaa)⊢(q1, abaaa)⊢(q3, baaa)⊢(q1, aaa)⊢(q3, aa)⊢(q2, a)⊢(q1, ε), or: q0 − q1 − q3 − q1 − q3 − q2 − q1. The sequence consists of #(w) + 1 = 6 + 1 = 7 states and, necessarily, some state has to repeat (the Dirichlet principle).

[Figure: state diagram of automaton A with the states q0, q1, q2, q3.]

Therefore, we can find a partial computation that is a computational loop; for example, (q1, abaaa)⊢(q3, baaa)⊢(q1, aaa) is ‘looping’ from q1 to q1. This loop can be repeated k times (k = 1, 2, 3, . . .); you will simply be ‘pumping’ the string aabaaa as pump(2..3, aabaaa, k).

Summary: This automaton will accept all the strings of the form pump(2..3, w, k) with k = 1, 2, 3, . . ., i.e. aabaaa, aababaaa, aabababaaa, aababababaaa, . . . etc. It means that L(A) is infinite.
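The search for a computational loop can be automated: record the states of the run and stop at the first state that repeats. Because the transition function of the automaton in Example 5.2 is given only by its diagram, the sketch below uses automaton B from Example 3.1 instead (an assumption made for the sake of a runnable example); the function names are illustrative.

```python
# Automaton B from Example 3.1 (3 states, F = {q1}).
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q1", ("q2", "b"): "q1",
}
START, ACCEPT = "q0", {"q1"}


def trace(w: str) -> list:
    """States visited by the run on w (one more than the length of w)."""
    states = [START]
    for x in w:
        states.append(DELTA[(states[-1], x)])
    return states


def accepts(w: str) -> bool:
    return trace(w)[-1] in ACCEPT


def find_loop(w: str):
    """Return 1-based (i, j) such that reading sub(i..j, w) returns to the same state."""
    seen = {}
    for pos, q in enumerate(trace(w)):   # pos = number of symbols read so far
        if q in seen:
            return seen[q] + 1, pos
        seen[q] = pos
    return None                          # only possible if len(w) < number of states


def pump(i: int, j: int, w: str, k: int) -> str:
    return w[:i - 1] + w[i - 1:j] * k + w[j:]


w = "ababab"
i, j = find_loop(w)
print((i, j), [pump(i, j, w, k) for k in range(1, 4)])
print(all(accepts(pump(i, j, w, k)) for k in range(1, 10)))
```

Each pumped string repeats the loop a different number of times and ends in the same state as w, so infinitely many accepted strings are produced.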


SAMPLE EXERCISES

Exercise 5.1. For the finite language L = {or, for, of} ∪ {w; #(w) = 1} over the alphabet Σ = {f, r, o}, create an automaton B such that L = L(B). Show its partial transition function and its partial state diagram.

Solution. (by Jan SIMECEK)

         f    r    o
→ q0     q2   q3   q1
  q1∗    q3   q3   −
  q2∗    −    −    q4
  q3∗    −    −    −
  q4     −    q3   −

[Figure: partial state diagram with the states q0, q1, q2, q4, q3.]

Exercise 5.2. Given a 3-state automaton A, find a string w of length at least 6 which is accepted by this automaton. Use the ‘pumping principle’ on this string and prove that the language L(A) is infinite.

[Figure: state diagram of automaton A with the states q0, q1, q2.]

Solution. (by Martina CHUCHLOVA) w = ababbab and the computation will be:
(q0, ababbab)⊢(q0, babbab)⊢(q1, abbab)⊢(q0, bbab)⊢(q1, bab)⊢(q2, ab)⊢(q2, b)⊢(q2, ε).
A computational loop is (q1, abbab)⊢(q0, bbab)⊢(q1, bab).
Pumping the original string w we get pump(3..4, w, k) with k = 1, 2, 3, . . ., i.e. ababbab, abababbab, ababababbab, . . . etc. All these strings belong to L(A), i.e. L(A) is infinite.

PROBLEMS TO SOLVE

Problem 5.1. For the given finite language L over the given alphabet Σ create an automaton A such that L = L(A). Show its partial transition function and its partial state diagram.

(a) Σ = {f, g, o}, L = {fog, off} ∪ {w; #(w) = 2},
(b) Σ = {f, g, o}, L = {go, for, off} ∪ {w; #(w) = 1},
(c) Σ = {d, g, o}, L = {go, do, god} ∪ {w; #(w) = 2},
(d) Σ = {f, r, o}, L = {or, of} ∪ {w; #(w) = 3}.

Problem 5.2. For the given automaton B (below) find a string w of length at least 5 which is accepted by this automaton. Use the ‘pumping principle’ on this string and prove that the language L(B) is infinite.

(a) [Figure: state diagram of automaton B with the states q0, q1.]

(b) [Figure: state diagram of automaton B with the states q0, q1, q2.]

(c) [Figure: state diagram of automaton B with the states q1, q0, q2.]

(d) [Figure: state diagram of automaton B with the states q0, q1, q2, q3.]


TOPIC 6 - The Algebra on Automata - part 1

(1) The copy of an automaton. Let A be an n-state deterministic automaton over Σ with the state set Q; let Φ : Q −→ P be a bijection. Then we can naturally define the Φ-copy of A (denoted by Φ(A)) with the state set P (see Book 1 on discrete mathematics for the details). Φ(A) will also be an n-state deterministic automaton over Σ and L(Φ(A)) = L(A).

(2) The co-automaton. Let Σ be an alphabet, let A be a deterministic automaton over Σ with the state set Q and the accept state set F ⊆ Q. Now, in A, we change the accept state set to Q − F (swapping the set of non-accept and the set of accept states); we get the co-automaton co(A) and L(co(A)) = Σ∗ − L(A).
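The co-automaton construction changes only the accept state set, which the following sketch makes explicit. As a sample automaton it uses B with Q = {q0, q1, q2}, F = {q1} and the transition function listed in Example 3.1; the dictionary layout and the function names are assumptions made here.

```python
from itertools import product

B = {
    "states": {"q0", "q1", "q2"},
    "start": "q0",
    "accept": {"q1"},
    "delta": {
        ("q0", "a"): "q0", ("q0", "b"): "q1",
        ("q1", "a"): "q2", ("q1", "b"): "q1",
        ("q2", "a"): "q1", ("q2", "b"): "q1",
    },
}


def co(A: dict) -> dict:
    """Co-automaton: the same states and transitions, accept set Q − F."""
    return {**A, "accept": A["states"] - A["accept"]}


def accepts(A: dict, w: str) -> bool:
    state = A["start"]
    for x in w:
        state = A["delta"][(state, x)]
    return state in A["accept"]


words = ["".join(t) for n in range(7) for t in product("ab", repeat=n)]
print(co(B)["accept"])
print(all(accepts(co(B), w) != accepts(B, w) for w in words))
```

The construction relies on the automaton being deterministic and complete: every string has exactly one run, so swapping the accept states swaps acceptance.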

Example 6.1. [Figure: state diagrams of automaton B and of its co-automaton co(B), both with the states q0, q1, q2.]

(3) The synchronous parallel composition of automata. Let A, B be two automata over Σ with the state sets QA, QB, the start states q0,A, q0,B, the transition functions δA, δB, and the accept state sets FA, FB. Then, for their parallel composition, there will be: the state set Q = QA × QB = {[q, q′]; q ∈ QA, q′ ∈ QB}, the start state [q0,A, q0,B]; the transition function is δ([q, q′], a) = [δA(q, a), δB(q′, a)] for every a ∈ Σ.

There are three different accept state sets F, namely:

for the automaton A − B: F = {[q, q′]; q ∈ FA ∧ q′ ∉ FB} and L(A − B) = L(A) − L(B);
for the automaton A ∪ B: F = {[q, q′]; q ∈ FA ∨ q′ ∈ FB} and L(A ∪ B) = L(A) ∪ L(B);
for the automaton A ∩ B: F = {[q, q′]; q ∈ FA ∧ q′ ∈ FB} and L(A ∩ B) = L(A) ∩ L(B).
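The parallel composition translates almost literally into code. In the sketch below, the two sample automata are assumptions chosen for illustration (A accepts the strings with an odd number of symbols b, B is the automaton from Example 3.1), and the dictionary layout and the `mode` argument are hypothetical conventions.

```python
from itertools import product

A = {"states": ["p0", "p1"], "start": "p0", "accept": {"p1"},
     "delta": {("p0", "a"): "p0", ("p0", "b"): "p1",
               ("p1", "a"): "p1", ("p1", "b"): "p0"}}
B = {"states": ["q0", "q1", "q2"], "start": "q0", "accept": {"q1"},
     "delta": {("q0", "a"): "q0", ("q0", "b"): "q1",
               ("q1", "a"): "q2", ("q1", "b"): "q1",
               ("q2", "a"): "q1", ("q2", "b"): "q1"}}


def compose(A: dict, B: dict, sigma: str, mode: str) -> dict:
    """Synchronous parallel composition; mode is 'minus', 'union' or 'intersection'."""
    in_f = {
        "minus": lambda p, q: p in A["accept"] and q not in B["accept"],
        "union": lambda p, q: p in A["accept"] or q in B["accept"],
        "intersection": lambda p, q: p in A["accept"] and q in B["accept"],
    }[mode]
    states = list(product(A["states"], B["states"]))      # QA × QB
    delta = {((p, q), x): (A["delta"][(p, x)], B["delta"][(q, x)])
             for (p, q) in states for x in sigma}          # both move in step
    return {"states": states, "start": (A["start"], B["start"]),
            "accept": {s for s in states if in_f(*s)}, "delta": delta}


def accepts(M: dict, w: str) -> bool:
    state = M["start"]
    for x in w:
        state = M["delta"][(state, x)]
    return state in M["accept"]


minus = compose(A, B, "ab", "minus")
print(sorted(minus["accept"]))
print(accepts(minus, "ba"), accepts(A, "ba"), accepts(B, "ba"))
```

All three composed automata share the same states and transitions; only the accept set differs, which is why one function with a `mode` switch is enough.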

Example 6.2. The picture shows the preparation of the state set for the parallel composition of automata A and B. In fact, it is the cartesian product of QA and QB where, instead of [qi, qj], we write qi,j; especially, the start state will be q0,0. The transition arcs are created in the evident way. There are three choices of F:

for A − B there is F = {q1,1},
for A ∪ B there is F = {q0,0, q1,0, q1,1, q0,2, q1,2},
for A ∩ B there is F = {q1,0, q1,2}.

[Figure: state diagrams of automaton A (states q0, q1) and automaton B (states q0, q1, q2), the grid of the product states q0,0, q0,1, q0,2, q1,0, q1,1, q1,2, and the three resulting automata A − B, A ∪ B, A ∩ B.]

14

SAMPLE EXERCISES

Exercise 6.1. Two automata A and B over the alphabet Σ = {a, b} are given below. Your task is to describe three automata: A − B, A ∪ co(B), and co(A) ∩ co(B).

[Figure: state diagrams of automaton A (states q0, q1) and automaton B (states q0, q1, q2).]

Solution. (by Jakub BARTA) The picture shows the state set for the parallel composition of automata A and B and the arcs. The three automata are described as:

for A − B there is F = {q1,2},
for A ∪ co(B) there is F = {q1,0, q1,1, q1,2, q0,2},
for co(A) ∩ co(B) there is F = {q0,2}.

[Figure: automaton A, automaton B, and the grid of product states q0,0, q0,1, q0,2, q1,0, q1,1, q1,2 with its transition arcs.]

Exercise 6.2. Let A and B be the automata over Σ = {a, b} from Ex. 6.1. Use the constructions from Ex. 6.1 and find a string w ∈ Σ∗ such that

(a) w ∈ L(A) ∧ w ∉ L(B) ∧ #(w) = 5, (b) w ∉ L(A) ∧ w ∉ L(B) ∧ #ab(w) = 3.

Solution.
ad (a) We need w ∈ L(A − B) and we know that this automaton has only one accept state, the state q1,2. We take w = abbaa and then the calculation in A − B gives (q0,0, abbaa) ⊢ (q1,0, bbaa) ⊢ (q1,1, baa) ⊢ (q1,2, aa) ⊢ (q0,2, a) ⊢ (q1,2, ε), i.e. (q0,0, w) ⊢∗ (q1,2, ε).

ad (b) We need w ∈ L(co(A) ∩ co(B)) and we know that this automaton has only one accept state, the state q0,2. We take w = abababa and then the calculation in co(A) ∩ co(B) gives (q0,0, w) ⊢∗ (q0,2, ε).

PROBLEMS TO SOLVE

Problem 6.1. Over the alphabet Σ = {a, b}, three automata A1, A2, A3 are given:

[Figure: state diagrams of automaton A1 (states q0, q1), automaton A2 (states q0, q1, q2) and automaton A3 (states q0, q1, q2, q3).]

(a) Determine the number of accept states of the automaton

(i) co(A3)−A1 (ii) A2 ∩ A3 (iii) A1 ∪ A2 ∪ A2

(b) Design a state diagram for the automaton

(i) co(A1)−A2 (ii) A1 ∩ A2 (iii) A1 ∪ A3

(c) In each automaton in Prob. 6.1. (b), show the computation on the string w = abbbabab.

(d) For each of the automata in Prob. 6.1. (b), find a string u of its language such that

(1) #(u) = 8 ∧ #ba(u) ≤ 1, (2) #a(u) = #b(u) ∧ u = uR.

TOPIC 7 - Non-Deterministic Finite Automata (NFA)

Note: It will be more convenient in this chapter to start the numbering of states of an automaton from q1, i.e. Q = {q1, q2, . . .}. If Q is the state set, the symbol P(Q) means the power set of Q, i.e. the set of all subsets of Q.

The finite non-deterministic automaton A over the alphabet Σ is the following generalization of the finite deterministic automaton: There need not be just one start state; the start states can form any non-empty subset Q0 ⊆ Q. The transition function can be multivalued, i.e. δ : Q × Σ → P(Q). Every computational step is again of the form (qi, aw) ⊢ (qj, w), but now it means that qj ∈ δ(qi, a), i.e. it is not uniquely determined. A partial calculation is again a sequence of computational steps, and we again write (qi, w1) ⊢∗ (qj, w2); for every state qi we can define the set Mi ⊆ Σ∗ as Mi = {w ∈ Σ∗; there exists q ∈ Q0 such that (q, w) ⊢∗ (qi, ε)}. The system of sets Mi is not necessarily a partition of Σ∗; in general, it is a hypergraph on Σ∗. The language accepted by A is L(A) = {w ∈ Σ∗; there exist q ∈ Q0, q′ ∈ F such that (q, w) ⊢∗ (q′, ε)}. It is the union of all Mq′ where q′ ∈ F.

A new phenomenon appears: the result of a computation on the string w ∈ Σ∗ started at the state q ∈ Q0 depends on the path, which may not be unique (it is non-deterministic). Three cases are possible (even for the same string w):

• there exists q′ ∈ F such that (q, w) ⊢∗ (q′, ε) - the computation accepts w,

• there exists q′ ∉ F such that (q, w) ⊢∗ (q′, ε) - the computation does not accept w,

• there exist q′ ∈ Q and u ∈ Σ∗, u ≠ ε, such that (q, w) ⊢∗ (q′, u)× and the computation cannot be finished.

Example 7.1. A non-deterministic automaton A is described by its state diagram at the right. It consists of: Σ = {a, b}, Q = {q1, q2, q3}, Q0 = {q1}, F = {q3}; the transition function δ : Q × Σ → P(Q) is described by its table.

Examples of calculations:
(q1, babb) ⊢ (q1, abb) ⊢ (q1, bb) ⊢ (q2, b) ⊢ (q3, ε) - the string is accepted,
(q1, babb) ⊢ (q1, abb) ⊢ (q1, bb) ⊢ (q1, b) ⊢ (q2, ε) - the string is not accepted,
(q1, babb) ⊢ (q2, abb) ⊢ (q3, bb)× - the computation cannot be finished.

[Figure: state diagram of automaton A with states q1, q2, q3.]

          a        b
→ q1    {q1}    {q1, q2}
  q2    {q3}    {q3}
  q3∗   { }     { }

Now, we will show the hypergraph on Σ∗ with the hyperedges Mi corresponding to the states qi of the automaton A.

M1 . . . it is evident that M1 = Σ∗ because for every string w ∈ Σ∗ there is a computation (q1, w) ⊢∗ (q1, ε)

M2 . . . M2 = {wb; w ∈ Σ∗} because just in such cases we have a computation (q1, wb) ⊢∗ (q1, b) ⊢ (q2, ε)

M3 . . . M3 = {wba; w ∈ Σ∗} ∪ {wbb; w ∈ Σ∗} because just in such cases we have a computation (q1, wbx) ⊢∗ (q1, bx) ⊢ (q2, x) ⊢ (q3, ε) where x = a or x = b.

[Figure: the states q1, q2, q3 of automaton A drawn next to the hyperedges M1 = Σ∗, M2, M3.]

The language L(A) = M3, which means that it consists of all strings whose penultimate symbol (i.e. the second from the right) is b.

Theorem. To every n-state non-deterministic automaton A there exists a deterministic automaton Ad such that L(A) = L(Ad). The automaton Ad has 2^n states.

The proof is based on the so-called subset construction. Let, for the original non-deterministic automaton A, Q = {q1, q2, . . . , qn} be the state set, Q0 ⊆ Q be the start state set, F ⊆ Q be the accept state set, δ : Q × Σ → P(Q) be the transition function.

For the new deterministic automaton Ad: Qd = P(Q) is the state set, Q0 ∈ P(Q) is the start state, Fd = {Q′ ∈ P(Q); Q′ ∩ F ≠ ∅} ⊆ P(Q) is the accept state set. The transition function is defined as δd : P(Q) × Σ → P(Q) where δd(Q′, a) = ∪ {δ(q, a); q ∈ Q′} ∈ P(Q), i.e. the union of the sets δ(q, a) over all q ∈ Q′.

We will show this construction on the automaton A from Example 7.1 with the state set Q = {q1, q2, q3}. For the elements of the power set P(Q) (just 8 = 2^3 subsets), we will use a notation exploiting the binary code: ∅ = p000, {q1} = p100, {q2} = p010, {q3} = p001, {q1, q2} = p110, {q1, q3} = p101, {q2, q3} = p011, {q1, q2, q3} = p111. The rest is evident from the transition diagram of automaton Ad (at the left below).

[Figure: transition diagram of automaton Ad on all eight states p000, . . . , p111 (left) ⇒ transition diagram of automaton Ad^reduced on the states p100, p110, p101, p111 (right).]

From the picture (at the left), we can see that some of the states cannot be reached by any calculation. These four states (called non-accessible) will be removed from the transition diagram (see the diagram of automaton Ad^reduced at the right above).
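The subset construction for the automaton A of Example 7.1 can also be traced mechanically. The following Python sketch is an illustration only: it builds just the accessible subsets (so it produces the reduced automaton directly), subsets are stored as frozensets instead of the names p100, p110, . . ., and the function names are chosen here for the example.

```python
from collections import deque

# NFA of Example 7.1: delta maps (state, symbol) to a set of states;
# missing entries stand for the empty set.
SIGMA = "ab"
START = frozenset({"q1"})
ACCEPT = {"q3"}
DELTA = {
    ("q1", "a"): {"q1"}, ("q1", "b"): {"q1", "q2"},
    ("q2", "a"): {"q3"}, ("q2", "b"): {"q3"},
}

def subset_construction(start, accept, delta, alphabet):
    """Build the accessible part of the deterministic automaton Ad."""
    table = {}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        if subset in table:
            continue
        table[subset] = {}
        for x in alphabet:
            # delta_d(Q', x) is the union of delta(q, x) over all q in Q'
            target = frozenset(s for q in subset for s in delta.get((q, x), ()))
            table[subset][x] = target
            queue.append(target)
    accepting = {s for s in table if s & accept}
    return table, accepting

def accepts(table, accepting, start, word):
    state = start
    for x in word:
        state = table[state][x]
    return state in accepting

TABLE, ACCEPTING = subset_construction(START, ACCEPT, DELTA, SIGMA)
print(len(TABLE))                                 # 4 accessible states
print(accepts(TABLE, ACCEPTING, START, "babb"))   # True, penultimate symbol is b
```

Exploring only the subsets reachable from the start state avoids creating the four non-accessible states in the first place.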

SAMPLE EXERCISE

Exercise 7.1. At the right, you can see a non-deterministic automaton A over the alphabet Σ = {a, b}.

[Figure: state diagram of automaton A with states q1, q2 and arcs labeled b, b and a,b.]

(i) For the automaton A, find (if they exist) three strings w1, w2, w3 ∈ Σ∗ such that: w1 ∈ L(A); w2 ∉ L(A) and every computation on it will be finished; w3 ∉ L(A) and no computation on it will be finished.

(ii) Use the standard subset construction and create the transition diagram of the deterministic automaton Ad. Find the number of non-accessible states in Ad.

Solution. (by Jan SIMECEK)

ad (i) w1 = ba and then (q1, ba) ⊢ (q2, a) ⊢ (q2, ε), i.e. w1 ∈ L(A); w2 = ε ∉ L(A) and the only possible computation is (q1, ε); w3 = abb ∉ L(A) and the only possible computation is (q1, abb)×, which cannot be finished.

ad (ii) At the right, we show the automaton Ad and its transition table. From the diagram, it is evident that this new automaton has no non-accessible state.

          a      b
→ p10    p00    p01
  p01∗   p01    p11
  p11∗   p01    p11
  p00    p00    p00

[Figure: transition diagram of the deterministic automaton Ad with states p00, p10, p01, p11.]

PROBLEMS TO SOLVE

Problem 7.1. For the given non-deterministic automaton A over Σ = {a, b} find

(i) u, v ∈ Σ∗ such that u ∈ L(A), v ∉ L(A), (ii) the number of non-accessible states in Ad.

[Figure: state diagrams of three automata A: (a) with states q1, q2; (b) with states q1, q2, q3; (c) with states q1, q2, q3, q4.]

TOPIC 8 - Nondeterministic Finite Automata with ε-Transitions

A non-deterministic finite automaton B with ε-transitions over the alphabet Σ is the following generalization of a non-deterministic automaton: in the state diagram, there can appear transitions of a new kind, labeled by ε, i.e. the transition function is δ : Q × (Σ ∪ {ε}) → P(Q). For every transition arrow qi −ε→ qj, the computation step has the form (qi, w) ⊢ε (qj, w), where w is any string. The possibilities for computations are richer.

Example 8.1. An example of a non-deterministic automaton B with ε-transitions is given as: Σ = {a, b}, Q = {q0, q1, q2}, Q0 = {q0}, F = {q2}; the transition function including ε-transitions is δ : Q × (Σ ∪ {ε}) → P(Q) and it is described by its transition diagram and its table.

Examples of computations:
(q0, abbab) ⊢ (q1, bbab) ⊢ (q2, bab) ⊢ε (q1, bab) ⊢ (q2, ab) ⊢ε (q0, ab) ⊢ (q1, b) ⊢ (q2, ε) - the string is accepted,
(q0, abbab) ⊢ (q1, bbab) ⊢ (q1, bab) ⊢ (q1, ab)× - not finished,
(q0, ababb) ⊢ (q1, babb) ⊢ (q2, abb) ⊢ε (q0, abb) ⊢ (q1, bb) ⊢ (q1, b) ⊢ (q1, ε) - the string is not accepted.

[Figure: transition diagram of automaton B with states q0, q1, q2.]

          a       b           ε
→ q0    {q1}    { }         { }
  q1    { }     {q1, q2}    { }
  q2∗   { }     { }         {q0, q1}

The language L(B) = {ab^{k1}ab^{k2} . . . ab^{kn}; n ≥ 1, ki ≥ 1}.

Theorem. For every non-deterministic automaton B with ε-transitions, there exists (on the same state set) an NFA B−ε without ε-transitions such that L(B) = L(B−ε).

Proof. The construction of the new automaton: we take every chain of transition arcs of the form

qi1 −ε→ qi2 −ε→ . . . −ε→ qik −x→ qik+1 −ε→ . . . −ε→ qin−1 −ε→ qin, where x ∈ Σ,

and add the new transition arc qi1 −x→ qin. Finally, all the ε-transitions are deleted.

Example 8.1 - continued. To transform the automaton B with ε-transitions from Ex. 8.1 into an automaton without ε-transitions, we investigate the 'special' chains of arcs in it that create new arcs which are not ε-arcs:

q1 −b→ q2 −ε→ q1 ⇒ q1 −b→ q1 (already exists),
q1 −b→ q2 −ε→ q0 ⇒ q1 −b→ q0 (a new b-arc),
q2 −ε→ q1 −b→ q2 ⇒ q2 −b→ q2 (a new b-arc),
q2 −ε→ q0 −a→ q1 ⇒ q2 −a→ q1 (a new a-arc),
q2 −ε→ q1 −b→ q1 ⇒ q2 −b→ q1 (a new b-arc),
q2 −ε→ q1 −b→ q2 −ε→ q0 ⇒ q2 −b→ q0 (a new b-arc).

The new state diagram and the new transition table of the automaton B−ε are:

[Figure: state diagram of automaton B−ε with states q0, q1, q2.]

          a       b
→ q0    {q1}    { }
  q1    { }     {q0, q1, q2}
  q2∗   {q1}    {q0, q1, q2}
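The chains used in the proof can be collected automatically with the help of ε-closures. The Python sketch below is an illustration under assumptions: ε is written as the empty string, the helper names eps_closure and remove_eps are chosen here, and only the arcs are rebuilt (the start and accept state sets are left as they are, exactly as in the proof outline).

```python
def eps_closure(state, delta):
    """All states reachable from `state` through ε-arcs only (ε is written as '')."""
    seen, stack = {state}, [state]
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def remove_eps(states, alphabet, delta):
    """Add an arc q --x--> r for every chain q --ε*--> . --x--> . --ε*--> r."""
    new_delta = {}
    for q in states:
        for x in alphabet:
            targets = set()
            for p in eps_closure(q, delta):          # leading ε-arcs
                for r in delta.get((p, x), ()):      # the single x-arc
                    targets |= eps_closure(r, delta)  # trailing ε-arcs
            new_delta[(q, x)] = targets
    return new_delta

# Automaton B of Example 8.1 (missing entries stand for the empty set)
STATES = ["q0", "q1", "q2"]
DELTA = {
    ("q0", "a"): {"q1"},
    ("q1", "b"): {"q1", "q2"},
    ("q2", ""): {"q0", "q1"},
}
NEW_DELTA = remove_eps(STATES, "ab", DELTA)
for q in STATES:
    print(q, sorted(NEW_DELTA[(q, "a")]), sorted(NEW_DELTA[(q, "b")]))
```

The printed rows agree with the transition table of B−ε above.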

PROBLEMS TO SOLVE

Problem 8.1. For each of the given non-deterministic automata B over Σ = {a, b} below, show one accepting and one non-accepting calculation on the string ababb.

Problem 8.2. For the given non-deterministic automaton B over Σ = {a, b} construct B−ε.

[Figure: state diagrams of three automata B with ε-transitions: (a) with states q1, q2; (b) with states q1, q2, q3; (c) with states q1, q2, q3, q4.]

TOPIC 9 - The Algebra on Automata - part 2

(4) The reversal automaton. Let A be a finite non-deterministic automaton with ε-transitions over the alphabet Σ, with the state set Q, the start state set Q0, the accept state set F, and the transition function δ. In the new automaton, we reverse all the arcs of the transition diagram of A; moreover, F becomes the set of start states and Q0 becomes the set of accept states. The new automaton will be denoted by AR, and L(AR) = L(A)R.

Example 9.1. The picture shows an automaton A and its reversal AR:

[Figure: state diagram of automaton A (states q0, q1, q2) ⇒ state diagram of automaton AR with all arcs reversed.]

Examples of calculations (the steps are numbered (1) to (4)):
in A: (q0, abb) ⊢(1) (q1, bb) ⊢(2) (q2, b) ⊢(3)ε (q1, b) ⊢(4) (q2, ε), i.e. abb ∈ L(A),
in AR: (q2, bba) ⊢(4)R (q1, ba) ⊢(3)εR (q2, ba) ⊢(2)R (q1, a) ⊢(1)R (q0, ε), i.e. bba ∈ L(A)R.

L(A) = {ab^{k1}ab^{k2} . . . ab^{kn}; n ≥ 1, ki ≥ 1} and L(AR) = {b^{k1}ab^{k2}a . . . b^{kn}a; n ≥ 1, ki ≥ 1}.

(5) The serial composition of automata. Let A1, A2 be two non-deterministic automata with ε-transitions over the alphabet Σ; the state sets are Q1, Q2 (Q1 ∩ Q2 = ∅), the start state sets are Q0,1, Q0,2, the accept state sets are F1, F2, the transition functions are δ1, δ2. The serial composition A1→A2 has the state set Q = Q1 ∪ Q2, the start state set Q0 = Q0,1, the accept state set F = F2, and the transition function δ = δ1 ∪ δ2 together with all ε-arcs of the form q −ε→ q′, where q ∈ F1, q′ ∈ Q0,2. Then L(A1→A2) = L(A1)L(A2).

Example 9.2.

[Figure: state diagrams of aut. A1 (states q0, q1, q2) and aut. A2 (states r0, r1) ⇒ state diagram of automaton A1→A2 with the added ε-arcs.]

Examples of calculations:
in A1: (q0, aab) ⊢(1) (q1, ab) ⊢(2)ε (q0, ab) ⊢(3) (q1, b) ⊢(4) (q2, ε), i.e. aab ∈ L(A1),
in A2: (r0, aa) ⊢(5) (r0, a) ⊢(6) (r1, ε), i.e. aa ∈ L(A2),
in A1→A2: (q0, aabaa) ⊢(1) (q1, abaa) ⊢(2)ε (q0, abaa) ⊢(3) (q1, baa) ⊢(4) (q2, aa) ⊢ε (r0, aa) ⊢(5) (r0, a) ⊢(6) (r1, ε), i.e. aabaa ∈ L(A1→A2).

L(A1) = {a^m b^n; m ≥ 1, n ∈ {0, 1}}, L(A2) = {a^s (ba^{k1}) . . . (ba^{kp}); s ≥ 1, ki ≥ 1, p ≥ 0}, L(A1→A2) = {a^m b^n a^s (ba^{k1}) . . . (ba^{kp}); m ≥ 1, s ≥ 1, n ∈ {0, 1}, ki ≥ 1, p ≥ 0}.

(6) The iteration of an automaton. Let A be a finite non-deterministic automaton with ε-transitions over the alphabet Σ, with the state set Q, the start state set Q0, the accept state set F, and the transition function δ. In the new automaton A∗, we add all the backward ε-arcs of the form q′ −ε→ q, where q′ ∈ F, q ∈ Q0. Moreover, to the state set we add a new state qε, which is both a start and an accept state (it is isolated and it accepts the empty string ε only). Then L(A∗) = L(A)∗.

Example 9.3.

[Figure: state diagram of automaton A (states q0, q1, q2) ⇒ state diagram of automaton A∗ with the added backward ε-arcs.]

Examples of calculations:
in A: (q0, a) ⊢(1) (q1, ε), or (q0, ab) ⊢(2) (q1, b) ⊢(3) (q2, ε), i.e. a, ab ∈ L(A),
in A∗: (q0, abaa) ⊢(2) (q1, baa) ⊢(3) (q2, aa) ⊢ε (q0, aa) ⊢(1) (q1, a) ⊢ε (q0, a) ⊢(1) (q1, ε), i.e. abaa ∈ L(A)∗.

L(A) = {a, ab} and L(A∗) = {a^{k1}ba^{k2}b . . . a^{kn}b^j; n ≥ 1, ki ≥ 1, j ∈ {0, 1}} ∪ {ε}.

Summary. On the class of all ε-NFA we have defined the operations of copying, a co-automaton, synchronous parallel composition, reversal, serial composition, and iteration. The result of each operation is an ε-NFA as well. It means that the class REG(Σ) is closed under complements, differences, intersections, unions, reversals, concatenations, and iterations.
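Operations (4) to (6) are short enough to be tried out directly. The Python sketch below is an illustration under assumptions: an ε-NFA is stored as a triple (start states, accept states, set of arcs), ε is written as the empty string, the arcs of A91 are taken to be those of automaton B of Example 8.1 (which has the same language as A of Example 9.1), and the states of the automaton of Example 9.3 are renamed to r0, r1, r2 so that the two state sets are disjoint.

```python
def closure(states, arcs):
    """ε-closure of a set of states; arcs are triples (p, label, q), '' means ε."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for (s, label, t) in arcs:
            if s == p and label == "" and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, word):
    starts, finals, arcs = nfa
    current = closure(starts, arcs)
    for x in word:
        step = {t for (s, label, t) in arcs if s in current and label == x}
        current = closure(step, arcs)
    return bool(current & finals)

def reversal(nfa):
    """Reverse every arc and swap the start and accept state sets."""
    starts, finals, arcs = nfa
    return finals, starts, {(t, label, s) for (s, label, t) in arcs}

def serial(nfa1, nfa2):
    """A1 -> A2; the two state sets are assumed to be disjoint."""
    s1, f1, a1 = nfa1
    s2, f2, a2 = nfa2
    bridge = {(p, "", q) for p in f1 for q in s2}
    return s1, f2, a1 | a2 | bridge

def iteration(nfa, fresh="q_eps"):
    """A*; `fresh` is a new isolated state that is both start and accept."""
    starts, finals, arcs = nfa
    back = {(p, "", q) for p in finals for q in starts}
    return starts | {fresh}, finals | {fresh}, arcs | back

A91 = ({"q0"}, {"q2"}, {("q0", "a", "q1"), ("q1", "b", "q1"), ("q1", "b", "q2"),
                        ("q2", "", "q0"), ("q2", "", "q1")})
A93 = ({"r0"}, {"r1", "r2"}, {("r0", "a", "r1"), ("r1", "b", "r2")})

print(accepts(A91, "abb"), accepts(reversal(A91), "bba"))  # True True
print(accepts(iteration(A93), "abaa"))                     # True
```

Each operation returns a triple of the same shape, which mirrors the remark that the result of every operation is again an ε-NFA.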

Application 9.1. As we could see, the ε-NFA give us more freedom when we want to construct an automaton accepting a given language. The example below is taken from Hopcroft, Motwani, Ullman: Introduction to Automata Theory, Languages, and Computation, 2001, pg. 73: “In the picture is an ε-NFA that accepts decimal numbers. Of particular interest is the transition from q0 to q1 of any of ε, +, or −. Thus, state q1 represents the situation in which we have seen the sign if there is one, and perhaps some digits, but not the decimal point. State q2 represents the situation where we have just seen the decimal point, and may or may not have seen prior digits. In q4 we have definitely seen at least one digit, but not the decimal point.”

[Figure: the ε-NFA for decimal numbers with states q0, q1, q2, q3, q4, q5 and arcs labeled ε,+,− and . and 0,1,...,9 and ε.]

SAMPLE EXERCISE

Exercise 9.1. At the right, you can see non-deterministic automata A and B over Σ = {a, b}. First, create the reversal of B, i.e. BR, and then C = A → BR. Now, eliminate the ε-transitions from C and get C−ε. Finally, use the standard subset construction and convert the automaton C−ε into the deterministic automaton D = (C−ε)d, but show only its accessible states.

[Figure: state diagrams of automaton A (states q1, q2) and automaton B (states r0, r1).]

Solution. (by Anna STEPURA)

[Figure: automaton BR (states r0, r1) ⇒ automaton C = A → BR (states q1, q2, r1, r0).]

[Figure: automaton C−ε (states s1, s2, s3, s4) ⇒ automaton D = (C−ε)d (accessible states p1000, p0110, p1111, p0111).]

PROBLEMS TO SOLVE

Problem 9.1. Using the automata A1 and A2 over Σ = {a, b}, create a new automaton B and then convert it into B−ε.

[Figure: state diagrams of automaton A1 (states q0, q1) and automaton A2 (states r0, r1, r2), both with ε-transitions.]

(a) B = (A1)R, (b) B = (A2)∗, (c) B = (A2)R, (d) B = A1→(A2)R.

Problem 9.2. Remove the ε-transitions from the ε-NFA in Application 9.1, obtaining an NFA without ε-transitions.

TOPIC 10 - Moore and Mealy Machines

These deterministic automata not only read an input string, but also write an output string.

Example 10.1. A Moore machine M is described by its state diagram. It involves:

• Q = {q0, q1, q2} . . . state set, q0 start state,
• Σ = {a, b} . . . input alphabet,
• Γ = {x, y, z} . . . output alphabet,
• δ : Q × Σ → Q . . . transition function (see the table).

We do not need accept states because the result of any computation is some output string. A new feature is that every state is labeled (after the slash symbol '/') with a symbol of the output alphabet (this symbol is written at the moment the state becomes active), i.e. q0/x, q1/z, q2/x (the labeling function is µ : Q −→ Γ).

[Figure: state diagram of Moore machine M with states q0/x, q1/z, q2/x.]

          a     b     output symbol
→ q0     q0    q1    x
  q1     q2    q1    z
  q2     q1    q1    x

Demonstration. The computation of the Moore machine M on the string w = abab in computational steps.

The start state of M is q0. Because q0 is active, the machine writes the symbol x. Then, the first symbol (i.e. a) of the input string abab is read. The transition function determines the new state, i.e. δ(q0, a) = q0, and q0 is active. The next symbol is written, i.e. x. At this moment, the new string is xx and the old string (remaining to be read) is bab. Now, the next symbol is read (i.e. b), etc.

The configurations are of the form (state, oldstring, newstring) and the computation is:

(q0, abab, x) ⊢ (q0, bab, xx) ⊢ (q1, ab, xxz) ⊢ (q2, b, xxzx) ⊢ (q1, ε, xxzxz), or (q0, abab, x) ⊢∗ (q1, ε, xxzxz).
The result of the computation of the machine M on the string abab is the string xxzxz.
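The demonstration can be replayed with a few lines of Python. This is a minimal sketch; the dictionary layout and the name run_moore are chosen here for illustration, while the transitions and the labeling µ are those of Example 10.1.

```python
def run_moore(delta, output, start, word):
    """Return the output string written by a Moore machine on `word`."""
    state = start
    written = output[state]          # the start state writes its symbol first
    for symbol in word:
        state = delta[(state, symbol)]
        written += output[state]     # every newly active state writes its symbol
    return written

# Moore machine M of Example 10.1
DELTA_M = {("q0", "a"): "q0", ("q0", "b"): "q1",
           ("q1", "a"): "q2", ("q1", "b"): "q1",
           ("q2", "a"): "q1", ("q2", "b"): "q1"}
MU = {"q0": "x", "q1": "z", "q2": "x"}

print(run_moore(DELTA_M, MU, "q0", "abab"))  # xxzxz
```

Note that the output is always one symbol longer than the input, because the start state writes before anything is read.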

Example 10.2. A Mealy machine N is described by its state diagram. It involves:

• Q = {q0, q1, q2} . . . state set, q0 start state,
• Σ = {a, b} . . . input alphabet,
• Γ = {x, y, z} . . . output alphabet,
• δ : Q × Σ → Q × Γ . . . the modified transition function, which defines for every transition arrow (after the slash symbol '/') the output symbol that is written immediately after this transition step is executed.

[Figure: state diagram of Mealy machine N with states q0, q1, q2 and arcs labeled a/x, b/z, a/y, b/y, a/y, b/z.]

Here, δ(q0, a) = (q0, x), δ(q0, b) = (q1, z), δ(q1, a) = (q2, y), δ(q1, b) = (q1, y), δ(q2, a) = (q1, y), δ(q2, b) = (q1, z).

The modified transition function is shown in the table:

          a       b
→ q0     q0/x    q1/z
  q1     q2/y    q1/y
  q2     q1/y    q1/z

Demonstration. The computation of the Mealy machine N on the string w = abab in computational steps.

The configurations are of the form (state, oldstring, newstring) and the computation is:

(q0, abab, ε) ⊢ (q0, bab, x) ⊢ (q1, ab, xz) ⊢ (q2, b, xzy) ⊢ (q1, ε, xzyz), or (q0, abab, ε) ⊢∗ (q1, ε, xzyz). The result of the computation of the machine N on the string abab is the string xzyz.
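The same computation can be replayed in Python. This is a minimal sketch; the name run_mealy and the dictionary layout are chosen here for illustration, while the modified transition function is the one of Example 10.2.

```python
def run_mealy(delta, start, word):
    """Return the output string written by a Mealy machine on `word`."""
    state, written = start, ""
    for symbol in word:
        # every transition yields the next state and one output symbol
        state, out = delta[(state, symbol)]
        written += out
    return written

# Mealy machine N of Example 10.2
DELTA_N = {("q0", "a"): ("q0", "x"), ("q0", "b"): ("q1", "z"),
           ("q1", "a"): ("q2", "y"), ("q1", "b"): ("q1", "y"),
           ("q2", "a"): ("q1", "y"), ("q2", "b"): ("q1", "z")}

print(run_mealy(DELTA_N, "q0", "abab"))  # xzyz
```

Here the output has exactly the length of the input, since a symbol is written only when a transition is executed.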


A SAMPLE EXERCISE

Exercise 10.1. In the picture below, you can see two automata M1 and M2, both with Σ = {0, 1} and Γ = {X, Y}.

[Figure: state diagram of automaton M1 with states q0/Y, q1/X, q2/X, q3/Y and state diagram of automaton M2 with states q0, q1, q2 and arcs labeled by pairs input/output.]

(i) Decide which is a Moore automaton and which is a Mealy one and explain why.

(ii) For each of them, create its transition table.

(iii) For each of them, show the calculation on the string 010111.

Solution. (by Frantisek DRDAK)

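ad (i) M1 is a Moore machine because its output symbols are attached to the states, i.e. its labeling function is determined by the states only. M2 is a Mealy machine because its output symbols are attached to the transition arrows.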

ad (ii) M1:

          0     1     output symbol
→ q0     q0    q2    Y
  q1     q3    q0    X
  q2     q2    q1    X
  q3     q2    q0    Y

M2:

          0       1
→ q0     q2/Y    q1/Y
  q1     q2/Y    q1/Y
  q2     q2/X    q1/X

ad (iii) In M1: (q0, 010111, Y) ⊢ (q0, 10111, YY) ⊢ (q2, 0111, YYX) ⊢ (q2, 111, YYXX) ⊢ (q1, 11, YYXXX) ⊢ (q0, 1, YYXXXY) ⊢ (q2, ε, YYXXXYX),

in M2: (q0, 010111, ε) ⊢ (q2, 10111, Y) ⊢ (q1, 0111, YX) ⊢ (q2, 111, YXY) ⊢ (q1, 11, YXYX) ⊢ (q1, 1, YXYXY) ⊢ (q1, ε, YXYXYY).

PROBLEMS TO SOLVE

Problem 10.1. At the right, you can see a transition table of a Mealy machine. Sketch its transition diagram and then perform the calculation on the string ababb.

          a       b
→ q0     q2/C    q1/D
  q1     q0/D    q1/D
  q2     q3/C    q3/D
  q3     q2/D    q0/C

Problem 10.2. At the right, you can see a transition table of a Moore machine. Sketch its transition diagram and then perform the calculation on the string ababb.

          a     b     output symbol
→ q0     q1    q2    D
  q1     q2    q0    C
  q2     q0    q1    E

Problem 10.3. Every Moore machine can be converted into a Mealy machine. Convert M1 from Exercise 10.1 into a Mealy machine.

Problem 10.4. Every Mealy machine can be converted into a Moore machine. Convert M2 from Exercise 10.1 into a Moore machine.

TOPIC 11 - Regular Expressions and Automata

A regular expression (i.e. RE) over an alphabet Σ is “a formula” that can be used “to create” some strings from Σ∗. If R is a regular expression over Σ, then L(R) means the language over Σ that is determined by R.

Kleene Theorem. The languages generated by regular expressions over an alphabet Σ are exactly the regular languages over Σ, i.e. the languages accepted by finite deterministic automata over Σ.

Now, we give the precise mathematical definition of the regular expression and of the language generated by this expression:

The inductive definition of the regular expression over the alphabet Σ:

1. the empty expression ∅ and the notation ε are regular expressions over Σ,

2. every symbol a from Σ is a regular expression over Σ,

3. if R1, R2, R are regular expressions over Σ, then the following are also regular expressions over Σ:

• (R1 + R2) or simply R1 + R2,

• (R1·R2) or simply R1R2,

• (R)∗ or simply R∗.

Note. You can reduce the number of brackets in a regular expression if you accept the precedence rules for the operators +, ·, ∗, i.e. ∗ has the highest priority, then ·, and finally +. For example, instead of (a + (b(a)∗)) you can write a + ba∗. But (a + b)∗ is not the same as a∗ + b∗.

The inductive definition of the language L(R) generated by the regular expression R over the alphabet Σ:

1. for the empty expression ∅ there is L(∅) = ∅ and for the expression ε there is L(ε) = {ε},

2. for every symbol a from Σ there is L(a) = {a},

3. if regular expressions R1, R2, R over Σ generate languages L(R1), L(R2), L(R), then:

• L(R1 + R2) = L(R1) ∪ L(R2),

• L(R1·R2) = L(R1R2) = L(R1)L(R2),

• L(R∗) = [L(R)]∗ = [L(R)]^0 ∪ [L(R)]^1 ∪ [L(R)]^2 ∪ [L(R)]^3 ∪ . . . = {ε} ∪ L(R) ∪ [L(R)]^2 ∪ [L(R)]^3 ∪ . . ..

Example 11.1. Let Σ = {a, b}. If R = ε + a + b + ab(a + b)∗, then L(R) contains ε, a, b and, further, all the strings over Σ with the prefix ab, i.e.

L(R) = {ε, a, b, ab, aba, abb, abaa, abab, . . .} = {ε, a, b} ∪ {abw; w ∈ Σ∗}.

Example 11.2. Let Σ = {a, b}. If R = a∗babb∗, then L(R) contains all strings over Σ of the form a^i bab b^j (i, j ≥ 0), i.e.

L(R) = {bab, abab, ababb, aabab, . . .} = {a^i bab^j; i = 0, 1, 2, . . . , j = 1, 2, 3, . . .}.
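The inductive definition of L(R) translates almost literally into a recursive function. The Python sketch below is an illustration under assumptions: regular expressions are written as nested tuples (a representation chosen here, not one used in the text), and since L(R) may be infinite, only the strings up to a given length max_len are listed.

```python
def language(regex, max_len):
    """Strings of L(regex) of length <= max_len, following the inductive definition."""
    kind = regex[0]
    if kind == "empty":                       # L(∅) = ∅
        return set()
    if kind == "eps":                         # L(ε) = {ε}
        return {""}
    if kind == "sym":                         # L(a) = {a}
        return {regex[1]} if max_len >= 1 else set()
    if kind == "alt":                         # L(R1 + R2) = L(R1) ∪ L(R2)
        return language(regex[1], max_len) | language(regex[2], max_len)
    if kind == "cat":                         # L(R1 R2) = L(R1) L(R2)
        left, right = language(regex[1], max_len), language(regex[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                        # L(R*) = {ε} ∪ L(R) ∪ L(R)^2 ∪ ...
        base = language(regex[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(kind)

# R = ε + a + b + ab(a + b)* from Example 11.1
A, B, EPS = ("sym", "a"), ("sym", "b"), ("eps",)
AB_STAR = ("star", ("alt", A, B))
R = ("alt", EPS, ("alt", A, ("alt", B, ("cat", A, ("cat", B, AB_STAR)))))

print(sorted(language(R, 3), key=lambda w: (len(w), w)))
# ['', 'a', 'b', 'ab', 'aba', 'abb']
```

The length bound is what makes the star case terminate: new strings are added only as long as they stay within max_len.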

Regular expressions are the third way of describing languages. The term “regular” means that they describe exactly the languages accepted by automata.

Theorem 11.1. (Converting regular expressions to automata) Let Σ be an alphabet. For every regular expression R, there is an automaton B such that L(B) = L(R).

Outline of proof. An exact mathematical proof must proceed by induction, following the inductive definition of the regular expression. The result can be a non-deterministic automaton B with ε-transitions (we are able to transform it into a deterministic one).

In the first inductive step, we show how to construct the automata B1, B2, B3 for the languages of the elementary regular expressions, i.e. for L1 = ∅, L2 = {ε}, L3 = {a}, respectively. They are:

[Figure: the automata B1, B2 and B3; in B3 the arc is labeled a.]

Let us show how to construct the automata for the union, the concatenation and the iteration of regular expressions (see below):

[Figure: three constructions, Union, Concatenation and Iteration; in each, the component automata are connected by new ε-arcs.]

Example 11.3. Let Σ = {a, b}. Construct an automaton B accepting the language
K = {w ∈ Σ∗; w has a suffix ba} ∪ {bb}.

Solution. The regular expression for the language K is R = (a + b)∗ba + bb. We gradually construct the automata for the regular expression a + b, then for its iteration; further, we construct the concatenation of the current automaton and the automata for the regular expressions b and a; finally, we construct the union of this automaton and the automaton for bb. The result is in the picture (the internal ovals are only auxiliary):

[Figure: automaton B, an ε-NFA assembled from the automata for a, b and bb by the union, iteration and concatenation constructions.]

Theorem 11.2. (Converting automata to regular expressions) Let Σ be an alphabet. For every automaton A, there is a regular expression R such that L(R) = L(A).

Outline of proof. First, we define the so-called matrix of regular transitions R = (Rij), i, j = 1, . . . , n, of order n; its entries are regular expressions Rij such that Mij = L(Rij). The matrix is created in steps (compare the matrices created in the steps of the famous Floyd algorithm); we gradually create the sequence of matrices 0R, 1R, . . . , nR = R.

WHAT ARE THE MATRICES kR where k = 0, 1, . . . , n?

First, let kMij be the set of strings w ∈ Σ∗ such that there exists, in A, a partial computation of the form (qi, w) ⊢∗ (qj, ε), and, moreover, for every state qℓ in this computation (with the exception of the first and the last state) there is ℓ ≤ k. Now, kR = (kRij), i, j = 1, . . . , n, is a matrix of order n and its entries are regular expressions kRij such that kMij = L(kRij).

At the beginning, 0Rij are constructed directly from the transition arrows of A (if there is no transition from qi to qj, i ≠ j, we put 0Rij = ∅; on the diagonal, ε is always included, as in the example below). Then, we continue in steps; the step from k to k + 1 is:

k+1Rij = kRij + kRi,k+1 · (kRk+1,k+1)∗ · kRk+1,j

The second part of the expression above looks like:

[Figure: the path qi → qk+1 → qj with arcs labeled kRi,k+1 and kRk+1,j and with a loop labeled (kRk+1,k+1)∗ at qk+1.]

The example below shows more details.

Note. We will write R1 ≈ R2 if these two regular expressions generate the same language. For example, there is: xx∗ ≈ x∗x, (ε + x)∗ ≈ x∗, (ε + x)x∗ ≈ x∗(ε + x) ≈ x∗, ε + x + x∗ ≈ x∗, etc.

Example 11.4. For the automaton in the picture, create the matrix of regular transitions R and then a regular expression for its language.

[Figure: automaton with states q1, q2, q3 and arcs labeled a, a, ε,a, b, b.]

Solution. The matrices kR are gradually created, using the helper matrices kHLP (the entry of kHLP at position i, j is the second part of the formula above, i.e. the expression for the paths from qi to qj through qk).

0R =
    ε+a    ε+a    ∅
    ∅      ε      b
    ∅      b      ε+a

Step 1 (paths through q1, loop a∗):

1HLP =
    a∗     a∗     ∅
    ∅      ∅      ∅
    ∅      ∅      ∅

1R =
    a∗     a∗     ∅
    ∅      ε      b
    ∅      b      ε+a

Step 2 (paths through q2, loop ε):

2HLP =
    ∅      a∗     a∗b
    ∅      ε      b
    ∅      b      bb

2R =
    a∗     a∗     a∗b
    ∅      ε      b
    ∅      b      ε+a+bb

Step 3 (paths through q3, loop (a+bb)∗):

3HLP =
    ∅      a∗b(a+bb)∗b    a∗b(a+bb)∗
    ∅      b(a+bb)∗b      b(a+bb)∗
    ∅      (a+bb)∗b       (a+bb)∗

3R =
    a∗     a∗+a∗b(a+bb)∗b    a∗b(a+bb)∗
    ∅      ε+b(a+bb)∗b       b(a+bb)∗
    ∅      (a+bb)∗b          (a+bb)∗

Now, R = 3R11 + 3R13 = a∗ + a∗b(a+bb)∗.
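The result of Example 11.4 can be cross-checked by running the same recurrence on sets of strings instead of on regular expressions. The Python sketch below is an illustration under assumptions: the bound N on the string length is a hypothetical choice that keeps all sets finite, the initial sets are read off the matrix 0R, and Python's re module is used only to enumerate the strings matched by a∗ + a∗b(a+bb)∗ (written there as a*|a*b(a|bb)*).

```python
import re
from itertools import product

N = 6  # hypothetical bound on the string length, keeps every set finite

def cat(x, y):
    """Concatenation of two languages, restricted to strings of length <= N."""
    return {u + v for u in x for v in y if len(u + v) <= N}

def star(x):
    """Iteration of a language, restricted to strings of length <= N."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = cat(frontier, x) - result
        result |= frontier
    return result

# 0M for the automaton of Example 11.4, read off the matrix 0R
# (index 0 stands for q1, index 1 for q2, index 2 for q3).
M = [[{"", "a"}, {"", "a"}, set()],
     [set(), {""}, {"b"}],
     [set(), {"b"}, {"", "a"}]]

for k in range(3):  # additionally allow q_{k+1} as an inner state
    loop = star(M[k][k])
    M = [[M[i][j] | cat(cat(M[i][k], loop), M[k][j]) for j in range(3)]
         for i in range(3)]

accepted = M[0][0] | M[0][2]      # corresponds to 3R11 + 3R13
expected = {"".join(w) for n in range(N + 1) for w in product("ab", repeat=n)
            if re.fullmatch(r"a*|a*b(a|bb)*", "".join(w))}
print(accepted == expected)  # True
```

Working with bounded sets of strings avoids any simplification of regular expressions, so the check depends only on the recurrence itself.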


A SAMPLE EXERCISE

Exercise 11.1. R1 = b∗ab∗ab∗ and R2 = a(a + b)∗a + b(a + b)∗b are two regular expressions over the alphabet Σ = {a, b}. From the list of languages L1 through L6 at the bottom of this page choose the right ones for L(R1) and L(R2). Justify your answer.

Solution. (by Jirı HOMAN)

L(R1) = L2 because the expression b∗ab∗ab∗ describes any string with exactly two copies of a and any number of copies of b before the first a, between the a’s, and after the second a.

L(R2) = L6 because this expression gives two kinds of strings, either awa or bwb, for any string w ∈ Σ∗.

PROBLEMS TO SOLVE

Problem 11.1. Choose the language L(R) from among the list of languages L1 through L6 offered in the table at the bottom of this page. Justify your answer.

(a) R = (a+b)∗b(aa)∗, (b) R = (a+ b)∗aa, (c) R = (aa+ b)∗, (d) R = b∗(a+ b).

Problem 11.2. Convert the given regular expression R into an ε-NFA.

(a) R = a+ a∗b, (b) R = ba+ a∗ + b∗, (c) R = (aa+ b)∗, (d) R = ε+ aa+ b∗.

Problem 11.3. Convert the given ε-NFA B into a regular expression.

[Figure: state diagrams of four automata B: (a) with states q1, q2; (b) with states q1, q2, q3; (c) with states q1, q2, q3; (d) with states q1, q2, q3, q4.]

A Table of Chosen Languages over the Alphabet Σ = {a, b}

Language L1 . . . {waa; w ∈ Σ∗},
Language L2 . . . {w ∈ Σ∗; #a(w) = 2},
Language L3 . . . {wba^i; w ∈ Σ∗ ∧ i = 0, 2, 4, 6, . . .},
Language L4 . . . {w ∈ Σ∗; #a(w) = 0, 2, 4, 6, . . .},
Language L5 . . . {b^i x; x ∈ {a, b} ∧ i = 0, 1, 2, 3, . . .},
Language L6 . . . {xwx; x ∈ {a, b} ∧ w ∈ Σ∗}.

TOPIC 12 - The Grammar of a Deterministic Finite Automaton

Grammars4 are the fourth way of describing formal languages. A new feature is that grammars candescribe also languages that are not regular. But, first of all, we start with regular languages.

Example 12.1. The automaton A over the alphabet Σ = {a, b}is given by its state diagram at the right. It consists of:

• Q = {q0, q1, q2} . . . the state set, q0 the start state,F = {q1} the set of accept states of A,

• δ : Q × Σ → Q . . . the transition function (see the diagram atthe right):

����

��������

����

-R

bR

a

a, b

a

b

q0 q1 q2

automaton A

Now, for the automaton A, we introduce, instead of the statesymbols q0, q1, q2, the variable symbols S,X, Y and we describethe grammar G as follows:

• Σ = {a, b} . . . the alphabet of terminals,

• Π = {S,X, Y } the alphabet of nonterminals - variables,S . . . a chosen variable called the start symbol of G,

• P . . . the set of production rules or productions of G:(1) S −→ aS | bX(2) X −→ bX | aY | ε(3) Y −→ bX | aX

����

��������

����

-R

bR

a

a, b

a

b

S X Y

modified automaton A

The process of creating a string w by means of the production rules is called the derivation of thestring w. In the record of a derivation process the symbol ⇒ means one step of this procedure (thenumber of the production rule used can be placed to it, i. e. S ⇒(i)); to express a chain of stepsas a whole we will use the iteration convention, i. e. ⇒∗ (we can say ‘gradually becomes’). Thederivation process always starts from S and the notation S ⇒∗ w means that the result was thestring w.

Now, we will show the derivation of the string abbbab using the rules of G above (the strings appearing during the derivation process belong to (Σ ∪ Π)∗).

We begin with the start symbol S and we use production (1) - we get the string aS. Now, in the string aS the nonterminal S is replaced by the string bX (production (1) again) and we get abX. Further, we apply production (2) and we get abbX, etc. After 6 steps, we have the string abbbabX and we finally replace the variable X by the empty string ε (production (2)) - we get the string abbbab.

The record of a derivation process is then:

S ⇒(1) aS ⇒(1) abX ⇒(2) abbX ⇒(2) abbbX ⇒(2) abbbaY ⇒(3) abbbabX ⇒(2) abbbab, or S ⇒∗ abbbab.

The grammar G generates the language L(G) = {w ∈ Σ∗; S ⇒∗ w}. We have shown that abbbab ∈ L(G).
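The passage from the automaton A to the grammar G and the derivation of abbbab can be sketched in Python. This is only an illustration under stated assumptions: the table `DELTA` is read off from the productions (1) - (3), and the helper names `dfa_to_grammar` and `derive` are hypothetical, they do not come from the text.

```python
# Transition function of A (assumption: read off from the productions (1)-(3)).
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q1", ("q2", "b"): "q1",
}
ACCEPT = {"q1"}
NAMES = {"q0": "S", "q1": "X", "q2": "Y"}  # state symbol -> variable symbol


def dfa_to_grammar(delta, accept, names):
    """Each transition (p, t) -> r gives P -> tR; each accept state gives P -> epsilon."""
    productions = {variable: [] for variable in names.values()}
    for (state, symbol), target in sorted(delta.items()):
        productions[names[state]].append(symbol + names[target])
    for state in sorted(accept):
        productions[names[state]].append("")  # "" stands for epsilon
    return productions


def derive(delta, accept, names, start, w):
    """Return the sentential forms of the derivation of w, or None if A rejects w."""
    state = start
    forms = [names[start]]
    for i, symbol in enumerate(w):
        state = delta[(state, symbol)]
        forms.append(w[: i + 1] + names[state])
    if state not in accept:
        return None
    forms.append(w)  # the last step replaces the variable by epsilon
    return forms


print(dfa_to_grammar(DELTA, ACCEPT, NAMES))
print(" => ".join(derive(DELTA, ACCEPT, NAMES, "q0", "abbbab")))
```

Every step of the run of A on w corresponds to exactly one derivation step, which is why the derivation needs |w| + 1 steps.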

⁴ In natural languages the word ‘grammar’ means the system of rules. For example, the rule for creating a simple sentence in the English language is known as SVOMPT, i.e. the order of words is required as Subject - Verb - Object - Manner - Place - Time, or who - does - what - how - where - when.


The derivation of the string abbbab from the start symbol S using the production rules from P can be described by means of a planted tree that is called the parse tree. In this tree, the root is labeled by the start symbol S, each interior node is labeled by a variable, each leaf is labeled by a terminal and the last one by ε. In each step of the derivation, we step down; gradually, we create the string from left to right. After the whole parse tree has been created we ‘read’ all its leaves from left to right and we get the string abbbab.

Note. If the final string of symbols contains some nonterminals, then some leaves of the parse tree are labeled by nonterminals. Such a string does not belong to L(G), of course.

[Figure: the parse tree of abbbab. From the root downwards, the interior nodes are labeled S, S, X, X, X, Y, X; the leaves, read from left to right, are a, b, b, b, a, b, ε.]

PROBLEMS TO SOLVE

Problem 12.1. Two deterministic finite automata A1, A2 over the alphabet Σ = {a, b} are given by their transition diagrams.

[Figure: transition diagram of A1 with the states q0, q1.]

[Figure: transition diagram of A2 with the states q0, q1, q2.]

(i) For each automaton describe its grammar.

(ii) For each automaton choose a string w of length 6 from its language and show its derivation by means of a grammar. Also sketch a parse tree.

Problem 12.2. Sketch a transition diagram of a deterministic finite automaton A if its grammar G is given as

(a) • Σ = {a, b} . . . terminals,

• Π = {S,X} . . . nonterminals,

• P . . . production rules:
(1) S −→ aX | bS | ε
(2) X −→ aS | bS

(b) • Σ = {a, b, c} . . . terminals,

• Π = {S,X} . . . nonterminals,

• P . . . production rules:
(1) S −→ aS | bX | cX
(2) X −→ aS | bS | cX | ε

(c) • Σ = {a, b, c} . . . terminals,

• Π = {S,X, Y } . . . nonterminals,

• P . . . production rules:
(1) S −→ aS | bS | cY
(2) X −→ aX | bY | cY | ε
(3) Y −→ aY | bX | cY

(d) • Σ = {a, b} . . . terminals,

• Π = {S,X, Y, Z} . . . nonterminals,

• P . . . production rules:
(1) S −→ aZ | bX
(2) X −→ aX | bY | ε
(3) Y −→ aX | bZ | ε
(4) Z −→ aS | bY


TOPIC 13 - Context-free Grammars and Pushdown Automata

Every context-free grammar G consists of:

• Σ = {a, b, c, . . .} the alphabet of terminals,

• Π = {S,X, Y, Z . . .} the alphabet of nonterminals - variables, S . . . the start variable,

• V = Σ ∪Π = {a, b, c, . . . , S,X, Y, Z . . .} the complete alphabet of G,

• P . . . k productions (1), . . . , (k) of G of the form:

(i) X −→ α, where X ∈ Π and α ∈ V∗, i.e. X is a nonterminal and α is any string that can contain both terminals and nonterminals.

In the record of a derivation process, the symbol ⇒(i) means one step of this procedure (the number of the production rule is (i)); to express a chain of steps we will use the iteration convention, i.e. ⇒∗ (we can say ‘gradually becomes’). The language generated by G is defined as

L(G) = {w ∈ Σ∗; S ⇒∗ w}.

The derivation of a string w ∈ L(G) begins from the start symbol S and then it uses the production rules from P. It can be described by means of a planted tree that is called the parse tree. In this tree, the root is labeled by the start symbol S, each interior node is labeled by a variable, and each leaf is labeled by a terminal or by ε. In each step of the derivation of the form X ⇒(i) α we step down and create the new string from left to right. After the whole parse tree has been created we ‘read’ all its leaves from left to right and we get the final string w (the result).

Example 13.1. The grammar G is described as:

Σ = {a, b}, Π = {S, X, Y}, P . . . the list of productions of G is

(1) S −→ aSb | bX | ε
(2) X −→ aX | YX | b
(3) Y −→ bb

A derivation of the string aababbabbb using the rules of G: we work with strings over the total alphabet {a, b, S, X, Y}.

We begin with the start symbol S; we apply rule (1) three times, then rule (2) twice, then rule (3), and, finally, rule (2) twice.

The record of the derivation process of the string aababbabbb is:

S ⇒(1) aSb ⇒(1) aaSbb ⇒(1) aabXbb ⇒(2) aabaXbb ⇒(2) aabaYXbb ⇒(3) aababbXbb ⇒(2) aababbaXbb ⇒(2) aababbabbb, or S ⇒∗ aababbabbb; so, we showed that aababbabbb ∈ L(G).

[Figure: the parse tree of aababbabbb. The root S has the children a, S, b; this S has the children a, S, b; the innermost S has the children b, X; this X has the children a, X; the next X has the children Y, X, where Y has the children b, b and X has the children a, X; the last X has the child b.]
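The record of the derivation in Example 13.1 can be replayed mechanically. The following Python sketch is only an illustration: the helper name `replay` and the encoding of the grammar as a dictionary are assumptions, and the sketch always rewrites the leftmost variable, which is what the record in the example does.

```python
# Productions of the grammar G from Example 13.1; "" stands for epsilon.
PRODUCTIONS = {
    "S": ["aSb", "bX", ""],
    "X": ["aX", "YX", "b"],
    "Y": ["bb"],
}


def replay(productions, start, steps):
    """Apply each (variable, right-hand side) to the leftmost variable; return all forms."""
    form = start
    record = [form]
    for variable, alpha in steps:
        if alpha not in productions[variable]:
            raise ValueError(f"{variable} -> {alpha} is not a production")
        position = next((i for i, s in enumerate(form) if s in productions), None)
        if position is None or form[position] != variable:
            raise ValueError(f"the leftmost variable of {form} is not {variable}")
        form = form[:position] + alpha + form[position + 1:]
        record.append(form)
    return record


# Rule (1) three times, rule (2) twice, rule (3), rule (2) twice.
STEPS = [("S", "aSb"), ("S", "aSb"), ("S", "bX"), ("X", "aX"),
         ("X", "YX"), ("Y", "bb"), ("X", "aX"), ("X", "b")]
print(" => ".join(replay(PRODUCTIONS, "S", STEPS)))
```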

The Pushdown Automaton Description

In general, it is a non-deterministic automaton with ε-transitions. Moreover, it is equipped with a simple internal memory, the so-called stack. The stack is an array that can grow without limit at its left end: from the left, we can insert strings into the stack or remove strings from the stack.

In the picture, the sample stack first contained the string aXbb, then the symbol a was removed, and, finally, the string bXY was inserted:

a X b b
X b b
b X Y X b b

In the pushdown automaton Z, the transition function δ, in every step of the computation, reads one symbol from the input string and makes a decision based on the current state of the automaton and the first symbol of the stack. Namely, δ determines the new state of the automaton and replaces the first symbol of the stack by a new string (in a non-deterministic automaton, there are more possibilities).

As a result of one computational step we have: the new input string, which is one symbol shorter or the same (in the case of an ε-transition), the new content of the stack, and the new state of the automaton.

An accepting computation must finish with the empty input string and the empty stack (so-called acceptance by empty stack). All the accepted strings form the language L(Z).

Now, we will explain how to construct, for every context-free grammar G, a pushdown automaton ZG such that L(ZG) = L(G):

The context-free grammar G has Σ = {a, b, c, . . .} (the terminal alphabet), Π = {S, X, Y, Z, . . .} (the nonterminal alphabet), S . . . the start variable, and V = Σ ∪ Π is the total alphabet. P is the list of k productions: (i) X −→ α, where X ∈ Π and α ∈ V∗.

The pushdown automaton, denoted by ZG, has only one state q. The transition function δ will, for every couple (t, z) where t ∈ Σ ∪ {ε} and z ∈ V (the first symbol of the stack), determine the possible strings γ ∈ δ(t, z) into which the symbol z can be changed. In more detail:

For every production rule X −→ α, there will be α ∈ δ(ε, X) and, further, for every terminal t, there will be δ(t, t) = ε.

Example 13.2. The grammar G: Σ = {a, b}, Π = {S, X, Y}, P . . . the productions are

(1) S −→ aSb | bX | ε
(2) X −→ aX | YX | b
(3) Y −→ bb

The transition function δ of ZG is given by the table (rows: the symbol read from the input, columns: the first symbol of the stack):

δ     S                X                Y     a    b
ε     aSb | bX | ε     aX | YX | b      bb    -    -
a     -                -                -     ε    -
b     -                -                -     -    ε

A derivation of the string w = ababbbb using the grammar G is:

S ⇒(1) aSb ⇒(1) abXb ⇒(2) abaXb ⇒(2) abaYXb ⇒(3) ababbXb ⇒(2) ababbbb,

or S ⇒∗ ababbbb; we showed that ababbbb ∈ L(G).

An accepting computation on the string w = ababbbb by the pushdown automaton ZG is:

[the configurations are the couples (input string, stack string); the start configuration is (w, S)]

(ababbbb, S) ⊢ε (ababbbb, aSb) ⊢a (babbbb, Sb) ⊢ε (babbbb, bXb) ⊢b (abbbb, Xb) ⊢ε (abbbb, aXb) ⊢a (bbbb, Xb) ⊢ε (bbbb, YXb) ⊢ε (bbbb, bbXb) ⊢b (bbb, bXb) ⊢b (bb, Xb) ⊢ε (bb, bb) ⊢b (b, b) ⊢b (ε, ε), or (ababbbb, S) ⊢∗ (ε, ε); we showed that ababbbb ∈ L(ZG).
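The construction of ZG and the search for an accepting computation can be sketched in Python. This is a minimal sketch, not a general parser: the names `accepts` and `shortest_yields` are hypothetical, the single state q is left implicit, and a configuration is dropped as soon as the stack needs more terminals than the remaining input offers. The search is finite for the grammar of Example 13.2; for grammars in which the stack can grow by variables that derive ε it need not terminate.

```python
from collections import deque

# Productions of G from Example 13.2; "" stands for epsilon.
G = {"S": ["aSb", "bX", ""], "X": ["aX", "YX", "b"], "Y": ["bb"]}


def shortest_yields(productions):
    """Length of the shortest terminal string derivable from each variable."""
    best = {variable: float("inf") for variable in productions}
    changed = True
    while changed:
        changed = False
        for variable, alternatives in productions.items():
            for alpha in alternatives:
                length = sum(best.get(symbol, 1) for symbol in alpha)  # a terminal counts 1
                if length < best[variable]:
                    best[variable], changed = length, True
    return best


def accepts(productions, start, w):
    """Search the computations of Z_G from (w, start) for the configuration (epsilon, epsilon)."""
    best = shortest_yields(productions)
    queue, seen = deque([(w, start)]), {(w, start)}
    while queue:
        rest, stack = queue.popleft()
        if not rest and not stack:
            return True                      # empty input and empty stack
        if not stack:
            continue
        top, below = stack[0], stack[1:]
        if top in productions:               # epsilon-step: alpha in delta(epsilon, X)
            successors = [(rest, alpha + below) for alpha in productions[top]]
        elif rest and rest[0] == top:        # reading step: delta(t, t) = epsilon
            successors = [(rest[1:], below)]
        else:
            successors = []
        for new_rest, new_stack in successors:
            needed = sum(best.get(symbol, 1) for symbol in new_stack)
            if needed <= len(new_rest) and (new_rest, new_stack) not in seen:
                seen.add((new_rest, new_stack))
                queue.append((new_rest, new_stack))
    return False


print(accepts(G, "S", "ababbbb"))
print(accepts(G, "S", "ba"))
```

The breadth-first search stands in for the non-determinism of ZG: all possible steps are tried instead of guessing the right one.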


SAMPLE EXERCISES

Exercise 13.1. The grammar G with Σ = {a, b}, Π = {S, X} is described by the following list of productions:

(1) S −→ aSa | X
(2) X −→ X | bX | ε

(i) Find a string w ∈ L(G) such that #(w) = 10, show its derivation and construct a parse tree describing it.

(ii) From the list of languages K1 through K7 at the bottom of this page choose the language L(G). Justify your answer.

Solution. (by Jiří HOMAN)

ad (i) w = aaaabbaaaa, its derivation in G is:

S ⇒(1) aSa ⇒(1) aaSaa ⇒(1) aaaSaaa ⇒(1) aaaaSaaaa ⇒(1) aaaaXaaaa ⇒(2) aaaabXaaaa ⇒(2) aaaabbXaaaa ⇒(2) aaaabbaaaa.

[Figure: the parse tree of the derivation of aaaabbaaaa.]

ad (ii) The production S −→ aSa adds one symbol a on each side, and the productions for X then generate an arbitrary number of symbols b in the middle. Hence each string from L(G) is of the form a^i b^j a^i, thus L(G) = K4.

Exercise 13.2. Transform the grammar G from Exercise 13.1 into a pushdown automaton and then perform the calculation of this automaton on the string w = aaaabbaaaa.

Solution. The transition table of the pushdown automaton is:

δ     S           X          a    b
ε     aSa | X     bX | ε     -    -
a     -           -          ε    -
b     -           -          -    ε

The calculation is:

(aaaabbaaaa, S) ⊢ε (aaaabbaaaa, aSa) ⊢a (aaabbaaaa, Sa) ⊢ε (aaabbaaaa, aSaa) ⊢a (aabbaaaa, Saa) ⊢ε (aabbaaaa, aSaaa) ⊢a (abbaaaa, Saaa) ⊢ε (abbaaaa, aSaaaa) ⊢a (bbaaaa, Saaaa) ⊢ε (bbaaaa, Xaaaa) ⊢ε (bbaaaa, bXaaaa) ⊢b (baaaa, Xaaaa) ⊢ε (baaaa, bXaaaa) ⊢b (aaaa, Xaaaa) ⊢ε (aaaa, aaaa) ⊢a (aaa, aaa) ⊢a (aa, aa) ⊢a (a, a) ⊢a (ε, ε).

PROBLEMS TO SOLVE

Problem 13.1. The grammar G with Σ = {a, b}, Π = {S, X} is described by the list of productions. From the list of languages K1 through K7 at the bottom of this page choose the language L(G). Justify your answer.

(a) (1) S −→ aSb | X
    (2) X −→ aX | Xb | ε

(b) (1) S −→ aSb | X
    (2) X −→ Xb | ab | ε

(c) (1) S −→ aX | ε
    (2) X −→ Sb

A Table of Chosen Languages over the Alphabet Σ = {a, b}

Language K1 . . . {ww^R; w ∈ Σ∗},
Language K2 . . . {w ∈ Σ∗; w = w^R},
Language K3 . . . {w ∈ Σ∗; #a(w) = #b(w)},
Language K4 . . . {a^i b^j a^i; 0 ≤ i, 0 ≤ j},
Language K5 . . . {a^i b^j; 0 ≤ i, 0 ≤ j},
Language K6 . . . {a^i b^j; 0 ≤ i ≤ j},
Language K7 . . . {a^i b^i; 0 ≤ i}.
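The answer L(G) = K4 of Exercise 13.1 can be cross-checked for short strings by brute force. The Python sketch below is only a sanity check, not a proof: the names `words_up_to` and `k4_up_to` are hypothetical, and the enumeration relies on the assumption that every sentential form of this grammar contains at most one variable, so that only finitely many forms have to be visited.

```python
# Productions of G from Exercise 13.1; "" stands for epsilon.
G = {"S": ["aSa", "X"], "X": ["X", "bX", ""]}


def words_up_to(productions, start, n):
    """All terminal strings of length <= n derivable from start."""
    seen, todo, words = {start}, [start], set()
    while todo:
        form = todo.pop()
        position = next((i for i, s in enumerate(form) if s in productions), None)
        if position is None:          # no variable left: a string of the language
            words.add(form)
            continue
        for alpha in productions[form[position]]:
            new = form[:position] + alpha + form[position + 1:]
            terminals = sum(1 for s in new if s not in productions)
            if terminals <= n and new not in seen:
                seen.add(new)
                todo.append(new)
    return words


def k4_up_to(n):
    """All strings a^i b^j a^i of length <= n."""
    return {"a" * i + "b" * j + "a" * i
            for i in range(n // 2 + 1)
            for j in range(n - 2 * i + 1)}


print(words_up_to(G, "S", 6) == k4_up_to(6))
```

The set `seen` also takes care of the production X −→ X, which would otherwise be applied forever.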


TOPIC 14 - Turing⁵ Machines

A Turing machine proceeds in steps - its head scans one cell of the tape containing the input string and then it always moves right or left. At the beginning, the tape head is positioned at the first (i.e. the leftmost) symbol of the input string and the start state is q0. In each step, the machine changes its state and rewrites the current symbol on the tape. Then the head is moved one cell right (R) or left (L). The configuration of the machine is a couple of the form (state, string). In the current string, we always underline the position of the head (before it reads). Each computation step is of the form (q, w) ⊢ (q′, w′). If the computation cannot continue, the machine halts. It can also happen that the machine never stops - then we say that the machine is cycling.

Example 14.1. Turing machine T1 is acting as an acceptor of the regular language L = {x1x2 . . . xn ∈ {a, b}∗; n ≥ 2 ∧ x2 = b}, i.e. the set of all strings over Σ = {a, b} with the second symbol b.

T1 is described by its state diagram. It consists of:

• Q = {q0, q1, q2, q3} . . . the state set, q0 the start state,
• Σ = {a, b} . . . the input alphabet,
• Γ = {B, a, b} . . . the tape alphabet (B is the blank symbol),
• F = {q3} . . . the set of accept states,
• δ : Q × Γ → Q × Γ × {L, R} . . . the partial transition function (the table) - it describes the transitions, the writing and the moves of the head. In the diagram, the transition arrow is labeled by the symbol read in front of the slash, and by the couple symbol-to-write, move after the slash.

[Figure: state diagram of Turing machine T1 with the states q0, q1, q2, q3.]

          a            b            B
→ q0      q1, a, R     q1, b, R     q2, a, R
  q1      q2, a, R     q3, b, R     q2, a, R
  q2      q2, a, R     −            −
  q3∗     −            −            −

A sample computation of the Turing machine T1 on the string w = abaa in steps:

The first step: the start state is q0, the head is positioned on the leftmost symbol. The head reads the symbol (i.e. a), re-writes it to a, moves one cell right (R), and the machine changes its state to q1 (the transition function says δ(q0, a) = q1, a, R).

Next step: the head reads the symbol (i.e. b), re-writes it to b, moves one cell right (R), and the machine changes its state to q3 (the transition function says δ(q1, b) = q3, b, R).

The calculation finishes - from q3 we cannot continue. The string abaa is accepted - the machine halted in an accept state. The complete record of the calculation:

(q0, a̲baa) ⊢ (q1, ab̲aa) ⊢ (q3, aba̲a), or (q0, a̲baa) ⊢∗ (q3, aba̲a). The string w = abaa is accepted.

Another computation - the input string is baabb: (q0, b̲aabb) ⊢ (q1, ba̲abb) ⊢ (q2, baa̲bb) ⊢ (q2, baab̲b), or (q0, b̲aabb) ⊢∗ (q2, baab̲b). The string w = baabb is not accepted - the machine halted in the state q2 ∉ F.
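The step-by-step work of T1 can be imitated by a short Python sketch. It is only an illustration under assumptions: the function name `run`, the dictionary encoding of δ and the step limit `max_steps` (a guard against cycling) are not part of the text.

```python
# Transition table of T1 from Example 14.1: (state, read) -> (new state, write, move).
T1 = {
    ("q0", "a"): ("q1", "a", "R"), ("q0", "b"): ("q1", "b", "R"), ("q0", "B"): ("q2", "a", "R"),
    ("q1", "a"): ("q2", "a", "R"), ("q1", "b"): ("q3", "b", "R"), ("q1", "B"): ("q2", "a", "R"),
    ("q2", "a"): ("q2", "a", "R"),
}


def run(delta, accept, w, start="q0", blank="B", max_steps=1000):
    """Return (last state, accepted?); accepted is None if the step limit was reached."""
    tape = dict(enumerate(w))              # cell index -> symbol, missing cells are blank
    state, head = start, 0
    for _ in range(max_steps):
        symbol = tape.get(head, blank)
        if (state, symbol) not in delta:   # delta is partial: no transition, the machine halts
            return state, state in accept
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return state, None                     # did not halt within max_steps


print(run(T1, {"q3"}, "abaa"))
print(run(T1, {"q3"}, "baabb"))
```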

⁵ Alan M. Turing (1912 - 1954) was an English computer scientist, mathematician, logician, cryptanalyst and theoretical biologist. He was highly influential in the development of theoretical computer science, providing a formalisation of the concepts of algorithm and computation. Turing is widely considered to be the father of theoretical computer science and artificial intelligence.


Example 14.2. Turing machine T2 is acting as an acceptor of the non-regular language L = {0^n 1^n; n ≥ 1}. It is described by its state diagram. It consists of:

• Q = {q0, q1, q2, q3, q4} . . . the state set, q0 the start state,
• Σ = {0, 1} . . . the input alphabet,
• Γ = {B, 0, 1, X, Y} . . . the tape alphabet (B is the blank symbol),
• F = {q4} . . . the set of accept states,

[Figure: state diagram of Turing machine T2 with the states q0, q1, q2, q3, q4.]

• δ : Q × Γ → Q × Γ × {L, R} . . . the partial transition function;

all values of the function δ:
δ(q0, 0) = q1, X, R, δ(q0, Y) = q3, Y, R,
δ(q1, 0) = q1, 0, R, δ(q1, Y) = q1, Y, R, δ(q1, 1) = q2, Y, L,
δ(q2, 0) = q2, 0, L, δ(q2, Y) = q2, Y, L, δ(q2, X) = q0, X, R,
δ(q3, Y) = q3, Y, R, δ(q3, B) = q4, B, R.

A sample computation of the Turing machine T2 on the string w = 0011 in steps:

(q0, 0̲011) ⊢ (q1, X0̲11) ⊢ (q1, X01̲1) ⊢ (q2, X0̲Y1) ⊢ (q2, X̲0Y1) ⊢ (q0, X0̲Y1) ⊢ (q1, XXY̲1) ⊢ (q1, XXY1̲) ⊢ (q2, XXY̲Y) ⊢ (q2, XX̲YY) ⊢ (q0, XXY̲Y) ⊢ (q3, XXYY̲) ⊢ (q3, XXYYB̲) ⊢ (q4, XXYYBB̲), or (q0, 0̲011) ⊢∗ (q4, XXYYBB̲). T2 halted in state q4 - the string is accepted.

Another computation in T2: (q0, 0̲111) ⊢ (q1, X1̲11) ⊢ (q2, X̲Y11) ⊢ (q0, XY̲11) ⊢ (q3, XY1̲1), or (q0, 0̲111) ⊢∗ (q3, XY1̲1). T2 halted in state q3 - the string 0111 is not accepted.
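The records of the computations of T2 can be regenerated from the values of δ. The Python sketch below is an illustration under assumptions: the function name `trace` is hypothetical, and in its output the position of the head is shown by square brackets, a choice of this sketch.

```python
# All values of delta for T2 from Example 14.2.
T2 = {
    ("q0", "0"): ("q1", "X", "R"), ("q0", "Y"): ("q3", "Y", "R"),
    ("q1", "0"): ("q1", "0", "R"), ("q1", "Y"): ("q1", "Y", "R"), ("q1", "1"): ("q2", "Y", "L"),
    ("q2", "0"): ("q2", "0", "L"), ("q2", "Y"): ("q2", "Y", "L"), ("q2", "X"): ("q0", "X", "R"),
    ("q3", "Y"): ("q3", "Y", "R"), ("q3", "B"): ("q4", "B", "R"),
}


def trace(delta, w, start="q0", blank="B", max_steps=100):
    """List the configurations (state, tape with the scanned cell in brackets) until the halt."""
    tape = list(w) or [blank]
    state, head = start, 0
    configurations = []
    for _ in range(max_steps):
        marked = "".join(f"[{c}]" if i == head else c for i, c in enumerate(tape))
        configurations.append((state, marked))
        if (state, tape[head]) not in delta:   # no transition: the machine halts
            break
        state, tape[head], move = delta[(state, tape[head])]
        head += 1 if move == "R" else -1
        if head == len(tape):                  # the head entered a new blank cell on the right
            tape.append(blank)
        elif head < 0:                         # ... or on the left
            tape.insert(0, blank)
            head = 0
    return configurations


for state, tape in trace(T2, "0011"):
    print(state, tape)
```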

The original idea of Turing was that the machine would work as a function, i.e. the input would be one object (a string) and the result of the computation would be a new object.

Example 14.3. Turing machine T3 is acting as a function f : {a, b}∗ −→ {a, b}∗, where f(w) = w^R.

• Q = {q0, q1, q2, q3, q4, q5, q6, q7}, F = {q7}, Σ = {a, b},
• Γ = {B, a, b, #, &}, the transition function table is:

          a            b            B            #            &
→ q0      q0, a, R     q0, b, R     q1, &, L     −            −
  q1      q2, #, R     q3, #, R     q6, B, R     q1, #, L     −
  q2      q2, a, R     q2, b, R     q4, a, L     q2, #, R     q2, &, R
  q3      q3, a, R     q3, b, R     q5, b, L     q3, #, R     q3, &, R
  q4      q4, a, L     q4, b, L     −            −            q1, &, L
  q5      q5, a, L     q5, b, L     −            −            q1, &, L
  q6      −            −            −            q6, B, R     q7, B, R
  q7∗     −            −            −            −            −

[Figure: state diagram of Turing machine T3 with the states q0 through q7.]

Explaining how the machine works: first, it repeats moving right until it reaches the right end of the string and, right of it, it records the symbol &. Then, it repeatedly moves left and deletes the rightmost symbol not yet deleted (replacing it by #); a copy of the deleted symbol is written at the right end of the current string. Finally, all the helper symbols # and the symbol & are replaced by the blank symbol B. A sample computation:

(q0, a̲bb) ⊢ (q0, ab̲b) ⊢ (q0, abb̲) ⊢ (q0, abbB̲) ⊢ (q1, abb̲&) ⊢ (q3, ab#&̲) ⊢ (q3, ab#&B̲) ⊢ (q5, ab#&̲b) ⊢ (q1, ab#̲&b) ⊢ (q1, ab̲#&b) ⊢ (q3, a##̲&b) ⊢ (q3, a##&̲b) ⊢ (q3, a##&b̲) ⊢ (q3, a##&bB̲) ⊢ (q5, a##&b̲b) ⊢ (q5, a##&̲bb) ⊢ (q1, a##̲&bb) ⊢ (q1, a#̲#&bb) ⊢ (q1, a̲##&bb) ⊢ (q2, ##̲#&bb) ⊢ (q2, ###̲&bb) ⊢ (q2, ###&̲bb) ⊢ (q2, ###&b̲b) ⊢ (q2, ###&bb̲) ⊢ (q2, ###&bbB̲) ⊢ (q4, ###&bb̲a) ⊢ (q4, ###&b̲ba) ⊢ (q4, ###&̲bba) ⊢ (q1, ###̲&bba) ⊢ (q1, ##̲#&bba) ⊢ (q1, #̲##&bba) ⊢ (q1, B̲###&bba) ⊢ (q6, B#̲##&bba) ⊢ (q6, BB#̲#&bba) ⊢ (q6, BBB#̲&bba) ⊢ (q6, BBBB&̲bba) ⊢ (q7, BBBBBb̲ba).

The input string was w = abb and the machine halted when wR = bba was on the tape.

Another sample computation - on the empty string ε:

(q0, B̲) ⊢ (q1, B̲&) ⊢ (q6, B&̲) ⊢ (q7, BBB̲). The result is ε again, i.e. ε^R = ε.
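A Turing machine acting as a function can be imitated in Python by returning the content of the tape after the halt. The sketch below is an illustration under assumptions: the function name `compute` is hypothetical, and the row of q1 is taken as a → q2, #, R; b → q3, #, R; B → q6, B, R; # → q1, #, L, together with δ(q3, &) = q3, &, R, which are the steps used by the sample computations of T3.

```python
# Transition function of T3 (assumption: as used by the sample computations).
T3 = {
    ("q0", "B"): ("q1", "&", "L"),
    ("q1", "a"): ("q2", "#", "R"), ("q1", "b"): ("q3", "#", "R"),
    ("q1", "#"): ("q1", "#", "L"), ("q1", "B"): ("q6", "B", "R"),
    ("q2", "B"): ("q4", "a", "L"), ("q3", "B"): ("q5", "b", "L"),
    ("q6", "#"): ("q6", "B", "R"), ("q6", "&"): ("q7", "B", "R"),
}
for s in "ab":
    T3[("q0", s)] = ("q0", s, "R")             # run to the right end of the input
for q in ("q2", "q3"):
    for s in "ab#&":
        T3[(q, s)] = (q, s, "R")               # carry the deleted symbol to the right end
for q in ("q4", "q5"):
    for s in "ab":
        T3[(q, s)] = (q, s, "L")               # return to the symbol &
    T3[(q, "&")] = ("q1", "&", "L")


def compute(delta, accept, w, start="q0", blank="B", max_steps=10_000):
    """Return the tape content (blanks stripped) after a halt in an accept state, else None."""
    tape = dict(enumerate(w))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape.get(head, blank))
        if key not in delta:                   # no transition: the machine halts
            break
        state, write, move = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    else:
        return None                            # step limit reached, possibly cycling
    if state not in accept:
        return None
    return "".join(tape[i] for i in sorted(tape)).strip(blank)


print(compute(T3, {"q7"}, "abb"))
```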


Example 14.4. Turing machine T4 is acting as the function that sums any two non-negative whole numbers, i.e. f(n1, n2) = n1 + n2. The numbers 0, 1, 2, . . . are represented by the strings |, ||, |||, . . . Then the sum 1 + 3 = 4 will be performed as ||+|||| ⊢∗ |||||.

• Q = {q0, q1, q2, q3}, q0 the start state, q3 the final state,
• Σ = {|, +} . . . the input alphabet, Γ = {B, |, +} . . . the tape alphabet.

[Figure: state diagram of Turing machine T4 with the states q0, q1, q2, q3.]

The transition function table is:

          |            +            B
→ q0      q1, B, R     −            −
  q1      q2, B, R     q3, B, R     −
  q2      q2, |, R     q3, |, R     −
  q3∗     −            −            −

A sample computation ||+|||| ⊢∗ ||||| in steps:

(q0, |̲|+||||) ⊢ (q1, B|̲+||||) ⊢ (q2, BB+̲||||) ⊢ (q3, BB||̲|||).

The picture shows what was happening on the tape (we can see the position of the head in each step):

· · · B B |̲ | + | | | | B B · · ·
· · · B B B |̲ + | | | | B B · · ·
· · · B B B B +̲ | | | | B B · · ·
· · · B B B B | |̲ | | | B B · · ·

SAMPLE EXERCISES

Exercise 14.1. Turing machine T is described by the table of its transition function; Σ = {0, 1}, Γ = {B, 0, 1}.

          0            1            B
→ q0      q0, 0, R     q1, 1, R     q2, 0, L
  q1∗     q1, 0, R     q0, 1, R     −
  q2      q0, 0, R     q0, 1, R     q0, B, R

(i) Sketch the transition diagram for the machine T.

(ii) Describe the computations on the strings 111 and 101.

(iii) Explain how the machine works and then describe the language it accepts.

Solution. (by Jan PREDOTA)

ad (i) [Figure: the transition diagram of Turing machine T with the states q0, q1, q2.]

ad (ii) The computation on the string 111:
(q0, 1̲11) ⊢ (q1, 11̲1) ⊢ (q0, 111̲) ⊢ (q1, 111B̲) . . . accepted,

the computation on the string 101:
(q0, 1̲01) ⊢ (q1, 10̲1) ⊢ (q1, 101̲) ⊢ (q0, 101B̲) ⊢ . . . cycling.

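ad (iii) While reading the input, the machine is in the state q1 exactly when the number of symbols 1 read so far is odd. At the end of the input it halts in q1 (the string is accepted), whereas from q0 it keeps writing the symbol 0 and never stops. A string w is accepted only if #1(w) is odd, i.e. L(T) = {w ∈ Σ∗; #1(w) is odd}.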

PROBLEMS TO SOLVE

Problem 14.1. (A review of Examples 14.1 through 14.4)

(a) In Turing machine T1 from Example 14.1, perform the calculation on the input string (i) b (ii) ababab (iii) ba (iv) aaaaa

(b) In Turing machine T2 from Example 14.2, perform the calculation on the input string (i) ε (ii) 000111 (iii) 101 (iv) 01

(c) In Turing machine T3 from Example 14.3, perform the calculation on the input string (i) a (ii) abab (iii) aabaa (iv) aaaa

(d) In Turing machine T4 from Example 14.4, perform the calculation on the input task (i) 3 + 1 (ii) 0 + 4 (iii) 3 + 1 1 + 1


TOPIC 15 - Chomsky⁶ Hierarchy of Formal Languages over Σ

[Figure: nested classes of languages over Σ, listed here from the innermost class to the outermost one.]

• All regular languages over Σ (generated by the grammars of type 3)
• All context-free languages over Σ (generated by the grammars of type 2)
• All context-sensitive languages over Σ (generated by the grammars of type 1)
• The languages over Σ generated by all grammars (i.e. by the grammars of type 0)
• All languages over Σ

Explanation.

The conventional notations for any grammar G: Σ = {a, b, c, . . .} is the alphabet of terminals, Π = {S, X, Y, Z, . . .} is the alphabet of non-terminals (variables), S is the start symbol (the root) of the grammar, V = Σ ∪ Π = {a, b, c, . . . , S, X, Y, Z, . . .} is the total alphabet of the grammar,

u, v, w, . . . ∈ Σ∗ are the strings of terminals (they create languages), α, β, γ, . . . ∈ V∗ are the strings consisting of both terminals and non-terminals (used in production rules), and the production rules are of the form α −→ β, where the string α must contain at least one non-terminal.

If (i) α −→ β is the i-th production of the grammar G, then the derivation step applied to the string γ1αγ2 is defined as γ1αγ2 ⇒(i) γ1βγ2. The symbol ⇒∗ is used for the iteration of derivation steps.

The language generated by the grammar G is defined as L(G) = {w ∈ Σ∗;S ⇒∗ w}.

Definition (Chomsky hierarchy of grammars).

• The grammar of type 0 (phrase grammar) is the grammar without any limitation on the rules. The language generated by this grammar is called the language of type 0.

• The grammar of type 1 (context sensitive, CS) is the grammar satisfying the condition:

all productions are of the form γ1Xγ2 −→ γ1βγ2, where X ∈ Π, β, γ1, γ2 ∈ V∗, β ≠ ε (with the exception S −→ ε).

The language generated by this grammar is called the language of type 1, i.e. context sensitive.

• The grammar of type 2 (context free, CF) is the grammar satisfying the condition:

all productions are of the form X −→ β, where X ∈ Π, β ∈ V∗, β ≠ ε (with the exception S −→ ε).

The language generated by this grammar is called the language of type 2, i.e. context free.

• The grammar of type 3 (regular) is the grammar satisfying the condition:

all productions are of the form X −→ aY or X −→ a, where X, Y ∈ Π, a ∈ Σ.

The language generated by this grammar is called the language of type 3, i. e. regular.
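The conditions of the definition are purely syntactic, so they can be tested rule by rule. The following Python sketch is only an illustration: the names `grammar_type` and `is_context_sensitive_rule` are hypothetical, a grammar is given as a list of pairs (left side, right side) with "" for ε, and the test is literal, i.e. it reports the largest type whose form is satisfied by every rule as written.

```python
def is_context_sensitive_rule(left, right, variables):
    """True if left = g1 X g2 and right = g1 beta g2 for a variable X and a non-empty beta."""
    for i, symbol in enumerate(left):
        if symbol in variables:
            g1, g2 = left[:i], left[i + 1:]
            if len(right) > len(g1) + len(g2) and right.startswith(g1) and right.endswith(g2):
                return True
    return False


def grammar_type(rules, variables, start="S"):
    """Largest k in {3, 2, 1, 0} such that every rule has the literal form of type k."""
    def start_to_empty(left, right):
        return left == start and right == ""

    def type3(left, right):                    # X -> aY or X -> a
        if left not in variables:
            return False
        if len(right) == 1:
            return right not in variables
        return len(right) == 2 and right[0] not in variables and right[1] in variables

    def type2(left, right):                    # X -> beta, beta non-empty (or S -> epsilon)
        return left in variables and (right != "" or start_to_empty(left, right))

    def type1(left, right):                    # g1 X g2 -> g1 beta g2 (or S -> epsilon)
        return start_to_empty(left, right) or is_context_sensitive_rule(left, right, variables)

    for number, test in ((3, type3), (2, type2), (1, type1)):
        if all(test(left, right) for left, right in rules):
            return number
    return 0


print(grammar_type([("S", "aS"), ("S", "bX"), ("X", "b")], {"S", "X"}))
print(grammar_type([("S", "aSb"), ("S", "ab")], {"S"}))
```

A result k only says that the given rules have the form of type k; the same language may also be generated by another grammar of a higher type.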

⁶ Avram Noam Chomsky [tchomski] (born in 1928) is an American linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes described as “the father of modern linguistics”, Chomsky is also a major figure in analytic philosophy, and one of the founders of the field of cognitive science. He is Institute Professor Emeritus at the Massachusetts Institute of Technology (MIT), where he has worked since 1955, and is the author of over 100 books on topics such as linguistics, war, politics, and mass media.


Example 15.1. The language L(G1) = {a^i b^i c^i; i ∈ N} over Σ = {a, b, c} can be described by the grammar G1:

(1) S −→ aSYZ | aZY
(2) ZY −→ YZ
(3) aY −→ ab
(4) bY −→ bb
(5) bZ −→ bc
(6) cZ −→ cc

This language is known as a non-context-free language (the so-called ‘pumping lemma for context-free grammars’ can be used for the proof).
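Whether a given string belongs to L(G1) can be tested by a search through the sentential forms. The Python sketch below is an illustration under assumptions: the function name `derives` is hypothetical, and the search relies on the fact that no rule of G1 shortens a string, so forms longer than w can be discarded and only finitely many forms remain.

```python
from collections import deque

# Rules (1)-(6) of the grammar G1 from Example 15.1 as pairs (left side, right side).
G1 = [("S", "aSYZ"), ("S", "aZY"), ("ZY", "YZ"),
      ("aY", "ab"), ("bY", "bb"), ("bZ", "bc"), ("cZ", "cc")]


def derives(rules, start, w):
    """Breadth-first search for a derivation start =>* w (no rule may shorten a string)."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for left, right in rules:
            i = form.find(left)
            while i != -1:                     # try every occurrence of the left side
                new = form[:i] + right + form[i + len(left):]
                if len(new) <= len(w) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(left, i + 1)
    return False


print(derives(G1, "S", "aabbcc"))
print(derives(G1, "S", "aabbc"))
```

For w = aabbcc the search finds, among others, the derivation S ⇒ aSYZ ⇒ aaZYYZ ⇒ aaYZYZ ⇒ aaYYZZ ⇒ aabYZZ ⇒ aabbZZ ⇒ aabbcZ ⇒ aabbcc.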

Example 15.2. The language L(G2) = {a^i b^i; i ∈ N} over Σ = {a, b} can be described by the grammar G2:

(1) S −→ aSb | ab

This language is known as a context-free language which is not regular (the so-called ‘lemma on the right congruence on regular languages’ can be used for the proof).

Example 15.3. The language Lp = {a^i; i is a prime number} over Σ = {a} cannot be described by any context-free grammar (it can, however, be generated by a grammar of type 1).
