Theoretical Computer Science - dmg.tuwien.ac.at/hetzl/teaching/tcs_2018.pdf


Theoretical Computer Science

Stefan Hetzl
[email protected]

Vienna University of Technology

Summer Term 2018


Contents

1 Automata theory
   1.1 Formal languages
   1.2 Finite automata
   1.3 Formal grammars

2 Computability theory
   2.1 Partial recursive functions
   2.2 Turing machines and the Church-Turing thesis
   2.3 Undecidability

3 Computational complexity theory
   3.1 Nondeterministic Turing machines
   3.2 NP-completeness
   3.3 More NP-complete problems


Introduction

These lecture notes give a short introduction to the following three areas:

1. Automata theory

2. Computational complexity theory

3. Computability theory

These three areas have in common that they are concerned with measures for the complexity of subsets of a fixed countably infinite set (for example, the natural numbers). In this context, "complexity" is understood as computational complexity, i.e., for such an X ⊆ ℕ: how difficult is it to answer the following question: given an n ∈ ℕ, is n ∈ X?

The difficulty of answering this question can differ strongly depending on the set X ⊆ ℕ. If, for example, X is the set of even numbers, then, given an n ∈ ℕ in decimal notation, one can immediately answer whether n ∈ X by checking whether the last digit of n is even. On the other hand, if X is the set of prime numbers P, it is also possible to decide whether a given n ∈ ℕ is in P; however, this will clearly be more difficult in general.

These three areas differ in the complexity of the sets under consideration and consequently also in their methods and questions of importance. In the above list they are sorted in ascending order of the complexity of the sets under consideration. However, more complex sets do not automatically lead to more difficult mathematical questions.

These three areas have different historical roots: while automata theory has strong roots in linguistics, it is nowadays primarily applied in computer science. The topic of computational complexity theory is "efficient computability" in a wide sense, and it grew quite directly out of computer science. Computability theory is a well-established area of mathematical logic and considerably older than the other two.

For further reading, the following books can be recommended: [4], [1] (primarily for automata theory), [3] (primarily for computational complexity theory), and [2] (for computability theory).

1

Page 6: Theoretical Computer Science - dmg.tuwien.ac.atdmg.tuwien.ac.at/hetzl/teaching/tcs_2018.pdf · linguistics, it is nowadays primarily applied in computer science. The topic of computational

2

Page 7: Theoretical Computer Science - dmg.tuwien.ac.atdmg.tuwien.ac.at/hetzl/teaching/tcs_2018.pdf · linguistics, it is nowadays primarily applied in computer science. The topic of computational

Chapter 1

Automata theory

1.1 Formal languages

We start this chapter about formal languages by defining some basic notions and operations. An alphabet is a finite set of symbols. We will use A, A_1, … to denote alphabets. As symbols we will usually use a, b, c, 0, 1, …, and as variables for symbols x, y, z, …. An A-word is a finite sequence of symbols from A. Often the alphabet is clear from the context, so that we simply speak about "words" instead of "A-words". For denoting a word we usually use w, v, u, …. The empty sequence of symbols, the empty word, is written as ε. The set of all A-words is written as A*. A language is a set L ⊆ A*.

Definition 1.1. We define the following operations on words:

1. Concatenation: For words v = x_1 ⋯ x_n and w = y_1 ⋯ y_k we define the word v · w as x_1 ⋯ x_n y_1 ⋯ y_k. Often we will just write vw.

2. Power: For a word w and a k ∈ ℕ we define w^k recursively by w^0 = ε and w^{k+1} = ww^k.

3. Length: For a word w = x_1 ⋯ x_n we define |w| = n.

Algebraically speaking, (A*, ·, ε) is the monoid freely generated by A.

We continue by defining some operations on languages. Since a language is a set of words, we can use set operations such as union ∪, intersection ∩, and complement ·^c as usual. Note that the complement is to be understood relative to an alphabet, since L^c = A* \ L.

Definition 1.2. We define the following operations on languages:

1. Concatenation: For languages L_1 and L_2 we define the concatenation L_1 · L_2 = {w_1w_2 | w_1 ∈ L_1, w_2 ∈ L_2}. As for words, also here we often just write L_1L_2.

2. Power: For a language L and a k ∈ ℕ we define L^k recursively by L^0 = {ε} and L^{k+1} = L · L^k.

3. Kleene star¹: For a language L we define L* = ⋃_{k≥0} L^k.

We can now understand the notation A* for the set of all A-words as a special case of the Kleene star.

¹ Named after Stephen C. Kleene (1909–1994).


Example 1.1. {a, b}* is the set of all words that consist of a and b only.

It is straightforward to show that ⟨P(A*), ∪, ∅, ·, {ε}⟩ is a semiring; in particular, concatenation distributes over union. As usual in semirings, also here we will work with the convention that multiplication · binds stronger than addition ∪.

Example 1.2. ({c, b}*({ε} ∪ {a}*{b}))* denotes the set of all {a, b, c}-words where every string of a's is followed by a b.
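The operations of Definitions 1.1 and 1.2 are directly executable on finite languages; only the Kleene star needs a length bound to remain finite. A minimal Python sketch (the function names concat, power, and star are our own, not from the notes):

```python
def concat(L1, L2):
    """L1 . L2 = { w1 w2 | w1 in L1, w2 in L2 } (Definition 1.2)."""
    return {w1 + w2 for w1 in L1 for w2 in L2}

def power(L, k):
    """L^0 = {eps}, L^(k+1) = L . L^k."""
    return {""} if k == 0 else concat(L, power(L, k - 1))

def star(L, max_len):
    """The words of L* of length at most max_len, truncating the
    infinite union over all powers of L."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w + u for w in frontier for u in L
                    if len(w + u) <= max_len and w + u not in result}
        result |= frontier
    return result
```

For example, star({"a", "b"}, 2) yields {"", "a", "b", "aa", "ab", "ba", "bb"}, i.e., the words of {a, b}* up to length 2.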

1.2 Finite automata

Example 1.3. Consider the language L = {b}*{a}{a, b}*. Intuitively, we can represent L by the following automaton:

[Diagram: initial state q0 with a b-loop; transition q0 -a-> q1; final state q1 with an a,b-loop.]

We will now make the notion of a deterministic finite automaton precise.

Definition 1.3. A deterministic finite automaton (DFA) is a tuple D = ⟨Q, A, δ, q0, F⟩ where Q is a finite set of states, A is an alphabet, δ : Q × A → Q is the transition function, q0 ∈ Q is the initial state, and F ⊆ Q is the set of final states.

Definition 1.4. We extend the transition function δ of a DFA by the following recursive definition to δ : Q × A* → Q:

δ(q, ε) = q
δ(q, xw) = δ(δ(q, x), w)

Definition 1.5. The language accepted by a DFA D = ⟨Q, A, δ, q0, F⟩ is defined as L(D) = {w ∈ A* | δ(q0, w) ∈ F}.
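Definitions 1.3-1.5 translate almost literally into code. A minimal Python sketch, representing δ as a dictionary (this concrete representation is our own choice):

```python
def run_dfa(delta, q0, final, word):
    """Extended transition function of Definition 1.4: starting in q0,
    apply delta symbol by symbol; accept iff the state reached at the
    end lies in the set of final states (Definition 1.5)."""
    q = q0
    for x in word:
        q = delta[(q, x)]
    return q in final

# DFA for L = {b}*{a}{a,b}* from Example 1.3
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q1", ("q1", "b"): "q1"}

assert run_dfa(delta, "q0", {"q1"}, "bba")      # contains an a: accepted
assert not run_dfa(delta, "q0", {"q1"}, "bbb")  # no a: rejected
```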

Example 1.4. The language L = {w ∈ {a, b}* | w contains an even number of a's and an even number of b's} has the following natural DFA:

[Diagram: states q00, q01, q10, q11 with q00 initial and final; each a-transition toggles the first index (q00 <-> q10, q01 <-> q11) and each b-transition toggles the second index (q00 <-> q01, q10 <-> q11).]

This automaton is in state q_ij iff the number of a's read so far is congruent to i modulo 2 and the number of b's is congruent to j modulo 2.

Note that the transition function δ of a DFA has the property that, for every (q, x) ∈ Q × A, it defines exactly one next state δ(q, x) ∈ Q. All the transition diagrams we have seen so far also have this property and thus properly define a DFA. Sometimes it is helpful to use a


transition diagram which defines, for every (q, x) ∈ Q × A, at most one next state. For example, we may want to recognise the language L = {abw | w ∈ {a, b}*} by a diagram such as

[Diagram: q0 -a-> q1 -b-> q2; final state q2 with an a,b-loop.]

Note that, a priori, this diagram does not specify a DFA since the transitions for (q0, b) and (q1, a) are not defined. However, based on the intuition that a word is accepted iff it induces a run from the initial state to a final state, one can consider the automaton as "stuck" if it sees a b in q0 (or an a in q1), and therefore it does not accept. This can be made precise by considering this diagram to specify a DFA which contains an additional state, a trap, where all the undefined transitions go and which can never be left:

[Diagram: q0 -a-> q1 -b-> q2 with an a,b-loop on the final state q2; additionally q0 -b-> qt and q1 -a-> qt, with an a,b-loop on the trap state qt.]

If, instead, we specify more than one possible next state for a pair (q, x) ∈ Q × A, we naturally arrive at the notion of a nondeterministic finite automaton.

Example 1.5. A natural automaton for the language L = {wab | w ∈ {a, b}*} would be

[Diagram: initial state q0 with an a,b-loop, then q0 -a-> q1 -b-> q2, final state q2.]

where the loop on the state q0 is used to read w ∈ {a, b}* and the states q1 and q2 are used for reading the symbols a and b at the end of the word. Note that this diagram does not describe a DFA since the value of δ(q0, a) is not unique.

Definition 1.6. A nondeterministic finite automaton (NFA) is a tuple N = ⟨Q, A, Δ, q0, F⟩ where Q is a finite set of states, A is an alphabet, Δ ⊆ Q × A × Q is the transition relation, q0 ∈ Q is the initial state, and F ⊆ Q is the set of final states.

Definition 1.7. We extend the transition relation Δ of an NFA to Δ ⊆ Q × A* × Q by:

(q, ε, q) ∈ Δ for all q ∈ Q
(p, xw, r) ∈ Δ if (p, x, q) ∈ Δ and (q, w, r) ∈ Δ

Definition 1.8. Let N = ⟨Q, A, Δ, q0, F⟩ be an NFA. The language accepted by N is L(N) = {w ∈ A* | ∃q ∈ F s.t. (q0, w, q) ∈ Δ}.

Note that in the above definition the existence of a single accepting path is sufficient for a word to be accepted as an element of the language; all other paths for that word may end in a non-final state.

Example 1.6. The diagram given in Example 1.5 describes the NFA N = ⟨Q, A, Δ, q0, F⟩ where Q = {q0, q1, q2}, A = {a, b}, Δ = {(q0, a, q0), (q0, b, q0), (q0, a, q1), (q1, b, q2)}, and F = {q2}.
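Definition 1.8 can be checked without enumerating individual paths: it suffices to track the set of states reachable after each symbol. A Python sketch (run_nfa is our own name for this check):

```python
def run_nfa(Delta, q0, final, word):
    """A word is accepted iff some path labelled with the word leads
    from q0 to a final state (Definition 1.8). We maintain the set of
    states reachable after each symbol read."""
    states = {q0}
    for x in word:
        states = {q for (p, y, q) in Delta if p in states and y == x}
    return bool(states & final)

# NFA of Example 1.6 for L = {wab | w in {a,b}*}
Delta = {("q0", "a", "q0"), ("q0", "b", "q0"),
         ("q0", "a", "q1"), ("q1", "b", "q2")}

assert run_nfa(Delta, "q0", {"q2"}, "aab")
assert not run_nfa(Delta, "q0", {"q2"}, "aba")
```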


Theorem 1.1. Let L ⊆ A*. Then there exists a DFA D with L(D) = L iff there exists an NFA N with L(N) = L.

Proof. The direction from left to right is trivial: every transition function can be considered as a transition relation. For the other direction, let N = ⟨Q, A, Δ, q0, F⟩ be an NFA. We define the DFA D = ⟨P(Q), A, δ, {q0}, {S ⊆ Q | S ∩ F ≠ ∅}⟩ where

δ(S, x) = {q ∈ Q | ∃p ∈ S s.t. (p, x, q) ∈ Δ}.

We first show, by induction on |w|, that

δ(S, w) = {q ∈ Q | ∃p ∈ S s.t. (p, w, q) ∈ Δ}.   (*)

For the empty word we have

δ(S, ε) = S = {q ∈ Q | ∃p ∈ S s.t. (p, ε, q) ∈ Δ}

and for an arbitrary word w = xv we have

δ(S, xv) = δ(δ(S, x), v) =_IH {q ∈ Q | ∃p ∈ δ(S, x) s.t. (p, v, q) ∈ Δ}
         = {q ∈ Q | ∃p′ ∈ S, p ∈ Q s.t. (p′, x, p) ∈ Δ and (p, v, q) ∈ Δ}
         = {q ∈ Q | ∃p′ ∈ S s.t. (p′, xv, q) ∈ Δ}.

Therefore we have

L(D) = {w ∈ A* | δ({q0}, w) ∩ F ≠ ∅}
     =_(*) {w ∈ A* | ∃q ∈ F s.t. (q0, w, q) ∈ Δ}
     = L(N).

The construction of the above proof is also known as the "subset construction" in the literature because we build a DFA whose states are the subsets of the set of states of the given NFA.

If a certain class of objects (such as a set of languages) can be described by different formalisms (such as DFAs and NFAs), this is usually an indication that one is dealing with a robust, and hence important, class. This is the case here and thus we define:

Definition 1.9. A language L ⊆ A* is called regular if there is a DFA D s.t. L(D) = L, or, equivalently, there is an NFA N s.t. L(N) = L.

Example 1.7. Continuing Example 1.5, we would like to find a DFA which accepts the same language. In principle, it is possible to apply the subset construction of Theorem 1.1 literally, which, in this example, would yield a DFA with 2³ = 8 states. However, in practice it is more clever to construct only those states which are actually reachable from the starting state {q0}. These states can be determined systematically as follows: we create a table whose first row label is the starting state and whose columns are labelled by the symbols of the alphabet.

         a    b
{q0}


The cell in the i-th row and the j-th column will contain the set of states of the NFA which are reachable from a state in the label of the i-th row by the symbol that labels the j-th column. If this process creates a new state of the DFA, then it is added to the existing row labels. The table is saturated if every state of the DFA which appears in the table also appears as a row label of the table. Then the construction of the DFA is complete. The saturation of the above table is:

           a          b
{q0}       {q0, q1}   {q0}
{q0, q1}   {q0, q1}   {q0, q2}
{q0, q2}   {q0, q1}   {q0}

Thus we have constructed the following DFA which is, by construction, equivalent to the NFA of Example 1.5:

[Diagram: initial state {q0} with a b-loop and {q0} -a-> {q0,q1}; an a-loop on {q0,q1} and {q0,q1} -b-> {q0,q2}; {q0,q2} -a-> {q0,q1} and {q0,q2} -b-> {q0}; final state {q0,q2}.]

So we have seen that NFAs and DFAs describe the same class of languages, i.e., that they are equivalent extensionally. A question which is not answered by this result is whether they are equally concise as descriptional formalisms. It is clear that NFAs are at least as concise as DFAs since every DFA can trivially be considered an NFA. In the other direction, the subset construction which turns an NFA into an equivalent DFA is obviously exponential. Can there be another construction which has a better complexity? The following theorem shows that the answer to this question is no.
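The table-saturation procedure of Example 1.7, i.e., the reachable part of the subset construction of Theorem 1.1, can be sketched in Python as follows; frozensets play the role of the row labels (the function name and data representation are our own):

```python
def subset_construction(Delta, q0, final, alphabet):
    """Build the reachable part of the DFA of Theorem 1.1: the DFA
    states are sets of NFA states, starting from {q0}."""
    start = frozenset([q0])
    delta, seen, todo = {}, {start}, [start]
    while todo:
        S = todo.pop()
        for x in alphabet:
            # all NFA states reachable from S by reading x
            T = frozenset(q for (p, y, q) in Delta if p in S and y == x)
            delta[(S, x)] = T
            if T not in seen:           # new row label: saturate further
                seen.add(T)
                todo.append(T)
    dfa_final = {S for S in seen if S & set(final)}
    return delta, start, dfa_final

# NFA of Examples 1.5/1.6
Delta = {("q0", "a", "q0"), ("q0", "b", "q0"),
         ("q0", "a", "q1"), ("q1", "b", "q2")}
delta, start, final = subset_construction(Delta, "q0", {"q2"}, "ab")
assert len({S for (S, _) in delta}) == 3   # only 3 of the 8 subsets are reachable
```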

Theorem 1.2. For all k ≥ 1 there is a language L_k ⊆ {a, b}* and an NFA N_k with k+1 states s.t. L(N_k) = L_k, but every DFA D with L(D) = L_k has at least 2^k states.

Proof. Let L_k = {wav | w, v ∈ {a, b}*, |v| = k−1}. We define the NFA N_k as:

[Diagram: initial state q0 with an a,b-loop, then q0 -a-> q1 -a,b-> q2 -a,b-> ··· -a,b-> q_{k−1} -a,b-> q_k, final state q_k.]

Let D = ⟨Q, A, δ, q0, F⟩ be a DFA with L(D) = L_k and |Q| < 2^k. The pigeonhole principle entails that there are words x = x_1 ⋯ x_k and y = y_1 ⋯ y_k s.t.

δ(q0, x) = δ(q0, y) and x ≠ y.

Hence there is an i s.t. x_i ≠ y_i. Let w.l.o.g. x_i = a and y_i = b. We define

u = xa^{i−1} and v = ya^{i−1}.

Now u ∈ L_k but v ∉ L_k. Nevertheless we have δ(q0, xw) = δ(q0, yw) for all w ∈ {a, b}*, in particular δ(q0, u) = δ(q0, v). Contradiction.


Example 1.8. Occasionally it is convenient to allow a finite automaton to make a state transition without reading a symbol. A class of automata that allows this are NFAs with ε-transitions. One can then accept the language consisting of the usual decimal notation of all integers as follows: let A = {0, 1, …, 9, −} and define an NFA with ε-transitions as:

[Diagram: q0 -(−,ε)-> q1; q1 -0-> q3; q1 -(1,…,9)-> q2; a 0,…,9-loop on q2; final states q2 and q3.]

This automaton uses the ε-transition for the optional sign −.

Definition 1.10. A nondeterministic finite automaton with ε-transitions (ε-NFA) is defined just like an NFA as ⟨Q, A, Δ, q0, F⟩ with the only difference that now Δ ⊆ Q × (A ∪ {ε}) × Q.

The extended transition relation Δ ⊆ Q × A* × Q as well as the language L(N) of an ε-NFA are defined analogously to the corresponding notions for NFAs. We also have:

Theorem 1.3. A language L ⊆ A* is regular iff there is an ε-NFA N s.t. L(N) = L.

Proof sketch. We proceed essentially as in the proof of Theorem 1.1. The subset construction is extended to consider the ε-hull of a state q, i.e., the set of all states reachable from q by ε-transitions. The details can be found, e.g., in [1].

Theorem 1.4. The class of regular languages has the following closure properties:

1. If L_1 and L_2 are regular, then L_1 ∪ L_2 is regular.

2. If L is regular, then L^c is regular.

3. If L_1 and L_2 are regular, then L_1 ∩ L_2 is regular.

4. If L is finite, then L is regular.

5. If L_1 and L_2 are regular, then L_1 · L_2 is regular.

6. If L is regular, then L* is regular.

Proof. We start with the following observation, which is helpful in the context of this proof. If L is regular, then there is an ε-NFA N with L(N) = L which has a single final state and no transitions leaving this state. To see this, let N′ be an arbitrary ε-NFA for L. We can then obtain N from N′ by adding a new final state q_f and an ε-transition from every final state of N′ to q_f.

We now turn to the proof of the theorem.

8

Page 13: Theoretical Computer Science - dmg.tuwien.ac.atdmg.tuwien.ac.at/hetzl/teaching/tcs_2018.pdf · linguistics, it is nowadays primarily applied in computer science. The topic of computational

1. Let N_1 and N_2 be ε-NFAs for L_1 and L_2 as above. We obtain an ε-NFA for L_1 ∪ L_2 by:

[Diagram: a new initial state q0 with ε-transitions to the initial states of N_1 and N_2, and ε-transitions from the final states of N_1 and N_2 to a new final state q_f.]

2. Let D = ⟨Q, A, δ, q0, F⟩ be a DFA for L. Then D′ = ⟨Q, A, δ, q0, Q\F⟩ is a DFA for L^c.

3. L_1 ∩ L_2 = (L_1^c ∪ L_2^c)^c.

4. Let w = x_1 ⋯ x_n be a word with x_i ∈ A. Then the language {w} is accepted by the automaton:

[Diagram: q0 -x_1-> q1 -x_2-> ··· -x_n-> q_n, final state q_n.]

Then 1. entails that every finite language is regular.

5. Let N_1 and N_2 be ε-NFAs for L_1 and L_2 as above. Then we obtain an ε-NFA for L_1 · L_2 by:

[Diagram: the automaton N_1 followed by an ε-transition from the final state of N_1 to the initial state of N_2.]

6. Let N = ⟨Q, A, Δ, q0, {q_f}⟩ be an ε-NFA for L as above. Then N′ = ⟨Q, A, Δ ∪ {(q_f, ε, q0)}, q0, {q0}⟩ is an ε-NFA for L*.

Example 1.9. Using the above closure properties we can easily see that the language L = {a^n b^k | n, k ≥ 0} is regular. First note that L = {a}* · {b}*. Now, since {a} and {b} are finite, they are regular by 4. Then, by 6., also {a}* and {b}* are regular and hence, by 5., also {a}* · {b}*.

On the other hand, we will soon see that the language L = {a^n b^n | n ≥ 0} is not regular. In order to show this we need the following lemma.

Lemma 1.1 (pumping lemma). Let L be regular. Then there is an n ∈ ℕ s.t. every w ∈ L with |w| ≥ n can be written as w = v_1v_2v_3 where

1. v_2 ≠ ε,

2. |v_1v_2| ≤ n, and

3. for all k ≥ 0: v_1v_2^k v_3 ∈ L.

Proof. Let D = ⟨Q, A, δ, q0, F⟩ be a DFA with L(D) = L. Let n = |Q|, let w ∈ L with |w| = m ≥ n, and let w = x_1 ⋯ x_m for x_i ∈ A. Define the path in D induced by w by p_i = δ(q0, x_1 ⋯ x_i) for i = 0, …, m. Since D has only n states, the pigeonhole principle entails that there are


i, j ∈ {0, …, n} with i < j and p_i = p_j. As a diagram, the path induced by w has the following form:

[Diagram: p_0 -x_1-> ··· -x_i-> p_i -x_{j+1}-> ··· -x_m-> p_m, with a loop labelled x_{i+1} ⋯ x_j on p_i, where p_0 = q0, p_i = p_j, and p_m ∈ F.]

We define v_1 = x_1 ⋯ x_i, v_2 = x_{i+1} ⋯ x_j, and v_3 = x_{j+1} ⋯ x_m. Then we obtain 1. from i < j, 2. from j ≤ n, and 3. from p_i = p_j.

This lemma is a very useful tool for showing that a given language is not regular. Its name, "pumping lemma", derives from the fact that the middle part v_2 of the word w can be pumped up arbitrarily.
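The proof of the pumping lemma is constructive: running the DFA and recording the states p_0, p_1, … until one repeats yields the cut points i < j. A Python sketch of this extraction (pump_decomposition is our own name; delta is a DFA transition table as in the earlier examples):

```python
def pump_decomposition(delta, q0, word):
    """Return (v1, v2, v3) with word = v1 v2 v3 and v2 nonempty, such
    that the DFA is in the same state before and after reading v2; then
    v1 v2^k v3 leads to the same final state for every k >= 0."""
    path = [q0]
    for x in word:
        path.append(delta[(path[-1], x)])
    first = {}
    for i, p in enumerate(path):
        if p in first:                       # state repeats: cut here
            j = first[p]
            return word[:j], word[j:i], word[i:]
        first[p] = i
    return None  # |word| < number of states: no repetition guaranteed

# DFA for L = {b}*{a}{a,b}* (2 states), with a word long enough to pump
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q1", ("q1", "b"): "q1"}
v1, v2, v3 = pump_decomposition(delta, "q0", "bba")
assert (v1, v2, v3) == ("", "b", "ba")
assert all("a" in v1 + v2 * k + v3 for k in range(4))  # pumped words stay in L
```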

Example 1.10. We can now use the pumping lemma to show that L = {a^k b^k | k ≥ 0} is not regular. Let n be as in the pumping lemma for L and consider w = a^n b^n. Then there are words v_1, v_2, v_3 s.t. w = v_1v_2v_3 satisfying 1.-3. Since |v_1v_2| ≤ n, we have v_1 = a^k and v_2 = a^l. By 3., also v_1v_2^2 v_3 = a^{n+l} b^n ∈ L. But since v_2 ≠ ε, we have l > 0, hence n + l ≠ n and therefore a^{n+l} b^n ∉ L. Contradiction.

1.3 Formal grammars

A natural language, such as English, German, …, follows a grammar, i.e., a collection of rules which defines the set of well-formed sentences. In this section we will see how we can make this notion mathematically precise. Even though linguistics has been an important root for the development of automata theory, the types of grammars we will see here are not sufficiently complex to model natural languages. They are, however, sufficiently complex to model most programming languages. Consequently, this mathematical theory has a wealth of applications in areas of computer science such as compiler construction, the theory of programming languages, etc.

Example 1.11. The set of arithmetical expressions (such as, for example, 5 · (12 + 7)) can be defined by the following rules:

E → Z | E + E | E · E | (E)
Z → D | DZ
D → 0 | 1 | ··· | 9

In order to make this idea precise, we define:

Definition 1.11. A context-free grammar (CFG) is a tuple ⟨N, T, P, S⟩ where N is a finite set of nonterminal symbols, T is a finite set of terminal symbols with T ∩ N = ∅, P ⊆ N × (N ∪ T)* is a set of production rules, and S ∈ N is the start symbol.

The role of the set of terminal symbols T is analogous to that of the alphabet A in the specification of an automaton. Instead of (N, w) we will write N → w for a production rule. This notation is extended to N → w_1 | ··· | w_n as an abbreviation for the set {(N, w_i) | 1 ≤ i ≤ n}.

Example 1.12. The rules given in Example 1.11 are interpreted as the set of production rules of a context-free grammar with N = {E, Z, D}, T = {0, 1, …, 9, +, ·, (, )}, and the start symbol E.


Definition 1.12. Let G = ⟨N, T, P, S⟩ be a CFG. For all w, w′ ∈ (N ∪ T)* we define the one-step derivation relation as: w ⇒_G w′ iff w = w_1Aw_2, w′ = w_1vw_2, and A → v ∈ P. We define the derivation relation ⇒*_G as the reflexive and transitive closure of ⇒_G.

The language generated by G is L(G) = {w ∈ T* | S ⇒*_G w}.

Definition 1.13. A language L ⊆ A* is called context-free if there is a CFG G s.t. L(G) = L.

Definition 1.14. A CFG G = ⟨N, T, P, S⟩ is called right-linear if every production is of one of the following forms:

1. A → x for x ∈ T,

2. A → xB for x ∈ T and B ∈ N, or

3. A → ε.

Example 1.13. Let N = {A, B}, T = {a, b},

P = { A → aA | aB,
      B → bB | b },

and G = ⟨N, T, P, A⟩. Then G is right-linear and L(G) = {a^i b^j | i, j ≥ 1}.
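Derivations in the sense of Definition 1.12 can be enumerated mechanically. The following Python sketch is our own illustration: it assumes upper-case letters are exactly the nonterminals, expands the leftmost nonterminal breadth-first, and collects the terminal words up to a length bound. (The pruning bound assumes productions never shorten a sentential form, which holds for the right-linear grammar of Example 1.13; grammars with ε-productions would need a different bound.)

```python
from collections import deque

def generate(productions, start, max_len):
    """Collect all terminal words of length <= max_len derivable from
    the start symbol, expanding the leftmost nonterminal first."""
    words, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if len(form) > max_len + 1:          # too long to shrink below the bound
            continue
        nts = [i for i, s in enumerate(form) if s.isupper()]
        if not nts:                          # terminal word reached
            if len(form) <= max_len:
                words.add(form)
            continue
        i = nts[0]
        for lhs, rhs in productions:
            if lhs == form[i]:
                new = form[:i] + rhs + form[i + 1:]
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
    return words

# Right-linear grammar of Example 1.13: A -> aA | aB, B -> bB | b
P = [("A", "aA"), ("A", "aB"), ("B", "bB"), ("B", "b")]
assert generate(P, "A", 3) == {"ab", "aab", "abb"}
```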

Theorem 1.5. A language L ⊆ A* is regular iff there is a right-linear grammar G s.t. L(G) = L.

Proof. From left to right: Let N = ⟨Q, A, Δ, q0, F⟩ be an NFA for L. Define the grammar G = ⟨N′, T, P, S⟩ by N′ = Q, T = A,

P = {q → ap | (q, a, p) ∈ Δ} ∪ {q → ε | q ∈ F},

and S = q0. Then it is easy to show that L(G) = L(N).

From right to left: Let G = ⟨N, T, P, S⟩ be a right-linear grammar. Define an ε-NFA N = ⟨Q, A, Δ, q0, F⟩ by Q = N ∪ {q_f}, A = T,

Δ = {(B, x, C) | B → xC ∈ P} ∪ {(B, x, q_f) | B → x ∈ P} ∪ {(B, ε, q_f) | B → ε ∈ P},

q0 = S, and F = {q_f}. Then it is again easy to show that L(N) = L(G).

The above proof shows clearly that the nonterminals of a right-linear grammar correspond closely to the states of a nondeterministic automaton. The above result directly entails that every regular language is context-free. The other direction is not true.

Example 1.14. In Example 1.10 we have shown that L = {a^k b^k | k ≥ 0} is not regular. However, L is generated by the following context-free grammar:

S → aSb | ε

There is also a pumping lemma for context-free languages. Since this is a larger class of languages, its pumping lemma has a more complicated structure.

Lemma 1.2 (Pumping lemma for context-free languages). Let L be a context-free language. Then there is an n ∈ ℕ s.t. every w ∈ L with |w| ≥ n can be written as w = v_1v_2v_3v_4v_5 s.t.

1. |v_2v_3v_4| ≤ n,


2. v_2v_4 ≠ ε, and

3. for all k ≥ 0 also v_1v_2^k v_3v_4^k v_5 ∈ L.

Without proof.

Example 1.15. L = {a^k b^k c^k | k ≥ 0} is not context-free. Obtain n for L from the pumping lemma for context-free languages and let w = a^n b^n c^n. Then w can be written as w = v_1v_2v_3v_4v_5 where |v_2v_3v_4| ≤ n. Therefore v_2v_3v_4 cannot contain both a and c. Assume that v_2v_3v_4 does not contain a; then, by the pumping lemma, w′ = v_1v_2^2 v_3v_4^2 v_5 ∈ L where w′ = a^n b^{n_1} c^{n_2} with n_1, n_2 ≥ n and n_1 > n or n_2 > n. Contradiction. The case where v_2v_3v_4 does not contain c is analogous.

Theorem 1.6. The class of context-free languages has the following closure properties:

1. Every regular language is context-free.

2. If L_1 and L_2 are context-free, then L_1 ∪ L_2 is context-free.

3. If L_1 and L_2 are context-free, then L_1 · L_2 is context-free.

4. If L is context-free, then L* is context-free.

5. The context-free languages are not closed under intersection.

6. The context-free languages are not closed under complement.

Proof. 1. Follows directly from Theorem 1.5.

2. Let G_1 = ⟨N_1, T_1, P_1, S_1⟩ and G_2 = ⟨N_2, T_2, P_2, S_2⟩ be CFGs for L_1 and L_2. W.l.o.g. we assume that N_1 ∩ N_2 = ∅. Then

G = ⟨N_1 ∪ N_2 ∪ {S}, T_1 ∪ T_2, P_1 ∪ P_2 ∪ {S → S_1 | S_2}, S⟩

generates L_1 ∪ L_2.

3. Let G_1 = ⟨N_1, T_1, P_1, S_1⟩ and G_2 = ⟨N_2, T_2, P_2, S_2⟩ be CFGs for L_1 and L_2. W.l.o.g. we again assume that N_1 ∩ N_2 = ∅. Then

G = ⟨N_1 ∪ N_2 ∪ {S}, T_1 ∪ T_2, P_1 ∪ P_2 ∪ {S → S_1S_2}, S⟩

generates L_1 · L_2.

4. Let G = ⟨N, T, P, S⟩ be a CFG that generates L. Then

G′ = ⟨N ∪ {S′}, T, P ∪ {S′ → SS′ | ε}, S′⟩

generates L*.

5. Let L_1 = {a^n b^n c^k | k, n ≥ 0} and L_2 = {a^n b^k c^k | n, k ≥ 0}. We have already seen in Example 1.14 that L_1′ = {a^n b^n | n ≥ 0} is context-free. The language L_2′ = {c}* is regular and thus, by 1., also context-free. Since L_1 = L_1′ · L_2′, also L_1 is context-free by 3. Analogously we show that L_2 is context-free. But now we have

L_1 ∩ L_2 = {a^k b^l c^m | k, l, m ≥ 0, k = l, l = m} = {a^k b^k c^k | k ≥ 0},

which we have shown in Example 1.15 to be not context-free.

12

Page 17: Theoretical Computer Science - dmg.tuwien.ac.atdmg.tuwien.ac.at/hetzl/teaching/tcs_2018.pdf · linguistics, it is nowadays primarily applied in computer science. The topic of computational

6. Suppose the context-free languages were closed under complement. Then, since L_1 ∩ L_2 = (L_1^c ∪ L_2^c)^c, they would also be closed under intersection. Contradiction.

There is also a presentation of the context-free languages by automata; the corresponding class of automata are called pushdown automata (PDAs). A PDA has, in addition to a finite number of states, also a stack over a finite alphabet as memory. On a stack there are only two operations: pushing a new element on top of the stack and popping the topmost element from the stack. Thus, a stack is a form of memory which is unbounded but has strongly restricted access. We will not treat these automata here; details can be found, e.g., in [1] and [4].


Chapter 2

Computability theory

A central root of computability theory (or recursion theory, as it is also often called) is the so-called¹ "decision problem" posed by D. Hilbert in 1928. This question is, even in the English literature, often still referred to by its original German name "Entscheidungsproblem". In contemporary terminology it is the following question: is there an algorithm which, given a formula ϕ in first-order predicate logic, determines whether ϕ is valid? Such an algorithm would be very useful, since almost² every mathematical problem can be posed as a formula in first-order predicate logic.

Hilbert's decision problem was solved negatively in 1936, independently by A. Turing and A. Church: there is no such algorithm. In a question like this we can again recognise an asymmetry that we see often in this course: in order to show that such an algorithm exists, it suffices to present one (and to show that it actually solves the problem). However, in order to show that no such algorithm exists we have to give a much more involved argument; it is necessary to come up with a general mathematical model of the intuitive notion of an "algorithm" and then to show that, in that model, there is no solution to the decision problem.

In this chapter we will see two such models; we will discuss their relationship and see the most basic notions, results, and proof techniques of computability theory.

2.1 Partial recursive functions

One approach to defining the set of functions which are computable (in the intuitive sense) is to start "from below": define some functions which are obviously computable, then define closure operators which transform computable functions into computable functions.

Definition 2.1. The basic functions are:

1. the constant (nullary function) 0 ∈ ℕ,

2. the successor function s : ℕ → ℕ, x ↦ x + 1,

3. for all k ≥ 1, 1 ≤ i ≤ k, the projection function P^k_i : ℕ^k → ℕ, (x_1, …, x_k) ↦ x_i.

All of the basic functions are obviously computable.

¹ Nowadays, the term "decision problem" has a more general meaning.
² It would go beyond the scope of this course to elaborate on what "almost" means here; a course on mathematical logic is better suited for discussing this topic.


Definition 2.2. Let f : ℕ^n → ℕ and g_1 : ℕ^k → ℕ, …, g_n : ℕ^k → ℕ. Then the function obtained by composition of f with g_1, …, g_n is

h : ℕ^k → ℕ, x ↦ f(g_1(x), …, g_n(x)).

If f, g_1, …, g_n are computable, then so is h: in order to compute h, we first compute y_i = g_i(x) for i = 1, …, n, which is possible by assumption, and then we compute f(y_1, …, y_n), which is, again, possible by assumption.

Definition 2.3. Let f : ℕ^k → ℕ and g : ℕ^{k+2} → ℕ. Then the function obtained by primitive recursion from f and g is h : ℕ^{k+1} → ℕ defined by

h(x, 0) = f(x), and
h(x, y + 1) = g(x, y, h(x, y)).

If f and g are computable, then so is h. We argue, informally, by induction: let x ∈ ℕ^k, y ∈ ℕ. If y = 0 then, by assumption, f(x) can be computed and thus h(x, 0) can be. If y > 0, say y = y′ + 1, we can compute z = h(x, y′) by induction hypothesis and then we can compute g(x, y′, z) by assumption.

Definition 2.4. A function f : ℕ^k → ℕ is called primitive recursive if it can be obtained from the basic functions by a finite number of applications of the operators composition and primitive recursion.

Example 2.1. Consider the functions f = P^1_1 : ℕ → ℕ and g : ℕ^3 → ℕ, (x, y, z) ↦ z + 1. Then g = s ∘ P^3_3. By primitive recursion from f and g we obtain the function h : ℕ^2 → ℕ defined by

h(x, 0) = P^1_1(x) = x, and
h(x, y + 1) = g(x, y, h(x, y)) = h(x, y) + 1.

In other words, h is the addition of natural numbers, which is hence primitive recursive.
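The operators of Definitions 2.2 and 2.3 can be written as higher-order Python functions; addition is then obtained exactly as in Example 2.1. A sketch (the names compose, primrec, and the concrete encodings are our own):

```python
def compose(f, *gs):
    """Composition (Definition 2.2): x -> f(g1(x), ..., gn(x))."""
    return lambda *x: f(*(g(*x) for g in gs))

def primrec(f, g):
    """Primitive recursion (Definition 2.3):
    h(x, 0) = f(x),  h(x, y+1) = g(x, y, h(x, y))."""
    def h(*args):
        *x, y = args
        acc = f(*x)
        for i in range(y):
            acc = g(*x, i, acc)
        return acc
    return h

# Basic functions (Definition 2.1)
zero = lambda: 0
s = lambda x: x + 1
def P(k, i):                  # projection P^k_i
    return lambda *x: x[i - 1]

# Example 2.1: addition by primitive recursion from P^1_1 and s o P^3_3
add = primrec(P(1, 1), compose(s, P(3, 3)))
# Multiplication by primitive recursion on addition
mult = primrec(compose(zero), compose(add, P(3, 1), P(3, 3)))

assert add(3, 4) == 7 and mult(3, 4) == 12
```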

In a similar way one can define multiplication by primitive recursion on addition, exponentiation by primitive recursion on multiplication, and many more functions. At this point one may start to wonder: are these all computable functions, or did we miss some? The answer is not obvious. In fact, it turns out that there are computable functions which are not primitive recursive. We will analyse a concrete example in the following.

Definition 2.5. The Ackermann function³ is a : N^2 → N, (p, n) ↦ a_p(n), defined by

a_0(n) = n + 1,

a_{p+1}(0) = a_p(1), and

a_{p+1}(n+1) = a_p(a_{p+1}(n)).

We choose the notation a_p(n) instead of a(p, n) because it will be useful to think of the a_p as functions from N to N.

Lemma 2.1. For all p, n ∈ N we have a_{p+1}(n) = a_p^{n+1}(1).

Proof. Induction on n: a_{p+1}(0) = a_p(1), and a_{p+1}(n+1) = a_p(a_{p+1}(n)) =_{IH} a_p(a_p^{n+1}(1)) = a_p^{n+2}(1).

³Named after Wilhelm Ackermann (1896–1962).


The Ackermann function is computable. We argue by induction on p ∈ N. If p = 0, then a_p is the successor function, which is clearly computable. For p + 1 we compute a_{p+1}(n) by iterating a_p a total of n + 1 times on 1.

Example 2.2. a_0 is the successor function by definition, a_1(n) = a_0^{n+1}(1) = n + 2, and a_2(n) = a_1^{n+1}(1) = 1 + (n+1)·2 = 2n + 3.
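The definition translates directly into a short recursive Python sketch (ours, not from the notes), which reproduces the closed forms from Example 2.2:

```python
import sys
sys.setrecursionlimit(100_000)  # the recursion gets deep quickly

def a(p, n):
    """Ackermann function following Definition 2.5."""
    if p == 0:
        return n + 1                  # a_0(n) = n + 1
    if n == 0:
        return a(p - 1, 1)            # a_{p+1}(0) = a_p(1)
    return a(p - 1, a(p, n - 1))      # a_{p+1}(n+1) = a_p(a_{p+1}(n))

print(a(1, 5), a(2, 5), a(3, 3))  # 7 13 61
```

Already a(4, 2) has almost 20,000 decimal digits, which gives a first impression of the growth rate discussed below.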

We will now show that a is not primitive recursive. The key to this result is to show that the Ackermann function grows faster than any primitive recursive function. We first collect a few basic properties of the Ackermann function in the following lemma:

Lemma 2.2. For all m, n, p, q ∈ N we have:

1. a_p(n) > n,

2. n < m implies a_p(n) < a_p(m),

3. p < q implies a_p(n) < a_q(n),

4. a_p(a_q(n)) ≤ a_{max{p, q−1}+2}(n).

Proof. For 1. we proceed by induction on p. It is clear for p = 0; for p + 1 note that, by induction hypothesis, 1 < a_p(1) < a_p^2(1) < ⋯ < a_p^{n+1}(1) and hence a_{p+1}(n) = a_p^{n+1}(1) > n + 1 > n.

For 2. it suffices to show a_p(n) < a_p(n+1). This is clear for p = 0; for p + 1 we have a_{p+1}(n+1) = a_p(a_{p+1}(n)) > a_{p+1}(n) by 1.

For 3. it suffices to show a_p(n) < a_{p+1}(n). As before we have a_p^n(1) > n and hence a_{p+1}(n) = a_p(a_p^n(1)) > a_p(n) by 2.

For 4. let r = max{p, q−1}. Note that r ≥ p and r + 1 ≥ q. We have

a_p(a_q(n)) ≤ a_r(a_{r+1}(n)) = a_r(a_r^{n+1}(1)) = a_r^{n+2}(1),

a_{r+2}(n) = a_{r+1}^{n+1}(1) = a_{r+1}^n(a_{r+1}(1)) = a_{r+1}^n(a_r^2(1)),

and a_r^{n+2}(1) = a_r^n(a_r^2(1)) ≤ a_{r+1}^n(a_r^2(1)) by 2. and 3.

We have a_3(n) > 2^n for all n ∈ N, a_4(n) is greater than an n-fold iterated exponential, and so on. In fact, we obtain:

Lemma 2.3. Let h : N^m → N be primitive recursive. Then there is a p ∈ N s.t. for all x ∈ N^m: h(x) < a_p(max{x}).

Proof. Since h is primitive recursive, it is obtained by a finite number of compositions and primitive recursions. We proceed by induction on this number. For the basic functions note that 0 < a_0(0) = 1, that s(n) = n + 1 < n + 2 = a_1(n), and that P_i^m(x) = x_i < x_i + 1 ≤ a_0(max{x}).

For the case of composition, let f : N^n → N, g_1, …, g_n : N^k → N, and h : N^k → N, x ↦ f(g_1(x), …, g_n(x)). By induction hypothesis there is a p ∈ N s.t. f(y) < a_p(max{y}) for all y ∈ N^n, and there are q_1, …, q_n ∈ N s.t. g_i(x) < a_{q_i}(max{x}) for all x ∈ N^k and all i ∈ {1, …, n}. Now we have

h(x) = f(g_1(x), …, g_n(x)) < a_p(max{g_1(x), …, g_n(x)})


and, by monotonicity of max and a_p,

< a_p(max{a_{q_1}(max{x}), …, a_{q_n}(max{x})})

and, letting q = max{q_1, …, q_n}, by monotonicity of a,

= a_p(a_q(max{x}))

and, by Lemma 2.2/4,

≤ a_{max{p, q−1}+2}(max{x}).

For the case of primitive recursion, let f : N^k → N, g : N^{k+2} → N, x ∈ N^k, y ∈ N. Then we have h(x, 0) = f(x) and h(x, y+1) = g(x, y, h(x, y)). From the induction hypothesis we obtain a p ∈ N s.t. f(x) < a_p(max{x}) for all x ∈ N^k and a q ∈ N s.t. g(x, y, z) < a_q(max{x, y, z}) for all x ∈ N^k and y, z ∈ N. Let r = max{p, q} + 1. At first, we show that

h(x, y) < a_r(max{x} + y)

by induction on y. For the induction base we have h(x, 0) = f(x) < a_p(max{x}) < a_r(max{x}). For the induction step we have

h(x, y+1) = g(x, y, h(x, y)) < a_q(max{x, y, h(x, y)}),

and, since h(x, y) < a_r(max{x} + y) by induction hypothesis and also max{x}, y < a_r(max{x} + y),

< a_q(a_r(max{x} + y)) ≤ a_{r−1}(a_r(max{x} + y)) = a_r(max{x} + y + 1).

In order to obtain a majorisation in terms of the maximum instead of the sum, observe that

h(x, y) < a_r(max{x} + y) ≤ a_r(2·max{x, y}) < a_r(a_2(max{x, y})) ≤ a_{max{r,1}+2}(max{x, y}).

Theorem 2.1. The Ackermann function a : N^2 → N is not primitive recursive.

Proof. Suppose a is primitive recursive. Then, by Lemma 2.3, there is a p ∈ N s.t. for all x_1, x_2 ∈ N: a(x_1, x_2) < a_p(max{x_1, x_2}). But then we have

a(p, p) < a_p(max{p, p}) = a_p(p) = a(p, p),

a contradiction.

So the primitive recursive functions are not sufficient as a model of the intuitive notion of computability. But what did we miss? Note one important feature of primitive recursion: when we start the computation of h(x, y), where h is obtained by primitive recursion, we already know how often h will call itself, namely y times. All programming languages have constructs that allow starting a recursion or an iteration without knowing in advance how often it will be repeated; instead, a condition decides when to terminate the recursion or iteration (e.g. while- or repeat … until-loops). However, with such constructs there is no guarantee that the condition will eventually be met: the computation may not terminate. In case of non-termination the value of the function that is computed is not defined. Consequently we define:


Definition 2.6. A partial function from N^n to N, in symbols f : N^n ⇀ N, is a function f : D → N for some D ⊆ N^n.

For x ∈ N^n \ D we also say that f(x) is not defined. The definitions of composition and primitive recursion generalise naturally to partial functions (where a result of a function is only defined if all results required for computing it by the respective operator are defined).

Definition 2.7. Let f : N^{n+1} ⇀ N. The function obtained from minimisation of f is g : N^n ⇀ N where g(x) = y if f(x, y) = 0 and, for all y' < y, f(x, y') is defined and f(x, y') ≠ 0. If there is no such y, then g(x) is undefined.

If f is computable, then so is g: we compute g by computing f(x, 0), f(x, 1), … until we find a y with f(x, y) = 0. If one of the computations f(x, y') does not terminate, then the computation of g does not terminate. If all computations f(x, y') terminate but none of them yields 0, then the computation of g does not terminate either.
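The minimisation operator is an unbounded search, which the following Python sketch (ours; the example function is hypothetical) makes explicit. Unlike composition and primitive recursion, the loop has no a-priori bound and may run forever:

```python
def minimise(f):
    # g(x) = least y with f(x, y) = 0, provided f(x, y') is defined
    # and nonzero for all y' < y; loops forever if no such y exists.
    def g(*x):
        y = 0
        while f(*x, y) != 0:
            y += 1
        return y
    return g

# Example: square root of perfect squares, as the least y with
# y*y - n = 0; g is undefined on inputs that are not squares.
isqrt = minimise(lambda n, y: y * y - n)
print(isqrt(49))  # 7
```

Calling `isqrt(5)` would never return, which is exactly the partiality the definition below accounts for.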

Definition 2.8. A partial function f : N^n ⇀ N is called partial recursive if it can be obtained from the basic functions by a finite number of applications of the operators composition, primitive recursion, and minimisation.

The Ackermann function is partial recursive (but we refrain from showing this here). Now we are in a similar situation as after the definition of the primitive recursive functions: we have defined a class of functions all of which are computable in the intuitive sense. But how can we be sure that we did not miss anything this time? We can gain confidence in this class of functions by arriving at the same class from a very different direction: just as the regular languages turned out to be a quite robust class permitting many different characterisations, so will the partial recursive functions.

2.2 Turing machines and the Church-Turing thesis

Another model of computation is the Turing machine. The basic idea of a Turing machine is that a finite automaton controls the computation and may make unrestricted use of an unbounded memory. This unbounded memory is modelled by an infinite tape, its unrestricted use by the ability to both read and write on that tape. In order to carry out read- and write-operations there is a cursor which, at every point in time, is positioned on a particular cell of the tape. Each cell contains one of the symbols 0, 1, ␣, and ▷. We use 0 and 1 for encoding binary strings, ␣ represents the blank (space) character, and ▷ marks the beginning of the tape. All tapes we consider contain ␣ in all but finitely many cells. At each step of the computation the cursor may either be moved one cell to the left, one cell to the right, or remain where it is. For indicating these movements of the cursor we use the symbols ←, → and −.

Definition 2.9. A Turing machine is a tuple M = ⟨Q, δ, q_0⟩ where q_0 ∈ Q is the starting state and

δ : Q × {0, 1, ␣, ▷} → (Q ∪ {yes, no, done}) × {0, 1, ␣, ▷} × {←, →, −}

is the transition function. We require: if δ(q, ▷) = (q', s, d) for a q' ∈ Q, then s = ▷ and d ≠ ←.

The states yes, no and done are final states, which explains why they do not serve as input of the transition function. The side condition on δ merely ensures that the cursor cannot "fall off" the tape on the left.


Definition 2.10. Let M = ⟨Q, δ, q_0⟩ be a Turing machine. A configuration of M is a tuple (q, u, v) where q ∈ Q ∪ {yes, no, done} and u, v ∈ {0, 1, ␣, ▷}*.

The configuration (q, u, v) is interpreted as follows: q is the current state, u is the contents of the tape to the left of and including the cursor, and v is the contents of the tape to the right of the cursor up to the point from which on there are only blanks.

We fix the following convention for representing a tuple x = (x_1, …, x_n) ∈ N^n on a tape: for i = 1, …, n let x_{i,1} ⋯ x_{i,k_i} ∈ {0, 1}^{k_i} be the binary representation of x_i. Then we identify x with the word x_{1,1} ⋯ x_{1,k_1} ␣ ⋯ ␣ x_{n,1} ⋯ x_{n,k_n}.

Definition 2.11. Let x ∈ N^n. The initial configuration for x is (q_0, ▷, x).

As a picture, the initial configuration for x, with the cursor on ▷, is:

▷ x_{1,1} ⋯ x_{1,k_1} ␣ ⋯ ␣ x_{n,1} ⋯ x_{n,k_n} ␣ ␣ …
↑

Definition 2.12. Let M = ⟨Q, δ, q_0⟩ be a Turing machine. We write (q, u, v) →_M (q', u', v') if the configuration (q', u', v') is obtained in one step from the configuration (q, u, v). For k ≥ 0 we write (q, u, v) →_M^k (q', u', v') if the configuration (q', u', v') is obtained in k steps from the configuration (q, u, v). We write (q, u, v) →_M^* (q', u', v') if there is a k ∈ N s.t. (q, u, v) →_M^k (q', u', v').

Note that the Turing machines we have considered so far are deterministic in the sense that, for a Turing machine M and an input x ∈ N^n, there is exactly one →_M-path which starts in the initial configuration (q_0, ▷, x). We say that a Turing machine M terminates on the input x ∈ N^n if this path is finite. In contrast to a finite automaton, a Turing machine may fail to terminate on some inputs.

Definition 2.13. Let L ⊆ {0, 1}* and M = ⟨Q, δ, q_0⟩ be a Turing machine. We say that M decides L if for all x ∈ {0, 1}*:

• If x ∈ L then there are u, v s.t. (q_0, ▷, x) →_M^* (yes, u, v).

• If x ∉ L then there are u, v s.t. (q_0, ▷, x) →_M^* (no, u, v).

In particular, if M decides a language L, then M terminates on all inputs x ∈ {0, 1}*. One can give an analogous definition for languages of tuples L ⊆ ({0, 1}*)^n and for tuples x ∈ N^n.

Definition 2.14. A language L ⊆ {0, 1}* is called Turing-decidable if there is a Turing machine M which decides L.


Example 2.3. The following Turing machine decides whether the input word x ∈ {0, 1}* is a palindrome.

[The state diagram could not be reproduced from the original. Roughly, the machine reads and erases the first remaining symbol, remembers it in its state (q_1^0 or q_1^1), moves right to the last remaining symbol, compares and erases it, and returns to the left; on a mismatch it halts in no, and when no symbols remain it halts in yes.]
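Definitions 2.9 to 2.13 translate almost literally into a small simulator. The following Python sketch uses representation choices of our own (a list as tape, a dictionary as δ); the example machine is not the palindrome machine above but a simpler one deciding whether the input contains a 1:

```python
def run(delta, x, q0="q0", blank="_", start=">"):
    """Simulate a deterministic Turing machine on input x.

    delta maps (state, symbol) -> (state', symbol', move) with
    move in {"L", "R", "-"}. Returns the final state reached.
    """
    tape = [start] + list(x)
    pos, q = 0, q0
    while q not in ("yes", "no", "done"):
        if pos == len(tape):            # extend the tape with blanks on demand
            tape.append(blank)
        q, tape[pos], move = delta[(q, tape[pos])]
        pos += {"L": -1, "R": 1, "-": 0}[move]
    return q

# A machine deciding L = { x in {0,1}* | x contains a 1 }.
delta = {
    ("q0", ">"): ("q0", ">", "R"),
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("yes", "1", "-"),
    ("q0", "_"): ("no", "_", "-"),
}
print(run(delta, "0010"), run(delta, "000"))  # yes no
```

The side condition of Definition 2.9 is respected by the transition table (on ▷ the machine rewrites ▷ and moves right), so the cursor never falls off the left end.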

We also want to use Turing machines to compute functions. Define N^{<ω} := ⋃_{k≥1} N^k.

Definition 2.15. Let M be a Turing machine. M induces the partial function M : N^{<ω} ⇀ N^{<ω} where

M(x) = y if (q_0, ▷, x) →_M^* (done, ▷, y), and M(x) is undefined otherwise.

Definition 2.16. A partial function f : N^n ⇀ N is Turing-computable if there is a Turing machine M s.t. M↾N^n = f, i.e., for all x ∈ N^n: f(x) is defined iff M(x) is defined, and in this case f(x) = M(x).

Theorem 2.2. Every partial recursive function is Turing-computable.

Proof sketch. If f is partial recursive, it is obtained from the basic functions by a finite number of applications of the operators composition, primitive recursion, and minimisation. We proceed by induction on that number. For the base case we construct, for every basic function f, a Turing machine that computes f. For the step case we make a case distinction on the last operator applied. Under the assumption that the functions used to define f can be computed by Turing machines, we construct a new Turing machine computing f. These newly constructed machines follow the informal algorithms given above when arguing that the operators preserve intuitive computability.

Theorem 2.3. Every Turing-computable function is partial recursive.

Proof sketch. Let M = ⟨Q, δ, q_0⟩ be a Turing machine. There is a primitive recursive encoding of tuples from N^{<ω} in N, e.g., by mapping (x_1, …, x_n) to ∏_{i=1}^n p_i^{x_i} where p_i is the i-th prime number. We can use this to obtain a primitive recursive encoding of a configuration (q, u, v) of M as a natural number, e.g., by identifying Q with {1, …, |Q|}, the tape alphabet {0, 1, ␣, ▷} with {1, 2, 3, 4}, and by encoding q, u, and v as a triple. Then a single step of M is merely a large case distinction and thus primitive recursive. Using minimisation we find a k s.t. →_M^k leads to a configuration with a final state, if such a k exists. If not, M does not terminate and the partial recursive function we have constructed is not defined.
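The prime-power encoding used in this proof sketch can be made concrete. A minimal Python illustration (helper names ours; trial division suffices for a sketch):

```python
def next_prime(p):
    """Smallest prime greater than p (naive trial division)."""
    q = p + 1
    while any(q % d == 0 for d in range(2, int(q ** 0.5) + 1)):
        q += 1
    return q

def encode(xs):
    """Encode (x_1, ..., x_n) as prod_i p_i^{x_i}, p_i the i-th prime."""
    code, p = 1, 2
    for x in xs:
        code *= p ** x
        p = next_prime(p)
    return code

def decode(code, n):
    """Recover an n-tuple from its code by counting prime factors."""
    xs, p = [], 2
    for _ in range(n):
        e = 0
        while code % p == 0:
            code //= p
            e += 1
        xs.append(e)
        p = next_prime(p)
    return tuple(xs)

print(encode((3, 1, 2)), decode(600, 3))  # 600 (3, 1, 2)
```

Both directions use only bounded searches and are primitive recursive; the single unbounded search in the proof is the minimisation over the step count k.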

A noteworthy feature of the above proof is that it uses minimisation only once; this leads to Kleene's normal form theorem, but we do not pursue this direction here. Instead we observe:


Corollary 2.1. A partial function is Turing-computable iff it is partial recursive.

We have thus seen two very different formalisms which happen to describe the same class of functions. In fact, there is a large number of other formalisms, all of which describe the class of partial recursive functions. For example, the λ-calculus by A. Church is another such formalism, very similar to a functional programming language; register machines are yet another, a bit more similar to contemporary computers than Turing machines. Much as in the case of the regular languages, this is an indication that we are dealing with an important class.

This situation has led to the Church-Turing thesis.

Church-Turing thesis

A partial function is computable iff it is Turing-computable.

Of course, each of the formalisms considered above gives rise to an equivalent statement of the thesis, e.g.: a partial function is computable iff it is partial recursive. We speak about a thesis here and not about a theorem because the statement is not a mathematical statement. The notion "computable" is not a mathematical notion but refers to our human intuition of an algorithm. In contrast, the notions "Turing-computable", "partial recursive", etc. are mathematical notions. The equivalence of all these different formalisms justifies our faith in having successfully characterised the intuitive notion of computability by (any of) these formalisms. Justified by the Church-Turing thesis, we will henceforth speak only about computable (instead of Turing-computable) functions and decidable (instead of Turing-decidable) relations.

2.3 Undecidability

Theorem 2.4. There are undecidable languages.

Proof. There are uncountably many languages but only countably many Turing machines.

This proof is of course not very satisfying because it does not provide a concrete example of anundecidable language. In the following, we will define a concrete example.

Definition 2.17. Let TM_e be the e-th Turing machine in an arbitrary but fixed bijection between N and the set of Turing machines. The halting problem is the following binary relation:

H = {(e, x) ∈ N × N | TM_e terminates on input x}.

Theorem 2.5. The halting problem is undecidable.

Proof. Define f : N ⇀ N by

f(n) = 0 if (n, n) ∉ H, and f(n) undefined otherwise.

Suppose there is a Turing machine that decides H; then there is also a Turing machine M that computes f. We could construct M explicitly by formalising the computation implicit in the above definition of f; we skip this technical part of the proof.


Let e be the code of M. Then we have

(e, e) ∈ H ⟺ f(e) is undefined (Def. f) ⟺ M does not terminate on input e (Def. M) ⟺ (e, e) ∉ H (Def. H),

which is a contradiction.

Theorem 2.6. There is a universal Turing machine, i.e., a Turing machine U s.t.

U(e, x) = TM_e(x) for all (e, x) ∈ N × N.

Proof sketch. U is just an interpreter (in the sense of computer science). In our context, this means that U considers the first part e of the input as the code of a Turing machine and x as input for the Turing machine TM_e. The Turing machine U then simulates the Turing machine TM_e; in particular, an encoding of the current configuration of TM_e is stored on the tape of U. To actually construct such a machine U is not very difficult but technically involved, so we leave out the details of this proof.

Note that the above proof requires that the bijection e ↦ TM_e is chosen in a "computable" way, i.e., s.t. the Turing machine TM_e can actually be simulated if e is given. This does not present an obstacle since it is the case for every natural definition of e ↦ TM_e. In contrast, this property is not necessary for showing the undecidability of the halting problem.

Definition 2.18. Let L ⊆ {0, 1}* and M = ⟨Q, δ, q_0⟩ be a Turing machine. We say that M accepts L if for all x ∈ {0, 1}*:

• If x ∈ L then there are u, v s.t. (q_0, ▷, x) →_M^* (yes, u, v).

• If x ∉ L then M does not terminate on input x.

A language L is called semi-decidable if there is a Turing machine M which accepts L.

Theorem 2.7. The halting problem is semi-decidable.

Proof. We start with the universal Turing machine U and modify it to obtain a Turing machine U' that terminates in yes iff U terminates (in one of yes, no, or done). Then U' accepts the halting problem.

Every decidable problem is semi-decidable. A problem L ⊆ {0, 1}* is decidable iff both L and the complement of L are semi-decidable. There are problems which are not semi-decidable, e.g., the complement of the halting problem.

Let R ⊆ N × N. We can define a set S ⊆ N via existential quantification as follows:

x ∈ S ⟺ there is a y s.t. (x, y) ∈ R.

If R is decidable, then S is semi-decidable (we skip the proof of this fact here, but cf. the simulation of Turing machines by partial recursive functions). Moreover, the other direction holds as well: for every semi-decidable set S there is a decidable set R s.t. S can be defined from R as above. So existential quantification allows us to give an abstract definition of the semi-decidable sets based on the decidable sets, without having to consider a machine model.
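This characterisation suggests a direct, if naive, semi-decision procedure: search through all y. A Python sketch (ours); the loop returns exactly when x ∈ S, and runs forever otherwise, which is precisely semi-decidability:

```python
def semi_decide(R, x):
    # Accept x iff there is some y with (x, y) in R;
    # loops forever when no such y exists.
    y = 0
    while not R(x, y):
        y += 1
    return True

# Example: R decidable, S = { x | exists y with y*y == x },
# i.e. S is the set of perfect squares.
R = lambda x, y: y * y == x
print(semi_decide(R, 36))  # True (witness y = 6)
# semi_decide(R, 5) would never return.
```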


Chapter 3

Computational complexity theory

3.1 Nondeterministic Turing machines

Definition 3.1. A nondeterministic Turing machine is a tuple M = ⟨Q, Δ, q_0⟩ where q_0 ∈ Q is the initial state and

Δ ⊆ Q × {0, 1, ␣, ▷} × (Q ∪ {yes, no, done}) × {0, 1, ␣, ▷} × {←, →, −}

is the transition relation. We require: if (q, ▷, q', s, d) ∈ Δ, then s = ▷ and d ≠ ←.

The notion of configuration, as well as →_M, →_M^k, and →_M^*, are defined just as for deterministic Turing machines. Note, however, that a given x ∈ N^n no longer induces a unique →_M-path.

Definition 3.2. Let L ⊆ {0, 1}* and M = ⟨Q, Δ, q_0⟩ be a nondeterministic Turing machine. We say that M decides the language L if all →_M-paths are of finite length and for all x ∈ {0, 1}*:

• If x ∈ L then there are u, v s.t. (q_0, ▷, x) →_M^* (yes, u, v).

• If x ∉ L then (q_0, ▷, x) →_M^* (q*, u, v) and q* ∈ {yes, no, done} implies q* = no.

Just as for NFAs, the existence of a single path that ends in a yes-configuration is sufficient to recognise the input as being in the language.

Theorem 3.1. A language L ⊆ {0, 1}* is decidable by a deterministic Turing machine iff L is decidable by a nondeterministic Turing machine.

Proof sketch. A nondeterministic Turing machine can be simulated by a deterministic Turing machine that tries out all nondeterministic choices sequentially. The simulation terminates with yes as soon as it finds one path of the nondeterministic Turing machine that ends in yes. If there is no such path it still terminates, since all paths are finite and end in the state no.
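The simulation idea can be sketched as a breadth-first search over configurations. The following Python fragment is our own illustration (representation choices ours); since Definition 3.2 requires all paths to be finite, the search terminates:

```python
from collections import deque

def nd_run(Delta, x, q0="q0", blank="_", start=">"):
    """Deterministically simulate a nondeterministic TM by exploring
    all computation paths; accept iff some path reaches 'yes'."""
    init = (q0, (start,) + tuple(x), 0)
    queue, seen = deque([init]), {init}
    while queue:
        q, tape, pos = queue.popleft()
        if q == "yes":
            return True
        if q in ("no", "done"):
            continue
        if pos == len(tape):                 # extend tape on demand
            tape = tape + (blank,)
        for q2, s2, move in Delta.get((q, tape[pos]), []):
            t2 = tape[:pos] + (s2,) + tape[pos + 1:]
            nxt = (q2, t2, pos + {"L": -1, "R": 1, "-": 0}[move])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Nondeterministic machine for L = { x | x contains "11" }: on
# reading a 1 it may guess that this 1 starts an occurrence of "11".
Delta = {
    ("q0", ">"): [("q0", ">", "R")],
    ("q0", "0"): [("q0", "0", "R")],
    ("q0", "1"): [("q0", "1", "R"), ("q1", "1", "R")],
    ("q0", "_"): [("no", "_", "-")],
    ("q1", "1"): [("yes", "1", "-")],
    ("q1", "0"): [("no", "0", "-")],
    ("q1", "_"): [("no", "_", "-")],
}
print(nd_run(Delta, "0110"), nd_run(Delta, "0101"))  # True False
```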

Definition 3.3. Let M be a deterministic or nondeterministic Turing machine with initial state q_0 and let f : N → N. We say that M has runtime f if for all x ∈ {0, 1}* and for all configurations C: if (q_0, ▷, x) →_M^k C, then k ≤ f(|x|).

Definition 3.4. A language L ⊆ {0, 1}* is called decidable in deterministic polynomial time if there is a deterministic Turing machine M and a polynomial q : N → N s.t. M decides L in runtime q.


L is called decidable in nondeterministic polynomial time if there is a nondeterministic Turing machine M and a polynomial q : N → N s.t. M decides L in runtime q.

P is the set of all languages which are decidable in deterministic polynomial time, and NP is the set of all languages which are decidable in nondeterministic polynomial time.

Since every deterministic Turing machine is also a nondeterministic Turing machine, it is immediately clear that P ⊆ NP. Whether the other direction, NP ⊆ P, also holds is one of the most difficult open problems in mathematics. Despite being at the centre of attention of theoretical computer science for decades, it has resisted every attempt at a solution so far.

Note that nondeterministic computation is merely a theoretical concept and not a type of computation which can be carried out by an actual computer. The interest in the class NP comes from the fact that a very large number of practically relevant languages lie in NP; we will soon see several examples. In contrast, the interest in the class P is motivated by the fact that polynomial-time deterministic computation is quite close to what a computer can do in practice (at least for polynomials of small degree with constants of reasonable size). So, on a conceptual level, the motivation behind the question P =? NP is to find out whether these many practically relevant languages in NP can be solved exactly and efficiently in practice¹.

We will now consider an important example of nondeterministic computation.

Definition 3.5. Formulas in propositional logic are defined inductively as follows. We start with a countably infinite set {p_i | i ≥ 1} of atoms.

1. Every atom is a formula.

2. If φ, ψ are formulas, then ¬φ, φ ∧ ψ, φ ∨ ψ, and φ → ψ are formulas.

For a formula φ we write L(φ) for the set of atoms which occur in φ. An interpretation of a formula φ is a function I : L(φ) → {0, 1}. We define I(φ) ∈ {0, 1} inductively on the structure of the formula φ based on the usual truth tables (where 0 represents "false" and 1 represents "true"). A formula φ is called satisfiable if there is an interpretation I of φ s.t. I(φ) = 1.

Example 3.1. The formula φ = p_1 ∧ (p_1 → p_2) is satisfiable by the interpretation I = {p_1 ↦ 1, p_2 ↦ 1}. The formula φ ∧ ¬p_2 is unsatisfiable.

Definition 3.6. The Sat-problem

• Instance: a propositional formula φ

• Question: is φ satisfiable?

A decision problem like the above can be represented as a language L ⊆ {0, 1}* using a suitable encoding. We will not carry this out explicitly here. Depending on the context, Sat means either the set of satisfiable propositional formulas or the set of their encodings as binary strings. In computational complexity theory one speaks about "problems" in the sense of the above definition instead of languages.

Theorem 3.2. Sat ∈ NP.

¹Note, however, that it is conceivable that P = NP but the degree (and constants) of all polynomials needed to simulate NP-computations deterministically are so enormous that the result would be entirely without immediate practical consequences.


Proof. In order to show that Sat ∈ NP we briefly sketch how a nondeterministic Turing machine M can check the satisfiability of a formula: M starts with the input formula φ on its tape. It then makes, for every p_i ∈ L(φ), a nondeterministic decision to set p_i to either 1 or 0. Once such a decision has been made, M modifies the tape so that p_i is replaced by 1, or 0 respectively, everywhere on the tape. Thus M makes linearly many nondeterministic decisions, which leave the tape containing a formula φ* that does not contain any atoms anymore. The value of such a formula can be computed in deterministic polynomial time. If it is 1, this computation path of M ends in yes; if it is 0, it ends in no.

The definition of the semi-decidable sets in terms of existential quantification over decidablerelations has an analogue on the level of polynomial time complexity:

Theorem 3.3. S ∈ NP iff there is an R ∈ P and a polynomial q : N → N s.t.

x ∈ S ⟺ there is a y ∈ N with |y| < q(|x|) and (x, y) ∈ R,

where |x| is the length of the binary representation of x.

Proof sketch. Let R ∈ P and let S be defined by: x ∈ S iff there is a y ∈ N with |y| < q(|x|) and (x, y) ∈ R. An NP-Turing machine M for S proceeds on input x by making q(|x|) many nondeterministic choices to fix the value of y and then computes R(x, y) in deterministic polynomial time.

In the other direction, let S ∈ NP and let M be an NP-Turing machine for S. Then, with some technical work, one can define a binary relation R ∈ P where y encodes the nondeterministic choices made by M. Since the runtime of M is polynomial, M can make only polynomially many nondeterministic choices, and thus y is polynomially bounded.

The above result often simplifies proofs that a given problem is in NP: it then suffices to show that a correct solution can be guessed and checked in deterministic polynomial time. In the literature, this kind of proof is often called "guess-and-check". We can, for example, give a more concise proof of Theorem 3.2 by guess-and-check as follows:

Proof of Theorem 3.2. Sat ∈ NP since a propositional formula φ is satisfiable iff there is an interpretation I which satisfies φ. The size of I is bounded (linearly) by that of φ, and the relation "I satisfies φ" is computable in deterministic polynomial time.
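The guess-and-check structure — a polynomially sized witness plus a polynomial-time check — is visible in the following Python sketch for Sat (ours; formulas are encoded as nested tuples, a representation choice not from the notes). It is of course exponential overall, because it deterministically tries every possible guess:

```python
from itertools import product

def atoms(phi):
    """Set of atoms occurring in the formula phi."""
    if isinstance(phi, str):
        return {phi}
    return set().union(*(atoms(sub) for sub in phi[1:]))

def value(phi, I):
    """Evaluate phi under interpretation I; connectives: not, and, or, ->."""
    if isinstance(phi, str):
        return I[phi]
    op = phi[0]
    if op == "not":
        return 1 - value(phi[1], I)
    a, b = value(phi[1], I), value(phi[2], I)
    return {"and": a & b, "or": a | b, "->": (1 - a) | b}[op]

def satisfiable(phi):
    # Try every guess; the inner check is polynomial in |phi|.
    ps = sorted(atoms(phi))
    return any(value(phi, dict(zip(ps, bits))) == 1
               for bits in product((0, 1), repeat=len(ps)))

phi = ("and", "p1", ("->", "p1", "p2"))          # Example 3.1
print(satisfiable(phi))                           # True
print(satisfiable(("and", phi, ("not", "p2"))))   # False
```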

3.2 NP-completeness

Definition 3.7. A language L_1 ⊆ {0, 1}* is reducible to a language L_2 ⊆ {0, 1}* if there is a (deterministic) polynomial-time computable function f : {0, 1}* → {0, 1}* s.t. for all x ∈ {0, 1}*: x ∈ L_1 ⟺ f(x) ∈ L_2. This is written as L_1 ≤_p L_2.

So in case L_1 ≤_p L_2, the language L_2 is at least as difficult as L_1 (up to polynomial-time reducibility), because a Turing machine that solves L_2 can be used as a black box for solving L_1.

Definition 3.8. A language L ⊆ {0, 1}* is called NP-hard if L' ≤_p L for all L' ∈ NP.

L is called NP-complete if L ∈ NP and L is NP-hard.

So, up to polynomial time reducibility, the NP-complete problems are the hardest problems inNP. Therefore, the following result does not come as a surprise:


Lemma 3.1. Let L be NP-complete. Then P = NP iff L ∈ P.

Proof. If P = NP then L ∈ P, since L ∈ NP by assumption. For the other direction, let L ∈ P and L be NP-complete. Then L' ≤_p L for every L' ∈ NP and thus, using the polynomial-time reduction f with x ∈ L' ⟺ f(x) ∈ L, also L' ∈ P.

Based on this lemma, every NP-complete problem L induces a question that is equivalent to P = NP: the question whether there exists a deterministic polynomial-time algorithm for L.

As indicated above, the interest in the P vs. NP question comes from the fact that many practically relevant problems lie in the class NP. In fact, it turns out that many are even NP-complete. The above lemma shows that P =? NP has as many equivalent formulations as there are NP-complete problems: for any one of them, it is equivalent to asking for a deterministic polynomial-time algorithm.

A first, and central, example of an NP-complete problem is Sat.

Theorem 3.4 (Cook 1971). Sat is NP-complete.

Proof. We have already shown that Sat ∈ NP in Theorem 3.2, so it suffices to show that Sat is NP-hard. To that aim, let L ∈ NP and let M = ⟨Q, Δ, q_0⟩ be a nondeterministic Turing machine that decides L in polynomial time P : N → N. Since L is a decision problem, we assume w.l.o.g. that every computation path of M ends in either yes or no; in particular, done is not reachable in M. We define a function φ_M which is computable in deterministic polynomial time and which assigns to every x ∈ {0, 1}* a propositional formula φ_M(x) s.t. φ_M(x) is satisfiable iff M with input x has an accepting path, i.e., iff x ∈ L.

A central observation for this construction is: in time P(|x|) the Turing machine M can only visit the first P(|x|) cells of its tape. Therefore it suffices to simulate M on this finite part of its tape. To that aim we use the following propositional atoms:

• z_{t,q} for t ∈ {1, …, P(|x|)} and q ∈ Q ∪ {yes, no}, where the intended interpretation of z_{t,q} is that M at time t is in state q.

• c_{t,i} for t, i ∈ {1, …, P(|x|)}, where the intended interpretation of c_{t,i} is that the cursor at time t is on the i-th cell of the tape.

• b_{t,i,v} for t, i ∈ {1, …, P(|x|)} and v ∈ {0, 1, ␣, ▷}, where the intended interpretation of b_{t,i,v} is that at time t the i-th cell contains the value v.

The formula will have the following form:

φ_M(x) = A_M(x) ∧ T_M(P(|x|)) ∧ E_M(P(|x|)) ∧ U_M(P(|x|))

where

• the start conditions are

A_M(x) = z_{1,q_0} ∧ c_{1,1} ∧ b_{1,1,▷} ∧ ⋀_{i=1}^{|x|} b_{1,i+1,x_i} ∧ ⋀_{i=|x|+2}^{P(|x|)} b_{1,i,␣}.


• The transition conditions are T_M(n) = T_M^1(n) ∧ T_M^2(n) ∧ T_M^3(n) where

T_M^1(n) = ⋀_{t=1}^{n−1} ⋀_{i=1}^{n} ⋀_{q∈Q} ⋀_{v∈{0,1,␣,▷}} ( (z_{t,q} ∧ c_{t,i} ∧ b_{t,i,v}) → ⋁_{(q',v',d) with (q,v,q',v',d)∈Δ} (z_{t+1,q'} ∧ b_{t+1,i,v'} ∧ c_{t+1,i+d}) ),

where i + d denotes i + 1, i, or i − 1, depending on the value of d. Moreover,

T_M^2(n) = ⋀_{t=1}^{n−1} ⋀_{q∈{yes,no}} ( z_{t,q} → ( z_{t+1,q} ∧ ⋀_{i=1}^{n} (c_{t,i} → c_{t+1,i}) ∧ ⋀_{i=1}^{n} ⋀_{v∈{0,1,␣,▷}} (b_{t,i,v} → b_{t+1,i,v}) ) )

and

T_M^3(n) = ⋀_{t=1}^{n−1} ⋀_{i=1}^{n} ⋀_{v∈{0,1,␣,▷}} ( (¬c_{t,i} ∧ b_{t,i,v}) → b_{t+1,i,v} ).

• For defining the uniqueness conditions, let

U({φ_1, …, φ_n}) = (φ_1 ∨ … ∨ φ_n) ∧ ⋀_{1≤i<j≤n} ¬(φ_i ∧ φ_j).

Note that U({φ_1, …, φ_n}) is true iff exactly one of the φ_i is true. The uniqueness conditions are then defined as

U_M(n) = ⋀_{t=1}^{n} U({z_{t,q} | q ∈ Q ∪ {yes, no}}) ∧ ⋀_{t=1}^{n} U({c_{t,i} | 1 ≤ i ≤ n}) ∧ ⋀_{t=1}^{n} ⋀_{i=1}^{n} U({b_{t,i,v} | v ∈ {0, 1, ␣, ▷}}).

• The final condition is just

E_M(n) = z_{n,yes}.

The formula φ_M(x) corresponds to the definition of the transition relation, the initial configuration, etc. Therefore φ_M(x) is satisfiable iff M has an accepting path on input x.

Moreover, it is easy to verify that the size of φ_M(x) is O(P(|x|)³). Since φ_M(x) can be computed in time linear in its size, φ_M(x) can be computed in deterministic polynomial time.
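To make the encoding concrete, the conjuncts of the start conditions A_M(x) can be generated mechanically. In this sketch the atom representation (tuples), the markers "start" and "blank" for ▷ and ␣, and passing the polynomial P as a Python function are all illustrative assumptions:

```python
def start_conditions(x, P, q0="q0"):
    """Sketch of A_M(x): at time 1 the machine is in state q0, the cursor is on
    cell 1, cell 1 holds the start symbol, cells 2..|x|+1 hold the input x, and
    cells |x|+2..P(|x|) are blank."""
    n = P(len(x))
    conjuncts = [("z", 1, q0), ("c", 1, 1), ("b", 1, 1, "start")]
    # b_{1,i+1,x_i} for i = 1..|x|  (x_i is 1-indexed, so cell i+1 holds x[i-1])
    conjuncts += [("b", 1, i + 1, x[i - 1]) for i in range(1, len(x) + 1)]
    # b_{1,i,blank} for i = |x|+2..P(|x|)
    conjuncts += [("b", 1, i, "blank") for i in range(len(x) + 2, n + 1)]
    return conjuncts
```

For example, with input x = "10" and P(n) = n² this yields six conjuncts: the state, cursor, and start-symbol atoms, two atoms placing "1" and "0" on cells 2 and 3, and one blank atom for cell 4.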

3.3 More NP-complete problems

In fact, Sat can be restricted to formulas in conjunctive normal form (CNF) and remains NP-complete. A propositional formula is in CNF if it has the form ⋀_{i=1}^{n} ⋁_{j=1}^{k_i} L_{i,j} where each L_{i,j} is either an atom or a negated atom. In the literature the Sat problem is also often understood as only considering formulas in CNF as input. The 3Sat-problem is Sat restricted to CNFs where k_i ≤ 3 for all i ∈ {1, ..., n}. Also 3Sat is already NP-complete.

We will now consider a problem which structurally differs quite significantly from Sat but which will turn out to be NP-complete nevertheless.


Definition 3.9. An (undirected) graph is a pair G = (V, E) where V is a set of vertices and E ⊆ {{v_1, v_2} | v_1, v_2 ∈ V} is the set of edges.

Definition 3.10. Let G = (V, E) be a graph. A path in G is a list v_1, ..., v_n of vertices s.t. {v_i, v_{i+1}} ∈ E for all i ∈ {1, ..., n − 1}.

A Hamiltonian path in G is a path that contains every vertex v ∈ V exactly once.

Definition 3.11. The Hamiltonian Path-problem

• Instance: a finite graph G

• Question: does G have a Hamiltonian path?

Example 3.2. [Figure: two graphs, G_1 and G_2, each on four vertices; the edge drawings were lost in extraction.]

The graph G_1 has a Hamiltonian path, the graph G_2 does not.

Theorem 3.5. Hamiltonian Path is NP-complete.

Proof. Hamiltonian Path ∈ NP can be shown by a guess-and-check argument: first guess a permutation of the vertices. Given such a permutation it is possible to determine in deterministic polynomial time whether it is a path.
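The guess-and-check argument can be turned into a brute-force (deterministic, exponential-time) decision procedure by trying all permutations; the names below are illustrative:

```python
from itertools import permutations

def has_hamiltonian_path(vertices, edges):
    # A permutation of the vertices is a Hamiltonian path iff every pair of
    # consecutive vertices is joined by an edge -- the polynomial-time check.
    edge_set = {frozenset(e) for e in edges}
    return any(
        all(frozenset((p[i], p[i + 1])) in edge_set for i in range(len(p) - 1))
        for p in permutations(vertices)
    )

# A 4-cycle has a Hamiltonian path; keeping only two edges isolates vertex 4.
print(has_hamiltonian_path([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))  # True
print(has_hamiltonian_path([1, 2, 3, 4], [(1, 2), (2, 3)]))                  # False
```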

In order to show NP-hardness of Hamiltonian Path, we will show that Sat ≤_p Hamiltonian Path.

Given a formula φ in CNF, i.e., φ = ⋀_{i=1}^{n} ⋁_{j=1}^{k_i} L_{i,j}, we want to compute in polynomial time a graph G_φ s.t. G_φ has a Hamiltonian path iff φ is satisfiable. Note that an interpretation I satisfies a CNF ⋀_{i=1}^{n} ⋁_{j=1}^{k_i} L_{i,j} iff for every i ∈ {1, ..., n} there is a j ∈ {1, ..., k_i} s.t. I(L_{i,j}) = 1.

We will use the following "gadgets": first the choice gadget:

[Figure: the choice gadget — two vertices joined by two parallel edges; drawing lost in extraction.]

Every Hamiltonian path must contain exactly one of the two edges. It is impossible to choose none (then the lower vertex would not be in the path) and it is impossible to choose both (then one of the two vertices would be in the path twice). This gadget will later also be used for making a choice between two paths (not just two edges).

The XOR-gadget:

[Figure: the XOR-gadget — a ladder-shaped graph with four ˝-endpoints, two upper and two lower; drawing lost in extraction.]


A Hamiltonian path that passes through this graph either connects the two upper ˝-vertices or the two lower ˝-vertices. It is easy to check that there are no other pairs of ˝-vertices that it can connect. We will use the XOR-gadget in order to create an XOR between two edges. This can be written as:

[Figure: two edges, each drawn between a pair of ˝-vertices, joined by a double-headed arrow indicating the XOR-gadget between them; drawing lost in extraction.]

For a clause C = L_1 ∨ ... ∨ L_n we will form a circle with n edges where the edge e_i being in the Hamiltonian path means that L_i is set to false. Note that a Hamiltonian path cannot contain all edges of a circle since then it would contain a vertex twice.

A further gadget is the complete graph K_n with n vertices. A property of K_n which is important for our construction is: let p = v_1, ..., v_k be a path in K_n which does not contain a vertex twice, and let w be a vertex which does not occur in p. Then p can be extended to a Hamiltonian path v_1, ..., v_k, v_{k+1}, ..., v_n = w in K_n.

We will now explain the mapping of φ to G_φ based on an example. Let

  φ = (p_1 ∨ p_2) ∧ (¬p_1 ∨ p_2 ∨ p_3) ∧ (¬p_1 ∨ ¬p_2 ∨ ¬p_3)

Then the graph G_φ is

[Figure: the graph G_φ — on the left, one circle per clause of φ with its edges labelled by the clause's literals; on the right, a chain of choice gadgets, one per variable p_i, choosing between an edge labelled p_i and an edge labelled ¬p_i; circled vertices mark the endpoints of the intended Hamiltonian path. Drawing lost in extraction.]

where in addition to the edges given above the following edges are added:

1. Every edge labelled with a literal L on the left side is connected to the unique L-edge on the right side using an XOR-gadget (if several XOR-gadgets are required for a single edge on the right side, then they are arranged sequentially).

2. All circled vertices are connected to a complete graph.


First observe that G_φ can be computed from φ in deterministic polynomial time. It remains to show that G_φ has a Hamiltonian path iff φ is satisfiable.

From left to right: a Hamiltonian path induces an interpretation I through its form on the right side. The XOR-gadgets ensure that exactly those edges from the left side are in the path whose literals are interpreted as 0. Since the path is Hamiltonian, none of the clause circles can contain all edges. Thus in every clause there is at least one true literal, i.e., I(φ) = 1.

From right to left: let I be an interpretation with I(φ) = 1. Then we start the Hamiltonian path in the vertex on the upper right and continue it, as induced by I, to the circled vertex on the lower right. The edges on the left side are included in the path according to the XOR-gadgets. There is no clause circle all of whose edges are in the path (since otherwise we would have I(φ) = 0). Therefore no vertex occurs twice in the path constructed so far. The remaining vertices of the complete graph are visited in such a way that the path ends in the circled vertex on the upper left.

There are several thousand NP-complete problems, for example:

Definition 3.12. The Smallest Grammar-problem

• Instance: a word w ∈ A*, a k ∈ ℕ

• Question: does there exist a CFG G with L(G) = {w} and |G| ≤ k?

Definition 3.13. The Knapsack-problem

• Instance: a finite set U, a weight function w : U → ℚ, a value function v : U → ℚ, and b, k ∈ ℕ

• Question: does there exist a K ⊆ U s.t. Σ_{u∈K} w(u) ≤ b and Σ_{u∈K} v(u) ≥ k?
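A brute-force decision procedure for Knapsack simply tries all subsets K ⊆ U; the names and the dictionary representation of the weight and value functions are illustrative:

```python
from itertools import combinations

def knapsack(U, w, v, b, k):
    # Try every subset K of U: total weight at most b, total value at least k?
    for r in range(len(U) + 1):
        for K in combinations(U, r):
            if sum(w[u] for u in K) <= b and sum(v[u] for u in K) >= k:
                return True
    return False

# Items with (weight, value): a = (3, 4), b = (4, 5), c = (5, 6).
U = ["a", "b", "c"]
w = {"a": 3, "b": 4, "c": 5}
v = {"a": 4, "b": 5, "c": 6}
print(knapsack(U, w, v, b=7, k=9))   # True: K = {a, b} has weight 7, value 9
print(knapsack(U, w, v, b=7, k=10))  # False: no subset of weight <= 7 has value >= 10
```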

Definition 3.14. The travelling salesman (Tsp) problem

• Instance: a finite graph G = (V, E), a weight function w : E → ℕ, and a k ∈ ℕ

• Question: does there exist a Hamiltonian circle, i.e., a Hamiltonian path that returns to its first vertex, with total weight at most k?
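Tsp, too, admits an obvious brute-force procedure: fix a start vertex, try all orders of the remaining vertices, and compare the total weight of each Hamiltonian circle with k. The names and the frozenset edge representation are illustrative:

```python
from itertools import permutations

def tsp(vertices, weight, k):
    """weight maps frozenset({u, v}) to w({u, v}); pairs absent from weight are
    not edges of the graph."""
    first, rest = vertices[0], vertices[1:]
    for p in permutations(rest):
        cycle = (first,) + p + (first,)  # Hamiltonian circle candidate
        edges = [frozenset((cycle[i], cycle[i + 1])) for i in range(len(cycle) - 1)]
        if all(e in weight for e in edges) and sum(weight[e] for e in edges) <= k:
            return True
    return False

# Triangle with edge weights 1, 2, 3: the only Hamiltonian circle has weight 6.
w = {frozenset((1, 2)): 1, frozenset((2, 3)): 2, frozenset((1, 3)): 3}
print(tsp([1, 2, 3], w, 6))  # True
print(tsp([1, 2, 3], w, 5))  # False
```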


Bibliography

[1] John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.

[2] Piergiorgio Odifreddi. Classical Recursion Theory, volume 125 of Studies in Logic and the Foundations of Mathematics. North-Holland Publishing Co., 1989.

[3] Christos H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.

[4] Michael Sipser. Introduction to the Theory of Computation. Cengage Learning, 3rd edition, 2012.
