Maximal Machine Learnable Classes


Transcript of Maximal Machine Learnable Classes


Journal of Computer and System Sciences 58, 211–214 (1999)

Maximal Machine Learnable Classes*

John Case

Department of CIS, University of Delaware, Newark, Delaware 19716

and

Mark A. Fulk

Genesee Valley Software, 92 Monteroy Road, Brighton, New York 14618

Received April 21, 1987; revised June 18, 1997

A class of computable functions is maximal iff it can be incrementally learned by some inductive inference machine (IIM), but no infinitely larger class of computable functions can be so learned. Rolf Wiehagen posed the question whether there exist such maximal classes. This question and many interesting variants are answered herein in the negative. Viewed positively, each IIM can be infinitely improved upon! Also discussed are the problems of algorithmically finding the improvements proved to exist. © 1999 Academic Press

1. INTRODUCTION

Let PR denote the class of primitive recursive functions.1

There is a machine M_PR which, when fed successively more values of any f ∈ PR, after finitely many trial-and-error output programs, eventually settles on an output program which computes f [16, 3]. We say M_PR incrementally learns (or Ex-identifies) f [10]. PR is a large and inclusive class of computable functions [11]. It is interesting, then, to ask whether there are classes of computable functions strictly larger than PR which are also incrementally learnable (i.e., Ex-identifiable).
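The idea behind M_PR is the enumeration technique: output the least program in a uniformly computable list of total functions that is consistent with the data seen so far. A minimal sketch follows; the family argument is an illustrative stand-in, not an actual enumeration of PR:

```python
def learner_by_enumeration(family):
    """Return a learner that, on data f(0), ..., f(n), conjectures the
    least index i such that family(i) agrees with all the data seen so
    far. If f is family(j) for some j, the conjecture stabilizes on the
    least such index, so the whole family is Ex-identified."""
    def M(data):
        i = 0
        # Scan for the first function in the enumeration consistent with
        # the data; the scan terminates whenever f is in the family.
        while any(family(i)(k) != v for k, v in enumerate(data)):
            i += 1
        return i
    return M

# Toy family: the constant functions family(i)(x) = i.
M = learner_by_enumeration(lambda i: (lambda x: i))
```

On data from a function outside the family the scan may diverge; this mirrors the theory, where M_PR is only guaranteed to converge on members of PR.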

Let Ex denote the class of all classes of computable functions each Ex-identifiable by some machine. Then, for example, PR ∈ Ex. It is well known and easy to show that, if S ∈ Ex and S ⊆ S′, where (S′ − S) is finite, then

Article ID jcss.1997.1518, available online at http://www.idealibrary.com

* First author's e-mail: case@cis.udel.edu. We thank Rolf Wiehagen for posing the problem which led to the present paper. This work was partially supported by NSF Grant MCS8010728. We thank the Xerox University Grants program for providing equipment to the University of Rochester which was used in the initial preparation of the present paper. We appreciate, too, the referee's helpful suggestions. The second author passed away May 18, 1997 before completion of the final draft.

1 Roughly, PR is characterized as the class of functions computable by assembly language programs whose only loop structure is equivalent to the for-loop [18, 29].


S′ ∈ Ex too.2 Hence, if we add any finite collection of computable functions to PR we still get a class in Ex. This discussion suggests the following.

Definition 1. A class of computable functions S is Ex-maximal ⇔def both S ∈ Ex and there is no S′ ∈ Ex such that S ⊆ S′ and (S′ − S) is infinite.

A variant of the question of the first paragraph in this section is whether PR is Ex-maximal. It trivially is not. For example, using the enumeration technique [3, 15, 16] one can show that each level of the Péter hierarchy [22, 27] is in Ex.3

Wiehagen [30] asked whether there is an Ex-maximal class. In Section 3 below we answer the question and interesting variants each negatively.4 A positive, informal restatement is: Infinite improvement is always possible!

In Section 4 below we discuss the problems of algorithmically finding the improvements proved to exist.

2. PRELIMINARIES

The definitions, notations, and facts given here are all from [10, 16, 26].

N is the set of natural numbers {0, 1, 2, ...}; i, n, m, x, y, and z, as well as subscripted and superscripted versions of

2 This can be proved by iteration of the following. Suppose M Ex-identifies S. Suppose f is computable. Then M′ Ex-identifies (S ∪ {f}), where M′ on g(0), g(1), ..., g(n) outputs a fixed program for f unless it discovers some k ≤ n such that g(k) ≠ f(k); in this case it outputs what M would on g(0), g(1), ..., g(n).
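The footnote's construction is easy to sketch. In this hedged toy model (names are illustrative), a learner is a function from the finite data list [g(0), ..., g(n)] to a conjectured program:

```python
def patch_learner(M, f, prog_f):
    """Footnote 2's M': output a fixed program prog_f for f until the data
    contradicts f at some argument k <= n; from then on, answer as M would
    on the same data. Iterating this adds any finite set of computable
    functions to the class M Ex-identifies."""
    def M_prime(data):                   # data = [g(0), ..., g(n)]
        if all(v == f(k) for k, v in enumerate(data)):
            return prog_f                # data still consistent with f
        return M(data)                   # contradiction found: mimic M
    return M_prime
```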

3 Of course each level of this hierarchy above the primitive recursive functions is infinitely larger than the previous level.

4 Wiehagen's question, with some of its variants, was third in the list of problems in [4]. Our solutions were announced in [7]. Freivalds and Kinber announced a later, independent solution, which was restricted to Wiehagen's original question [12]. The present paper is the first publication of our proofs outside of the second author's dissertation [13] done under the supervision of the first author.

0022-0000/99 $30.00 Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved.

these letters, range over N. The asterisk * is used as a symbol in various locations that might otherwise be occupied by a number; a ranges over N ∪ {*}.

φ is an arbitrary acceptable numbering of the partial computable functions [17, 23, 24, 25, 28]. Hence, φp is the partial computable function: N → N computed by φ-program p. Lower case Greek letters other than φ range over partial computable functions. R is the set of computable functions; f and g range over R.

The domain of a partial computable function ψ is denoted by δψ. We write ψ(x)↓ iff x ∈ δψ, that is, iff ψ(x) is defined. We write ψ1(x) = ψ2(x) iff either both ψ1 and ψ2 are undefined on x or both are defined and equal on x. We write ψ1 =^n ψ2 iff {x ∈ N : ψ1(x) ≠ ψ2(x)} has at most n elements; in this case, we call ψ2 an n-variant of ψ1. We write ψ1 =^* ψ2 iff {x ∈ N : ψ1(x) ≠ ψ2(x)} is finite; in that case, we call ψ2 a finite variant of ψ1. If P is a predicate on N, we write (∀^∞ x) P(x) iff the set of numbers that do not satisfy P is finite.

The variables σ, σ0, and σ1 range over finite sequences of numbers; consistently with the use of lower-case Greek letters given above, finite sequences are considered to be functions from initial segments of N to N. We write σ0 ⊆ σ1 and σ0 ⊂ f iff, respectively, σ0 is an initial subsequence of σ1 and σ0 is an initial subsequence of f(0), f(1), ... .

An inductive inference (or learning) machine (IIM) is an algorithmic device which takes, as input, a sequence of values f(0), f(1), ... from some computable function f, and which, from time to time, as it is receiving its input, outputs a computer program [10, 26].

The variables M, M0, and M′ range over inductive inference machines. Following [10], we assume that the output of an IIM is well defined after any sequence of inputs (including the empty sequence); [3, 10] demonstrate that our assumptions about IIMs entail no loss of generality.

We write M(f)↓ iff (∃x)(∀^∞ σ ⊂ f)[M(σ) = x]. That is, M(f)↓ iff, as M is receiving its input, it eventually outputs some program x and thereafter never outputs any other program. If M(f)↓, we write M(f) for that final program.

We say that M Ex^a-identifies f [10, 16] iff both M(f)↓ and φM(f) =^a f. That is, M, as it is receiving successive values of f as input, eventually outputs a program for an a-variant of f (a finite variant if a = *) and thereafter never outputs a different program.

We say that M Bc^a-identifies f [2, 10] iff (∀^∞ σ ⊂ f)[φM(σ) =^a f]. That is, if M is fed the values of f, then, after M has received some suitable amount of input, thereafter each of the outputs of M is a program for an a-variant of f.

For a given IIM M, Ex^a(M) is the class of functions that M Ex^a-identifies. Similarly, Bc^a(M) is the class of functions that M Bc^a-identifies. Ex^a = {S ⊆ R : (∃M)[S ⊆ Ex^a(M)]}. Bc^a = {S ⊆ R : (∃M)[S ⊆ Bc^a(M)]}. Note that Ex^a is a class of classes of computable functions, as is Bc^a.

3. RESULTS

The following definition generalizes Ex-maximality from Definition 1 in Section 1 above.

Definition 2. Suppose I is a class of classes of computable functions (e.g., I could be Ex^a or Bc^a). A class of computable functions S is I-maximal ⇔def both S ∈ I and there is no S′ ∈ I such that S ⊆ S′ and (S′ − S) is infinite.

Theorem 1. For each a ∈ N ∪ {*}, there is no Ex^a-maximal class.

Proof. Assume we are given a set S ∈ Ex^a. Let M be an IIM such that S ⊆ Ex^a(M). We are done if we can construct an IIM M′ such that Ex^a(M) ⊆ Ex^a(M′) and (Ex^a(M′) − Ex^a(M)) is infinite. The crucial trick is to consider the following two cases.5

Case (1). For each finite sequence σ, there is an f ∈ Ex^a(M) that extends σ (i.e., which is such that σ ⊂ f).

By Kleene's S-m-n theorem [26], there is a computable function g such that, for each i, φ-program g(i) runs the following computation in stages and φg(i) is the union of {τ0, τ1, ...} from the stages.

Stage 0. τ0 = τ′0 = {(0, i)}, y0 = 0.

Stage n+1. g(i) searches for τ′n+1 and yn+1 such that τn ⊆ τ′n+1, yn+1 ∉ δτ′n+1, and φM(τ′n+1)(yn+1)↓. g(i) then sets τn+1 to be such that τ′n+1 ⊆ τn+1; τn+1(yn+1) = 1 + φM(τ′n+1)(yn+1), and τn+1(x) = 0 for all x < yn+1 such that x ∉ δτ′n+1.

{φg(i) : i ∈ N} is easily seen to be an infinite set of partial computable functions from the fact that, for each i, φg(i)(0)↓ = i.
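Under strong simplifying assumptions the Stage construction can be sketched in code. Here M and every φ-program are assumed total, so each stage's search trivially succeeds with τ′ = τ and y the next fresh input; the dict phi of total functions and programs-as-integers are toy stand-ins for the real acceptable numbering:

```python
def diagonalize(i, M, phi, steps):
    """Build an initial segment of the Case (1) function phi_g(i):
    start with tau(0) = i, and at each stage make M's current hypothesis
    wrong at a fresh point y by setting tau(y) = 1 + phi_{M(tau)}(y)."""
    tau = [i]                        # Stage 0: tau(0) = i
    for _ in range(steps):
        y = len(tau)                 # a fresh input not yet in dom(tau)
        p = M(tau)                   # M's hypothesis on the data so far
        tau.append(1 + phi[p](y))    # force program p wrong at y
    return tau
```

In the real construction the search must dovetail over extensions τ′ and points y, since φM(τ′)(y) may diverge; Case (1) is exactly what guarantees that some such pair is always found.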

To finish Case (1), we proceed with a series of claims.

Claim 1. For each i, φg(i) is total.

Proof of Claim 1. We begin by proving that every stage in the construction of φg(i) halts. That Stage 0 halts is obvious.

Consider Stage n+1, for an arbitrary n. Since we are in Case (1), there is some f such that τn ⊆ f and f ∈ Ex^a(M). From the definition of Ex^a, for all sufficiently long σ ⊂ f, φM(σ) =^a f. Choose such a σ with τn ⊆ σ. All sufficiently large z satisfy φM(σ)(z)↓ = f(z); choose such a z large enough that z ∉ δσ. The chosen σ and z satisfy the requirements on τ′n+1 and yn+1 in Stage n+1; therefore the search in Stage n+1 halts, and we have shown that every stage halts.

It is easily seen that, for each n, (τn+1 − τn) is not empty.


Immediately we have Claim 1. ∎

5 We will have more to say about the dichotomy provided by these cases in Section 4 below.


Claim 2. For each i, φg(i) ∉ Ex^a(M).

Proof of Claim 2. Suppose by way of contradiction otherwise. Then M(φg(i))↓. Note that y0 < y1 < ···. For all m, φg(i)(ym) = 1 + φM(τ′m)(ym) ≠ φM(τ′m)(ym). For all sufficiently large m, M(τ′m) = M(φg(i)). Therefore, for all sufficiently large m, φM(φg(i))(ym) ≠ φg(i)(ym). ∎

Claim 3. (Ex^a(M) ∪ {φg(i) : i ∈ N}) ∈ Ex^a.

Proof of Claim 3. Let M′ be an IIM such that

  M′(σ) = g(σ(0)), if σ ⊆ φg(σ(0));
  M′(σ) = M(σ),    otherwise.

Suppose f ∈ Ex^a(M). Then f ≠ φg(f(0)), and for all sufficiently large σ ⊂ f, σ ⊄ φg(σ(0)) and M′(σ) = M(σ). Then M′(f)↓ = M(f) and f ∈ Ex^a(M′).

Suppose i given. For all σ ⊆ φg(i), M′(σ) = g(i), so M′(φg(i))↓ = g(i) and therefore φg(i) ∈ Ex^0(M′) ⊆ Ex^a(M′). ∎
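As a concrete illustration, Claim 3's machine M′ can be sketched in the same toy model (the dict phi mapping programs to total functions, and g as a plain function, are illustrative stand-ins for the real numbering):

```python
def combine(M, g, phi):
    """Claim 3's M': on data sigma, output g(sigma(0)) for as long as
    sigma is an initial segment of phi[g(sigma(0))]; otherwise defer to M.
    Each phi_g(i) is then identified via the fixed guess g(i), while
    functions M identifies are eventually handed back to M."""
    def M_prime(data):
        if data:
            p = g(data[0])
            if all(phi[p](k) == v for k, v in enumerate(data)):
                return p             # data consistent with phi_g(data[0])
        return M(data)               # otherwise behave exactly like M
    return M_prime
```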

Claims 1, 2, and 3 prove Theorem 1 in Case (1).

Case (2). Not Case (1); that is, there is a σ0 such that (∀f : σ0 ⊂ f)[f ∉ Ex^a(M)].

Let S0 be some easily identifiable infinite set of total extensions of σ0; for example, the set of functions of finite support [3] containing σ0. Let M0 be an IIM that Ex^a-identifies S0.
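A toy learner for functions of finite support, of the kind M0 could be, can be sketched as follows (representing a conjectured program by the dict of the function's nonzero values; this representation is an illustrative stand-in for an actual program):

```python
def finite_support_learner(data):
    """Conjecture the function that agrees with the data seen so far and
    is 0 everywhere else. Once all of f's (finitely many) nonzero values
    have appeared, the conjecture never changes again, so every function
    of finite support is Ex-identified."""
    return {k: v for k, v in enumerate(data) if v != 0}
```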

Let M′ be such that

  M′(σ) = M0(σ), if σ0 ⊆ σ;
  M′(σ) = M(σ),  otherwise.

We will prove that (Ex^a(M) ∪ S0) ⊆ Ex^a(M′).

Suppose f ∈ Ex^a(M). Then, for all σ ⊂ f, σ0 ⊄ σ, and M′(σ) = M(σ). Therefore, M′(f)↓ = M(f), and f ∈ Ex^a(M′).

Suppose f ∈ S0. Then, for all sufficiently large σ ⊂ f, σ0 ⊆ σ and M′(σ) = M0(σ). Since f ∈ Ex^a(M0), it follows that M′(f)↓ = M0(f) and f ∈ Ex^a(M′). ∎

Theorem 2. For each m ∈ N, there is no Bc^m-maximal class.

Proof. This proof is identical to the proof of Theorem 1 above, except for the following modifications in Case (1). In Stage n+1, g(i) searches for τ′n+1 and y⁰n+1, y¹n+1, ..., yᵐn+1 such that, for each j ≤ m, φM(τ′n+1)(yʲn+1)↓. For each j ≤ m, set τn+1(yʲn+1) = 1 + φM(τ′n+1)(yʲn+1). Thus, infinitely many of the output programs of M are made wrong on m+1 inputs, when M is fed φg(i). ∎

By Harrington's surprising result in [10], R ∈ Bc^*. Hence, since R cannot even be finitely improved upon, R is a Bc^*-maximal class.


It is easy to modify the proofs of Theorems 1 and 2 to apply to identifiable classes of 0,1-valued functions. The first change is to have the sequence of values of φg(i) begin with i 0's followed by a 1. The second change is to substitute proper subtraction for addition in the definition of τn+1 in Stage n+1 of the constructions in Case (1) of each proof. We thus see that, if we restrict Ex^a and Bc^m to 0,1-valued computable functions, there are still no Ex^a-maximal or Bc^m-maximal classes.
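The proper-subtraction change can be made concrete: replacing 1 + φ(y) by 1 ∸ φ(y) keeps the diagonal value in {0, 1} while still guaranteeing disagreement. A minimal sketch:

```python
def monus(a, b):
    """Proper subtraction a - b, truncated at 0 (often written a monus b)."""
    return max(a - b, 0)

# The diagonal value monus(1, phi(y)) is always 0 or 1, yet differs from
# phi(y) for every natural-number value of phi(y), so the diagonalization
# still goes through within the 0,1-valued functions.
```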

References [3, 8, 10, 14, 19] contain definitions of restricted classes of inductive inference machines called Popperian, postdictively complete, postdictively consistent, and reliable (or strong). It is easy to see that the machines M′ constructed in the proof of Theorem 1 have any of these properties that apply to the input machines M.

4. RELATED WORK AND FURTHER PROBLEMS

Another model of machine learning, called formal language learning, involves sequential presentation of the elements of a recursively enumerable language L (positive data about, or text for, L) to an IIM which must, after finitely many trial and error rounds, eventually output a grammar (or grammars) for L. See [6, 9, 16, 20, 21] for definitions and basic results. The language learning analogs of Ex^a and Bc^a are called TxtEx^a and TxtBc^a, respectively. Surprisingly, while the class F of all finite languages is in TxtEx^0, if L is any infinite r.e. language, (F ∪ {L}) is not in TxtBc^* [6, 16, 20]. For related results see [1, 20].

From Theorems 1 and 2 in Section 3 above, we have that any IIM M for Ex^a-identification or Bc^m-identification can be infinitely improved. More specifically, there is an IIM M′ which identifies infinitely many more computable functions than M does. It is interesting to ask whether such an M′ can be algorithmically found from M. Unfortunately, it cannot, not even if M′ is supposed to identify only one more computable function than does M [10]. It is further shown in [5] that this latter kind of M′ also cannot be found from M by an incremental algorithm which is allowed to change its mind finitely many times about its outputs before settling on a correct output.

The proofs of Theorems 1 and 2 above usefully considered a dichotomy of IIMs M: those M satisfying Case (1) and those satisfying Case (2). Reference [7] contained the slightly overzealous claim to the effect that we had found two algorithms for finding M′ (satisfying Theorem 1): one for when M satisfies Case (1) and another for when M satisfies Case (2). In fact, we had not found an algorithm for the side of this dichotomy corresponding to Case (2). The second author conjectured that there is none, but this remains open. It is also open whether there is some dichotomy with algorithms for (infinitely) improving IIMs on each side.


REFERENCES

1. G. Baliga and J. Case, Learnability: Admissible, co-finite, and hypersimple sets, J. Comput. System Sci. 53 (1996), 26–32.

2. J. Barzdin, Two theorems on the limiting synthesis of functions, in ``Theory of Algorithms and Programs,'' Latvian State University, Riga, 1974, Vol. 210, pp. 82–88.

3. L. Blum and M. Blum, Toward a mathematical theory of inductive inference, Inform. and Control 28 (1975), 125–155.

4. J. Case, Problems in inductive inference, Bull. Europ. Assoc. Theoret. Comput. Sci. 18 (1982), 117–118.

5. J. Case, Infinitary self-reference in learning theory, J. Exper. Theoret. Artif. Intell. 6 (1994), 3–16.

6. J. Case, ``The Power of Vacillation in Language Learning,'' Technical Report LP-96-08, Logic, Philosophy and Linguistics Series, Institute for Logic, Language and Computation, University of Amsterdam, 1996; SIAM J. Comput., to appear.

7. J. Case and M. Fulk, Solution to inductive inference problem P3, Bull. Europ. Assoc. Theoret. Comput. Sci. 19 (1983), 8.

8. J. Case, S. Jain, and S. Ngo Manguelle, Refinements of inductive inference by Popperian and reliable machines, Kybernetika 30 (1994), 23–52.

9. J. Case and C. Lynes, Machine inductive inference and language identification, in ``Proceedings, 9th International Colloquium on Automata, Languages and Programming'' (M. Nielsen and E. Schmidt, Eds.), Lecture Notes in Computer Science, Vol. 140, pp. 107–115, Springer-Verlag, Berlin, 1982.

10. J. Case and C. Smith, Comparison of identification criteria for machine inductive inference, Theoret. Comput. Sci. 25 (1983), 193–220.

11. M. Davis, R. Sigal, and E. Weyuker, ``Computability, Complexity, and Languages,'' 2nd ed., Academic Press, San Diego, 1994.

12. R. Freivalds and E. Kinber, On extension of inferrable classes, manuscript, 1984.

13. M. Fulk, ``A Study of Inductive Inference Machines,'' Ph.D. thesis, SUNY at Buffalo, Buffalo, NY, 1985.

14. M. Fulk, Saving the phenomenon: Requirements that inductive machines not contradict known data, Inform. and Comput. 79 (1988), 193–209.


15. M. Fulk, Robust separations in inductive inference, in ``Proceedings, 31st Annual Symposium on Foundations of Computer Science, St. Louis, Missouri, 1990,'' pp. 405–410.

16. E. Gold, Language identification in the limit, Inform. and Control 10 (1967), 447–474.

17. M. Machtey and P. Young, ``An Introduction to the General Theory of Algorithms,'' North-Holland, New York, 1978.

18. A. Meyer and D. Ritchie, The complexity of loop programs, in ``Proceedings, 22nd National ACM Conference,'' pp. 465–469, Thomas, Springfield, IL, 1967.

19. E. Minicozzi, Some natural properties of strong identification in inductive inference, Theoret. Comput. Sci. (1976), 345–360.

20. D. Osherson, M. Stob, and S. Weinstein, ``Systems that Learn: An Introduction to Learning Theory for Cognitive and Computer Scientists,'' MIT Press, Cambridge, MA, 1986.

21. D. Osherson and S. Weinstein, Criteria of language learning, Inform. and Control 52 (1982), 123–138.

22. R. Péter, ``Recursive Functions,'' Academic Press, New York, 1967.

23. G. Riccardi, ``The Independence of Control Structures in Abstract Programming Systems,'' Ph.D. thesis, SUNY at Buffalo, Buffalo, NY, 1980.

24. G. Riccardi, The independence of control structures in abstract programming systems, J. Comput. System Sci. 22 (1981), 107–143.

25. H. Rogers, Gödel numberings of partial recursive functions, J. Symbolic Logic 23 (1958), 331–341.

26. H. Rogers, ``Theory of Recursive Functions and Effective Computability,'' McGraw–Hill, New York, 1967; reprinted, MIT Press, Cambridge, MA, 1987.

27. H. Rose, ``Subrecursion: Functions and Hierarchies,'' Oxford Univ. Press, Oxford, 1984.

28. J. Royer, ``A Connotational Theory of Program Structure,'' Lecture Notes in Computer Science, Vol. 273, Springer-Verlag, New York/Berlin, 1987.

29. J. Royer and J. Case, ``Subrecursive Programming Systems: Complexity and Succinctness,'' Progress in Theoretical Computer Science, Birkhäuser, Boston, 1994.

30. R. Wiehagen, personal communication to J. Case.
