## Abstract

Initially discussed are some of Alan Turing's wonderfully profound and influential ideas about mind and mechanism, including their connection to the main topic of the present study, which lies within the field of computability-theoretic learning theory. Herein is investigated the part of this field concerned with the algorithmic, trial-and-error inference of eventually correct programs for functions from their data points. As to the main content of this study: in prior papers, beginning with the seminal work by Freivalds *et al.* in 1995, the notion of *intrinsic complexity* is used to analyse the learning complexity of *sets* of functions in a Gold-style learning setting. Herein are pointed out some weaknesses of this notion. Offered is an alternative based on *epitomizing sets* of functions—sets that are learnable under a given learning criterion, but not under other criteria that are not at least as powerful. To capture the idea of epitomizing sets, new reducibility notions are given based on *robust learning* (closure of learning under certain sets of computable operators). Various degrees of epitomizing sets are *characterized* as the sets complete with respect to corresponding reducibility notions! These characterizations also provide an easy method for showing sets to be epitomizers, and they are then employed to prove several sets to be epitomizing. Furthermore, a scheme is provided to generate easily *very strong* epitomizers for a multitude of learning criteria. These strong epitomizers are the so-called *self-learning* sets, previously applied by Case & Kötzing in 2010. They can be *easily* generated and employed in a myriad of settings to witness *with certainty* the strict separation in learning power between the criteria so epitomized and other, less powerful criteria!

## 1. Introduction

The present study is within the field of computability-theoretic learning theory.^{1}

We provide, as part of this introduction to the actual subject of the present study, rhetoric to connect some of Alan Turing's wonderfully profound and influential ideas about mind and mechanism to the primary content of the study.

Turing [1] gave the world, among other things, an analysis of algorithmic calculation by an (importantly) human clerk that led Turing to the first convincing mathematically rigorous and complete definition of what is computable by algorithm.^{2} The importance of Turing's analysis to theoretical psychology was recognized, for example, by Myhill [3], p. 149, where he says:
… in the author's view, the theorems of Church and Gödel are psychological laws. Mr. E. H. Galanter of the Department of Psychology, University of Pennsylvania, described them in conversation with the author as ‘the only known psychological laws comparable in exactitude with the laws of physics’.^{3}

Of course, Turing himself initiated earlier the part of artificial intelligence that is concerned with computer simulations of human intelligence [4]. A little later, Turing carried his mechanization program over to a mathematical model of the very complex biological process of embryonic development [5]. This model has been influential in both developmental biology and chemistry.

Next we will segue into the topic of the present study—but with asides of relevance to treating mind as (algorithmic) mechanism (which, as noted above, Turing began).

In the present study, we analyse the problem of algorithmically learning a description (our descriptions will also be algorithmic) for an infinite sequence (a function from the natural numbers into the natural numbers) when presented with larger and larger initial segments of that sequence.^{4} For example, a learner *h* might be presented with more and more of the sequence *g*=0,1,4,9,16,…. After each new datum of *g*, *h* may output a description of a function as its conjecture. For example, *h* might output a computer program for the constantly 0 function after seeing the first element of this sequence *g*, a program for the sequence of all natural numbers after seeing the first two elements of *g*, and a program for the squaring function after seeing further elements of *g*.
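The scenario just described can be sketched as a toy simulation (an illustration only, not from the paper; here conjectures are modelled as Python functions rather than as code numbers of programs in a fixed programming system):

```python
# Toy sketch: a learner maps an observed initial segment of the target
# sequence to a conjectured function. Conjectures are Python callables,
# standing in for program codes.

def learner(prefix):
    """Conjecture a function after seeing g(0), ..., g(len(prefix)-1)."""
    if len(prefix) == 1:
        return lambda x: 0       # guess: the constantly 0 function
    if len(prefix) == 2:
        return lambda x: x       # guess: the identity, 0, 1, 2, ...
    return lambda x: x * x       # guess: the squaring function

g = lambda x: x * x              # the target sequence 0, 1, 4, 9, 16, ...
prefix = []
for n in range(5):
    prefix.append(g(n))
    conjecture = learner(prefix)
    print([conjecture(x) for x in range(5)])
```

From the third datum on, this learner's conjecture computes the target everywhere, matching the informal narrative above.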

In computability-theoretic learning theory, Gold, in his seminal paper [6], mathematically models human child language learning as algorithmic.^{5} Case [7,8] posits that the universe and hence its component humans are algorithmic at least in (quantum mechanically) *expected* behaviour, and claims that, then, some theorems about algorithmic learnability admit of interpretations for cognitive science and for philosophy of science. Regarding the relevance for philosophy of science, Gold [6] hints at connections between algorithmic learnability and (scientific) inductive inference. Blum and Blum [9] clarified this connection with essentially the following example (where successful conjectures are for the sequence/function itself).
Consider the physicist who looks for a law to explain a growing body of physical data. His data consist of a set of pairs (*x*,*y*), where *x* describes a particular experiment, e.g., a high-energy physics experiment, and *y* describes the results obtained, e.g., the particles produced and their respective properties. The law he seeks is essentially an algorithm for computing the function *f*(*x*)=*y*.

As noted in [10,11], such an algorithm is a *predictive explanation*: if one has the good fortune to *have* such an algorithm, one can use it to predict the outcomes of the associated experiments. Finding programs for such functions (also known as sequences) is what one hopes to do in science—modelled as the search for predictive explanations. The present study does not itself involve direct applications to philosophy of science, whereas Case [7,8] surveys a number of them. Such applications essentially involve a slight extension of Turing's mechanization of mind program to apply to the (collections of) minds of human scientists (over historical time) attempting to provide predictive explanations for phenomena.

Many criteria for saying whether learner *h* is *successful* on sequence/function *g* have been proposed in the literature. Gold [6] gave a first, simple learning criterion, later called [10,11] *Ex-learning*,^{6} where a learner is *successful* if and only if (iff) it eventually stops changing its conjectures, and its final conjecture is a correct description of the input sequence.^{7}

Trivially, each *single*, describable sequence *g* has a suitable constant function as an Ex-learner (this learner constantly outputs an algorithmic description for *g*). Thus, we are interested in knowing for which *sets of functions* there is a *single learner* *h* learning *each* member of . We are interested herein in learning sets of (total) computable functions, and we will use (codes of) *programs* from a fixed *programming system* as possible conjectured (algorithmic) descriptions for the functions.^{8} This framework is known as *function learning in the limit* and has been studied extensively, employing a wide range of learning criteria similar to Ex-learning [12].

Freivalds *et al.* [13] considered how to define the *learning complexity* of sets of learnable functions. They introduced the seminal notion of *intrinsic complexity* and defined, for learning criterion *I*, a corresponding reducibility relation ≤^{I}. *Intrinsic* here is intrinsic to a learning task or problem , not to particular learning algorithms for . The idea is that, if , then is at least as hard to *I*-learn as is . In particular, Freivalds *et al.* [13] show that, if and is Ex-learnable, then is Ex-learnable. This intrinsic complexity has been further studied in some detail [13–15].

From [13], for a given learning criterion *I*, an *I*-learnable set of functions is said to be *≤^{I}-complete* iff, for all *I*-learnable sets of functions , . As far as ≤^{I} describes the relative difficulty of learnability, ≤^{I}-complete sets are the most difficult to *I*-learn. Freivalds *et al.* [13] show that the set of all computable functions of finite support^{9} is ≤^{Ex}-complete. These notions from [13] are structural analogues, for example, to the various notions from complexity theory of polynomial time reducibility and completeness.

There are at least two problems connected with the notion of intrinsic complexity from Freivalds *et al.* [13].

— For some learning criteria *I*, the relation ≤^{I} is not very fine-grained. In particular, there are ≤^{I}-complete sets of functions that are also learnable with respect to much more restricted learning criteria (theorem 4.3).

— There are learning criteria *I* and sets of functions such that and is *I*-learnable, but is *not* *I*-learnable (theorem 4.4).

In this study, we quantify the difficulty of learning a given set of functions in a *new* way. First, we consider the following concept, essentially from [13]. A set of functions *epitomizes* a learning criterion *I* *with respect to* some set of learning criteria iff is *I*-learnable, and, for each , if some *I*-learnable task is too hard to be *I*′-learned, then is already such an example task too hard to be *I*′-learned.^{10}

We believe that epitomization nicely captures the learning complexity of a set of functions. Hence, the work herein aims at *finding* such epitomizers. Naturally, the interest is in epitomizers with respect to as large as possible sets of learning criteria . We give epitomizers with respect to sets of all learning criteria that are *robust* with respect to certain sets of (computable) operators (operating on functions). Essentially, a learning criterion *I* is *robust* with respect to a given set of operators iff, for each *I*-learnable task and each operator , is also *I*-learnable, i.e. the set of *I*-learnable sets of functions is closed under operators from .^{11}

Furthermore, for any set of operators , we define a reducibility and a corresponding completeness notion. As an important first theorem, we have that a set epitomizes a learning criterion *I* with respect to all -robust learning criteria iff is -complete for all *I*-learnable sets (theorem 3.7)! The benefits of this theorem, which we call the fundamental epitomization theorem, are twofold.

First, since, as noted above, we believe that epitomization captures the complexity of learning, the fundamental epitomization theorem entails that our reducibility notions also capture this complexity.

Secondly, we now need only to prove completeness to get epitomization!

Other than structural insight, we get the following two benefits from epitomizers.

First, we can use epitomizers to show the identity of the learning power of two learning criteria. For example, theorem 6.1 establishes to be epitomizing with respect to various learning criteria and certain sets of operators. In corollary 6.2, we use this to show a learning criterion *I* to be as powerful as one of the epitomized learning criteria by showing that can be *I*-learned. With classic methods, the proof of this result might have required tedious work with attention to detail, whereas we can conclude it to be a corollary to *structural properties* uncovered by our theorems.

The second way in which epitomization helps us is by providing *canonical candidates to witness the separation of two learning criteria*. To this end, the so-called *self-learning* sets (theorem 6.6) are particularly useful epitomizers and have recently been used in the literature to prove particularly difficult separations (see [19,20], which solve two previously open problems using this technique, and see also [21,22]). Intuitively, the defining learner for a *self-learning set* merely runs each input function value as a program (on inputs relevant for that kind of learner).^{12}
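The intuition behind a self-learning set's defining learner can be made concrete with a toy sketch (an assumption-laden illustration, not the paper's formal construction; here data values stand in for program codes):

```python
# Hypothetical sketch: the defining learner for a self-learning set simply
# emits the most recently seen function value as its conjecture,
# interpreting that value as (the code number of) a program.

def self_learner(prefix):
    """prefix = [g(0), ..., g(n-1)]; return the latest datum as conjecture."""
    return prefix[-1] if prefix else None   # arbitrary 'don't know' on no data
```

The learning burden is thus shifted entirely onto the data: a function belongs to the set iff its values, run as programs, eventually behave as the criterion demands.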

Freivalds *et al.* [13] noted that their ≤^{Ex}-completeness does not give epitomization with respect to even the set of learning criteria considered in [13].

Thus, we believe that our approach to complexity of learning is both more *comprehensive* and more *useful* than the notion of intrinsic complexity from [13].

We present mathematical preliminaries in §2. The notions discussed above and some first theorems about them are given in §3—including the mentioned fundamental epitomization theorem characterizing epitomizers as complete sets (theorem 3.7).

Section 4 gives definitions and results regarding the notion of intrinsic complexity introduced in [13]. We have already mentioned our theorems 4.3 and 4.4 that witness drawbacks of this older notion; furthermore, in theorem 4.6, we characterize ≤^{Ex} in terms of one of our reducibility notions, and conclude in corollary 4.7 that all sets complete with respect to a central one of our reducibility notions are ≤^{Ex}-complete.

Finally, in §6, we present a series of tasks and we state which learning criteria they epitomize *at what strength*. Note that epitomizers with respect to larger sets of learning criteria are stronger. As indicated above, we give each epitomization result, for some set of operators , and with respect to the corresponding set of -robust learning criteria. It will be seen that, the smaller the set of operators , the larger the set of -robust learning criteria. Theorem 6.1 entails that , the set of functions of finite support introduced above, is a very weak epitomizer. Some so-called self-*describing* sets (from the earlier literature) are also surprisingly weak epitomizers (theorem 6.3); some are of considerably greater strength (theorem 6.4); but, interestingly, the strongest are the (to be defined below) *self-learning* sets (theorem 6.6).

Some of our proofs involve subtle infinitary program self-reference arguments employing (variants of) Case's operator recursion theorem (ORT) [24,25].

We are working on extending the present study to employ self-learning sets that work for those learning criteria, *unlike those herein*, which require the learners to be total on *all* inputs. Some preliminary successes regarding learning grammars for languages (as noted above, not the topic of the present study) appear in [20] and involve the introduction of a hybrid operator recursion theorem (HORT).^{13}

The present study is an extension of [26].

## 2. Mathematical preliminaries

### (a) Notation and definitions

Unintroduced computability-theoretic notions follow Rogers [16].

denotes the set of natural numbers, {0,1,2,…}.

The symbols , respectively, denote the subset, proper subset, superset and proper superset relation between sets. The symbol ∖ denotes set difference.

The quantifier means ‘for all but finitely many ’; the quantifier means ‘for infinitely many ’. For any set *A*, card(*A*) denotes its cardinality, and Pow(*A*) denotes the set of all subsets of *A*.

With and we denote, respectively, the set of all partial and of all total functions . With dom and range we denote, respectively, the domain and range of a given function. Set-theoretically, (partial) functions are identified with their graphs, i.e. they are treated as sets of ordered pairs, and we sometimes compare them by ⊆.

We sometimes denote a partial function *f* of *n*>0 arguments *x*_{1},…,*x*_{n} in lambda notation (as in Lisp) by *λx*_{1},…,*x*_{n} *f*(*x*_{1},…,*x*_{n}). For example, with , *λx* *c* is the constantly *c* function of one argument.

If is not defined for some argument *x*, then we denote this fact by *f*(*x*)↑, and we say that *f* on *x* *diverges*; the opposite is denoted by *f*(*x*)↓, and we say that *f* on *x* *converges*. If *f* on *x* converges to *p*, then we denote this fact by *f*(*x*)↓=*p*.

We say that *converges to p* iff ; we write to denote this.^{14}

For any (possibly partial) predicate *P*, we let *μx* *P*(*x*) denote the least *x* such that *P*(*x*) and, for all *y*<*x*, *P*(*y*)↓ (if no such *x* exists, *μx* *P*(*x*) is undefined).
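The *μ*-operator can be sketched as follows (a toy rendition under the assumption that the predicate is a *total* Python callable, so the definedness side condition for smaller arguments holds automatically; with no witness, `mu` simply loops forever, mirroring undefinedness):

```python
# Minimal sketch of the mu operator for a total predicate P:
# return the least x with P(x); diverge (loop) if no such x exists.

def mu(P):
    x = 0
    while not P(x):   # P(y) is defined (total) for every y < the result
        x += 1
    return x

print(mu(lambda x: x * x > 10))   # -> 4
```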

We fix any computable one-to-one and onto pairing function .^{15} With *π*_{1} and *π*_{2}, respectively, we denote decoding into first and second arguments of pairing, respectively. Whenever we consider tuples of natural numbers as input to , it is understood that the general coding function 〈⋅,⋅〉 is used to (left-associatively) code the tuples into a single natural number (but we will not necessarily state the pairing explicitly).
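The paper fixes such a pairing function abstractly; purely as an illustration, the classical Cantor pairing is one computable, one-to-one and onto choice, with decoding functions in the role of *π*_{1} and *π*_{2}:

```python
# Cantor pairing: a computable bijection from pairs of naturals to naturals,
# with pi1 and pi2 decoding the first and second components.
from math import isqrt

def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + y

def pi1(z):
    w = (isqrt(8 * z + 1) - 1) // 2   # largest w with w*(w+1)/2 <= z
    y = z - w * (w + 1) // 2
    return w - y

def pi2(z):
    w = (isqrt(8 * z + 1) - 1) // 2
    return z - w * (w + 1) // 2
```

Larger tuples are then coded left-associatively, e.g. a triple (*x*,*y*,*z*) as `pair(pair(x, y), z)`.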

For any and , we let *g*[*x*] denote the sequence of the numbers *g*(0),…,*g*(*x*−1), if all are defined, and ↑ otherwise.

A partial function is *partial computable* iff there is a deterministic, multi-tape Turing machine which, on input *x*, returns *f*(*x*) if *f*(*x*)↓, and loops infinitely if *f*(*x*)↑. and denote, respectively, the set of all partial computable and all (total) computable functions . The functions in are called *computable functions*.

We let *φ* be any fixed acceptable programming system for the partial computable functions with the associated complexity measure *Φ* [12,28]. Further, we let *φ*_{p} denote the partial computable function computed by the *φ*-program with code number *p*, and we let *Φ*_{p} denote the partial computable *complexity* function of the *φ*-program with code number *p*.

Whenever we consider sequences or finite sets as input to functions, we assume these objects to be appropriately coded as natural numbers. Similarly, when functions are defined to give non-numeric output, for example, when the outputs are in , we implicitly assume to be appropriately coded onto the natural numbers.

We use complexity-theoretic notions as introduced in [27]. We let LinF be the set of all linear time deterministic multi-tape Turing machine computable functions. A function *g* is called *linlin* iff *g* is computable in linear time and there is a linear time computable function *g*^{−1} such that *g*^{−1}°*g*=*λx* *x*. We let LL be the set of all linlin functions.
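A simple example of a linlin function (an illustration under the caveat that linear-time computability concerns the underlying Turing machine algorithm, which Python code can only suggest): doubling is linear-time computable on binary notation, as is its left inverse.

```python
# g(x) = 2*x is linear-time computable, and g_inv below is a linear-time
# computable left inverse, i.e. g_inv(g(x)) = x for all x -- the defining
# property of a linlin function.

def g(x):
    return 2 * x

def g_inv(y):
    return y // 2
```

Note that only a *left* inverse is required: `g_inv` need not be injective on arguments outside the range of `g`.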

Some of our proofs will make use of s-m-n [16]. It is easy to see from [27] that, for a natural choice of *φ* corresponding to multi-tape Turing machines, we can get the following S-m-n theorem:^{16}
2.1
For this study, we will assume (2.1). As a direct consequence (see [29] and the proof of [30], theorem 2.3.5), we get a linlin version of ORT, given as follows ( is the set of all effective operators, see definition 3.2).
2.2

### (b) Learning in the limit

A *learner* is a partial computable function. A *target* is a total computable function *g*; a *learning task* is a set of targets .

A learning criterion consists of three parts that, together, determine whether a given learner is successful on a given learning task.

First, the learning criterion has to specify what learners are allowed to do. This is called a *learner admissibility restriction*, and is modelled as a set , the set of all admissible learners.

Secondly, the learning criterion has to specify how the learner and the target interact. This part is modelled as a *sequence generating operator*, which is an operator *β* taking as arguments a learner *h* and a target *g* and that outputs a function *p*. We denote by *p* the *learning sequence* of *h* given *g*. For this study, we think of *p* as the sequence of conjectured programs of *h* on *g*.

Thirdly, the learning criterion has to specify which learning sequences are to be considered ‘successful’ on a given target. This is done with a *sequence acceptance criterion*, a total binary predicate *δ* on a learning sequence and a target function.^{17}

For a learner admissibility restriction, *β* a sequence generating operator, *δ* a sequence acceptance criterion and *h* a learner, we call a learning criterion. For every learning criterion *I* with we let , *β*_{I}=*β* and *δ*_{I}=*δ*. Let be a learning criterion. We proceed by giving definitions for *I*-learning.

We say that *h* *I-learns a learning task * iff and, for all , with *p*=*β*(*h*,*g*), (*p*,*g*)∈*δ*. We denote by and also by the set of all *I*-learnable learning tasks.^{18} With an abuse of notation, we sometimes also use to denote *I*.

Any set of complexity-bounded functions is an example learner admissibility restriction, as are and . We omit mentioning , if (no restriction on the learner).

We make use of the special symbols ‘?’ and ‘#’, where ‘?’ is used to denote ‘don't know’ conjectures, and ‘#’ is used to denote a pause in a text.

We give the following three examples for sequence generating operators. Let **G** be defined thus:^{19}
2.3
Let **It** be defined thus:^{20}
2.4
Finally, we give *transductive* learning [29].
2.5
For sequence acceptance criteria, we give the following five examples. We define *explanatory* learning [6] as follows:
*Finite* learning is given by
Further, we define a sequence acceptance criterion corresponding to *postdictively complete* learning^{21} [9,32,33].
For *conservative* learning [34], we give the following sequence acceptance criterion:
Finally, behaviourally correct learning as given by [11,35] is associated with
Any two sequence acceptance criteria *δ* and *δ*′ can be combined by intersecting them. For ease of notation we write *δδ*′ instead of *δ*∩*δ*′.

Note that Ex-learning from the introduction is captured in this notation as **GEx**. *Bc-learning* from the prior literature, e.g. [11,35], requires for success that, for the learner's sequence of conjectured programs, all but finitely many of these programs correctly compute the input sequence/function. Bc-learning is similarly captured by **GBc**. For more examples of the above concepts, see [30].
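The difference between **GEx** and **GBc** can be made concrete with a toy sketch (assumptions: a learning sequence is a finite list of conjectures, each a Python callable; 'correct' means agreeing with the target on a finite probe set; the real criteria quantify over the whole infinite sequence and are not decidable from finite information):

```python
# Toy checks on finite learning sequences. Ex demands syntactic convergence
# to one final, correct conjecture; Bc only demands that the late
# conjectures each compute the target (semantic convergence).

def ex_like(conjectures, target, probe=range(10)):
    final = conjectures[-1]
    syntactically_converged = all(c is final for c in conjectures[-3:])
    correct = all(final(x) == target(x) for x in probe)
    return syntactically_converged and correct

def bc_like(conjectures, target, probe=range(10)):
    return all(all(c(x) == target(x) for x in probe)
               for c in conjectures[-3:])
```

Two syntactically distinct conjectures that both compute the target satisfy the Bc-style check while failing the Ex-style one, reflecting why **GEx**-learning is the more demanding criterion.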

We will use the following proposition later.

### Proposition 2.1

Let be ⊆-downward closed. Then there is a learning criterion *I* with .

### Proof.

We let *p* and *p*′ be two different computable functions. We associate with each element a learner . Let *β* be such that, for all and , ; for all other pairs , let *β*(*h*,*g*)=*p*′. Let *δ* accept, for all , (*p*,*g*). Obviously, .

## 3. Concept definitions

In this section, we give the key concepts used in this study in definition 3.4. Before we can get to that, we define pre-orders and associated notions, as well as several sets of operators, for which we give examples and remark on some easy properties.

The main theorem of this section is the fundamental epitomization theorem (theorem 3.7), which shows the important connections between complete sets and epitomizers.

### Definition 3.1

Let *S* be a set and ⪯ a binary relation on *S*. We call ⪯ a *pre-order* iff ⪯ is reflexive and transitive. For *s*∈*S* and *T*⊆*S*, we say that *s* is *⪯-complete for T* iff *s*∈*T* and, for all *t*∈*T*, *t*⪯*s*.

Next we define several sets of operators. For illustration, see example 3.3.

### Definition 3.2

A function is called an *operator*. We define the following sets of operators.

— Let be the set of all effective operators [16], i.e. all operators *Θ* such that there is a computable function with ∀*e*:*Θ*(*φ*_{e})=*φ*_{s(e)}.^{22}

— Let . Let be the set of all such that there is a function with .^{23} We write for ; note that is equivalent to (3.1).

— Let . Let be the set of all such that^{24} (3.2) holds and there is a function with . We write for .^{25}

— Let . Let be the set of all such that there is a function such that .^{26} We write for .

Clearly, . We define sets of *left-invertible* operators as follows. For any set of operators and , we let
3.3

### Example 3.3

For illustration, we give the following example operators.

— The operator *Θ* such that is in , but not in or ; furthermore, *Θ* is not in (*Θ* is not one-to-one and hence cannot be left-inverted).

— The operator *Θ* such that (3.4) is in and in , but not in ; furthermore, *Θ* is in , but not in (*Θ* has a computable left-inverse, but not a local one).

— The operator *Θ* such that is in , and also ; furthermore, *Θ* is even in .

Now we give the definition of the central notions of this study.

### Definition 3.4

Let be a set of operators and *I* a learning criterion. Let .

— *I* is called *-robust* iff, for all and , (3.5) holds. Intuitively, *I* is -robust iff the set of *I*-learnable sets is closed under operators from that are left-invertible *on the set to which they are applied* (by an operator from ).

— We say that *epitomizes* *I* with respect to a set of learning criteria iff and^{27} (3.6) holds.

— If epitomizes *I* with respect to the set of all -robust learning criteria, then we say that *-epitomizes* *I*.

— We say that *-generates* *I* iff is the set of all *I*-learnable functions.

— iff there is an operator such that .

As an interesting first observation on epitomizers, we make the following remark regarding separations from unions of learning criteria.

### Proposition 3.5

Let *I* be a learning criterion and a set of learning criteria with . Suppose that there is a set epitomizing *I* with respect to . Then

Theorem 6.6 provides very strong existence results for epitomizers that can be used to satisfy the corresponding hypothesis of proposition 3.5.^{28}

We can use proposition 3.5 to give cases where epitomizers do *not* exist: for example, let *I* be reliable learnability and let be the set of all learning criteria of delayed postdictive completeness;^{29} then and , which shows that there is no epitomizer of *I* with respect to (see [36] for the definitions; the result will be included in an extension of [36]).^{30}

The following theorem gives a number of general observations regarding the concepts introduced in definition 3.4.

### Proposition 3.6

Let be sets of operators, and *I* a learning criterion.

(a) Let *I* be -robust. Then, for all with ,

(b) If , then is a pre-order.

(c) Suppose that *I* is -robust. Then is downward closed under , i.e. for all and ,

### Proof.

Regarding (a): Let , as witnessed by and . The direction ‘⇒’ is obvious.

Suppose . We have (as witnessed by *Θ*); hence, as , .

Item (b) is obvious.

Regarding (c): Suppose witnesses . As , . Hence, from *I* being -robust and (a), we get .

Next is the central theorem of this section, showing that, for important sets of operators and certain learning criteria *I*, -completeness for *I* *characterizes* -epitomization of *I*.

### Theorem 3.7 (Fundamental epitomization theorem)

*Let* *be a set of operators containing the identity and which is closed under composition. Let I be a* *-robust learning criterion. Let* *. The following are equivalent.*

(a)  -epitomizes *I*.

(b)  -generates *I*.

(c)  is -complete for *I*.

### Proof.

‘(a)⇒(b)’: By proposition 2.1, there is a learning criterion *I*′ such that exactly all of are *I*′-learnable. We need to show that . As *I* is -robust, . To show the other direction, we first show that *I*′ is -robust. Any *I*′-learnable set is of the form where and as witnessed by . Pick such . Suppose as witnessed by . As is closed under composition, as witnessed by . Thus, . Therefore, *I*′ is -robust.

Now we show that . As contains the identity, . Thus, as -epitomizes *I*, .

‘(b)⇒(c)’: Let . As is an -generator for *I*, let and as witnessed by be such that . Now we have that is left-invertible on (by *Θ*). Thus, and , which shows that as desired.

‘(c)⇒(a)’: Let *I*′ be -robust. Suppose there is . From being -complete for , we get such that . As *I*′ is -robust and not *I*′-learnable, we have that is not *I*′-learnable; thus, () is not *I*′-learnable.

## 4. Connection to intrinsic complexity

We will now give the definitions of intrinsic complexity. Some version thereof was introduced in [13]; here we give an interpretation that fits our formalism. After that we will give theorems regarding the shortcomings of intrinsic complexity, as discussed in §1.

In particular, theorem 4.3 gives an example ≤^{GEx}-complete set of functions that is nonetheless learnable in much more restricted criteria. Then, in theorem 4.4, we give two natural learning criteria *I* for which the learnable sets are not downward closed with respect to ≤^{I}. For the two criteria, the cause of this failure of closure is different: in one case it is a local restriction on the conjectures, and in the other a memory restriction on the learner.

Finally, we show the equivalence of ≤^{GEx} with one of our reducibility notions in theorem 4.6.

### Definition 4.1

Let be a learning criterion and *g* a function. We say that a sequence *p* is *I-admissible* for *g* iff (*p*,*g*)∈*δ*.^{31}

Let , and an identification criterion *I* be given. We say that iff there exist recursive operators *Θ* and *Ψ* such that, for any function , and, for any *I*-admissible sequence *p* for *Θ*(*g*), *Ψ*(*p*) is an *I*-admissible sequence for *g*.

We say that a set is ≤^{I}-complete iff is ≤^{I}-complete for .

Note that, for any learning criterion *I*, the definition of ≤^{I} is not sensitive to the learner admissibility restriction or the sequence generating operator of *I*. We formalize this as follows.

### Remark 4.2

Let *I*_{0} and *I*_{1} be learning criteria with identical sequence acceptance criteria. Then ≤^{I0} and ≤^{I1} coincide.

### Theorem 4.3

*There is a* ≤^{GEx}*-complete set of functions that is nonetheless learnable with respect to the much more restricted criterion* **GPcpEx**; *in particular, the set of all computable functions of finite support introduced above is such a set.*

### Proof.

Completeness was proven in [13]; **GPcpEx**⊂**GEx** is a well-known result [9,12,35]; the rest is straightforward.

The following theorem shows the learnable sets of two learning criteria to be not downward closed with respect to their respective intrinsic reducibility notions.

### Theorem 4.4

(a) *There are sets and such that and , but .*

(b) *There are sets and such that and , but .*

### Proof.

Regarding (a): The intuitive idea of this proof is that ≤^{ItEx} does not depend on the severe memory limitation imposed by **It**. Clearly, ≤^{ItEx} equals the relation ≤^{GEx}. Let (the existence of such sets is well known [12]). As is ≤^{GEx}-complete, we have ; thus, . Clearly, , which concludes the proof of (a).

Regarding (b): The idea of this proof is that ≤^{GPcpEx} allows ‘looking into the future’ by the operators involved.

We use a set of functions from [37], corollary 15 (see also [9], p. 140 ff). Let be such that
4.1
Let be such that
4.2^{33}

We argue the following three claims, which will complete the proof.

— ;

— ;

— .

The last item has been shown in [37] (without the set of finite functions, already in [9]); the second is witnessed by a learner *h* as follows. On the empty data sequence, *h* may output anything. Thereafter, as long as all data seen by *h* have a 1 as second component, *h* outputs the first component of the first datum. Otherwise, *h* outputs a canonical program for an extension of the seen data with 〈0,0〉s.
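The learner *h* just described can be sketched as follows (a sketch under the assumptions that data points are presented in order as explicit pairs and that conjectures are Python values or callables rather than program codes):

```python
def h(data):
    """Sketch of the learner from the proof.
    data = [g(0), ..., g(n-1)], each datum a pair (first, second)."""
    if not data:
        return None                       # arbitrary initial conjecture
    if all(second == 1 for (_, second) in data):
        return data[0][0]                 # first component of the first datum
    # otherwise: a canonical program extending the seen data with (0,0)s
    def conjecture(x):
        return data[x] if x < len(data) else (0, 0)
    return conjecture
```

As long as the data are consistent with the first case, the conjecture never changes; a single violating datum switches *h* permanently into the second, finite-support-style mode.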

By s-m-n, there is such that ∀*e*:*φ*_{s(e)}=*π*_{1}°*φ*_{e}. Let *Ψ* be such that
4.3
To show that *Θ*,*Ψ* witness , it suffices to show that, for all and *p* **GPcpEx**-admissible for *Θ*(*g*), *Ψ*(*p*) is **GPcpEx**-admissible for *g*. Let and *p* be **GPcpEx**-admissible for *Θ*(*g*). Therefore, for all *x* and all *w*<*x*, *φ*_{p(x)}(*w*)=*Θ*(*g*)(*w*); thus,
4.4
This shows that *Ψ*(*p*) is a postdictively complete sequence of conjectures for *g*. Convergence of *Ψ*(*p*) to a correct index is straightforward.

In order to characterize the reducibility of intrinsic complexity in terms of our notions, we give the following definitions, extending notions from §3.

### Definition 4.5

Let and be sets of operators and .

Let .

For two sets and , we write iff there is an operator such that .

Furthermore, let be the set of all partial operators *Θ* such that there is a computable function with (4.5).^{34}

Now we show the equivalence of one particular reducibility notion of intrinsic complexity to one of our extended variants from definition 4.5. Note that a similar characterization can be made for **GBc**.

### Theorem 4.6

*We have
*

### Proof.

Let . Suppose as witnessed by and . Then *Θ* is as required for . Let be as given by . Let *Ψ* be such that
4.6
Clearly, *Ψ* is as required for .

For the converse, suppose as witnessed by *Θ* and *Ψ*. Then *Θ* is as required for . By s-m-n, there is an such that
4.7
Let be such that
4.8
Obviously, *s* is total. Furthermore, for all *e* with , from the choice of *Ψ* we have that *φ*_{f(e)} is total and converges to a program number for the pre-image of *φ*_{e} with respect to *Θ*. We let be as in (4.5). It is easy to see that is also as required.

We get the following corollary from theorem 4.6.

### Corollary 4.7

Let *I* be a learning criterion with *δ*_{I}=**Ex** as the sequence acceptance criterion. We have that is a subrelation of ≤^{GEx}; in particular, for all

### Proof.

Clearly, ≤^{I} and ≤^{GEx} are identical. Note that, for all sets of operators and with and we have that is a subrelation of . Further, for all sets of operators , it is easy to see that equals . Thus, is a subrelation of . The rest follows from theorem 4.6.

## 5. Robustness

In this section, we give a number of examples for robustness of learning criteria with respect to various sets of operators. These results are summarized in table 1.

### Example 5.1

Let contain all linlin functions. Let be closed under -composition and contain all linear time computable functions.

Table 1 states several kinds of robustness of learning criteria with respect to certain sets of operators. In particular, in each row, the learning criteria in the right column are robust with respect to the set of operators in the left column.

Note that each criterion in any given row just above could also be listed in any lower row.

The remainder of this section deals with proving the statement of example 5.1.

First, we define some properties for sequence acceptance criteria.

### Definition 5.2

Let be a set of effective operators. A sequence acceptance criterion *δ* is called -*robust* iff, for all and with ∀*e*:*Θ*(*φ*_{e})=*φ*_{s(e)},
5.1^{35}

### Definition 5.3

Let .^{36} A sequence acceptance criterion *δ* is called *delayable* iff
5.2^{37}

The following examples are straightforward.

### Example 5.4

**Ex** and **Bc** are delayable and -robust. **Pcp** is *not* delayable, whereas **Conv** is.
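The delayability intuition can be made concrete with a small finite check; all encodings below (explicit value tables standing in for programs, a fixed delay function) are illustrative assumptions, not the paper's formalism. Delaying a conjecture sequence preserves the **Ex**-style limit but can break postdictive completeness.

```python
# Toy illustration of delayability: delaying a conjecture sequence p via a
# non-decreasing 'unbounded' delay r preserves the final conjecture (Ex),
# but can destroy postdictive completeness (Pcp).

g = [3, 1, 4, 1, 5, 9]                 # a finite prefix standing in for g
progs = {0: [3, 0, 0, 0, 0, 0],        # toy 'programs': explicit value tables
         1: [3, 1, 0, 0, 0, 0],
         2: [3, 1, 4, 1, 5, 9]}
p = [0, 1, 2, 2, 2, 2]                 # conjectures of a learner on g
r = [0, 0, 1, 1, 2, 3]                 # non-decreasing delay with r(t) <= t

delayed = [p[r[x]] for x in range(len(g))]

def pcp_ok(q):
    """Postdictive completeness on g: conjecture x postdicts g[:x]."""
    return all(progs[q[x]][:x] == g[:x] for x in range(len(g)))
```

Here `delayed` still ends in the same final conjecture as `p`, yet fails postdictive completeness at the point where the delayed conjecture lags behind the data, matching the claim that **Pcp** is not delayable.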

Next, we make a side remark and use Pitt-style delaying tricks (see [40]) to show that, for learning criteria with **G** as a sequence generating operator and a delayable sequence acceptance criterion, all learners can be assumed to be linear time computable.

### Proposition 5.5

Let *δ* be delayable. Then
5.3

### Proof.

The inclusion ‘’ is trivial. We fix linear time computable functions *T* and *S*, shown to exist in [27], theorem 3.20, such that, for all *e* and *x*, if *φ*_{e}(*x*)↓, then, for all but finitely many *t*, *T*(*e*,*x*,*t*)=1; and, for all *e*,*x* and *t*, if *T*(*e*,*x*,*t*)=1, then *S*(*e*,*x*,*t*)=*φ*_{e}(*x*).

Let as witnessed by . Let *e* be such that *φ*_{e}=*h*. Let *f*,*h*′∈**LinF** be such that, for all *σ*,
5.4
and
5.5
It is easy to see that *h*′ is linear time computable and learns all functions learned by *h*.
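The delaying pattern of this proof can be sketched as follows. The `cost` oracle is a hypothetical stand-in for the step-counting machinery (*T* and *S*) of the proof, and actual running times are not modelled here, only the shape of the delay.

```python
# Sketch of the Pitt-style delaying trick: the fast learner h' outputs the
# conjecture of the slow learner h on the longest prefix that is 'affordable'
# within a budget growing with the input length, so h' converges to the same
# final conjecture as h (cost() is a hypothetical step-count oracle).

def make_delayed(h_slow, cost):
    def h_fast(sigma):
        budget = len(sigma)
        n = 0                              # longest affordable prefix length
        while n < len(sigma) and cost(sigma[:n + 1]) <= budget:
            n += 1
        return h_slow(sigma[:n])           # delayed, hence same limit
    return h_fast
```

Because the budget grows without bound with the input length, the affordable prefix length tends to infinity, so the delayed learner eventually outputs the same conjectures as the slow one, which is exactly where delayability of *δ* is needed.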

We now characterize -robustness of **Pcp** and **Conv** in the following two theorems.

### Theorem 5.6

*Let* *be a set of effective operators. We have*
5.6

### Proof.

We show the equivalence operator-wise. Suppose and *s* is any element with ∀*e*:*Θ*(*φ*_{e})=*φ*_{s(e)}.

‘⇐’: Suppose (*p*,*g*)∈**Pcp** and let . We have *φ*_{p(x)}[*x*]=*g*[*x*]. Thus, as ,
5.7

‘⇒’: Suppose and with *g*[*x*+1]=*g*′[*x*+1]. Suppose, by way of contradiction, *Θ*(*g*)(*x*)≠*Θ*(*g*′)(*x*). Let *e* and *e*′ be such that *φ*_{e}=*g* and *φ*_{e′}=*g*′. Let be such that
5.8
Then (*p*,*g*)∈**Pcp**. As
5.9
we get (*s*°*p*,*Θ*(*g*))∉**Pcp**.

### Theorem 5.7

*Let* *be a set of operators. We have*
5.10

### Proof.

We show the equivalence operator-wise. Let and with ∀*e*:*Θ*(*φ*_{e})=*φ*_{s(e)}.

‘⇐’: Suppose (*p*,*g*)∈**Conv** and let with *p*(*x*)≠*p*(*x*+1). Thus, *g*[*x*+1]≠*φ*_{p(x)}[*x*+1]. Hence, as ,
5.11

‘⇒’: Without loss of generality, suppose *s* is one-to-one. Let and *x* be such that *g*[*x*]=*g*′[*x*] and *g*(*x*)≠*g*′(*x*). Suppose, by way of contradiction, that *Θ*(*g*)[*x*+1]=*Θ*(*g*′)[*x*+1]. Let *e* and *e*′ be such that *φ*_{e}=*g* and *φ*_{e′}=*g*′. Let be such that
5.12
Then (*p*,*g*)∈**Conv**. We have *s*(*e*)≠*s*(*e*′), as *s* is one-to-one and *g*≠*g*′. As
5.13
we get (*s*°*p*,*Θ*(*g*))∉**Conv** (a mind change was made from *x* to *x*+1, even though the previous hypothesis (*s*°*p*)(*x*) was correct on known data *Θ*(*g*)[*x*+1]).

### Definition 5.8

— An operator *Θ* is called -bijective iff, for all and ,
5.14

— A sequence acceptance criterion *δ* is called *local bijection robust* iff, for all , *Θ* -bijective and *s* total (not necessarily computable) with ∀*e*:*Θ*(*φ*_{e})=*φ*_{s(e)},
5.15

As a corollary to the proofs of the directions ‘⇐’ of theorems 5.6 and 5.7, we see that **Pcp** and **Conv** are local bijection robust. Furthermore, local bijection robust criteria are closed under intersection.

### Proposition 5.9

Let . Then there is an *f*∈**LinF** such that, for all sequences *σ*, *f*(*σ*) is a sequence with
and, for all with , there is such that^{38}
^{39}

### Proof.

We argue informally. We view *Θ* as a recursive operator. Let *σ* be a sequence. We define *f*(*σ*) as follows. We let *Θ* compute *Θ*(*σ*)(*x*) for all for steps. Let *x*_{0} be maximal such that, for all *x*<*x*_{0}, this computation halted with output *z*_{x}. Then *f*(*σ*) is the sequence *z*_{0},…,*z*_{x0−1}. For all with , we let *r* be such that, for all *t*, *r*(*t*) is the *x*_{0} obtained while computing *f*(*g*[*t*]) as above. It is straightforward to check that *f* is as desired.
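The construction of *f* in this proof can be rendered as the following sketch, where `theta_steps(sigma, x, t)` is a hypothetical clocked interface to the operator: it returns the value if the computation of *Θ*(*σ*)(*x*) halts within *t* steps, and `None` otherwise.

```python
# Sketch of f: run the computation of Theta(sigma)(x) for a number of steps
# tied to |sigma|, and output the longest prefix z_0, ..., z_{x0-1} of values
# whose computations halted (x0 is the first x whose computation did not).

def make_f(theta_steps):
    def f(sigma):
        out, x = [], 0
        while True:
            z = theta_steps(sigma, x, len(sigma))
            if z is None:
                return out               # reached x0
            out.append(z)
            x += 1
    return f
```

On ever longer prefixes of a function on which *Θ* is total, the step budget grows, so the output prefixes exhaust *Θ*(*g*); the function *r* of the proposition records the cut-off *x*_{0} at each stage.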

After having introduced various properties of sequence acceptance criteria and established them for some examples, we are now ready to give our main theorem establishing robustness of learning criteria with respect to certain sets of operators. Example 5.1 provides corollaries to theorem 5.10.

### Theorem 5.10

*Let* *contain all linlin functions. Let* *be closed under* *-composition and contain all linear time computable functions. Further, let δ,δ′ be sequence acceptance criteria and* .

(a) *Suppose δ is* *-robust and delayable. Then* *is* *-robust.*
(b) *Suppose δ is local bijection robust. Then* *is* *-robust.*
(c) *Suppose δ is* *-robust. Then* *is* *-robust.*

### Proof.

Regarding (a): Let as witnessed by and let as witnessed by . Let be such that ∀*e*:*Θ*(*φ*_{e})=*φ*_{s(e)}. Let be as claimed existent for in proposition 5.9. Let be such that
5.16
It suffices to show that *h*′-learns , as we then get (a) by the application of proposition 5.5. Obviously, . Let . Let *p*=**G**(*h*,*g*). By the choice of *f*, for all *g*′ with total, there is such that . Let *r* be witnessing this for *Θ*(*g*) (as we have that is total). By the choice of we have . Thus,
By *δ* being -robust, we have (*s*°*p*,*Θ*(*g*))∈*δ*. As *δ* is delayable, we have (*s*°*p*°*r*,*Θ*(*g*))∈*δ* as desired.

Regarding (b): Let as witnessed by ; furthermore, let as witnessed by . Let *s* be linlin such that ∀*e*:*Θ*(*φ*_{e})=*φ*_{s(e)}. Let be as given by , that is, such that . In particular, . Let be such that
5.17
We show that *h*′-learns . Obviously, , as is closed under -composition. Let and *p*=**G**(*h*,*g*). By the choice of *f*,
5.18
similarly as in the proof of (a). By *δ* being local bijection robust, it suffices to show *Θ* to be {*g*}-bijective. The direction ‘⇒’ in (5.14) of {*g*}-bijectivity follows directly from , the other from being an inverse for *Θ* on *g*.

Regarding (c): Let as witnessed by ; furthermore, let as witnessed by . Let *s* be linlin such that ∀*e*:*Θ*(*φ*_{e})=*φ*_{s(e)}. Let be such that . Let be such that
5.19^{40}
The remainder of the proof is analogous to (a).

## 6. Specific function learning epitomizers

In this section, we present several epitomizers, starting with the very natural set of functions in theorem 6.1. After noting an immediate corollary, we show a self-*describing* set to be an epitomizer. Finally, we give a construction for self-*learning* sets and show them to be very powerful epitomizers in theorem 6.6, i.e. epitomizing with respect to a very wide range of learning criteria, much wider than for self-describing sets (which is in turn wider than for ).

Note that all proofs rely implicitly on the fundamental epitomization theorem (theorem 3.7), as they show completeness and derive epitomization from that.

Next, we show to epitomize various *particular* learning criteria with respect to various *sets* of learning criteria.

### Theorem 6.1

*We have*

(a) -*epitomizes* **GEx**;
(b) *does not* -*epitomize* **GEx**;
(c) -*epitomizes* **GPcpEx**;
(d) *does not* -*epitomize* **GPcpEx**.

### Proof.

We prove (a) at the end, since the proof of (c) gives valuable intuition for the proof of (a).

The claim of (b) follows from ; see [9,12,35].

The claim of (d) follows from the fact that is identifiable by a learner outputting only programs for total functions; the associated learning criterion, called *Popperian* identification, is strictly weaker than **GPcpEx** and -robust, as only subsets of uniformly recursive functions are learnable [11,41].

We prove (c). We use theorem 3.7 and show to be -complete for **GPcpEx**. Let as witnessed by *h*. Let *Θ* and be such that, for all and ,
6.1
and
6.2
^{41} Obviously, . Let and . We show . We have *Θ*(*g*)(0)≠0. Let *t*≤*x* be maximal such that *Θ*(*g*)(*t*)≠0. By the definition of *Θ* and the maximality of *t*, *Θ*(*g*)(*t*)=1+*h*(*g*[*x*+1]). Thanks to *h* learning *g* postdictively completely, we have
6.3
Furthermore, for all , as *h* converges on *g*, .

Regarding (a), we use a similar but more complex construction. The main difference is that we need to insert more information as long as the learner is not converged to ensure invertibility of the operator to be defined.

Let as witnessed by *h*. Without loss of generality, we assume *h* to be total [12]. Let *Θ*, , *Ψ*_{0}, *Ψ*_{1} and *c* be such that, for all ,
6.4
6.5
6.6
6.7
^{42} Obviously, . Let and . We show .

Let *g*′=*Θ*(*g*). Obviously, for each *j*∈{0,1}, *Ψ*_{j}(*g*′,*x*)↓.

*Case 1*: There is *t*≤*Φ*_{Ψ1(g′,x)}(*x*) with *t*≥*x* and *g*′(*t*)≠0.

Fix the minimal such *t*. Then the first case in the definition of does not hold, but the second case does. Therefore,
6.8

*Case 2*: ∀*t*≥*x*:*g*′(*t*)=0.

Then *h* on *g* is converged after *x* steps (by the definition of *Θ*) and *h*(*g*[*x*]) is a program for *g*. Thus, *φ*_{Ψ1(g,x)}(*x*)↓=*g*(*x*), which shows that the first and second cases in the definition of do not hold and .

*Case 3*: Otherwise.

As case 1 does not hold, for all *t* with *x*≤*t*≤*Φ*_{Ψ1(g′,x)}(*x*), *g*′(*t*)=0. As case 2 does not hold, *Φ*_{Ψ1(g′,x)}(*x*)↓. Let *t*=*Φ*_{Ψ1(g′,x)}(*x*). We have *g*′(*t*)=0. Let . Note that *Ψ*_{1}(*g*′,*t*)=*h*(*g*[*t*′])=*h*(*g*[*t*]).

From the definition of *Θ*, as *Θ*(*g*)(*t*)=0, *φ*_{h(g[t])}(*x*)=*g*(*x*). Hence, the first and second cases in the definition of do not hold and
6.9
This finishes the different cases. We also have, for all , as *h* converges on *g*, .

From theorem 6.1, we directly get the following corollary, showing an identity of power between learning criteria. The very short proof shows the power of our notions of epitomizers and robustness.

### Corollary 6.2

We have

### Proof.

Clearly, , the set of all computable functions with finite support, is **GPcpConvEx**-learnable. By example 5.1, **GPcpConvEx** is -robust. We obtain the result using theorem 6.1(c).
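As a concrete backdrop for the corollary, the standard learner for the computable functions of finite support can be sketched as follows; modelling programs as Python closures is an assumption of this sketch. Conjecturing 'the data seen so far, then zeros' is postdictively complete by construction, and keeping the previous program while it still fits makes the learner conservative.

```python
# Minimal sketch of a postdictively complete and conservative learner for
# the functions of finite support: the conjecture is always a program for
# 'the data so far, extended with zeros', and it is only changed when the
# previous conjecture contradicts the data seen so far.

def learner(sigma, prev=None):
    data = list(sigma)
    if prev is not None and all(prev(x) == data[x] for x in range(len(data))):
        return prev                      # conservative: no unforced change
    return lambda x: data[x] if x < len(data) else 0
```

Once all of the (finite) support has been seen, the conjecture is never contradicted again and hence never changes, so the learner also **Ex**-converges.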

Next we analyse a set of self-describing functions that was first given in [11] and used extensively in [12]. Theorem 6.3 shows that this set is not a very good epitomizer.

### Theorem 6.3

*Let* *. Then* *-epitomizes* **GFin***. However,* *does not* *-epitomize any learning criterion that can be built from components given in this study, as* *.*

### Proof.

We show to be -complete for **G****Fin** and apply theorem 3.7. Let as witnessed by (we can assume, without loss of generality, ). By the one-to-one ORT,^{43} there is a one-to-one such that
6.10
Let *Θ*, be such that
6.11
and
6.12
Clearly, as witnessed by . Furthermore, . This finishes the proof of the first assertion of the theorem.

We get by letting (a different) *Θ* be such that
6.13
We have , as, for all , *g*′(0) contains all the information to compute the pre-image of *g*′. Clearly, .

By the one-to-one ORT,^{44} there is a (different from above) one-to-one such that
6.14
Now we get by letting (a different) *Θ* be such that
6.15
We have , as, for all , *g*′(0) contains all the information to compute the pre-image of *g*′. Clearly, .

Next we analyse a set of self-describing functions that was used in [11] to show the separation of **GBc** and (a stronger version of) **GEx**. Theorem 6.4 shows that this set necessarily witnesses the separation of **GEx** and **GBc**, if any set does.

### Theorem 6.4

*Let* *. Then* *-epitomizes* **GBc***. However,* *does not* *-epitomize* **GBc***, as* *.*^{45}

### Proof.

We show to be -complete for **GBc** and apply theorem 3.7. Let as witnessed by *h*∈**LinF** (without loss of generality, we assume *h* to be linear time computable).

By linlin ORT, there is a linlin *s*∈**LL** such that
6.16
Let *Θ* be such that
6.17
Clearly, . We show . Let . We have, for all but finitely many *x* and all *y*,
6.18
6.19
6.20
6.21
Thus, . Hence, . This finishes the proof of the first assertion.

We show . Let and suppose, by way of contradiction, . Let with *g*≠*g*′. We have that *Θ*(*g*) and *Θ*(*g*′) are finite variants of one another; thus, for large enough *x*, *Θ*(*g*)(*x*)=*Θ*(*g*′)(*x*) is a program for both *Θ*(*g*) and *Θ*(*g*′). Therefore, *Θ*(*g*)=*Θ*(*g*′). Hence, *Θ* is not one-to-one on and thus not left-invertible on .

Theorem 6.6 gives examples for *self-learning* sets of functions, which turn out to be very strong epitomizers. In order to define these sets, we need the following notions of *computable robustness* and *data normality*.

### Definition 6.5

Let *I* be a learning criterion.

We call *I* *computably element-wise robust* iff *I* is element-wise robust in a constructive way, i.e. there is an effective operator such that, for all , *e*°*I*(*h*)⊆ *I*(*Ψ*(*h*,*e*)).

We call *I* *data normal* iff (a)–(c) below hold.

(a) There is such that
6.22^{46}

(b) There is a function such that
6.23^{47}

(c) There is an effective operator such that, for all , and
^{48}

### Theorem 6.6

*Let I be a computably element-wise robust learning criterion with* *(i.e. I does not impose global restrictions on the learner). Suppose I is data normal as witnessed by f and d. Let h*_{0} *be such that*
6.24
*Further, let* *.*^{49} *Then* *-epitomizes I.*

### Proof.

We show to be -complete for and apply theorem 3.7. Let as witnessed by .

Let *Ψ* be as given by *I* being computably element-wise robust. Let be as given by (c) of *I* being data normal. By linlin ORT, there is a strictly monotonic increasing *e*∈**LL** such that
6.25

It now suffices to show that , as . Suppose .

Let
6.26
We show, by induction on *i*,
6.27

This is clear for *i*=0 (both sides of the equality are ?). Let *i*>0, and suppose (inductively) . Let
6.28
Note that . We have
6.29
6.30
6.31
6.32
6.33
This concludes the induction proof of (6.27). Thus, *h*_{0} on any function makes the same conjectures as on *g*. By the choice of *Ψ* and , is an *I*-learner for ; thus, .

Theorem 6.6 provides epitomizing sets for the learning criteria *βδ* with *β* and *δ* as explicitly given in this study, and many more. Furthermore, of theorem 6.6 epitomizes with respect to all learning criteria that can be built from the example components given in this study (including the ones with learner admissibility restrictions), as they are all -robust. In particular, theorem 6.6 provides a stronger epitomizer for **GBc** than that of theorem 6.4.^{50}
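The self-learning idea behind *h*_{0} (each incoming datum is itself run as a program, and its output is the next conjecture) can be illustrated in a toy model. The table `phi`, standing in for an acceptable programming system, and the choice to feed the previously seen data to the executed program are assumptions of this sketch only.

```python
# Toy model of a self-learning learner: the last datum is interpreted as a
# program (looked up in the table phi) and run to obtain the conjecture; a
# self-learning set is then built so that this behaviour succeeds on it.

phi = {10: lambda sigma: 99,            # datum 10 'computes' conjecture 99
       11: lambda sigma: len(sigma)}    # datum 11 'computes' |sigma|

def h0(sigma):
    if not sigma:
        return None                      # '?' : no conjecture on empty data
    last = sigma[-1]
    return phi[last](sigma[:-1]) if last in phi else None  # run the datum
```

Note that such a learner delegates all of its work to the data: whatever constraints the criterion *I* imposes can be met by choosing the programs coded in the data appropriately, which is the source of the epitomizing power of self-learning sets.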

## Acknowledgements

Timo Kötzing was supported by the Deutsche Forschungsgemeinschaft (DFG) under grant NE 1182/5-1. The authors are grateful for the many suggestions of reviewers for the current version as well as for a previous (conference) version [26].

## Footnotes

One contribution of 18 to a Theme Issue ‘The foundations of computation, physics and mentality: the Turing legacy’.

↵1 Sometimes simply called inductive inference.

↵2 There were previous attempts by Church to give such a definition, but it was Turing's analysis and definition that ultimately convinced Gödel of the definitional correctness [2]. Nicely, too, Turing showed that his definition was constructively equivalent to Church's. In computer science terms, this means that Turing's formalism and Church's are intercompilable.

↵3 Myhill, at least in his later years, was not in favour of the idea that humans are algorithmic mechanisms.

↵4 For us, the natural numbers are the non-negative integers.

↵5 Gold's sequences to be learned represent the correct utterances of a language, and his successful conjectures for such are programs or grammars for the *set* of utterances, instead of, as above (and as in the present study), for the *sequence* itself.

↵6 *Ex* stands for *explanatory*.

↵7 Later in the study, after providing more formal definitions of this and other learning criteria, for uniformity of terminology, we change the term *Ex-learning* to *GEx-learning*. This is to show that this criterion is ‘Gold-like’ as in [6].

↵8 One could, for example, think of the programming system as one of Java, C, Turing machines, etc.

↵9 A (total) function has *finite support* iff only finitely many arguments have a function value other than 0.

↵10 Note that Freivalds *et al.* [13] called epitomizing sets *characteristic*. To the best of our knowledge, neither this term nor the concept caught on in the later literature—until now.

↵11 In the previous literature, a set was called *robustly* *I*-learnable iff, for *all recursive operators* [16] *Θ*, the set of total functions in is *I*-learnable [17,18]. The motivation behind such *past* notions of robustness was to eliminate self-referential examples. The motivation herein is *quite* different. Herein, as will be seen, it is very interesting to restrict which operators are to be considered, and to ask for *all* *I*-learnable tasks to be robust with respect to some possibly restricted set of operators.

↵13 The HORT permits infinitary self-and-other program reference between certain highly restricted complexity-bounded programming systems and a general-purpose programming system.

↵14 *f*(*x*) converges should not be confused with *f* converges *to*.

↵15 For a linear time computable and invertible example, see [27], §2.3.

↵16 We are grateful to Jim Royer for remarks providing an S-m-n theorem also linear time *invertible*.

↵17 Herein, our *β*s and *δ*s can essentially be modelled as multi-argument variants of Rogers' [16] *recursive operators*.

↵18 Note that ‘’ is the classical way to denote this, while we use ‘’ whenever , *β* and *δ* are not explicit.

↵21 Also called *consistent* learning.

↵22 Note that, without loss of generality, we can take *s* to be linlin (this can be proven using linlin s-m-n).

↵23 Intuitively, for all *g*,*x*, *Θ*(*g*)[*x*] depends only on *g*[*x*]. We call these operators *local*.

↵24 ‘inj’ stands for *injective*, but note that the operators from are not merely injective, but satisfy a stronger requirement that could be called *locally injective*.

↵25 Note that all effective operators satisfying (3.2) are in .

↵26 We call these operators *element-wise*.

↵27 Equation (3.6) is equivalent to . Thus, epitomizers give canonical candidates for learning criteria separations.

↵28 We give the following example for illustration. Let, for all , **Ex**^{a} be like **Ex**, but, for success, the final program is allowed to be incorrect on up to *a* places. Further, let **Ex*** be like **Ex**, but the final program is allowed to be incorrect on up to finitely many places. From [11] we know that, for all , **GEx**^{a}⊂**GEx**^{a+1}; hence, **GEx**^{a}⊂**GEx***. Theorem 6.6 provides an appropriate epitomizer for **GEx***, so that we can deduce, with proposition 3.5, (which was shown in [11]).

↵29 Intuitively, the definition of delayed postdictive completeness asks for the learner to start a *countdown* for each datum seen; as soon as the countdown has run out, the learner has to correctly postdict the associated datum.

↵30 We are grateful to an anonymous referee who pointed out a (different) example for the non-existence of epitomizers.

↵31 *I*-admissibility is not to be confused with learner admissibility restrictions.

↵32 An anonymous referee pointed out that an even more interesting result would be to find a set in **GPcpEx** that is ≤^{GPcpEx}-complete for the **GEx**-learnable sets; this way, the more fine-grained reduction ≤^{GPcpEx} would be shown to be too coarse.

↵33 Intuitively, *Θ* ‘looks into the future’, as *Θ*(*g*)(*x*) depends on *g*(*x*+1).

↵34 Note that the operators from resemble those from [38,39].

↵35 Note that, for robustness of learning criteria, we restricted ourselves to certain left-invertible operators, whereas, for robustness of sequence acceptance criteria, we make no such restriction.

↵36 Note that is equal to the set of all non-decreasing and unbounded total computable functions.

↵37 Intuitively, if *p* is a valid (with respect to *δ*) sequence of conjectures for *g*, then so is any delayed variant of *p*.

↵38 We gave the set in definition 5.3.

↵39 Note that, in *this* study, we do not use that *f* is linear time computable, but merely that it is computable.

↵40 Recall that the input to an iterative learner on function *g* in iteration *x* is the triple of *x*, *g*(*x*) and the prior conjecture.

↵41 By convention, and (0−1)↑.

↵42 Note that there are several ways in which *Φ*_{Ψ1(g,x)}(*x*) can be undefined, in which case makes an unbounded search.

↵43 Note that using the one-to-one parametric recursion theorem suffices.

↵44 Note that, again, using the one-to-one parametric recursion theorem suffices.

↵45 This uses the fundamental epitomization theorem (theorem 3.7).

↵46 Intuitively, the *i*th conjecture of *h* on *g* depends only on some information (as specified by *f*_{I}) about the first *i* data points and conjectures.

↵47 Intuitively, in the *i*th round the learner has access to the (*i*−1)st data item.

↵48 Intuitively, without loss of generality, the output of a learner based on no data equals ?.

↵49 Note that is a *self-learning set*. Roughly, the learner (*h*_{0} in theorem 6.6) defining such a set ( in theorem 6.6) just *runs* each input datum as coding a program to determine its corresponding conjecture to output [19,21]. Note that *h*_{0} is not total and hence not, for example, polynomial time computable. As noted in §1, [20] introduce a new tool (HORT) to begin handling some complexity-bounded examples of total learners.

↵50 The first author of the present paper, when he was co-creating [11], had the intuition that, for any criterion *I*, if , then the of theorem 6.4 above would witness that separation. Consider *I*=**TdBc**. Clearly, separates **GBc** from **TdBc**. However, the epitomizer of theorem 6.4 clearly *is* **TdBc**-learnable—disproving the present first author's old intuition. Nicely, though, from -robustness of **TdBc** (example 5.1), the epitomizer of theorem 6.6 *does* separate **GBc** from **TdBc**. An aside: it is argued in [42] that self-referential examples may portend more natural examples.

- This journal is © 2012 The Royal Society