Hard Choices in Scientific Inquiry
Oliver Schulte
Department of Philosophy
Carnegie Mellon University
December 12, 1997
Contents
1 Induction: The Problem and How To Solve It
1.1 The Problem of Induction
1.2 Hypothetical Imperatives for Inductive Inference
1.3 Results
1.3.1 Means-Ends Vindications of Traditional Proposals
1.3.2 Novel Solutions to Traditional Problems
1.3.3 New Questions and Answers
1.3.4 Analysis of Inductive Problems from Scientific Practice
1.3.5 Rational Choice in Games
1.4 Overview
2 A Model of Scientific Inquiry
2.1 Outline
2.2 A Model of Scientific Inquiry
2.3 Examples of Inductive Problems and Scientific Methods
2.3.1 Universal Generalizations
2.3.2 Almost Universal Generalizations
2.3.3 Goodman's Riddle of Induction
2.3.4 Identifying Limiting Relative Frequencies
2.3.5 Cognitive Science and the Physical Symbol System Hypothesis
2.3.6 Theories of Particle Physics
2.4 Inquiry, Belief and Action
2.5 Revising Background Knowledge
3 Truth, Content and Minimal Change
3.1 Outline
3.2 Dominance in Error and in Content
3.3 Dominance Principles and Minimal Theory Change
3.4 Is "Minimal Change" Belief Revision Minimal?
3.5 Empirical Inquiry and Belief Revision
3.6 Proofs
4 Discovery Problems and Reliable Solutions
4.1 Outline
4.2 Convergence to the Truth
4.3 Reliable Solutions for Discovery Problems
4.4 Testing and Topology
4.5 Against Method
4.6 Contra Convergence
4.7 There's No Reliable Method -- So What?
4.8 Proofs
5 Reliable Inference
5.1 Outline
5.2 Reliable Methods
5.3 Popper, Levi and Deductivism
5.3.1 A New and Improved Falsificationism
5.3.2 A Reliable Enterprise of Knowledge
5.4 Gettier meets Meno
5.5 Proofs
6 Fast and Steadfast Inquiry
6.1 Outline
6.2 Data-Minimal Methods
6.3 Retractions
6.4 Minimaxing Retractions
6.5 A Characterization of Discovery With Bounded Mind Changes
6.6 The Hierarchy of Cognitive Goals
6.7 Data-Minimality vs. Minimaxing Retractions
6.8 Proofs
7 Theory Discovery
7.1 Outline
7.2 Reliable Theory Inference
7.3 Uniform Theory Discovery
7.4 Piecemeal Theory Discovery
7.5 Countable Hypothesis Languages and Finite Axiomatizability
7.6 Proofs
8 Reliable Theory Discovery in Particle Physics
8.1 Outline
8.2 Elementary Particles and Reactions
8.3 Evidence in Particle Physics
8.4 What Particles Are There?
8.5 Identifying Subnuclear Reactions
8.6 Conservation Laws in Particle Physics
8.7 Inferring Conservation Theories Without Virtual Particles
8.8 Inferring Conservation Theories With Virtual Particles
8.9 Parsimony, Conservatism and the Number of Quantum Properties
8.10 Proofs
9 Admissibility in Games
9.1 Outline
9.2 Preliminaries
9.2.1 Extensive and Strategic Form Games
9.2.2 Restricted Game Trees
9.3 Admissibility in Games
9.4 Iterated Admissibility
9.5 Strict Dominance and Backward Induction
9.6 Weak Dominance and Forward Induction
9.7 Proofs
10 Conclusion
List of Figures
1.1 Categorical vs. Hypothetical Imperatives
1.2 Performance Standard for Inductive Methods = Evaluation Criterion + Cognitive Value
1.3 The Hierarchy of Cognitive Goals
2.1 Possible Worlds and Propositions
2.2 Data Stream ε
2.3 The Observations that may arise in a given World
2.4 Global Underdetermination of the question "Is there a black swan?"
2.5 Empirical Hypotheses and Entailment
2.6 An Inductive Method
2.7 A Universal Generalization
2.8 Almost Universal Generalizations with Finitely Many Exceptions
2.9 The New Riddle of Induction
2.10 The limit of the relative frequencies is 1/2.
2.11 How many particles are there?
3.1 Content vs. Error
3.2 Pareto-Minimal Theory Changes that avoid Additions and Retractions
3.3 Pareto-Minimal Theory Changes and the AGM Axioms
3.4 Three Notions of Minimal Theory Change
4.1 Successful Discovery: On data stream ε, method δ identifies the correct hypothesis from a set of alternatives.
4.2 Testing Empirical Hypotheses: Decision in the Limit of Inquiry
4.3 The Projection Set of a Discovery Method δ
4.4 Method δ verifies hypothesis H in the limit.
4.5 Data stream ε is a limit point of hypothesis H.
5.1 Decomposing Hypotheses Into Refutable Subsets.
5.2 The Bumping Pointer Method
5.3 Given the observations from data stream ε, a connectionist model N is the true theory of machine intelligence, but the production systems approach is never conclusively refuted along ε.
6.1 A data-minimal method must project its conjectures: δ′ dominates δ with respect to convergence time.
6.2 Method δ always projects its current conjecture and hence is data-minimal.
6.3 In Search of a Neutrino.
6.4 The Riddle of Induction.
6.5 Another Riddle of Induction.
6.6 "Feather" Structures characterize Discovery with Bounded Mind Changes. The figure illustrates 0-feathers and 1-feathers.
6.7 2-feathers and 3-feathers
6.8 Minimaxing Retractions requires waiting until time n.
6.9 The Hierarchy of Cognitive Goals.
7.1 Two Notions of Theory Discovery: (a) Uniform Theory Discovery (b) Piecemeal Theory Discovery
7.2 Method δ projects each of H1 and H2 along some data stream, but not both on the same data stream.
7.3 Method δ′ changes its overall theory three times, its conjecture about Hω twice, about H1 once.
8.1 A Particle World and the Particles in it.
8.2 The Evidence that may arise in a Particle World
8.3 Does reaction r occur?
8.4 A Set of Reactions, encoded as Vectors with associated Linear Equations.
8.5 Track of the Decay of a Pion into a Muon
9.1 Admissibility in a Game of Perfect Information. The label inside a node indicates which player is choosing at that node.
9.2 A Game Without Perfect Recall
9.3 Weak Admissibility
9.4 Order-Free Elimination of Weakly Dominated Strategies
9.5 A Game of Perfect Information
9.6 Subgame Perfection vs. Weak Admissibility
9.7 Backward vs. Forward Induction Principles
List of Tables
8.1 Some Elementary Particles
8.2 Quantum Number Assignments
Chapter 1
Induction: The Problem and How To Solve It
1.1 The Problem of Induction
Much of epistemology and the philosophy of science is a set of responses to the problem of induction: the observation that no matter how much evidence we have accumulated, the very next datum might refute our generalizations. This observation led Hume to the conclusion that, although we may be in the habit of preferring some inferences over others, one generalization is as good as another [Hume 1984]. Others maintain that although we may not be certain that any generalization is correct, still some inferences are better than others: there are some conclusions that an inquirer ought to draw, whether or not her psychology inclines her towards them. For centuries philosophers have sought to articulate principles that lead to the right inferences. Statisticians and researchers interested in algorithms for machine learning have joined in this enterprise.
What grounds can we give for the claim that some inferences are the right ones? Kant referred to rules for what one ought to do as imperatives, and drew a fundamental distinction between two kinds: categorical imperatives and hypothetical imperatives [Kant 1785]. Categorical imperatives are rules that constrain a person's choices no matter what her interests or abilities are. An example of a categorical imperative for epistemic agents, to which many philosophers subscribe, is that they ought to have consistent beliefs. Other examples are "postulates of rationality" such as the constraints on "rational" decision making that Savage stipulates [Savage 1954]. Hypothetical imperatives are rules of "sagacity" (Klugheit, as Kant said), which guide an agent as to how he should go about attaining a given goal. Thus hypothetical imperatives are of the form: "Action A will bring about consequence C. Therefore, if you want C (under the hypothesis that you want C), you should do A." Hypothetical imperatives form the core of instrumental, or means-ends, rationality: on this conception, rational agents are those who choose the best means for accomplishing their aims.
Students of inductive inference have offered both categorical and hypothetical imperatives for principles of induction. The main mode of justification for categorical imperatives is to defend them because they are "intuitively plausible", or because they agree with "exemplary scientific practice". If a difficult case turns up where the categorical imperative appears to give the wrong guidance, as judged by intuition, the champion of "pure rationality" adjusts his principles to accommodate the difficult case, hoping that one day he will reach "reflective equilibrium", a point at which his principles of "pure rationality" pass the tribunal of intuition.1
For a proponent of "pure inductive rationality", the basic unit of analysis is the methodological maxim. She applies the maxim to various inductive problems to see if it gives the "intuitively plausible" answer, tallying up "successes" and "failures" (e.g., [Earman 1992]). For means-ends rationality, the basic unit of analysis is the inductive or scientific problem. No general "principle of rationality" can beat the trivial advice "choose the best method for the problem at hand". The proponent of instrumental rationality casts a critical eye on allegedly general principles of inductive rationality: does the principle in question help the agent to achieve the aims of inquiry? Or does it prevent the agent from doing so? As William James said, "a rule that would prevent me from acknowledging certain kinds of truth if those kinds of truth were really there, would be an irrational rule" [James 1982, Sec. X, p. 206].2
Figure 1.1 summarizes the main contrasts between these two perspectives
on the problem of induction.
In philosophy, the tradition of confirmation theory seeks categorical imperatives for inductive inference (e.g., [Carnap 1962], [Glymour 1980]). The means-ends approach is the normative perspective underlying Classical Statistics and formal learning theory. Formal learning theory examines what inductive methods are reliable means for arriving at a true theory in the limit of inquiry. Hilary Putnam, one of the founders of formal learning theory, asked whether Carnap's confirmation functions were adequate to the goal of settling on true generalizations, and found them wanting [Putnam 1963]. Since then learning theory has flourished in philosophy and in computer science (see for example [Angluin and Smith 1983], [Glymour 1991], [Kelly and Glymour 1992], [Kelly et al. 1994]). Kevin Kelly's recent book, "The Logic of Reliable Inquiry", builds on learning theory to develop a comprehensive philosophy of science and empirical inquiry [Kelly 1995]. This thesis extends Kelly's work.

1 A locus classicus that outlines this project is [Goodman 1983, Section III.3]. The search for reflective equilibrium is common to all attempts at philosophical "explication". For example, John Rawls says that in constructing a procedural test of justice, we must check the formulations of the procedure by seeing whether the conclusions reached match "our considered judgments on due reflection" [Rawls 1996, III.1.4].
2 James's fellow pragmatist Peirce made the point even more colorfully: "The following motto deserves to be inscribed upon every wall of the city of philosophy: Do not block the path of inquiry." [Peirce 1958, 1:35]

Figure 1.1: Categorical vs. Hypothetical Imperatives
1.2 Hypothetical Imperatives for Inductive Inference
I apply learning-theoretic techniques to determine what methods are best for accomplishing given aims of inquiry in a given problem. I examine a variety of cognitive values that various writers have proposed as desiderata in inquiry. These are: content, truth, avoiding error, avoiding retractions (theory changes), convergence to the truth, fast convergence to the truth, and producing theories that are finitely axiomatizable. The next step is to define what it is for an investigative method to perform well with respect to a given cognitive value. For this I draw on two basic principles from decision theory: admissibility and the minimax principle. Combining an evaluation criterion with a cognitive value yields a performance standard for inductive methods; see Figure 1.2.
The theory of optimal inductive inference that I develop in this thesis answers three kinds of questions.
1. What standards of success are feasible in a given problem?
2. What methods are optimal for feasible standards of success in a given inductive problem?
Figure 1.2: Performance Standard for Inductive Methods = Evaluation Criterion + Cognitive Value
3. Are there systematic dependencies and conflicts between different epistemic values?
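As a toy illustration of how an evaluation criterion and a cognitive value combine into a performance standard, the sketch below scores two invented methods by worst-case (minimax) convergence time. Everything in it, including the two-stream universe and the method names `natural` and `cautious`, is my own illustrative assumption, not part of the thesis's formal apparatus.

```python
# Toy performance standard: evaluation criterion (minimax) + cognitive value
# (convergence time). The universe, methods and truth function are invented.

# Each "data stream" is a finite prefix standing in for an infinite sequence
# whose tail repeats its last entry forever.
streams = [
    ("green", "green", "green"),
    ("green", "blue", "blue"),
]

def truth(stream):
    # The correct hypothesis in a world: "all green" iff no blue ever appears.
    return "all green" if "blue" not in stream else "not all green"

def natural(prefix):
    # Generalize immediately: conjecture "all green" until a blue is seen.
    return "all green" if "blue" not in prefix else "not all green"

def cautious(prefix):
    # Refuse to generalize until three observations are in.
    if len(prefix) < 3:
        return None  # no conjecture yet
    return natural(prefix)

def convergence_time(method, stream):
    # First stage from which the method's conjecture is correct ever after.
    answer = truth(stream)
    conjectures = [method(stream[:n]) for n in range(len(stream) + 1)]
    for n in range(len(conjectures)):
        if all(c == answer for c in conjectures[n:]):
            return n
    return float("inf")

def minimax_time(method):
    # Evaluation criterion: worst case over all streams under consideration.
    return max(convergence_time(method, s) for s in streams)

print(minimax_time(natural))   # 2
print(minimax_time(cautious))  # 3
```

On this tiny problem the generalizing method beats the skeptical one under the minimax-convergence-time standard, which is the flavor of comparison the thesis carries out in full generality.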
1.3 Results
The fruits of the theory are rewarding. I group the results in five different categories.
1.3.1 Means-Ends Vindications of Traditional Proposals
I show that some traditional proposals for how scientific inference ought to proceed, usually defended on grounds of "intuitive plausibility", are in fact the optimal means for attaining certain cognitive goals. These proposals include the principle of always entailing the evidence, a skeptical attitude that will not generalize beyond the evidence, axioms for "minimal change" belief revision, and Popper's conjectures-and-refutations scheme. My results clarify the status of such norms: those who subscribe to the epistemic values implicit in the norm should follow it, whereas others may (and sometimes should) prefer different methods.
Theories of belief revision are linked with theories of conditionals and non-monotonic inference; currently, this cluster of topics is one of the most active areas of research in philosophical logic. My means-ends analysis leads to a principled new notion of minimal belief change that differs from the received axioms (the so-called "AGM" postulates from [Alchourrón et al. 1985]).
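For orientation, the four basic AGM revision postulates can be stated as follows, writing $K * \varphi$ for the revision of belief set $K$ by sentence $\varphi$, and $K + \varphi$ for the expansion $\mathrm{Cn}(K \cup \{\varphi\})$:

```latex
\begin{itemize}
  \item (Closure)   $K * \varphi = \mathrm{Cn}(K * \varphi)$
  \item (Success)   $\varphi \in K * \varphi$
  \item (Inclusion) $K * \varphi \subseteq K + \varphi$
  \item (Vacuity)   if $\neg\varphi \notin K$, then $K + \varphi \subseteq K * \varphi$
\end{itemize}
```

Chapter 3 examines how far these postulates deserve the label "minimal change" and how the means-ends notion departs from them.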
1.3.2 Novel Solutions to Traditional Problems
Some combinations of epistemic values are particularly interesting. Suppose that we follow learning theory in ranking reliable convergence to the truth first. Then add the desiderata of minimizing convergence time and avoiding retractions. We may think of the result as a standard of efficiency for asymptotically reliable methods. This efficiency criterion has particular intuitive appeal:
- In Goodman's Riddle of Induction, the only projection rule that is efficient in the sense described is the natural one. (On a sample of all green emeralds, project that all emeralds are green.)
- This criterion requires a variant of Occam's razor. To be precise, it requires that a scientist should not infer the existence of an entity that is observable in principle until she in fact observes it.
- In some circumstances, efficiency in the sense defined requires a particle theorist to posit the existence of unobservable particles. Thus there is a purely instrumental reason for introducing hidden entities.
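The first of these points, the efficiency of the natural projection rule, can be illustrated by counting retractions in a toy version of the Riddle. The finite emerald streams and the `grue`-style rules below are my invented sketch, not the thesis's formalization.

```python
# Toy Goodman's Riddle: emeralds are observed one at a time, and a world is
# determined by the stage (possibly never) at which observations switch from
# green to blue. A grue(k) rule predicts a switch at stage k; the natural
# rule projects "all green" until a blue emerald actually appears.

def natural(prefix):
    if "blue" in prefix:
        return f"green until {prefix.index('blue')}"
    return "all green"

def grue(k):
    def rule(prefix):
        if "blue" in prefix:
            return f"green until {prefix.index('blue')}"
        # Once more than k all-green observations are in, the predicted
        # switch at k is refuted and the rule must fall back to "all green".
        return "all green" if len(prefix) > k else f"green until {k}"
    return rule

def retractions(rule, stream):
    # Number of times the rule abandons a previously issued conjecture.
    conjectures = [rule(stream[:n]) for n in range(len(stream) + 1)]
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)

all_green = ("green",) * 6
switch_at_4 = ("green",) * 4 + ("blue", "blue")

print(retractions(natural, all_green))     # 0
print(retractions(natural, switch_at_4))   # 1
print(retractions(grue(2), all_green))     # 1
print(retractions(grue(2), switch_at_4))   # 2
```

The natural rule never retracts more than once in any world of this toy problem, whereas each grue-style rule risks an extra retraction, which is the intuition behind its inefficiency.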
It is no accident that the same combination of epistemic values underwrites the intuitive inferences in these problems: I show that Goodman's Riddle, Occam's Razor and the theory of reactions among elementary particles share a common topological structure. Thus learning theory discovers epistemically relevant common structure among problems that look very different to unguided intuition.
These results provide reliabilists such as Hans Reichenbach (cf. Section 2.3.4) with an answer to those critics, such as [Salmon 1967], who point out that convergence in the long run does not rule out any inference in the short run. Although this observation hardly reflects badly on the aim of arriving at the truth, it shows that long-run reliability by itself does not provide the guidance for what to believe in the short run that many writers concerned with the problem of induction have sought (e.g., [Earman 1992, Ch. 9]).
I propose a new answer for the reliabilist to Salmon's challenge: if we augment the reliabilist's "pragmatic vindication" (Reichenbach's term for a means-ends justification) of reliable inference rules by taking into account other epistemic values, such as avoiding retractions and minimizing convergence time, we obtain interesting, strong constraints on what to infer in the short run. Indeed, that move gives the long-run perspective the resources to solve famous short-run puzzles such as Goodman's Riddle of Induction, puzzles that have eluded theories of induction framed from a short-run perspective.
1.3.3 New Questions and Answers
I define a set of notions of efficiency for reliable methods by combining the decision-theoretic evaluation criteria with various epistemic goals. Considering the variety of cognitive values that I examine, one may fear that the resulting theory will be a complicated catalog of means-ends analyses with no systematic relationships among different epistemic aims. On the contrary, the cognitive goals fall into a tidy feasibility hierarchy. For example, for any inductive problem in which it is possible to minimax convergence time, it is possible to attain any of the other standards of efficiency. In that sense, minimaxing convergence time is the least feasible, or most stringent, of the efficiency criteria in the hierarchy. On the other hand, applying the admissibility criterion to evaluate reliable methods with respect to convergence time yields the most feasible, or least stringent, standard of efficiency. Figure 1.3 shows the hierarchy of cognitive goals; Chapter 6 explains in detail the relationships that the figure illustrates.
The hierarchy of cognitive goals exhibits systematic dependencies among cognitive values. I also examine systematic conflicts among them. Speaking roughly and generally, the "skeptical" aims of avoiding error and avoiding retractions form a group that conflicts with the "realist" goals of converging to the truth, providing content and minimizing convergence time. The former values pull inquiry away from "inductive risks" whereas the latter require "bold generalizations". However, the relationships among these cognitive values are more subtle than this simple dichotomy. Often minimaxing retractions does conflict with minimizing convergence time, and I provide an exact characterization of the extent of the conflict. But sometimes an agent can have it all, epistemically speaking: she can reliably converge to the right answer, without unnecessary delays and without unnecessarily many retractions. In these cases the combination of reliability with minimizing convergence time and retractions has particular intuitive appeal. Cases of this kind are Goodman's Riddle, Occam's Razor and theories of particle reactions.

Figure 1.3: The Hierarchy of Cognitive Goals

In sum, systematic means-ends analysis describes precisely subtle dependencies and conflicts among epistemic aims that are new to epistemology and the theory of inductive inference.
1.3.4 Analysis of Inductive Problems from Scientific Practice
One of the goals of the philosophy of science is to illuminate methodological problems that arise in scientific practice. The two main problems of particle physics are (1) to find the set of elementary particles, and (2) to describe the possible reactions among elementary particles [Omnes 1971]. I analyze these problems from a learning-theoretic perspective. Without further background assumptions, there is no method for theorizing about the set of elementary particles that is guaranteed to arrive at the right answer, even in the limit of inquiry. In practice it is common to assume that there are only finitely many elementary particles. With that assumption, learning theory gives a positive result: there is a method for generalizing from laboratory data to the existence of elementary particles that reliably converges to positing the true set of elementary particles.
As for problem (2), I show that without the assumption that particle reactions can be described with conservation principles (again, a common assumption in practice), or another assumption of similar strength, there is no reliable method for theorizing about which reactions are possible. But given this assumption, there is an efficient algorithm for reliably identifying the set of possible reactions, that is, an algorithm that minimizes convergence time and retractions. Moreover, there are situations, which I characterize precisely, in which such a reliable efficient method has to introduce hidden particles to avoid taking back its conjectures.
These results give a precise sense in which important elements of particle physics (conservation laws and hidden particles) serve goals of inquiry.
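The linear-algebraic flavor of the conservation-law setting can be sketched in miniature. In this toy version (the three particle types and two reactions are invented for illustration), a reaction is encoded as an integer vector of net particle counts, with inputs negative and outputs positive, and a candidate quantum-number assignment q is conserved in the observed data just in case q · r = 0 for every observed reaction vector r, i.e. q lies in the null space of the reaction matrix.

```python
# Toy conservation-law inference. A reaction among particle types (a, b, c)
# is a vector of net counts: inputs count negative, outputs positive. A
# quantum-number assignment q (one integer per particle type) is conserved
# in reaction r iff the dot product q . r is zero. All reactions and names
# here are invented for illustration.
from itertools import product

observed_reactions = [
    (-1, 1, 1),   # a -> b + c
    (-2, 2, 2),   # a + a -> 2b + 2c
]

def conserved(q, reactions):
    return all(sum(qi * ri for qi, ri in zip(q, r)) == 0 for r in reactions)

# Brute-force search for small nonzero integer assignments conserved by
# every observed reaction: the null space of the reaction matrix,
# restricted to a small integer box. Here the constraint is q_a = q_b + q_c.
laws = [q for q in product(range(-2, 3), repeat=3)
        if conserved(q, observed_reactions) and any(q)]
print(laws)
```

A theory of this kind then classifies an unobserved reaction as possible only if it conserves every inferred quantity, which is how conservation principles generalize beyond the reactions actually seen.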
1.3.5 Rational Choice in Games
I rely on the well-known decision-theoretic principle of admissibility to evaluate inductive strategies. Several of my results show that for a given epistemic value, a method is admissible with respect to that value just in case it is admissible at each stage of inquiry. This connection between admissibility in the scientific "game against nature" overall and admissibility at stages of inquiry is a fundamental reason why the admissibility principle yields short-run constraints on inductive methods. I show that this connection reflects a fundamental fact about sequential games in general, not just "games against nature" (which have a rather special form): in just about any sequential game, a strategy is admissible just in case it is admissible at each stage of the game. This basic theorem of game theory suggests using the admissibility criterion to derive a powerful procedure for making predictions about strategic interactions in general, with applications in the foundations of microeconomics and political philosophy. I prove that this procedure has several attractive properties as a solution concept for games.
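The admissibility principle in strategic form can be illustrated with a minimal sketch; the 2x2 payoff matrix below is an invented example, not a game from the thesis.

```python
# Admissibility in a two-player strategic-form game: a strategy is
# inadmissible if it is weakly dominated, i.e. some alternative does at
# least as well against every opponent strategy and strictly better
# against at least one. The payoff matrix is an invented example.

# payoffs[(row, col)] = (row player's payoff, column player's payoff)
payoffs = {
    ("T", "L"): (1, 1), ("T", "R"): (1, 0),
    ("B", "L"): (1, 1), ("B", "R"): (0, 0),
}
rows, cols = ["T", "B"], ["L", "R"]

def weakly_dominated(s, own_strategies, opponents, payoff_of):
    return any(
        all(payoff_of(t, o) >= payoff_of(s, o) for o in opponents)
        and any(payoff_of(t, o) > payoff_of(s, o) for o in opponents)
        for t in own_strategies if t != s
    )

def row_pay(r, c):
    return payoffs[(r, c)][0]

admissible_rows = [r for r in rows
                   if not weakly_dominated(r, rows, cols, row_pay)]
print(admissible_rows)   # ['T']: B is weakly dominated by T
```

Here B ties T against L but does strictly worse against R, so admissibility rules B out even though B is not strictly dominated: the kind of discrimination the solution concept in Chapter 9 builds on.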
1.4 Overview
Chapter 2 presents the model of empirical inquiry that I will use throughout this dissertation to formulate and analyze inductive problems. I describe several prominent examples of inductive problems to illustrate the model, and discuss some of the epistemological assumptions that are (or are not) implicit in the model.
Chapter 3 begins the systematic investigation of optimality criteria for inductive methods, starting with content, avoiding error and avoiding mind changes. This chapter includes a critique of common proposals for "minimal theory change" and suggests an alternative.
Chapter 4 characterizes the conditions under which inductive problems have reliable solutions, that is, methods that are guaranteed to arrive at a correct answer to the problem in the long run. I describe the differences between reliable convergence in the long run as an ideal of success in inquiry and other proposals, and defend the importance of long-run convergence against various objections.
Chapter 5 describes the structure of reliable methods. I prove a normal form theorem that shows that (virtually) all reliable methods can be constructed in a certain way. This insight has applications to problems in Popper's and Levi's epistemologies. I use the normal form theorem to raise and answer an infinitary version of Gettier's paradox for Plato's account of knowledge as stable true belief.
Chapter 6 is the core chapter of this thesis. I define a set of six efficiency criteria for reliable inference, and characterize what methods are efficient in the respective senses. These results yield a solution to Goodman's Riddle of Induction, and a vindication of a variant of Occam's Razor. I show that the six efficiency criteria fall into an exact hierarchy of feasibility.
Chapter 7 extends the theory of efficiency for reliable inquiry to the task of finding correct theories for a range of logically independent phenomena under investigation. (The previous chapters examine the problem of identifying the true hypothesis from a range of mutually exclusive alternatives.)
Chapter 8 applies the machinery from Chapter 7 to analyze inductive problems that arise in particle physics. I show that conservation principles and hidden particles (or similar theoretical ingredients) are necessary for reliable, efficient inquiry. A by-product of the analysis is a striking result about the structure of conservation principles in particle physics: roughly, under conservation of energy, there cannot be more conservation principles than stable particles.
Chapter 9 proves a fundamental theorem in game theory: roughly, that a strategy is admissible for a sequential game if and only if it is admissible at each stage of the game. Several of the characterizations of admissible methods from the previous chapters depend on this fact. I apply this insight to derive a solution concept for games in general based on the admissibility principle. I establish various attractive properties of this solution concept, and develop some of the implications for the theory of rational choice in games and the foundations of microeconomics.
Each chapter begins with an outline that summarizes the main questions that I tackle in the chapter, and describes the answers. The body of the chapter gives the details, with examples, diagrams and definitions. I state and explain formal results, and outline informal but precise arguments for why they are true. The last section of each chapter contains the formal proofs.
Chapter 2
A Model of Scientific Inquiry
2.1 Outline
Isaac Levi writes:
Scientific inquiry, like other forms of human deliberation, is goal-directed activity. Consequently, an adequate conception of the goal or goals of scientific inquiry ought to shed light on the difference between valid and invalid inference; for valid inferences are good strategies designed to attain these objectives [Levi 1967, preface].
This passage suggests a three-step program for the study of scientific method.
1. Develop an adequate conception of scientific inquiry.
2. Develop an adequate conception of the goals of scientific inquiry.
3. Investigate what strategies for inquiry attain these goals.
These steps have counterparts in the way that learning theorists treat problems of induction.
1. Specify a model of the learning situation: what kind of data are available to the learner, and what kind of outputs does the learner produce?
2. Specify a criterion of success: for example, to reliably converge to the truth in the Putnam-Gold paradigm, or to produce an approximately correct hypothesis, fairly quickly and with high probability.1
3. Investigate when and how it is possible to achieve the specified kind of success.

1 This is known as the PAC paradigm; see Section 4.6.
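The three learning-theoretic steps can be made concrete in a minimal sketch. The toy problem below (identify, in the limit, the eventual constant value of a data stream assumed to stabilize) is my own illustration of the Putnam-Gold success criterion, not an example from the thesis.

```python
# Step 1 (model): data are numbers arriving one at a time; the background
# assumption is that every stream is eventually constant. The learner
# outputs a conjecture about the eventual constant after each datum.
# Step 2 (success): identification in the limit: from some point on, the
# conjecture is correct and never changes again.
# Step 3 (analysis): the "follow the last datum" method succeeds on every
# stream satisfying the background assumption.

def last_datum_method(prefix):
    return prefix[-1] if prefix else None

def identifies_in_the_limit(method, prefix, true_limit):
    # On a finite prefix that has already reached its constant tail, check
    # that the method has stabilized on the true answer.
    conjectures = [method(prefix[:n]) for n in range(1, len(prefix) + 1)]
    return any(all(c == true_limit for c in conjectures[n:])
               for n in range(len(conjectures)))

stream = [3, 1, 4, 4, 4, 4]   # a world whose observations settle on 4
print(identifies_in_the_limit(last_datum_method, stream, 4))   # True
```

The method may change its mind arbitrarily often early on; what the success criterion demands is only eventual stabilization on the truth, which is the sense of reliability studied throughout the thesis.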
Philosophers have spent a good deal of effort in weighing the virtues of one
conception of scientific inquiry and its aims against another. They have been
less thorough with the third step: to determine the best scientific methods for a
given conception of science and its aims. Learning theorists, on the other hand,
present such means-end analyses within a number of learning models. They
are not so concerned with finding "the adequate conception" of science, or induction.
What matters is whether a model captures some important aspects
of empirical inquiry. Different models describe different types of inquiry. For
example, we may study various scenarios for how the scientist gathers evidence:
by passive observation only, through experiments [Kelly 1995, Ch.14], or with
random sampling. Similarly, learning theorists investigate a number of standards
of success for inquiry, corresponding to different cognitive objectives and
the resources that the inquirer has for achieving them.
The approach of this thesis is in the spirit of learning theory. My project
is to carry out a means-ends analysis for a variety of proposals about the goals
of inquiry, all of which are interesting, but none of which I take to be the only
adequate one. I describe a model of the scientist's situation (step 1), but I do
not claim that it is the only adequate description. Rather, this model has three
virtues that recommend it as a basis for my project: (1) it is conceptually and
formally relatively simple, (2) it allows me to formulate a number of problems of
induction clearly, and (3) it fits certain parts of scientific practice (drawn from
particle physics) in a natural way. The model is Kelly's extension [Kelly 1995,
Ch.3] of the Putnam-Gold framework. This chapter describes Kelly's model,
and illustrates how useful and flexible it is for formulating inductive problems
precisely. I conclude with a discussion of how my conception of scientific inquiry
relates to other proposals in the philosophical literature.
2.2 A Model of Scientific Inquiry
Empirical inquiry begins with uncertainty about the world. A possible world
w completely specifies the facts of relevance to the inquirer. For example, if the
researcher is interested in the colour of swans, a possible world determines the
colour of each swan in that world. The set of possible worlds W comprises
all possible descriptions of the way things (those things that the researcher is
interested in) might be.
At the beginning of inquiry, the inquirer is uncertain about which of the
possible worlds is the actual one. I identify a proposition P with a set of
possible worlds (namely those at which the proposition is true). Figure 2.1
illustrates these notions.
A scientist collects evidence to find out whether one or more propositions
of interest, called hypotheses, are true in the actual world. As inquiry proceeds, one
Figure 2.1: Possible Worlds and Propositions
piece of evidence after another is collected; if inquiry continues indefinitely, an
infinite sequence ε₁, ε₂, ..., εₙ, ... of evidence items is obtained. I refer to such
sequences as data streams. Figure 2.2 shows a generic data stream.
The scientist may be uncertain about what will be observed in a given world.
In a world in which all swans are black, she may suppose that only black swans
will be reported; in a world in which one swan is black and all others white, she
may expect to observe a black swan exactly once (see Figure 2.3).
I represent the inquirer's beliefs about what may be observed in a given world
by a relation Gen(ε, w) (read: "in world w, data stream ε may be generated in
inquiry").² The set of evidence items E contains all pieces of evidence that
appear at some time in some world; formally, E = ⋃_{w∈W} {range(ε) : Gen(ε, w)
holds}, where range(ε) is the set of evidence items that occur along data stream
ε. A data stream is an infinite sequence of discrete observations drawn from
E; I denote the set of all data streams by E^ω. The empirical content of a
proposition P is the set of all data streams that the scientist regards as consistent
with P; formally, the empirical content of P is the set of data streams
⋃_{w∈P} {ε : Gen(ε, w)}.
If two different possible worlds can generate the same data stream (that is, if
Gen(·, ·) is not a function), even the total infinite amount of evidence may not
settle the hypotheses under investigation. For example, if the scientist considers
it possible that a black swan may exist without ever being found, he concedes
that only white swans may be observed even if there in fact is a black swan. If
this is the case, philosophers say that the hypothesis "there is a black swan" is
globally underdetermined (see Figure 2.4).
Global underdetermination gives rise to a number of interesting issues in
methodology (cf. [Kelly 1995, Ch.13, Ch.15]). But for studying how the ends
of inquiry control the legitimacy of inferences, it is better to avoid the complications
that result when the possibility of global underdetermination is raised.
If we make the simplifying assumption that global underdetermination does not
arise (such that, for example, a black swan exists if and only if some black
swan is observed), we may consider the empirical content of the hypotheses under
investigation instead of the hypotheses directly. Accordingly, I assume that
the hypotheses under investigation are given as empirical hypotheses, that
is, as sets of data streams.³ In general, an empirical proposition is a set of
data streams, and so is an empirical theory (which we may think of as the
conjunction, i.e. intersection, of a set of empirical propositions). I usually denote
an empirical hypothesis by H, and a collection of empirical hypotheses by ℋ.
²This relation specifies what [Kelly 1995, Ch.3] calls "the data protocol".
³Even if we grant the possibility of global underdetermination, investigating the empirical
content of hypotheses instead of the hypotheses directly is justified if the scientist is satisfied
with an empirically adequate theory. For example, the hypothesis that all swans are white is
false in the world in which there is a black swan, but nonetheless empirically adequate if no
black swan is ever observed. [Van Fraassen 1980] makes a case that the goal of science should
be construed as finding empirically adequate theories, and nothing more.
Figure 2.2: Data Stream ε
Figure 2.3: The Observations that may arise in a given World
Figure 2.4: Global Underdetermination of the question "Is there a black swan?"
An empirical hypothesis H is true (or: correct) on an infinite data sequence ε if
ε ∈ H. The complement (or negation) of an empirical hypothesis H is the set
of data streams on which H is false, that is, E^ω − H. I denote the complement of
H by H̄. An empirical proposition P entails another empirical proposition P′
just in case P′ is true whenever P is; formally, P ⊨ P′ ⟺ P ⊆ P′. Similarly,
two empirical propositions P₁ and P₂ entail P′ if P′ is true whenever P₁ and
P₂ are; hence I define P₁, P₂ ⊨ P′ ⟺ P₁ ∩ P₂ ⊆ P′. An empirical proposition
P is consistent with P′ if on some data stream, P and P′ are both true, that
is, if P ∩ P′ ≠ ∅. I say that a finite data sequence e is consistent with a data
stream ε just in case ε extends e (written ε ⊃ e). The concatenation of two
finite data sequences e and e′ is denoted by e∗e′; similarly I write e∗ε and e∗x
for the concatenation of e with a data stream ε, and with a datum x ∈ E. A
finite data sequence e corresponds to the empirical proposition that some data
stream consistent with e is the one obtained in inquiry. I write this proposition
as [e]; so [e] = {ε : ε ⊃ e}. The length of a finite data sequence e is the number
of items in e, and is denoted by lh(e). The finite initial sequence of a data
stream ε of length n, namely ε₁, ε₂, ..., εₙ, is written as ε|n. Figures 2.2 and
2.5 illustrate these notions.
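The finite-sequence notation is easy to render concretely. In the following sketch (my own illustration, not part of the thesis), a finite data sequence is a Python tuple and a data stream is a function from times 1, 2, ... to evidence items:

```python
def lh(e):
    """lh(e): the length of a finite data sequence e."""
    return len(e)

def cat(e, e2):
    """The concatenation e * e' of two finite data sequences."""
    return tuple(e) + tuple(e2)

def initial_segment(stream, n):
    """epsilon|n: the first n items of a data stream, given as a function
    from times 1, 2, ... to evidence items."""
    return tuple(stream(k) for k in range(1, n + 1))

def extends(stream, e):
    """Test whether the data stream extends e, i.e. whether it lies in the
    proposition [e] = {epsilon : epsilon extends e}."""
    return initial_segment(stream, len(e)) == tuple(e)
```

For instance, the everywhere-0 stream `lambda k: 0` lies in [(0, 0)] but not in [(0, 1)].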
A scientific, or inductive, method δ takes as input a finite data sequence,
and produces an empirical theory; see Figure 2.6.⁴
This definition does not allow agents to distinguish between "serious possibilities",
"epistemically necessary" propositions and plain "belief", or "background
assumptions" and "conjectures". In Section 2.4 I discuss different epistemic attitudes
that a scientist may have towards a proposition. Eventually I shall
adopt a model of inquiry in which the scientist's background knowledge, or her
"standard of serious possibility", remains essentially the same through time (see
Section 2.5). But before we engage the general and abstract philosophical issues
involved in the conception of belief, it is best to complete the exposition of my
basic model of scientific inquiry by going through a number of examples. To
accommodate the scientist's standard of "serious possibility", I split the scientist's
theory into two parts, K and T, where T must entail K. We may think of K
as the scientist's background knowledge and of T as the scientist's conjecture.
In Section 2.4, I discuss how one may interpret this kind of theory in terms of
the scientist's epistemic attitudes.
For the formal definition of such a scientific method, let E* denote the set of
finite sequences of evidence items drawn from E. A scientific method δ assigns
to each finite data sequence e ∈ E* a pair of empirical theories (K, T) that
⁴Learning theory is flexible about what exactly it is that empirical methods produce.
Taking the outputs of an empirical method to be an empirical theory is a simple yet widely
applicable model that serves my purposes best. However, the outputs of a method could also
be theories containing theoretical terms (in Chapter 8, scientific methods introduce hidden
particles), a scientist's "practices" [Kitcher 1993, Ch.3], or a grammar for a language, as in
applications of learning theory to modelling language learning (e.g., [Osherson et al. 1986]).
See also Section 2.4 below.
Figure 2.5: Empirical Hypotheses and Entailment
Figure 2.6: An Inductive Method
satisfy T ⊨ K.
2.3 Examples of Inductive Problems and Scientific Methods
The main purpose of the following examples is to show how the concepts just
presented apply to familiar problems of empirical inquiry. These examples will
serve as illustrations of methodological points throughout this thesis. Another
purpose is to introduce the concept of a reliable method by way of examples.
2.3.1 Universal Generalizations
Suppose we are interested in a universal generalization such as "all swans are
white". A hypothetical ornithologist may investigate this hypothesis by examining
one swan after another. Let us assume that the ornithologist divides the
colour spectrum into discrete colours (such as white, black, gray, blue, green)
numbered 0, 1, 2, .... Then the evidence items are reports of the form "this swan
has colour n", which we may simply encode by the number n; let 0 encode "this
swan is white" and 1 "this swan is black". A data stream is a sequence of natural
numbers, representing the observed colours. If no global underdetermination
obtains, the hypothesis that all swans are white is true just in case all observed
swans are white; that is, just in case the data stream produced in inquiry is an
infinite sequence of 0s. The empirical content of the hypothesis H, "all swans
are white", is the singleton containing the everywhere-0 data stream; formally,
H = {0^ω}. The complement H̄ of H, "not all swans are white", is the set of all
data streams with some observation other than 0; formally, H̄ = {ε ∈ ℕ^ω : for
some time k, εₖ ≠ 0}. The scientist may initially be convinced that only white
and black swans will be observed. That is, his background knowledge K is the
set {0,1}^ω of all data streams featuring 0s and 1s. Figure 2.7 illustrates this
situation.
His initial conjecture, before seeing any data, might be that all swans are
white. In my notation, the scientist's first theory is δ(∅) = (K, H). Assuming
Figure 2.7: A Universal Generalization
that only white and black swans are observed, a complete rule for inductive
inference in this problem is the following.
1. Expand background knowledge by adding the data to the original assumption
that only white and black swans will be observed.
2. If all observed swans are white, conjecture "all swans are white";
3. otherwise, let the current conjecture be just the current background knowledge
(which by clause 1 entails that not all swans are white).
This procedure is defined as follows for all finite data sequences e consistent
with K.
1. If all observed swans are white, that is, if range(e) = {0}, then δ(e) = (K ∩ [e], H).
2. Else if some black swan appears along e, that is, if 1 ∈ range(e), then δ(e) = (K ∩ [e], K ∩ [e]).
This method has the property that it eventually settles on the correct truth-value
for H on all data streams consistent with background knowledge K. If H
is true, that is, if all swans are white, δ makes the right conjecture from the start. If
H is false, eventually a black swan is observed, and δ becomes certain that H is
false. When a method is guaranteed to eventually settle on the right answer no
matter what the right answer is, I say that the method is reliable. Although
a reliable method is guaranteed to converge to the truth, we may never know
when the method has done so. In the current example, if all swans are white, δ
converges to the right answer, but at any time δ might change its mind if a black
swan appears. Another way of putting the matter is that a reliable method will
eventually give the right answer, but not with certainty.
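As a concrete illustration (my own sketch, not part of the thesis), the reliable method just described can be written as a short program, with evidence encoded as above (0 = white swan, 1 = black swan) and theories returned as symbolic labels:

```python
def delta(e):
    """The swan method: map a finite data sequence e to a pair
    (background knowledge, conjecture), both given as labels.
    Background knowledge is K = {0,1}^omega updated with the data."""
    K = "K & [e]"
    if all(x == 0 for x in e):      # only white swans observed so far
        return (K, "all swans are white")
    return (K, "not all swans are white")

print(delta(()))         # initial conjecture: all swans are white
print(delta((0, 0, 0)))  # still projects the generalization
print(delta((0, 1, 0)))  # a black swan settles the question for good
```

Once a 1 appears anywhere in the data, no extension of the sequence can restore the conjecture H, mirroring the fact that δ becomes certain that H is false.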
2.3.2 Almost Universal Generalizations
Consider again the birds from the previous example, but this time with hypotheses
that allow finitely many exceptions to a rule about the colour of swans. The
hypothesis "almost all swans are black" is true if only finitely many non-black
swans are observed, and similarly "almost all swans are white" is true if only
finitely many non-white swans are observed (see Figure 2.8).
In terms of our previous encoding, "almost all swans are white" is correct
on a data stream ε if ε stabilizes to 0, and "almost all swans are black" is true on ε if ε
stabilizes to 1. An interesting rule for inductive inference in this problem is to
conjecture that all future swans have the same colour as the last one.
1. Begin with background knowledge K = {0,1}^ω as in the previous example,
and conjecture that all swans are white.
Figure 2.8: Almost Universal Generalizations with Finitely Many Exceptions
2. Expand background knowledge by adding the data to the original assumption
that only white and black swans will be observed.
3. If the last of n observed swans is white, conjecture "almost all swans are white";
4. otherwise, conjecture "almost all swans are black".
In our formal notation, we may render this inference rule as follows. Let
Hw be the empirical content of the hypothesis that all but finitely many swans are white.
That is, ε ∈ Hw ⟺ there are only finitely many times n such that εₙ ≠ 0.
Similarly, the empirical content of the hypothesis that almost all swans are black
is denoted by Hb. So ε ∈ Hb ⟺ there are only finitely many times n such
that εₙ ≠ 1. Our inference rule is:
1. δ(∅) = (K, Hw).
2. Let lh(e) = n > 0. If the last observed swan is white, that is, if eₙ = 0,
then δ(e) = (K ∩ [e], Hw ∩ [e]);
3. if the last observed swan is black, that is, if eₙ = 1, then δ(e) = (K ∩ [e], Hb ∩ [e]).
If one of the almost universal generalizations is true, then δ eventually settles
on the right one. For example, if almost all swans are white, then after some
finite time, only white swans appear, and δ converges to Hw. However, both
almost universal generalizations might be false, namely if there are infinitely many
white swans and infinitely many black swans. In that case δ goes back and
forth between the two possibilities, without ever ruling out both of them. Thus
δ does not reliably settle on a true theory given K. On the other hand, if we
assume that one of the almost universal generalizations is true, then δ is reliable with
respect to this assumption. Formally, if K′ = Hw ∪ Hb, then δ is reliable given
K′. This illustrates how the reliability of a method depends on the scientist's
background knowledge.
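A sketch of this rule in code (again my own illustration): the conjecture tracks the colour of the last observed swan, so on a stream that alternates 0, 1, 0, 1, ... the method flips between Hw and Hb forever.

```python
def delta(e):
    """Conjecture the almost-universal generalization matching the colour
    of the last observed swan (0 = white, 1 = black); before any data,
    conjecture that almost all swans are white."""
    if len(e) == 0 or e[-1] == 0:
        return ("K & [e]", "Hw: almost all swans are white")
    return ("K & [e]", "Hb: almost all swans are black")

# On an alternating data stream the method never stabilizes:
for n in range(1, 5):
    e = tuple(k % 2 for k in range(n))   # 0, then 0,1, then 0,1,0, ...
    print(delta(e)[1])
```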
2.3.3 Goodman's Riddle of Induction
In his \New Riddle of Induction", Nelson Goodman introduces an unusual color
predicate for emeralds [Goodman 1983].
Suppose that all emeralds examined before a certain time t are green
. . . Our evidence statements assert that emerald a is green, that
emerald b is green, and so on . . .
Now let me introduce another predicate less familiar than "green".
It is the predicate "grue" and it applies to all things examined before
t just in case they are green but to other things just in case they are
blue. Then at time t we have, for each evidence statement asserting
that a given emerald is green, a parallel evidence statement asserting
that that emerald is grue.
It is natural to consider not just one "grue" predicate, but a family of them,
one for each critical time t. I will model the Riddle of Induction as the problem of
finding the colour predicate that correctly classifies all emeralds. To describe the
range of alternative hypotheses (universal generalizations of colour predicates)
it is convenient to assume, as in [Salmon 1963], that the emeralds are examined
in a fixed order, so that we may denote the emerald examined at time 1 by 1
(rather than a), the one examined at time 2 by 2 (rather than b), etc. Then we
can define

x is grue(n) ⟺ x ≤ n and x is green, or x > n and x is blue

and

x is bleen(n) ⟺ x ≤ n and x is blue, or x > n and x is green.
As Goodman noted, green can be defined from grue(n) and bleen(n), for
any n:

x is green ⟺ x ≤ n and x is grue(n), or x > n and x is bleen(n).

Similarly for blue.
The hypotheses of interest in the riddle of induction are the universal generalizations
of these predicates, which I denote by H_green, H_blue, H_grue(n), H_bleen(n).
Each sequence of emeralds satisfies at most one of these universals (regardless
of which colour predicates are used to report the colour of the individual
emeralds). In the green-blue reference frame, we can diagram the empirical
content of some of the universal hypotheses as in Figure 2.9.
Now for inference rules, or as Goodman calls them, projection rules. The
scientist's initial background assumption K is that either all emeralds are green,
blue, grue(n), or bleen(n), for some n. Formally, if we let ℋ be the collection
of the universal hypotheses that are candidates for projection, K = ⋃ℋ. The
natural projection rule is the following, for all finite data sequences consistent
with the background knowledge.
Figure 2.9: The New Riddle of Induction
1. If n green emeralds (and no others) have been observed, project that all emeralds are green.
2. If n green emeralds have been observed and the (n+1)-th emerald is blue,
project that all emeralds are grue(n).
3. If n blue emeralds (and no others) have been observed, project that all emeralds are
blue (where n > 0).
4. If n blue emeralds have been observed and the (n+1)-th emerald is green,
project that all emeralds are bleen(n).
In our notation, we may describe the natural projection rule as follows, for
all finite data sequences e consistent with K.
1. δ(∅) = (K, H_green).
2. If e is consistent with H_green, δ(e) = (K ∩ [e], H_green).
3. If e is consistent with H_blue, δ(e) = (K ∩ [e], H_blue).
4. Otherwise δ(e) = (K ∩ [e], K ∩ [e]).
It is easy to see that the natural projection rule is reliable given K. So
are many other projection rules, for example: project grue(100) if fewer than 100
green emeralds have been observed; project green if 100 or more green emeralds
have been observed; and otherwise project the colour predicate that is consistent
with the evidence (there is only one). On the other hand, a "grue" aficionado,
who projects nothing but predicates of the form grue(n), would fail to identify
the correct colour predicate if all emeralds are green. In Chapter 6, I show that
among reliable projection rules, the natural one is the best for converging to the
true colour predicate quickly while avoiding unnecessarily many retractions.
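The natural projection rule is easy to program. In this sketch (mine, not the thesis's), the evidence is a tuple of reported colours for emeralds 1, 2, ..., assumed consistent with K, so a sequence that is not uniformly green or uniformly blue shows exactly one colour switch:

```python
def natural_projection(e):
    """Return the universal colour generalization favoured by the natural
    projection rule, given the colours of emeralds 1..len(e)."""
    if all(c == "green" for c in e):
        return "all emeralds are green"
    if all(c == "blue" for c in e):
        return "all emeralds are blue"
    # e is consistent with K, so the colour switches exactly once;
    # n counts the emeralds observed before the switch
    n = next(i for i in range(1, len(e)) if e[i] != e[i - 1])
    if e[0] == "green":
        return f"all emeralds are grue({n})"
    return f"all emeralds are bleen({n})"
```

For example, two green emeralds followed by a blue one yield the projection grue(2); note that on empty evidence the first clause fires vacuously, matching δ(∅).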
2.3.4 Identifying Limiting Relative Frequencies
Hans Reichenbach sought to reduce all inductive inference to estimates of probabilities
[Reichenbach 1949]. Reichenbach subscribed to the frequentist interpretation
of probability. According to frequentists, the statement that "the
probability of this coin coming up heads is p" means that in an infinite random
sequence of coin tosses, the rate of heads comes closer and closer to p as more
and more tosses are observed [Von Mises 1981]. To make this idea precise, let
e be a finite (non-empty) sequence of coin tosses. The relative frequency of
heads in e is the number of heads occurring in e divided by the total number of
tosses in e. Given an infinite sequence ε of coin tosses, the limiting frequency
of heads in ε is (for example) 1/2 just in case for every ratio r different from
1/2, eventually the relative frequency of heads in the finite initial sequences
of ε is always closer to 1/2 than to r; see Figure 2.10.
[Figure 2.10 plots the relative frequency of heads in the sample (vertical axis, from 0 to 1) against the number of tosses (horizontal axis); the curve oscillates and converges to 1/2.]
Figure 2.10: The limit of the relative frequencies is 1/2.
The hypothesis "the limiting frequency of heads is 1/2" is true on an infinite
sequence of coin tosses ε just in case the limiting frequency of heads in ε is 1/2.
Reichenbach proposed to go about identifying the limiting frequency of an event
from short-run frequency data in this way: posit that the probability of the event
in question is m/n if the event occurs m times in n trials. He called this inference
rule the straight rule. Reichenbach noted that if the observed frequencies of
the event of interest converge to a limit p, the straight rule will come arbitrarily
close to p. That is, for any degree of approximation r, eventually the posits of
the straight rule are always within |p − r| of p. Reichenbach viewed this fact as
a "pragmatic vindication" of the straight rule.
To formulate a version of Reichenbach's problem and his solution in our
model, let us consider the problem of finding the probability of a coin coming
up heads. There are two evidence statements: "the coin shows heads", encoded
by 0, and "the coin shows tails", encoded by 1. The possible data streams
are the infinite sequences of 0s and 1s, that is, the set {0,1}^ω. Reichenbach's
assumption that the limiting relative frequency of heads exists on the actual
data stream is represented by the background knowledge K = {ε ∈ {0,1}^ω : for
some p between 0 and 1, the limiting frequency of heads in ε is p}. The straight
rule is the following method, defined for all finite sequences of 0s and 1s:
1. δ(∅) = (K, 1/2), where 1/2 is just an arbitrary guess for the probability
of heads before any trials are made.
2. δ(e) = (K ∩ [e], m/n), where n = lh(e) > 0, and m is the number of 0s
(heads) occurring in e.
The straight rule does not reliably stabilize to the true limiting
frequency of heads, but it approaches the limiting frequency to an arbitrary
degree. Following Kelly's terminology [Kelly 1995, Ch.9], I say that the straight
rule gradually identifies the true limiting relative frequency of heads (assuming
the limiting relative frequency of heads exists, that is, if K is true). If we assume
as background knowledge that the limiting relative frequency exists and is either
1/4 or 3/4, then there is a reliable method for identifying which of these two
alternatives is true: given a sequence of coin tosses e, if the relative frequency
of heads in e is closer to 1/4 than to 3/4, conjecture "the limiting relative frequency
is 1/4"; otherwise conjecture that it is 3/4. By the definition of limiting
relative frequency, eventually the observed frequency will always differ from the
true one by less than 1/4; after that point, this procedure stabilizes to the true
limiting relative frequency. In general, it is possible to reliably identify which of
a number of possible limiting relative frequencies is the true one if the number
of alternatives is finite.
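Both the straight rule and the two-alternative method lend themselves to a short sketch (my own illustration, with heads encoded by 0 as above):

```python
from fractions import Fraction

def straight_rule(e):
    """Posit the observed relative frequency of heads; 1/2 before any data."""
    if len(e) == 0:
        return Fraction(1, 2)
    return Fraction(e.count(0), len(e))

def quarter_rule(e):
    """Reliable method under the background assumption that the limiting
    relative frequency is either 1/4 or 3/4: conjecture whichever the
    sample frequency is closer to (ties go to 3/4)."""
    f = straight_rule(e)
    if abs(f - Fraction(1, 4)) < abs(f - Fraction(3, 4)):
        return Fraction(1, 4)
    return Fraction(3, 4)
```

straight_rule only approaches the true limit gradually, while quarter_rule never changes its conjecture again once the sample frequency stays within 1/4 of the true limit.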
2.3.5 Cognitive Science and the Physical Symbol System Hypothesis
Newell and Simon raised the conjecture that any intelligent behavior can be produced
by a "physical symbol system", essentially, a computer [Newell and Simon 1976].
This claim was part of their "physical symbol system hypothesis". (The other
part is that intelligence requires the capacities of a physical symbol system.)
They emphasize that this conjecture is not a philosophical claim about the concept
of intelligence, but rather an empirical hypothesis. A simple-minded way
in which we might go about testing the physical symbol system hypothesis (cf.
[Kelly 1995, Ch.7]) is this: select some general task that requires intelligence,
and see if there is a computer program that solves a sequence of instances of the
task. For definiteness, we may ask the mechanical candidates for intelligence
to judge the grammaticality of sentences in English. Choose some means for
producing an unbounded sequence of sentences with English words; for example,
we may take the first sentence s of the editorial in the day's New York
Times, together with the result of adding "not" at the beginning of s. The
computer has to classify these sentences as grammatically correct or not. The
computer's answer counts as correct if it agrees with the judgments of competent
speakers of English (if competent speakers of English disagree about whether
a given sentence is grammatical or not, we shall discard the sentence). For
any finite number of sentences, it is trivial to find a physical symbol system (a
computer) that correctly classifies these sentences: we can employ a look-up
table, or "hardcode" the right answers, as programmers say. The interesting
question is whether a computer can give all of the right answers.
Formally, we have an infinite sequence s₁, s₂, ... of sentences with English
words, and evidence items of the form ⟨sₖ, b⟩, where b is 0 if sₖ is ungrammatical,
1 if sₖ is grammatical. Let K be the set of infinite sequences of such evidence
items. The physical symbol system hypothesis, denoted by H_PSS, is correct
(as far as our evidence goes) on a data stream ε in K just in case there is
a computer program M such that for all k, M outputs "grammatical" on sₖ
if εₖ = ⟨sₖ, 1⟩, and outputs "ungrammatical" on sₖ if εₖ = ⟨sₖ, 0⟩. A firm
believer in cognitive science would conjecture that H_PSS is correct no matter
what evidence is observed. In that case, δ(e) = (K, H_PSS) for all finite data
sequences e. This procedure is unreliable because it gives the wrong answer if
H_PSS is false.
A more open-minded researcher might proceed as follows. Start with a
promising system m₁, and conjecture that the physical symbol system hypothesis
is correct as long as m₁ performs correctly. If m₁ fails, the researcher
conjectures that the physical symbol system hypothesis is false, and tries other
programs m₂, m₃, .... He continues to conjecture that H_PSS is false until one of
these programs mₙ gives the correct answer on all the sentences to be classified.
Then he conjectures that H_PSS is true as long as mₙ performs correctly, and
so on.
To describe this procedure in our formalism, let m₁, m₂, ..., mₙ, ... be a sequence
which enumerates the programs that our researcher considers candidates
for artificial intelligence. I introduce a "pointer" to keep track of which programs
have failed at a given stage (cf. [Kelly 1995, Ch.9] and Section 5.2 below). A finite
sequence of sentence classifications ⟨s₁, b₁⟩, ⟨s₂, b₂⟩, ..., ⟨sₙ, bₙ⟩ is consistent
with a computer program M if for all sᵢ, M halts on sᵢ and outputs bᵢ. The
following research method δ models our cognitive scientist.
1. pointer(∅) = m₁; δ(∅) = (K, H_PSS).
2. Let pointer(e) = mₙ. If mₙ is consistent with e∗x, then pointer(e∗x) =
mₙ, and δ(e∗x) = (K, H_PSS). Otherwise pointer(e∗x) = mₙ₊₁, and
δ(e∗x) = (K, H̄_PSS).
We may assume that if there is a program that masters the sentence classification
task, it is included in the enumeration of candidates for artificial intelligence,
m₁, m₂, ... (for a logically omniscient inquirer, this assumption is of
no consequence because she can enumerate all computer programs, intelligent
or not). Then if H_PSS is true, δ will eventually try a program that masters
the sentence classification task, and settle on the conjecture that H_PSS is true.
If there is no program that classifies all sentences correctly, δ will typically go
through the following pattern: the first programs that δ tries fail immediately
on the test sentences, and δ conjectures that H_PSS is false. After a while, δ
finds a program M that classifies the test sentences correctly for some time, and
conjectures that H_PSS is true. Eventually M will make a mistake, δ conjectures
that H_PSS is false, tries some more programs, and so forth. Thus δ goes back
and forth between "the physical symbol system hypothesis is true" and "the
physical symbol system hypothesis is false". It is also possible that all of δ's
programs fail on the available test sentences as soon as δ tries them. In that
case, δ would, correctly, stabilize to the conjecture that H_PSS is false. To sum
up: if H_PSS is correct, δ eventually settles on the correct conjecture, 1. If H_PSS
is false, δ may stabilize to 0, but will not stabilize to 1. In Kelly's terminology, δ
reliably verifies H_PSS in the limit given K [Kelly 1995, Ch.4] (see Section 4.4
below).
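A sketch of the pointer method (my own rendering; candidate programs are modelled as Python callables from sentences to 0/1 rather than arbitrary Turing machines, so the halting issue is set aside):

```python
def consistent(m, e):
    """Is program m consistent with the classifications in e?"""
    return all(m(s) == b for (s, b) in e)

def make_method(candidates):
    """Build the cognitive scientist's method from an enumeration
    m1, m2, ... of candidate programs (a finite list in this sketch)."""
    def delta(e):
        pointer, verdict = 0, True       # initially conjecture H_PSS
        # replay the data, moving the pointer past each refuted program
        for t in range(1, len(e) + 1):
            if consistent(candidates[pointer], e[:t]):
                verdict = True           # current candidate still alive
            else:
                pointer = min(pointer + 1, len(candidates) - 1)
                verdict = False
        return ("K", "H_PSS" if verdict else "not H_PSS")
    return delta
```

With candidates = [lambda s: 0, lambda s: 1] and a data stream labelling every sentence grammatical, the method rejects the first program on the first datum, then settles on H_PSS while the second keeps performing.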
2.3.6 Theories of Particle Physics
Particle physics aims to discover what elementary particles exist, how they
decay and how they react with each other. With regard to finding out what
particles exist, let us take as our data annual reports from particle physicists
as to whether they have discovered a new elementary particle or not. In Figure
2.11, 0 encodes "no new particles this year", and 1 encodes "a new particle has
been discovered".
A particle physicist may be interested in determining the exact number of
elementary particles; let Hₙ denote the hypothesis that there are exactly n
Figure 2.11: How many particles are there?
particles. Assuming that global underdetermination does not arise, Hₙ is true
just in case exactly n particles are observed. In our representation, Hₙ is correct
on those data streams that feature exactly n 1s; that is, Hₙ = {ε : exactly n 1s
appear along ε}. One rule of inference for this problem is to conjecture at each
stage that the particles discovered so far are all that there are. If background
knowledge K is consistent with any number of particles being discovered in any
order, that is, if K = {0,1}^ω, the corresponding method is this:
1. δ(∅) = (K, H₀).
2. δ(e) = (K ∩ [e], Hₙ), where n is the number of 1s occurring in e, that is,
the number of discovered particles.
If there are only finitely many particles, δ eventually identifies their correct
number. But if there are infinitely many, δ fails to identify this fact. Thus δ is
not reliable given K, but δ is reliable given K′ = ⋃ₙ Hₙ.
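The particle-counting method is nearly a one-liner in this representation (my own sketch; annual reports encoded as 0/1 as in Figure 2.11):

```python
def delta(e):
    """Conjecture H_n, where n is the number of particles discovered
    (1-reports) in the finite data sequence e."""
    n = sum(e)
    return ("K & [e]", f"there are exactly {n} particles")

print(delta(()))                 # H_0 before any reports
print(delta((0, 1, 0, 1, 1)))    # three discoveries so far
```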
Chapter 8 examines the more complicated problem of inferring how elementary particles react.
There are many more examples of learning-theoretic models for empirical
problems. Glymour uses Kelly's framework to analyze the scope of methods
for neuropsychology [Glymour 1994]. Kelly examines Kant's question about
the divisibility of matter [Kelly 1995, Ch.3]. Learning theory has been applied
extensively for investigating language acquisition [Osherson et al. 1986].
In the next section, I compare my model of scientific inquiry with other
conceptions from the philosophical literature.
2.4 Inquiry, Belief and Action
Four questions form the core of the debate about conceptions of scientific inquiry:
(1) what is the scientist's attitude towards his evidence? (2) what does
inquiry produce? (3) what is the scientist's attitude towards the results of his
inquiry? (4) how do the scientist's actions depend on (1)-(3)? I will briefly
review the most common answers to these questions, and then discuss what
epistemological interpretations we can give to the formal model of scientific
inquiry from Section 2.2.
What is the scientist's attitude towards the evidence? Many writers suppose
that the inquirer is certain that his evidence is correct; for example, in
"Gambling With Truth" Levi writes:
To accept H as evidence is not merely to accept H as true but to
regard as pointless further evidence collection in order to check H.
[Levi 1967, p.149]
Karl Popper, by contrast, viewed scientists as accepting evidence statements only provisionally; he pointed out that if a scientist fails to replicate a phenomenon in question, he may reject the reports of the phenomenon as spurious [Popper 1968]. My model can interpret both views. For example, if the available evidence e reports ten white swans, a method δ might accept [e] as evidence in Levi's sense, such that δ(e) = (K ∩ [e], T), for some theory T. Or δ may only provisionally accept [e], as Popper suggested, such that δ(e) = (K, T ∩ [e]). Or δ may provisionally accept part of e but not all of it. Note that for Levi, as for me, statements that are accepted as evidence need not come from some distinguished 'observation language'. What matters is not the form or the source of the evidence statement, but the inquirer's attitude towards it: that he will assume the evidence statement in further inquiry (cf. [Levi 1967, p.28]). So Levi allows that a scientist might accept a statement such as H = "all swans are white" as evidence when, say, 100 swans are all observed to be white. In my model, this means that the scientist includes H in her background knowledge, such that δ(e) = (H, H) on a sample e of 100 white swans.
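In a toy version of the model, the difference between the two attitudes is just a question of which component of the output absorbs [e]. All encodings and names below are hypothetical (worlds are colour sequences of a fixed length, 'w' for white, 'b' for black):

```python
from itertools import product

WORLDS = set(product('wb', repeat=3))                # toy set of possible worlds

def ext(e):
    """[e]: the worlds that extend the finite data sequence e."""
    return {w for w in WORLDS if w[:len(e)] == tuple(e)}

K = WORLDS                                           # trivial background knowledge
T = {w for w in WORLDS if all(c == 'w' for c in w)}  # "all swans are white"

e = ('w', 'w')                                       # two white swans observed
levi   = (K & ext(e), T)   # evidence absorbed into background knowledge
popper = (K, T & ext(e))   # evidence held only provisionally, in the conjecture

print(levi[0] == ext(e))   # True: Levi's inquirer can no longer doubt e
print(popper[0] == K)      # True: Popper's inquirer keeps K open to revision
```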
What are the products of inquiry? There are sundry proposals for what the outputs of scientific methods are: "conjectures", "hypotheses" (Popper), "acceptance as evidence", "acceptance as true" (early Levi), "posits" (Reichenbach), "inferences", "estimates" (Hacking), "degrees of confirmation" (Carnap), "degrees of belief" (Bayesians). Many writers see two epistemic attitudes in scientific inquiry that correspond to the components of the output of scientific methods in my model. In "Gambling With Truth", Levi calls these "acceptance as evidence" and "acceptance as truth" [Levi 1967]; Kelly refers to "background knowledge" and "conjectures" [Kelly 1995]; Kitcher distinguishes between "endorsing" and "entertaining" a hypothesis [Kitcher 1993, p.65]; in a discussion of Reichenbach's straight rule, Hacking speaks of "inferences" and "estimates":

Adam has made 20 tosses, say HHHTH..., giving him 17 H, 3 T. Now from this data Adam, you and I would cheerfully infer (state categorically in light of the data, though we may be wrong) that the chance of heads exceeds 1/10. ... I estimate that I shall use 63 board feet for my new fence, and order accordingly. I never need to infer, to state categorically on the data, that I shall use exactly 63 feet. [Hacking 1968], my emphasis.
If we regard the distinction between "acceptance as evidence" and "acceptance as truth" as irrelevant or, like the later Levi, as indefensible [Levi 1980], we can adopt methods that produce theories with only one component. As for degrees of belief, we could extend Kelly's model such that scientific methods produce a probability distribution over an algebra of propositions (which would include all hypotheses of interest). If we further stipulate that the inquirer update his background knowledge K simply by adding the new evidence e, such that the new background knowledge is K ∩ [e], we obtain the model that Bayesians use [Earman 1992]. However, that school requires agents to update their degrees of belief by Bayesian conditioning, whereas for learning theorists conditioning is just one of many possible methods for changing degrees of belief.
What is the scientist's attitude towards the results of his inquiry?

A natural interpretation of my model is that the scientist is certain of his background assumptions and believes his conjecture, which corresponds to "acceptance as evidence" and "acceptance as true", or perhaps "inference" and "estimate" in Hacking's terms. If methods produce probability distributions, we may interpret these as the scientist's "degrees of belief". Some writers say that the study of empirical methods does not require consideration of belief or other epistemic attitudes at all.
I used to take pride in the fact that I am not a belief philosopher: I am primarily interested in ideas, in theories, and I find it comparatively unimportant whether or not anybody 'believes' in them. [Popper 1972, p.25]
In a similar vein, Reichenbach referred to estimates of long-run frequencies as "posits", a neutral term that leaves open what the inquirer believes about her posits (cf. [Salmon 1991, p.116]).
How do the scientist's theories relate to his actions?

On a traditional view, scientific method guides the inquirer in what to believe, and practical action relies on these beliefs. For example, Popper says that "we should prefer as basis for action the best-tested theory" [Popper 1972, p.22] (emphasis Popper's). Carnap held that the "aim of inductive logic" was to determine "rational credence functions" that could be used in decision making as "rational degrees of belief" [Carnap 1962]. However, the early Levi argued that a scientist's belief in H does not entail that the scientist would or should act as if H were true.
...the evidence may entitle the scientist to accept one of the H_j's as true, yet may not warrant the decision-maker's choosing the act that produces maximum utility when H_j is true. Recently certain
medical groups temporarily suspended dispensing the birth control
pill, Enovid, pending further examinations of evidence regarding its
safety. Several physicians endorsed this policy, even though they
acknowledged that they believed the pill to be safe. [Levi 1967,
p.10]
He concluded that we should recognize "the quest for truth as a legitimate human activity whose aims and products are not directly relevant to practical concerns" [Levi 1967, p.14].
Salmon disagreed with Carnap about the role of probability estimates in
practical action (in this case, betting behavior). Although the straight rule
licenses the inference that the probability of heads is 1 if all coin tosses come up
heads, this does not mean that an agent should bet on these odds. "[Users of the straight rule] would offer such practical advice as to avoid making large bets at unfavorable odds on the basis of probabilities whose values are not known with great confidence" [Salmon 1991, p.115]. Nowadays, many Bayesian confirmation theorists concur that rational degrees of belief need not be related to practical decisions (e.g., [Hellman 1997]).
In "The Enterprise of Knowledge", Levi changed his position: a scientist who takes her theories for granted in practical action, but is ready to question them in scientific inquiry, is in his view a victim of "cognitive schizophrenia" [Levi 1980, pp.16-18]. If a scientist accepts a belief
at time t, Levi says that he should regard it as infallibly true in decision making
at time t. Moreover, Levi holds that accepting a belief is itself a decision, and
therefore in deciding what beliefs to accept at time t + 1, the scientist should
consider his beliefs from time t to be infallibly true. Nonetheless, Levi says,
the scientist's cognitive values may make it rational for him to change his be-
liefs. Thus he may adopt a new theory T at time t + 1, even though at time
t he is convinced that T is false. Hence for Levi, infallibility does not imply incorrigibility: the scientist may revise beliefs that he regards as infallibly true.
Answers to these four questions define a conception of what scientific inference is, and what it aims for. For each conception of scientific inquiry, we can apply means-ends analysis to determine what inferences are good inferences. For example, learning theory can make recommendations for methods that produce probability distributions (e.g., [Putnam 1963], [Juhl 1993]). The reason why my basic model treats only methods whose theories assign definite truth-values to the hypotheses under investigation is, first, that this is a simpler model. Second, for the applications that I am interested in, situations like those in Section 2.3, it is also a more natural model. For example, it is awkward to ask to what degree a particle physicist should believe that quarks do not have a finer substructure; physicists don't seem to be aware of, and certainly do not advertise, explicit degrees of belief about that kind of question. But it is natural to ask why this should be her current hypothesis. Scientific practice also motivates the distinction between the scientist's background assumptions and his conjectures. For example, in particle physics, quantum mechanics is presupposed, but the structure of quarks is still a question that warrants further "collection of evidence". But the significance of background assumptions in my model goes beyond an attempt to accommodate features of scientific practice.
I agree with Levi that we should use decision theory to study decisions to believe. An essential part of formulating a decision problem is to specify the set of possibilities that the decision maker regards as relevant. This is the role of the scientist's background assumptions. Hence I interpret the scientist's background knowledge as his standard of serious possibility (so far as the goals of inquiry are concerned). The decision problem is to choose among possible methods, which are evaluated relative to the scientist's background knowledge. I assume that the scientist believes in his background assumptions, in the sense that his choice of a method is based on them, but I do not assume, or deny, that the scientist believes in his theories. Nor do I assume, or deny, that the scientist bases practical decisions on his theories. Since the relationship between belief and practical action is an open philosophical question, I regard it as a virtue of the learning-theoretic approach to methodology that it does not presuppose an answer to this question.5
It is worth remarking that the formal model does not come with psycholog-
ical claims about what the scientist is or is not aware of. In some applications
it is implausible to suppose that the learner is aware of what the serious possi-
bilities and the alternative theories are. For example, when a child is learning
a language, she is not aware of whatever constraints on natural languages there
might be, and does not have in mind a set of \alternative" languages from which
she is drawing successive conjectures. A scientist working in the context of a
research program may not be aware of the space of alternative theories spanned
by the program, and may simply be disposed to respond to evidence in a cer-
tain way, without consciously following an explicit research strategy. In cases
like this the perspective of our model is that of an outsider, someone who is
analyzing the assumptions and inferential dispositions that are implicit in the
scientist's theory and practice.
2.5 Revising Background Knowledge
In this thesis I will examine strategies for revising conjectures on the basis of
background assumptions, but not strategies for revising background knowledge.
This is common practice in learning theory, but a simplified view of scientific inquiry. Historians of science have described how "scientific revolutions" interrupt the course of "normal science" [Kuhn 1970]. One of the hallmarks of a scientific revolution is a change in what Lakatos calls "core assumptions" [Lakatos 1970]. This leads in most cases to what I would call changes in background knowledge. Other changes that attend scientific revolutions include:

• Changes in the basic ontology of scientific theories, that is, the set of possible worlds.

• From changes in the basic ontology often follow changes in the meaning of hypotheses, that is, the set of possible worlds in which a hypothesis is true changes.
5 As I mentioned above, Levi rejects my distinction between the scientist's background assumptions and his conjectures as untenable [Levi 1980, pp.16-18]. He holds that an agent who accepts a theory must think that it could not possibly be false. To model this position, I would take the scientist's theories to have only one component, namely the theory that he currently takes to be necessarily true. In Section 5.3.1, I indicate how to adapt my results about inductive inference for Levi's notion of acceptance.
• Improvements in observational capabilities, which alter the scientist's beliefs about what may be observed in a given possible world.

• New scientific questions added to the set of hypotheses under investigation.
[Kelly 1995, Ch.15] and [Kelly and Glymour 1992] give a learning-theoretic analysis of methods that actively bring about these kinds of changes. But since my goal is not a comprehensive theory of all aspects of scientific inquiry, but to understand how cognitive values underwrite inductive inferences, I shall make the simplifying assumption that the set of possible worlds, the hypotheses of interest, and the scientist's background knowledge do not change in the course of inquiry. In the language of [Kuhn 1970], my topic is induction in normal science.6 Hence I restrict scientific methods to revise their initial background knowledge by adding nothing but the evidence. Formally, this means that if δ(∅) = (K, T), then for all e consistent with K, δ(e) = (K ∩ [e], T′), where T and T′ are arbitrary theories. The conjectures of methods are not defined for data sequences that are inconsistent with their original background knowledge.7
Under these restrictions, specifying the initial background knowledge K determines the future background knowledge. So to describe an inductive method, it suffices to specify its initial background knowledge K and its conjectures; I will speak of methods with background knowledge K that produce empirical theories only, not background knowledge paired with a theory. For example, to define the method from Section 2.3.1 in this way, let K = {0,1}^ω, and δ(e) = H ("all swans are white") if all swans observed in e are white, and δ(e) = [e] otherwise (which entails that H is false).
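A minimal sketch of this method, under a hypothetical encoding where 1 stands for a white swan and 0 for a non-white one (the string return values merely label the two conjectures):

```python
def swan_method(e):
    """Method with fixed background knowledge K = {0,1}^omega that
    produces a theory only: H if all observed swans are white,
    otherwise just the evidence [e] (which entails that H is false)."""
    if all(x == 1 for x in e):
        return 'H'     # "all swans are white"
    return '[e]'       # the evidence itself

print(swan_method([1, 1, 1]))  # prints H
print(swan_method([1, 0, 1]))  # prints [e]
```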
6 [Kitcher 1993, Ch.7, Sec.4] too views scientists as attacking problems of induction within a given "background practice", which he connects to Kuhn's paradigms: "Thus, Kuhn's conception of 'normal science' allows for determinate resolution of issues because of the constraining role of background practice ('the paradigm')." [Kitcher 1993, p.248, fn.42]. [Donovan et al. 1988] and [Shapere 1984] also model scientists as relying on background knowledge.

7 Thus my model of updating background knowledge agrees with the Bayesian's evolution of his "sample space"; see Section 2.4.
Chapter 3
Truth, Content and Minimal Change
3.1 Outline
This chapter begins the investigation of which methods are optimal for given cognitive values. I start with content and truth (or empirical adequacy).

To evaluate the performance of a method with respect to a given desideratum, I apply the principle of admissibility. A method is admissible if it is not dominated. In general, an act a dominates another option a′ if a in no case leads to worse outcomes than a′, and sometimes leads to better outcomes. It is easy to see that the only methods that are admissible with respect to avoiding errors (call these error-minimal) are the methods whose theories never go beyond the evidence. Following [Levi 1967, p.6], we may label such methods "skeptical".1 A skeptical method δ never produces a false conjecture. But if some conjecture T of a non-skeptical method δ′ goes beyond the evidence, T might be false. So δ′ might make an error when δ doesn't, and hence δ dominates δ′ with respect to avoiding errors. Of course, skeptical methods do not produce theories with any interesting content. Let us say that a theory T has more content than a theory T′ if T is logically stronger than T′ (in semantic terms, T ⊊ T′). Then the only methods that are admissible with respect to content (call these content-optimal) are those that always produce the contradiction, because the contradiction has maximum content. If we restrict ourselves to consistent theories, the only content-optimal methods are those that provide complete theories; a theory T is complete if T makes a unique prediction for each future observation, such that T = {ε} for some data stream

1 [Levi 1967] distinguishes between "global" skepticism, which denies claims to knowledge of anything, and "local" skepticism, which holds that beliefs are justified only if they follow logically from the evidence. My skeptic is a local skeptic.
ε. In either case, avoiding error conflicts strongly with the goal of providing content, as several writers have noted. [Levi 1967] examines how an inquirer might balance these values according to her taste. Without resorting to subjective weights, we can go one step further by applying another principle from decision theory: Pareto-optimality. The Pareto principle says that when an agent has to make a trade-off between conflicting desiderata, it should not be possible to improve her choice on one dimension without making her worse off on another. I call methods that are Pareto-optimal with respect to content and avoiding errors content-error acceptable. I show that the content-error acceptable methods are exactly those that always entail the evidence. These results are simple, but they illustrate one of the main themes of this thesis: that familiar methodological norms have a means-ends justification with respect to certain cognitive values. Another development of this theme leads to a critique of some well-known principles for "minimal change" belief revision [Gärdenfors 1988].
Suppose we consider not only properties of a scientist's theory T at stage n of inquiry, but also how her theory changes at stage n + 1 in light of new evidence. What principles should guide this change? This question has received much attention in philosophical logic and computer science. Many writers find it plausible that the change should be a "minimal change". The idea is that in a change from one theory to another, the new theory should be "as close as possible" to the old theory. However, it is notoriously difficult to define a satisfactory notion of distance between theories (as the work on verisimilitude has shown us; cf. [Miller 1974]). Another approach is to apply the concept of dominance. Let us distinguish two kinds of change to a theory T: adding a proposition P to T, and retracting a proposition P from T. There are two plausible ways of defining a notion of minimal change with dominance considerations:

1. Apply the Pareto principle. Then a change T₀ from T is not minimal if there is another change T₁ such that T₁ retracts no more from T than T₀ does, but adds less, or such that T₁ adds no more to T than T₀ does, but retracts less.

2. Rank avoiding retractions first, avoiding additions second.
I show that each of these two notions half-agrees with the standard principles for belief revision (known as the AGM postulates): The first agrees with the AGM postulates when the evidence contradicts the current theory, but not necessarily otherwise. The second agrees with the AGM postulates when the evidence is consistent with the current theory, but not necessarily otherwise. The recommendations for minimal belief change that stem from the Pareto principle form an intuitively plausible new set of axioms for belief revision. It is well-known in philosophical logic that axioms for belief change correspond to principles of the logic of conditionals ("if p, then q") [Gärdenfors 1988]. An interesting topic for future research is to examine which principles of conditional logic the Pareto-optimality definition of minimal change validates.
Finally, I discuss ways in which belief revision theory may constrain inductive inferences, and argue that the AGM principles are not plausible as rules for empirical inquiry. They do not help agents to gradually replace false beliefs with true ones; if anything, they are obstacles on the path to truth that an inquirer must steer around.
3.2 Dominance in Error and in Content
To define what it means for a method to perform better than another with respect to errors, I apply dominance twice: First, a method δ weakly dominates another method δ′ with respect to error on a given data stream ε if δ′ makes an error along ε whenever δ does, and at least once when δ doesn't. Second, a method δ weakly dominates another method δ′ with respect to background knowledge K if δ weakly dominates δ′ on some data stream consistent with K, and δ′ makes an error along any data stream in K whenever δ does.
Definition 3.1 Dominance in Error

• δ dominates δ′ in error on a data stream ε (written δ ≻^E_ε δ′) ⟺

1. for all n, if δ(ε|n) is false on ε, then δ′(ε|n) is false on ε, and

2. for some k, δ′(ε|k) is false on ε, and δ(ε|k) is true.

• δ dominates δ′ in error given K (written δ ≻^E_K δ′) ⟺

1. for all data streams ε in K, for all n, if δ(ε|n) is false on ε, then δ′(ε|n) is false on ε, and

2. for some data stream ε in K, δ dominates δ′ in error on ε (i.e., δ ≻^E_ε δ′).

• δ is error-minimal given K ⟺ δ is not dominated in error given K.
It is clear that a skeptical method δ whose theories at each stage are entailed by background knowledge and the evidence never makes an error on any data stream consistent with the background knowledge. And if another method δ′ produces a theory that is not entailed by the evidence and background knowledge, then that theory, by the definition of entailment, is false on some data stream consistent with the background knowledge. So δ dominates δ′ in error. It follows that the only error-minimal methods are the skeptical ones.
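The domination argument can be checked directly on a toy space of length-3 data streams, a hypothetical finite stand-in for {0,1}^ω (all names are illustrative; theories are sets of streams, and a theory is true on a stream just in case the stream belongs to it):

```python
from itertools import product

STREAMS = list(product([0, 1], repeat=3))

def ext(e):
    """[e]: the streams extending the finite data sequence e."""
    return {s for s in STREAMS if s[:len(e)] == tuple(e)}

def skeptic(e):       # theory entailed by the evidence alone
    return ext(e)

def bold(e):          # goes beyond the evidence: predicts only 1s ahead
    return {s for s in ext(e) if all(x == 1 for x in s[len(e):])}

def errors(method, stream):
    """Stages at which the method's conjecture is false on the stream."""
    return {n for n in range(len(stream) + 1)
            if stream not in method(stream[:n])}

# The skeptic never errs; the bold method errs on some stream.  So the
# skeptic dominates the bold method in error, as in Definition 3.1.
print(all(not errors(skeptic, s) for s in STREAMS))  # True
print(any(errors(bold, s) for s in STREAMS))         # True
```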
Fact 3.1 A method δ is error-minimal given background knowledge K ⟺ for all finite data sequences e consistent with K, δ(e) is entailed by [e] and K.

Because the content of a theory, unlike its truth, does not depend on the entire data stream, I define dominance with respect to content in terms of finite data sequences.
Definition 3.2 Dominance in Content

• δ dominates δ′ in content given K (written δ ≻^C_K δ′) ⟺

1. for all finite data sequences e consistent with K, δ(e) has at least as much content as δ′(e), that is, δ(e) |= δ′(e) ∩ K, and

2. for some finite data sequence e consistent with K, δ(e) has more content than δ′(e), that is, δ(e) ⊊ δ′(e) ∩ K.

• δ is content-optimal given K ⟺ δ is not dominated in content given K.

Since the contradiction ∅ has more content than any other theory, it follows that the only content-optimal method is the one that always produces the contradiction. However, we may wish to require that our methods produce only theories consistent with given background knowledge K (call such methods consistent given K) and then consider which theories among the consistent ones provide maximum content. To formalize this idea, rephrase the second part of Definition 3.2 such that a method is content-optimal given K among consistent methods just in case it is not weakly dominated in content given K by any consistent method. Restricting the test for weak dominance in a given criterion c (in this case, content-optimality) to methods that satisfy another criterion c′ (in this case, consistency) has the effect of ranking c′ before c. I shall often make use of this device. If we rank consistency before content, a method is content-optimal among consistent methods if and only if it produces complete theories.

Fact 3.2 A consistent method δ is content-optimal given K among consistent methods ⟺ for all finite data sequences e consistent with K, δ(e) = {ε}, for some data stream ε consistent with K ∩ [e].
Facts 3.1 and 3.2 show that the goals of avoiding error and providing content are two masters that no method can serve at the same time. A scientist might weight these factors in some subjective way to arrive at a trade-off (along the lines of [Levi 1967]). But independent of subjective weights, it seems uncontroversial that a bad trade-off would be one that could be improved in one dimension without impairing the other. This is the familiar Pareto principle. I define Pareto-dominance for content and error as follows.
Definition 3.3 Pareto-dominance in Content and Error

• δ Pareto-dominates δ′ given K ⟺

1. for all finite data sequences e consistent with K, δ(e) has at least as much content as δ′(e), and δ dominates δ′ in error given K (i.e., δ ≻^E_K δ′), or

2. for all data streams ε in K, for all n, if δ(ε|n) is false on ε, then δ′(ε|n) is false on ε, and δ dominates δ′ in content given K (i.e., δ ≻^C_K δ′).

• δ is content-error acceptable given K ⟺ δ is not Pareto-dominated given K.
What are the content-error acceptable methods? Suppose that a method δ fails to entail some evidence e. Then we can strengthen δ(e) to entail the evidence, without incurring an additional possibility of error. Hence δ is not content-error acceptable. Conversely, suppose we have a method δ whose theories always entail the evidence. If we strengthen one or more conjectures of δ, we introduce a possibility of error where δ has none. If we weaken one or more conjectures of δ, we lose content. Hence δ is content-error acceptable. Thus a method δ is content-error acceptable just in case its theories always entail the evidence; Figure 3.1 illustrates this fact.

Fact 3.3 A method δ is content-error acceptable given K ⟺ for all finite data sequences e consistent with K, δ(e) |= K ∩ [e].
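The first half of this argument can be illustrated on a toy space of length-2 data streams (a hypothetical encoding, with theories as sets of streams): a method that fails to entail the evidence is dominated by the method that intersects each of its conjectures with [e]:

```python
from itertools import product

STREAMS = list(product([0, 1], repeat=2))

def ext(e):
    """[e]: the streams extending the finite data sequence e."""
    return {s for s in STREAMS if s[:len(e)] == tuple(e)}

def weak(e):               # never entails the evidence: conjectures everything
    return set(STREAMS)

def strengthened(e):       # the same method, cut down so it entails [e]
    return weak(e) & ext(e)

# strengthened has more content wherever weak overshoots [e], yet it is
# still true on every stream extending e, so it adds no possibility of
# error.  Hence weak is not content-error acceptable, as Fact 3.3 says.
for s in STREAMS:
    for n in range(len(s) + 1):
        assert strengthened(s[:n]) <= weak(s[:n])  # at least as much content
        assert s in strengthened(s[:n])            # never false on s
print("weak is Pareto-dominated by its strengthening")
```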
3.3 Dominance Principles and Minimal Theory Change
Content and truth are properties of theories at a given time t. Just as a body in motion has not only a position at time t, but also a velocity, we may ask how the content and truth of a method's conjectures are changing from time to time. What principles should govern theory change? A proposal that has been influential is that revising one's theory T in light of a proposition P, such as a piece of evidence, should lead to a "minimal change" from T. How might we define a "minimal change"? One approach is to specify a notion of distance between theories (a metric on the space of theories) and to use this to measure how much a change T′ differs from T. However, the difficulties with defining distance between theories are familiar [Miller 1974]. But even without a metric on theories, we can apply the concept of dominance to arrive at a notion of "minimal change". There are two ways of changing a theory T: adding a proposition to T, and removing a proposition from T. To be precise, let us say that a theory T′ adds a proposition P to T just in case T does not entail P and T′ |= P. Similarly, a theory T′ retracts a proposition P from T just in case T |= P and T′ does not entail P. Using the concept of dominance to define "adding more" and "retracting less" leads to the following definition.
Definition 3.4 Dominance in Retractions and Additions

• T₀ retracts more from T than T₁ (written T₁ ≺^R_T T₀) ⟺

1. for all propositions P, if T₁ retracts P from T, then so does T₀, and

2. for some proposition P, T₀ retracts P from T but T₁ does not.

• T₀ adds more to T than T₁ (written T₁ ≺^A_T T₀) ⟺

1. for all propositions P, if T₁ adds P to T, so does T₀, and

2. for some proposition P, T₀ adds P to T but T₁ does not.

Figure 3.1: Content vs. Error
The problem of revising theories with minimal changes is this: Given a theory T, we wish to include a new piece of information P such that the result of revising T with P, which I denote by T + P, is the minimal change from T that includes P. I say that a revision T + P is a minimal change from T if there is no other change T′ from T such that (1) T′ entails P, and (2) T′ Pareto-dominates T + P with respect to additions and retractions. (The definition of Pareto-dominance with respect to additions and retractions follows the pattern of Definition 3.3. I spell it out in Section 3.6.) The following necessary and sufficient conditions characterize minimal theory changes.

Proposition 3.4 Suppose that a revision T + P entails P. Then T + P is a minimal change ⟺

1. T ∩ P entails T + P, and

2. if T |= P, then T + P = T.
Figure 3.2 illustrates Proposition 3.4.

Clause 2 obtains because obviously T itself is the minimal change from T when T entails the new information P. Since adding P to T retracts nothing from T, it follows that a theory T′ that has more content than the addition of P to T (that is, T′ ⊊ T ∩ P) adds more to T than T ∩ P does, but does not retract any less from T. So a minimal revision T + P cannot be stronger than T ∩ P. However, at first glance it may seem that a minimal revision should not have less content than T ∩ P, because a weaker theory retracts beliefs from T. But as the proof of Proposition 3.4 in Section 3.6 shows, weaker theories than T ∩ P add fewer beliefs to T than T ∩ P does (provided that T does not entail P), which compensates for such retractions. An example may clarify this point. Recall the physical symbol system hypothesis from Section 2.3.5. A scientist investigating this hypothesis may believe that a certain AI system, say SOAR, is the only candidate for artificial intelligence. This scientist presumably believes that "if SOAR passes all future tests for intelligent responses, SOAR is intelligent", and also that "if SOAR fails a test, there is no machine with artificial intelligence". Initially, the scientist believes neither that the physical symbol system hypothesis is true, nor that it is false. Now suppose the scientist learns that SOAR, after a promising beginning of passing one hundred tests, failed the last one. If he added this evidence to his beliefs as they stand, he would conclude that machine intelligence is impossible, although beforehand he had no definite view about the matter. But if he were to retract his belief that "if SOAR fails a test, there is no artificially intelligent machine", he might speculate that some machine other than SOAR will possess true intelligence. So retracting this belief avoids adding any conclusions about the possibility or impossibility of machine intelligence.

Figure 3.2: Pareto-Minimal Theory Changes that avoid Additions and Retractions
3.4 Is "Minimal Change" Belief Revision Minimal?
In the previous section, I showed how we can apply the Pareto principle to the values of avoiding additions and retractions to arrive at a notion of minimal theory change. How does this notion of minimal change compare with proposals in the literature on "belief revision theory"? In a seminal paper, [Alchourrón et al. 1985] propose that a revision should count as a minimal change if it satisfies certain axioms, known as the AGM postulates. In my setting (with consistent evidence statements), the AGM postulates amount to the following (cf. [Kelly et al. 1995]).

(AGM 1) T + P |= P.

(AGM 2) If P ≠ ∅, then T + P ≠ ∅.

(AGM 3) If P ∩ T ≠ ∅, then T + P = T ∩ P.

(AGM 4) If P ∩ (T + Q) ≠ ∅, then T + (P ∩ Q) = (T + Q) ∩ P.
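These postulates can be exercised on a toy revision operator over sets of worlds. The operator below (keep T ∩ P when consistent, otherwise fall back to P) is a hypothetical illustration, not one the text endorses; with theories as sets of worlds, "T + P |= P" becomes "T + P ⊆ P" and the contradiction is ∅:

```python
from itertools import combinations

U = set(range(4))  # toy universe of possible worlds

def revise(T, P):
    """T + P = T ∩ P if consistent, else P (an illustrative operator)."""
    return (T & P) if (T & P) else set(P)

def subsets(S):
    return [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# Check AGM 1-4 for all theories T and all nonempty propositions P, Q.
for T in subsets(U):
    for P in subsets(U):
        if not P:
            continue
        assert revise(T, P) <= P                      # AGM 1: T + P |= P
        assert revise(T, P)                           # AGM 2: consistency
        if T & P:
            assert revise(T, P) == T & P              # AGM 3
        for Q in subsets(U):
            if Q and (P & revise(T, Q)):
                assert revise(T, P & Q) == revise(T, Q) & P  # AGM 4
print("toy operator satisfies AGM 1-4")
```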
Figure 3.3 contrasts the AGM postulates with minimal theory changes defined via the Pareto principle.

AGM 1 expresses the idea that the result of incorporating P into a theory must entail P. It is not the notion of minimal change that seems to motivate AGM 2, but rather an independent norm that an agent should avoid contradictions. (Gärdenfors calls an inconsistent theory "epistemic hell".) AGM 3 says that if P is consistent with T, the minimal change to T that incorporates P is simply the result of adding P to T. AGM 4 implies that revising T on a conjunction P ∩ Q should yield the same result as revising T first on Q and then on P, provided that the revision of T on Q is consistent with P.

What I call a minimal revision on P entails P and hence satisfies AGM 1. If T is inconsistent with P, the contradiction is a minimal revision, contrary to AGM 2. Other than ruling out the contradiction, the AGM postulates provide no guidance. Likewise, by clause 1 of Proposition 3.4, all revisions are minimal revisions (since they are no stronger than T ∩ P = ∅). In the case in which T entails P, by clause 2 of Proposition 3.4 the minimal revision of T on P is just T, as in belief revision theory. (Gärdenfors lists clause 2 as a separate postulate, (K+4) [Gärdenfors 1988, p.49].) But when T does not entail P, AGM
53
Figure 3.3: Pareto-Minimal Theory Changes and the AGM Axioms
54
3 rules out any theory that is weaker than the addition of P to T , although these
are minimal revisions by my de�nition. This is the main di�erence between the
notion of minimal change based on Pareto-optimality with respect to changes,
and the AGM approach. Is there another way of de�ning \minimal" change
from basic dominance principles that is closer to the AGM axioms? We can
�nd a clue in G�ardenfors' writing; he justi�es the postulate in question, AGM
3, as follows.
The next postulate for expansions can be justified by the "economic" side of rationality. The key idea is that, when we change our beliefs, we want to retain as much as possible of our old beliefs; information is in general not gratuitous, and unnecessary losses of information are therefore to be avoided. This heuristic criterion is called the criterion of information economy... If P is indetermined in T (or if P is accepted in T), then ... P does not contradict any of the beliefs in T. It is therefore possible to retain all the old beliefs in the expansion of T by P; so the criterion of information economy justifies the following: (K+3) if T ∩ P ≠ ∅, then T + P ⊨ T. [Gärdenfors 1988, p.49] (Emphasis Gärdenfors'; here and elsewhere I adapt his notation to mine.)
Avoiding "loss of information" is a dubious defense of (K+3). The postulate prohibits an agent from changing her mind in a way that we might well regard as a gain in information. For example, if an ornithologist investigating the colour of swans (as in Section 2.3.1) starts out with the belief that there is a black swan, but after finding one hundred white swans accepts that all swans are white, she violates (K+3); but if anything, she seems to have gained, not lost, information! I will take up this issue in Section 3.5. The quotation makes clear that the purpose of (K+3), and of AGM 3, is to "retain old beliefs", that is, to avoid retractions. Gärdenfors states that his postulates are motivated by the "conservativity principle", which says that "when changing beliefs in response to new evidence, you should continue to believe as many of the old beliefs as possible" [Gärdenfors 1988, p.67]. This suggests examining changes T′ that are not dominated in retractions, such that no other change retracts less than T′ (in the sense of Definition 3.4). The next proposition shows that retraction-minimal changes have at least as much content as the addition of P to T.
Proposition 3.5 A revision T + P is retraction-minimal ⇔ T + P ⊨ T.

Now if we select among the changes that minimize retractions those that minimize additions, we find that the only such change is the addition of the evidence to the current theory, as AGM 3 requires.

Proposition 3.6 If a revision T + P is addition-minimal among retraction-minimal revisions, then T + P = T ∩ P whenever T and P are consistent with each other.
In light of these results and Gärdenfors' text, can we interpret the AGM postulates as specifying the means for an agent who wishes to, first, avoid retractions, and second, avoid additions? The answer is no: if the current theory T is inconsistent with P, then by Proposition 3.5, the retraction-minimal revision T + P entails T ∩ P = ∅, and hence T + P = ∅. Even if we categorically rule out the contradictory theory as a result of a revision, we still have that, in order to minimize retractions, an agent must produce a complete theory whenever the new information contradicts his previous beliefs. Figure 3.4 summarizes the relationships between the two notions of belief change derived from dominance principles and the AGM postulates.
Gärdenfors rejects this consequence of avoiding retractions as "unpalatable".

When T is a nonmaximal belief set, which is the normal case, [and T is inconsistent with P,] T + P is a maximal belief set that obviously is too large [viewed as a set of sentences] to represent any intuitive process of rational belief revision; just because you revise your beliefs with respect to P, you should not be led to have a definite opinion on every matter. [Gärdenfors 1988, p.59]
It seems fair to ask why conservatism is the fundamental principle of "rational" belief revision when the theory is consistent with the evidence, but not otherwise. If we should aim to "retain as many of our old beliefs as possible", and being a "besserwisser" [Gärdenfors 1988, p.58], a know-it-all, is the only way to do so when contradictory information arrives (by Proposition 3.5 and Gärdenfors' observations), then should we not become besserwissers? Gärdenfors' text suggests that we should not have to adopt "a definite opinion on every matter" when, in the normal case, we did not have one before. He seems to think that adding many new beliefs is too high a price for retaining old beliefs. But if avoiding additions can compensate for retractions when the evidence refutes the agent's beliefs, the same should be true when the agent's beliefs are consistent with the evidence. The Pareto-optimality principle for additions and retractions strikes just this kind of balance.
One question is how to define a plausible conception of "minimal belief change" in a principled way. The dominance principles that I proposed accomplish that; it is not clear that the AGM postulates do. Postulates for minimal theory change may give good guidance for updating databases or revising legal codes [Gärdenfors 1988]. But for the theory of inductive inference, the issue is whether agents who are engaged in empirical inquiry should revise their theories in a minimal way. Should scientific inquiry obey the AGM postulates? This is the question of the next section.
Figure 3.4: Three Notions of Minimal Theory Change
3.5 Empirical Inquiry and Belief Revision
What are the implications of the AGM principles for empirical inquiry? Some writers deny that there are any. Thus [Nayak 1994] states that "AGM is inadequate for iterated belief change". And recently, some writers have suggested that we should view belief revision principles not as norms for actual belief change at all, but instead as specifying acceptance conditions for conditionals (e.g., [Levi 1988]). But to see if the AGM principles are plausible recommendations for scientific inference, I shall explore some ways of interpreting them in this setting. A natural proposal is to define a scientific method as the result of revising an initial theory on the evidence.² Let a belief revision operator + and an initial theory T be given. I say that a method δ is represented repetitively by + and T if for all data sequences e and single observations x:

1. δ(∅) = T.
2. δ(e · x) = δ(∅) + [e · x].

Another proposal is that it is always the current theory that is revised as the data come in; I call this sequential revision. Accordingly, a method δ is represented sequentially by + and T if for all data sequences e and single observations x:

1. δ(∅) = T.
2. δ(e · x) = δ(e) + [e · x].
A method is consistent with the AGM postulates if there is some belief revision operator + satisfying the AGM postulates such that + and δ(∅) represent δ. I refer to such methods as AGM methods. [Kelly et al. 1995, Prop. 2] show that as far as defining the class of AGM methods is concerned, it does not matter whether we represent methods repetitively or sequentially.
Proposition 3.7 (Kelly, Schulte, Hendricks) A method δ is represented sequentially by some AGM belief revision operator + ⇔ δ is represented repetitively by some AGM belief revision operator +′.
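A small sketch conveys the flavour of this result for one concrete operator: with full-meet revision (intersect when consistent, otherwise adopt the evidence), the repetitive and sequential representations coincide outright on a toy space of worlds (all names here are illustrative, and the proposition itself is more general, since it allows the two representations to use different operators):

```python
from itertools import product

WORLDS = frozenset(range(4))

def revise(T, P):
    """Full-meet revision over sets of worlds: expand when consistent, else adopt P."""
    return T & P if T & P else P

def evidence(seq):
    """[e]: the proposition expressed by a data sequence, i.e. the intersection
    of its observations."""
    E = WORLDS
    for x in seq:
        E = E & x
    return E

def repetitive(T, seq):
    """Always revise the initial theory on the total evidence."""
    return T if not seq else revise(T, evidence(seq))

def sequential(T, seq):
    """Revise the current theory on the total evidence at each stage."""
    theory = T
    for i in range(1, len(seq) + 1):
        theory = revise(theory, evidence(seq[:i]))
    return theory

OBS = [frozenset({1, 2, 3}), frozenset({2, 3}), frozenset({0, 3}), frozenset({1})]
T0 = frozenset({0})
agree = all(repetitive(T0, seq) == sequential(T0, seq)
            for n in range(4) for seq in product(OBS, repeat=n))
print(agree)
```

The script prints True: for this operator the two representations define the same method on every short observation sequence drawn from the pool.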
[Kelly et al. 1995, Prop. 2] characterize what the AGM methods are like.

Proposition 3.8 (Kelly, Schulte, Hendricks) A method δ is AGM ⇔ for all finite data sequences e, e · x:

1. if e · x is consistent with δ(e), then δ(e · x) = δ(e) ∩ [e · x];
2. δ(e) is consistent;
3. δ(e) ⊨ [e].

² A different proposal is that some inductive process decides what proposition to incorporate into a theory, but that the AGM principles guide the incorporation; see [Kelly et al. 1995]. The following definitions are from [Kelly et al. 1995].
By Fact 3.3, all content-error acceptable methods satisfy clause 3. Clause 2 keeps AGM methods out of "epistemic hell". (Proposition 6.1 in Chapter 6 provides a means-end justification for producing only consistent theories.) Clause 1 combines two requirements for the case in which the data are consistent with the current theory: the method should not add more than the data to its theory, and the method should not weaken its theory. [Kelly et al. 1995] refer to the first requirement as "timidity" and to the second as "stubbornness". Proposition 3.4 shows that timidity follows not only from the AGM principles, but also from the weaker notion of minimal change based on Pareto-optimality between additions and retractions. Gärdenfors motivates the principle as follows.

[The other postulates] do not ... exclude the possibility that T + P contains a lot of beliefs, not included in T, that have no connection whatsoever with P. Because we want to avoid endorsing beliefs that are not justified, we should require that T + P not contain any beliefs that are not required by the other postulates [i.e., AGM 1, 2 and the requirement that T + P ⊨ T ∩ P when T is consistent with P]. [Gärdenfors 1988, p.51]
But if "unjustified beliefs" are the chicken to kill, AGM 3 is a butcher's knife. Consider a hypothetical ornithologist working on the colour of swans, as in Example 2.3.1, who is initially agnostic about whether or not all swans are white. If in fact all swans are white, timidity prevents the ornithologist from ever accepting this generalization, no matter how many white swans he sees: for each new white swan, a minimal change in his beliefs adds no more than that this swan, too, is white. Writers from [Putnam 1963] to [Maher 1996] agree that an inductive method should eventually produce such a universal generalization. Hence timidity is unacceptable as a norm for empirical inquiry.
Stubbornness fares little better. Proposition 3.5 shows that stubbornness avoids retractions. But if the scientist starts out with false beliefs, good methods should lead him to retract these in favor of true beliefs. However, if our ornithologist initially believes that there is a black swan, then stubbornness will prevent him from retracting this belief no matter how many white swans he sees: clearly a bad prescription for inductive inference.
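Both complaints can be replayed in a toy model of the swan example (an illustrative sketch with a six-swan world and the simple full-meet operator; all names are mine):

```python
from itertools import product

N = 6  # a toy population of N swans; 'w' = white, 'b' = black
ALL = frozenset(map(''.join, product('wb', repeat=N)))   # every way the swans could be
ALL_WHITE = frozenset({'w' * N})                         # "all swans are white"
HAS_BLACK = ALL - ALL_WHITE                              # "there is a black swan"

def revise(T, P):
    """Full-meet revision: expand by the evidence while consistent, else adopt it."""
    return T & P if T & P else P

def seen_white(k):
    """Evidence after inspecting the first k swans and finding all of them white."""
    return frozenset(w for w in ALL if w.startswith('w' * k))

# Timidity: starting agnostic, the revised theory never entails "all swans are
# white" before literally every swan has been inspected.
timid, T = [], ALL
for k in range(1, N):
    T = revise(T, seen_white(k))
    timid.append(T <= ALL_WHITE)

# Stubbornness: a false "there is a black swan" belief survives every white-swan
# report that leaves it consistent.
stubborn, T = [], HAS_BLACK
for k in range(1, N):
    T = revise(T, seen_white(k))
    stubborn.append(T <= HAS_BLACK)

print(timid, stubborn)
```

The first list is all False (the generalization is never accepted) and the second all True (the black-swan belief is never retracted).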
So the AGM principles do not help an agent to arrive at true beliefs; but do they interfere with that goal? This depends on the scientist's initial beliefs. In the example of the swans, a scientist who begins with the bold hypothesis that all swans are white, and only adds the data at each stage, is guaranteed to eventually have the correct opinion about this hypothesis (cf. Section 9.5). This idea works in general: [Kelly et al. 1995] show that if there is any method at all that reliably settles a set of hypotheses of interest, then there is an AGM method that does so.³ The trick is to choose sufficiently strong initial beliefs, for example a complete theory. Then one of two things may happen: either the strong theory is true, in which case simply adding the data to the theory entails the correct answer about the hypotheses of interest. Or the theory is false, and is eventually refuted by the evidence; at that point, the AGM principles allow the method to choose any other conjecture. Making these choices in the right way leads to a reliable method. Thus from the point of view of means-ends analysis, the problem with the AGM norms is not that they make reliable inquiry impossible, but that they force agents who want to be reliable to produce theories with more content than necessary for settling the question under investigation, exposing the inquirer to a higher risk of error and retractions. (Section 5.3.1 discusses this issue in more detail, and shows that the AGM theory shares this problem with Popper's falsificationism.)
One important goal of inquiry is to replace uncertainty at the beginning with true theories. The AGM principles contribute nothing towards this goal. Part of the problem is that the epistemic value that motivates belief revision theory is avoiding retractions, not finding the truth. Another part is that the minimal-change perspective looks ahead only one step at a time. In the next chapter I consider a perspective that stands in sharp contrast to this myopia: what is empirical inquiry like if the goal is to settle on a true theory, whether it takes one, ten, or any number of steps?
3.6 Proofs
Definition 3.5 Let T, T₀, T₁ be three theories.

• A change T₀ from T is Pareto-dominated with respect to retractions and additions by T₁ ⇔

1. T₀ adds more propositions than T₁ to T, and for all empirical propositions H: if T ⊨ H and T₀ ⊨ H, then T₁ ⊨ H; or
2. T₀ retracts more propositions than T₁ from T, and for all empirical propositions H: if T ⊭ H and T₁ ⊨ H, then T₀ ⊨ H.

• A revision T + P is a minimal change ⇔ T + P is a change from T that is not Pareto-dominated with respect to retractions and additions by another change T′ that entails P.
Proposition 3.4 Suppose that a revision T + P entails P. Then T + P is a minimal change ⇔

1. T ∩ P entails T + P, and
2. if T ⊨ P, then T + P = T.
Proof. (⇒) Part 1: I show the contrapositive. Suppose that T ∩ P ⊭ T + P. Then there is a data stream ε that makes T and P true but T + P false. Thus T + P entails the complement ¬{ε} of {ε}, but T ⊭ ¬{ε}. Now consider (T + P) ∪ {ε}. This theory does not entail ¬{ε}, and if (T + P) ∪ {ε} adds anything to T, so does T + P, since T + P ⊨ (T + P) ∪ {ε}. Hence (a) (T + P) ∪ {ε} adds fewer propositions to T than T + P. Moreover, (b) (T + P) ∪ {ε} retracts from T only what T + P retracts from T: clearly, if T + P retracts H from T, then (T + P) ∪ {ε} retracts H from T, since (T + P) ∪ {ε} has less content than T + P. Conversely, suppose that (T + P) ∪ {ε} ⊭ H, and that T ⊨ H. Since ε ∈ T, it follows that T + P ⊭ H. Thus if (T + P) ∪ {ε} retracts a proposition from T, so does T + P. Finally, we have (c) (T + P) ∪ {ε} ⊨ P, since both T + P and {ε} entail P. It follows from (a), (b) and (c) that T + P is not a minimal change.

Part 2: Since T ⊨ P by hypothesis, T is the unique minimal change from T that entails P.

(⇐) Suppose that T + P satisfies conditions 1 and 2. Then the claim is immediate if T ⊨ P; suppose that T ⊭ P, and let ε be a data stream on which T is true and P false. I show that no change T′ that entails P Pareto-dominates T + P with respect to retractions and additions. First, suppose that T + P retracts a proposition H from T, and T′ does not, so that T′ ⊨ H. Since T′ ⊨ P ⊨ ¬{ε}, we have that (d) T′ ⊨ H ∩ ¬{ε}. But T ⊭ ¬{ε}, and hence (e) T ⊭ H ∩ ¬{ε}. Since T + P ⊭ H, we have that (f) T + P ⊭ H ∩ ¬{ε}. From (d), (e) and (f) it follows that T′ adds a belief to T that T + P does not add to T. So T′ does not Pareto-dominate T + P in additions or retractions from T.

Second, suppose that T + P adds a proposition H to T, but T′ does not entail H. Hence T′ is consistent with ¬H; this and the fact that T ∩ P ⊨ T + P imply that T′ ∩ ¬H ⊭ T ∩ P, because otherwise T′ ∩ ¬H, and hence ¬H, would be consistent with T + P. But if T′ ∩ ¬H ⊭ T ∩ P, then there is a data stream ε′ in T′ ∩ ¬H on which T is false, because T′, and thus T′ ∩ ¬H, entails P. So (g) T ⊨ ¬{ε′}, but T′ does not. Finally, (h) T + P ⊨ ¬{ε′}, because T + P ⊨ H and H is false on ε′. From (g) and (h) it follows that T′ retracts a belief from T that T + P does not retract from T. So T′ does not Pareto-dominate T + P in additions or retractions from T. □
Proposition 3.5 A revision T + P is retraction-minimal ⇔ T + P ⊨ T.

Proof. (⇐) If T + P entails T, then T + P retracts nothing from T and hence is clearly retraction-minimal.

(⇒) I show the contrapositive. Suppose that T + P ⊭ T; let ε be a data stream that makes T + P true but T false. Then T entails ¬{ε}, but T + P does not. So (T + P) ∩ ¬{ε} retracts fewer propositions from T than T + P. □
Proposition 3.6 If a revision T + P is addition-minimal among retraction-minimal revisions, then T + P = T ∩ P whenever T is consistent with P.

Proof. From Proposition 3.5 we have that T + P ⊨ T ∩ P. From Proposition 3.4 it follows that T ∩ P ⊨ T + P. Thus T + P = T ∩ P. □
Chapter 4
Discovery Problems and Reliable Solutions
4.1 Outline
In the previous chapter I showed that principles of minimal change revision give no guidance towards finding a true hypothesis, and indeed may interfere with that goal. In sharp contrast, philosophers such as Peirce, Putnam and Reichenbach have proposed that inquiry should take as its primary goal finding the truth: if not immediately, then in the long run. This chapter and the next analyze the design of scientific methods for attaining this goal. I consider the problem of identifying a correct hypothesis among a range of mutually exclusive alternatives. A method δ is admissible for this purpose if there is no other method δ′ that identifies a correct hypothesis on more data streams than δ. The first result of this chapter is that the methods admissible in this sense are exactly those that [Kelly 1995] has termed reliable: they succeed on all data streams consistent with given background knowledge.
When exactly is there a reliable method for identifying a correct hypothesis from a given set of alternatives? The answer is that the set of alternatives must be enumerable, and that it must be possible to decide each hypothesis in the limit. The second condition means that there must be a method that eventually always entails the hypothesis if the hypothesis is true, and eventually always entails the negation of the hypothesis if the hypothesis is false. Some hypotheses can be reliably assessed in the limit of inquiry even though no finite amount of evidence conclusively falsifies the hypothesis in question. Thus, contrary to Popper's ideas, we can admit unfalsifiable hypotheses as candidates for asymptotically reliable empirical tests without jeopardizing the ability of science to avoid error and find interesting truth in the long run [Popper 1968]. Indeed, if all alternatives under investigation are unfalsifiable but decidable in the limit, Popper's recommended conjectures-and-refutations scheme is not the most reliable means for finding the truth.

Since the feasibility of reliably identifying a true hypothesis depends crucially on the testability of the alternatives in question, I examine what the structure of hypotheses must be like if we want to be able to test them in various senses. This defines a precise topological scale of inductive complexity which places hypotheses that are harder to test above the easier ones.
The results in this chapter bring out some of the consequences for methodology if we adopt the norm that scientific methods should be reliable in the long run. This norm has been criticized both as too weak, because success in the long run is allegedly not an interesting aim for science, and as too strong, because we may want to regard an inductive problem as tractable even when every method for investigating it might fail to converge to the truth, on the grounds that the possibilities of failure are "negligible". I address these criticisms at the end of this chapter.
4.2 Convergence to the Truth

Significant questions about the world prompt empirical research, and translate into hypotheses. The researcher seeks a theory that settles the hypotheses of interest. An important special case obtains when the hypotheses of interest are mutually exclusive alternatives; then I say that these hypotheses form a partition. Popper referred to the task of selecting one hypothesis from a partition of alternatives as the problem of discovery [Popper 1968]. Of course, choosing some alternative is a trivial task; the interesting problem is to choose the right one. We would like science to quickly determine with certainty which alternative is correct. But as the examples from Section 2.3 show, this is usually too much to ask: no finite amount of evidence will tell us with certainty whether all swans are white or whether there are infinitely many elementary particles. As William James pointed out, "the intellect, even with truth directly in its grasp, may have no infallible signal for knowing whether it be truth or no" [James 1982, Sec. VI, p.197].
Philosophers such as Peirce, Putnam, Reichenbach, Glymour and Kelly, and learning theorists such as Gold, have proposed a more feasible goal of inquiry: that the scientist should eventually settle on the right alternative, without necessarily providing a sign that she has done so (see Figure 4.1).

Figure 4.1: Successful Discovery: On data stream ε, method δ identifies the correct hypothesis from a set of alternatives.

This conception of success fits with Plato's idea that stable true belief is better than fickle opinion that happens to be true, and may indeed be a necessary aspect of knowledge.

For [true opinions], so long as they stay with us, are a fine possession, and effect all that is good; but they do not care to stay for long, and run away out of the human soul, and thus are of no great value until one makes them fast with causal reasoning... But when once they are fastened, in the first place they turn into knowledge, and in the second, are abiding. [Plato 1967, p.363]
We may define success in the limit of inquiry like this. A method δ converges to an alternative H on a data stream ε if after some stage n, for all later stages n′ ≥ n:

1. δ(ε|n′) ⊨ H, and
2. δ(ε|n′) is consistent.

Given a collection of alternative hypotheses H and a data stream ε, I denote the hypothesis from H that is true on ε by H(ε). (Here and elsewhere I assume that for each data stream ε in K, there is a hypothesis in H that is correct on ε; that is, K ⊨ ⋃H.) I say that a method δ succeeds on ε if δ converges to H(ε) on ε. What methods are admissible with respect to the aim of converging to a true alternative? Obviously, it suffices if a method succeeds on all data streams consistent with background knowledge K. Conversely, a method δ is admissible as far as convergence to the truth is concerned only if δ succeeds everywhere in K. For suppose that δ fails on some data stream ε consistent with K. Then define δ′ like this: conjecture H(ε) while the data are consistent with ε. If the data deviate from ε, δ′ follows δ. Then δ′ succeeds on ε and everywhere that δ does, and hence dominates δ with respect to converging to the truth. So convergence-admissible methods succeed everywhere in K.
Fact 4.1 Let H be a collection of alternative hypotheses, and let K be background knowledge. Then a method δ is admissible given K with respect to converging to a correct alternative in H ⇔ δ converges to the correct alternative on each data stream ε in K.
Following [Kelly 1995], I say that a method δ solves the discovery problem for H given background knowledge K, or is reliable for H given K, if δ succeeds on all data streams in K. Thus Fact 4.1 says that a method is admissible with respect to converging to the truth if and only if the method is reliable. Section 2.3 gives examples of reliable methods.
4.3 Reliable Solutions for Discovery Problems

As with any proposed aim of inquiry, I examine under what conditions the aim is feasible, and how to attain it when it is feasible. In the case of converging to a true hypothesis in a discovery problem, by Fact 4.1 this question turns into the question of when a discovery problem permits a reliable solution, and what reliable methods are like.
Popper had a proposal for how to attack discovery problems: his conjectures-and-refutations method. Briefly, the idea is the following. Assume that the collection of alternatives is countable, and thus can be enumerated in some sequence H₁, H₂, ... (perhaps with "bolder" hypotheses coming before more timid ones, if an audacity ranking is available). Begin with the first hypothesis H₁, and conjecture H₁ until the evidence falsifies H₁. Then move on to H₂, then to H₃ once H₂ is falsified, and so on. The conjectures-and-refutations scheme will reliably identify the true alternative if each alternative hypothesis Hᵢ has the property that whenever Hᵢ is false, the evidence eventually falsifies Hᵢ. I say that hypotheses with this property are refutable with certainty. To see that the conjectures-and-refutations scheme is reliable if all alternatives are refutable with certainty, let Hₙ be the true hypothesis. Then H₁, H₂, ..., Hₙ₋₁ are false, and hence each of these alternatives is eventually conclusively falsified. After that point, the conjectures-and-refutations procedure always produces the true hypothesis Hₙ.
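The scheme is easy to sketch in code. In the toy problem below (all names are illustrative), the alternatives are "no black swan ever appears" and "the first black swan appears at position i", each of which is refutable with certainty:

```python
def h_none(e):
    """'No black swan is ever observed': the boldest, still refutable, conjecture."""
    return 'b' not in e

def h_at(i):
    """'The first black swan appears at position i': refutable with certainty,
    since any stream on which it is false refutes it within finitely many stages."""
    def h(e):
        if 'b' in e[:i]:
            return False
        return len(e) <= i or e[i] == 'b'
    return h

def conjectures_and_refutations(hypotheses, stream, steps):
    """Popper's scheme: conjecture the first hypothesis in the enumeration
    that the evidence gathered so far has not refuted."""
    i, out = 0, []
    for n in range(1, steps + 1):
        e = stream[:n]
        while not hypotheses[i](e):   # move past every hypothesis the data refute
            i += 1
        out.append(i)
    return out

HYPS = [h_none] + [h_at(i) for i in range(10)]
stream = 'wwwb' + 'w' * 8   # the lone black swan appears at position 3
print(conjectures_and_refutations(HYPS, stream, 12))
```

The conjectures start at index 0 (the boldest hypothesis) and, once the black swan at position 3 refutes it along with the first three rivals, stabilize on the true alternative for good.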
One might think that the alternative hypotheses under investigation must be refutable with certainty if inquiry is to reliably identify a true one. For otherwise we run the danger of forever maintaining a false conjecture that is never conclusively refuted by the evidence. But the almost universal generalizations from Section 2.3.2 show the fallacy in this argument: the generalizations "almost all swans are white" and "almost all swans are black" allow any finite number of exceptions, and hence are consistent with any evidence. Yet we saw in Section 2.3.2 that it is possible to reliably identify which of the almost universal generalizations is true (assuming that one of them is). Similarly, statements about the limiting frequency of an event are not falsifiable by any finite amount of evidence. But as we saw in Section 2.3.4, an inductive method can reliably identify the true limiting frequency of an event from a finite set of possible alternatives.
Popper derived his conjectures-and-refutations scheme from his conception of how science tests theories, namely by successive attempts at falsification. Indeed, testing alternative hypotheses against empirical data is crucial to reliable discovery. But reliable discovery methods need not proceed by waiting for the evidence to conclusively falsify the current hypothesis. It suffices to have a test method that will settle the truth value of each alternative in the limit, without ever yielding certainty about whether a conjecture is true or false. Following [Kelly 1995, Ch.4], I say that such a test procedure decides a hypothesis H in the limit. To be precise, a method δ decides a hypothesis H in the limit given background knowledge K if for all data streams ε in K:

1. δ converges to H if H is true on ε;
2. δ converges to ¬H, the negation of H, if H is false on ε.
Figure 4.2 illustrates decision in the limit.

Figure 4.2: Testing Empirical Hypotheses: Decision in the Limit of Inquiry. (a) If H is true on data stream ε, the conjectures of method δ stabilize to H; (b) if H is false on ε, the conjectures stabilize to the negation of H.

A hypothesis H is decidable in the limit given background knowledge K if there is a method δ that decides H in the limit given K. A hypothesis H that is refutable with certainty, such as "all swans are white", is decidable in the limit: conjecture H until the evidence refutes H. If H is never refuted, then H is true and hence this method settles on the correct truth-value for H (immediately). The hypothesis "almost all swans are white" is not refutable with certainty, but it is decidable in the limit if we assume that either almost all swans are white or almost all swans are black: conjecture "almost all swans are white" if the last observed swan is white, and "almost all swans are black" otherwise (cf. Section 2.3.2).
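This last test procedure is a one-liner. The sketch below (illustrative names) replays it on a stream that contains only finitely many black swans:

```python
def limit_decider(e):
    """Decide 'almost all swans are white' in the limit, under the background
    assumption that either almost all swans are white or almost all are black:
    conjecture according to the colour of the last observed swan."""
    return 'almost all white' if e[-1] == 'w' else 'almost all black'

# A stream on which all but three swans are white; the last black swan
# occurs at position 6, after which the conjecture never changes again.
stream = 'wbwwbwbw' + 'w' * 20
conjectures = [limit_decider(stream[:n]) for n in range(1, len(stream) + 1)]
print(conjectures[-1], conjectures[7:].count('almost all black'))
```

Before the last black swan the conjecture flip-flops; afterwards it stabilizes to the true alternative, which is all that decision in the limit demands.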
If there are only countably many alternative hypotheses, and if each of them is decidable in the limit, then there is a reliable method for identifying the true one among them: use a limiting decision procedure λ₁, λ₂, ... for each of the hypotheses H₁, H₂, ..., respectively. Out of all the hypotheses Hᵢ whose test λᵢ is positive on given evidence e (that is, λᵢ(e) ⊨ Hᵢ and λᵢ(e) is consistent), conjecture the alternative that comes first in the enumeration. If Hₙ is the true alternative, then after some finite time the limiting decision procedures λ₁, λ₂, ..., λₙ₋₁ for H₁, H₂, ..., Hₙ₋₁ always return negative results, whereas the test procedure λₙ for Hₙ is positive. After that point, the discovery method described settles on Hₙ. Conversely, if there is a reliable method δ for identifying the true alternative from a collection of hypotheses H, then each hypothesis H in H must be decidable in the limit. In fact, δ itself decides each hypothesis H in H in the limit, because δ eventually settles on the true one, which entails the negation of all the other (false) alternatives, since the hypotheses in H are mutually exclusive. Finally, if there are more than countably many alternatives in H, there is no reliable method for discovering the true one. This is so because there are only countably many finite data sequences, so if there are uncountably many alternatives, there must be some that a discovery method δ never produces. If one of these is correct (call it H), δ never conjectures H, much less stabilizes to H. Thus the space of alternative hypotheses must be small (at most countable) in comparison to the uncountable space of empirical possibilities, the data streams. In sum, we have the following necessary and sufficient conditions for reliable discovery.
Proposition 4.2 (Kevin Kelly) Let a partition H of background knowledge K be given such that K = ⋃H.¹ Then the discovery problem for H is solvable given K ⇔

1. H is countable, and
2. each H ∈ H is decidable in the limit given K.

¹ The assumption that K = ⋃H incurs no loss of generality. For I assume throughout that H covers all possibilities in K, so that K ⊆ ⋃H. And in the presence of background knowledge K, we may consider H′ = {H ∩ K : H ∈ H} as the specification of alternative hypotheses; clearly K = ⋃H′.
This characterization is useful in several ways. For given background knowledge K, it tells us whether there is a reliable discovery method for a partition H of alternative hypotheses. If there is, we may recommend that a scientist should use a reliable method (as in the examples from Section 2.3), and criticize those who do not (as I criticized the AGM postulates in Chapter 3). If there is no reliable discovery method, we may lower our sights to a weaker notion of inductive success (such as gradual identification; see Section 2.3.4), or we may criticize a research program for a mismatch between its ambitions and its means (as Kelly critiqued cognitive science [Kelly 1995, Ch.7]; see also Section 4.4 below). If the standard of serious possibility K is open to discussion, one of the factors that may be relevant in deciding to adopt a standard K is whether K gives rise to reliable discovery methods for problems of interest. (The results in Chapter 8 suggest that this is one motivation for using conservation principles in particle physics.)
4.4 Testing and Topology
Proposition 4.2 shows that the question "is there a reliable discovery method
for a countable partition of alternatives H?" reduces to the question "is each
alternative H decidable in the limit?". This section characterizes the structure of
hypotheses that are decidable in the limit. If a hypothesis H and its complement
H̄ are both countable unions of refutable hypotheses, then H is decidable in the
limit. For in that case we can form an enumeration C = C_1^H, C_1^H̄, C_2^H, C_2^H̄, ... of
refutable subsets of H and H̄, respectively, such that C covers H and H̄. Then
we "internally" conjecture the first hypothesis in the enumeration consistent
with the evidence. Since C covers both H and its complement, one of the
refutable hypotheses in the enumeration must be true, and our method will
eventually stabilize to the first true "internal" hypothesis C_i in C that entails
the correct answer H, or H̄, as the case may be.
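The enumerate-and-conjecture strategy just described can be sketched in a few lines (my own illustration, not code from the text). For the universal generalization "all swans are white", H itself is refutable, and its complement is covered by the refutable evidence propositions [e] for finite sequences e containing a non-white swan; conjecturing the first consistent member of the enumeration decides H in the limit. The 'w'/'b' encoding of observations is an assumption of the sketch.

```python
# Illustration (not from the text): limit decision by enumerating refutable
# "internal" hypotheses that cover H and its complement.
# Observation encoding (assumed): 'w' = white swan, 'b' = non-white swan.
# H = "all swans are white".  Its complement is the countable union of the
# refutable evidence propositions [e] for each finite e containing a 'b'.

def consistent_with_H(evidence):
    """H is consistent with finite evidence e iff no 'b' has occurred."""
    return 'b' not in evidence

def conjecture(evidence):
    """Conjecture the first member of the enumeration C = H, [e1], [e2], ...
    that is consistent with the evidence.  H heads the enumeration, so we
    conjecture H until it is refuted; afterwards we conjecture the evidence
    proposition [e] given by the data, which entails the complement of H."""
    return 'H' if consistent_with_H(evidence) else 'not-H'

# On a stream making H true the method conjectures H from the start;
# on a stream with a non-white swan it stabilizes to not-H once one appears.
assert conjecture(['w', 'w', 'w']) == 'H'
assert conjecture(['w', 'b', 'w']) == 'not-H'
```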
The converse holds as well. Let δ be an inductive method. I say that δ
projects a hypothesis H from H along ε at e if ε extends e and δ stabilizes to
H at e. I refer to the set of such data streams as the projection set of the
method δ at e, and denote it by proj(δ, e); formally, proj(δ, e) = {ε ⊇ e : if e ⊆ e′ ⊂ ε and δ(e) ⊨ H ∈ H, then
δ(e′) ⊨ H and δ(e′) ≠ ∅}. Figure 4.3 illustrates this concept.
The proposition that a method δ stabilizes at a given time is refutable with
certainty; for if it is false, then δ changes its mind and conclusively falsifies the
proposition.
Lemma 4.3 Let δ be a discovery method, and let background knowledge K be given. Then for all finite data sequences e, proj(δ, e) ∩ K is refutable with certainty given K.
Now suppose that δ decides a hypothesis H in the limit. Consider the set
Stabil(δ, H) of all finite data sequences at which δ stabilizes to H; formally,
Figure 4.3: The Projection Set of a Discovery Method δ
Stabil(δ, H) = {e : proj(δ, e) ⊨ H}. By Lemma 4.3, Stabil(δ, H) is a countable
collection of refutable hypotheses, each of which entails H. Since δ is reliable,
δ stabilizes to H whenever H is true, and thus each data stream that makes H
true is contained in some member of Stabil(δ, H). Therefore H is a countable
union of refutable hypotheses. The same argument holds for H̄. Taking into
account background knowledge, these observations yield the following result.
Proposition 4.4 (Kevin Kelly) A hypothesis H is reliably decidable in the limit given background knowledge K ⟺ H ∩ K and H̄ ∩ K are each countable unions of hypotheses that are refutable with certainty given K.
A hypothesis like "there are only finitely many elementary particles" (cf.
Section 2.3.6) is a countable union (disjunction) of refutable hypotheses: the
hypothesis is equivalent to "there are no elementary particles or there is at
most one or there are at most two ...". But this hypothesis (call it H) is not
reliably decidable in the limit. For let any method δ be given. An inductive
cousin of a Cartesian demon may lead δ astray as follows. If δ(e) entails H, the
demon presents one new particle after another, until δ conjectures that there
are infinitely many elementary particles. Then the demon stops the flow of new
particles until the theory of δ entails H again, etc. If δ stabilizes to H, then
the demon presents infinitely many particles, and H is false. If δ stabilizes to
H̄, then the demon presents only finitely many particles, and H̄ is false. So if
δ stabilizes to an answer, it is the wrong answer. On the other hand, if δ does
not converge, then δ fails to decide H in the limit. Thus in all cases, δ fails.
Since the demonic strategy works against any method, H is not decidable in
the limit. It follows from Proposition 4.4 that the negation of H, "there are
infinitely many elementary particles", is not a countable disjunction of refutable
hypotheses. However, we saw in Section 2.3.6 that a method δ can reliably
assess H in a weaker sense: if H is true, then δ eventually always entails H;
and if the hypothesis is false, δ will not stabilize to entailing H, although δ may
not stabilize to entailing H̄ either. Following [Kelly 1995, Ch.3], I say that δ
reliably verifies H in the limit (Figure 4.4).
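The demonic argument is itself effectively an algorithm, and can be sketched as follows (my own illustration under an assumed encoding, not code from the text): observations record whether a new particle appears, and the demon consults the method's current conjecture to decide what to present next. The sample method shown is hypothetical; the argument shows any method suffers the same fate.

```python
# Illustration (not from the text): the inductive demon against a method for
# H = "there are only finitely many elementary particles".
# Observation encoding (assumed): 1 = a new particle appears, 0 = none.

def demon_stream(method, steps):
    """Whenever the method conjectures H, present a new particle; whenever it
    conjectures not-H, stop the flow of particles."""
    evidence, conjectures = [], []
    for _ in range(steps):
        c = method(evidence)
        conjectures.append(c)
        evidence.append(1 if c == 'H' else 0)
    return conjectures

def sample_method(evidence):
    """A hypothetical method: conjecture H iff no new particle has appeared
    among the last three observations."""
    return 'H' if 1 not in evidence[-3:] else 'not-H'

# Against the demon the method never stabilizes: it keeps vacillating
# between H and not-H, and so fails to decide H in the limit.
conjectures = demon_stream(sample_method, 50)
assert 'H' in conjectures[10:] and 'not-H' in conjectures[10:]
```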
The difference between decision and verification in the limit arises when a
hypothesis H is false. In that case decision requires that a reliable method
must eventually always entail that H is false, whereas verification allows that a
reliable method may go back and forth between H and its negation H̄. Other
hypotheses that are verifiable but not decidable in the limit are the two almost
universal generalizations from Section 2.3.2, assuming that both of them may be
false on the same data stream, and Newell and Simon's physical symbol system
hypothesis (cf. Section 2.3.5). There are two ways of viewing the fact that the
physical symbol system hypothesis is verifiable but not decidable in the limit.
A supporter of artificial intelligence might point out that there is a reliable
procedure for finding an intelligent machine if there is one, and hence Newell
and Simon's research program is indeed capable of leading us to the truth about
Figure 4.4: Method δ verifies hypothesis H in the limit. (Panel (a): H is true on ε; the conjectures of δ stabilize to H. Panel (b): H is false on ε; the conjectures need not stabilize.)
machine intelligence. A critic might reply that if the physical symbol system
hypothesis is false, there is no reliable method for determining this fact. He may
add that if there is no intelligent machine, then one attempt after another at
building an intelligent system will fail, but no particular history of failures will
appear to be a better reason to abandon the project than another. Reliability,
which means avoiding conjecturing the physical symbol system hypothesis forever when it is
in fact false, puts pressure on the AI proponent to eventually give up her quest; but she may respond to each failure by asking for more patience (and research
grants). Such a critic would conclude that Newell and Simon's faith in the ability
of "empirical research" to settle the physical symbol system hypothesis as they
formulated it is unjustified, and that artificial intelligence researchers need a
stronger theory of intelligence to arrive at plausible background assumptions
that reduce the inductive complexity of the physical symbol system hypothesis.
An argument just like the one for Proposition 4.4 establishes that a hypothesis H is reliably verifiable in the limit just in case H is a countable union of
refutable hypotheses.

Proposition 4.5 (Kevin Kelly) A hypothesis H is verifiable in the limit given background knowledge K ⟺ H ∩ K is a countable union of hypotheses that are refutable with certainty given K.
As an immediate corollary, a hypothesis H is decidable in the limit just in
case both H and its negation H̄ are verifiable in the limit.

Corollary 4.6 (Kevin Kelly) A hypothesis H is reliably decidable in the limit given background knowledge K ⟺ H and H̄ are each reliably verifiable in the limit.
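By contrast with its undecidability, H = "there are only finitely many elementary particles" can be verified in the limit, since it is the countable union of the refutable hypotheses H_n = "there are at most n particles". A sketch of such a verifier (my own illustration, with an assumed 0/1 encoding of observations, not code from the text):

```python
# Illustration (not from the text): verifying H = "only finitely many
# particles" in the limit via the cover H = H_0 ∪ H_1 ∪ ..., where each
# H_n = "at most n particles" is refutable with certainty.
# Observation encoding (assumed): 1 = a new particle appears, 0 = none.

def verifier(evidence):
    """Conjecture H_n for the least n consistent with the evidence, i.e. the
    number of particles seen so far.  Each H_n entails H."""
    n = sum(evidence)
    return ('H', n)

# If only finitely many particles appear, the count eventually stops changing
# and the verifier stabilizes to some H_n, hence to H.  If infinitely many
# appear, the internal conjecture changes infinitely often: no stabilization.
finite_stream = [1, 1, 0, 0, 0, 0]
conjs = [verifier(finite_stream[:k]) for k in range(len(finite_stream) + 1)]
assert conjs[-1] == conjs[-2] == ('H', 2)
```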
Thus the answer to the questions "is H decidable in the limit?" and "is
H verifiable in the limit?" depends on whether H and H̄ can be decomposed
into refutable hypotheses. The structure of refutable hypotheses is as follows.
Recall that a hypothesis H is refutable with certainty just in case the evidence
eventually falsifies H whenever H is false. This condition will fail if and only
if there is a data stream ε that makes H false but no finite amount of evidence
from ε conclusively falsifies H. Following [Kelly 1995, Ch.4], I call such a data
stream a limit point of H. Figure 4.5 illustrates this concept.

To be precise, a data stream ε is a limit point of H just in case all finite initial
segments ε|n of ε are consistent with H. A set of data streams H is closed just
in case H contains all of its limit points, and closed given background knowledge
K just in case H contains all of its limit points that are consistent with K. So
"all swans are white" is closed, but its negation "there is a non-white swan" is
not. A closed hypothesis H is refutable with certainty: if H is false on a data
stream ε, then ε is not a limit point of H since H is closed, which implies that
H is eventually falsified along ε. Conversely, if H is refutable with certainty,
then H is closed: otherwise there is a limit point ε of H missing from H, which
means that H is false on ε but never falsified along ε. Thus we have:
Figure 4.5: Data stream ε is a limit point of hypothesis H.
Proposition 4.7 (Kevin Kelly) A hypothesis H is refutable with certainty given background knowledge K ⟺ H is closed given K.
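The swan example can be checked mechanically on finite truncations (my own illustration of the definitions, not code from the text): the all-white data stream is a limit point of "there is a non-white swan", since every finite initial segment of it is consistent with that hypothesis even though the hypothesis is false on the stream; hence that hypothesis is not closed, and so not refutable with certainty.

```python
# Illustration (not from the text): limit points and closedness for the swan
# hypotheses.  Observation encoding (assumed): 'w' = white, 'b' = non-white.

def consistent_with_nonwhite(evidence):
    """'There is a non-white swan' is consistent with every finite evidence
    sequence: either a 'b' has already occurred, or one may still occur."""
    return True

def consistent_with_allwhite(evidence):
    """'All swans are white' is consistent with e iff no 'b' has occurred."""
    return 'b' not in evidence

all_white = ['w'] * 10   # a finite stand-in for the all-white data stream
# Every initial segment is consistent with "there is a non-white swan",
# so the all-white stream is a limit point of that (non-closed) hypothesis ...
assert all(consistent_with_nonwhite(all_white[:k]) for k in range(11))
# ... whereas the closed hypothesis "all swans are white" is conclusively
# falsified as soon as a non-white swan is observed.
assert not consistent_with_allwhite(['w', 'b'])
```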
This notion of closed set defines a topology in the space of data streams
(cf. [Kelly 1995, Ch.4], [Schulte and Juhl 1996]) whose open sets are the complements of closed sets. Kelly shows that an open set in this topology is a union
of evidence propositions [e]. The open sets are exactly the hypotheses that
are eventually entailed by the evidence whenever they are true; we may call
such a hypothesis verifiable with certainty. Obviously hypotheses that are
verifiable with certainty are verifiable in the limit. Since a refutable (closed) hypothesis is also verifiable in the limit (by Proposition 4.5) and its complement
is open and thus verifiable with certainty, it follows from Proposition 4.4 that
all refutable hypotheses are decidable in the limit (as we observed in Section
4.3).
Topologists refer to countable unions of closed sets (the hypotheses verifiable
in the limit) as Fσ sets. There is a correspondence between quantifier alternations in natural-language definitions of hypotheses and their topological structure: closed sets correspond to universally quantified hypotheses ("all swans
are white"); open sets correspond to existential hypotheses ("there is a non-white
swan"); Fσ sets correspond to ∃∀ claims ("there is a time n such that for
all later times n′, no new particles are discovered"). The correspondence holds
exactly only under certain conditions (cf. [Kelly 1995, Ch.12]), but it is a useful
heuristic for determining the sense in which a given hypothesis is testable.
4.5 Against Method
An essential part of long-run methodology is the notion of a long-term strategy
for responding to evidence. Isaac Levi rejects this picture of inquiry on the
grounds that an agent should be free to review and revise his approach during
the course of inquiry.
In the first place, although I am inclined to the view that there
are some fixed methodological norms..., these norms are extremely
weak and heavily dependent for their operation on contextual factors
which change with developments in scientific inquiry. Consequently,
I too am "against method" if by this one means a very substantive
method immune to criticism and revision during the course of inquiry
itself. [Levi 1980, p.68]
I grant that "contextual factors" may change during the course of inquiry, for
example in a scientific revolution (cf. Section 2.5). But although the course of
inquiry may change how we conceive of a given scientific question, we would still
like to know how to investigate an empirical question as we currently conceive
of it. I grant too that a scientist ought to be able to review his approach to a
research problem and to reject it if he finds it wanting. But it does not follow
that an agent should not make any plans at all, or refrain from comparing
the potential results of one long-term plan with another. This is a question
not just about scientific methodology, but about the role of plans in sequential
decision making in general. Suppose Maite plans to drive from Pittsburgh to
San Francisco via Chicago. As she approaches the exit to Chicago, she checks
the map and realizes that she might take a shortcut to Iowa and avoid Chicago,
with a likely time savings of several hours. Of course she should be able to
change her plan at that point. But rather than taking this as an argument
against making a plan at all, I would say that Maite should have chosen a better
plan; she should have planned to avoid Chicago in the first place. Although
an agent may reject a plan as she is following it, it is plausible to require of a
good plan that it will withstand "criticism and revision" during the course of
its execution. I therefore advocate an equilibrium between the long-run and the
myopic perspective: optimal long-term plans should be such that an agent will
not want to change them during the course of inquiry. For scientific methods,
this motivates the following principle of sequential optimality.
If a method δ is optimal by some criterion C given background
knowledge K, then δ should remain C-optimal after observing evidence e; that is, δ should be C-optimal given background knowledge
K ∩ [e], provided that e is consistent with K.
Game theorists have adopted the principle of sequential optimality as a
constraint on strategies for sequential decisions, under different names and
with slight variations, such as: "subgame-perfection" [Selten 1965], "the backwards induction principle" [Kohlberg and Mertens 1986], "sequential rationality" [Kreps and Ramey 1987], "sequential admissibility" [Bicchieri and Schulte 1997]
(see also Chapter 9). All methodological recommendations presented in this thesis endorse methods that are sequentially optimal. In Chapter 9, I show that
the admissibility principle guarantees sequential optimality, no matter what values are taken as the relevant ones: methods that are admissible with respect
to background knowledge K are admissible after learning evidence e, that is,
with respect to background knowledge K ∩ [e]. But in some decision problems,
the conflict between short-run optimality and long-run benefits is irreconcilable.
For example, suppose that Maite makes a plan to quit smoking tonight. After
dinner, she reviews this resolution and realizes that if she waits another day,
the pleasure from one more cigarette tonight far outweighs the slight decrease in
the risk of lung cancer that quitting today will bring her compared to quitting
tomorrow. Criticizing her original plan in this way, she revises it and plans to
quit tomorrow. But tomorrow the plan to quit is again open to the same criticism, which will lead her to revise it, etc. Medical associations at least would be
"for method" in this case, if this means a commitment to a long-term plan that
the agent honors even when the plan does not look good to her from a myopic
perspective. To sum up: Levi views the possibility that an agent may criticize a
plan during its execution as a reason for not making one; I take it to motivate a
constraint on the choice of long-term strategies for inference, namely that they
should withstand such criticism as inquiry proceeds. I call methods that meet
this requirement sequentially optimal. Serious conflicts between the long-run
and the short-run perspective arise only when there are no sequentially optimal
methods. When such conflicts arise, it is not clear that we should favor the myopic perspective over the long-run perspective; but in any case, the criteria that
I propose for evaluating inductive methods select sequentially optimal methods.
In particular, I show in Chapter 9 that admissible methods are sequentially
optimal.²
Another cluster of objections to long-run reliability as a goal for scientific
inference runs from Keynes's quip "in the long run we are all dead" to PAC
models (see next section) that yield bounds on how long a method may take to
settle on an approximately right answer. These criticisms share the idea that it
is not sufficient for a method to find the truth eventually, but that it should do
so "soon enough", by some deadline known a priori, or at least "sooner rather
than later". The next section takes up these objections.
4.6 Contra Convergence
Is a reliable method an interesting solution to a research problem? We know
that a reliable method will, after some delay, hold the correct opinion for eternity. But since we and our descendants reap the fruits of inquiry for only a
finite time, how do we profit from the limit of inquiry? Furthermore, often,
perhaps in most cases, the inductive problem under investigation changes in
the course of inquiry (see Section 2.5), leading us to change our methods. These
considerations suggest that the value of a reliable method lies not so much in
its actual convergence to the truth, but in its disposition to hold on to the correct opinion. Another way of putting the point is that what a reliable method
achieves, within finite time, is true stable belief, namely belief that does not
change with further evidence. Just as an algorithmic solution to a formal question is guaranteed to find the right answer, but may take any finite amount of
time to do so, a reliable method may take any finite amount of time to produce
² Levi raises another objection to taking into account long-run reliability that stems from his theory of empirical knowledge and how it functions in decision making (cf. Section 2.4). If the agent takes his current conjecture as background knowledge at time t, and evaluates changing his conjecture with respect to that background knowledge, he must decide not to change it because, given his background knowledge, a change would certainly lead to an error. Levi's solution is that the agent may "contract" background knowledge without reflecting on the fact that the result of this contraction might lead to a theory at time t + 1 that is certainly false as far as the agent at time t is concerned. Thus Levi does not want an agent to reflect on whether his theory at the next stage of inquiry is true or not, much less on whether eventually all his theories are true. Levi's myopic stance is implausible, and tied to premises about knowledge and action that I rejected in Section 2.4.
stable true belief. As Plato suggested, we may think of stable true belief as a
sign of "understanding", or having "gotten the gist" of a problem.³
One may insist that true stable belief is not good enough, and that we want
certain belief. Indeed, some say that we want not only the right answer with
certainty; we want to know that we shall not have to wait "too long" to find
the right answer. Thus Kitcher remarks that

To be sure, there are [Bayesian] convergence theorems about the
long run, but as writers from Keynes on have pointedly remarked,
we want to achieve correct beliefs in the span of human lifetimes.
[Kitcher 1993, p.293]

The second of these demands is stronger than the first: if it is possible to
give a deadline (e.g., "human lifetime") by which a method is guaranteed to
find a right answer, then we can be sure that the method's conjecture is correct
when the deadline is reached.⁴ We all want lots of things, especially instant true
theories. But as skeptical arguments from Sextus to Popper to Kelly pointedly
show, it is a fact of life that, unless the scientist is granted strong background
assumptions, there is no philosopher's stone that is guaranteed to turn the lead
of evidence into the gold of true generalizations. Still, as Kitcher points out,
"prior background practice" in a number of research problems rules out enough
possibilities for "eliminative induction" to single out one of the alternative hypotheses as the only one that "accommodates" the data [Kitcher 1993, Ch.7].
If this is not the case, he says, the "reasoning" involved should be "frowned
upon".⁵ Perhaps the suggestion is that we should adopt a quasi-conventional
standard according to which a research problem is feasible and worthy of investigation, relative to given background knowledge, only if the background
³ Plato neglected the possibility that an agent may have stable true belief for the wrong reasons. I describe such a case in Section 5.4, and show how reliable methods can avoid this problem.
⁴ An analogy to computability theory suggests another way of defining "not too long". An "efficient" solution to a computational problem is an algorithm whose run-time is a polynomial function of the size of the input. By analogy, one might suggest that the amount of evidence that a reliable method requires to find a correct answer should be a polynomial function of the size of the space of alternatives H. This is one of the core ideas of the PAC learning paradigm. (To meet the PAC criterion, the required evidence must be a polynomial function of H as well as a "confidence" and a "tolerance" parameter; see below in this section.) This idea clearly applies only in settings in which the hypothesis space H is finite, and there is a natural way of defining its "size".
⁵ "However, when the constraints are lax or when confidence in the completeness of prior practice is (quite reasonably) low, there is room for doubt about hypotheses that accommodate accepted results. The problem in this case is not accommodation itself, but the state of prior practice. I conjecture that opponents of accommodation have been moved by examples in which the constraints from prior practice are lax, and there is no serious attempt to explore a space of rival hypotheses. That type of reasoning should be frowned upon, but the troubles should be traced to failure to activate the eliminative propensity, not to some supposed defect in the strategy of accommodation." [Kitcher 1993, p.246]; emphasis Kitcher's.
assumptions imply that some "reasonably small" amount of evidence will eliminate all but one alternative, somewhat in the manner of computer scientists
who consider a computational problem "tractable" only if it can be solved by a
Turing machine whose run-time is a polynomial function of the size of the input.
It is not my purpose to enter into a discussion of whether we should adopt such
a standard of feasibility for research problems. I note however that along with
the problem of induction, Kitcher's standard throws out many research projects
of interest in science, for example, finding a complete theory of particle physics
and investigating gravitational theories, as well as most of the inductive problems that philosophers have found worthy of reflection, such as Reichenbach's
problem of inferring long-run frequencies, and Goodman's Riddle of Induction.
An interesting alternative would be to define a discovery problem as tractable
if and only if there is a reliable method for solving it. By Proposition 4.2, this
means that we should restrict attention to hypotheses that are decidable in the
limit. By that standard, Goodman's Riddle is tractable (cf. Section 2.3.3), and
so is the problem of identifying conservation principles for particle reactions
(Chapter 8). But Simon and Newell's physical symbol system hypothesis would
not count as tractable, as we saw in the previous section.
Another attempt to achieve more than long-run reliability is to appeal to
short-run chances. Even if the available evidence does not yield certainty as
to which theory is correct, we might want to follow methods that have a high
chance of producing a correct theory. For example, Kitcher holds that
[Our] normative project is to consider whether making particular
kinds of changes on the basis of particular kinds of processes is likely
to promote cognitive progress. [Kitcher 1993, p.221]
But short-run chances at the truth may add up to less than long-run reliability: there is no guarantee that repeated guesses will ever stop vacillating.
Scientists who take myopic stabs at the truth might end up chasing their tails,
rather than engaging in the systematic, cumulative and progressive enterprise
that reliability theory (and realists like Kitcher) envision.⁶ On the other hand,
if a method with a high likelihood of producing correct theories in the short run
actually does lead to true hypotheses in the long run, a reliabilist can embrace it
as having a desirable short-run feature in addition to long-run reliability. Thus
the appeal to short-run chances does not inherently seem to be in conflict with
⁶ It seems that Kitcher fails to appreciate this point. In various places his vision of science involves "cumulative progress"; more and more parts of our scientific theory "persist" through time, even if others are discarded. Thus his "optimistic induction" [Kitcher 1993, p.137] says that since through time and changing theories, science has always held on to some parts of its current theories, we may infer that science will continue to accumulate beliefs. Another part of Kitcher's picture that assumes stable belief through time is his account of natural kinds, which he takes to be those which are eventually always included in the successive ontologies that science produces [Kitcher 1993, p.172]. Long-run reliability can sustain this picture of cumulative progress; short-run chances at truth cannot.
reliability analysis, if we understand it as an addition rather than an alternative
to an investigation of where a given type of inquiry is headed in the long run.
The most serious shortcoming of the short-run likelihood approach is that it is
hard to get off the ground. How can we establish what the chance is that a
given method will produce a true theory? The naturalist approach is to launch
an "empirical investigation", typically of the "psychological propensities", or
"exemplary reasoning", of prominent figures of science, apparently in the expectation that these propensities, or that kind of reasoning, are likely to produce
true theories, at least in our world. For example, Kitcher proposes to analyze
Galileo's way of doing science as follows.
First we must identify the processes that underlie Galileo's beliefs
and decisions. Second we have to assess the ability of such processes
to generate and sustain cognitively valuable states. Both endeavors
are error-prone. But there is a fact of the matter: either Galileo's
reasoning exemplified a strategy likely to promote cognitive goals or
it did not. [Kitcher 1993, p.186]
How are we to "assess the ability" of a psychological propensity to produce
the truth? Kitcher and other naturalists suggest that we should consider the performance of such a propensity across a number of "epistemic contexts" [Kitcher 1993,
p.189], which for the purposes of the present discussion we may equate with pairs
of discovery problems and finite data sequences. A number of problems arise:
How do we choose a "representative sample" of epistemic contexts on which we
should test the scientific disposition in question (cf. [Kitcher 1993, p.237])? And
given such a sample, how do we "identify the processes that underlie Galileo's
beliefs and decisions" precisely enough to tell what Galileo would have said
in a given epistemic context? Who can say whether Galileo would have, for
example, concurred with Newton's rejection of the wave theory of light? And
if we could say what Galileo's cognitive strategies were, how can we justify
inferences from "these strategies worked for problems P₁, P₂, …, Pₙ" to "these
strategies will work for problem Pₙ₊₁"?⁷ Faraday relied heavily on visualization
in his research into electromagnetism. Would this cognitive strategy have helped
or hindered him in space-time physics? Newton made progress on mechanics,
but set back the wave theory of light, and got nowhere at all in his alchemy; is
there any reason to think that his "cognitive strategies" were different in these
respective researches? From the reliabilist point of view, different problems
demand different solutions that take advantage of the special structure of the
particular problem involved. No general "methodological maxim", or "cognitive
strategy", or "style of reasoning" can beat the trite advice "choose an optimal
strategy for the problem at hand". For if the result of such "propensities" contradicts this trivial advice, then by definition the result is less than optimal.
Thus I critiqued the maxim of "minimal change" belief revision in Chapter 3
⁷ The introduction to [Donovan et al. 1988] discusses this issue further.
by describing how it can lead to bad results on a simple problem. Learning
theorists criticize the Bayesian maxim "thou shalt conditionalize" because this
principle can lead scientists who are not logically omniscient into severe losses
of reliability [Kelly and Schulte 1995b], [Osherson and Weinstein 1988]. And
if psychological analysis can provide us with enough information about what
Galileo's or Newton's cognitive strategies exactly are, we could and should hold
them up to reliabilist standards in the same way.⁸
Fairly recently, computational learning theory has found an ingenious way
to argue for the likelihood of a method's conjecture being true given a finite
(polynomial-size) amount of evidence [Valiant 1984]. In essence, the argument
is this. We suppose that the data were generated by random sampling from an
(unknown) probability distribution. A method PAC-learns if, for any sampling
distribution, after some bounded number of samples, the method produces,
with a probability (determined by the sampling distribution) of at least δ, a
hypothesis whose margin of error (determined by the sampling distribution) is
no greater than ε; here 0 ≤ ε < 1 and 0 < δ ≤ 1 may be specified arbitrarily.
As I mentioned, we can include such short-run performance in a long-run reliabilist framework by asking whether repeated applications of the method would
stabilize to a true hypothesis. However, there are serious technical problems in
generalizing the PAC idea to the kinds of empirical hypotheses that I am concerned with. The PAC framework applies to concepts for classifying instances,
not arbitrary empirical propositions. Even more importantly, the performance
guarantees lean heavily on the assumption that the data are sampled at random.
But in the kinds of scientific problems I am considering, there is no reason to
think that successive observations are independent of each other.⁹
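For concreteness, footnote 4's polynomial-evidence idea can be sketched with the standard finite-class sample bound from the PAC literature (my addition, not from the text; note that the textbook convention writes the confidence as 1 − δ, whereas the passage above writes it directly as δ): a learner that outputs any hypothesis consistent with m random samples has error at most ε with probability at least 1 − δ, provided m ≥ (1/ε)(ln |H| + ln(1/δ)).

```python
import math

# Illustration (standard textbook bound, not from the text): sample size
# sufficient for a consistent learner over a finite hypothesis class H in
# the realizable PAC setting, with confidence 1 - delta and tolerance epsilon.

def pac_sample_bound(h_size, epsilon, delta):
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# The required evidence grows only logarithmically in |H| and linearly in
# 1/epsilon and ln(1/delta); this is "polynomial" in the sense of footnote 4.
assert pac_sample_bound(1000, 0.1, 0.05) == 100
assert pac_sample_bound(10**6, 0.1, 0.05) < 2 * pac_sample_bound(10**3, 0.1, 0.05)
```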
4.7 There's No Reliable Method: So What?
What conclusions should we draw about a research project (say, a discovery
problem) when it has no reliable solution? Such a negative result seems to
support a skeptical attitude about the ability of empirical science to solve the
questions at issue. But the skeptic's case against the possibility of knowledge
(or reliably inferred stable true belief) rests on material assumptions that she
cannot know (or reliably infer). For example, an improvement in instruments
for detecting particles may reduce the inductive complexity of particle theories.
⁸ For example, Newton's fourth "rule of reasoning" [Newton 1995, Book III] amounts to what [Kelly et al. 1995] call "stubbornness" (cf. Section 3.5). Stubbornness is unreliable if the hypotheses under investigation are not refutable with certainty, as in the example of the two almost universal generalizations (cf. Section 4.3).
⁹ However, the PAC framework may be appropriate for theories of particle reactions. We shall see in Chapter 8 that theories of particle reactions may be thought of as concepts classifying reactions into "possible" and "impossible". And it seems that one may assume that successive observations of reactions in, say, a bubble chamber are independent of each other.
Direct observations of brain activity may allow us to test theories of human
cognition in a way that stimulus-response data do not. In 1825, Auguste Comte
offered speculation about the chemical composition of the stars' atmosphere as
the kind of metaphysical speculation grounded in no empirical evidence that
was unworthy of a good positivist [Comte 1968]. But then spectral analysis
came along. Thus reliabilist underdetermination arguments do not establish,
once and for all, that an empirical question is beyond the reach of science.
Rather, they show a gap between the ambitions of inquiry and its current
means. What attitude should we take towards such a gap? One kind of response
would be to set a lower standard of success (for example, gradual verification
rather than decision in the limit) or to introduce more assumptions to make
the problem simpler. This is how a computer scientist treats an intractable
computational problem: she might seek an algorithm whose outputs are always
within a constant factor from the optimal solution, or she might isolate special,
tractable cases of the original problem. Another response is to acknowledge
that for all we are currently willing to assume about a given inductive problem,
our methods might fail to find the correct answer, but that nonetheless we need
not "worry" about this possibility because it is "not significant". Probability
theory supplies the tools for developing one version of this response. [Kelly 1995,
Ch.13] shows how, and critiques the result from the point of view of learning
theory (see also [Juhl and Kelly 1994]). Since these matters are complex, not
directly relevant to the project of this thesis, and I have nothing to add to the
detailed discussions in the literature, I will not take them up here.
In any case, it is clear that we prefer to find the truth sooner rather than later, and that we may be unhappy if we don't know when our methods will deliver it. The discontented may insist on strong background assumptions, or appeal to probabilistic arguments, to feel justified in believing that they will soon have a correct theory. Reliability theory has no general arguments for or against making background assumptions; but methodology should be tolerant enough to give guidance to those who don't want to neglect possibilities. Even if we don't set science a deadline by which it must deliver the certain truth, but content ourselves with eventually true and stable, but uncertain, beliefs, we still want methods that do not unnecessarily delay arriving at correct theories. Such methods are the first topic of Chapter 6. Before we take up that topic, I take a closer look at reliable methods in the next chapter.
4.8 Proofs
[Kelly 1995] gives the proofs of Propositions and Corollaries 4.2, 4.4, 4.5, 4.6
and 4.7: My Proposition 4.2 follows from his Proposition 9.12, my Propositions
4.4, 4.5, 4.6 from his 4.10, and my Proposition 4.7 from his Proposition 4.6.
Here's the proof of Lemma 4.3.
Lemma 4.3 Let δ be a discovery method, and let background knowledge K be given. Then for all finite data sequences e, proj(δ, e) ∩ K is refutable with certainty given K.

Proof. Let ε be a data stream in K that is not a member of proj(δ, e) ∩ K. Then ε is not a member of proj(δ, e) = {ε ⊇ e : if e ⊆ e′ ⊆ ε and δ(e) ⊨ H ∈ H, then δ(e′) ⊨ H and δ(e′) ≠ ∅}. If ε does not extend e, then proj(δ, e) is refuted when ε deviates from e. Otherwise there is an initial segment e′ of ε extending e such that δ(e′) is either inconsistent or fails to entail δ(e). In either case, e′ conclusively falsifies proj(δ, e). □
Chapter 5
Reliable Inference
5.1 Outline
The previous chapter determined when it is feasible to reliably converge to the right answer for a given discovery problem. This chapter characterizes what the methods that achieve this goal are like. I prove a normal form theorem which shows that (virtually) all reliable methods take a form known as the bumping pointer method. This normal form theorem suggests a more flexible and plausible formulation of the falsificationist position. Suppose that the hypotheses under investigation are unfalsifiable, and that there is a reliable method δ for identifying the true alternative. (δ would not be a conjectures-and-refutations method.) The theorem shows how to decompose the original hypotheses into falsifiable conjectures, so that we may use a conjectures-and-refutations method for those hypotheses to reliably identify a theory that gives the right answer to the questions under investigation. The result is a conjectures-and-refutations method δ′ that is just as reliable as the method δ designed for the original hypotheses, yields more content than δ, but also makes more errors. Thus we may construe Popperian recommendations as specifying the means of choice for those chiefly concerned with reliably finding the truth, and more concerned with providing content than with avoiding errors.
Another application of the normal form theorem concerns the ancient philosophical question about the relationship between true belief and knowledge. According to a traditional proposal, knowledge is justified true belief. Edmund Gettier refuted this idea with a famous counterexample [Gettier 1963]. The essence of Gettier's counterexample is this: suppose that (1) an agent is justified in believing a claim C, (2) C entails H, and this leads the agent to believe H, and (3) H is true but C is false. Then the agent's belief in the true proposition H is justified, but intuition says that the agent does not know H, because she believes H for a wrong reason, namely because she accepts C. Now presumably the agent would abandon her belief in H if she received evidence refuting C. Thus if we took knowledge to be stable true belief, belief that does not change with more information about the world, the agent's belief in H would not count as knowledge, as intuition confirms. Indeed, in the Meno dialogue [Plato 1967], Plato stipulated that knowledge must be stable true belief. Although Plato's condition solves the Gettier paradox, I show with a counterexample that it cannot be the whole story about knowledge: I describe an infinitary version of the Gettier problem in which an agent has stable true belief, but for infinitely many wrong reasons. On the other hand, the normal form theorem shows that for any discovery problem that admits a reliable solution, there is a reliable method δ that avoids the infinitary Gettier problem. That is, if the method δ stabilizes to a true alternative H, its internal beliefs supporting H are eventually true as well. There are no Gettier-type counterexamples to regarding the stable true beliefs of such agents as knowledge.
5.2 Reliable Methods
What are reliable methods like? To answer this question, I first establish that we can view any reliable discovery method as an instance of what is known as the bumping pointer architecture [Osherson et al. 1986], [Kelly 1995, Ch. 9] (provided the method satisfies a small proviso specified below). Bumping pointer methods are like Popperian conjectures-and-refutations methods in that they enumerate a collection of refutable hypotheses and move on to the next hypothesis if and only if the evidence falsifies the current hypothesis. The difference is that a bumping pointer method needn't output the refutable hypothesis itself, but may conjecture a weaker theory. For example, take the two almost universal generalizations from Section 2.3.2, Hw = "almost all swans are white" and Hb = "almost all swans are black". Hw is equivalent to an infinite disjunction, "all swans are white, or at most one swan is black, or at most two swans are black, or ...". Let C^w_1, C^w_2, ... denote these disjuncts, such that C^w_k is true just in case at most k swans are black. Note that each C^w_k is refutable with certainty: if there are more than k black swans, C^w_k is falsified when k+1 black swans are observed. Similarly, we may decompose Hb into refutable disjuncts C^b_1, C^b_2, ... such that C^b_k is true just in case at most k swans are white; see Figure 5.1.
Now combine the refutable hypotheses C^w_k, C^b_k into one big enumeration C_1, C_2, ... such that each C^w_k and C^b_k hypothesis occurs exactly once in the C_i enumeration. A bumping pointer method based on the C_i enumeration employs an "internal pointer" to mark the first hypothesis C_k that is consistent with the given evidence e, and conjectures Hw or Hb depending on whether C_k entails Hw or Hb. For example, suppose that C_1 = C^w_0, "all swans are white", and that C_2 = C^b_1, "at most one swan is white". Then the bumping pointer initially points to C_1, which entails Hw, so the method initially conjectures Hw, "almost all swans are white". If the first observed swan is black, the pointer bumps to C_2, so the method conjectures Hb, "almost all swans are black". The method from Section 2.3.4 that reliably identifies the true limiting relative frequency of an event among the two alternatives 1/4 or 3/4 can be implemented as a bumping pointer method: let the "internal" hypotheses be of the form "at stage n and all later times, the observed relative frequency will always be closer to 1/4 (respectively, 3/4) than to 3/4 (respectively, 1/4)". Each of these internal hypotheses is refutable with certainty. The bumping pointer method that operates on these internal hypotheses works in the spirit of Reichenbach's straight rule: it conjectures that the limiting frequency of an event is 1/4 (respectively, 3/4) as long as the observed frequency is closer to 1/4 (respectively, 3/4).

Figure 5.1: Decomposing Hypotheses Into Refutable Subsets.

Figure 5.2: The Bumping Pointer Method
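The straight-rule behaviour just described can be sketched in code (an illustrative sketch of my own, not the thesis's formal method; `frequency_conjecture` and the tie-breaking in favour of 1/4 are assumptions):

```python
from fractions import Fraction

def frequency_conjecture(outcomes):
    """Conjecture 1/4 or 3/4 as the limiting relative frequency,
    depending on which value the observed relative frequency is closer
    to. In the spirit of the bumping pointer method, the conjecture
    changes only when the observed frequency crosses 1/2.
    Ties (and the empty sequence) are broken in favour of 1/4."""
    if not outcomes:
        return Fraction(1, 4)
    freq = Fraction(sum(outcomes), len(outcomes))
    if abs(freq - Fraction(1, 4)) <= abs(freq - Fraction(3, 4)):
        return Fraction(1, 4)
    return Fraction(3, 4)

print(frequency_conjecture([1, 1, 1, 0]))  # observed 3/4 -> conjectures 3/4
print(frequency_conjecture([1, 0, 0, 0]))  # observed 1/4 -> conjectures 1/4
```

Exact rational arithmetic (`Fraction`) avoids the floating-point rounding that could otherwise flip a conjecture near the 1/2 boundary.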
To define the notion of a pointer precisely, let a countable collection C of refutable empirical hypotheses be given, and let τ be an enumeration of the hypotheses in C. Then pointer(C, τ, e) is the first hypothesis C in C, in the order of τ, such that C is consistent with e. (Here and elsewhere I assume that there is such a hypothesis C in C.) Now consider a collection of alternative hypotheses H and background knowledge K that define a discovery problem. Let C be a countable collection of refutable empirical hypotheses, each of which entails some hypothesis in H (i.e., for each C in C there is an H in H such that C ⊨ H). I say that C is a refinement of H. Given evidence e, a bumping pointer method based on an enumeration τ of a refinement C conjectures the hypothesis H from H that is entailed by pointer(C, τ, e); see Figure 5.2.
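The pointer and the bumping pointer method for the swan example can be sketched as follows (an illustrative sketch, not the thesis's formal apparatus: each refutable hypothesis is represented as a consistency test on finite data sequences paired with the alternative it entails, and the names `HW`, `HB`, `c_w`, `c_b` are mine):

```python
# Bumping pointer sketch for the almost-universal swan generalizations.

HW = "almost all swans are white"
HB = "almost all swans are black"

def c_w(k):
    """C^w_k: 'at most k swans are black'; entails HW."""
    return (lambda e: e.count("black") <= k), HW

def c_b(k):
    """C^b_k: 'at most k swans are white'; entails HB."""
    return (lambda e: e.count("white") <= k), HB

def enumeration(n):
    """Interleave the two families: C^w_0, C^b_0, C^w_1, C^b_1, ..."""
    cs = []
    for k in range(n):
        cs.append(c_w(k))
        cs.append(c_b(k))
    return cs

def pointer(cs, e):
    """The first hypothesis in the enumeration consistent with e."""
    for consistent, entailed in cs:
        if consistent(e):
            return consistent, entailed
    raise ValueError("no hypothesis in C is consistent with e")

def conjecture(e):
    """Conjecture the alternative entailed by the marked hypothesis.
    len(e) + 1 hypotheses per family always suffice for consistency."""
    _, entailed = pointer(enumeration(len(e) + 1), e)
    return entailed

print(conjecture([]))         # almost all swans are white
print(conjecture(["black"]))  # pointer bumps: almost all swans are black
```

Note that the method outputs only the weaker alternative (Hw or Hb) entailed by the marked refutable hypothesis, exactly the feature that distinguishes bumping pointer methods from pure conjectures-and-refutations methods.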
The reason why reliable discovery methods can be implemented as bumping pointer methods is this. Let δ be a reliable method for a given discovery problem. By Lemma 4.3, the projection sets of δ are each refutable with certainty. So there is a bumping pointer method δ′ that operates on a collection C comprising the projection sets of δ. The pointer of δ′ bumps just in case δ changes its mind. Furthermore, the projection sets of δ are subsets of a hypothesis H from H, since a data stream ε is in H whenever δ stabilizes to H along ε. Hence the refutable sets in C (the projection sets of δ) entail the same hypotheses in H that δ stabilizes to; the proof of Proposition 5.1 shows that if the bumping pointer method δ′ enumerates the projection sets of δ in the right way, δ′ and δ conjecture the same alternative from H whenever δ projects this alternative. The bumping pointer method δ′ will thus differ from δ only if δ does not project any alternative from H on a given data sequence e. Since in that case δ is sure to change its mind about its hypothesis δ(e) no matter what, I say that δ does not take seriously the conjecture δ(e). Formally, δ always takes its conjectures seriously given K if for all e consistent with K, proj(δ, e) ∩ K ≠ ∅. Finally, since δ is reliable given K, δ stabilizes along each data stream ε consistent with K, so that ε is contained in some projection set of δ; I say that the collection of projection sets covers K. All together, this shows that a reliable discovery method δ can be implemented as a bumping pointer method if δ always takes its conjectures seriously.
Proposition 5.1 Suppose that δ is a reliable discovery method for H given K, and that δ always takes its conjectures seriously given K. Then δ can be implemented as a bumping pointer method. That is, there is a refinement C of H, an enumeration τ of C, and a bumping pointer method δ′ defined by C and τ such that for all finite data sequences e consistent with K, δ(e) ⊨ H ⟺ δ′(e) ⊨ H.
5.3 Popper, Levi and Deductivism

The bumping pointer method suggests a plausible formulation of a falsificationist approach to scientific discovery. This proposal also addresses an unsolved problem for Levi's epistemology: how can an agent rationally decide to change her beliefs when she is certain that her beliefs are true?

5.3.1 A New and Improved Falsificationism

We saw in Section 4.3 that Popper's conjectures-and-refutations method is unreliable when the alternatives under investigation are decidable in the limit but not refutable with certainty. Proposition 5.1 offers a falsificationist an interesting reply: he can argue that the original discovery problem defined by the alternatives H should be reposed with the alternatives C in place of H, where C is a collection of refutable sets that refines H and covers K. I call this translation of the problem popperizing a discovery problem. For example, to popperize the problem of identifying an almost universal generalization about swans, we may take the alternative hypotheses to be the collections H^w_k and H^b_k, that is, hypotheses of the form "at most k swans are white" or "at most k swans are black". A conjectures-and-refutations method reliably identifies a true alternative from C since each such alternative is refutable with certainty (cf. Section 4.3). Another way of putting the matter is that Proposition 5.1 permits us to think of a reliable method δ as a bumping pointer method, and we can turn a bumping pointer method into a conjectures-and-refutations method that produces the "internal" refutable hypotheses marked by the bumping pointer. The result of popperizing the original method δ in this way is a method δ_P that is just as reliable, but produces refutable strengthenings of the original hypotheses. A good reason for following δ_P is that δ_P is more informative; δ_P dominates δ with respect to content, in the sense of Definition 3.2. A good reason for following the original method δ is that δ makes fewer errors; δ dominates δ_P with respect to error, in the sense of Definition 3.1. (δ also changes its mind less often; see Definition 6.2 below.) Thus we may construe a falsificationist's preference for refutable hypotheses and the conjectures-and-refutations method as the right advice for someone who wants to reliably find the truth, and cares more about content than about avoiding errors.

Popperian conjectures-and-refutations methods are stubborn and timid in the same sense as AGM methods (see Section 3.5): they hang on to their conjectures as long as these are consistent with the data. As constraints on inductive methods, the AGM principles are essentially the conjectures-and-refutations scheme minus falsifiability. The way to make AGM methods into reliable discovery methods is to add falsifiability (i.e., let them produce refutable theories). Thus we find reliable AGM methods at the same high-content side of the content-error continuum as Popper's conjectures-and-refutations approach.
5.3.2 A Reliable Enterprise of Knowledge

The bumping pointer architecture can accommodate Levi's epistemology (cf. Section 2.4), and indeed suggests a solution to a problem that Levi left open. The problem is this: Levi requires that an agent adopt her current theory as background knowledge. But if the agent is certain that her current beliefs are true, how can she rationally decide to abandon any of them? Retracting any of her beliefs incurs a loss of content without eliminating any serious possibilities of error (that is, retracting beliefs is not content-error acceptable in the sense of Definition 3.3). Levi suggests that "if X detected inconsistency in his initial corpus, he could have excellent reasons to contract. An inconsistent corpus fails as a standard of possibility" [Levi 1983, p. 165]. Levi does not make clear what other rational reasons an agent may have to contract.¹

¹Levi claims that an agent would be justified in giving a theory "a hearing" if the theory has "superior explanatory virtues", even when he knows that the theory, despite its explanatory power, must be false [Levi 1983, p. 166]. But for this to work, providing good explanations must be an important epistemic value, so important that an inquirer should sometimes consider explanations that she knows to be false because they would be so nice if they were true. But Levi's official view is that content and avoiding error are by far the most important epistemic values. For example, in the same paper, Levi insists that an agent should not change to a more "testworthy" theory, à la Popper, because, Levi says, testworthiness is not related to content and error; see the quotation below.

But if X makes the falsificationist move from the previous section, adopting beliefs that are refutable with certainty, the only reason that X needs to retract beliefs is inconsistency with the current evidence. At the same time, the agent's retractions through time will take him to true beliefs, solving another mystery that puzzles Levi:

But if our proximate aim is merely to accept hypotheses for purposes of testing them and somehow this concern is seen to promote the long-run aim of obtaining the true complete story of the world (in a manner which remains a mystery to me), then truth or avoidance of error may be an important desideratum in the long run. It has no importance, however, in the proximate aims of inquiry. If testworthiness is what we are after, we need not concern ourselves with the truth values of our hypotheses or with avoiding error. ... Thus, Popper's view succeeds in placing truth on a pedestal remote from the immediate concerns of inquiry. [Levi 1983, p. 168]

There is no mystery at all about how the bumping pointer method uses tests to promote the long-run aim of finding the truth about the questions under investigation. Rather than placing truth on a "pedestal" remote from empirical tests, the normal form theorem proves that testing strategies must be part of reliable processes for finding the truth. Whatever "testworthiness" may be, testing alternative hypotheses against empirical evidence is an indispensable means to the long-run aim of finding the right answer to the questions that prompted inquiry.

5.4 Gettier meets Meno

What are reliable bumping pointer methods like? A bumping pointer method is reliable given background knowledge K for a collection of alternatives H if its refinement C of H covers all serious possibilities in K, that is, if K ⊨ ∪C. For if this is the case, then some hypothesis C_k in C is true whenever K is, and hence always consistent with the evidence; eventually the data falsify all refutable hypotheses C_1, C_2, ..., C_{k-1} preceding C_k, and the bumping pointer method stabilizes to C_k. However, there are reliable bumping pointer methods whose refinement C does not cover all serious possibilities. For an example, recall the physical symbol system hypothesis from Section 2.3.5. Suppose that
computers are capable of intelligence, and that only one paradigm of cognitive science, say connectionism, is capable of producing true machine intelligence. However, a model from another paradigm, say production systems, can account for any finite component of intelligent behavior. Figure 5.3 depicts this situation. Suppose that the data stream ε is the actual one, so a connectionist neural-network model N is true on ε. The models P_0, P_1, ... are production systems that account for parts of the evidence from ε. Let us focus on the empirical content of N and P_0, P_1, ...; call this set of data streams K. Consider a bumping pointer method δ_PS whose enumeration of refutable sets includes P_0, P_1, ... but not N, in such a way that for any finite data sequence ε|n from ε, some production systems model P_i is the first hypothesis that is consistent with ε|n. These refutable sets do not cover ε because the connectionist model N is missing. Nonetheless, δ_PS is a reliable method for investigating the physical symbol system hypothesis over the range of possibilities K:² since all along ε, the bumping pointer points to a production-system model P_i, and each P_i entails that the physical symbol system hypothesis H_PSS is true, δ_PS correctly stabilizes to H_PSS along ε. Another reliable bumping pointer method δ_N may include N among its closed sets in addition to P_0, P_1, ...; along ε, its pointer would eventually stabilize to N. Throughout K, and in particular on ε, both methods give exactly the same answers about the physical symbol system hypothesis. But there is something unsatisfactory about the success of δ_PS: this method has the right opinion about the possibility of machine intelligence, but for the wrong reason; it never abandons its (internal) false belief that some production-system model of intelligence is adequate. What is at stake here philosophically is that we don't want just the right behavior, but the right behavior for the right reasons. This is the sort of problem that Edmund Gettier raised for accounts of knowledge as justified true belief [Gettier 1963].³ A typical Gettier case is the following:

Two other people are in my office and I am justified on the basis of much evidence in believing the first owns a Ford car; though he (now) does not, the second person (a stranger to me) owns one. I believe truly and justifiably that someone (or other) in my office owns a Ford car, but I do not know someone does. [Nozick 1981]

²We may extend δ_PS to a reliable method over the other possibilities outside of K with the reliable method described in Section 2.3.5.

³As Plato noted, "And presumably as long as he has right opinion about matters of which the other has knowledge, he will be no worse a guide than the man who understands it, even though he only believes truly without understanding" [Plato 1967, 96d–98b]. The problem is one of global underdetermination: even if we knew everything about the overt behavior of an agent, or an entity, at all times, this information would not determine the internal reasons for the agent's behavior. Situations like this give rise to philosophical difficulties. One example is Searle's famous attack on Turing's test for machine intelligence [Searle 1980]. Searle argues that even if an entity acts intelligently, it is not actually intelligent unless it produces the intelligent behaviour in the right way; a computer program, Searle says, is a wrong way. Another example is the question of whether a good deed suffices to earn moral credit, or whether in addition the deed must be done with the right kind of intentions. Kant for one held that a dutiful action loses all moral value if the actor was emotionally inclined to perform the act [Kant 1785].

Figure 5.3: Given the observations from data stream ε, a connectionist model N is the true theory of machine intelligence, but the production systems approach is never conclusively refuted along ε.
In terms of methods with a bumping pointer, the Gettier problem arises when the pointer bumps from a false "internal" hypothesis C_1 (the first person owns a Ford car) that entails a hypothesis H (someone in my office owns a Ford car) to a true "internal" hypothesis C_2 (the second person owns a Ford car) that also entails H. When the pointer marks C_1, the method believes H, but for the wrong reason.

Similarly, we are inclined to say that a follower of the production system method δ_PS does not know at any point that the physical symbol system hypothesis is true when the connectionist approach is the right one, whereas the connectionist does know it, at least after some time. This shows that the Platonic condition that knowledge must be stable true belief may be necessary but is not sufficient. My example is an infinitary version of the Gettier problem, in which a method settles on the right hypothesis because of infinitely many wrong reasons. The proof of Proposition 5.1 shows that we can avoid Gettier cases by choosing the right kind of refinement C of H. More precisely, there is a refinement C such that if the pointer of a bumping pointer method moves from an internal hypothesis C_1 to another C_2, then C_1 and C_2 entail different alternatives from H. Although some reliable methods may have the Gettier problem, for any solvable discovery problem there is a reliable method that does not.
5.5 Proofs

It is useful for the proof of Proposition 5.1 to have some notation for the last place on a given finite data sequence e at which a discovery method δ changed its mind. Let LastMC(δ, e) be the shortest sequence of data e′ contained in e such that δ does not change its mind between e′ and e, i.e., for all e′ ⊆ e″ ⊆ e, δ(e″) ⊨ H ⟺ δ(e) ⊨ H. If δ changes its mind at e, then LastMC(δ, e) = e. Recall that e∗x denotes the finite data sequence in which datum x follows the observations e. So LastMC(δ, e∗x) = LastMC(δ, e) if and only if δ does not change its mind at e∗x.
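Under the simplifying assumption that a method is a function from finite data sequences (tuples) to conjectures, and that "entailing the same alternative" is approximated by equality of conjectures, LastMC might be computed like this (an illustrative sketch; the names are mine):

```python
def last_mind_change(method, e):
    """Return the shortest prefix e' of e such that the method's
    conjecture agrees with its conjecture on e at every sequence
    between e' and e (inclusive). If the method changes its mind at
    e itself, this returns e. `method` maps tuples of data to a
    conjecture; conjecture equality stands in for entailing the
    same alternative from H."""
    e = tuple(e)
    for cut in range(len(e) + 1):
        if all(method(e[:j]) == method(e) for j in range(cut, len(e) + 1)):
            return e[:cut]
    return e

# A toy method: conjecture the most recent datum (None on no data).
toy = lambda seq: seq[-1] if seq else None

print(last_mind_change(toy, (1, 1, 1)))  # (1,): last change was at length 1
print(last_mind_change(toy, (1, 2)))     # (1, 2): mind changed at (1, 2) itself
```

The second call illustrates the boundary clause: when the method changes its mind at e itself, LastMC(δ, e) = e.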
Proposition 5.1 Suppose that δ is a reliable discovery method for H given K, and that δ always takes its conjectures seriously given K. Then δ can be implemented as a bumping pointer method. That is, there is a refinement C of H, an enumeration τ of C, and a bumping pointer method δ′ defined by C and τ such that for all finite data sequences e consistent with K, δ(e) ⊨ H ⟺ δ′(e) ⊨ H.
Proof. For abbreviation, I write T ≈ T′ if T ⊨ H ⟺ T′ ⊨ H, for all H ∈ H. Let δ be a reliable discovery method for H. Choose a 1-1 encoding ⟨·⟩ of finite sequences of natural numbers into natural numbers that is monotone in the sense that ⟨e∗x⟩ > ⟨e⟩.⁴ Let C_i = proj(δ, ⟨i⟩⁻¹), let H′ = {C_i : i ∈ ω}, and let τ be the corresponding enumeration of the closed sets C_i. Let δ′ be the bumping pointer method with pointer(H′, τ, ·) that conjectures δ(⟨i⟩⁻¹) if pointer(H′, τ, e) = C_i. I show that for all finite data sequences e, δ(e) ≈ δ′(e). As an auxiliary hypothesis, I establish that (*) for all e, pointer(H′, τ, e) = proj(δ, LastMC(δ, e)). The proof is by induction on the length of finite data sequences.

Base Case, e = ∅. Since ⟨·⟩ is monotone, ⟨∅⟩ = 0. So pointer(H′, τ, ∅) = C_0. Hence δ′(∅) ≈ δ(⟨0⟩⁻¹) = δ(∅). Also, LastMC(δ, ∅) = ∅. So proj(δ, LastMC(δ, ∅)) = proj(δ, ∅) = C_0.

Inductive Step: Assume that δ(e) ≈ δ′(e), and consider a finite data sequence e∗x.

Case 1: δ(e) ≈ δ(e∗x). Since δ does not change its mind at e∗x, and e∗x is consistent with K, e∗x is consistent with proj(δ, LastMC(δ, e)). So by the inductive hypothesis, e∗x is consistent with pointer(H′, τ, e). By the definition of a pointer, pointer(H′, τ, e∗x) = pointer(H′, τ, e). So δ′(e∗x) ≈ δ′(e) ≈ δ(e) ≈ δ(e∗x), by the inductive and case hypotheses. Similarly, pointer(H′, τ, e) = proj(δ, LastMC(δ, e)) = proj(δ, LastMC(δ, e∗x)) = pointer(H′, τ, e∗x).

Case 2: δ(e) ≉ δ(e∗x). Then LastMC(δ, e∗x) = e∗x. Let C_i = pointer(H′, τ, e∗x), that is, let i be the least number such that C_i is consistent with e∗x. Since δ always takes its conjectures seriously, proj(δ, e∗x) ≠ ∅. So C_⟨e∗x⟩ = proj(δ, e∗x) is consistent with e∗x; hence C_i, the first set in the enumeration consistent with e∗x, cannot follow C_⟨e∗x⟩ in the enumeration; that is, i ≤ ⟨e∗x⟩. Because ⟨·⟩ is monotone, this means that ⟨i⟩⁻¹ ⊆ e∗x, since C_i is consistent with e∗x and hence ⟨i⟩⁻¹ is consistent with e∗x. Moreover, i ≥ ⟨e∗x⟩, because otherwise, again due to the fact that ⟨·⟩ is monotone, ⟨i⟩⁻¹ ⊊ e∗x, so that ⟨i⟩⁻¹ ⊆ e. But δ changes its mind at e∗x, and hence between ⟨i⟩⁻¹ and e∗x, so that e∗x is inconsistent with proj(δ, ⟨i⟩⁻¹) = C_i. Since this contradicts the assumption that e∗x is consistent with C_i, we have that i = ⟨e∗x⟩, so C_i = C_⟨e∗x⟩, and hence pointer(H′, τ, e∗x) = C_⟨e∗x⟩ = proj(δ, e∗x). Thus δ′(e∗x) ≈ δ(e∗x). □

⁴A monotone 1-1 encoding ⟨·⟩ may be defined like this. Let P_i denote the i-th prime, and let e be a finite sequence of natural numbers. Let ⟨∅⟩ = 0, and let ⟨x_1, x_2, ..., x_k⟩ = Π_{i=1}^{k} (P_i)^{x_i + 1}.
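The footnote's prime-power encoding can be checked directly. The sketch below (with illustrative names, and with exponents shifted by one so that distinct sequences over the natural numbers, including 0, receive distinct codes) verifies both injectivity on a few sequences and the monotonicity property ⟨e∗x⟩ > ⟨e⟩:

```python
def nth_prime(n):
    """Return the n-th prime, 1-indexed: nth_prime(1) == 2."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

def code(seq):
    """Monotone 1-1 encoding: <> = 0, <x1..xk> = prod_i P_i^(x_i + 1).
    The +1 in the exponent keeps sequences containing 0 distinguishable
    (e.g. (0,) and (0, 0) would otherwise both encode to 1)."""
    result = 1 if seq else 0
    for i, x in enumerate(seq, start=1):
        result *= nth_prime(i) ** (x + 1)
    return result

# Extending a sequence strictly increases its code (monotonicity).
assert code((3,)) < code((3, 0)) < code((3, 0, 5))
print(code(()), code((0,)), code((1,)), code((0, 0)))  # 0 2 4 6
```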
Chapter 6

Fast and Steadfast Inquiry

6.1 Outline

Combining evaluation criteria for inductive methods, such as admissibility and minimax, with cognitive values, such as content, error and avoiding retractions, can provide strong guidance for what inferences an agent should draw in the short run, as we saw in Chapter 3. But strong guidance is not necessarily good guidance; in particular, the constraints I established in Chapter 3 do not help agents find correct theories. Reliability analysis, on the other hand, makes clear how to organize inquiry in a manner that is headed for correct beliefs in the long run; but long-run reliability is consistent with any beliefs in the short run. This chapter shows how we can apply auxiliary cognitive values to choose among reliable methods. The result is a powerful combination of long-run reliability and short-run constraints. We may think of the resulting norms as standards of efficiency for reliable inquiry. I consider three epistemic values that pertain to discovery problems: minimizing convergence time, retractions and errors. I apply the criteria of admissibility and minimax to each of these three to arrive at six standards of efficiency for reliable inquiry. Then I investigate under what circumstances it is feasible for an inductive method to attain a given standard of performance, and what methods do so when possible. The result is a hierarchy of cognitive goals (illustrated in Figure 6.9): it turns out that the norms of efficiency fall into a strict order of feasibility.

Certain combinations of cognitive goals lead to particularly interesting results. For example, evaluating reliable methods by their convergence time with the admissibility principle, and by their retractions with the minimax principle, underwrites a version of Occam's Razor, selects the natural projection rule in Goodman's Riddle of Induction as the most efficient, and mandates the introduction of hidden particles in theories of particle reactions, as we shall see in Chapter 8.
6.2 Data-Minimal Methods

Time is a resource of inquiry. An inquirer who wants a correct theory as soon as possible prefers his methods to stabilize to a true belief sooner rather than later. Let us call the time that a method δ requires to settle on a hypothesis from a collection of alternatives H, on a given data stream ε, the modulus of δ on ε; I denote the modulus by mod(δ, ε). Formally, mod(δ, ε) = the least time n such that at n and all later stages n′ > n, δ(ε|n′) is consistent and entails the hypothesis H from H that is correct on ε. If a method δ fails to converge to a true hypothesis on a data stream ε, then I take its modulus on ε to be infinite, so mod(δ, ε) = ω. In isolation from other epistemic concerns, minimizing convergence time is a trivial objective: an inquirer who never changes his initial conjecture converges immediately. The interesting question is what reliable methods converge as fast as possible. We can use the admissibility principle to evaluate the speed of a reliable method as follows.

Definition 6.1 Let K be background knowledge, and let H be a collection of alternative empirical hypotheses.

1. A method δ dominates another method δ′ given K, H with respect to convergence time ⟺
(a) for all data streams ε consistent with K, mod(δ, ε) ≤ mod(δ′, ε), and
(b) for some data stream ε consistent with K, mod(δ, ε) < mod(δ′, ε).

2. A method δ is data-minimal given K, H ⟺ δ is not dominated given K, H with respect to convergence time by another method δ′ that is reliable for K, H.

The term "data-minimal" is standard usage in learning theory (due to Gold); it expresses the idea that methods that converge as soon as possible make efficient use of the data. Data-minimal methods are exactly the ones that satisfy a simple, intuitive criterion: they always take their conjectures seriously, in the sense of Section 5.2. In other words, with a data-minimal method δ, it is possible at each stage of inquiry that δ locks on to its current, correct, conjecture. Thus data-minimality underwrites the requirement of always producing consistent theories (cf. Section 3.2).

Proposition 6.1 Let K be background knowledge, and let H be a collection of alternative empirical hypotheses. A reliable method δ for H is data-minimal given K ⟺ δ always takes its conjectures seriously given K.

Figure 6.1 shows why a data-minimal method must take its conjectures seriously. If a method fails to take its conjecture seriously on some finite data sequence e consistent with K, then we can speed it up on some data stream ε extending e without slowing it down on other data streams, because the method is not converging on those other data streams anyway. Figure 6.2 illustrates the converse.

Figure 6.1: A data-minimal method must project its conjectures: δ′ dominates δ with respect to convergence time.
If a method δ always takes its conjectures seriously given K, any other method δ′ that seeks to find the true hypothesis before δ on some data stream ε must disagree with the conjecture of δ at some point along ε, say at stage n. But since δ always takes its conjectures seriously, it does so after receiving evidence ε|n, and locks onto its conjecture δ(ε|n) on some data stream τ extending ε|n for which δ(ε|n) is correct. Since we supposed that δ′ disagrees with δ on ε|n, the theory of δ′ on ε|n must be false on the data stream τ. So the method δ′ does converge to a correct theory on data stream ε by time n, but only later on data stream τ, if at all. So δ is faster than δ′ on τ. This shows that methods that always take their conjectures seriously are data-minimal. The proof of Proposition 6.1 formalizes these arguments.

Thus data-minimality imposes a plausible but weak constraint on inductive inquiry: whenever there is a reliable method for a given discovery problem, we can choose a data-minimal reliable method that always takes its conjectures seriously (any bumping pointer method will do).
6.3 Retractions
Thomas Kuhn argued that one reason for sticking with a scientific paradigm in trouble is the cost of retraining and retooling the scientific community [Kuhn 1970]. The philosophical literature around "minimal change" belief revision shows that avoiding retractions is a plausible desideratum for theory change. Similarly, learning theorists have investigated methods that avoid "mind changes" [Putnam 1965]. For discovery methods, this motivates a different criterion for evaluating the performance of a method on a given data stream: we want methods whose conjectures vacillate as little as possible.
Let δ be a discovery method for a collection of alternatives H. I say that δ retracts its conjecture on a data stream ε at time n+1, or changes its mind at ε|n+1, if δ(ε|n) is consistent and entails a hypothesis H from H, but δ(ε|n+1) either is inconsistent or does not entail H. I denote the number of times that a method δ changes its mind on a data stream ε by MC(δ, ε); formally, MC(δ, ε) = |{n : δ changes its mind at ε|n+1}|. If δ does not stabilize to a hypothesis on a data stream ε, then MC(δ, ε) is infinite. I define admissibility with respect to avoiding retractions as follows.
Definition 6.2 Let K be background knowledge, and let H be a collection of alternative empirical hypotheses.
1. A method δ dominates another method δ′ given K, H with respect to mind changes ⟺
(a) for all data streams ε consistent with K, MC(δ, ε) ≤ MC(δ′, ε), and
(b) for some data stream ε consistent with K, MC(δ, ε) < MC(δ′, ε).
2. A method δ is mind-change-minimal given K, H ⟺ δ is not dominated given K, H with respect to mind changes by another method δ′.

Figure 6.2: Method δ always projects its current conjecture and hence is data-minimal.
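Under the simplifying assumption that each conjecture is a single consistent hypothesis (so a retraction is just a change of conjecture), the count MC can be computed on a finite prefix of a data stream. A toy Python sketch, with hypothetical names:

```python
def mind_changes(method, stream):
    """Count retractions on a finite prefix: a mind change occurs at stage n+1
    when the conjecture at stage n entailed some hypothesis that the conjecture
    at stage n+1 no longer entails.  Here a conjecture is a single hypothesis
    (or None for 'no conjecture yet'), so this reduces to counting stages at
    which a definite conjecture is replaced by a different one."""
    changes = 0
    for n in range(len(stream)):
        before, after = method(stream[:n]), method(stream[:n + 1])
        if before is not None and after != before:
            changes += 1
    return changes

# Toy method: conjecture 'all swans are white' until a non-white swan appears.
def generalizer(e):
    return 'all white' if all(e) else 'not all white'

print(mind_changes(generalizer, (1, 1, 1, 1)))  # 0: never retracts
print(mind_changes(generalizer, (1, 1, 0, 1)))  # 1: retracts when the 0 appears
```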
I shall use the terms "mind-change-minimal" and "retraction-minimal" to mean the same thing. Retraction-minimality as just defined is a trivial criterion: the only methods that satisfy it are those that never change their minds. For the "skeptical" method from Section 3.2 that never goes beyond the evidence never changes its mind; hence no retraction-minimal method does either. These skeptical methods are reliable only if the evidence is eventually guaranteed to eliminate all alternatives but the correct one. 1 In other words, they are reliable only if there is no genuine problem of induction, for the problem of induction arises precisely when no finite amount of evidence entails the true theory.
Can we use the desideratum of avoiding mind changes to choose among reliable methods, as with the aim of finding a true theory sooner rather than later? Long-run reliability forces the skeptic to eventually take a chance and conjecture a hypothesis that goes beyond the evidence (assuming that the discovery problem is such that the alternatives under consideration are genuine generalizations). But because long-run reliability is consistent with any behavior in the short run, the skeptic may delay this moment, and the risk of having to retract her theories, for as long as she pleases. For any amount of time that the skeptic may choose to delay taking an inductive risk, she could have delayed more and been just as reliable in the long run. It follows that a reliable method is mind-change-minimal just in case it never changes its mind; that is, just in case it never goes beyond the available evidence.
Proposition 6.2 Let H be a collection of alternative hypotheses, and let K be given background knowledge. Then there is a reliable mind-change-minimal method for H given K ⟺ on every data stream ε, there is a time n such that ε|n and K entail the true hypothesis H in H.
In Chapter 9 I show that a method is admissible, with respect to any epistemic value, just in case the method is sequentially optimal, that is, optimal at each stage of inquiry (cf. Section 4.5). Proposition 6.2 implies that if a discovery problem is genuinely inductive, there is no reliable method that is admissible with respect to mind changes; hence there is no method that is sequentially optimal with respect to mind changes. This result points to an interesting methodological phenomenon: myopically avoiding retractions at each stage of inquiry leads to inferences that are not reliable in the long run. Consider again the physical symbol system hypothesis, according to which there is some computer program whose intelligence matches that of humans. If cognitive scientists experience a series of failures in building an intelligent system, they
1 As in Kitcher's "eliminative induction"; see Section 4.6.
must eventually abandon the physical symbol system hypothesis, or else risk the possibility that they might go on for eternity searching for the path to machine intelligence when there is none (cf. Section 4.4). But just when should they become pessimistic about the prospects of cognitive science? Suppose that the researchers try programs m1, m2, ..., mn, in vain, and now consider whether to give up the physical symbol system hypothesis, or else to try one more program mn+1 before they conjecture that machine intelligence is impossible. As far as long-run reliability is concerned, it makes no difference whether they give up the belief in machine intelligence after seeing the failures on the first n machines, or after trying another one. But with respect to retractions, maintaining faith in cognitive science until the system mn+1 has been tried dominates giving up the belief in cognitive science beforehand. For if mn+1 is successful, the researchers need not have retracted their belief in the physical symbol system hypothesis.

But if mn+1 fails too, AI researchers may reason in the same way again: trying one more system mn+2 before recanting their faith in artificial intelligence might save them a mind change, without risking any additional retractions. If the researchers continue to avoid changing their belief in cognitive science in this way, they will never recognize that machine intelligence is impossible when it is so.
The cognitive scientists' dilemma is an instance of the general problem of when exactly an inquirer or a group of inquirers should abandon their current paradigm. The scientists must eventually jump ship if they want to avoid following the wrong paradigm for ever. But as Thomas Kuhn observed, there is no particular point at which the revolution must occur [Kuhn 1957]. Indeed, short-run considerations such as avoiding the embarrassment of dismissing their prior work, that is, avoiding retractions, pull scientists in the direction of conservatism. [Kelly et al. 1997] discusses how the problem of deciding among scientific paradigms looks from a reliabilist perspective. This problem is common to all forms of inquiry that aim at convergence to the right answer; for example, philosophers aiming at "reflective equilibrium" face it too. 2
We saw that applying admissibility to the aim of avoiding retractions yields a
standard of performance that is too high for the interesting inductive problems,
in which we cannot simply rely on the evidence and background knowledge to
eliminate all but the true alternative. Learning theorists have examined another
decision criterion by which we may evaluate the performance of a method with
2 For example, Rawls says that "since we are using our reason to describe itself and reason is not transparent to itself, we can misdescribe our reason as we can anything else. The struggle for reflective equilibrium continues indefinitely, in this case as in all others" [Rawls 1996, III.1.4]. At the same time, he acknowledges that "of course, the repeated failure to formulate the procedure [for determining what political choices are just] so that it yields acceptable conclusions may lead us to abandon political constructivism. It must eventually add up or be rejected." [Rawls 1996, III.1.4, fn. 8]. So part of the ongoing struggle for reflective equilibrium is to "eventually" reject philosophical paradigms (such as constructivism in political philosophy) that "fail repeatedly". [Glymour and Kelly 1992] interpret Platonic inquiry into the structure of concepts in terms of convergence to a correct hypothesis.
respect to retractions: the classic minimax criterion. Minimaxing retractions
turns out to be a very fruitful principle for deriving plausible constraints on the
short-run inferences of reliable methods.
6.4 Minimaxing Retractions
The minimax principle directs an agent to consider the worst-case results of
her options and to choose the act whose worst-case outcome is the best. So
to minimax retractions with respect to given background knowledge K, we consider the maximum number of times that a method might change its mind assuming that K is true, which is given by max{MC(δ, ε) : ε ∈ K}. 3 If max{MC(δ, ε) : ε ∈ K} < max{MC(δ′, ε) : ε ∈ K}, minimaxing retractions directs us to prefer the method δ to the method δ′. As is the case with mind-change-minimal methods, the principle of minimaxing retractions by itself is trivial, because the skeptic who always conjectures exactly the evidence never retracts anything. But using the minimax criterion to select among the reliable methods the one that minimaxes retractions yields interesting results, as we shall see shortly. The following definition makes precise how we may use the minimax criterion in this way.
Definition 6.3 Suppose that δ is a reliable discovery method for alternative hypotheses H given background knowledge K. Then δ minimaxes retractions ⟺ there is no other reliable method δ′ for H given K such that max{MC(δ, ε) : ε ∈ K} > max{MC(δ′, ε) : ε ∈ K}.
If a discovery problem is such that there is no bound on the number of times that a reliable method may have to change its mind to arrive at the truth (as in the case of the almost universal generalizations from Section 2.3.2), max{MC(δ, ε) : ε ∈ K} is infinite for all reliable methods δ, and the minimax criterion has no interesting consequences. But if we can guarantee that a reliable method δ can succeed in identifying the correct hypothesis without ever using more than n mind changes, the principle selects the method with the best such bound on vacillations. I say that a method δ identifies a true hypothesis from a collection of alternatives H given background knowledge K with at most n mind changes if δ is a reliable method for H given K, and max{MC(δ, ε) : ε ∈ K} ≤ n. The goal of minimaxing retractions leads us to seek methods that succeed with as few mind changes as possible; learning theorists refer to this paradigm as discovery with bounded mind changes [Kelly 1995, Ch. 9].
To get a feel for what minimaxing mind changes is like, let us consider a simple example. Suppose a scientist wants to investigate whether a certain particle, say a neutrino ν, exists or not. Imagine that the physicist undertakes a series of experiments e1, e2, ..., en, ... (bigger accelerators, more sophisticated detection devices, pure neutrino beams); see Figure 6.3.
3 If {MC(δ, ε) : ε ∈ K} has no maximum, let max{MC(δ, ε) : ε ∈ K} = ω.
Figure 6.3: In Search of a Neutrino.
Suppose that the physicist believes that if the particle ν exists, one of the experiments will eventually turn it up, so the existence of the particle is not globally underdetermined. The situation is essentially the same as with the question of whether all swans are white (cf. Figure 2.7). What should the physicist's inferences be? If the particle turns up, there is no problem: conclude that the particle exists. Difficulties arise when experiment after experiment fails to detect ν. The physicist may withhold judgment for a while, but eventually she must conjecture either that the particle exists or that it does not, or else she never gives an answer to the question under investigation. Should her first hypothesis be that ν will turn up eventually, or that it does not exist? If she conjectures that the particle does exist, without having found it, reliability requires that she eventually change her mind if it does not appear, for else she would never abandon her mistaken belief in its existence. But after the point at which she finally concludes that the particle will not be found, a more powerful experimental design could turn it up, forcing her to change her mind for a second time. By contrast, a physicist whose initial hypothesis is that ν does not exist may maintain this view as long as the particle is not detected, and change his mind when it is. That physicist need never change his hypothesis if ν does not exist, and retracts his view exactly once if ν is found. Since the second physicist requires at most one mind change, but the first may undergo two, the second minimaxes mind changes but the first does not. In other words, the principle of minimaxing mind changes requires that the physicist should first conjecture that the particle in question does not exist. So in this case, minimaxing mind changes leads to a version of Occam's razor: do not posit the existence of entities that are observable in principle but that you have not yet observed.
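The two physicists' strategies just contrasted can be simulated directly. In this toy sketch the horizon T and the credulous physicist's patience bound are my own stand-ins for "eventually"; the worst-case retraction counts come out as the argument predicts.

```python
T = 8          # experiment horizon (toy stand-in for 'eventually')
PATIENCE = 3   # how long the credulous physicist waits before giving up (assumption)

# K: either the particle shows up at some experiment k (1 from k onward) or never.
def stream(first_detection):
    return tuple(0 if (first_detection is None or n < first_detection) else 1
                 for n in range(T))

K = [stream(k) for k in list(range(T)) + [None]]

def cautious(e):
    """Conjecture non-existence until the particle is detected."""
    return 'exists' if any(e) else 'no'

def credulous(e):
    """Conjecture existence; give up after PATIENCE failures; recant if detected."""
    if any(e):
        return 'exists'
    return 'exists' if len(e) < PATIENCE else 'no'

def mind_changes(method, s):
    return sum(1 for n in range(len(s)) if method(s[:n]) != method(s[:n + 1]))

worst_cautious = max(mind_changes(cautious, s) for s in K)
worst_credulous = max(mind_changes(credulous, s) for s in K)
print(worst_cautious, worst_credulous)  # 1 2
```

The cautious physicist retracts at most once (when ν shows up); the credulous one may retract twice (giving up, then recanting when a later experiment succeeds), whatever patience bound is chosen.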
The traditional version of Occam's razor ("do not needlessly multiply entities") applies to cases of global underdetermination, but minimaxing retractions does not. For example, if the physicist thinks that detecting the particle ν in an experiment (or perhaps several experiments) indeed proves its existence, but that ν might be a "hidden" particle that is beyond the reach of physicists' apparatus, the hypothesis that ν exists is globally underdetermined, and there is no reliable method for investigating it, let alone a reliable method that minimaxes retractions. Occam's razor nonetheless directs us to assume that ν does not exist. So although Occam's razor and minimaxing retractions sometimes give the same result, they are different principles. Indeed, we shall see in Chapter 8 that minimaxing retractions can even require a theorist to posit hidden particles.
Another example in which the principle of minimaxing retractions leads to an intuitively plausible recommendation is Goodman's much-discussed "Riddle of Induction". Recall from Section 2.3.3 that we may think of Goodman's
of Induction". Recall from Section 2.3.3 that we may think of Goodman's
Riddle of Induction as a discovery problem, with the possible data streams and
hypotheses of interest as indicated in Figure 6.4.
By the same argument as in the search for the neutrino, reliability and minimaxing retractions direct us to project first that all emeralds are green, rather than any of the "grue" predicates. But a reliable projection rule that
Figure 6.4: The Riddle of Induction.
minimaxes retractions may still wait as long as it pleases to make any projections about the future. Moreover, we might begin by projecting "green" and, if a blue emerald is found, produce the contradiction for as long as we like, provided we eventually project the appropriate "grue" predicate. Such a method is reliable and retracts its projection at most once. In both of these cases, the methods that deviate from the natural projection rule (project "green" until a blue emerald is observed, then project the appropriate "grue" predicate) fail to take some of their conjectures seriously, because they do not project them along any possible sequence of reports on emerald colors. By Proposition 6.1, these methods are not data-minimal. Hence the only reliable data-minimal projection rule that minimaxes mind changes is the natural one. (I give a rigorous proof of this fact in Section 6.8.)
Fact 6.3 In the Riddle of Induction, the only projection rule that is reliable, data-minimal and minimaxes retractions is the natural one.
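The natural projection rule itself is easy to write down as a method. The sketch below is my own illustration: emerald reports are encoded as 'g'/'b' over a toy horizon, and the check confirms that the rule retracts at most once.

```python
T = 8  # horizon (toy stand-in for an infinite sequence of emerald reports)

def natural(e):
    """The natural projection rule: project 'all emeralds are green' until a
    blue emerald is seen at time k, then project grue(k) (green before k,
    blue from k on)."""
    if 'b' not in e:
        return 'green'
    return f'grue({e.index("b")})'

def mind_changes(method, s):
    return sum(1 for n in range(len(s)) if method(s[:n]) != method(s[:n + 1]))

# The possible data streams: all-green forever, or green up to time k and
# blue from k on (the grue(k) worlds).
streams = [tuple('g' * T)] + [tuple('g' * k + 'b' * (T - k)) for k in range(T)]

print(max(mind_changes(natural, s) for s in streams))  # 1: at most one retraction
```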
Since minimaxing retractions yields a form of Occam's razor as well as the natural projection rule in the Riddle of Induction (in conjunction with reliability and data-minimality), one might think that this solution to the Riddle is "nothing but" an implicit appeal to Occam's razor. To see that this is not so, consider the following modification of the Riddle, in which an alternating hypothesis (the first emerald is green, the second blue, the third green, the fourth blue, etc.) replaces "all emeralds are green". Instead of "grue" predicates, we have hypotheses asserting that the colour alternations come to an end at a certain point; see Figure 6.5.
As the reader may verify easily, the only reliable data-minimal method for this discovery problem that minimaxes retractions projects the alternating hypothesis until it is falsified. This example also shows that the means-ends solution to the Riddle does not rest on a syntactic criterion of the sort that was the target of Goodman's criticism, such as higher "degrees of confirmation" bestowed on universal generalizations of "basic predicates". Nor does it appeal to notions of intuitive "simplicity" or "uniformity"; the alternating hypothesis seems to have neither of these attributes. Instead, my solution depends on pragmatic features of the problem: which data streams the inquirer regards as serious possibilities, and which alternative hypotheses he entertains. In Goodman's Riddle, the topology of the space of possible data streams is such that minimaxing retractions rules out projecting any of the "grue" predicates. This connection between the natural projection rule and minimaxing retractions is not an accident: the structure that Goodman described is exactly the kind of structure in which the principle of minimaxing retractions applies. The question about the existence of the neutrino shares this topological structure, and we shall encounter it again in the problem of describing reactions among elementary particles (in Chapter 8). The next section characterizes this topological
Figure 6.5: Another Riddle of Induction.
structure and shows precisely how it is linked to minimaxing retractions. 4
6.5 A Characterization of Discovery with Bounded Mind Changes
A reliable discovery method δ identifies a correct hypothesis from a collection of alternatives H with at most n mind changes given background knowledge K if δ does not change its mind more than n times on any data stream ε consistent with K. That is, δ succeeds with at most n mind changes if δ is a reliable discovery method for H given K, and max{MC(δ, ε) : ε ∈ K} ≤ n. The next proposition characterizes what background knowledge K must be like if there is a discovery method δ that reliably identifies a correct hypothesis from a collection of alternatives H and never changes its mind more than n times on any data stream ε consistent with K. I define the characteristic condition inductively, starting with discovery without any mind changes. Consider an initial conjecture H. Suppose that H is not certain. Then any reliable discovery method starting with H has to change its mind if H is false. If after this mind change still n more mind changes are required, a total of n+1 mind changes may result. So if a reliable discovery method δ whose initial conjecture is H never requires more than n mind changes, there must be some point at which δ can change its mind and incur no more than n−1 mind changes whenever H is false. The structures that meet this requirement look like "feathers". I write Fn(K, H) for "K is an n-feather for H". The intended interpretation of Fn(K, H) is "every reliable discovery method starting with H requires at least n+1 mind changes given K".
Definition 6.4 Let H be a collection of empirical hypotheses, and let K be background knowledge. Let H(ε) stand for the (unique) member of H correct on ε. Write Fn(K, H) for "K is an n-feather for H", and define this notion as follows.
• F0(K, H) ⟺ ∃ε ∈ K − H.
• Fn+1(K, H) ⟺ ∃ε ∈ K − H: ∀k: Fn(K ∩ [ε|k], H(ε)).
To illustrate this definition, the background knowledge K in the Riddle of Induction is a 0-feather for Hgreen ("all emeralds are green"), that is, F0(K, Hgreen) holds, because Hgreen is not true on every data stream consistent with K (see Figure 6.4).
4 The topological perspective points to a deep interpretation of the significance of Goodman's Riddle: the Riddle shows that translation does not necessarily preserve metric notions, such as "distance" in the degree of confirmation of alternative hypotheses; translation changes the confirmation ordering. On the other hand, translations do preserve topological relations, and these determine which hypotheses minimax retractions. It seems that Goodman provided us with an example of a translation that is a homeomorphism but not an isometry.
But K is not a 1-feather for Hgreen, that is, ¬F1(K, Hgreen) holds: for every data stream ε in K on which Hgreen is false (on which a blue emerald is observed at, say, time k) there is an initial segment ε|k such that K ∩ [ε|k] is not a 0-feather for Hgrue(k) ("all emeralds are grue(k)"), that is, Hgrue(k) is entailed by K, ε|k. By contrast, K is a 1-feather for any Hgrue(n), that is, F1(K, Hgrue(n)) holds: a given hypothesis Hgrue(n) is false on the sequence ε of all green emeralds. And no initial segment ε|k entails Hgreen = H(ε). So K ∩ [ε|k] is a 0-feather for Hgreen.
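This case analysis can be transcribed into executable form. The encoding below is my own device, not the author's: knowledge states in the Riddle are represented symbolically so that the universal quantifier over times k in Definition 6.4 is settled by the case analysis just given, rather than by a finite cutoff.

```python
# Knowledge states in the Riddle, encoded symbolically (a toy assumption):
#   ('open', j)    -- j green emeralds seen; the live streams are all-green
#                     plus grue(i) for every i >= j.
#   ('settled', i) -- a blue emerald appeared at time i; only grue(i) is live.
# Hypotheses are 'green' or ('grue', i).

def F0(state, hyp):
    """F0(K, H): some live data stream falsifies H."""
    kind, i = state
    if kind == 'open':
        # 'green' is falsified by the live grue streams, and any grue
        # hypothesis by the all-green stream; both kinds are always live here.
        return True
    return hyp != ('grue', i)      # settled: grue(i) is the only live stream

def F1(state, hyp):
    """F1(K, H): some live stream falsifying H keeps F0 true at every time k."""
    kind, i = state
    if kind == 'settled':
        # Only grue(i) is live; once the data entail grue(i), F0 fails.
        return False
    if hyp == 'green':
        # Candidate witnesses are the grue(i') streams, and each one fails:
        # at any k > i' the restricted state is ('settled', i'), where
        # F0(., ('grue', i')) is False.
        return False
    # hyp is some ('grue', n): the all-green stream is a witness, since after
    # any k observations the state is ('open', k), where F0(., 'green') holds.
    return True

K = ('open', 0)
print(F0(K, 'green'), F1(K, 'green'), F1(K, ('grue', 3)))  # True False True
```

The printed values reproduce the three claims of the illustration: K is a 0-feather for Hgreen, not a 1-feather for Hgreen, and a 1-feather for each Hgrue(n).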
Figure 6.6 shows the general structure of 0-feathers and 1-feathers, and Figure 6.7 illustrates 2-feathers and 3-feathers.
The next lemma shows that feather structures characterize how many mind changes a discovery method that starts with a certain initial conjecture requires.
Lemma 6.4 Let H be a collection of alternative hypotheses, and let background knowledge K be given. Then there is a reliable discovery method δ for H given K such that
1. δ succeeds with at most n mind changes, and
2. δ(∅) ⊨ H
⟺ (K, H) is not an n-feather (that is, ¬Fn(K, H)).
The last complication is that a reliable method may delay conjecturing any of the alternative hypotheses. In fact, minimaxing retractions can require arbitrarily long delays. Consider a single hypothesis H under test, and suppose that
the scientist's background knowledge K implies that by time n, the evidence
is guaranteed to entail either H or its negation, but not by any earlier time.
(Figure 6.8 illustrates this scenario.)
A method that accepts only the evidence until time n thus succeeds without
any mind changes; but any conjecture as to whether H is true or false made
before time n runs the risk of refutation.
Thus the full characterization of discovery with bounded mind changes is this: a reliable method must use at least n+1 mind changes if and only if there is one data stream ε consistent with background knowledge K such that every initial segment ε|k is an n-feather for every hypothesis H under consideration.
Proposition 6.5 Let H be a collection of alternative hypotheses, and let background knowledge K be given. Then there is a reliable discovery method δ for H given K that succeeds with at most n mind changes ⟺ for every data stream ε consistent with K, there is a time k such that (K ∩ [ε|k], H(ε)) is not an n-feather (i.e., ¬Fn(K ∩ [ε|k], H(ε))).
Figure 6.6: "Feather" Structures characterize Discovery with Bounded Mind Changes. The figure illustrates 0-feathers and 1-feathers.
Figure 6.7: 2-feathers and 3-feathers.
Figure 6.8: Minimaxing Retractions requires waiting until time n.
From Proposition 6.5, we can derive a universal method δMC for reliably identifying a correct hypothesis from H given K when (K, H) is not an n-feather. Say that the dimension of (K, H) is n if (K, H) is an n-feather but not an n+1-feather. δMC begins by conjecturing nothing but the evidence until (K ∩ [e], H) is of dimension n. Let a finite data sequence e ∗ x be given; if there is an H′ such that (K ∩ [e ∗ x], H′) is of lower dimension than (K ∩ [e ∗ x], δMC(e)), then δMC(e ∗ x) = H′; otherwise δMC(e ∗ x) = δMC(e).
6.6 The Hierarchy of Cognitive Goals
So far I have applied admissibility and minimax to mind changes, and admissibility to convergence time, to arrive at standards of efficiency for reliable methods. Methods that minimax convergence time are those that settle on a true hypothesis by a deadline n known a priori. For without such a deadline, there is no bound on the time that a reliable method may require.
Proposition 6.6 Let H be a collection of alternative hypotheses with background knowledge K. Then there is a reliable discovery method δ for H given K that minimaxes convergence time ⟺ there is some time n such that for all data streams ε in K, ε|n entails one of the hypotheses H in H.
It follows immediately from Proposition 6.6 and Proposition 6.2 that minimaxing convergence time places stronger demands on inductive inquiry than mind-change-minimality does. Both standards of efficiency require that eventually the data must entail the correct hypothesis; but minimaxing speed requires in addition that the evidence yield certainty by a set time.
I complete my survey of reliable efficient inquiry with another epistemic value: the number of errors that a reliable method makes before it settles on the right answer. Formally, error(δ, ε) = |{n : δ(ε|n) is false on ε}|. We can apply the admissibility and minimax principles to errors in the by now familiar way to define two more standards of efficiency for reliable methods. It turns out that these do not yield anything new: the only methods that are error-admissible and minimax errors are the "wait-and-see" methods that never go beyond the evidence. The following theorem brings together a number of the results from this chapter.
Theorem 6.7 Let H be a collection of alternative hypotheses with background knowledge K. The following conditions are equivalent:
1. Each hypothesis H in H is decidable with certainty given K.
2. There is a reliable discovery method δ for H given K that succeeds with no retractions.
3. There is a mind-change-minimal reliable discovery method δ for H given K.
4. There is an error-admissible reliable discovery method δ for H given K.
5. There is a reliable discovery method δ for H given K that succeeds with a bounded number of errors.
Theorem 6.7 and Proposition 6.6 establish the lower part of the hierarchy
shown in Figure 6.9.
The fact that empirical inquiry can attain several standards of performance
only when there is no problem of induction is a precise sense in which the
problem of induction makes inquiry difficult.
Proposition 6.5 demonstrates that there is no bound on the number of mind changes that discovery problems may require: we can always add one more feather dimension to make one more mind change necessary. This constitutes an infinite subhierarchy (discovery with 0 mind changes, 1 mind change, ...) within the order of cognitive goals. Finally, I noted in Section 6.2 that any solvable discovery problem has a data-minimal solution. Hence data-minimality takes the top place in the hierarchy of cognitive goals because it does not discriminate among problems with reliable solutions.
The hierarchy explains the power of data-minimality and minimaxing retractions in constraining inductive inferences: these two goals are the only ones that apply when there is a genuine problem of induction. Of these two, minimaxing retractions is the more powerful constraint, as our analysis of various inductive problems showed.
6.7 Data-Minimality vs. Minimaxing Retractions
Data-minimal methods may be forced to take back their conjectures much more often than methods that are not evaluated by their convergence time. Figure 6.8 showed an example. A data-minimal method δ cannot wait until the evidence rules out all but one alternative. It must immediately take a guess. But the next datum may refute that guess; by data-minimality, δ has to produce another conjecture, which may be refuted immediately, and so on, until δ has changed its mind n times in the worst case. On the other hand, with questions such as Goodman's Riddle of Induction, the existence of a particle, and (we shall see) describing the physically possible reactions among elementary particles, an inquirer can epistemically have it all: reliably find the truth and minimize convergence time and retractions. In these cases reliable inductive inferences that are data-minimal and minimax retractions seem to have special intuitive appeal.

This observation is relevant to the status of the minimax criterion as a decision-theoretic principle for evaluating the performance of inductive methods.
Figure 6.9: The Hierarchy of Cognitive Goals.
On general decision-theoretic grounds, [Levi 1980] proposes to evaluate options first by the admissibility criterion, and to make further discriminations among admissible options by the minimax criterion, much as I am proposing to evaluate methods first by admissibility with respect to convergence time and then by minimax with regard to retractions. On the other hand, minimaxers are often criticized for being unreasonably averse to risks. 5 If an agent thinks that his chance of getting both birds in the bush is high enough, he should give up the one he has in the hand. Whether one agrees with this criticism of minimaxing or not, it does not pertain to data-minimal discovery with bounded mind changes. For at each stage of inquiry, data-minimal methods may converge to a correct hypothesis with no further mind changes. In general, the best-case performance of a data-minimal method that minimaxes retractions will be just as good as that of another method that requires more mind changes. Data-minimal methods that minimax retractions do go for the birds in the bush, but they make sure that they at least get the one in their hand.
The next proposition determines the exact extent to which data-minimal methods may have to undergo extra mind changes to solve an inductive problem. I begin with a variant of Definition 6.4. If a reliable data-minimal discovery method can succeed with n mind changes, then whenever the previous conjecture H of a data-minimal method is refuted, the method must be able to immediately change its mind to a conjecture H′ after which no more than n−1 mind changes are required. Another way of putting the matter is that the universal method δMC for discovery with bounded mind changes is not data-minimal unless for some H′, (K ∩ [e ∗ x], H′) is of lower dimension than (K ∩ [e ∗ x], δMC(e)) whenever e ∗ x falsifies the previous conjecture δMC(e). This is reflected in clause 2 of the next definition. The intended interpretation of DM-Fn(K, H), read "K is a data-minimal n-feather for H", is "a reliable data-minimal method whose initial conjecture is H requires at least n+1 mind changes".
Definition 6.5 Let H be a collection of empirical hypotheses, and let K be background knowledge.
• DM-F0(K, H) ⟺ ∃ε ∈ K − H.
• DM-Fn+1(K, H) ⟺ if H is consistent with K, then ∃ε ∈ K − H such that
1. ∀k: DM-Fn(K ∩ [ε|k], H(ε)), or
2. ∃k: [ε|k], K ⊨ ¬H, and ∀H′ in H consistent with K ∩ [ε|k]: DM-Fn(K ∩ [ε|k], H′).
5 This issue arises in a prominent place in political philosophy. In his famous reduction of questions of justice to questions of rational choice behind "the veil of ignorance", Rawls argues that agents will apply the minimax criterion to choose social arrangements [Rawls 1971]. Harsanyi for one has contended that rational agents should be prepared to take more risks than the minimax criterion allows them to do [Harsanyi 1975].
Since data-minimal methods must immediately project one of the alternative
hypotheses, a data-minimal method cannot wait for evidence before making a
conjecture; otherwise the characterization is analogous to Proposition 6.5.
Proposition 6.8 Let H be a partition of K. Then there is a reliable method δ for H given K such that
1. δ starts with H, and
2. δ requires at most n mind changes, and
3. δ is data-minimal
⟺ (K, H) is not a data-minimal n-feather, i.e., ¬DM-Fn(K, H).
6.8 Proofs
Proposition 6.1 Let K be background knowledge, and let H be a collection of alternative empirical hypotheses. A reliable method δ for H is data-minimal given K ⟺ δ always takes its conjectures seriously given K.

Proof. (⟹) I show the contrapositive. Suppose that there is some finite evidence sequence e (consistent with K) such that δ does not take its conjecture δ(e) seriously. Let e₁ be a shortest data sequence that extends e such that δ does project a hypothesis H from H that is entailed by δ(e₁) along some data stream ε ∈ K. Since δ does not project δ(e), e₁ must properly extend e; hence we may take e₁ = e′∗x, where x is the last datum that appears in e₁. Now define δ′ by δ′(e′) = δ(e₁), and δ′(e″) = δ(e″) at all data sequences e″ different from e′. I show that δ′ weakly dominates δ. By construction, δ′ projects the hypothesis H along ε at e′. Thus mod(δ′, ε) ≤ lh(e′). By contrast, the choice of e′ implies that δ does not stabilize to H along ε at e′, so mod(δ, ε) > lh(e′). So δ′ converges on ε faster than δ does. Furthermore, on no data stream consistent with background knowledge K does δ′ converge after δ. For the only place at which δ and δ′ differ is e′, and by assumption δ is not converging at e′ on any data stream consistent with K, since e′ is shorter than e₁ and so δ does not take its conjecture at e′ seriously given K. This establishes that δ′ weakly dominates δ.

(⟸) Suppose that δ always takes its conjectures seriously given K. Consider some other reliable method δ′ that converges faster than δ on some data stream τ ∈ K (i.e., mod(δ′, τ) < mod(δ, τ)). Let H be the hypothesis correct on τ, and let k be the first time after δ′ converges on τ (that is, k ≥ mod(δ′, τ)) such that δ(τ|k) ⊨ H′ ≠ H. Now by hypothesis, δ projects its conjecture H′ along some data stream ε ∈ K at τ|k; that is, k ≥ mod(δ, ε). Since δ′ projects H along τ at k, δ′ does not entail H′ at ε|k = τ|k. Thus mod(δ′, ε) > k ≥ mod(δ, ε). So δ′ does not dominate δ in convergence time. Since any method δ′ that dominates δ in convergence time must be strictly faster than δ on some data stream τ ∈ K, this argument shows that δ is data-minimal. ∎
Proposition 6.2 Let H be a collection of alternative hypotheses, and let K be given background knowledge. Then there is a reliable mind-change-minimal method for H given K ⟺ on every data stream ε, there is a time n such that ε|n and K entail the true hypothesis H in H.

Proof. (⟸) Given the right-hand side, the "skeptical" method δ whose theory is always exactly the evidence reliably identifies a true hypothesis from H and never retracts its beliefs.

(⟹) Let δ be a reliable mind-change-minimal discovery method for H. Because beginning with the trivial conjecture δ(∅) = K does not increase the number of mind changes of a method on any data stream, we may without loss of generality assume that δ(∅) = K. Now suppose that on some data stream ε consistent with K, MC(δ, ε) > 0. Then there is a first time m₀ > 0 at which δ makes a non-vacuous conjecture, that is, δ(ε|m₀) is consistent and entails some hypothesis H from H, and a first time m₁ > m₀ at which δ makes a mind change, that is, δ(ε|m₁) ⊭ H or δ(ε|m₁) = ∅. Then the following method δ′ weakly dominates δ with respect to mind changes given K: if ε|m₀ ⊆ e ⊆ ε|m₁, let δ′(e) = K; otherwise δ′(e) = δ(e). Then δ′ conjectures K along ε until ε|m₁, so MC(δ′, ε) = MC(δ, ε) − 1. And clearly δ′ never uses more mind changes than δ does. So δ is weakly dominated with respect to mind changes given K. This shows that if there is a reliable mind-change-minimal method for H given K, then there is a reliable method δ that never changes its mind along any data stream in K. Such a method δ never entails more than the evidence. For suppose that e, K do not entail δ(e). Then there is a data stream τ consistent with e and K on which δ(e) is false. So to succeed on τ, δ must change its mind at least once (after e), contrary to supposition. Since δ never entails more than the evidence and is reliable, eventually the evidence and background knowledge must conclusively establish which alternative in H is the true one, no matter what it is. ∎
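The (⟸) direction can be illustrated on a small finite problem. The sketch below is my own illustration, not part of the formal framework: the toy hypothesis space, the encoding of conjectures as sets of consistent hypotheses, and all names are assumptions. It shows the skeptical method identifying the truth with zero retractions on a problem where the data eventually entail the correct hypothesis.

```python
from itertools import product

# Toy problem: binary data streams of length 3. Hypothesis k (k = 1, 2, 3)
# says the first 1 occurs at position k; hypothesis 0 says no 1 occurs.
# Every length-3 stream entails exactly one hypothesis, so the right-hand
# side of Proposition 6.2 holds.
streams = list(product((0, 1), repeat=3))

def first_one(s):
    return next((i + 1 for i, x in enumerate(s) if x == 1), 0)

hypotheses = {k: {s for s in streams if first_one(s) == k} for k in range(4)}

def skeptical(e):
    """The skeptical method: its 'theory' is exactly the evidence, encoded
    here as the set of hypotheses still consistent with e."""
    return {k for k, ext in hypotheses.items()
            if any(s[:len(e)] == e for s in ext)}

def retractions(stream):
    """Count mind changes: the method entails hypothesis k only once k is
    the sole consistent hypothesis; a retraction drops an entailed k."""
    count, entailed = 0, None
    for t in range(1, len(stream) + 1):
        conj = skeptical(stream[:t])
        now = next(iter(conj)) if len(conj) == 1 else None
        if entailed is not None and now != entailed:
            count += 1
        entailed = now if now is not None else entailed
    return count

assert all(retractions(s) == 0 for s in streams)  # zero retractions everywhere
```

Because the set of consistent hypotheses only shrinks as evidence accumulates, once the skeptical method entails a hypothesis it never gives it up, which is the point of the (⟸) argument.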
Fact 6.3 In the Riddle of Induction, the only projection rule that is reliable, data-minimal and minimaxes retractions is the natural one.

Proof. The natural projection rule δ_N conjectures that all emeralds are green until it encounters a blue one; suppose that the k-th emerald is blue. Then δ_N concludes that all emeralds are grue(k). If all emeralds are green, then δ_N converges to the right generalization immediately. Otherwise, δ_N changes its mind, for the first time, after the first blue emerald turns up. Hence, assuming that all emeralds are green or grue(k), δ_N finds the correct generalization with at most one mind change. Finally, δ_N always takes its conjectures seriously, so by Proposition 6.1, δ_N is data-minimal.

Now consider any projection rule δ that is reliable, data-minimal and minimaxes mind changes; I show that δ = δ_N. Since δ is reliable, δ must eventually infer that all emeralds are green if only green emeralds are observed. Let m be the minimal number of green emeralds from which δ generalizes that all emeralds are green. I argue that m = 1, that is, δ must immediately infer that all emeralds are green when one green emerald is observed. For suppose otherwise (m > 1). Since δ is data-minimal, δ must project some hypothesis other than "all emeralds are green" before the m-th green emerald is observed. That is, δ changes its mind when the m-th emerald appears. But after δ has inferred that all emeralds are green from the sample of m green emeralds, a blue emerald may be found, say at time k, which establishes that all emeralds are grue(k). Since δ is reliable and data-minimal, δ must then change its mind for the second time to conclude that all emeralds are grue(k). So if m > 1, then δ does not minimax retractions; thus δ infers that all emeralds are green after seeing the first green emerald. Since δ is data-minimal, δ projects this hypothesis as long as all observed emeralds are green. And again by data-minimality, δ concludes immediately that all emeralds are grue(k) if the k-th emerald is blue. Thus δ = δ_N; that is, the only reliable and data-minimal projection rule that minimaxes retractions in the Riddle of Induction is the natural projection rule. ∎
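The behavior of the natural projection rule can be checked mechanically. The sketch below is only an illustration of mine; the function names and the encoding of grue(k) as "green before time k, blue thereafter" are assumptions, not part of the original text.

```python
def natural_rule(evidence):
    """Natural projection rule: conjecture 'all emeralds are green' until a
    blue emerald appears, say as the k-th observation; then conjecture grue(k)."""
    for k, color in enumerate(evidence, start=1):
        if color == "blue":
            return f"grue({k})"
    return "green"

def mind_changes(rule, stream):
    """Count how often the rule's conjecture changes along a data stream."""
    conjectures = [rule(stream[:t]) for t in range(1, len(stream) + 1)]
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)

assert mind_changes(natural_rule, ["green"] * 10) == 0                 # all green
assert mind_changes(natural_rule, ["green"] * 3 + ["blue"] * 4) == 1   # one retraction
```

A rule that waits for m > 1 green emeralds before projecting "green" can, by contrast, be driven to two mind changes on a suitably timed grue stream, as in the proof above.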
Lemma 6.4 Let H be a collection of alternative hypotheses, and let background knowledge K be given. Then there is a reliable discovery method δ for H given K such that

1. δ succeeds with at most n mind changes, and

2. δ(∅) ⊨ H

⟺ (K, H) is not an n-feather (i.e., ¬F_n(K, H)).

Proof. The proof is by induction on n.

Base case, n = 0:

(⟸) Suppose that (K, H) is not a 0-feather. Then H is a priori certain, that is, K ⊨ H. So the method δ that always conjectures H reliably identifies the truth from H with 0 retractions.

(⟹) Suppose that (K, H) is a 0-feather. Then there exists a data stream ε ∈ K − H. Let δ be any reliable method that starts with H (i.e., δ(∅) = H). Since δ is reliable, δ changes its mind on ε at least once. Hence every reliable method that starts with H may change its mind at least once.

Inductive step: Assume the hypothesis for n and consider n+1.

(⟸) Suppose that (K, H) is not an (n+1)-feather. Then for every data stream ε ∈ K − H, there is a time k such that ¬F_n(K ∩ [ε|k], H(ε)). By inductive hypothesis, for each such point ε|k, we may choose a method δ_{ε|k} and a hypothesis H_{ε|k} such that δ_{ε|k}(∅) = H_{ε|k} and δ_{ε|k} succeeds with at most n mind changes given K ∩ [ε|k].

Now define a discovery method δ that reliably identifies a correct hypothesis from H given K with no more than n+1 mind changes, starting with H:

1. δ(∅) = K ∩ H;

2. if there is a time k such that

(a) 0 < k ≤ lh(e), and

(b) (K ∩ [e|k], H′) is not an n-feather for some H′ in H,

then let k be the least such time and conjecture δ_{e|k}(e);

3. otherwise, conjecture K ∩ [e] ∩ H.

To see that δ succeeds with at most n+1 mind changes, consider any data stream ε ∈ K.

Case 1: Clause 3 always obtains along ε. Then δ converges to H with 0 retractions. Since (K, H) is not an (n+1)-feather, ε ∈ H, and δ is correct.

Case 2: Clause 2 obtains at some point k along ε. Assume that k is the first such point. Then on ε, time k is the earliest at which δ might change its mind. After time k, δ follows δ_{ε|k} and hence succeeds with at most n mind changes. Hence overall, δ changes its mind at most n+1 times along ε. Since this is true for any data stream ε consistent with background knowledge K, δ requires at most n+1 mind changes.

(⟹) Suppose that (K, H) is an (n+1)-feather. Then there is a data stream ε ∈ K − H such that for all times k, (K ∩ [ε|k], H(ε)) is an n-feather (i.e., F_n(K ∩ [ε|k], H(ε)) holds). Let δ be any reliable discovery method that starts with H (i.e., δ(∅) ⊨ H). Then at some time along ε, δ changes its mind to H(ε); let k be the first such time. By inductive hypothesis, any method that begins with H(ε) requires at least n+1 mind changes on some data stream τ ∈ K ∩ [ε|k]. In particular, the following method δ′ does:

1. δ′(∅) = H(ε);

2. if e ⊂ ε|k, δ′(e) = H(ε);

3. if ε|k ⊆ e, δ′(e) = δ(e).

By construction, on K ∩ [ε|k], δ′ changes its mind only after ε|k, and hence changes its mind on K ∩ [ε|k] exactly when δ does. Hence δ changes its mind at least n+1 times on some data stream τ ∈ K ∩ [ε|k]. Since δ also changes its mind before τ|k = ε|k, δ requires at least n+2 mind changes. Hence any reliable method starting with H may change its mind at least n+2 times. ∎
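For finite toy problems the feather hierarchy can be computed directly. The sketch below is my own reconstruction from the characterization in Lemma 6.4, not the author's Definition 6.4 itself; the finite encoding, the function names, and the truncation of the Riddle of Induction to length-3 streams are all assumptions.

```python
def is_feather(n, K, H, hyp_of):
    """F_n(K, H) on a finite problem. K: a set of data streams (tuples);
    H: the set of streams on which the initial conjecture is correct;
    hyp_of(eps): the set of streams sharing eps's true hypothesis, H(eps)."""
    outside = [eps for eps in K if eps not in H]
    if n == 0:
        return bool(outside)          # F_0: some stream in K falsifies H
    # F_n: some falsifying stream keeps (K, H(eps)) an (n-1)-feather forever
    return any(all(is_feather(n - 1,
                              {t for t in K if t[:k] == eps[:k]},
                              hyp_of(eps), hyp_of)
                   for k in range(len(eps) + 1))
               for eps in outside)

# Riddle of Induction truncated to three observations: 'g' = green, 'b' = blue;
# grue(k) streams are green before the k-th observation and blue from then on.
green = ("g", "g", "g")
K = {green} | {tuple("g" if i < k - 1 else "b" for i in range(3)) for k in (1, 2, 3)}
hyp_of = lambda eps: {eps}            # each finite stream settles its own hypothesis

assert is_feather(0, K, {green}, hyp_of)        # starting with 'green' risks 1 change
assert not is_feather(1, K, {green}, hyp_of)    # but never more than 1
```

In this finite fragment the two asserts match Fact 6.3: a method starting with "all green" can be forced into one retraction, but one mind change always suffices. The genuine Riddle has infinitely many grue hypotheses and infinite streams, so this is only a finite caricature.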
Proposition 6.5 Let H be a collection of alternative hypotheses, and let background knowledge K be given. Then there is a reliable discovery method δ for H given K that succeeds with at most n mind changes ⟺ for every data stream ε consistent with K, there is a time k such that (K ∩ [ε|k], H(ε)) is not an n-feather (i.e., ¬F_n(K ∩ [ε|k], H(ε))).

Proof. (⟸) Suppose that for every data stream ε consistent with K, there is a time k such that (K ∩ [ε|k], H(ε)) is not an n-feather (i.e., ¬F_n(K ∩ [ε|k], H(ε))). By Lemma 6.4, for each such point ε|k, we may choose a method δ_{ε|k} and a hypothesis H_{ε|k} such that δ_{ε|k}(∅) = H_{ε|k} and δ_{ε|k} succeeds with at most n mind changes given K ∩ [ε|k]. Now define a discovery method δ that reliably identifies a correct hypothesis from H given K with no more than n mind changes:

1. If there is a time k such that

(a) 0 < k ≤ lh(e), and

(b) (K ∩ [e|k], H′) is not an n-feather for some H′ in H,

then let k be the least such time and conjecture δ_{e|k}(e);

2. otherwise, conjecture K ∩ [e].

For any data stream ε ∈ K, there eventually comes a first time k when (K ∩ [ε|k], H′) is not an n-feather for some H′ in H. After time k, δ follows δ_{ε|k} and hence succeeds with at most n mind changes. Before time k, δ conjectures only the evidence and hence does not change its mind.

(⟹) Conversely, suppose that there is a data stream ε ∈ K such that for all times k, (K ∩ [ε|k], H(ε)) is an n-feather (i.e., F_n(K ∩ [ε|k], H(ε))). Let δ be any reliable discovery method. Then there is a first time k along ε at which δ conjectures H(ε) (i.e., δ(ε|k) ⊨ H(ε)). By the same argument as in Lemma 6.4, δ requires at least n+1 mind changes on some data stream τ ∈ K ∩ [ε|k]. ∎
Proposition 6.6 Let H be a collection of alternative hypotheses with background knowledge K. Then there is a reliable discovery method δ for H given K that minimaxes convergence time ⟺ there is some time n such that for all data streams ε in K, ε|n entails one of the hypotheses H in H.

Proof. (⟸) Suppose that there is a deadline n by which background knowledge and the data entail which hypothesis is correct. Without loss of generality, assume that n is the earliest such time. Then a reliable method δ can simply conjecture the evidence until time n. In the worst case (and in the best case), δ converges to the correct hypothesis by time n. Moreover, no other method δ′ can achieve a better guarantee on the time that it requires to find the truth. This is immediate if n = 0. Otherwise, let ε be any data stream such that K, ε|n−1 do not entail which hypothesis from H is correct. Since I chose n to be the earliest time by which background knowledge and the evidence always entail the correct hypothesis, there is such a data stream. Then there is a hypothesis H in H that is true on some extension τ of ε|n−1, consistent with K, such that δ′(ε|n−1) does not entail H. Hence δ′ converges to the correct hypothesis on τ only after time n (i.e., mod(δ′, τ) ≥ n). Therefore every method requires, on some data stream consistent with background knowledge K, at least n pieces of evidence before settling on a correct hypothesis.

(⟹) I show the contrapositive. Suppose that for every bound n, there is a data stream ε such that K, ε|n do not entail which hypothesis in H is correct. By the same argument as for the converse implication, this means that for every method δ and every bound n, there is some data stream ε consistent with K on which δ converges to the correct hypothesis only after time n. Hence there is no maximal element in {mod(δ, ε) : ε ∈ K}. ∎
Theorem 6.7 Let H be a collection of alternative hypotheses with background knowledge K. The following conditions are equivalent:

1. Each hypothesis H in H is decidable with certainty given K.

2. There is a reliable discovery method δ for H given K that succeeds with no retractions.

3. There is a mind-change-minimal reliable discovery method δ for H given K.

4. There is an error-admissible reliable discovery method δ for H given K.

5. There is a reliable discovery method δ for H given K that succeeds with a bounded number of errors.

Proof. Proposition 6.5 implies that claims 1 and 2 are equivalent. By Proposition 6.2, claim 3 is equivalent to claim 1. I prove the equivalence of claims 4 and 5 with claim 1.

(1 ⟹ 4, 5) If each hypothesis H in H is decidable with certainty given K, the method δ that conjectures nothing but the evidence reliably identifies the correct hypothesis from H and never makes any errors. Hence δ is error-admissible and minimaxes error.

(1 ⟸ 4) Let δ be an error-admissible method. Suppose that δ makes an error on some data stream ε consistent with K, say at time k. Then the following method δ′ dominates δ in error: δ′ agrees with δ everywhere but at ε|k, and δ′(ε|k) = K ∩ [ε|k]. Then δ′ makes strictly fewer errors than δ on ε, and never makes more. So the only error-admissible methods are those that never make an error on any data stream consistent with K. Hence no error-admissible method makes a conjecture that goes beyond the evidence. If such a method is reliable, then all hypotheses are decidable with certainty.

(1 ⟸ 5) I show the contrapositive. Suppose that not every hypothesis under investigation is decidable with certainty given K. Then there is some hypothesis H in H true on some data stream ε consistent with K such that H is never entailed along ε; that is, ε is a limit point of ¬H ∩ K (see Section 4.4). Now consider some possible bound n on the number of errors that a reliable discovery method δ may make on any data stream. Since δ is reliable, δ eventually converges to H on ε, say at time k. Thus δ conjectures H on ε|k, ε|(k+1), …, ε|(k+n). Since H is never entailed along ε, there is a data stream τ extending ε|(k+n), consistent with background knowledge K, on which H is false. Thus on τ, δ makes at least n+1 errors. Since this argument applies to any bound n, there is no bound on the number of errors that δ might make on a data stream. Thus there is no reliable discovery method that minimaxes errors unless all hypotheses in H are decidable with certainty given background knowledge K. ∎
Proposition 6.8 Let H be a partition of K. Then there is a reliable method δ for H given K such that

1. δ starts with H, and

2. δ requires at most n mind changes, and

3. δ is data-minimal

⟺ (K, H) is not a data-minimal n-feather, i.e. ¬DM-F_n(K, H).
Proof. The proof is by induction on n. The base case follows as in the proof of Proposition 6.5.

Inductive step: Assume the hypothesis for n and consider n+1.

(⟹) Suppose that (K, H) is a data-minimal (n+1)-feather, i.e. DM-F_{n+1}(K, H). Let δ be any reliable data-minimal discovery method for H that starts with H. It follows from Proposition 6.1 that H is consistent with K. So by the definition of DM-F_{n+1}(K, H), there is an ε ∈ K − H such that

1. ∀k: DM-F_n(K ∩ [ε|k], H(ε)), or

2. ∃k: K, [ε|k] ⊨ ¬H, and for all H′ ≠ H in H: DM-F_n(K ∩ [ε|k], H′).

Case 1: ∀k: DM-F_n(K ∩ [ε|k], H(ε)). The argument proceeds as in the proof of Proposition 6.5: a reliable method δ must eventually change its mind, say at ε|k, to H(ε). But then by the assumption of this case, DM-F_n(K ∩ [ε|k], H(ε)), so δ requires at least n+1 more mind changes by inductive hypothesis.

Case 2: ∃k: K, [ε|k] ⊨ ¬H, and for all H′ ≠ H in H: DM-F_n(K ∩ [ε|k], H′). Let k be the first time that witnesses the condition of this case. Since ε|k falsifies H given K, and δ is data-minimal, it follows from Proposition 6.1 that δ changes its mind at ε|k, say to H′ ≠ H. But then we have that DM-F_n(K ∩ [ε|k], H′), so by inductive hypothesis, δ requires at least n+1 more mind changes. Hence in either case, δ requires at least n+2 mind changes.

(⟸) Suppose that (K, H) is not a data-minimal (n+1)-feather, i.e. ¬DM-F_{n+1}(K, H). At each point ε|k for which there is some H′ in H such that (K ∩ [ε|k], H′) is not a data-minimal n-feather (i.e. ¬DM-F_n(K ∩ [ε|k], H′)), apply the inductive hypothesis to (K ∩ [ε|k], H′) and choose a method δ′_{ε|k} and a hypothesis H_{ε|k} with the properties that

1. δ′_{ε|k}(∅) = H_{ε|k};

2. δ′_{ε|k} identifies a correct hypothesis from H given K ∩ [ε|k] with at most n mind changes;

3. δ′_{ε|k} is data-minimal given K ∩ [ε|k].

If H is consistent with K ∩ [ε|k], I modify δ′_{ε|k} as follows: choose τ_{ε|k} ∈ K ∩ [ε|k] ∩ H. Set δ_{ε|k}(e) = K ∩ [e] ∩ H if e ⊂ τ_{ε|k}, and δ_{ε|k}(e) = δ′_{ε|k}(e) otherwise. Note that δ_{ε|k} is data-minimal and reliable given K ∩ [ε|k] since δ′_{ε|k} is.

Since (K, H) is not a data-minimal (n+1)-feather, H is consistent with K. Choose a data stream τ ∈ K that makes H true. Now define a data-minimal discovery method δ that reliably identifies a correct hypothesis from H given K with no more than n+1 mind changes:

1. If e ⊂ τ, δ(e) = K ∩ [e] ∩ H;

2. else if there is a time k such that

(a) 0 < k ≤ lh(e) and

(b) (K ∩ [e|k], H′) is not a data-minimal n-feather for some H′ in H,

then let k be the least such time and conjecture δ_{e|k}(e);

3. else conjecture H.

By definition δ starts with H, i.e. δ(∅) = K ∩ H. I show that δ identifies the correct hypothesis from H using no more than n+1 mind changes. Let ε ∈ K.

Case 1: Clause 1 always obtains along ε. Then ε = τ, and so δ stabilizes to the correct hypothesis H along ε (immediately).

Case 2: Clause 1 fails at some point k along ε. I consider two further cases.

Case 2a: Clause 2 is satisfied at some point along ε. Let m be the first such time. Two more subcases:

Case 2a1: m ≥ k. Then δ follows δ_{ε|m}, which identifies the correct hypothesis from H given K ∩ [ε|m] along ε. If ε = τ_{ε|m}, then δ again does not change its mind along ε at all. Otherwise δ_{ε|m} changes its mind at some time m′ ≥ m from H to follow δ′_{ε|m}, and thereafter requires at most n mind changes. Hence δ identifies the correct hypothesis along ε using at most n+1 mind changes.

Case 2a2: m < k. Then δ_{ε|m} projects H along ε until τ|k = ε|k. Thereafter δ_{ε|m} follows δ′_{ε|m}.

Case 2b: Clause 2 always fails along ε. Then by the definition of a data-minimal (n+1)-feather, H is true on ε. By construction δ stabilizes to H along ε (immediately).

To see that δ is data-minimal, note that δ takes its conjecture seriously, by inductive hypothesis, on any evidence sequence e on which Clause 2 obtains. So the only case to consider is when evidence e deviates from τ but Clause 2 does not obtain anywhere along e. This implies

1. δ(e) ⊨ H, and

2. that e is consistent with H.

The first observation holds by Clause 3. The second follows from the second clause of the definition of a data-minimal (n+1)-feather and the fact that (K ∩ [e], H′) is a data-minimal n-feather for every H′ ≠ H (because otherwise Clause 2 obtains, contrary to supposition). If Clause 2 never obtains along some data stream ε ∈ K ∩ [e], then δ projects H at e. Otherwise Clause 2 obtains eventually on all data streams extending e. In particular, Clause 2 must obtain (for the first time) on some data sequence e′ ⊇ e such that K ∩ [e′] is consistent with H. But then δ_{e′} and hence δ projects H at e′. Since δ maintains H between e and e′ (by Clause 3), δ projects H at e. So δ always takes its conjectures seriously, and thus Proposition 6.1 implies that δ is data-minimal. ∎
Chapter 7
Theory Discovery
7.1 Outline
So far I have examined the problem of finding a correct theory from a range of mutually exclusive alternatives. This model fits the situation of a scientist investigating competing theories about a particular phenomenon of interest. However, general scientific theories treat a broad class of phenomena, and typically the hypothesis advanced to account for one phenomenon does not rule out accounts of another phenomenon. For example, the goal of particle physics is to find the ultimate constituents of matter, the elementary particles, and to determine what reactions elementary particles may undergo. Knowing what elementary particles there are does not imply knowing how they react with each other, and the observation of a given reaction does not tell us whether another is possible.¹ A comprehensive theory of elementary particles answers all these questions. I refer to the task of reliably finding a theory that gives the right answers to the questions under investigation as theory discovery. This chapter develops a learning-theoretic analysis of theory discovery.

I examine two standards of success for theory discovery defined by [Kelly 1995, Ch. 12]: uniform and piecemeal theory discovery. Uniform theory discovery aims to eventually arrive at a complete theory of the domain under investigation. Formally, this paradigm reduces to the kind of discovery problems I have treated so far: take the range of alternative hypotheses to be the set comprising the complete theories for the domain of inquiry. Finding a true complete theory of the phenomena under investigation is often a demanding task, even in the limit of inquiry. Less ambitiously, we may be satisfied if theorists eventually find the right answer about each question of interest, although there may be no particular time at which they have all the right answers. [Kelly 1995] refers to

¹Unless we make assumptions about the relationships among various reactions; see Chapter 8.
this standard of success as piecemeal theory discovery. I generalize the norms of efficient inquiry from Chapter 6 by considering how theory learners perform not just with respect to their overall theories, but also with respect to each phenomenon under investigation. It turns out that minimizing the time required to converge to a complete theory suffices to minimize the time required to settle the individual hypotheses (but not necessarily vice versa). On the other hand, there is no simple relation between minimaxing the number of global theory changes and minimaxing the number of mind changes about each hypothesis.

Piecemeal reliability offers an attractive alternative to "verisimilitude" as a conception of success in science. One of Popper's main motivations for introducing the verisimilitude concept was to allow for the possibility that science may always produce false theories (in the sense of endorsing some false claim) and yet be "close to the truth". Piecemeal reliability too is a notion of success that science can attain while producing nothing but false theories (cf. Section 7.4). But piecemeal reliability is a topological notion based on convergence to the truth that does not require a dubious metric for measuring "distance" from the truth.

A common and attractive way of formulating scientific theories is to express them as a finite set of postulates, "laws of nature" or universal equations. I show that whenever it is possible to piecemeal find the truth about each hypothesis under investigation, it is possible to do so by producing, at each stage of inquiry, a finite set of axioms about the phenomena of interest. On the other hand, theorists who produce only finitely axiomatizable theories may converge to the right answers more slowly than other theorists. So the desiderata of producing finitely axiomatizable theories and minimizing convergence time (i.e., data-minimality) may conflict with each other.

The model of theory discovery developed in this chapter extends the reach of learning-theoretic analysis considerably. In Chapter 8, I apply the machinery from this chapter to analyze the two main research problems of particle physics: (1) finding out what elementary particles exist, and (2) determining how elementary particles react with each other. Another important application of the theory discovery model is a critique of common postulates for "belief revision" from a reliabilist point of view (cf. Section 3.5).
7.2 Reliable Theory Inference

In the context of theory discovery, I shall refer to scientific methods as theory learners, theory discovery methods, or simply theorists. [Kelly 1995, Ch. 12] defines two senses of reliability for theory learners. Uniform theory identification requires that the learner's conjectures must eventually always be correct and complete. Piecemeal theory identification requires that, for each hypothesis of interest, the learner's conjectures eventually always entail the truth about the hypothesis (but at no given time need the learner entail the truth about all hypotheses at once). Figure 7.1 illustrates these two notions of success in inferring theories.

Figure 7.1: Two Notions of Theory Discovery: (a) Uniform Theory Discovery (b) Piecemeal Theory Discovery

The formal definitions are as follows. Let H be a collection of empirical hypotheses, and let K be background knowledge. In what follows, I will assume that if a hypothesis H is in H, then so is its negation ¬H. An H-theory T is K-complete ⟺ for every hypothesis H in H, either K, T ⊨ H or K, T ⊨ ¬H.

Definition 7.1 A theory learner δ uniformly identifies the H-truth given background knowledge K ⟺ for every data stream ε consistent with K, there is a time n such that for all later times n′ ≥ n, δ(ε|n′) is K-complete and correct on ε.

Definition 7.2 A theory learner δ piecemeal identifies the H-truth given background knowledge K ⟺ for every data stream ε consistent with K and every hypothesis H in H, there is a time n such that for all later times n′ ≥ n: δ(ε|n′), K ⊨ H ⟺ H is correct on ε.
7.3 Uniform Theory Discovery

I characterize the conditions under which it is possible to reliably identify a complete theory. We may treat uniform theory discovery as a discovery problem by taking the alternatives under investigation to be (equivalence classes of) complete theories. Then the results from Chapter 4 yield necessary and sufficient conditions for uniform theory discovery.

Two H-theories T, T′ are K-equivalent if and only if for each H in H: T, K ⊨ H ⟺ T′, K ⊨ H. The next lemma says that the K-equivalence classes of K-complete theories form a partition: on any data stream ε in K, all K-complete theories that are true on ε are equivalent to each other.

Lemma 7.1 Let ε be a data stream in K, and let T, T′ be two K-complete H-theories true on ε. Then T ∩ K = T′ ∩ K.

Proposition 7.2 (Kevin Kelly) The H-truth is uniformly identifiable given K ⟺ there are only countably many K-equivalence classes of K-complete H-theories [T₁], [T₂], …, [Tₙ], … and each Tᵢ is decidable in the limit given K.
Since the K-complete H-theories form a partition of K, it follows by the arguments from Chapter 6 that a method δ for uniform theory discovery is data-minimal just in case δ always projects its current theory. In that case I say that δ is globally data-minimal. Similarly, the results from Chapter 6 characterize when and how uniform discovery methods can globally succeed with a bounded number of mind changes. New issues arise when we consider the performance of a theorist with respect to the individual hypotheses of interest. A theory learner δ that reliably identifies the complete H-truth settles the truth of each hypothesis H in H. Provided that H is closed under complementation, as I assume throughout, this means that δ is a reliable discovery method for H₀ = {H, ¬H}. Thus we can define the modulus and the mind changes of δ with respect to H as in Chapter 6; I denote these measures of δ's performance by mod_H(δ, ε) and MC_H(δ, ε).

I say that an H-theory learner δ is piecemeal data-minimal if δ is data-minimal with respect to each hypothesis H in H. An H-theory learner δ piecemeal minimaxes mind changes if δ minimaxes mind changes with respect to each hypothesis H in H.

Since data-minimality requires a definite opinion about every question of interest at each stage of inquiry, a theory learner δ that is data-minimal either with respect to convergence to a complete theory or piecemeal data-minimal
always produces complete theories consistent with the evidence. Moreover, if δ converges to a complete theory as quickly as possible, δ projects its current (and complete) theory T along some data stream ε (by Proposition 6.1). Since T is complete, this implies that for each hypothesis H in H, δ projects its current conjecture about H along ε. So if a method δ is globally data-minimal, δ is piecemeal data-minimal.

Figure 7.2: Method δ projects each of Hr and H2 along some data stream, but not both along the same data stream.
Fact 7.3 Let H be a collection of hypotheses for investigation with background knowledge K. Suppose that a method δ for finding a correct and complete H-theory is globally data-minimal. Then δ is piecemeal data-minimal.

The converse may fail. A theory learner that is piecemeal data-minimal must project each hypothesis entailed by its current theory along some data stream, but need not project all of them along the same data stream. For example, suppose that H2 asserts that at most two particles will be found in the lab, and that Hr claims that some particle will decay into another (see Figure 7.2).

Suppose that before any evidence is gathered, δ conjectures both H2 and Hr. If the first experiment shows one or more particles, δ conjectures that H2 is false but that Hr is true. Then if the second experiment shows one particle decaying into another, δ becomes certain that Hr is true. If the first experiment does not turn up any particles, δ conjectures that H2 is true but that Hr is false. Then if the second experiment discovers no more than two particles, δ conjectures H2 and continues to do so unless more than two particles are observed. So δ projects each of H2 and Hr along some data stream, but there is no data stream along which δ projects its initial theory H2 ∩ Hr. Thus δ is piecemeal but not globally data-minimal.
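The conjecture pattern just described can be transcribed into a short sketch. This is my own illustration, not the author's formalism: the evidence is simplified to particle counts per experiment, and all names are assumptions. It records only which hypotheses the method's theory entails at each stage, showing that the joint initial theory is never maintained past stage 0 on any stream.

```python
def delta(evidence):
    """Transcription of the example method's conjectures about H2 ('at most
    two particles will be found') and Hr ('some particle will decay')."""
    if not evidence:
        return {"H2", "Hr"}              # initial theory conjectures both
    if evidence[0] >= 1:                 # first experiment finds particles
        return {"Hr"}                    # H2 dropped, Hr projected from now on
    return {"H2"}                        # no particles: H2 projected, Hr dropped

for stream in [(1, 1), (0, 0)]:          # particle counts in two experiments
    history = [delta(stream[:t]) for t in range(len(stream) + 1)]
    assert history[0] == {"H2", "Hr"}
    assert all(len(c) == 1 for c in history[1:])   # joint theory never maintained
```

On the first stream Hr is projected from stage 1 onward, on the second H2 is; each conjecture is projected along some stream, yet no stream keeps the initial theory H2 ∩ Hr, which is why δ is piecemeal but not globally data-minimal.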
Methods for discovering complete theories may minimize the number of theory
changes even though they cannot minimax retractions with respect to individual
hypotheses. Intuitively, the reason is that the right answer to one of the
questions under investigation may be a bold generalization that implies another
bold generalization about a second question. The theorist may then have to take back
the generalization about the second question later, changing his mind about that
question more often than was necessary. In fact, Goodman's Riddle of Induction
has this structure (see Figure 6.4). Each hypothesis Hgrue(k) is decidable with
certainty, and hence with 0 mind changes (Theorem 6.7), by waiting until the
critical time k. But if all observed emeralds are green, eventually any reliable
method δ has to project that all future emeralds are green, too. If this occurs
at time k, δ conjectures that Hgrue(k+1) is false. Then if the (k+1)-th emerald
is blue, proving that Hgrue(k+1) is the correct generalization about the color of
emeralds, δ has to change its mind about Hgrue(k+1). Hence δ succeeds with a
bounded number of overall theory changes (namely 1), but does not minimax
retractions with respect to the individual Hgrue(k) hypotheses.
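The bookkeeping in this example can be checked with a small simulation. The sketch below is a toy encoding, not part of the text's formal apparatus: an emerald stream is given as a finite prefix of colors, and the natural projection rule conjectures "all emeralds are green" until a blue emerald appears, after which it is certain of the critical time.

```python
def projection_rule(evidence):
    """Natural projection: conjecture the time of the first blue emerald
    seen so far, or None for 'all emeralds are green'."""
    for t, color in enumerate(evidence):
        if color == "blue":
            return t                     # conjecture H_grue(t)
    return None                          # conjecture "all green"

def conjecture_stream(prefix):
    """The rule's successive conjectures on ever longer initial segments."""
    return [projection_rule(prefix[:n]) for n in range(len(prefix) + 1)]

def count_changes(seq):
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

prefix = ["green"] * 5 + ["blue"] * 3    # H_grue(5) is correct on this stream
conjs = conjecture_stream(prefix)

assert count_changes(conjs) == 1                    # one overall theory change
assert count_changes([c == 5 for c in conjs]) == 1  # but one retraction on H_grue(5)
```

On this stream the rule changes its overall theory exactly once, yet it retracts its verdict on the single hypothesis Hgrue(5), a retraction that a method waiting until time 5 never needs to make.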
Conversely, a theory learner may piecemeal minimax retractions and at the
same time change its overall theory more often than is necessary. The reason
is that a theorist may schedule mind changes about the individual hypotheses
badly. For example, take H1 to be the hypothesis that there is a white raven,
and let H=1 be the hypothesis that there is exactly one white raven (see Figure
7.3).
It is possible to assess H1 with at most one mind change; H=1 requires at
least two, but not more. The following method δ finds the complete truth about
H1 and H=1 with at most two mind changes: conjecture that there is no white
raven until one is found (that is, hypothesize that both H1 and H=1 are false);
then conjecture that there is exactly one until a second white raven turns up.
Now consider another method δ′ that also begins by conjecturing that there is
no white raven until one is discovered. When the first white raven appears, δ′
concludes that H1 is true, as δ does; but δ′ continues to think that H=1 is false
(δ′ may believe that if there is one white raven, there must be another). Now
if no second white raven appears, δ′ must eventually change its theory, for the
second time, and guess H=1, claiming that there is exactly one white raven. But
just at that point, the second white raven may appear, leading δ′ to change its
theory for the third time. Since there is another reliable method that changes
its theory at most twice, namely δ, this means that δ′ does not minimax theory
changes.

Figure 7.3: Method δ′ changes its overall theory three times, its conjecture
about H=1 twice, and its conjecture about H1 once.

On the other hand, δ′ changes its mind about H1 and H=1 respectively
no more often than is necessary: at most once about H1, and at most twice
about H=1.²
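The retraction counts attributed to the two methods can be verified mechanically. The sketch below is a hypothetical encoding: each stage reports whether a new white raven was observed, and the parameter PATIENCE stands in for the unspecified point at which the second method gives up expecting a second raven.

```python
PATIENCE = 3   # hypothetical: how long δ′ waits for a second white raven

def delta(n_white, _):
    """Method δ: 'no white raven', then 'exactly one', then 'at least two'."""
    if n_white == 0:
        return (False, False)            # verdicts on (a white raven, exactly one)
    return (True, n_white == 1)

def delta_prime(n_white, stages_with_one):
    """Method δ′: keeps denying 'exactly one' after the first white raven,
    conceding it only after PATIENCE stages without a second."""
    if n_white == 0:
        return (False, False)
    if n_white >= 2:
        return (True, False)
    return (True, stages_with_one > PATIENCE)

def run(method, stream):
    conjs, n_white, stages_with_one = [], 0, 0
    for obs in stream:                   # obs = 1 iff a new white raven is seen
        n_white += obs
        if n_white == 1:
            stages_with_one += 1
        conjs.append(method(n_white, stages_with_one))
    return conjs

def changes(seq):
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

# First white raven at stage 3, a second one only at stage 10:
stream = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]

assert changes(run(delta, stream)) == 2          # δ: two theory changes
assert changes(run(delta_prime, stream)) == 3    # δ′: three theory changes

conjs = run(delta_prime, stream)
assert changes([c[0] for c in conjs]) == 1       # about 'there is a white raven': once
assert changes([c[1] for c in conjs]) == 2       # about 'exactly one': twice
```

The per-hypothesis counts in the last two assertions match the figure's caption: the second method is piecemeal optimal even though it changes its overall theory three times.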
Although globally minimaxing retractions is in general independent of piecemeal
minimaxing retractions, it seems natural to require that a method should
avoid changing its overall theory, as well as its conjectures about particular
hypotheses, as much as possible. Chapter 8 uses this idea to define optimal
inference in particle physics.
7.4 Piecemeal Theory Discovery
Recall that a method δ piecemeal identifies the H-truth just in case for each
hypothesis H in H, δ eventually converges to the correct opinion about H on
every possible data stream (Definition 7.2). Clearly, uniform success in theory
discovery entails piecemeal success. Also, there is no guarantee that the
successive theories produced by a piecemeal discovery method get ever more
"verisimilar". A piecemeal method is permitted to add as much new falsehood
as it pleases at each stage, so long as more and more hypotheses have their truth
values correctly settled.
A few examples may clarify the difference between the two concepts of success.
Let H0 be the set of all evidence propositions [e] and their complements.
It is trivial to find the complete H0-truth in a piecemeal fashion, simply by
repeating the data as it is received. But no method can do so uniformly: for
each data stream ε, {ε} is the complete H0-truth, and there are uncountably
many such singleton theories, whereas the range of each discovery method
is countable, so most such theories cannot even be conjectured.
Even when there are only countably many distinct H-complete theories,
piecemeal success may be possible when uniform success is not. Let the hypothesis
Hn say that all swans after and including the n-th one are white. Let H1
be the set of all such hypotheses and their negations. H1 requires piecemeal
solutions to perform nontrivial inductive inferences, since each hypothesis Hn is
a claim about the unbounded future. This time, there are only countably many
distinct H1-complete theories.³ To succeed piecemeal, let method δ conjecture
Hn+1 − Hn ("the last non-white swan is the n-th one") whenever the last
non-white swan in the data so far occurs at position n.

²This example raises another issue for evaluating the mind changes of a theory
discovery method. Some of δ′'s overall theory changes seem more significant than
others (in particular, the first one does). It would be good to have a criterion
that weights mind changes by their significance, for example by their respective
losses of content. This seems to require a measure of content (as in [Levi 1967]).

³Such a theory either says exactly when the last non-white swan appears or says
that there are infinitely many non-white swans.

Nonetheless, uniform success is still impossible. For suppose for reductio
that some method δ succeeds uniformly. Then a wily demon can present one
white swan after another until δ's conjecture entails H0 ("all swans are white"),
which it must eventually do on that data stream. Then a non-white swan is
presented (say, at stage m), followed by only white ones until δ's conjecture
entails Hm+1, which it must eventually do on that data stream; and so on. The
data stream ε so presented features infinitely many non-white swans, so δ produces
infinitely many conjectures inconsistent with the complete H1-theory of ε.
The same argument shows that no piecemeal solution could possibly succeed
by eventually producing only true hypotheses, since the construction forces an
arbitrary piecemeal solution to produce infinitely many false conjectures. Hence,
piecemeal success is sometimes possible only if infinitely many false conjectures
are produced.
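A minimal sketch of the piecemeal swan learner just described, under the toy assumption that only a finite prefix of the stream is inspected and that the truth of each Hk is evaluated on that prefix's tail:

```python
def learner(prefix):
    """Conjecture the position of the first all-white tail: n+1 if the last
    non-white swan seen so far sits at position n, else 0 ('all swans white')."""
    nonwhite = [i for i, c in enumerate(prefix) if c != "white"]
    return nonwhite[-1] + 1 if nonwhite else 0

def opinion(prefix, k):
    """The learner's verdict on H_k: all swans from the k-th on are white."""
    return learner(prefix) <= k

# Non-white swans at positions 2 and 5 only:
stream = ["white"] * 21
stream[2] = stream[5] = "black"

for k in range(10):
    truth_of_Hk = all(c == "white" for c in stream[k:])  # truth on this tail
    assert opinion(stream, k) == truth_of_Hk
```

On any stream with finitely many non-white swans the learner's verdict on each Hk stabilizes on the truth, while the demon's stream, with infinitely many non-white swans, forces endless revision, as the argument above shows.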
When H is closed under complementation, a method δ that piecemeal identifies
the H-truth decides each hypothesis H in H in the limit. Hence if the H-truth is
piecemeal identifiable, it must be the case that each H ∈ H is decidable in the
limit (cf. Section 4.4). But this condition is not sufficient in general. A simple
counterexample is obtained by taking the hypotheses of interest to be the singleton
theories consisting of exactly one data stream, together with their complements;
that is, H ∈ H2 just in case H = {ε} or H is the complement of {ε}, for some
data stream ε. Each hypothesis in H2 is verifiable or refutable with certainty,
but it is impossible to piecemeal discover the H2-truth without countable background
knowledge. For the only consistent H2-theory that entails a singleton
hypothesis consists of the hypothesis itself. But if there are uncountably many
data streams consistent with the learner's background assumptions, then most
of these cannot be conjectured, since the learner can only make countably many
conjectures. If {ε} is one of the theories never conjectured by the learner, the
learner will fail on data stream ε.
An important special case arises when there are only countably many
hypotheses under investigation. For example, the hypotheses of interest may be
defined in a countable language. Proposition 7.4 below (established by [Kelly 1995,
Prop. 12.20]) asserts that if H is countable and each hypothesis in H is decidable
in the limit, then the H-truth can be identified piecemeal. So if piecemeal
discovery of the H-truth requires that H be countable, the additional condition that
each hypothesis in H be decidable in the limit would be necessary and sufficient
for identifying the H-truth piecemeal. I leave open the question of whether it
is possible to piecemeal identify the H-truth for uncountable collections H of
empirical hypotheses.
Proposition 7.4 (Kevin Kelly) Suppose that H is countable. Then the H-truth is piecemeal identifiable given K ⟺ each H ∈ H is decidable in the limit given K.
As with uniform theory discovery, I say that a method δ that piecemeal identifies
the H-truth is (piecemeal) data-minimal if and only if δ is data-minimal
with respect to each hypothesis H in H. As we saw in Section 7.3, a data-minimal
theory learner δ always conjectures a complete theory. The following
Proposition 7.5 says that for any piecemeal solvable problem, there is a data-minimal
solution δ. In fact, the proof of Proposition 7.5 shows more: we can
construct a piecemeal data-minimal reliable method that projects its
entire current theory along some data stream at each stage.
Proposition 7.5 Suppose that the H-truth is piecemeal identifiable given K. Then there is a data-minimal H-theory learner δ that piecemeal identifies the H-truth given K.
As with uniform theory learners, I say that a piecemeal theory learner δ
piecemeal minimaxes retractions if δ minimaxes retractions with respect to each
hypothesis under investigation. As we saw in Section 7.3, the analog of Proposition
7.5 fails for piecemeal minimaxing retractions: the investigation of a
class of phenomena may force a theorist to change his mind about individual
hypotheses more often than is necessary.
7.5 Countable Hypothesis Languages and Finite Axiomatizability
A natural and attractive way of presenting scientific theories is to express them
as a set of postulates, for example "laws of nature" or universal equations.
So we may desire methods for theory inference that express their theories with
finitely many postulates.

I say that an empirical theory T is finitely axiomatizable in a language
H given background knowledge K if T is the intersection of a finite number of
empirical propositions from H with K. According to this definition, a theorist's
conjecture may be finitely axiomatizable even when her background theory is
not. For example, if the theorist postulates Kepler's laws, arithmetic may well be
a part of her background assumptions (with no finite axiomatization), whereas
there are only three of Kepler's laws.
The next proposition shows that if the collection H of hypotheses under
investigation is countable, and it is possible to piecemeal identify the H-truth,
then it is possible to do so with finitely axiomatizable theories only.
Proposition 7.6 (Kevin Kelly) Suppose that H is countable, and that the H-truth is piecemeal identifiable given K. Then there is an H-theory learner δ that identifies the H-truth and always produces theories that are finitely axiomatizable in H.
By Proposition 7.5, we can have a data-minimal method for piecemeal identifying
the H-truth, if that is possible at all; and by Proposition 7.6, we can
have a method that produces finitely axiomatizable theories only. But we cannot
always have a method that does both. Suppose, for example, that we looked
upon mathematics as a "physics of abstract objects" [Coffa 1991], in which
mathematicians adopt proof systems that "save the mathematical phenomena",
namely the set of mathematical propositions accepted by the mathematical
community at a given time. Let's consider arithmetic, the theory of the natural
numbers. To keep the example simple, I shall take the "phenomena" (the evidence
at time t) to consist of a finite list of axioms sound and complete for the set of
arithmetical propositions accepted by the mathematical community at time t.
The hypotheses of interest are all sentences of arithmetic; hs is correct just in
case sentence s is eventually always part of the accepted arithmetic of the time.
More formally, the hypothesis hs is correct on a sequence A1, A2, ... of axiom
systems just in case there is a time n such that for all later times n′ ≥ n, s is a
theorem of An′. As stated, each hypothesis hs is only verifiable, but not decidable,
in the limit. But we may add the background assumption K that once a
sentence s of arithmetic is accepted by the community (that is, once s is a theorem
of an axiom system An), the community never retracts s (that is, s
remains a theorem of An+1, An+2, ...). Then each hypothesis hs is equivalent to
a simple existential claim ("there is a time at which s is accepted"), so by Proposition
7.4, there is a reliable method for piecemeal identifying which sentences of
arithmetic the mathematical community will accept; and by Proposition 7.5,
there is a data-minimal method for doing so. Also, there are only countably
many sentences of arithmetic, so by Proposition 7.6, there is a piecemeal theory
learner for this problem who conjectures finitely axiomatizable theories only.
These propositions still apply if we add as further background knowledge K′
the optimistic belief that in the limit of mathematical inquiry, mathematicians
find all (and only) the truths of arithmetic. So hs is correct on any sequence ε of
arithmetics consistent with K′ just in case s is a true sentence of arithmetic.
Now a data-minimal method for piecemeal identifying the truth about each
hypothesis hs given background knowledge K′ has to produce a complete theory
consistent with K′, in the language of arithmetic. But it follows from Gödel's
first incompleteness theorem that no such complete theory is finitely (or even
recursively) axiomatizable.
7.6 Proofs
Lemma 7.1 Let ε be a data stream in K, and let T, T′ be two K-complete H-theories true on ε. Then T ∩ K = T′ ∩ K.
Proof. If both T and T′ are inconsistent with K, the claim is immediate.
Suppose for reductio that there is some data stream τ on which T ∩ K is correct
but T′ ∩ K is not. Let HT = {H ∈ H : T ⊨ H}, and define HT′ similarly. Since
an H-theory is the intersection of hypotheses in H, we have that T = ∩HT
and T′ = ∩HT′. As τ ∉ T′, there must be a hypothesis H ∈ HT′ such that H
is not correct on τ. Since T′ ⊆ H by definition of HT′, we have that T′, K ⊨ H,
and so T, K ⊨ H because T and T′ are K-equivalent and K-complete. So
T ∩ K ⊆ H, contrary to the assumption that τ ∈ (T ∩ K) − H. ∎
Proposition 7.2 (Kevin Kelly) The H-truth is uniformly identifiable given K ⟺ there are only countably many K-equivalence classes of K-complete H-theories [T1], [T2], ..., [Tn], ..., and each Ti is decidable in the limit given K.
Proof. Without loss of generality, consider only theory learners that produce
K-complete theories. We may think of such a theory learner as conjecturing
the equivalence class of its K-complete theory. The proposition then follows
immediately from Proposition 4.2 and Lemma 7.1. ∎
Proposition 7.5 Suppose that the H-truth is piecemeal identifiable given K. Then there is a data-minimal H-theory learner δ that piecemeal identifies the H-truth given K.
Proof: Choose an H-theory learner δ whose conjectures are always K-consistent
and that piecemeal identifies the H-truth given K. I associate a
data stream Stream(e) ∈ K with each finite data sequence e consistent with
K, in the following way. Let e·x denote the finite data sequence in which e is
followed by x ∈ E.

1. Stream(∅) = some ε ∈ K such that ε ∈ δ(∅).

2. Stream(e·x) =

(a) Stream(e), if e·x is an initial segment of Stream(e);

(b) else some ε ∈ K such that ε ∈ δ(e·x).

Now define

δ′(e) = {Stream(e)}.

I verify that δ′ piecemeal identifies the H-truth given K. Let H ∈ H and ε ∈ K
be given. Let m = mod(δ, H, ε). Suppose that H is correct on ε. Then for all
m′ ≥ m, δ(ε|m′), K ⊨ H. So for all m′ ≥ m, Stream(ε|m′) ∈ H, and hence
δ′(ε|m′), K ⊨ H. Similarly when H is not correct on ε. To see that δ′ is data-minimal,
note that δ′ always projects its current theory δ′(e) along Stream(e).
By the argument for Fact 7.3, δ′ is piecemeal data-minimal. ∎
Proposition 7.6 Suppose that H is countable, and that the H-truth is piecemeal identifiable given K. Then there is an H-theory learner δ that identifies the H-truth and always produces theories that are finitely axiomatizable in H.
Proof: Let H and K be as described, and let H1, H2, ..., Hn, ... be an enumeration
of H. Let δ be a consistent H-theory learner that piecemeal identifies
the H-truth given K. Let Pos(e) = {Hj ∈ H : δ(e), K ⊨ Hj and j ≤ lh(e)}.
Define

δ′(e) = ∩ Pos(e).

Clearly δ′'s theories are finite intersections of propositions in H and hence finitely
axiomatizable. To see that δ′ piecemeal identifies the H-truth, let ε ∈ K and
Hi ∈ H be given. Let m(i) = mod_Hi(δ, ε), and let m′ = max{m(i), i}. Then
mod_Hi(δ′, ε) ≤ m′. ∎
This argument, together with the observation that if δ piecemeal identifies the
H-truth given K, then δ decides each H ∈ H in the limit given K, establishes
Proposition 7.4.
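The Pos(e) construction in this proof is easy to run on a toy problem. In the sketch below (all names are illustrative, not from the text), worlds are 3-bit tuples, the enumerated hypotheses each fix one bit, the underlying learner conjectures the set of worlds consistent with the data, and the derived learner intersects just those entailed hypotheses whose index is at most the length of the evidence:

```python
from itertools import product
from functools import reduce

K = set(product([0, 1], repeat=3))          # background knowledge: all 3-bit worlds

# Enumeration H_1, H_2, ... of the hypotheses: "bit j has value v".
HYPS = [{w for w in K if w[j] == v} for j in range(3) for v in (1, 0)]

def delta(e):
    """Underlying learner: all worlds consistent with the data so far
    (stage i reveals bit i mod 3 of the actual world)."""
    return {w for w in K if all(w[i % 3] == b for i, b in enumerate(e))}

def delta_prime(e):
    """Pos(e) = {H_j : delta(e) entails H_j and j <= lh(e)}; conjecture its
    intersection (the empty intersection is read as K)."""
    pos = [H for j, H in enumerate(HYPS, start=1)
           if j <= len(e) and delta(e) <= H]
    return reduce(set.__and__, pos, set(K))

world = (1, 0, 1)
stream = [world[i % 3] for i in range(8)]

final = delta_prime(stream)                 # a finite intersection of hypotheses
for H in HYPS:                              # ... that settles every H_j correctly
    assert (final <= H) == (world in H)
```

Each conjecture of delta_prime is by construction a finite intersection of enumerated hypotheses with K, and the delay in admitting H_j (waiting until lh(e) ≥ j) is exactly why the modulus of convergence grows to max{m(i), i} in the proof.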
Chapter 8
Reliable Theory Discovery in Particle Physics
8.1 Outline
One of the aims of the philosophy of science is to analyze the methodology of
problems that arise in scientific practice. In this chapter I study the two basic
inference problems of particle physics: to find out what particles exist, and to
determine what reactions are possible among them. For each of these questions,
I ask first under what assumptions it is possible to settle reliably on the right
answer. Second, I examine how to search for the right answer efficiently, in the
sense of avoiding retractions and minimizing convergence time from Chapter 6.
Even with generous assumptions about the power of experimental apparatus
for detecting elementary particles, there is no reliable method for determining
whether there are finitely or infinitely many elementary particles, even in the
limit of inquiry. But if we add the common assumption that there are only
finitely many elementary particles, it becomes a straightforward matter to identify
them. Indeed, there is a unique data-minimal inference rule that minimaxes
retractions: posit that the only particles that exist are those observed so far.
Thus we find, as in Section 6.4, that the same efficiency considerations that
yield the natural projection rule in Goodman's Riddle of Induction underwrite
inferences in the spirit of Occam's razor in particle research.
Without further background assumptions, it is a difficult empirical problem
to arrive at a complete true theory of how elementary particles react with each
other. There is no reliable method for accomplishing this, even if we assume
that there are only finitely many elementary particles and that we have discovered
them all. This situation changes if we add a prominent feature of particle
physics: the idea that a complete theory of elementary particle reactions should
be a conservation theory. A conservation theory introduces certain quantities,
assigns each particle a value of those quantities, and posits that a reaction is
possible just in case it conserves all of them. The belief that some conservation
theory can describe the physically possible particle reactions is a material
assumption about nature; I call it the conservation principle. Under the conservation
principle, finding a complete true theory of reactions among observable
particles becomes a relatively easy problem: for a given finite set of particles,
we can identify the possible reactions among them with a bounded number of
mind changes (cf. Section 6.4). And again, avoiding retractions requires that
we take the possible reactions to be those observed so far.
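To preview the idea in code (a sketch only; the chapter develops conservation theories formally later), a conservation theory is a table of quantity values per particle, and a reaction is deemed possible just in case every quantity has the same total on both sides. Charge and baryon number are used here as illustrative quantities with standard textbook values; they are not assignments made in the text.

```python
from collections import Counter

# Illustrative quantum properties: value of each quantity for each particle.
QUANTITIES = {
    "charge": {"p": 1, "n": 0, "pi+": 1, "pi0": 0, "e-": -1, "nu_e": 0},
    "baryon": {"p": 1, "n": 1, "pi+": 0, "pi0": 0, "e-": 0, "nu_e": 0},
}

def conserves(reagents, products, values):
    """A quantity is conserved iff its total is equal on both sides."""
    total = lambda side: sum(values[q] * k for q, k in Counter(side).items())
    return total(reagents) == total(products)

def possible(reagents, products):
    """The conservation theory permits a reaction iff it conserves everything."""
    return all(conserves(reagents, products, v) for v in QUANTITIES.values())

assert possible(["p", "p"], ["p", "n", "pi+"])   # observed reaction: permitted
assert not possible(["p"], ["pi+", "pi0"])       # ruled out: violates baryon number
```

The second assertion illustrates how a conservation theory rules out an unobserved reaction: the proton cannot decay into pions without violating baryon number.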
The task of finding a complete correct theory of particle reactions becomes
more complex if we allow that reactions may involve virtual, or hidden, particles.
Under the conservation principle, together with the assumption that there are
only finitely many particles (of the observable and the unobservable kind), we
still have a reliable solution to this problem. But we cannot guarantee a bound
on how often a theorist may have to change her mind to arrive at a correct
theory of particle reactions.
In the presence of virtual particles, the conservation principle interacts with
efficiency considerations in surprising ways: it turns out that conservation theories
must sometimes introduce hidden particles to rule out unobserved reactions.
I describe an algorithm for doing so. This algorithm is a reliable procedure for
identifying a complete true theory of the reactions that are possible among a
finite set of particles; it is efficient with respect to (1) minimizing convergence
time (data-minimality), (2) avoiding global theory changes, and (3) avoiding
changes in its hypotheses about individual reactions. This analysis gives a means-ends
interpretation of the role of conservation theories and hidden particles in
particle physics: they are important moves in the game of inquiry that serve
reliability and efficiency.
Both parsimony and conservatism (avoiding retractions) have been advanced
as important principles of scientific inference. In the problem of inferring conservation
theories, there is a tension between the two. My results show that
when observed reactions refute the current conservation theory, the theorist
typically has a choice between assigning new and fewer quantum properties,
on the one hand, and extending the current ones by introducing virtual particles
(such as the muon and electron neutrino), on the other. Historically, physicists have
not changed quantum properties once introduced. Whether this is a choice for
conservatism and against parsimony is an interesting question. In view of the
mass of data about particles and particle reactions, investigating this question
would require a computer program that, for a given database of particles and
reactions, finds conservation theories that introduce as few quantum properties
and virtual particles as the data permit. Designing such a program and running
it on the currently available evidence would be an interesting project. Is
it possible to simplify particle physics?
These considerations raise the question of how many quantum properties
particle physics might need. It turns out that there is a surprising connection
between the number of stable particles (which is small compared to the number
of unstable particles and reactions) and the complexity of conservation theories:
I show that under (a version of) conservation of energy, there cannot be more
(linearly independent) quantum properties than there are stable particles.
8.2 Elementary Particles and Reactions
A particle is an object that obeys the rules of quantum mechanics for a point
with well-defined mass and charge [Omnes 1971, Ch. 1.1]. Physicists refer to
particles that are neither atoms nor nuclei as 'elementary' particles.¹ The goal
of particle physics is to determine what elementary particles there are, and to
find out how they react with each other. Table 8.1 shows some well-known
elementary particles.²

Most elementary particles are unstable and decay quickly. For example,
the decay of the pion into a muon and a neutrino takes on average 2.5 × 10⁻⁸
seconds. The standard notation for this decay is π⁺ → μ⁺ + ν_μ. Two colliding
particles can react with each other to form new particles. For example, two
protons may react to produce a proton, a neutron, and a pion. The notation for
this reaction is p + p → p + n + π⁺. It appears that three-way collisions are too
unlikely to be of interest to physicists.
I introduce a number of mathematical objects to represent particles and
reactions. Let P be the set of logically possible types of particles. (In what
follows, I follow physicists in not always distinguishing between types and tokens
of particles, and will for example speak of the 'logically possible particles' rather
than the 'logically possible types of particles'.) Since particles are discrete
entities, I take P to be the set of natural numbers ℕ. As is standard practice,
I will denote particles by Roman or Greek letters, sometimes with indices (e.g.,
p1). The reagents in a reaction are represented by a function a : P → ℕ; the
function a indicates how many instances of a given particle type are among
the reagents. For example, in the reaction p + p → p + n + π⁺, we have that
a(p) = 2, and a(q) = 0 for all logically possible particles q other than the proton.
Similarly, the products in a reaction are represented by a function p : P → ℕ.
In the reaction mentioned, we have that p(p) = 1, p(n) = 1, p(π⁺) = 1, and
p(q) = 0 for all other logically possible particles. A reaction r is a pair (a, p).
For a given reaction r = (a, p), I denote the reagents in r by agents(r), and
define this set to be agents(r) = {p ∈ P : a(p) > 0}. A reaction r = (a, p) is a
decay if only one particle occurs among the reagents, that is, if agents(r) = {p}
and a(p) = 1. A reaction r = (a, p) is a collision if exactly two particles occur
among the reagents, that is, if Σ_{p ∈ agents(r)} a(p) = 2. The products in r are
denoted by products(r), and defined as products(r) = {p ∈ P : p(p) > 0}.
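These definitions translate directly into code. The sketch below represents the functions a and p as multisets (Python Counters) and names particle types by string rather than by natural number, which changes nothing essential:

```python
from collections import Counter

# A reaction r = (a, p): multisets of reagent and product particle types.
def make_reaction(reagents, products):
    return (Counter(reagents), Counter(products))

def agents(r):    return {q for q in r[0]}          # reagent types, a(q) > 0
def prods(r):     return {q for q in r[1]}          # product types, p(q) > 0
def particles(r): return agents(r) | prods(r)       # all types involved in r

def is_decay(r):
    return sum(r[0].values()) == 1                  # a single reagent

def is_collision(r):
    return sum(r[0].values()) == 2                  # exactly two reagents

# pi+ -> mu+ + nu_mu : a decay.
pion_decay = make_reaction(["pi+"], ["mu+", "nu_mu"])
# p + p -> p + n + pi+ : a collision with a(p) = 2 and p(p) = 1.
pp = make_reaction(["p", "p"], ["p", "n", "pi+"])

assert is_decay(pion_decay) and not is_collision(pion_decay)
assert is_collision(pp) and pp[0]["p"] == 2 and pp[1]["p"] == 1
assert particles(pp) == {"p", "n", "pi+"}
```

The Counter values play the role of a(q) and p(q), so the decay and collision conditions are just the stated sums over agents(r).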
¹Except for the proton, which is considered an elementary particle although it is the nucleus of the hydrogen atom.
²The table is from [Cooper 1992, p. 455].
Symbol  Name             Mass in MeV  Charge
Ξ⁻      xi minus         1319         −e
Ξ̄⁺      antixi plus      1319         +e
Ξ⁰      xi zero          ≈ 1311       0
Ξ̄⁰      antixi zero      ≈ 1311       0
Σ⁻      sigma minus      1196         −e
Σ̄⁺      antisigma plus   1196         +e
Σ⁰      sigma zero       1192         0
Σ̄⁰      antisigma zero   1192         0
Σ⁺      sigma plus       1190         +e
Σ̄⁻      antisigma minus  1190         −e
Λ⁰      lambda           1115         0
Λ̄⁰      antilambda       1115         0
n       neutron          940          0
n̄       antineutron      940          0
p       proton           938          +e
p̄       antiproton       938          −e
K⁰      K zero           498          0
K̄⁰      anti-K zero      498          0
K⁺      K plus           494          +e
K⁻      K minus          494          −e
π⁺      pion plus        140          +e
π⁻      pion minus       140          −e
π⁰      pion zero        135          0
γ       photon           0            0
μ⁻      muon minus       106          −e
μ⁺      muon plus        106          +e
e⁻      electron         0.511        −e
e⁺      positron         0.511        +e
ν_e     e neutrino       0            0
ν̄_e     e antineutrino   0            0
ν_μ     μ neutrino       0            0
ν̄_μ     μ antineutrino   0            0

Table 8.1: Some Elementary Particles
Figure 8.1: A Particle World and the Particles in it.
The particles involved in r are the reagents and the products in r; formally
particles(r) = agents(r)[products(r). LetR denote the set of logically possible
reactions. For a given set P of particles, R(P ) denotes the set of possible
reactions involving particles in P ; formally, R(P ) = fr 2 R : particles(r) � Pg.
A possible particle world w is a set of reactions, that is a subset of R. The
reactions in w are called the physically possible reactions in w. The particles
in the particle world w are the particles involved in the physically possible
reactions in w; that is, particles(w) =Sfparticles(r) : r 2 wg|see Figure 8.1.3
I denote the set of possible particle worlds by W . A proposition about
particles is a set of particle worlds. A particle theory is a proposition about
particles. Thus my particle theories only give information about what reactions
are possible and what particles exist, but not about the order in which reactions
occur. Nor do they make claims about reaction times, the momenta of the
reagents and the products, or other quantities associated with a reaction. I
leave the task of expanding the model of particle theories for future work; we
shall see that the problem of determining what reactions are possible is rich
enough by itself.
8.3 Evidence in Particle Physics
Particle physicists gather information about the actual world of particles through
experiments and observations. As the experimental practice of particle physics
proceeds, the laboratories report more and more reactions. Let us imagine
that the experimental community issues a sequence l1, l2, ..., ln of successive
reports of the reactions that the laboratories of particle physicists have observed.
What can we assume about the relationship between the phenomena that the
labs report and the way particles actually are? (In Kant's terminology, this
is a question about the relationship between the world of experience and the
particle world-in-itself.)

³Thus I implicitly assume that each particle in a particle world takes part in at least one reaction.

An optimist would believe that whatever the lab reports actually occurred; in that case, let us say that the experimental practice
is sound. He might also have faith that experimentalists will eventually turn
up all the particles that there are, if not immediately, then at least eventually
as inquiry continues indefinitely. In that case I say that experimental practice
is complete in the limit. It's not easy to tell whether physicists believe that
their experimental practice will eventually turn up all particles that there are,
but in any case they don't seem too worried about the possibility that a particle
might forever escape detection by their instruments.⁴

⁴An empiricist might interpret this as indicating that particle physicists are content with empirically adequate theories; cf. Section 2.2 and [Van Fraassen 1980].

It is naive to think that experimental practice is sound and always gets it right.
The record shows some prominent examples in which physicists reported experimental
results that they later rejected as spurious. This fact raises some interesting
questions about the epistemology of experimentation. One approach would be to
ask under what circumstances we are "justified" in accepting experimental reports
as true, or when laboratory observations "confirm" hypotheses about particles and
their behavior. ([Franklin 1990] pursues this project in detail.) A reliabilist would
instead take a long-run perspective and ask in what circumstances experimental
practice can eventually settle on a correct account of the empirical phenomena.
For example, Karl Popper proposed to view experimental reports as universal
hypotheses of the form "all future experiments will replicate this phenomenon"
[Popper 1968]; see also Section 2.4. On this view, accepting a lab report until
it fails to be replicated will converge to the correct opinion about experimental
effects (although we may not be certain that a phenomenon is genuine). It may
be too optimistic to suppose that we can always trust that failures to replicate
a phenomenon prove the phenomenon to be spurious; all that we may want to
demand of experimental inquiry is that eventually, after some finite number of
replications and failures of replication, it should yield the correct opinion about
whether or not a laboratory observation reflects the actual particle world accurately.
In that case, I say that experimental practice is sound in the limit. An
interesting reliabilist project is to examine what experimental designs are sound
and complete in the limit relative to given background assumptions about the
domain under study (perhaps using the tools from [Kelly 1995, Ch. 14]).

In this thesis, however, my main interest is in what inferences best serve the ends of
inquiry. I will therefore focus on the task of inferring particle theories from given
evidence, and leave aside questions of how to design experiments for gathering
this evidence; we shall see that even with a naive, idealized view of the evidence
available to theorists, theorizing about particles is a complex and subtle inductive
task.

The simplest possible interpretation of the evidence produced by the
lab is to take it at face value. So I assume that if a lab produces a report of
reactions l, then each reaction r listed in l actually occurred. Without loss of
generality, we may then take sequences of reaction reports l1, l2, ..., ln, ... to be
monotone, in the sense that the reactions from the n-th report are included in
the (n+1)-th report. This means that we may view our evidence simply as a
sequence of reactions. Thus a possible data stream ε is simply an infinite
sequence of reactions (i.e., a member of R^ω). I assume that experimental practice
is sound, and I also assume that experimental practice is complete in the
limit. This means that a data stream ε may be generated in a particle
world w just in case w = range(ε). Figure 8.2 shows a particle world and
one of the data streams that may be generated in it.

A particle theorist conjectures a particle theory in response to evidence
about particle reactions. Formally, a particle theorist δ is a function δ : R* → 2^W.
8.4 What Particles Are There?
Without further background knowledge, it is difficult to find out what elementary
particles exist in our world, even in the limit of inquiry. Let P = {p1, p2, ...}
be a set of particles, and let HP denote the hypothesis that the particles in P
are exactly the existing ones (i.e., HP = {w : particles(w) = P}). I refer
to a proposition of this form as an ontological proposition. Let H be the
collection of all ontological propositions. Note that the alternatives in H are
mutually exclusive, so the problem of identifying the set of particles in our world
is a discovery problem in the sense of Chapter 4. Since there are uncountably
many alternatives in H (namely 2^ω), it follows from Proposition 4.2 that there
is no reliable solution for this discovery problem. Indeed, the demonic argument
from Section 4.4 shows that it is impossible to reliably determine whether
there are infinitely many elementary particles, much less identify exactly what
elementary particles there are. The demonic argument applies when we allow
the possibility that there may be an infinite set P of particles, or any finite
subset of P. Particle physicists seem to assume (or hope) that there are in fact
only finitely many elementary particles in our world. We may formulate this
assumption as the proposition

FIN = {w : particles(w) is finite}.
Given this assumption, it is a straightforward matter to reliably arrive at
the correct ontology of the particle world: Simply conjecture that the particles
observed so far are exactly the ones that exist. I call this inference rule the
Occam method. Thus we have the following result.
Fact 8.1 Let H be the set of alternative theories about what particles exist.

1. Without further background knowledge, it is impossible to reliably determine what elementary particles exist. That is, the discovery problem for H has no reliable solution given the vacuous background knowledge W.
Figure 8.2: The Evidence that may arise in a Particle World
2. If we assume that there are only finitely many particles, it is possible to reliably determine the ontology of the particle world.
The Occam procedure is a natural one for identifying the ontology of the particle world; indeed, the criteria from Chapter 6 single it out as the most efficient. To be precise, the Occam theorist is the only one to minimize convergence time and at the same time avoid unnecessarily many mind changes about each ontological proposition. In the terms of Section 7.3, the Occam rule is the only method that is piecemeal data-minimal and minimaxes retractions with respect to each ontological proposition in H consistent with the assumption FIN that there are only finitely many particles. The reason for this is essentially the same as for the version of Occam's Razor in Section 6.4.
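The Occam method is simple enough to state as code. The following Python sketch is my own illustration, not part of the formal framework: evidence is represented (as an assumption for this sketch) as a list of reactions, each a pair (reagents, products), and the method conjectures that exactly the particles seen so far exist.

```python
# Illustrative sketch of the Occam method (Section 8.4): conjecture that the
# particles observed so far are exactly the ones that exist. The reaction
# representation is an assumption made for this sketch.

def observed_particles(evidence):
    """All particles mentioned in the reactions reported so far."""
    seen = set()
    for reagents, products in evidence:
        seen.update(reagents)
        seen.update(products)
    return seen

def occam_conjecture(evidence):
    """The hypothesis H_P for P = the set of particles observed so far."""
    return observed_particles(evidence)

# After observing p + p -> p + p + pi0, Occam conjectures that p and pi0
# are the only existing particles.
evidence = [(("p", "p"), ("p", "p", "pi0"))]
print(sorted(occam_conjecture(evidence)))  # ['p', 'pi0']
```

Under the assumption FIN, this rule converges: once every particle has appeared in some reported reaction, the conjecture never changes again.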
Proposition 8.2 Let H be the set of alternative theories about what particles exist. Assume that there are only finitely many particles (i.e., assume FIN). Then the Occam theorist is the only theorist that reliably identifies the true ontology of the particle world, is data-minimal and minimaxes retractions with respect to each ontological proposition in H.
Of course it may be the case that physicists accept other background theories that entail the existence of unobserved particles.5 To take into account background knowledge W, modify the Occam rule to conjecture that our world contains exactly those particles whose existence is entailed by the available evidence e and the theory W. The corresponding version of Proposition 8.2 holds: the modified Occam rule is the only efficient rule (in the sense of the Proposition) for identifying the ontology of the particle world.
8.5 Identifying Subnuclear Reactions
Particle physics aims to find the reactions that are possible among a given set of particles. It is trivial to piecemeal identify the physically possible reactions, that is, to determine for each reaction r whether it is possible or not.6 Figure 8.3 illustrates the empirical content of this hypothesis.

This hypothesis has the same topological structure as the Riddle of Induction (see Figure 6.4) and the hypothesis that a given particle exists (see Figure 6.3). As with those discovery problems, the only rule that is data-minimal and minimaxes retractions is the Occam-like or "closest-fit" rule: conjecture that the reactions observed so far are exactly the possible ones.
To reliably settle on a complete theory of particle reactions, however, is a difficult inductive problem. This is true even if we have discovered all particles

5 Famous historical examples of particles whose existence was predicted before their observation are antiparticles, Yukawa's pion, and the Ω⁻ particle.
6 Assuming, as I do throughout this chapter, that all and only the possible reactions are eventually observed.
Figure 8.3: Does reaction r occur?
that there are because a finite number of particles may undergo an infinite number of transitions. Two particles suffice to illustrate the difficulty. Consider the family of reactions p + p → p + p + π⁰ + π⁰ + ..., a collision of two protons producing some number of pions, which physicists observe in the laboratory. Suppose that we allow that such a reaction can in principle produce any finite number of pions (say with the proviso that the kinetic energy of the colliding protons must be sufficiently high7). Then it is impossible to reliably find a complete theory of the reactions among protons and pions. For let a theorist δ make an attempt to do so. Then a demon may arrange for the reactions p + p → p + p + π⁰, p + p → p + p + π⁰ + π⁰, ... to occur until δ theorizes that any number of pions may be produced in a collision of two protons. Suppose that at that time, the production of up to n pions has been observed. The demon adds no more new reactions to the list, until the theorist conjectures that the collision of two protons can produce at most n pions. Then the demon resumes the reports of proton collisions that generate n+1, n+2, ... pions, until the theorist hypothesizes again that any number of pions may result from the collision of two protons, and so on. By the argument from Section 4.4, the theorist fails on the data stream that results from this interplay.
This negative argument for local underdetermination raises the question of
how particle physicists might solve the task of identifying the set of physically
possible reactions. In practice, the answer is to appeal to a tradition according
to which physically possible reactions satisfy a set of conservation principles.
Physicists seem to assume that the language of conservation theories contains
a complete account of the particle reactions that we find in nature. I refer
to this assumption as the conservation hypothesis. We shall see that the
conservation hypothesis is a powerful antidote to local underdetermination.
8.6 Conservation Laws in Particle Physics
Roughly, two classes of conservation laws are thought to govern subatomic reactions: the classical conservation laws, such as conservation of energy, momentum, angular momentum, and charge, and the conservation of so-called quantum properties, namely baryon, electron, muon, and lepton number. An integer value for each quantum property is assigned to every elementary particle. Table 8.2 shows the values of these quantities that scientists have assigned to various particles.8
Physicists report that so far "all events have been found to be consistent with the conservation of" these quantities [Cooper 1992, p. 456]. (Strangeness is a quantum property that is not conserved in all physically possible reactions. Rather, the conservation of strangeness is connected with the time that it takes for a transition to occur; see for example [Feynman 1965, p. 68]. Since in this

7 I neglect absolute bounds on the speed of protons such as the speed of light.
8 The table is from [Omnes 1971].
    Particle  Charge  B.N.  L.N.  E.N.  M.N.  Strangeness  Hypercharge
 1  Ξ⁻         -1      1     0     0     0       -2           -1
 2  Ξ̄⁺          1     -1     0     0     0        2            1
 3  Ξ⁰          0      1     0     0     0       -2           -1
 4  Ξ̄⁰          0     -1     0     0     0        2            1
 5  Σ⁻         -1      1     0     0     0       -1            0
 6  Σ̄⁺          1     -1     0     0     0        1            0
 7  Σ⁰          0      1     0     0     0       -1            0
 8  Σ̄⁰          0     -1     0     0     0        1            0
 9  Σ⁺          1      1     0     0     0       -1            0
10  Σ̄⁻         -1     -1     0     0     0        1            0
11  Λ⁰          0      1     0     0     0       -1            0
12  Λ̄⁰          0     -1     0     0     0        1            0
13  n           0      1     0     0     0        0            1
14  n̄           0     -1     0     0     0        0           -1
15  p           1      1     0     0     0        0            1
16  p̄          -1     -1     0     0     0        0           -1
17  K⁰          0      0     0     0     0        1            1
18  K̄⁰          0      0     0     0     0       -1           -1
19  K⁺          1      0     0     0     0        1            1
20  K⁻         -1      0     0     0     0       -1           -1
21  π⁺          1      0     0     0     0        0            0
22  π⁻         -1      0     0     0     0        0            0
23  π⁰          0      0     0     0     0        0            0
24  γ           0      0     0     0     0        0            0
25  μ⁻         -1      0     1     0     1        0            0
26  μ⁺          1      0    -1     0    -1        0            0
27  e⁻         -1      0     1     1     0        0            0
28  e⁺          1      0    -1    -1     0        0            0
29  ν_e         0      0     1     1     0        0            0
30  ν̄_e         0      0    -1    -1     0        0            0
31  ν_μ         0      0     1     0     1        0            0
32  ν̄_μ         0      0    -1     0    -1        0            0
Table 8.2: Quantum Number Assignments
study I don't consider transition times, I will leave strangeness aside.) Rules that posit the conservation of a quantum property in a particle reaction are called selection rules. For an example of a selection rule, consider the fact that the proton p is stable. None of the classical conservation laws rules out, say, the decays p → e⁺ + γ (a proton decays into a positron and emits light) or p → π⁺ + π⁰ (a proton decays into a positively charged and a neutral pion). As a standard text has it, "along lines made venerable by tradition, we explain this by saying that the proton has a certain inherent property which is conserved" [Omnes 1971, p. 36].
Physicists have come to call the property in question "baryon number", or "heavy particle number". As Table 8.2 shows, the decays mentioned do not
conserve baryon number (B.N.). That is, the baryon number of the decaying
particle, 1, is not the same as the sum of the baryon numbers of the products,
0. A selection rule describes a quantum property and asserts that all physically
possible reactions conserve it; that is, that the sum of the values of the quantum
property among the reagent(s) is the same as the sum of the values of the
quantum property among the product(s).
For decays, conservation of energy entails that the mass of the decaying particle must not be smaller than the sum of the masses of the particles into which it decays. For example, in the decay Ξ⁰ → Σ⁰ + π⁰, the mass of the Ξ⁰ is 1311 MeV (see Table 8.1), whereas the sum of the masses of the products is 1192 MeV + 135 MeV = 1327 MeV. So this decay is not possible. In collisions, however, it is possible that the kinetic energy of the colliding particles is high enough to permit the reaction to take place even when the combined mass of the products is higher than that of the reagents. For example, when two protons collide to produce a pion (in arrow notation, p + p → p + p + π), the energy count balances because at least one of the protons loses momentum in the collision. Notice that the only way in which the reaction p + p → p + p + π can conserve quantum properties is if the pion π carries 0 of all quantum properties. This in turn implies that all reactions of the form p + p → p + p + π + π + ... conserve all quantum properties: if the collision of two protons can produce one pion, it can produce any number, as far as selection rules are concerned.
Mass and charge are assigned to each particle as soon as it is discovered, and thus conservation of energy and charge immediately rule out reactions among identified particles. The inference problem is to assign values for quantum properties other than mass and charge. To simplify my analysis, I will neglect conservation of mass and charge, and consider only how to account for a given set of observed reactions by assigning quantum properties to the particles involved. As [Valdes and Erdmann 1994] showed, we can apply useful concepts from linear algebra to this question if we represent reactions as vectors, in the following manner.
Let p_1, p_2, ..., p_n, ... be a (possibly infinite) set of elementary particles. For a given reaction r, let r(p_i) = a(p_i) - p(p_i) be the number of times particle p_i occurs among the reagents minus the number of times it occurs among the products of the reaction; r(p_i) is called the net occurrence of particle p_i in reaction r. For example, consider the particles μ, e, γ, ν, p, p̄. Let d be the decay μ → e + ν + ν. Then d(μ) = 1, d(e) = -1, d(γ) = 0, d(ν) = -2, d(p) = 0, d(p̄) = 0. For the reaction r: p + p → p + p + p + p̄, we have r(μ) = 0, r(e) = 0, r(γ) = 0, r(ν) = 0, r(p) = -1, r(p̄) = -1. Hence with each reaction r there is associated an infinite-dimensional vector r = (r(p_1), r(p_2), ..., r(p_n), ...), all but finitely many of whose entries are 0. In what follows, I identify reactions with the vectors encoding them, so that the set of possible reactions R is now the set of vectors with integral components. Note that the same vector might result from two different reactions. For example, p + p → p + p + π and p → p + π have the same net occurrences of protons and pions. However, two reactions with the same encoding conserve exactly the same quantum properties, so from the point of view of selection rules, we do not need to distinguish among them.
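The net-occurrence encoding is easy to mechanize. The sketch below is my own illustration; the particle ordering for the vector components is an assumption. It reproduces the two encodings from the example above.

```python
# Net-occurrence encoding of reactions (illustrative sketch).
# The ordering of particles for the vector components is an assumption.
PARTICLES = ["mu", "e", "gamma", "nu", "p", "pbar"]

def encode(reagents, products, particles=PARTICLES):
    """r(p_i): times p_i occurs among the reagents minus times among the products."""
    return tuple(reagents.count(q) - products.count(q) for q in particles)

# d: mu -> e + nu + nu
d = encode(["mu"], ["e", "nu", "nu"])
print(d)  # (1, -1, 0, -2, 0, 0)

# r: p + p -> p + p + p + pbar
r = encode(["p", "p"], ["p", "p", "p", "pbar"])
print(r)  # (0, 0, 0, 0, -1, -1)
```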
In what follows, I mainly consider the problem of inferring selection rules for a finite set of n observed particles. In that case we may take reactions to be n-dimensional vectors with integral components; that is, the space of logically possible reactions is Z^n. I view Z^n as a subset of Q^n, the vector space of n-dimensional vectors with rational components, with the rationals Q as scalars.
As is evident from Table 8.2, we may think of a quantum property q as an infinite-dimensional vector whose entries q(p_i) are integers that represent the value of the property for particle p_i. If a reaction r conserves quantum property q, then summing the value of the quantum property for each particle times its net occurrence must yield 0. That is, q · r = 0, where · denotes the standard dot or inner product of two vectors. A conservation theory Q is a matrix of quantum properties, whose rows are the quantum properties postulated by Q (so Table 8.2 features a 31 × 7 conservation theory). If Q is finite, with columns c_1, c_2, ..., c_n, I say that the particles of Q are the particles p_1, p_2, ..., p_n (denoted by particles(Q)) and that Q is for p_1, p_2, ..., p_n. If Q is infinite, particles(Q) is the set of positive natural numbers.
A reaction r is physically possible according to a conservation theory Q ⟺ q · r = 0 for each row q of Q; that is, r is physically possible if and only if r conserves all the quantum properties listed in Q. Using the definition of matrix multiplication, we have that r is physically possible according to Q ⟺ [Q]r = 0, where 0 is the zero vector whose dimension is the number of quantum properties postulated by Q, that is, the number of rows in Q. For a matrix Q, the set of vectors mapped by Q to 0 is called the kernel of Q and is defined by ker(Q) = {r : [Q]r = 0}. So the physically possible reactions according to Q are the logically possible reactions in the kernel of Q, that is, ker(Q) ∩ R. Thus we may state the conservation hypothesis Conserve, the proposition that some conservation theory describes precisely all physically possible reactions, as follows.

Conserve = {w : w = ker(Q) for some conservation theory Q s.t. particles(Q) = particles(w)}.
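Checking whether a reaction is physically possible according to a conservation theory is just a matrix-vector product. A minimal sketch (my own; the particle ordering is an assumption, and the single baryon-number row takes its values from Table 8.2):

```python
def conserves(q, r):
    """Reaction r conserves quantum property q iff the dot product q . r is 0."""
    return sum(qi * ri for qi, ri in zip(q, r)) == 0

def physically_possible(Q, r):
    """r is physically possible according to theory Q iff [Q]r = 0,
    i.e. iff r lies in ker(Q)."""
    return all(conserves(q, r) for q in Q)

# Particles ordered (p, e+, gamma, pi+, pi0); one quantum property, baryon
# number, with values as in Table 8.2.
baryon = (1, 0, 0, 0, 0)
Q = [baryon]

# p -> e+ + gamma has net occurrences (1, -1, -1, 0, 0): ruled out, since
# baryon number drops from 1 to 0.
print(physically_possible(Q, (1, -1, -1, 0, 0)))  # False
# p + p -> p + p + pi0 has net occurrences (0, 0, 0, 0, -1): allowed.
print(physically_possible(Q, (0, 0, 0, 0, -1)))   # True
```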
Philosophers and physicists such as Poincaré and Feynman have considered the testability of conservation principles [Poincare 1952], [Feynman 1965]. ([Kelly et al. 1997] outlines a learning-theoretic perspective on this debate.)
With regard to Conserve, the conservation hypothesis for quantum properties, it is not possible to reliably decide the truth of the proposition even in the limit. The simplest counterexample is again the situation in which a collision of two protons may produce any number of pions (p + p → p + p + π + π + ...). Suppose we observe the transition p + p → p + p + π, and that the proton p and the pion π are the only elementary particles. Then the conservation hypothesis is true just in case we observe all reactions of the form p + p → p + p + π + π + ..., for any finite number of pions. As we saw in Section 8.5, it is impossible to decide the hypothesis that a collision of protons may produce any number of pions without further background knowledge. Hence it is impossible to reliably settle the truth of the conservation hypothesis, even in the limit of inquiry.
The fact that there is no reliable test of the conservation principles does not mean that the conservation principle is a "convention" about the use or meaning of terms, without empirical content. On the contrary, Conserve is inconsistent with many logically possible particle worlds. For example, if there are n particles p_1, p_2, ..., p_n in the world and we observe n linearly independent reactions among them (viewing the reactions as vectors in the encoding described above), then the conservation hypothesis degenerates into triviality: all reactions are physically possible. This is because a reaction r defines a homogeneous linear equation of the form r(p_1)q(p_1) + r(p_2)q(p_2) + ... + r(p_n)q(p_n) = 0, where the net occurrences of particle p_i in r are the given coefficients and the values of quantum property q for each particle p_i are the unknowns; see Figure 8.4.
If a quantum property q is conserved in a reaction r, then q is a solution to the linear equation that r defines. A basic theorem of linear algebra states that the only solution to n linearly independent homogeneous equations in n unknowns is the trivial solution that gives the value 0 to each unknown. But if all particles have 0 of each quantum property, then all logically possible transitions among them conserve all quantum properties, and the conservation hypothesis implies that all logically possible transitions are possible (which is false as far as we can tell; for example, the decay Σ⁺ → π⁺ + π⁰ + π⁰ has never been observed).
8.7 Inferring Conservation Theories Without Virtual Particles
Assuming the conservation hypothesis makes the assignment of quantum numbers much easier. More precisely, the conservation principle constrains the possible alternative theories of particle reactions. The results of this and the next section show exactly how much this constraint reduces the complexity of the
Figure 8.4: A Set of Reactions, encoded as Vectors with associated Linear Equations.
inductive problem, complexity being measured on the scale of feasibility that I have developed in previous chapters.
Some subtleties arise if we allow conservation theories to posit the existence of "virtual" undetectable particles to balance the count of quantum properties in some transitions. I first consider the simpler case in which conservation theories are restricted to observable particles.
Let a finite set of n observable particles, and some finite set E = {e_1, e_2, ..., e_k} of observed reactions be given. As we saw in Section 8.5, minimaxing retractions and data-minimality piecemeal with respect to each reaction requires choosing the conservation theory with the "closest" fit to the reactions reported in E: choose a conservation theory Q that is consistent with all the observed reactions in E, but minimize the number of unobserved reactions consistent with Q. If we restrict our theory to the n particles discovered so far, there is a unique conservation theory Q that meets these requirements. First, I observe that all reactions that are linear combinations of the observed reactions in E must conserve all quantum properties that the reactions in E conserve. For let r_1 and r_2 be two (vectors encoding) reactions in E. If a conservation theory Q is consistent with r_1 and r_2, we have that [Q]r_1 = [Q]r_2 = 0. By the linearity of matrix multiplication, [Q](a_1 r_1 + a_2 r_2) = [Q](a_1 r_1) + [Q](a_2 r_2) = a_1([Q]r_1) + a_2([Q]r_2) = 0. So all finite linear combinations of observed reactions are consistent with any conservation theory Q that is consistent with the observed reactions. To state this fact formally, let span(E) be the set of linear combinations of reactions in E (i.e., span(E) = {r : r = a_1 e_1 + ... + a_k e_k, for arbitrary rationals a_i}).
Fact 8.3 Let E be a finite set of observed reactions, and let Q be a conservation theory consistent with E (i.e., E ⊆ ker(Q)). Then all linear combinations of reactions in E are consistent with Q (i.e., span(E) ⊆ ker(Q)).
Conversely, for a given list of reactions E featuring n observed particles, there is a conservation theory that allows no reactions outside of the span of E: we may take as the rows of the theory a basis for the space of solutions to the equations defined by E, that is, for the orthogonal complement of span(E), which is itself a linear subspace of Q^n. Since any rational solution can be scaled by a common denominator, it is clear that we can choose quantum properties with integral values for each particle.

Fact 8.4 Let E be a finite set of observed reactions. Then there is a conservation theory Q for the n observed particles in E such that Q is consistent with all and only linear combinations of the reactions in E (i.e., ker(Q) = span(E)). Moreover, the quantum properties in Q can be chosen to have integral values only.
Fact 8.4 yields an algorithm δ for inferring selection rules: for a given list of reactions e, choose as quantum properties an integral basis for the space of solutions to the equations defined by e. This procedure is the only one (up to choice of basis) that minimaxes retractions and is data-minimal with respect to deciding whether a given reaction among the n observed particles is possible or not. Assuming the conservation hypothesis, the procedure is reliable and in fact identifies a complete conservation theory of reactions among the n observed particles with at most n mind changes. For
our algorithm δ changes its mind after a list e∗r only if r is not a linear combination of the reactions in e. But this means that r is linearly independent of the reactions in e. The rank of a set of vectors V is the maximum number of linearly independent vectors in V, written rank(V); similarly, rank(Q) is the maximum number of linearly independent rows in a matrix Q. So if δ changes its mind on e∗r, then rank(range(e) ∪ {r}) = rank(range(e)) + 1. That is, with each mind change the rank of the observed reactions increases. Recall that
we may view the observed reactions as linear equations in n unknowns, with
each quantum property that is consistent with the observed reactions being a
solution to the equations. Then by a standard theorem of linear algebra, we
have that rank(E) + dim(S) = n, where S is the space of solutions to the
equations (reactions) in a set of reactions E, and dim(S) is the dimension of the
solution space, the size of the largest set of linearly independent solutions. If
a conservation theory Q is consistent with the equations (reactions) in E, then
each row in Q is a solution to the equations in E. Thus it follows that
rank(E) + rank(Q) ≤ n

whenever Q is consistent with E, with equality when the rows of Q form a basis for the solution space, as do the conjectures of algorithm δ. So every time that our procedure δ changes its mind on data e∗r, the rank of its new conservation theory δ(e∗r) is one lower than the rank of its previous conservation theory δ(e). Since the highest possible rank of a conservation theory for n particles is n and the lowest possible one is 0, δ changes its mind at most n times. To see that, given the conservation
hypothesis Conserve, δ is a reliable procedure for discovering theories of particle reactions, note that on any data stream ε consistent with Conserve, there is a finite time t after which span(ε|t) = range(ε). This is true again because every time the span of the observed reactions increases, so does its dimensionality, which cannot be greater than dim(Q^n) = n.
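The mind-change bound can be checked mechanically. The sketch below is my own illustration: it computes the rank of the observed reaction vectors by Gaussian elimination over the rationals (Python's fractions module gives exact arithmetic) and counts how often a growing list of reactions raises the rank, which is exactly when the procedure changes its mind.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of rational vectors, via Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n = len(rows[0]) if rows else 0
    rk = 0
    for col in range(n):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def mind_changes(stream):
    """The theorist revises its theory exactly when a new reaction is linearly
    independent of those seen so far, i.e. when the rank goes up."""
    changes, seen = 0, []
    for r in stream:
        if rank(seen + [r]) > rank(seen):
            changes += 1
        seen.append(r)
    return changes

# Two particles: however the reactions arrive, the rank (and hence the
# number of mind changes) cannot exceed 2.
stream = [(1, -1), (2, -2), (0, -2), (3, -5)]
print(mind_changes(stream))  # 2
```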
These results illustrate the power of the conservation hypothesis. Without this assumption, it is impossible to reliably identify a complete theory of particle reactions among just two observable particles, even with an unlimited number of mind changes (see Section 8.6). With the conservation hypothesis in place, theorists can solve this problem with at most two mind changes. The inference task becomes more complex if we allow the possibility of undetectable particles. The next section shows that the goal of avoiding unnecessarily many retractions (selecting the "closest fit" to the observed reactions) can require theorists to introduce such virtual particles.
8.8 Inferring Conservation Theories With Virtual Particles
Introducing undetectable particles allows the theorist to reinterpret a laboratory report by asserting that the report included only detectable particles, while the reaction that actually took place involved undetectable particles. For example, experimentalists observed the decay of a neutron into a proton and an electron, n → p⁺ + e⁻. This process fails to conserve energy. Physicists hypothesized that there was another particle balancing the energy count, an anti-neutrino, such that the neutron actually decays into three particles: n → p⁺ + e⁻ + ν̄_e. To emphasize the distinction between what was directly observed and what reaction actually occurred, I say that the transition n → p⁺ + e⁻ was reported when the reaction n → p⁺ + e⁻ + ν̄_e took place.
Conservation of momentum implies the presence of some hidden particles as well. For example, Figure 8.5 illustrates the tracks that the decay of the pion into a muon leaves in the bubble chamber [Cooper 1992, p. 445]. Here conservation of momentum suggests the presence of a chargeless particle, say a neutrino. But the evidence does not determine the quantum numbers of this particle; hence a theorist may say that it is a new kind of neutrino, with muon number 1. Indeed, physicists take the decay of the pion to be π⁺ → μ⁺ + ν_μ. This suggests a different model for introducing undetectable particles, where the evidence includes information about whether or not a particle without tracks was present, but the theorist decides whether or not it is a new kind of particle and what its properties are. Indeed, the theorist may make conjectures about exactly how many virtual particles were produced. For example, it is thought that the decay of the muon produces a neutrino and an anti-neutrino: μ⁻ → e⁻ + ν̄_e + ν_μ.
Figure 8.5: Track of the Decay of a Pion into a Muon
I call this model of evidence for particle reactions the constrained virtual particle model. The constrained model seems closer to practice than allowing the theorist to posit undetectable particles without constraints from the evidence. However, the unconstrained model is simpler to analyze, and most results carry over to the constrained scenario. I leave the analysis of inferring selection rules in the constrained scenario open for future work.
Given a set of detectable particles D, the visible part of a reaction r is denoted by r|D and defined by reagents(r|D) = reagents(r) ∩ D and products(r|D) = products(r) ∩ D. In terms of our encoding of reactions by vectors, the visible part of a reaction is its orthogonal projection onto the detectable particles. For a set of reactions R, R|D denotes the visible parts of the reactions in R (i.e., R|D = {r|D : r ∈ R}). (I assume that if physicists can track a particle in a reaction r, then they can find it in another reaction r'.) Let T be a particle theory according to which particles P exist, and let D ⊆ P be the detectable particles in P. Then the empirical content of T is T|D.9
Now we are ready to characterize the difference between conservation theories with and without virtual particles. The difference is that whereas the conservation hypothesis commits theories without virtual particles to allowing all linear combinations of observed reactions, it commits theories with virtual particles only to allowing the linear combinations of observed reactions with integral coefficients. Before I prove this result in general, it is helpful to illustrate the phenomenon with an example.
Suppose that the transitions K → π and K + K → K + K + π + π have been observed. Then if we don't introduce further particles, all transitions involving the K and π particles are possible: let K = p_1 and π = p_2, and encode the first reaction as (1,-1) and the second as (0,-2). Since (1,-1) and (0,-2) are linearly independent, they form a basis for Q² and hence their span is Q². So by Fact 8.3, any conservation theory for K and π that is consistent with the observations is consistent with all transitions among K and π. A direct way to see this is to note that if K + K → K + K + π + π conserves all quantum properties, then π must carry 0 of every property. So K must carry 0 of every property as well, if K → π is possible.
Now let's hypothesize that during the transition K + K → K + K + π + π a neutrino ν was present, and the reaction that actually took place was K + K → K + K + π + π + ν. Then if we introduce a quantum property q and assign q(K) = 1, q(π) = 1, q(ν) = -2, we obtain a non-trivial conservation theory that is consistent with K → π and K + K → K + K + π + π. To see that q is non-trivial, notice that the transition K + K → π is not possible, because the reactions K + K → π + ν + ... + ν do not conserve q, for any number of neutrinos ν. The encoding of K + K → π with only K and π as the particles involved is (2,-1) = 2(1,-1) - (1/2)(0,-2). So (the encoding of) K + K → π is a linear combination of (the encodings of) K → π and K + K → K + K + π + π, but not a linear combination with integral coefficients. On the other hand, the reaction K + K → K + K + π + π + π + π + ν + ν conserves q, and hence the transition K + K → K + K + π + π + π + π may be observed, according to the conservation theory q. The encoding of the transition K + K → K + K + π + π + π + π is (0,-4) = 2(0,-2), twice the encoding of K + K → K + K + π + π. In general, if a conservation theory Q is consistent with a given set of observed transitions, then Q allows linear combinations with integral coefficients of the observed transitions. This is because multiple neutrinos may appear among the products of reactions, but not "fractional" neutrinos: it is the essence of the particle concept that particles are whole, discrete entities.

9 Strictly speaking, at this point we should enrich the notion of a possible particle world to include information about which particles are and are not observable.
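The distinction between the span and the integral span can be checked by solving for the coefficients. The following sketch is my own illustration (it assumes the observed reactions are linearly independent): it expresses a target reaction as a rational combination of observed reactions, and the denominators then reveal whether the combination is integral.

```python
from fractions import Fraction

def solve_coefficients(basis, target):
    """Express target as a rational linear combination of the basis vectors
    (assumed linearly independent); returns the coefficients, or None if
    target lies outside the span."""
    m, n = len(target), len(basis)
    # Augmented matrix: columns are the basis vectors, last column the target.
    A = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(target[i])]
         for i in range(m)]
    row, pivots = 0, []
    for col in range(n):
        p = next((i for i in range(row, m) if A[i][col] != 0), None)
        if p is None:
            continue
        A[row], A[p] = A[p], A[row]
        A[row] = [x / A[row][col] for x in A[row]]
        for i in range(m):
            if i != row and A[i][col] != 0:
                A[i] = [a - A[i][col] * b for a, b in zip(A[i], A[row])]
        pivots.append(col)
        row += 1
    # A zero row with a non-zero right-hand side means no solution.
    if any(all(A[i][j] == 0 for j in range(n)) and A[i][n] != 0
           for i in range(m)):
        return None
    coeffs = [Fraction(0)] * n
    for r, col in enumerate(pivots):
        coeffs[col] = A[r][n]
    return coeffs

# K -> pi encodes as (1, -1); K + K -> K + K + pi + pi as (0, -2).
basis = [(1, -1), (0, -2)]
# K + K -> pi encodes as (2, -1): in the span, but not the integral span.
coeffs = solve_coefficients(basis, (2, -1))
print(coeffs)                                   # [Fraction(2, 1), Fraction(-1, 2)]
print(all(c.denominator == 1 for c in coeffs))  # False: not an integral combination
```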
The next proposition states formally that under the conservation hypothesis, any linear combination of possible transitions with integral coefficients is also possible. Let V = {v_1, v_2, ..., v_n} be a finite set of vectors, and let the integral span of V be the set of linear combinations of vectors in V with integral coefficients. I write int-span(V) for the integral span of V; so int-span(V) = {v : v = k_1 v_1 + ... + k_n v_n, where each coefficient k_i is an integer}.
Proposition 8.5 Let E be a finite set of observed transitions, and let Q be any conservation theory consistent with E (i.e., E ⊆ ker(Q)|D, where D are the (detectable) particles involved in the transitions of E). Then Q is consistent with all transitions in the integral span of the transitions in E (i.e., int-span(E) ⊆ ker(Q)|D).
So the conservation hypothesis commits a particle theorist to the integral span of observed transitions. Is there a way of ruling out transitions beyond this, as minimaxing retractions requires? It turns out that there is: it is always possible to find a conservation theory whose empirical content is exactly the integral span of the observed transitions. It is easy enough to say how to find such a theory. The task is trivial for the empty set of observed transitions, with empty integral span (for example, we can introduce a quantum property q_p for each particle p, such that only p carries a non-zero value of q_p). Suppose, inductively, that we have a set of transitions E and a conservation theory Q (possibly involving virtual particles) whose empirical content is exactly the integral span of E. If a new transition t is observed that contradicts Q, we introduce a new undetectable particle ν and postulate that ν was among the products of the reaction that gave rise to the observation of t. Then for each quantum property q in Q that t fails to conserve, we assign to ν just enough of q to balance the q count. It is clear that this will produce a new conservation theory consistent with t; it is not obvious, but nonetheless true, that the new theory allows only the transitions in the integral span of E ∪ {t}. (Section 8.10 gives the proof.)
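The inductive step just described (balancing each violated quantum property with a new undetectable particle) can be sketched directly. In the sketch below, which is my own illustration, the new particle is posited once among the products of the offending transition, so its net occurrence there is -1, and giving it the value q · t for each property q balances the count.

```python
def balance_with_virtual_particle(Q, t):
    """Extend each quantum property in Q with a value for one new undetectable
    particle, posited once among the products of transition t (so its net
    occurrence in t is -1). Setting q(new) = q . t makes t conserve q, since
    q . t + q(new) * (-1) = 0. Properties t already conserves get value 0."""
    t_ext = tuple(t) + (-1,)
    new_rows = [tuple(q) + (sum(qi * ti for qi, ti in zip(q, t)),) for q in Q]
    return new_rows, t_ext

# Theory over (K, pi) with a single property q(K) = q(pi) = 1, which the
# observed transition K + K -> pi, encoded (2, -1), fails to conserve (2 vs. 1).
new_Q, t_ext = balance_with_virtual_particle([(1, 1)], (2, -1))
print(new_Q)  # [(1, 1, 1)]: the virtual particle carries q-value 1
print(sum(a * b for a, b in zip(new_Q[0], t_ext)))  # 0: balanced
```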
Proposition 8.6 Let E be a sequence of observed transitions. Then there is a conservation theory Q whose empirical content is exactly the integral span of the reactions in E (i.e., ker(Q)|D ∩ R = int-span(E), where D is the set of (detectable) particles involved in the transitions of E).
I am indebted to Jesse Hughes for suggesting Propositions 8.5 and 8.6 and
outlines of their proofs.
So the difference between what conservation theories with and without virtual
particles can express amounts to the difference between the span and the
integral span of a set of observed transitions. It is worth getting clearer about
this difference. What transitions exactly are in the span but not the integral
span of a set of transitions E? As the examples above suggest, the answer is
that they are the transitions that are fractions, but not integral multiples, of
transitions in the integral span of E.
Say that a vector v is a proper fraction of another vector v' if v = v'/m, for
some integer m such that |m| > 1. Given a finite set V = {v1, v2, ..., vn} of vectors,
I denote the set of proper fractions of V by fractions(V); so fractions(V) =
{v : v is a proper fraction of some vector v' in V}. Then we have:
Fact 8.7 Let E be a finite set of reactions. Then a reaction r is in the span of
E if and only if some multiple r' of r is in the integral span of E. That is,
span(E) ∩ R = fractions(int-span(E)).
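The notion of a proper fraction is easy to operationalize; the following sketch (an illustrative helper of my own, using exact rational arithmetic) tests whether one transition vector is a proper fraction of another.

```python
from fractions import Fraction

def is_proper_fraction(v, w):
    """True iff v = w/m for some integer m with |m| > 1."""
    pairs = [(Fraction(a), Fraction(b)) for a, b in zip(v, w)]
    nonzero = [(a, b) for a, b in pairs if a != 0]
    if not nonzero:
        return False                      # the zero vector is not a proper fraction
    m = nonzero[0][1] / nonzero[0][0]     # candidate multiplier with w = m * v
    if m.denominator != 1 or abs(m) <= 1:
        return False
    return all(b == m * a for a, b in pairs)
```

So (0, -2) is a proper fraction of (0, -6) (take m = 3), but not of (0, -3), since m = 3/2 is not an integer.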
The next question is how difficult it is to identify a complete empirically
adequate conservation theory for transitions among n detectable particles. Although
it might appear as though allowing the particle theorist to introduce
virtual particles should make her task easier, Propositions 8.5 and 8.6 make
clear that she has in fact more empirical possibilities to contend with, since she
can no longer assume that all particles involved in a reaction will be reported.
Weakening her background knowledge in this way has significant methodological
consequences: it is no longer possible to reliably identify a complete empirically
adequate conservation theory in the limit of inquiry, even assuming the conservation
principle. To see this, let δ be the Occam theorist who always chooses
the closest possible fit to the observed reactions (defined in the proof of Proposition
8.6 from Section 8.10). Given the conservation principle, this procedure
settles on an empirically adequate theory if it stops changing its mind. Will
it stop changing its mind? There are two cases in which δ changes its theory
on evidence e*t (the evidence e extended by the new transition t): (1) the
transition t is outside the span of the transitions in e, and (2) t is in the span
of e, but not in the integral span of e. If we have n observable particles, the
first case cannot occur more than n times, for the same reason as in Section 8.7.
(More precisely, if we observe n linearly independent transitions among n
detectable particles, then any conservation theory consistent with the observations
permits all transitions among the detectable particles; see Section 8.9.)
However, if we allow that there may be infinitely many virtual particles,
δ may never stop changing its conservation theories, even if experimentalists
have found all the observable particles. Indeed, using the fact that there is no
greatest prime number, we can give a demonic argument to prove that no theorist
can reliably identify an empirically adequate conservation theory of particle
reactions. The difficulty arises with only two detectable particles. Suppose we
have observed a transition of the form (0,-n) (and nothing else). Let b be any
prime number greater than n, and consider the transition (0,-b) = (b/n)·(0,-n).
So (0,-b) is in the span of (0,-n) but not in the integral span, since b is a prime
number and therefore there is no integer m such that m·(0,-n) = (0,-b). Thus
we can construct the by now familiar demonic argument against any theorist δ
who aspires to find an empirically adequate theory of transitions among our
two particles: present (0,-n), and nothing else, until δ conjectures that the
integral span of (0,-n) contains all possible transitions. Pick the first prime b1
greater than n and present (0,-b1). Present nothing else until δ theorizes that
the integral span of {(0,-n), (0,-b1)} contains all possible transitions. Pick
the next prime b2 and present (0,-b2), which is in the span but not the
integral span of {(0,-n), (0,-b1)}, and so on.
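The arithmetic behind the demon's first move can be checked directly. The helper below is illustrative (not from the text): it decides whether a transition is an integer multiple of a single observed transition, which is membership in the integral span of one generator.

```python
from fractions import Fraction

def integer_multiple(t, e):
    """Return the integer z with t = z * e, or None if no such integer exists."""
    pairs = [(Fraction(a), Fraction(b)) for a, b in zip(e, t)]
    nonzero = [(a, b) for a, b in pairs if a != 0]
    if not nonzero:
        return None
    z = nonzero[0][1] / nonzero[0][0]
    if z.denominator != 1 or any(b != z * a for a, b in pairs):
        return None
    return int(z)

n = 4
for b in (5, 7, 11, 13):                  # primes greater than n
    # (0,-b) is in the span of (0,-n): (0,-b) = (b/n) * (0,-n) ...
    assert Fraction(b, n) * Fraction(-n) == -b
    # ... but not in its integral span, since b/n is not an integer
    assert integer_multiple((0, -b), (0, -n)) is None
```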
However, if we assume that there are only finitely many particles (the proposition
FIN from Section 8.4), the demon can lead the Occam theorist δ to introduce
new "virtual" particles only finitely often. For let τ be the stream of
observed transitions, and let Q be a conservation theory for observable and
hidden particles that is empirically adequate for τ. Call a particle neutral if
it carries 0 of every quantum property in Q. Without loss of generality, we
may suppose that no hidden particle is neutral (for if a hidden particle has 0
of each quantum property, it is superfluous, at least from the point of view of
selection rules). Since no transition in τ features more than two reagents, there
is a maximum number of times that a non-neutral particle appears in a reaction
allowed by Q. (Otherwise the quantum properties of the hidden particles
would outweigh those of the reagents in some reaction.) This in turn implies
that we can choose a finite set of reactions B allowed by Q that determine the
empirical content of Q, that is, range(τ) = int-span(B)|D ∩ R, where D are
the observable particles in τ (i.e., D = particles(range(τ))). So after the finitely
many transitions in B|D are observed along τ, the Occam theorist δ converges
to an empirically adequate conservation theory.

In sum, the conservation principle Conserve, together with the assumption
that there are only finitely many particles (that is, Conserve ∩ FIN), implies that
the Occam theorist δ will converge to an empirically adequate theory of particle
reactions, but we don't know how many virtual particles δ may introduce along
the way.
8.9 Parsimony, Conservatism and the Number
of Quantum Properties
The methods that I've considered so far are conservative in the sense that they
never eliminate quantum properties or virtual particles, but accommodate new
evidence by postulating the presence of new virtual particles. However, it may
well be possible to find more parsimonious theories as new evidence comes in.
For example, a new transition t that is linearly independent of the previous
ones decreases the dimension of the space of conserved quantum properties by
one. For let Q be the previous conservation theory, and let r be a reaction
with visible part t that we posit to account for t. Then there is no set of other
reactions R consistent with Q such that span(R) includes r; otherwise t would
be in the span of the visible parts of R, contrary to the assumption that no
linear combination of transitions allowed by Q yields t.
So when a previously prohibited transition occurs, there may be an opportunity
to eliminate a quantum property, a virtual particle, or both. If there have
been such opportunities for parsimony, physicists have passed them by and chosen
to avoid retractions by extending their theories conservatively. It would
be an interesting project to investigate whether there is a conservation theory
with the same empirical content as the current one but with fewer quantum
properties or fewer virtual particles. Could physicists have avoided introducing
two kinds of neutrinos?
Answering this kind of question would require a computer program to sift
through the masses of data that have been accumulated on particles and particle
reactions. The program should start by considering transitions with no proper
fractions, because Fact 8.7 tells us that we can account for these transitions
without virtual particles. Thus transitions whose encodings feature a 1 or -1
are of particular interest. Fortunately, such transitions abound: in a decay,
conservation of energy entails that the decaying particle does not appear among
the products, so the decaying particle occurs a net total of 1 time in the decay. Thus
the more linearly independent decays we find, the more constraints we obtain
on quantum properties, without having to consider virtual particles.
Recall that there cannot be more linearly independent quantum properties
than |P| - rank(Decays), where |P| is the number of observable particles and
rank(Decays) is the number of linearly independent decays. Without loss of
generality, we may assume that all quantum properties are linearly independent,
because conservation theories whose quantum properties have the same span
permit exactly the same reactions. So |P| - rank(Decays) = |Q|, where |Q| is
the number of quantum properties assigned by a conservation theory consistent
with the observed decays. Introducing virtual particles to increase the number
of total particles allows more quantum properties than |P| - rank(Decays), but
the resulting conservation theory is empirically trivial.
How many linearly independent decays can we expect to find? The next
theorem says: a lot. To be precise, under a mild physical assumption, the number
of linearly independent decays is at least as great as the number of distinct
particles that decay. Physicists refer to particles that decay as unstable. So
we have that rank(Decays) ≥ Unstable, where Unstable is the number of unstable
particles. Using the previous equality |P| - rank(Decays) = |Q|, we have
that |P| - Unstable ≥ |Q|, which means that Stable ≥ |Q|, where Stable is the
number of stable particles. Thus under the physical assumption in question,
a conservation theory cannot introduce more (linearly independent) quantum
properties than there are stable particles: a striking connection between quantum
properties and the number of stable particles in the world. In particular,
this shows that the conservation hypothesis implies, on pain of triviality, that
there are stable particles in the world. For the task of finding minimal conservation
theories, this result means that if we begin by examining decays, we have
to consider no more quantum properties than the number of stable particles,
which is very small compared to the total number of particles.
The physical assumption in question is that we cannot set up a "decay
pump": a situation in which we start with a particle p, and regain p through
a series of decays of p and its products. Under conservation of energy, decay
pumps should not be possible, because each decay leads to a more favorable state
(products with less mass).[10] I refer to the condition that decay pumps do not
occur as the irreversibility condition. To define the irreversibility condition, it
is useful to introduce the notion of a decay tree.
Definition 8.1 Let D be a set of decays, involving a set of particles P. The
decay tree for particle p ∈ P given D is defined as follows.

1. Each node of the tree is labelled with a single particle.

2. The root of the tree is labelled with p.

3. If q is stable in D (i.e., for all decays d ∈ D, q ∉ agents(d)), then all
nodes labelled with q are terminal nodes.

4. If a node v labelled with q is not stable in D, the children of v are labelled
with the products of decays of q in D (i.e., the labels of the children of v
form the set {products(d) : d ∈ D, {q} = agents(d)}).
The irreversibility condition asserts that in a decay tree for a particle p, no
node other than the root is labelled with p.

Definition 8.2 A set of decays D satisfies the irreversibility condition if and only if
for each particle p involved in D, the only node in the decay tree for p given D
labelled with p is the root.
[10] I leave a rigorous proof of this fact for future work.
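Definition 8.2 can be checked mechanically. In this sketch (my own assumed representation: a dict mapping each unstable particle to its list of decays, each decay a list of products), a decay pump shows up as a particle that reappears below the root of its own decay tree.

```python
def violates_irreversibility(decays):
    """True iff some particle appears below the root of its own decay tree,
    i.e. the set of decays contains a 'decay pump'."""
    def reachable(p):
        # all particles appearing strictly below the root of p's decay tree
        seen, frontier = set(), [p]
        while frontier:
            q = frontier.pop()
            for products in decays.get(q, []):
                for prod in products:
                    if prod not in seen:
                        seen.add(prod)
                        frontier.append(prod)
        return seen
    return any(p in reachable(p) for p in decays)
```

A beta-decay-like set such as {'n': [['p', 'e', 'v']]} satisfies irreversibility, while {'a': [['b']], 'b': [['a']]} encodes a decay pump.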
Now we can state the following theorem. Let a set of decays D be given
in which a finite number of unstable particles p1, p2, ..., pn decay. (D may also
involve stable particles.) Suppose that the decays in D satisfy the irreversibility
condition. Then if we choose one decay di for each particle pi, the decays di are
linearly independent of each other.
There is no short informal argument for why this is true in general (Section
8.10 has the formal proof, which involves the pigeonhole principle), but
considering the case of two decays will illustrate the phenomenon. Let p1 and
p2 be two distinct unstable particles with decays d1 and d2, and let us suppose that
there is one other, stable particle p3. The encodings of d1 and d2 are of the
form d1 = (1, -m1, -n1) and d2 = (-m2, 1, -n2), where m1, m2, n1, n2 are
natural numbers. If m2 = 0, then clearly d1 and d2 are linearly independent.
If m2 > 0, then p1 appears among the products of d2; that is, d2 is of the form
p2 → p1 + .... Thus the irreversibility condition implies that p2 does not appear
among the products of d1; that is, m1 = 0. But then d1 = (1, 0, -n1) is clearly
not collinear with d2 = (-m2, 1, -n2).
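The non-collinearity claim in this two-decay case can be verified with exact arithmetic; the helper and the particular particle values below are my illustration (two vectors are collinear exactly when all their 2 x 2 minors vanish).

```python
from itertools import combinations

def collinear(u, v):
    """True iff u and v are scalar multiples of one another (all 2x2 minors vanish)."""
    return all(u[i] * v[j] == u[j] * v[i]
               for i, j in combinations(range(len(u)), 2))

# the two cases from the text, with p3 the stable particle:
d1 = (1, 0, -2)        # p1 -> 2*p3, with m1 = 0 as irreversibility forces
d2 = (-1, 1, -3)       # p2 -> p1 + 3*p3, i.e. m2 = 1 > 0
assert not collinear(d1, d2)
```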
Theorem 8.8 Let D = {d1, d2, ..., dn} be a finite set of decays of distinct
particles p1, p2, ..., pn (i.e., if i ≠ j, then pi ≠ pj). If D satisfies the irreversibility
condition, then the decays in D are linearly independent, that is,
rank({d1, d2, ..., dn}) = n.
The good news in Theorem 8.8 is that the number of quantum properties
is bounded by the number of stable particles, which shows that conservation
theories are a succinct way of organizing the mass of data about particles and
their reactions. But it is surprising that nature should be so cooperative as to
fit a formalism that we happen to find congenial,[11] namely that of selection
rules.[12]

One would like an explanation for why the number of linearly independent
decays that occur in nature, and linearly independent reactions in general, grows
immensely with each unstable particle, but then stops increasing to leave room
for the relatively small number of stable particles. Indeed, we may have some
doubts as to whether the conservation hypothesis is in fact true: perhaps among
the mass of reactions, there is one that violates one of the current selection rules
[11] Or, as Feynman put it, selection rules are easy to "guess" [Feynman 1965, p. 67]: "The
reason why we make these tables [of quantum properties] is that we are trying to guess at the
laws of nuclear interaction, and this is one of the quick ways of guessing at nature."

[12] One would have more confidence in a given quantum property or a virtual particle if we
could relate it to other physical phenomena. Indeed, physicists try to do just that. Thus
Feynman points out that physicists have made (unsuccessful) attempts to interpret baryons
as the source of a field, in analogy with charge. Similarly, after introducing a virtual particle,
they try to relate it to other phenomena, and of course to detect it. "You might say that the
only reason for the anti-neutrino is to make the conservation of energy right. But it makes
a lot of other things right, like the conservation of momentum and other conservation laws,
and very recently it has been directly demonstrated that such neutrinos do indeed exist."
[Feynman 1965, p. 76]
but hasn't been noticed by physicists, or more likely, just hasn't been observed
yet. A way of rigorously testing the conservation hypothesis would be to choose
quantum properties that account for the known decays of unstable particles
(a fairly small number, by Theorem 8.8), then systematically generate reactions
that would violate them (that is, reactions that are linearly independent of the
decays already accounted for) and check whether these have been observed or
could be produced in the laboratory.
8.10 Proofs
Fact 8.1 Let H be the set of alternative theories about what particles exist.

1. Without further background knowledge, it is impossible to reliably determine
what elementary particles exist. That is, the discovery problem for
H has no reliable solution given the vacuous background knowledge W.

2. If we assume that there are only finitely many particles, it is possible to
reliably determine the ontology of the particle world.
Proof. Part 1: Let δ be a particle theorist who aspires to finding the set of
actually existing elementary particles. An inductive kind of Cartesian demon
may prevent δ from achieving its goal as follows. Let H be the set of particles
that δ predicts exist before obtaining any evidence (i.e., H = particles(δ(∅))).
There are two cases: either H is finite or infinite. If the theorist initially predicts
that there are only finitely many elementary particles, then the demon presents a
reaction with a new particle not in H. The demon presents no further reactions
until δ changes its mind from its initial guess δ(∅) to a different theory. Secondly,
if the theorist initially predicts that there are infinitely many particles, then the
demon arranges a set E of reactions that feature only finitely many particles,
and keeps presenting E until δ changes its mind. If δ changes its mind from δ(∅)
to another theory T, the demon chooses the next set of observed reactions in
the same way, depending on whether particles(T) is finite or infinite. Let τ be
the sequence of reactions that results from this interplay between the theorist
δ and the demon. If δ aspires to eventually determining what particles exist,
δ's theories must eventually entail a proposition H about the actually existing
particles along τ. If infinitely many elementary particles exist according to H,
the demon allows only finitely many particles to occur along τ, and so δ stabilizes
to a false theory. If H asserts that only some finite number n of elementary
particles exist, the number of particles that are involved in the reactions in τ,
i.e. |particles(range(τ))|, is in fact n + 1, so δ is again mistaken about
what elementary particles exist.

Part 2: Simply conjecture that the existing particles are exactly the ones observed so
far. If there are only finitely many particles, our assumptions about the evidence
entail that eventually they will all be observed. Then this procedure stabilizes to
the correct ontology. □
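The Part 2 procedure is a one-liner; the toy run below (with hypothetical particle names) shows the conjecture stabilizing once every particle has appeared in some reaction.

```python
def occam_ontology(evidence):
    """Conjecture that exactly the particles observed so far exist.
    evidence: a list of reactions, each given as the set of particles it involves."""
    return set().union(*evidence) if evidence else set()

# a stream in which all three particles have appeared by the second reaction:
stream = [{'e-', 'p'}, {'n', 'p', 'e-'}, {'n', 'p'}, {'e-'}]
conjectures = [occam_ontology(stream[:k]) for k in range(1, len(stream) + 1)]
assert conjectures[1:] == [{'e-', 'p', 'n'}] * 3   # stabilized after step 2
```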
Proposition 8.2 Let H be the set of alternative theories about what particles
exist. Assume that there are only finitely many particles (i.e., assume FIN).
Then the Occam theorist is the only theorist that reliably identifies the true
ontology of the particle world, is data-minimal and minimaxes retractions with
respect to each ontological proposition in H.
Proof. If a theorist satisfies the Occam rule, that is, if δ(e) = particles(range(e)),
she settles the truth about each ontological proposition H with at most two mind
changes: her conjectures entail that H is false until the existence of all and only
the particles in H is entailed by the evidence report. If particles(range(e)) = H,
the theorist might change her mind once so that H is entailed by the theorist's
conjecture. If then the existence of another particle ruled out by H is entailed
by the evidence and the theorist's background assumptions, H is conclusively
refuted, and the Occam theorist rejects H forever after another mind change.

Next suppose that δ is reliable, data-minimal and minimaxes mind changes
with respect to what particles exist. Since δ is data-minimal, δ must take its conjectures
seriously by Proposition 6.1. Hence particles(δ(e)) ⊇ particles(range(e)).
Suppose that on some evidence report e, δ(e) predicts that some particle p exists
whose existence is not entailed by the evidence (and the applicable background
assumptions). Then the demon repeats the reactions in e, and provides no new
evidence to the theorist. Eventually the theorist must change her mind and conjecture
that p does not exist. At that point the demon allows p to be discovered,
and presents a set of reactions E such that particles(E) = particles(δ(e)). The
demon repeats E until δ predicts that the actually existing particles are exactly
those featured in E. At that point, δ has twice predicted that the actually
existing particles are exactly those in particles(E) = particles(δ(e)), and has thus
changed its mind twice about this ontological proposition. Now the demon can simply
let one more particle be discovered (since the theorist's background knowledge allows
any finite number of particles), so that δ has to change its mind a third time
about particles(E), to reject that hypothesis. Therefore δ does not minimax
mind changes with respect to what elementary particles exist. □
Fact 8.3 Let E be a finite set of observed reactions, and let Q be a conservation
theory consistent with E (i.e., E ⊆ ker(Q)). Then all linear combinations of
reactions in E are consistent with Q (i.e., span(E) ⊆ ker(Q)).
Proof. Let E = {e1, e2, ..., en}, and let r ∈ span(E). Then r = a1e1 + ... + anen
for some rationals ai. Since E ⊆ ker(Q), we have that [Q]ei = 0 for each ei, and
so, by the distributivity of matrix multiplication, [Q]r = [Q](a1e1 + ... + anen) =
a1[Q]e1 + ... + an[Q]en = 0. Hence r ∈ ker(Q). □
Fact 8.4 Let E be a finite set of observed reactions. Then there is a conservation
theory Q for the n observed particles in E such that Q is consistent with
all and only linear combinations of the reactions in E (i.e., ker(Q) = span(E)).
Moreover, the quantum properties in Q can be chosen to have integral values
only.
Proof. Restrict the logically possible reactions to the particles occurring in
E, so that we are considering the vector space Q^n. The orthogonal complement
of a set of vectors V is denoted by V⊥ and defined as V⊥ = {v' : v · v' =
0 for all v ∈ V}. That is, V⊥ is the set of all vectors that are orthogonal
to every vector in V. It is a standard fact that if V is a linear space, then
(V⊥)⊥ = V. Since span(E) is a subspace of Q^n, it follows from a familiar
theorem of linear algebra that Q^n is the direct sum of span(E) and [span(E)]⊥;
that is, every vector r ∈ Q^n can be uniquely written as rE ⊕ rE⊥, where ⊕
denotes vector addition and rE ∈ span(E), rE⊥ ∈ [span(E)]⊥. Now choose a
basis {b1, b2, ..., bm} for span(E), where m ≤ n. By another standard fact,
the orthogonal complement of a set is a subspace, so we may choose a basis
{qm+1, ..., qn} for [span(E)]⊥ such that {b1, b2, ..., bm} ∪ {qm+1, ..., qn} is a
basis for Q^n. I observe that (*) V⊥ = [span(V)]⊥. It is immediate that
[span(V)]⊥ ⊆ V⊥, because every vector that is orthogonal to all vectors in the
span of V is orthogonal to those in V. Conversely, let v⊥ be a vector in V⊥,
and let v be a vector in span(V). Then v = a1v1 + ... + akvk for rationals ai
and vectors vi ∈ V. So v⊥ · v = v⊥ · (a1v1 + ... + akvk) = a1(v⊥ · v1) + ... +
ak(v⊥ · vk) = 0, since v⊥ is orthogonal to each vector vi.

Let Q be the matrix whose rows are qm+1, ..., qn. The kernel of Q is the set of
all vectors that are orthogonal to each row in Q; so ker(Q) = {qm+1, ..., qn}⊥.
By (*), {qm+1, ..., qn}⊥ = [span({qm+1, ..., qn})]⊥. Since {qm+1, ..., qn} is
a basis for [span(E)]⊥, span({qm+1, ..., qn}) = [span(E)]⊥. Hence ker(Q) =
([span(E)]⊥)⊥ = span(E), as required. □
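The construction in this proof amounts to computing a basis of an orthogonal complement, which exact rational Gaussian elimination delivers. The helper below is my sketch, not the author's code; applying it twice recovers span(E) as the kernel of the resulting theory Q.

```python
from fractions import Fraction

def nullspace_basis(rows, n):
    """Basis of {v in Q^n : r . v = 0 for every r in rows}, computed by
    exact Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(n):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]           # normalize the pivot row
        for i in range(len(M)):
            if i != r and M[i][c] != 0:              # eliminate the column elsewhere
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n                        # one basis vector per free column
        v[f] = Fraction(1)
        for i, c in enumerate(pivots):
            v[c] = -M[i][f]
        basis.append(v)
    return basis

# E = {p1 -> p2} among three particles, encoded as (1, -1, 0):
E = [(1, -1, 0)]
Q = nullspace_basis(E, 3)          # quantum properties: a basis of [span(E)]-perp
kernel = nullspace_basis(Q, 3)     # ker(Q) recovers span(E)
assert len(kernel) == 1            # one-dimensional, as span(E) is
```

Clearing denominators in each row of Q then yields quantum properties with integral values, as the Fact requires.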
Proposition 8.5 Let E be a finite set of observed transitions, and let Q be any
conservation theory consistent with E (i.e., E ⊆ ker(Q)|D, where D are the
(detectable) particles involved in the transitions of E). Then Q is consistent with
all transitions in the integral span of the transitions in E (i.e., int-span(E) ⊆
ker(Q)|D).
Proof. Let E = {e1, e2, ..., en}, and let P be the set of particles in Q. For
each ei, choose a reaction ri involving only the particles in P such that the
visible part of ri is ei (i.e. ri|D = ei) and ri is physically possible according to
Q (that is, ri ∈ ker(Q)). Let t be a transition in the integral span of E, say
t = z1e1 + ... + znen for integers zi. Then [Q](z1r1 + ... + znrn) = z1[Q]r1 +
... + zn[Q]rn = 0, so r = z1r1 + ... + znrn is physically possible according to
Q. But r|D = z1r1|D + ... + znrn|D = z1e1 + ... + znen = t. So the empirical
content of Q includes the observation of t, as required. □
Proposition 8.6 Let E be a finite set of observed transitions. Then there is
a conservation theory Q whose empirical content is exactly the integral span of
the reactions in E (i.e., ker(Q)|D ∩ R = int-span(E), where D is the set of
(detectable) particles involved in the transitions of E).
Proof. Let E = {e1, e2, ..., en}. I show how to find Q inductively. Let Q0 be
the |D| × |D| identity matrix. Clearly ker(Q0) = {0}. Inductively, suppose that
ker(Qk)|D ∩ R = int-span({e1, e2, ..., ek-1}). Reinterpret the transition ek =
agents(ek) → products(ek) by adding a hidden particle h among the products, to
arrive at the "actual" reaction ek^h = agents(ek) → products(ek) + h. Formally,
particles(ek^h) = particles(ek) ∪ {h}, ek^h|particles(ek) = ek, and ek^h(h) = -1.
Note that ek^h|D = ek, since all particles reported in ek are detectable. Extend
the matrix of quantum properties Qk with |P| columns to a matrix Qk+1 with
|P| + 1 columns by modifying each quantum property (row) q in Qk as follows:
q'|P = q|P, and q'(h) = q · ek. That is, consider each quantum property q,
and assign to h the value q'(h) that is required to restore the balance of q in
ek, if there is an imbalance. Geometrically, the introduction of the new particle
adds a dimension to the vector space. The component of each quantum property
in the new dimension is chosen just so that it balances the count of ek for that
quantum property.

I now argue that the empirical content of Qk+1 is exactly the integral span
of {e1, e2, ..., ek}, that is, (ker(Qk+1) ∩ R)|D = int-span({e1, e2, ..., ek}).

(⊇) By hypothesis, choose for each i ≤ k-1 a reaction ri among only the
particles in P that is physically possible according to Qk and whose visible part
is ei. Since ri(h) = 0, for each extended quantum property q' we have that
q' · ri = q · ri + q'(h)·ri(h) = q · ri = 0, by the assumption that ri is physically
possible according to Qk. In other words, ri conserves all of the extended quantum
properties in Qk+1, because ri conserves the unextended quantum properties
in Qk and does not feature the hidden particle h. Therefore ri is physically possible
according to Qk+1. Moreover, q · ek^h = q · ek for each unextended quantum
property q, because q(h) = 0. Hence for each extended quantum property q',
we have that q' · ek^h = q · ek + ek^h(h)·q'(h) = q · ek - q · ek = 0. So ek^h is
physically possible according to Qk+1, and the visible component of ek^h is ek.
It follows by Proposition 8.5 that the physically possible transitions according
to Qk+1 include all transitions in the integral span of {e1, e2, ..., ek}.

(⊆) Note that Qk+1 contains as many linearly independent quantum properties
as Qk, that is, rank(Qk+1) = rank(Qk). The dimension of the kernel
of Qk is called the nullity of Qk, written null(Qk). By a standard theorem of
linear algebra, the nullity of Qk is the number of particles in P minus the rank
of Qk, and the nullity of Qk+1 is |P| + 1 - rank(Qk+1) = null(Qk) + 1. Let
m = null(Qk), and let {b1, b2, ..., bm} be a basis for ker(Qk) comprising
logically possible reactions involving only the particles in P. Then ek^h is
independent of {b1, b2, ..., bm}, since ek^h(h) = -1 and the reactions encoded
by b1, b2, ..., bm do not involve h. So {b1, b2, ..., bm, ek^h} is a
basis for ker(Qk+1).

Now let r ∈ ker(Qk+1) encode a physically possible reaction according to
Qk+1 among the particles in P ∪ {h}. Then r = a1b1 + ... + ambm + a·ek^h for
rationals a, a1, ..., am. Since none of the reactions encoded by the bi involve h,
and ek^h(h) = -1, a must be an integer if r is a possible reaction. Let r_old =
a1b1 + ... + ambm and r_new = a·ek^h. Then r_old is a possible reaction according
to Qk, that is, r_old ∈ ker(Qk) ∩ R(P), and so by the inductive hypothesis, the
visible component of r_old is in the integral span of {e1, e2, ..., ek-1}; let
r_old|D = z1e1 + ... + zk-1ek-1 for integers zi. Thus the visible component of r
is in the integral span of {e1, e2, ..., ek}, since r|D = r_old|D + r_new|D =
z1e1 + ... + zk-1ek-1 + a·ek, where a is an integer. This shows that the visible
component of every reaction that is physically possible according to Qk+1 lies in
the integral span of {e1, e2, ..., ek}, which completes the inductive step.
Continuing this process until k = n, we obtain the desired conservation
theory Q. □
Fact 8.7 Let E be a finite set of reactions. Then a reaction r is in the span of
E if and only if some multiple r' of r is in the integral span of E. That is,
span(E) ∩ R = fractions(int-span(E)).
Proof. (⇐) Immediate.

(⇒) Let E = {e1, e2, ..., en}, and let r = a1e1 + ... + anen encode a reaction
in the span of E involving the particles reported in E. Since r encodes a
reaction, the ai must be rationals; let ai = pi/qi for each i, and let m = q1q2···qn.
Then mr = Σi m(pi/qi)ei = Σi (m/qi)pi ei, and each coefficient (m/qi)pi is an
integer, so mr is in the integral span of E. Hence r is a fraction of mr, which is
in the integral span of E. □
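The denominator-clearing step in this proof is ordinary arithmetic; a quick check under illustrative coefficients:

```python
from fractions import Fraction
from math import prod

# if r = sum of a_i * e_i with rational coefficients a_i = p_i/q_i, then
# multiplying r by the product of the q_i clears every denominator:
a = [Fraction(1, 2), Fraction(2, 3), Fraction(-3, 5)]
m = prod(f.denominator for f in a)            # m = 2 * 3 * 5 = 30
assert all((m * f).denominator == 1 for f in a)
```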
Theorem 8.8 Let D = {d1, d2, ..., dn} be a finite set of decays of distinct
particles p1, p2, ..., pn (i.e., if i ≠ j, then pi ≠ pj). If D satisfies the irreversibility
condition, then the decays in D are linearly independent, that is,
rank({d1, d2, ..., dn}) = n.
Proof. Suppose for reductio that D is not linearly independent, and let
a1d1 + a2d2 + ... + andn = 0 be a non-trivial linear combination. I show that
(*) for every particle pi there is a particle pk such that pi is contained in the
decay tree for pk given D. For consider the decay di of particle pi. If di is of
the form pi → pi + ..., then pi appears in the decay tree for pi. Otherwise di is
of the form pi → ..., where pi does not appear on the right-hand side among the
decay products. Assume without loss of generality that ai > 0. Since
a1d1 + ... + andn = 0, this implies that there is some k ≠ i such that either dk is
of the form pi → ... and ak < 0, or dk is of the form pk → pi + ... and ak > 0.
The first case is impossible, because we assume that pi only decays in decay di.
So the second case must obtain, and hence pi is in the decay tree of pk given
D. This establishes (*). Now by (*) there exists a sequence of particles q0, q1,
q2, ... such that q0 = p1 and, for each i, qi+1 is a particle in P with qi in the
decay tree of qi+1 given D. Since the descendant relation in a tree is transitive,
we have that (**) for all i and all j > i, qi is in the decay tree of qj given D.
Furthermore, by the pigeonhole principle, at least one of the n particles in P
must appear twice in the sequence q0, ..., qn of length n + 1. By (**), this
implies that some particle p is in the decay tree of p given D. So if D is not
linearly independent, D violates the irreversibility condition. □
Chapter 9
Admissibility in Games
9.1 Outline
We have seen several times that admissibility (with respect to some epistemic
value) leads to short-run constraints on scientific methods. The theorems that
characterize admissible methods (3.1, 3.2, 4.1, 6.1, 6.2, 6.7) have a common structure:
admissible methods are those that satisfy certain constraints at each stage
of inquiry. In this chapter, I show that this characteristic of admissible methods
is not limited to inductive problems, but stems from a more general fact of game
theory: in any game of perfect information (and inductive problems such as discovery
and theory-discovery problems are games of perfect information between the
scientist and "nature"), a strategy for the game is admissible if and only if the
strategy is admissible at each stage of the game, or at each "subgame" as game
theorists would say.
Indeed, this phenomenon is much more general: it applies even in games of
imperfect information, in which players may not always know everything about
the past history of the game when they are choosing their move. Again, admissible
strategies in games of imperfect information are exactly those that are
admissible at each stage of the game.[1]

[1] With a proviso: this claim may fail in games of imperfect recall, in which players forget
what they once knew or did; see Definition 9.4. In micro-economics, games with imperfect
recall are of marginal interest. Throughout this thesis I have modelled inquiry as a game
of perfect recall. On the other hand, retracting background knowledge leads to a loss of
recall, because the inquirer no longer knows what she once knew. Hence the results from
this chapter do not apply to a methodological setting that allows retractions of background
knowledge (such as the one in [Levi 1980]).

Most games of interest to economists,
political scientists and political philosophers for modeling social and economic
interactions are games of imperfect information. I apply my insights into the
structure of admissible strategies to derive predictions about how particular
games will be played, based on the assumption that players will not follow
dominated strategies. It is natural to iterate this idea: after ruling out
dominated strategies once in their deliberations before playing the game, I imagine
that players eliminate strategies that are dominated then, and so on. I show that
the resulting deliberation procedure has several attractive formal properties that
game theorists have long sought in a "solution concept" for making predictions
in a game [Kohlberg and Mertens 1986]:
• The iterated elimination procedure gives the same answers when applied to the set of strategies in a game as it does when applied to the game tree that explicitly models the dynamics of the game. This means that the procedure's results do not depend on how exactly we represent the dynamics of a particular strategic situation.
• The elimination procedure generalizes the standard backward induction solution for games of perfect information; it sometimes makes stronger predictions than backward induction, but never weaker ones.
• In games of imperfect information, iterated dominance underwrites typical forward induction arguments.
Although the deliberation procedure based on iterated admissibility is natural and its results are intuitively attractive, it remains an open problem to justify the assumption that players would reason about a game in this way, or to give a decision-theoretic foundation for my solution concept (perhaps based on common knowledge of "rationality", for a suitable definition of rationality, rather than on deliberational dynamics). [Bicchieri and Schulte 1997] discusses some of the issues involved. The versions of the results from this chapter for finite games were previously published in [Bicchieri and Schulte 1997].
9.2 Preliminaries
9.2.1 Extensive and Strategic Form Games
I introduce some basic notions for describing deterministic games in extensive form, which originate with [Von Neumann and Morgenstern 1947] and [Kuhn 1953]. The reader who is unfamiliar with game trees may find it useful to review the game trees displayed in Figures 9.1–9.7 while going through the following definitions.
An extensive form game for players N = {1, 2, ..., n} is described by a game tree T with nodes V and root r, payoff functions u_i for each player i, and information sets I_i for each player i. The information sets partition the nodes of the tree, and if information set I_i belongs to player i, then player i moves at all nodes in I_i. A maximal path in a game tree T is called a history of T. The payoff function u_i assigns a payoff to player i for each history in T. For each node x in T, I(x) is the information set containing x. A (pure) strategy s_i for player i in a game tree T assigns a unique action, called a move, to each information set I_i of player i in T. I denote the set of i's pure strategies in T by S_i(T). A strategy profile in T is a vector (s_1, s_2, ..., s_n) consisting of one strategy for each player i. I denote the set of pure strategy profiles in T by S(T); that is, S(T) = ×_{i∈N} S_i(T). I use 's' for a generic strategy profile. It is useful to write s_{-i} for a vector of length n-1 consisting of strategy choices by player i's opponents. I write S_{-i}(T) for the set of strategy profiles of i's opponents; that is, S_{-i}(T) = ×_{j∈N\{i}} S_j(T).
In the games that I shall consider, the root is the only member of its information set (i.e., I(r) = {r}), so a strategy profile s in T determines a unique history ⟨r, x_1, x_2, ..., x_n, ...⟩. I refer to this history as the play sequence resulting from s, and denote it by play(s). When a pure strategy profile s in T is played, each player receives as payoff the payoff from the play sequence resulting from s. With some abuse of notation, I use u_i to denote both a function from strategy profiles to payoffs for player i, as well as a function from histories to a payoff for player i, and define u_i(s) = u_i(play(s)). For a finite game tree T, the height of a node x in T is denoted by h(x), and defined recursively by h(x) = 0 if x is a terminal node in T, and h(x) = 1 + max{h(y) : y is a successor of x in T} otherwise.
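For readers who prefer code, the height recursion just given can be sketched as follows; the dictionary encoding of a tree is mine, purely for illustration.

```python
# A minimal sketch of the height function h: h(x) = 0 at a terminal node,
# and h(x) = 1 + max{h(y) : y a successor of x in T} otherwise.
def height(tree, x):
    """tree maps each node to the list of its successors; leaves have none."""
    successors = tree.get(x, [])
    if not successors:                 # terminal node
        return 0
    return 1 + max(height(tree, y) for y in successors)

# A small tree: root r with a leaf child and a chain of two further nodes.
tree = {'r': ['leaf1', 'v'], 'v': ['w'], 'w': ['leaf2']}
print(height(tree, 'r'))  # → 3
```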
A game G in strategic form is a triple ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, where N is the set of players and, for each player i ∈ N, S_i is the set of (pure) strategies available to i, and u_i is player i's payoff function. Given a strategy profile s = (s_1, ..., s_n), u_i(s) denotes the payoff to player i when players follow the strategies (s_1, ..., s_n). I denote the strategic form of an extensive form game T by the collection S(T) of strategies in T, with payoffs defined as in T.
9.2.2 Restricted Game Trees
I shall describe procedures for deliberation that eliminate possible plays of a
game before it is played. To do so, I introduce some notation for describing the
result of eliminating possibilities in a game. For games in extensive form, I refer
to the result of eliminating possibilities as a restricted game tree.
Definition 9.1 Restricted Game Trees

• Let T be a finite game tree for players N = {1, 2, ..., n}.

• T|V is the restriction of T to V, where V is a subset of the nodes in T. All information sets in T|V are subsets of information sets in T.

• T_x is the game tree starting at node x (i.e., T_x is the restriction of T to x and its successors). If I(x) = {x}, then T_x is called a subgame.

• If s_i is a strategy for T and T' is a restriction of T, s_i|T' is the strategy that assigns to each information set in T' the same choice as in T. Formally, s_i|T'(I'_i) = s_i(I_i), where I_i is the (unique) information set in T that contains all the nodes in I'_i. Note that s_i|T' is not necessarily a strategy in T', for the move assigned by s_i at an information set I_i in T may not be possible in T'.

• If s is a strategy profile in T and T' is a restriction of T, s|T' is the strategy vector consisting of s[i]|T' for each player i.

• Let S ⊆ S(T) be a collection of strategy profiles in a game tree T with players N. Then a node x is consistent with S if and only if there is a strategy profile s in S such that x is part of the play sequence resulting from s, i.e., x ∈ range(play(s)). The restriction of T to the nodes consistent with S is denoted by T|S. I observe that T|S(T) = T.

• A node x is consistent with a strategy s_i by player i in T just in case there is a strategy profile s_{-i} in T such that x appears in the play sequence play(s_i, s_{-i}).
9.3 Admissibility in Games
Consider a set of strategy profiles S = S_1 × S_2 × ... × S_n, and two strategies s_i, s'_i ∈ S_i of player i. I say that a strategy s_i for player i is consistent with S just in case there is a strategy profile s in S such that s[i] = s_i.

Player i's strategy s_i is weakly dominated by her strategy s'_i given S just in case:

1. for all (n-1)-tuples s_{-i} chosen by i's opponents that are consistent with S, u_i(s_i, s_{-i}) ≤ u_i(s'_i, s_{-i}), and

2. for at least one (n-1)-tuple s_{-i} consistent with S, u_i(s_i, s_{-i}) < u_i(s'_i, s_{-i}).

A strategy s_i is weakly dominated given S just in case there is a strategy s'_i consistent with S such that s'_i weakly dominates s_i given S. A strategy s_i is admissible given S just in case s_i is not weakly dominated given S.
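As an illustrative sketch (the encoding and function names are mine, not part of the thesis formalism), the two clauses of this definition translate directly into code. The example payoffs are those of player 2 in the strategic form of the Figure 9.1 game discussed below: her strategy a guarantees her the payoff 2 whenever her information set is reached, while b yields her 0 if player 1 follows with R2.

```python
# A minimal sketch of the weak dominance test: s'_i weakly dominates s_i
# given S iff s'_i never does worse against any opponent profile consistent
# with S, and does strictly better against at least one.
def weakly_dominates(u_i, s_prime, s, opponents):
    """u_i(own_strategy, opponent_profile) -> payoff for player i."""
    no_worse = all(u_i(s_prime, t) >= u_i(s, t) for t in opponents)
    better_once = any(u_i(s_prime, t) > u_i(s, t) for t in opponents)
    return no_worse and better_once

# Player 2's payoffs in the strategic form of the Figure 9.1 game: player 1's
# strategies fix a choice at both of his nodes; R1 ends the game with payoff
# (2,0), so player 2 then receives 0 whatever she planned.
u2 = {('a', ('L1', 'L2')): 2, ('a', ('L1', 'R2')): 2,
      ('b', ('L1', 'L2')): 2, ('b', ('L1', 'R2')): 0,
      ('a', ('R1', 'L2')): 0, ('a', ('R1', 'R2')): 0,
      ('b', ('R1', 'L2')): 0, ('b', ('R1', 'R2')): 0}
S1 = [('L1', 'L2'), ('L1', 'R2'), ('R1', 'L2'), ('R1', 'R2')]

print(weakly_dominates(lambda s, t: u2[(s, t)], 'a', 'b', S1))  # → True
```

So b is weakly dominated given the full strategy set, while a is not.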
My goal in this section is to show that a strategy is admissible in the strategic form of an extensive form game just in case the strategy is admissible at each stage of the game, that is, at each information set. The next definition formulates the notion of a strategy being admissible at an information set. Informally, a strategy s_i is weakly dominated by another strategy s'_i at an information set I_i in a game tree T if s'_i never yields less to i at I_i than s_i does, and sometimes yields more. For example, in the game of Figure 9.1, a weakly dominates b for player 2, because a yields player 2 the payoff 2 for sure, while b may yield only 0 if player 1 plays R2.² And in the game of Figure 9.6, b and c weakly dominate a at 2's information set.

²Here and elsewhere, the payoff at a terminal node is given as a pair (x, y), where x is the payoff for player 1 and y is the payoff for player 2.
[Game tree omitted in this transcription. As described in the text: player 1 chooses L1 or R1 at the root; R1 ends the game with payoff (2,0); after L1, player 2 chooses a, which ends the game with payoff (1,2), or b, after which player 1 chooses between L2, with payoff (3,2), and R2, with payoff (1,0).]

Figure 9.1: Admissibility in a Game of Perfect Information. The label inside a node indicates which player is choosing at that node.
Definition 9.2 Admissibility at an Information Set

• Let T be a finite game tree for players N = {1, 2, ..., n}.

• I define the payoff to player i from strategy s_i and strategy profile s_{-i} at x, written u_i(s_i, s_{-i}, x), to be u_i(s_i, s_{-i}, x) = u_i(s_i|T_x, s_{-i}|T_x).

• A strategy s_i is weakly dominated by another strategy s'_i at an information set I_i belonging to i in T just in case

1. for all strategy profiles s_{-i} in T, and for all y in I_i, u_i(s_i, s_{-i}, y) ≤ u_i(s'_i, s_{-i}, y), and

2. for some strategy profile s_{-i} and some node y in I_i, u_i(s_i, s_{-i}, y) < u_i(s'_i, s_{-i}, y).

• A strategy s_i is admissible at an information set I_i in T just in case s_i is not weakly dominated at I_i.
With two qualifications, I shall prove that a strategy is admissible in an extensive form game just in case it is admissible at each information set. The first qualification concerns the subtle point that a strategy may make prescriptions at information sets that cannot be reached when the strategy is played. For example, in the game of Figure 9.1, the strategy (R1R2) for player 1 yields the same payoff as (R1L2). Hence both are admissible strategies for the overall game, although (R1L2) is admissible at 1's second information set and (R1R2) is not. Evaluating strategies only with respect to information sets that are consistent with them leads to what [Bicchieri and Schulte 1997] call proper weak dominance, and proper admissibility. So in the game of Figure 9.1, (R1R2) is properly admissible at 1's second information set.

I say that a node x in a game tree T is consistent with a strategy s_i if there is some play sequence play(s) that reaches x such that s_i = s[i]. An information set I in a game tree T is reachable with a strategy s_i if some node in I is consistent with s_i.
Definition 9.3 Sequential Proper Admissibility

• Let T be a finite game tree.

• A strategy s_i is properly weakly dominated at an information set I_i belonging to i in T just in case I_i is reachable with s_i and s_i is weakly dominated at I_i.

• A strategy s_i is properly admissible at an information set I_i just in case s_i is not properly weakly dominated at I_i.

• A strategy s_i is sequentially properly admissible in T if and only if s_i is properly admissible at each information set I_i in T that belongs to player i.
However, it is still not always the case that a strategy that is admissible in the strategic form of a game is sequentially properly admissible in an extensive form of the game. For example, in the game of Figure 9.2, the strategy L is properly weakly dominated for player 2 at her information set: at node y, R yields a higher payoff than L, and starting at node x, both L and R yield the same. On the other hand, node y cannot be reached when 2 plays L, so L is an admissible strategy for the overall game, yielding 2's maximal payoff of 1. The game in Figure 9.2 has the strange feature that if 2 plays R at x to arrive at y, she has 'forgotten' this fact and cannot distinguish between x and y. Indeed, this is a game without perfect recall. Informally, a game is one with perfect recall if no player ever forgets what they knew or did. The formal definition of perfect recall is as follows.
Definition 9.4 (Harold Kuhn) Let T be a finite game tree. Then T is an extensive game with perfect recall if and only if for each information set I_i belonging to player i, and each strategy s_i in T, all nodes in I_i are consistent with s_i if any node in I_i is.
I note that if T is a game with perfect recall, then all restrictions of T satisfy perfect recall. My main result is that in extensive form games with perfect recall, the notion of proper weak dominance coincides exactly with admissibility among strategies for the overall game (admissibility in the strategic form).

Theorem 9.1 Let T be a game tree with perfect recall. Then a strategy s_i for player i is admissible in the strategic form S(T) if and only if s_i is sequentially properly admissible in T.
9.4 Iterated Admissibility
A player who reasons with the help of admissibility alone would not go very far in eliminating plays of the game, unless he assumes that the other players are also applying the same principle. In the game of Figure 9.3, for example, player 1 could not eliminate a priori any play of the game unless he assumed that player 2 never plays a dominated strategy.
In general, even assuming that other players choose admissible strategies
might not be enough to rule out possibilities about how a given game might be
played. Players must reason about other players' reasoning, and such mutual
reasoning must be common knowledge. Unless otherwise specified, I shall assume that players have common knowledge of the structure of the game and of their following the admissibility principle, and examine how common reasoning about admissibility unfolds.
My procedure for capturing common reasoning about sequential weak ad-
missibility in T is the following. First, eliminate at each information set in T all
[Game tree omitted in this transcription. As described in the text: nodes x and y belong to the same information set of player 2; at x, L ends the game with payoff (0,1), while R leads toward y; at y, L yields (0,0) and R yields (0,1).]

Figure 9.2: A Game Without Perfect Recall
[Game tree omitted in this transcription. As described in the text: player 1 chooses a, b, or c at the root; c ends the game with payoff (0,0); the nodes reached by a and b form a single information set for player 2, who chooses L, yielding payoff (1,1), or R, yielding payoff (0,0), at either node.]

Figure 9.3: Weak Admissibility
moves that are inconsistent with the admissibility principle, that is, dominated choices. The result is a restricted game tree T'.

Repeat the pruning procedure with T' to obtain another restricted game tree, and continue until no moves in the resulting game tree are weakly dominated. Note that the recursive pruning procedure does not start at the final information sets. This procedure allows players to consider the game tree as a whole and to start eliminating branches anywhere in the tree by applying admissibility.

I define the result of common reasoning about sequential proper admissibility as follows. For a given game tree T, let Seq-PA_i(T) = {s_i ∈ S_i(T) : s_i is sequentially properly admissible in T}, and let Seq-PA(T) = ×_{i∈N} Seq-PA_i(T).
Definition 9.5 Common Reasoning About Sequential Proper Admissibility

• Let T be a game tree, with players N = {1, 2, ..., n}.

• The strategies in T consistent with common reasoning about sequential proper admissibility are denoted by CRPSeq(T), and are defined as follows:

1. PSeq^0(T) = S(T).

2. PSeq^{j+1}(T) = Seq-PA(T|PSeq^j(T)).

3. s ∈ CRPSeq(T) ⟺ ∀j: s|[T|PSeq^j(T)] ∈ PSeq^{j+1}(T).
Next, consider a game G in strategic form. I define an order-free iterative procedure for eliminating weakly dominated strategies. If S is a set of strategy profiles, let Admiss_i(S) be the set of all strategies s_i for player i that are consistent with S and admissible given S, and let Admiss(S) = ×_{i∈N} Admiss_i(S).
Definition 9.6 Common Reasoning About Admissibility in the Strategic Form

• Let the strategic form of a finite game G be given by ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, and let S = S_1 × S_2 × ... × S_n be the set of strategy profiles in G.

• The strategies in S consistent with common reasoning about admissibility are denoted by CRAd(S), and are defined as follows:

1. Ad^0(S) = S.

2. Ad^{j+1}(S) = Admiss(Ad^j(S)).

3. CRAd(S) = ∩_{j=0}^∞ Ad^j(S).
        L     M     R
  a    1,3   3,2   1,2
  b    2,2   2,0   0,0
  c    2,1   1,2   0,0

Figure 9.4: Order-Free Elimination of Weakly Dominated Strategies
The procedure goes through at most Σ_{i∈N} (|S_i| - 1) iterations; that is, for all j ≥ Σ_{i∈N} (|S_i| - 1), Ad^j(S) = Ad^{j+1}(S).
To illustrate common reasoning about admissibility, consider the game in Figure 9.4. In the first iteration, player 1 will eliminate c, which is weakly dominated by b, and player 2 will eliminate R, which is dominated by L and M. Since admissibility is common knowledge, both players know that the reduced matrix only contains the strategies a, b and L, M. Common reasoning about admissibility means that both players will apply admissibility to the new matrix (and know that they both do it), and since now L dominates M, both will know that M is being eliminated. Finally, common reasoning about admissibility will leave b, L as the unique outcome of the game.
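These elimination rounds can be replayed mechanically. The following sketch (the encoding is mine; the payoff matrix is my reading of Figure 9.4, chosen to match the elimination claims in the text) implements the order-free procedure of Definition 9.6 for two players:

```python
# A sketch of order-free iterated elimination of weakly dominated strategies
# (Definition 9.6) for a two-player game in strategic form.

# Payoff entries (player 1's payoff, player 2's payoff) of the Figure 9.4 game.
U = {
    ('a', 'L'): (1, 3), ('a', 'M'): (3, 2), ('a', 'R'): (1, 2),
    ('b', 'L'): (2, 2), ('b', 'M'): (2, 0), ('b', 'R'): (0, 0),
    ('c', 'L'): (2, 1), ('c', 'M'): (1, 2), ('c', 'R'): (0, 0),
}

def admissible(own, rivals, pay):
    """Strategies in `own` not weakly dominated, given the rivals' set."""
    keep = []
    for s in own:
        dominated = any(
            all(pay(t, r) >= pay(s, r) for r in rivals) and
            any(pay(t, r) > pay(s, r) for r in rivals)
            for t in own if t != s)
        if not dominated:
            keep.append(s)
    return keep

def crad(rows, cols, U):
    """Both players simultaneously discard weakly dominated strategies in
    each round, until a fixed point is reached."""
    while True:
        new_rows = admissible(rows, cols, lambda s, r: U[(s, r)][0])
        new_cols = admissible(cols, rows, lambda s, r: U[(r, s)][1])
        if (new_rows, new_cols) == (rows, cols):
            return rows, cols
        rows, cols = new_rows, new_cols

print(crad(['a', 'b', 'c'], ['L', 'M', 'R'], U))  # → (['b'], ['L'])
```

Round one removes c and R, round two removes M, and a final round removes a, leaving (b, L) as in the text.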
From my main result it follows that in games with perfect recall, iterated sequential proper admissibility and order-free elimination of inadmissible strategies in the strategic form yield exactly the same result.

Theorem 9.2 Let T be a game tree with perfect recall. A strategy profile s is consistent with common reasoning about sequential proper admissibility in T if and only if s is consistent with common reasoning about admissibility in the strategic form of T. That is, CRPSeq(T) = CRAd(S(T)).
In infinite games, there may not be any admissible strategy (a trivial example is the one-player, one-move game in which the player may choose a payoff as high as he pleases). For finite games, general existence is easy to establish.

Proposition 9.3 For all finite games G with pure strategy profiles S, CRAd(S) ≠ ∅.
9.5 Strict Dominance and Backward Induction
In this section I compare (iterated) sequential proper admissibility with two
other standard recommendations for reasoning about extensive form games:
backward and forward induction.
I establish that in finite games of perfect information, common reasoning about weak admissibility (eliminating strictly dominated strategies) gives exactly the same results as Zermelo's backward induction algorithm, which in finite games of perfect information corresponds to Selten's notion of subgame perfection [Osborne and Rubinstein 1994, Ch.6]. I then show by examples that the tight connection between common reasoning about weak admissibility and subgame perfection breaks down in games of imperfect information.
A strategy is sequentially weakly admissible in a game tree T if it is weakly admissible at each information set in T. A strategy s_i for player i is not weakly admissible at a given information set I_i if the strategy is strictly dominated at I_i. This means that there is some other strategy s'_i that yields i a better outcome than s_i at every node x in I_i. For example, in the game of Figure 9.3, playing left (L) at 2's information set strictly dominates playing right (R). The formal definition of sequential weak admissibility is the following.
Definition 9.7 Strict Dominance and Weak Admissibility in Extensive Form Games

• Let T be a finite game tree for players N = {1, 2, ..., n}.

• A strategy s_i is strictly dominated by another strategy s'_i at an information set I_i belonging to i in T just in case for all strategy profiles s_{-i} in T, and for all y in I_i, u_i(s_i, s_{-i}, y) < u_i(s'_i, s_{-i}, y).

• A strategy s_i is weakly admissible at an information set I_i in T just in case s_i is not strictly dominated at I_i.

• A strategy s_i is sequentially weakly admissible in T if and only if s_i is weakly admissible at each information set I_i in T that belongs to player i.
As with (proper) admissibility, by iteratively applying weak admissibility to eliminate possible plays we obtain predictions for the course of the game. To illustrate the procedure, look at the game of Figure 9.3. R is eliminated at 2's information set in the first iteration, and then c is eliminated for player 1 because, after R is eliminated, either a or b yields player 1 a payoff of 1 for sure, while c yields 0. This pruning procedure is formally defined as follows. For a given game tree T, let Weak-Ad_i(T) = {s_i ∈ S_i(T) : s_i is sequentially weakly admissible in T}, and let Weak-Ad(T) = ×_{i∈N} Weak-Ad_i(T).
Definition 9.8 Common Reasoning about Sequential Weak Admissibility

• Let T be a finite game tree for players N = {1, 2, ..., n}.

• The strategies in T consistent with common reasoning about sequential weak admissibility are denoted by CRWA(T), and are defined as follows:

1. WA^0(T) = S(T).

2. WA^{j+1}(T) = Weak-Ad(T|WA^j(T)).

3. s ∈ CRWA(T) ⟺ ∀j: s|[T|WA^j(T)] ∈ WA^{j+1}(T).
If T is a finite game tree, the set S_i(T) of strategies for player i is finite, and our procedure will go through only finitely many iterations. To be precise, let max = Σ_{i∈N} (|S_i| - 1); then the procedure will terminate after max iterations, i.e., for all j ≥ max, WA^j(T) = WA^{j+1}(T).
To describe Zermelo's backward induction algorithm, I introduce the concept of Nash equilibrium and one of its refinements, subgame perfection, for generic finite games in extensive form. A strategy s_i in a game tree T is a best reply to a strategy profile s_{-i} of i's opponents if there is no strategy s'_i for player i such that u_i(s'_i, s_{-i}) > u_i(s_i, s_{-i}). A strategy profile s is a Nash equilibrium if each strategy s[i] in s is a best reply against s[-i]. A strategy profile s is a subgame perfect equilibrium if for each subgame T_x of T, (s|T_x) is a Nash equilibrium of T_x. I say that a strategy s_i in T is consistent with subgame perfection if there is a subgame perfect strategy profile s of which s_i is a component strategy, that is, s_i = s[i]. I denote the set of player i's strategies in T that are consistent with subgame perfection by SPE_i(T), and define the set of strategy profiles consistent with subgame perfection by SPE(T) = ×_{i∈N} SPE_i(T). Note that not all strategy profiles that are consistent with subgame perfection are subgame perfect equilibria. In Figure 9.5, all strategy profiles are consistent with subgame perfection, but (L, ba') and (R, ab') are not equilibria, since in equilibrium 1 must be playing a best reply to 2's strategy.
Finally, T is a game of perfect information if each information set I of T
is a singleton. The game in Figure 9.5 is a game of perfect information.
A standard approach to finite games of perfect information is to apply Zermelo's backward induction algorithm, which yields the set of strategy profiles that are consistent with subgame perfection (i.e., SPE(T)) [Osborne and Rubinstein 1994, Ch.6.2]. Common reasoning about weak admissibility, as defined by the procedure WA, does not follow Zermelo's backward induction algorithm. For
[Game tree omitted in this transcription. As described in the text: player 1 chooses L or R at the root; after L, player 2 chooses a, with payoff (1,0), or b, with payoff (0,0); after R, player 2 chooses a', with payoff (1,0), or b', with payoff (0,0).]

Figure 9.5: A Game of Perfect Information
example, suppose that in a game tree a move m at the root is strictly dominated by another move m' at the root for the first player. Common reasoning about weak admissibility rules out m immediately, but the backward induction algorithm eliminates moves at the root only at its last iteration. Nonetheless, in games of perfect information, the final outcome of the two procedures is the same: In these games, the strategies that are consistent with common reasoning about sequential weak admissibility are exactly those consistent with subgame perfection.
Proposition 9.4 Let T be a finite game tree of perfect information. Then a strategy s_i is consistent with common reasoning about sequential weak admissibility in T if and only if s_i is consistent with subgame perfection. That is, CRWA(T) = SPE(T).
In games of imperfect information, the equivalence between strategies consistent with subgame perfection and those consistent with common reasoning about sequential weak admissibility fails in both directions. Figure 9.3 shows that a strategy profile s may be a subgame perfect equilibrium although s is not consistent with common reasoning about sequential weak admissibility: The strategy profile (c, R) is a subgame perfect equilibrium, but R and (hence) c are not consistent with common reasoning about sequential weak admissibility. And in Figure 9.6, a is not strictly dominated for player 2, but a is neither a best reply to L nor to R.
As we may expect, common reasoning about admissibility makes stronger predictions about the course of the game than common reasoning about weak admissibility. What is more surprising is that this can happen even in games of perfect information. In light of Proposition 9.4, this means that the (iterated) admissibility principle may rule out plays that are consistent with the standard backward induction solution to games of perfect information. For an example, consider again the game of Figure 9.1. Common reasoning about admissibility rules out b as a choice for player 2 because b is weakly dominated. Then, given that only a remains at 2's decision node, R1 (strictly) dominates L1 for player 1. So the only play consistent with common reasoning about sequential proper admissibility is for player 1 to play R1 and end the game. Note, however, that common reasoning about sequential weak admissibility, that is, the standard backward induction procedure, is consistent with both R1 and the play sequence L1, b, L2.
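The contrast can be checked mechanically. The following sketch (the tree encoding and function names are mine) computes the payoff vectors consistent with subgame perfection in the Figure 9.1 game by backward induction, keeping ties: it returns both the (2,0) outcome of R1 and the (3,2) outcome of the play L1, b, L2, whereas iterated admissibility leaves only (2,0).

```python
from itertools import product

# Decision nodes are (player, {move: subtree}); leaves are payoff pairs.
FIG_9_1 = (1, {'R1': (2, 0),
               'L1': (2, {'a': (1, 2),
                          'b': (1, {'L2': (3, 2), 'R2': (1, 0)})})})

def spe_payoffs(node):
    """All payoff vectors consistent with subgame perfection (ties kept)."""
    if not isinstance(node[1], dict):    # leaf: a payoff pair
        return {node}
    player, moves = node
    labels = list(moves)
    child_sets = [spe_payoffs(moves[m]) for m in labels]
    out = set()
    # For each way of fixing an equilibrium continuation below every move,
    # the mover may choose any move whose continuation payoff is maximal.
    for selection in product(*child_sets):
        top = max(p[player - 1] for p in selection)
        out |= {p for p in selection if p[player - 1] == top}
    return out

print(sorted(spe_payoffs(FIG_9_1)))  # → [(2, 0), (3, 2)]
```

The tie arises at player 2's node, where a and b both promise her 2; backward induction cannot break it, while admissibility discards b.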
I next show that iteratively applying admissibility never leads to weaker results than iteratively applying weak admissibility. The key is to observe that if a strategy s_i is strictly dominated in a game tree T, s_i will be strictly dominated in a restriction of T. The next lemma asserts the contrapositive of this observation: If a strategy s_i is admissible in a restriction of T, s_i is not strictly dominated in T.

Lemma 9.5 If a game tree T is a restriction of T' and s_i is sequentially admissible in T, then there is an extension s'_i of s_i to T' such that s'_i is sequentially weakly admissible in T'.

[Game tree omitted in this transcription. As described in the text: player 1 chooses L or R; player 2 then chooses a, b, or c at an information set containing both resulting nodes. Player 1 receives 0 at every terminal node; a yields player 2 the payoff 0 against both L and R, while each of b and c yields her 1 against one of player 1's moves and 0 against the other.]

Figure 9.6: Subgame Perfection vs. Weak Admissibility
This means that our procedure PSeq yields, at each stage j, a result that is at least as strong as that of common reasoning about weak admissibility, the procedure WA. Hence we have the following proposition.
Proposition 9.6 Let T be a game tree. If a play sequence is consistent with common reasoning about sequential admissibility in T, then that play sequence is consistent with common reasoning about sequential weak admissibility. That is, {play(s) : s ∈ CRSeq(T)} ⊆ {play(s) : s ∈ CRWA(T)}.
9.6 Weak Dominance and Forward Induction
It is commonly held that iterated weak dominance (i.e., iterated sequential
admissibility) captures some of the features of backward and forward induction.
[Fudenberg and Tirole 1993, p.461] thus state that
Iterated weak dominance incorporates backward induction in
games of perfect information: The suboptimal choices at the last
information sets are weakly dominated; once these are removed, all
subgame-imperfect choices at the next-to-last information sets are
removed at the next round of iteration; and so on. Iterated weak
dominance also captures part of the forward induction notions im-
plicit in stability, as a stable component contains a stable component
of the game obtained by deleting a weakly dominated strategy.
Indeed, I have previously shown that, in finite games of perfect information, common reasoning about weak admissibility yields exactly the backward induction solution. In this section I show how, in finite games of imperfect information, common reasoning about admissibility yields typical forward induction solutions. Thus backward and forward induction seem to follow from one principle, namely that players' choices should be consistent with common knowledge of (and common reasoning about) admissibility. This result may seem questionable, as it is also commonly held that backward and forward induction principles are mutually inconsistent [Kohlberg and Mertens 1986], [Myerson 1991]. That is, if we take backward and forward induction principles to be restrictions imposed on equilibria, then they may lead to contradictory conclusions about how to play.
A backward induction principle states that each player's strategy must be a best reply to the other players' strategies, not only when the play begins at the initial node of the tree, but also when the play begins at any other information set.³ A forward induction principle says that players' beliefs should be consistent with sensible interpretations of the opponents' play. Thus a forward induction principle restricts the range of possible interpretations of players' deviations from equilibrium play. Deviations should be construed as 'signals' (as opposed to mistakes), since players should privilege interpretations of the opponents' play that are consistent with common knowledge of rationality. The typical example of a contradiction between backward and forward induction principles would be a game of imperfect information, where one may apply forward induction in one part of the tree, and then use the conclusion for a backward induction argument in a different part of the tree [Kohlberg 1990].

³This principle corresponds to subgame perfection and to sequential optimality (see Section 4.5).
The game of Figure 9.7 is taken from [Kohlberg 1990, p.10]. Since player I, by choosing y, could have received 2, by forward induction if he plays n he intends to follow with T; but for the same reason II, by choosing D, shows that she intends to play R, and hence, by backward induction, I must play B. What seems to be at stake here is a conflict between different but equally powerful intuitions. By playing D, player II is committing herself to follow up with R, and thus player I would be safe to play y. On the other hand, once player I's node has been reached, what happened before might be thought of as strategically irrelevant, as I now has a chance, by choosing n, of signaling his commitment to follow with T. Which commitment is firmer? Which signal is more credible?
We must remember that players make their choices about which strategy to adopt after a process of deliberation that takes place before the game is actually played. I have supposed that during deliberation, players will employ some shared principle that allows them to rule out some plays of the game as inconsistent with it. A plausible candidate is admissibility. Let us now see how the ex ante deliberation of the players might unfold in this game by applying the procedure Seq(T) to the strategies UL, UR, DL, DR and yT, yB, nT, nB. Note that if we recursively apply to this game the concept of sequential admissibility presented in the previous section, we must conclude that the only strategies consistent with common reasoning about sequential admissibility are UR and yT. Indeed, common reasoning about sequential weak admissibility alone yields this result. For during the first round of iteration, the strategy nB of player I is eliminated because this strategy is strictly dominated by any strategy that chooses y at I's first choice node. Similarly, the strategy DL of player II is immediately eliminated because this strategy is strictly dominated by any strategy that chooses U at the root. So after the first round of elimination, II's second information set is restricted to the node reached with nT, and her choices at this information set are restricted to R only. This means in turn that y now strictly dominates nT at I's first information set, and U strictly dominates DR at the root. Finally, the strategies yB and UL are not strategies in the restricted tree obtained after the first round of elimination, and therefore they are eliminated. After the second round of elimination, only UR and yT survive. Thus common reasoning about sequential admissibility predicts that players who deliberate according to a shared admissibility principle will expect U to be chosen at the
[Game tree omitted in this transcription. As described in the text: player II chooses U or D at the root; after D, player I chooses y, with payoff (2,0), or n, and after n chooses between T and B; player II then chooses L or R at an information set that does not distinguish T from B.]

Figure 9.7: Backward vs. Forward Induction Principles
beginning of the game.
A brief comment about the intuitive plausibility of the iterated admissibility procedure is now in order. Note that this procedure does not allow the players to discount whatever happens before a given information set as strategically irrelevant. For example, if player II were to choose D, player I should not keep playing as if he were in a new game starting at his decision node. I suggest, rather, that I should expect II to follow with R if given a chance, in which case he should play y; and player II, who can replicate I's reasoning, will in fact never play D. On the other hand, playing D to signal that one wants to continue (if given a chance) with R would make little sense, since II must know that nB is never going to be chosen, and R makes sense only if it follows nB. In other words, D is not a rational move for player II. Similar reasoning excludes nB as a rational strategy for player I.
The problem with Kohlberg's and similar examples is that no constraints
are set on players' forward induction "signals". I define the notion of a
credible signal in an extensive form game, and show that the credible signals
are the signals consistent with common reasoning about sequential admissibility
(much as Selten's subgame-perfect equilibria characterize "credible threats").
Thus the examples in the literature which purport to show the conflict between
backward and forward induction principles involve forward induction signals
that are not credible.
The following definition formulates the notion of a forward induction signal
in general, and a credible forward induction signal in particular. The idea is
this: Let us consider a move $m$ at a given information set $I_i$, and ask what
future moves of player $i$ at lower information sets $I'_i$ are consistent with
sequential admissibility and the fact that $m$ was chosen at $I_i$. If there
are future moves that are consistent with sequential admissibility and the fact
that $m$ was chosen at $I_i$, then I take the move $m$ at $I_i$ to be a signal
that player $i$ intends to follow with one of those moves at $I'_i$. But I
argue that in order for this signal to be credible to $i$'s opponents, at least
one of the future admissible moves must be consistent with common reasoning
about sequential admissibility in $T$.

I say that an information set $I'_i$ in a game tree $T$ is reachable from
another information set $I_i$ with a strategy $s_i$ if there are nodes $x \in
I_i, y \in I'_i$ such that some play sequence that is consistent with $s_i|T_x$
contains $y$.
Definition 9.9 Let $T$ be a game tree with information set $I_i$. Let $T|I_i$
denote the restriction of $T$ to nodes in $I_i$ and successors of nodes in
$I_i$.

- A strategy $s_i$ is consistent with forward induction at $I_i$ if $s_i$
  is sequentially admissible at $I_i$.

- A move $m$ at an information set $I_i$ is a forward induction signal for
  $S^*_i$ at a lower information set $I'_i$ (written $\langle I_i : m, I'_i :
  S^*_i \rangle$), where $s_i \in S^*_i \iff$

  1. $s_i(I_i) = m$;
  2. $I'_i$ is reachable from $I_i$ with $s_i$;
  3. $s_i$ is consistent with forward induction at $I_i$.

- A forward induction signal $\langle I_i : m, I'_i : S^*_i \rangle$ is
  credible if some strategy $s_i$ in $S^*_i$ is consistent with common
  reasoning about sequential admissibility in $T$, i.e. $s_i \in CRSeq(T)_i$.
Let me illustrate these concepts in the game of Figure 9.7. According to my
definition, the only strategy that chooses $n$ at I's first information set and
is consistent with forward induction is $nT$. So $\langle I^1_I : n, I^2_I :
\{nT\} \rangle$ is a forward induction signal, where $I^1_I$ denotes I's first
information set and $I^2_I$ denotes I's second information set. However,
$\langle I^1_I : n, I^2_I : \{nT\} \rangle$ is not a credible signal. For $nT$
is inconsistent with common reasoning about sequential admissibility, since
such reasoning rules out $L$ at II's second information set. Similarly for
player II, $\langle I^1_{II} : D, I^2_{II} : \{DR\} \rangle$ is a forward
induction signal. But it is not a credible signal, since $DR$ is inconsistent
with common reasoning about sequential admissibility. Hence neither forward
induction signal is credible, as "sending" either signal is inconsistent with
common reasoning about sequential admissibility as defined by $CRSeq$.
In terms of reasoning about admissibility, the difference between Kohlberg's
analysis and mine is this. Kohlberg applies admissibility once to argue that D
is a forward induction signal for R and n is a forward induction signal for T.
But if I assume that admissibility is common knowledge among the players, then
neither D nor n is a credible signal. Indeed, common knowledge is not even
needed to reach this conclusion: it is sufficient to apply admissibility twice
to get the same result.
9.7 Proofs
Theorem 9.1 Let $T$ be a game tree with perfect recall. Then a strategy $s_i$ for player $i$ is admissible in $S(T)$ if and only if $s_i$ is sequentially properly admissible in $T$.
Proof. Suppose that a strategy $s_i$ in $S(T)$ for player $i$ is weakly
dominated in $S(T)$. Then there is a strategy $s'_i$ consistent with $S(T)$
such that

1. for all strategy profiles $s_{-i}$ consistent with $S(T)$, $u_i(s_i,
s_{-i}) \leq u_i(s'_i, s_{-i})$, and

2. for some strategy profile $s^*_{-i}$ consistent with $S(T)$, $u_i(s_i,
s^*_{-i}) < u_i(s'_i, s^*_{-i})$.

Let $x$ be the first node that appears along both the plays of $s_i$ against
$s^*_{-i}$ and of $s'_i$ against $s^*_{-i}$ at which $s_i$ deviates from
$s'_i$, so that $x \in \mathrm{range}(play(s_i, s^*_{-i})) \cap
\mathrm{range}(play(s'_i, s^*_{-i}))$ and $s_i(I_i(x)) \neq s'_i(I_i(x))$.
Then $x$ is consistent with $s_i$ and $s'_i$ in $T$. Let $y$ be any node at
$I_i(x)$ consistent with $s_i$ and $s'_i$, and let $s_{-i}$ be any strategy
profile of $i$'s opponents. Then $u_i(s_i, s_{-i}, y) \leq u_i(s'_i, s_{-i},
y)$; for otherwise, by perfect recall, let $s^*_{-i}$ be a strategy profile of
$i$'s opponents such that both $play(s_i, s^*_{-i})$ and $play(s'_i, s^*_{-i})$
reach $y$, and such that $s^*_{-i}|T_y = s_{-i}|T_y$. Then $u_i(s_i, s^*_{-i})
> u_i(s'_i, s^*_{-i})$, contrary to the hypothesis that $s'_i$ weakly dominates
$s_i$ in $S(T)$. Since I also have that $u_i(s_i, s^*_{-i}, x) < u_i(s'_i,
s^*_{-i}, x)$, it follows that $s'_i$ weakly dominates $s_i$ at $I_i(x)$, so
that $s_i$ is not sequentially properly admissible.

Suppose that a strategy $s_i$ is properly weakly dominated at an information
set $I_i$ in $T$ by strategy $s'_i$. Then there must be a node $x$ in $I_i$
consistent with $s_i$ and a strategy profile $s'_{-i}$ in $T$ such that $s'_i$
yields a higher payoff at $x$ against $s'_{-i}$ than $s_i$ does, i.e.
$u_i(s_i, s'_{-i}, x) < u_i(s'_i, s'_{-i}, x)$. Assume without loss of
generality that $x$ is reached by the play sequence of $s_i$ against $s'_{-i}$,
i.e. $x \in \mathrm{range}(play(s_i, s'_{-i}))$. Now I define a strategy
$s^*_i$ that weakly dominates $s_i$ in $T$ as follows.

1. At an information set $I'_i$ that does not contain $x$ or any successor of
$x$, $s^*_i(I'_i) = s_i(I'_i)$.

2. At an information set $I'_i$ that contains $x$ or a successor of $x$,
$s^*_i(I'_i) = s'_i(I'_i)$.

I show that $s^*_i$ weakly dominates $s_i$ in $S(T)$. Since $play(s_i,
s'_{-i})$ reaches $x$, $play(s^*_i, s'_{-i})$ also reaches $x$, and so
$u_i(s^*_i, s'_{-i}) = u_i(s^*_i, s'_{-i}, x) = u_i(s'_i, s'_{-i}, x) >
u_i(s_i, s'_{-i}, x) = u_i(s_i, s'_{-i})$. Thus $s^*_i$ weakly dominates $s_i$
in $S(T)$ if for no $s_{-i}$ in $T$, $u_i(s_i, s_{-i}) > u_i(s^*_i, s_{-i})$,
which I establish now. Let a strategy profile $s_{-i}$ in $T$ be given.

Case 1: the play sequence of $(s^*_i, s_{-i})$ does not reach $I_i(x)$. Then
$play(s^*_i, s_{-i}) = play(s_i, s_{-i})$, and the claim follows immediately.

Case 2: the play sequence of $(s^*_i, s_{-i})$ goes through some node $y$ in
$I_i(x)$. Since $x$ is consistent with $s_i$ and $T$ is a game with perfect
recall, $y$ is consistent with $s_i$, and so $play(s_i, s_{-i})$ reaches $y$.
As before, I have that (a) $u_i(s_i, s_{-i}, y) = u_i(s_i, s_{-i})$. Also,
$s^*_i$ coincides with $s'_i$ after node $y$, and so (b) $u_i(s^*_i, s_{-i}) =
u_i(s'_i, s_{-i}, y)$. Since $s'_i$ weakly dominates $s_i$ at $I_i(x)$, and
$y$ is in $I_i(x)$, it follows that (c) $u_i(s'_i, s_{-i}, y) \geq u_i(s_i,
s_{-i}, y)$. Combining (a), (b) and (c), it follows that $u_i(s^*_i, s_{-i})
\geq u_i(s_i, s_{-i})$. This establishes that $s_i$ is weakly dominated given
$S(T)$. $\Box$
Theorem 9.2 Let $T$ be a game tree with perfect recall. A strategy profile $s$ is consistent with common reasoning about sequential proper admissibility if and only if $s$ is consistent with common reasoning about admissibility in the strategic form of $T$. That is, $CRPSeq(T) = CRAd(S(T))$.
Proof. I prove by induction on $j$ that for all $j \geq 0$, $PSeq^j(T) =
Ad^j(S(T))$.

Base Case, $j = 0$. Then by definition, $PSeq^0(T) = S(T) = Ad^0(S(T))$.

Inductive Step: Assume that $PSeq^j(T) = Ad^j(S(T))$ and consider $j + 1$. By
inductive hypothesis, $T|PSeq^j(T) = T|Ad^j(S(T))$. Now a strategy $s_i$ is in
$PSeq^{j+1}_i(T)$ $\iff$ $s_i$ is in $PSeq^j_i(T)$ and $s_i$ is sequentially
properly admissible in $T|PSeq^j(T)$. By inductive hypothesis, the first
condition implies that $s_i$ is in $Ad^j(S(T))$. By Theorem 9.1 and the facts
that $T|PSeq^j(T) = T|Ad^j(S(T))$ and that all restrictions of $T$ are games
with perfect recall, the second condition implies that $s_i$ is admissible in
$S(T|Ad^j(S(T))) = Ad^j(S(T))$. So $s_i$ is in $Ad^{j+1}(S(T))$. Conversely,
a strategy $s_i$ is in $Ad^{j+1}(S(T))$ $\iff$ $s_i$ is in $Ad^j(S(T))$ and
$s_i$ is admissible in $Ad^j(S(T))$. By inductive hypothesis, the first
condition implies that $s_i$ is in $PSeq^j(T)$, and the second condition may be
restated to say that $s_i$ is admissible in $S(T|Ad^j(S(T)))$. By Theorem 9.1,
the second condition then implies that $s_i$ is sequentially properly
admissible in $T|Ad^j(S(T)) = T|PSeq^j(T)$. Hence $s_i$ is in
$PSeq^{j+1}_i(T)$. This shows that $PSeq^{j+1}(T) = Ad^{j+1}(S(T))$, and
completes the proof by induction. $\Box$
Proposition 9.3 For all finite games $G$ with pure strategy profiles $S$, $CRAd(S) \neq \emptyset$.
Proof. The admissible elements in $S^j_i$ survive at each iteration $j$, for
each player $i$, and there always is an admissible element in each $S^j_i$
since each $S^j_i$ is finite. Hence $S^j \neq \emptyset$ for any $j$, and so
$S^{\sum_{i \in N} |S_i| - 1} = CRAd(S) \neq \emptyset$. $\Box$
For the proof of Proposition 9.4, I rely on the well-known one-deviation
property of subgame perfect equilibrium: if it is possible for one player to
profitably deviate from his subgame perfect equilibrium strategy $s_i$, he can
do so with a strategy $s'_i$ that deviates from $s_i$ only once.

Lemma 9.0 Let $T$ be a finite game tree of perfect information. Then $s$ is a
subgame perfect equilibrium in $T$ if and only if for each node $x$ and each
player $i$, $u_i(s[i], s[-i], x) \geq u_i(s'_i, s[-i], x)$ whenever $s[i]$ and
$s'_i$ differ only at $x$.
Proof. See [Osborne and Rubinstein 1994, Lemma 98.2].
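Lemma 9.0 is what licenses the usual backward induction computation: checking one-step deviations at each node suffices. As an illustration, here is a minimal Python sketch of backward induction on a finite perfect-information tree. The tree, payoffs, and names are an invented example of my own, not the game of Figure 9.7; ties are broken by taking the first maximizing action, so the sketch returns one subgame perfect play rather than all of them.

```python
def backward_induction(node):
    """Solve a finite perfect-information game tree by backward induction.

    A terminal node is a tuple of payoffs, one per player; a decision node is
    a pair (player, children) whose second component is a dict mapping action
    labels to subtrees.  Returns (payoff vector, equilibrium play)."""
    if not isinstance(node[-1], dict):      # terminal node: payoff vector
        return node, []
    player, children = node
    best_payoff, best_play = None, None
    for action, subtree in children.items():
        payoff, play = backward_induction(subtree)
        # Keep the action maximizing the mover's own payoff; ties go to the
        # first action examined.
        if best_payoff is None or payoff[player] > best_payoff[player]:
            best_payoff, best_play = payoff, [action] + play
    return best_payoff, best_play

# A hypothetical two-player tree: player 0 moves first, and "in" hands the
# move to player 1.
game = (0, {"out": (1, 2),
            "in": (1, {"fight": (0, 0), "share": (2, 1)})})

payoffs, play = backward_induction(game)
```

In this toy tree, player 1 prefers "share" to "fight", so player 0 prefers "in" to "out"; the recursion returns the play ["in", "share"] with payoffs (2, 1).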
For the next proposition, I note that if $T$ is finite, then our iterative
procedure goes through only finitely many iterations. In particular, this
means that if a strategy $s_i$ is strictly dominated given $CRWA(T)$, then
$s_i$ is not in $CRWA(T)$.

Proposition 9.4 Let $T$ be a finite game tree of perfect information. Then a
strategy $s_i$ is consistent with common reasoning about sequential weak
admissibility in $T$ if and only if $s_i$ is consistent with subgame
perfection. That is, $CRWA(T) = SPE(T)$.
Proof. I prove by induction on the height $h(x)$ of each node $x$ that
$CRWA(T_x) = SPE(T_x)$. The proposition follows when I take $x$ to be the root
of $T$.

Base Case, $h(x) = 1$. Then all successors of $x$ are terminal nodes. Let
player $i$ be the player to move at $x$. Let $max(x)$ be the maximum payoff
player $i$ can achieve at $x$ (i.e. $max(x) = \max\{u_i(y) : y$ is a successor
of $x\}$). Then $s_i|T_x$ is consistent with subgame perfection at $x$ if and
only if $s_i(x)$ yields $i$ the maximum payoff $max(x)$, which is exactly when
$s_i|T_x$ is not strictly dominated at $x$.

Inductive Case: Assume the hypothesis for all nodes $y$ with $h(y) < h(x)$, and
consider $x$.

($\Rightarrow$): Let $s$ be a strategy profile consistent with common reasoning
about sequential weak admissibility (i.e. $s \in CRWA(T_x)$). Suppose that it
is player $i$'s turn at $x$. For each player $j$, $s[j]|T_y$ is consistent
with subgame perfection in each proper subgame $T_y$ of $T_x$, by the inductive
hypothesis and the fact that $s[j]$ is consistent with common reasoning about
sequential weak admissibility in $T_x$. So the implication ($\Rightarrow$) is
established if I show that $s[i]$ is consistent with subgame perfection in
$T_x$. Let $y$ be the successor of $x$ that is reached when $i$ plays $s[i]$
at $x$. Let $max(y)$ be the maximum that $i$ can achieve given common
reasoning about sequential weak admissibility when he follows $s[i]$ (i.e.
$max(y) = \max\{u_i(s[i], s_{-i}, x) : s_{-i}$ is consistent with
$CRWA(T_x)\}$). For each $y'$ that is a successor of $x$, let $min(y')$ be the
minimum that $i$ can achieve given common reasoning about sequential weak
admissibility when he follows $s[i]$ in $T_{y'}$. Then I have (*) that $max(y)
\geq min(y')$ for each successor $y'$ of $x$. For otherwise player $i$ can
ensure himself a higher payoff than $s[i]$ can possibly yield, by moving to
some successor $y'$ of $x$ and continuing with $s[i]$. That is, the strategy
$s^*_i$ which moves to $y'$ at $x$ and follows $s[i]$ below $y'$ strictly
dominates $s[i]$ in $T_x|CRWA(T_x)$. But since $T$ and hence $T_x$ is finite,
this contradicts the assumption that $s[i]$ is consistent with $CRWA(T_x)$.
Now by inductive hypothesis, $CRWA(T_{y'}) = SPE(T_{y'})$ for each successor
$y'$ of $x$. So there is a subgame perfect equilibrium $s^{max}$ in $T_y$
which yields $i$ the payoff $max(y)$ in $T_y$ and in which player $i$ follows
$s[i]$ (i.e. $s[i] = s^{max}[i]$). Again by inductive hypothesis, for each
successor node $y'$ of $x$ there is a subgame perfect equilibrium $s^{y'}_{min}$
in $T_{y'}$ which gives player $i$ the payoff $min(y')$ and in which player $i$
follows $s[i]$ in $T_{y'}$. Now I define a subgame perfect equilibrium $s^*$
in $T_x$ in which player $i$ follows $s[i]$:

1. $s^*[i](\{x\}) = s[i](\{x\})$;

2. in $T_y$, $s^*$ follows $s^{max}$;

3. in $T_{y'}$, $s^*$ follows $s^{y'}_{min}$, where $y'$ is a successor of $x$
other than $y$.

By observation (*), there is no profitable 1-deviation from $s^*$ for player
$i$ at $x$, and hence by Lemma 9.0, $s^*$ is a subgame perfect equilibrium in
$T_x$.

($\Leftarrow$): Let $s$ be consistent with subgame perfection in $T_x$. Let
$i$ be the player moving at $x$. Consider any strategy $s[j]$ in $s$, where $j
\neq i$. Since $j$ is not moving at $x$, $s[j]$ is consistent with common
reasoning about sequential weak admissibility in $T_x$ if and only if
$s[j]|T_y$ is consistent with common reasoning about sequential weak
admissibility in each subgame $T_y$ of $T_x$. Since $s$ is consistent with
subgame perfection in $T_x$, there is a subgame perfect equilibrium $s^*$ in
$T_x$ in which $j$ follows $s[j]$. Since $s^*$ is subgame perfect, $s^*|T_y$
is subgame perfect in $T_y$. Hence $s[j]|T_y = s^*[j]|T_y$ is consistent with
subgame perfection in $T_y$. By inductive hypothesis, this entails that
$s[j]|T_y$ is consistent with common reasoning about sequential weak
admissibility in $T_y$. Since this is true for any subgame $T_y$ of $T_x$,
$s[j]$ is consistent with common reasoning about sequential weak admissibility
in $T_x$. Next, consider $s[i]$, the strategy followed by the player who is
moving at $x$. I just established that for each iteration $WA^j(T)$ of common
reasoning about sequential weak admissibility, $s^*[-i]$ is consistent with
$WA^j(T)$. Since $s^*$ is a subgame perfect equilibrium in $T_x$, $s^*[i]$ is
a best reply against $s^*[-i]$ in $T_x$ and each subgame of $T_x$. So in each
subgame $T_y$ of $T_x$ (including $T_x$) and at each iteration $WA^j(T)$,
$s^*[i]$ is a best reply against some strategy profile of his opponents
consistent with $WA^j(T)$, namely $s^*[-i]|T_y$, and hence $s^*[i]$ is
sequentially weakly admissible given $WA^j(T)$. Since $CRWA(T) = WA^k(T)$ for
some $k$, because $T$ is finite, $s^*[i]$ is consistent with common reasoning
about sequential weak admissibility. This shows that all strategies in the
strategy profile $s$ are consistent with common reasoning about sequential weak
admissibility in $T_x$, and completes the proof by induction. $\Box$
Lemma 9.5 If a game tree $T$ is a restriction of $T'$ and $s_i$ is sequentially
admissible in $T$, then there is an extension $s'_i$ of $s_i$ to $T'$ such that
$s'_i$ is sequentially weakly admissible in $T'$.

Proof. I construct $s'_i$ as follows. At each information set $I_i$ in $T'$
such that $I_i$ contains a node in $T$, $s'_i = s_i$. At all other information
sets $I_i$, $s'_i$ follows a strategy that is weakly admissible at $I_i$. I
claim that $s'_i$ is sequentially weakly admissible in $T'$; let $I_i$ be any
information set in $T'$ belonging to $i$.

Case 1: $I_i$ contains a node $x$ in $T$. Since $T$ is a restriction of $T'$,
$I_i$ contains all nodes in $I_T(x)$, where $I_T(x)$ is the information set in
$T$ containing $x$. So if $s_i$ is strictly dominated in $T'$ at $I_i$, then
$s_i$ is strictly dominated in $T$ at $I_T(x)$, contrary to the supposition
that $s_i$ is admissible at $I_T(x)$.

Case 2: $I_i$ contains no node in $T$. By construction, $s'_i$ is weakly
admissible at $I_i$. $\Box$
Proposition 9.6 Let $T$ be a game tree. If a play sequence is consistent with
common reasoning about sequential admissibility in $T$, then that play sequence
is consistent with common reasoning about sequential weak admissibility. That
is, $\{play(s) : s \in CRSeq(T)\} \subseteq \{play(s) : s \in CRWA(T)\}$.

Proof. I prove by induction on $j \geq 0$ that for each $j$, $T|Seq^j(T)$ is a
restriction of $T|WA^j(T)$.

Base Case, $j = 0$. Then $Seq^0(T) = WA^0(T)$, so the claim is immediate.

Inductive Step: Assume that $T|Seq^j(T)$ is a restriction of $T|WA^j(T)$, and
consider $j + 1$. Choose any strategy profile $s$ in $Seq^{j+1}(T)$. By Lemma
9.5, extend each $s[i]$ in $s$ to a strategy $s'[i]$ that agrees with $s[i]$ on
information sets that have members both in $T|Seq^j(T)$ and $T|WA^j(T)$, and is
sequentially weakly admissible in $T|WA^j(T)$. Call the resulting strategy
profile $s'$; $s'$ is in $WA^{j+1}(T)$. Clearly $s$ and $s'$ result in the
same play sequence, i.e. $play(s') = play(s)$, because the same actions are
taken at each information set. So all nodes that are consistent with
$Seq^{j+1}(T)$ are consistent with $WA^{j+1}(T)$, which means that
$T|Seq^{j+1}(T)$ is a restriction of $T|WA^{j+1}(T)$. This completes the proof
by induction. $\Box$
Chapter 10
Conclusion
The problem of induction is at the heart of methodology, the philosophy of
science, and such epistemological disciplines as statistics and machine
learning. My dissertation pursued the question of what inferences we should
draw from empirical evidence not through historical case studies or
formalizing common intuitions, but by resolutely seeking hypothetical
imperatives for inductive inference, means-ends principles of the form: if
those are your aims in inquiry, then these are the methods you should employ.

Thus my question was what methods are best for attaining given aims of
inquiry. I obtained principled answers based on a set of means-ends criteria
for evaluating the performance of inductive methods. These criteria result
from combining widely accepted epistemic values with standard principles from
decision theory.
Tracing out the consequences of adopting certain cognitive values leads across
some familiar territory, such as the observation, harking back to Descartes,
that those who want to avoid error must believe only what is certain; a
means-ends interpretation of Popper's falsificationism; and axioms for
"minimal change" belief revision. The means-ends analysis sheds new light on
old principles by showing them to be the optimal means of choice for certain
cognitive values (but not necessarily for others). It also led me to
alternative, and I argue better, versions of the standard proposals, notably a
new proposal for defining "minimal" theory change.
A venerable tenet of methodology has been that scientific methods should lead
us to the right answer about the questions under investigation, if not
quickly, then at least in the long run, and that they should do so reliably;
that is, they should be designed so as to find the truth over a range of
possible ways the world could be. Many methodologists have held that this
cannot be the whole story about scientific inference, because long-run
reliability is consistent with any crazy behavior in the short run, and they
want a theory of scientific inference to guide us about what to say here and
now. The most fruitful application of my means-ends analysis addresses this
concern. I combine the goal of finding the truth in the long run with other
auxiliary epistemic values, such as avoiding error and retractions, and
minimizing convergence time, to obtain powerful short-run constraints on
asymptotically reliable methods. On this approach, we may think of the
auxiliary cognitive goals as defining standards of efficiency for reliable
inquiry.
These efficiency criteria reveal a wealth of epistemologically significant
structure that has so far been almost completely overlooked by methodologists.
It turns out that the efficiency criteria fall into a tidy hierarchy of
feasibility. Thus these performance standards constitute a scale of inductive
complexity: We can measure the difficulty of an inductive problem by the most
stringent standard of efficiency that solutions to the problem can attain.
The hierarchy of cognitive goals shows that two notions of efficiency are of
particular interest with respect to the problem of induction: minimizing the
time required to converge to the right answer, and avoiding vacillations along
the way. The problems of inquiry in which these criteria apply share a
common, topological structure, which I characterized precisely. I traced this
structure in several inductive problems that look very different on the
surface: Goodman's Riddle of Induction, Occam's Razor, identifying the set of
elementary particles, and determining what particle interactions are possible.
In all these cases, the combination of reliability, convergence time, and
avoiding retractions underwrites intuitively plausible recommendations for
what inferences to draw from given data.
I examined in some detail what efficient inquiry in particle physics amounts
to. The interaction of means-ends methodology with the conceptual structures
that scientists employ illuminates both. We saw that prominent assumptions,
such as the idea that a set of conservation laws should be the complete theory
of particle reactions, significantly reduce the inductive complexity of the
research problem: they allow reliable and efficient solutions to the problem.
Taking a close look at the structure of conservation theories, I showed that
efficiency can require particle theorists to introduce virtual particles, a
new and purely instrumental perspective on the role of hidden particles in
particle theories. A by-product of these insights into the structure of
conservation theories is a surprising connection between the number of stable
particles and the number of conserved quantities: Essentially, under
conservation of energy there cannot be more conservation principles than there
are stable particles.
There is more to do in the methodological analysis of particle physics. But
even as it stands, the analysis can serve as a blueprint for other domains:
Begin by stating the problem: what questions are under investigation, and what
evidence is available to answer them? Then determine under what assumptions
the problem has a reliable solution, under what assumptions it has efficient
solutions, and what exactly the efficient solutions are like. The results in
this thesis assemble a variety of tools for carrying out this kind of
analysis. Two complex problems in which they might find fruitful application
are inferring theories of gravitation and curve fitting.
There are a number of important ways to add to the reliabilist toolkit. We
want to cover other aspects of scientific method by examining efficiency in
different scenarios for empirical inquiry, for example experimentation.
Another open question is what efficient inquiry with bounded rationality is
like: for example, what means-ends recommendations can we give to computable
agents? The theory of reliable and efficient empirical inquiry promises to be
a rich conceptual structure with rewarding applications.
Bibliography
[Alchourrón et al. 1985] Alchourrón, C.E., Gärdenfors, P. and Makinson, D. (1985). "On the logic of theory change: partial meet contraction and revision functions." Journal of Symbolic Logic 50: 510-530.

[Angluin and Smith 1983] Angluin, D. and Smith, C. (1983). "A survey of inductive inference: Theory and methods." Computing Surveys 15: 237-289.

[Bicchieri and Schulte 1997] Bicchieri, C. and Schulte, O. (1997). "Common Reasoning about Admissibility." Erkenntnis 45: 299-325.

[Blum and Blum 1975] Blum, M. and Blum, L. (1975). "Toward a Mathematical Theory of Inductive Inference." Information and Control 28: 125-155.

[Bub 1994] Bub, J. (1994). "Testing Models of Cognition Through the Analysis of Brain-Damaged Performance." British Journal for the Philosophy of Science 45: 837-855.

[Carnap 1962] Carnap, R. (1962). "The Aim of Inductive Logic," in Logic, Methodology and Philosophy of Science, ed. E. Nagel, P. Suppes and A. Tarski. Stanford: Stanford University Press.

[Coffa 1991] Coffa, J. (1991). The Semantic Tradition from Kant to Carnap. Cambridge: Cambridge University Press.

[Comte 1968] Comte, A. (1968). A System of Positive Philosophy. New York: B. Franklin.

[Cooper 1992] Cooper, L.N. (1992). Physics: Structure and Meaning. Hanover, NH: University Press of New England.

[DeFinetti 1990] DeFinetti, B. (1990). Theory of Probability, 2 vols. New York: Wiley.

[Donovan et al. 1988] Donovan, A., Laudan, L. and Laudan, R. (eds.) (1992). Scrutinizing Science. Baltimore: The Johns Hopkins University Press.

[Earman 1992] Earman, J. (1992). Bayes or Bust? Cambridge, Mass.: MIT Press.

[Feynman 1965] Feynman, R. (1965; 19th ed. 1990). The Character of Physical Law. Cambridge, Mass.: MIT Press.

[Ford 1963] Ford, K.W. (1963). The World of Elementary Particles. New York: Blaisdell.

[Franklin 1990] Franklin, A. (1990). Experiment, Right or Wrong. Cambridge: Cambridge University Press.

[Fudenberg and Tirole 1993] Fudenberg, D. and Tirole, J. (1993). Game Theory. Cambridge, Mass.: MIT Press.

[Gärdenfors 1988] Gärdenfors, P. (1988). Knowledge in Flux: Modeling the Dynamics of Epistemic States. Cambridge: MIT Press.

[Gettier 1963] Gettier, E. (1963). "Is Justified True Belief Knowledge?" Analysis 23.6: 121-123.

[Glymour 1980] Glymour, C. (1980). Theory and Evidence. Princeton: Princeton University Press.

[Glymour 1991] Glymour, C. (1991). "The Hierarchies of Knowledge and the Mathematics of Discovery." Minds and Machines 1: 75-95.
[Glymour 1994] Glymour, C. (1994). "On the Methods of Cognitive Neuropsychology." British Journal for the Philosophy of Science 45: 815-835.

[Glymour and Kelly 1990] Glymour, C. and Kelly, K. (1990). "Why You'll Never Know if Roger Penrose is a Computer." Behavioral and Brain Sciences 13.

[Glymour and Kelly 1992] Glymour, C. and Kelly, K. (1992). "Thoroughly Modern Meno," in Inference, Explanation and Other Frustrations, ed. John Earman. University of California Press.

[Gold 1967] Gold, E. (1967). "Language Identification in the Limit." Information and Control 10: 447-474.

[Goodman 1983] Goodman, N. (1983). Fact, Fiction and Forecast. Cambridge, MA: Harvard University Press.

[Hacking 1968] Hacking, I. (1968). "One problem about induction," in The Problem of Inductive Logic, ed. Imre Lakatos, pp. 44-59. Amsterdam: North-Holland Publishing Co.

[Harsanyi 1975] Harsanyi, J.C. (1975). "Can the Maximin Principle Serve as a Basis for Morality? A Critique of John Rawls' Theory." American Political Science Review 69.

[Hellman 1997] Hellman, G. (1997). "Bayes and Beyond." Philosophy of Science, forthcoming.

[Hempel 1965] Hempel, C. (1965). Aspects of Scientific Explanation. New York: Macmillan.

[Hesse 1970] Hesse, M. (1974). The Structure of Scientific Inference. Berkeley: University of California Press.
[Hume 1984] Hume, D. (1984). An Inquiry Concerning Human Understanding, ed. C. Hendel. New York: Collier.

[James 1982] James, W. (1982). "The Will To Believe," in Pragmatism, ed. H.S. Thayer. Indianapolis: Hackett.

[Juhl 1995a] Juhl, C. (1995). "Is Gold-Putnam Diagonalization Complete?" Journal of Philosophical Logic 24: 117-138.

[Juhl 1995b] Juhl, C. (1995). "Objectively Reliable Subjective Probabilities." Synthese, forthcoming.

[Juhl 1994] Juhl, C. (1994). "The Speed-Optimality of Reichenbach's Straight Rule of Induction." British Journal for the Philosophy of Science 45: 857-863.

[Juhl 1993] Juhl, C. (1993). "Bayesianism and Reliable Scientific Inquiry." Philosophy of Science 60, 2: 302-319.

[Juhl and Kelly 1994] Juhl, C. and Kelly, K. (1994). "Realism, Convergence, and Additivity," in Proceedings of the 1994 Biennial Meeting of the Philosophy of Science Association, ed. D. Hull, M. Forbes and R. Burian. East Lansing, Mich.: Philosophy of Science Association.

[Kant 1785] Kant, I. (1785). Groundwork of the Metaphysic of Morals, trans. H.J. Paton. London: Hutchinson's University Library.

[Kelly 1995] Kelly, K. (1995). The Logic of Reliable Inquiry. Oxford: Oxford University Press.

[Kelly and Glymour 1992] Kelly, K. and Glymour, C. (1992). "Inductive Inference from Theory Laden Data." Journal of Philosophical Logic 21: 391-444.
[Kelly et al. 1994] Kelly, K., Juhl, C. and Glymour, C. (1994). "Reliability, Realism, and Relativism," in Reading Putnam, ed. P. Clark. London: Blackwell.

[Kelly and Schulte 1995a] Kelly, K. and Schulte, O. (1995). "Church's Thesis and Hume's Problem." Proceedings of the IX International Joint Congress for Logic, Methodology and the Philosophy of Science. Florence, 1995.

[Kelly and Schulte 1995b] Kelly, K. and Schulte, O. (1995). "The Computable Testability of Theories Making Uncomputable Predictions." Erkenntnis 43: 29-66.

[Kelly et al. 1995] Kelly, K., Schulte, O. and Hendricks, V. (1995). "Reliable Belief Revision." Proceedings of the IX International Joint Congress for Logic, Methodology and the Philosophy of Science. Florence, 1995.

[Kelly et al. 1997] Kelly, K., Schulte, O. and Juhl, C. (forthcoming). "Learning Theory and the Philosophy of Science." Philosophy of Science.

[Kitcher 1993] Kitcher, P. (1993). The Advancement of Science. Oxford: Oxford University Press.

[Kohlberg and Mertens 1986] Kohlberg, E. and Mertens, J.F. (1986). "On the Strategic Stability of Equilibria." Econometrica 54: 1003-1037.

[Kohlberg 1990] Kohlberg, E. (1990). "Refinement of Nash Equilibrium: The Main Ideas," in Game Theory and Applications, ed. T. Ichiishi, A. Neyman and Y. Tauman. San Diego: Academic Press.

[Kreps and Ramey 1987] Kreps, D.M. and Ramey, G. (1987). "Structural Consistency, Consistency, and Sequential Rationality." Econometrica 55: 1331-1348.
[Kuhn 1970] Kuhn, T. (1970). The Structure of Scientific Revolutions. Chicago: University of Chicago Press.

[Kuhn 1957] Kuhn, T. (1957). The Copernican Revolution. Cambridge, MA: Harvard University Press.

[Kuhn 1953] Kuhn, H. (1953). "Extensive games and the problem of information," in Contributions to the Theory of Games, eds. Kuhn, H. and Tucker, A. Annals of Mathematics Studies 28. Princeton: Princeton University Press.

[Lakatos 1970] Lakatos, I. (1970). "Falsification and the Methodology of Scientific Research Programmes," in Criticism and the Growth of Knowledge, eds. Lakatos, I. and Musgrave, A. Cambridge: Cambridge University Press.

[Levi 1967] Levi, I. (1967). Gambling With Truth. New York: Alfred Knopf.

[Levi 1980] Levi, I. (1980). The Enterprise of Knowledge. Cambridge: MIT Press.

[Levi 1983] Levi, I. (1983). "Truth, Fallibility and the Growth of Knowledge," in Language, Logic and Method, eds. Cohen, R. and Wartofsky, M. Dordrecht: D. Reidel Publishing Company.

[Levi 1988] Levi, I. (1988). "Iteration of conditionals and the Ramsey test." Synthese 76: 49-81.

[Lewis forthcoming] Lewis, D. (forthcoming). "Elusive Knowledge." Australasian Journal of Philosophy.

[Maher 1996] Maher, P. (1996). "Subjective and Objective Confirmation." Philosophy of Science 63, 2: 149-174.

[Martin and Osherson 1995] Martin, E. and Osherson, D. (1995). "Scientific discovery based on belief revision." Proceedings of the X International Joint Congress for Logic, Methodology and the Philosophy of Science. Florence, 1995.

[Miller 1974] Miller, D. (1974). "On the Comparison of False Theories by Their Bases." British Journal for the Philosophy of Science 25: 166-177.
[Myerson 1991] Myerson, R.B. (1991). Game Theory. Cambridge, Mass.: Harvard University Press.

[Newell and Simon 1976] Newell, A. and Simon, H. (1976). "Computer Science as Empirical Inquiry: Symbols and Search." Communications of the ACM 19: 113-126.

[Newton 1995] Newton, I. (1995). The Principia. Translated by Andrew Motte. Amherst, NY: Prometheus Books.

[Nayak 1994] Nayak, A. (1994). "Iterated Belief Change Based on Epistemic Entrenchment." Erkenntnis 41: 353-390.

[Nozick 1981] Nozick, R. (1981). Philosophical Explanations. Cambridge: Harvard University Press.

[Omnes 1971] Omnes, R. (1971). Introduction to Particle Physics. London and New York: Wiley Interscience.

[Osborne and Rubinstein 1994] Osborne, M. and Rubinstein, A. (1994). A Course in Game Theory. Cambridge, Mass.: MIT Press.

[Osherson and Weinstein 1988] Osherson, D. and Weinstein, S. (1988). "Mechanical Learners Pay a Price for Bayesianism." Journal of Symbolic Logic 53: 1245-1252.

[Osherson et al. 1991] Osherson, D., Stob, M. and Weinstein, S. (1991). "A Universal Inductive Inference Machine." Journal of Symbolic Logic 56: 661-672.
206
[Osherson et al. 1986] Osherson, D., Stob, M. and Weinstein, S. (1986). Systems That Learn. Cambridge, Mass.: MIT Press.
[Peirce 1958] Peirce, C. S. (1958). Collected Papers of Charles Sanders Peirce, eds. C. Hartshorne, P. Weiss and A. Burks. Cambridge, Mass.: Belknap Press.
[Plato 1967] Plato (1967). Plato, Vol. II. Translated by W. Lamb. Cambridge, Mass.: Harvard University Press.
[Poincare 1952] Poincaré, H. (1952). Science and Hypothesis. New York: Dover.
[Popper 1968] Popper, K. (1968). The Logic of Scientific Discovery. New York: Harper.
[Popper 1972] Popper, K. (1972). Objective Knowledge. Oxford: Clarendon Press.
[Putnam 1963] Putnam, H. (1963). "'Degree of Confirmation' and Inductive Logic," in The Philosophy of Rudolf Carnap, ed. A. Schilpp. La Salle, Ill.: Open Court.
[Putnam 1965] Putnam, H. (1965). "Trial and Error Predicates and a Solution to a Problem of Mostowski," Journal of Symbolic Logic 30: 49-57.
[Putnam 1975] Putnam, H. (1975). "Probability and Confirmation," in Mathematics, Matter and Method. Cambridge: Cambridge University Press.
[Rawls 1971] Rawls, J. (1971). A Theory of Justice. Cambridge, Mass.: Harvard University Press.
[Rawls 1996] Rawls, J. (1996). Political Liberalism.
New York, NY: Columbia University
Press.
[Reichenbach 1949] Reichenbach, H. (1949). The Theory of Probability. London: Cambridge University Press.
[Royden 1988] Royden, H. L. (1988). Real Analysis. 3rd edition. New York: Macmillan Publishing Company.
[Salmon 1963] Salmon, W. (1963). "On Vindicating Induction," Philosophy of Science 24: 252-261.
[Salmon 1967] Salmon, W. (1967). The Foundations of Scientific Inference. Pittsburgh: University of Pittsburgh Press.
[Salmon 1991] Salmon, W. (1991). "Hans Reichenbach's Vindication of Induction," Erkenntnis 35: 99-122.
[Savage 1954] Savage, L. (1954). The Foundations of Statistics. New York: Dover.
[Schulte and Juhl 1996] Schulte, O. and Juhl, C. (1996). "Topology as Epistemology". The Monist 79: 141-148.
[Searle 1980] Searle, J. (1980). "Minds, Brains and Programs". The Behavioral and Brain Sciences 3: 417-424.
[Seidenfeld 1988] Seidenfeld, T. (1988). "Decision Theory without 'Independence' or without 'Ordering'," Economics and Philosophy 4: 267-290.
[Selten 1965] Selten, R. (1965). "Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit". Zeitschrift für die gesamte Staatswissenschaft 12: 301-324.
[Sextus Empiricus 1985] Sextus Empiricus (1985). Selections from the Major Writings on Skepticism, Man and God, ed. P. Hallie, trans. S. Etheridge. Indianapolis: Hackett.
[Shapere 1984] Shapere, D. (1984). Reason and the Growth of Knowledge. Dordrecht: Reidel.
[Valdes and Erdmann 1994] Valdes and Erdmann (1994). "Systematic Induction and Parsimony of Phenomenological Conservation Laws", Computer Physics Communications 83: 171-180.
[Valiant 1984] Valiant, L. G. (1984). "A Theory of the Learnable", Communications of the ACM 27(11): 1134-1142.
[Van Fraassen 1980] Van Fraassen, B. (1980). The Scientific Image. Oxford: Clarendon Press.
[Von Mises 1981] Von Mises, R. (1981). Probability, Statistics, and Truth. New York: Dover.
[Von Neumann and Morgenstern 1947] Von Neumann, J. and Morgenstern, O. (1947). Theory of Games and Economic Behavior. 2nd edition. Princeton: Princeton University Press.